Happy New Year, and stuff

January 4, 2011

Well, another year has come around, and with it, the New Year’s holiday here in Japan. I just enjoyed a full week off work, most of which I spent unwinding.

I was planning to do development during this time, but it didn’t really happen – ran into problems with gettextize and MinGW, and didn’t feel like digging too much into things. I’ll see about doing something later. In the near-worst case, I’ll end up recompiling gettext and friends via MinGW or even MSVC and will try again… or in the worst case, I’ll just do the i18n work on Linux and try to see if everything will work on Windows.

I also spent a little time reflecting on my career, and am thinking about what I need to work on to make myself more marketable. I’ve made great progress in Python, but my C++ skills have been rusting, and I’m hoping to spend some time writing straight C as well. Python’s great for my current job, but to keep my options open I should spend more of my free time in C land.

So, although I expect J-Ben work to continue, it may slow down. I’m targeting near-feature-parity with J-Ben 1 by July. (I’ll be dropping the very hokey kanji search function in favor of something easier and more intuitive.) On the side, expect some small projects. One I’m considering is a rewrite of the JBLite library in C, to make it even more widely usable, although I think the return for doing so is rather low other than C hacking experience. Another is a nonogram solver using an algorithm similar to Peter Norvig’s sudoku solver. Anyway, I’m still brainstorming these ideas. We’ll see what happens.

So, that aside, I guess the only thing left to be said is: Happy New Year!

- Paul

gettext, Python, and Windows: a simple demo

December 23, 2010

Last time, I discussed some of the issues about localizing a Python app in Windows. This time, I’m providing a simple demo app which shows how localization can be accomplished.

To avoid an overly verbose explanation, I’ve set up a GitHub repo with the code, a sample .po file, plus some scripts for easily creating .po/.mo files in the correct hierarchies. (Consider its contents as public domain.)

Basically: MinGW/Msys provide the necessary utilities for extracting strings and compiling .mo files. If you’re in a non-English environment, things may act odd if LANG is not set appropriately, but other than that the tools work fine. So, it’s just a matter of a few minor tweaks, and maybe some scripts or Makefiles to automate things.
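For reference, the hierarchy gettext expects for compiled catalogs is <localedir>/<language>/LC_MESSAGES/<domain>.mo. The sketch below just builds that path; the directory name "locale" and the domain "demo" are placeholders for illustration and may not match what the repo uses.

```python
from pathlib import PurePosixPath


def mo_path(localedir, language, domain):
    """Return the path where gettext looks for a compiled catalog."""
    return PurePosixPath(localedir) / language / "LC_MESSAGES" / (domain + ".mo")


# "locale" and "demo" are hypothetical names, not taken from the repo.
print(mo_path("locale", "ja", "demo"))
```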

The demo app in the above repo contains a minimum of magic: it will check if gettext-compatible environment settings are set (LANG, LC_MESSAGES, etc.), and if not, will create a LANG value based on locale.getdefaultlocale().
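That check could look roughly like the following. This is a minimal sketch written for Python 3, not the code from the repo; the function name ensure_lang and its parameters are my own, and the environment and default locale are passed in so the logic can be tried without touching the real process environment.

```python
import locale
import os

# Variables Python's gettext module consults, in priority order.
GETTEXT_VARS = ("LANGUAGE", "LC_ALL", "LC_MESSAGES", "LANG")


def ensure_lang(environ=None, default_locale=None):
    """Set LANG from the system default locale if no gettext variable is set.

    Returns the resulting LANG value (or None if it is still unset).
    """
    if environ is None:
        environ = os.environ
    # Respect anything the user has already configured.
    if any(environ.get(name) for name in GETTEXT_VARS):
        return environ.get("LANG")
    if default_locale is None:
        # Note: getdefaultlocale() is deprecated in recent Python 3 releases.
        default_locale = locale.getdefaultlocale()
    language = default_locale[0]  # e.g. "ja_JP"; the encoding part is ignored
    if language:
        environ["LANG"] = language
    return environ.get("LANG")
```

For example, ensure_lang({}, ("en_US", "cp1252")) sets and returns "en_US", while an environment that already has LC_MESSAGES set is left alone.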

Anyway, test output on my system (Japanese Windows with English set via the Regional Settings dialog) looks as follows:

LANG/LC_* unspecified:

$ ./simple_test.py
Current locale:
LANGUAGE None
LC_ALL None
LC_MESSAGES None
LANG None

Adjusted locale:
LANGUAGE None
LC_ALL None
LC_MESSAGES None
LANG en_US

Hello world!

LANG=ja:

$ LANG=ja ./simple_test.py
Current locale:
LANGUAGE None
LC_ALL None
LC_MESSAGES None
LANG ja

こんにちは、世界!

And a final note: last time I made comments about encodings, but it seems that LANG values including encodings (“ja_JP.euc-jp”) don’t have any effect – at least in this demo and on Windows. Just the language/country portion is sufficient.
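One likely explanation, assuming the lookup goes through Python's own gettext module: gettext.find() expands a locale name into progressively less specific candidates, so a value carrying an encoding can still match a catalog stored under the bare language. The sketch below (Python 3) checks this against a throwaway directory; the domain "demo" and the helper name find_catalog are made up for the illustration.

```python
import gettext
import os
import tempfile


def find_catalog(language, domain="demo"):
    """Look up `domain` for `language` in a temp tree that only has a 'ja' catalog.

    Returns the catalog path relative to the locale directory, or None.
    """
    with tempfile.TemporaryDirectory() as localedir:
        catalog_dir = os.path.join(localedir, "ja", "LC_MESSAGES")
        os.makedirs(catalog_dir)
        # find() only checks that the file exists, so an empty file is enough.
        open(os.path.join(catalog_dir, domain + ".mo"), "wb").close()
        found = gettext.find(domain, localedir, languages=[language])
        return None if found is None else os.path.relpath(found, localedir)


for lang in ("ja", "ja_JP", "ja_JP.euc-jp", "fr"):
    print(lang, "->", find_catalog(lang))
```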

I18N with gettext and Python on Windows 7

December 23, 2010

The next major hurdle with J-Ben, which I have not solved yet, involves multi-language support using gettext.

In general, Python makes this easy.  Python supports gettext-style string substitution via the gettext module.  I’ve tested this, and it works great… on POSIX.

What about Windows?

The problem on Windows is that locale names differ from those on POSIX.  What would be “ja_JP” on Ubuntu becomes “Japanese_Japan.932” on Windows.

>>> import locale
>>> locale.setlocale(locale.LC_ALL, "")
'Japanese_Japan.932'

This won’t fly with gettext, which expects the POSIX style.

Thankfully, Python has a way to get the user’s default locale in POSIX style. Windows can report locales in that style via the WinAPI GetLocaleInfo function, and Python wraps this transparently: all we have to do is call locale.getdefaultlocale().

>>> locale.getdefaultlocale()
('ja_JP', 'cp932')

This locale tuple has the raw data we need. It might be better to get a “locale.encoding”-type string, which the (non-public!) locale._build_localename function provides:

>>> locale._build_localename(locale.getdefaultlocale())
'ja_JP.cp932'

Since this is a non-public function and is not guaranteed to exist in your Python environment, I’ll paste its source code here.

(Source: Python 2.7.1 source code, Lib/locale.py.  License: Python Software Foundation License)

def _build_localename(localetuple):

    """ Builds a locale code from the given tuple (language code,
        encoding).

        No aliasing or normalizing takes place.

    """
    language, encoding = localetuple
    if language is None:
        language = 'C'
    if encoding is None:
        return language
    else:
        return language + '.' + encoding

Using one of the above methods should allow you to get a locale string which will make gettext happy. It may be as simple as setting the LC_MESSAGES environment variable if it isn’t externally set, after which gettext may just work. I haven’t tested this myself yet, though.
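An alternative that avoids the environment entirely is to hand the language straight to gettext.translation(). The following is a sketch under assumptions, written for Python 3: the helper name load_translation, the domain "demo", and the directory name are hypothetical, and fallback=True is used so that a missing catalog returns the original strings instead of raising.

```python
import gettext


def load_translation(domain, localedir, default_locale):
    """Build a translations object from a (language, encoding) tuple.

    `default_locale` has the shape returned by locale.getdefaultlocale(),
    e.g. ("ja_JP", "cp932"). Only the language part is used.
    """
    language = default_locale[0]
    # With languages=None, gettext falls back to the environment variables.
    languages = [language] if language else None
    return gettext.translation(
        domain, localedir, languages=languages, fallback=True
    )


# No catalog exists at this made-up path, so the message passes through as-is.
translations = load_translation("demo", "no_such_dir", ("ja_JP", "cp932"))
print(translations.gettext("Hello world!"))
```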

(Tested with Python 2.7.1 x86 on Windows 7 Home Premium, Japanese version)

Kanji searching added to JBLite’s kd2 module

September 21, 2010

I’ve started to implement searching for kanji from the command line.  It’s not much so far, but it may be of mild interest.

Japanese search via kunyomi/nanori:

C:\code\projects\jblite>c:\python26\python -m jblite.kd2 data\kd2.sqlite3 ほとり
Searching for "%ほとり%", lang=None...
READINGS:
ID: 2298, literal: 畔
ID: 2396, literal: 瀕
ID: 4368, literal: 滸
ID: 4400, literal: 濆
ID: 5977, literal: 陲
ID: 8495, literal (repr): u'\u6c7b'
ID: 8562, literal (repr): u'\u6d98'
NANORI:
ID: 4, literal: 阿
MEANINGS:
No 'meaning' results found.
Result: [4, 2298, 2396, 4368, 4400, 5977, 8495, 8562]

French search:

C:\code\projects\jblite>c:\python26\python -m jblite.kd2 data\kd2.sqlite3 --lang=fr Asie
Searching for "%Asie%", lang='fr'...
READINGS:
No 'reading' results found.
NANORI:
No 'nanori' results found.
MEANINGS:
ID: 1, literal: 亜
Result: [1]

The values in “Result” are from the “id” column of the “character” table.  Basically, this is the ID needed to look up all data for a given character.  So, once a lookup-by-character-ID function is in place, it may be possible to print detailed character information to the command line.  (Unfortunately, due to encoding limitations, this is not recommended for Windows.)
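To make the shape of such a search concrete, here is a rough sketch in Python 3 using an in-memory database. The table and column names (reading, meaning, character_id, lang, value) and the sample rows are invented for the example; they are not JBLite's actual schema, and this is not its search code.

```python
import sqlite3


def make_demo_db():
    """Create a tiny in-memory database with a hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE reading (character_id INTEGER, value TEXT);
        CREATE TABLE meaning (character_id INTEGER, lang TEXT, value TEXT);
    """)
    conn.executemany(
        "INSERT INTO reading VALUES (?, ?)",
        [(2298, "ほとり"), (4, "ほとり"), (1, "あ")],
    )
    conn.execute("INSERT INTO meaning VALUES (?, ?, ?)", (1, "fr", "Asie"))
    return conn


def search_ids(conn, query, lang=None):
    """Return sorted character IDs whose readings or meanings contain `query`."""
    pattern = "%" + query + "%"
    ids = set()
    for (character_id,) in conn.execute(
            "SELECT character_id FROM reading WHERE value LIKE ?", (pattern,)):
        ids.add(character_id)
    # When lang is None, meanings in every language are searched.
    for (character_id,) in conn.execute(
            "SELECT character_id FROM meaning "
            "WHERE value LIKE ? AND (? IS NULL OR lang = ?)",
            (pattern, lang, lang)):
        ids.add(character_id)
    return sorted(ids)
```

Collecting the IDs in a set and sorting at the end gives one deduplicated list across all categories, similar in shape to the “Result” line above.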

Searching can no doubt be improved: kunyomi “-” and “.” markers in KANJIDIC are not yet handled, onyomi must be searched via katakana input, and western language lookup is case sensitive for non-ASCII characters (by default, SQLite’s LIKE is case insensitive only for ASCII).  However, it’s something which can be built on in the future.
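Two of these gaps could be narrowed with small normalization helpers along the following lines. This is only a sketch in Python 3 with function names of my own choosing, not something JBLite currently does: one helper drops the “-” and “.” markers from a kunyomi reading, and the other converts hiragana input to katakana so it could be matched against onyomi.

```python
def strip_kunyomi_markers(reading):
    """Remove KANJIDIC's "-" (affix) and "." (okurigana) markers."""
    return reading.replace("-", "").replace(".", "")


def hiragana_to_katakana(text):
    """Shift hiragana (U+3041..U+3096) to the matching katakana code points."""
    return "".join(
        chr(ord(ch) + 0x60) if "\u3041" <= ch <= "\u3096" else ch
        for ch in text
    )


print(strip_kunyomi_markers("-あ.い"))
print(hiragana_to_katakana("ほとり"))
```

Applying the first helper to stored readings (or to a normalized copy of them) and the second to user input would let a plain substring search handle both cases.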

Added KANJIDIC2 support to JBLite

September 20, 2010

Just finished version 0.3 of JBLite.  The big new feature: KANJIDIC2 support.

KANJIDIC2 support was present before, but it was based upon the external jbparse library.  Now, it’s converted very similarly to JMdict.

Now that both of these dictionaries are converted, perhaps I can hook them into J-Ben 2 and drop the jbparse library entirely…

 