In a probably futile effort to stave off Alzheimer's by torturing my brain, I have been trying for a while now to learn to read Chinese. Chinese is crazy difficult, for a number of reasons that I won't go into here. I can't devote the time it would take to become actually fluent, which as best I can tell would require dying and being reincarnated as a Chinese person, plus about 20 years of full-time study. So my goal is more modest: reasonable reading comprehension, mainly for reading patents and doing patent-related text mining. I try to spend half an hour or so a day reviewing vocabulary and reading things, to reassure myself that I'm still dumber than a Chinese ten-year-old.

Trying to learn to read Chinese might seem like a waste of time. Why not just use Google Translate? It turns out that for almost anything at an adult reading level and complexity, Google Translate's Chinese-to-English output is essentially incomprehensible gibberish. Machine translation between Chinese and English is a seriously non-trivial undertaking because the two languages differ on so many dimensions. Personally, I'm skeptical that the current statistically based approach to machine translation can be made to work here;