kindle 3 and chinese text

Nov 09, 2010 22:50

A few weeks ago I started a beginner-level Chinese (Mandarin) evening class at the local community college. We're about four weeks in, and I'm really enjoying it and will definitely do more.

I thought it would be an interesting project to load a Chinese-English dictionary onto my Kindle for reference. I've already played with kindlegen, which takes a ( Read more... )

Comments 9

pne November 9 2010, 10:31:35 UTC
a way to load different font files onto the device. I didn't try doing that, because I thought there should be a better solution. After all, Chinese text is displayed correctly in the Kindle web browser!

I wonder whether the reason is that the Kindle is not displaying Chinese text, but rather Japanese text!

That would explain why characters that are used in Japanese (such as 他) would display but characters that aren't used in Japanese (such as 她) would not.

And might explain why setting the locale to P.R. China would work: it might select a different default font, one with coverage of Simplified Chinese characters rather than Japanese ones.
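One rough way to probe this theory off-device is to ask whether each character exists in a legacy Japanese or Simplified Chinese character set. This is only a sketch: it assumes that Python's `shift_jis` and `gb2312` codecs are a reasonable stand-in for "what a Japanese-oriented or PRC-oriented font is likely to cover", which says nothing certain about the fonts the Kindle actually ships. The helper name `covered` is made up for illustration.

```python
def covered(ch, encoding):
    """Return True if `ch` can be encoded in the given legacy character set.

    Assumption: membership in a legacy charset is used here as a proxy
    for font coverage; the Kindle's real fonts may differ.
    """
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Compare the two characters mentioned above against both charsets.
for ch in "他她":
    print(ch, "U+%04X" % ord(ch),
          "shift_jis:", covered(ch, "shift_jis"),
          "gb2312:", covered(ch, "gb2312"))
```

If a character shows up as covered by `gb2312` but not by `shift_jis`, that would be consistent with the "Japanese default font" explanation, though it would not prove it.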

ghewgill November 9 2010, 18:09:03 UTC
That's an interesting theory. The Japanese class I took years ago didn't introduce any kanji, just hiragana and some katakana. As a result, I know next to nothing about Japanese kanji and didn't realise this difference!


lithiana November 9 2010, 10:48:35 UTC
My understanding is that Unicode is not unambiguous for CJK characters -- it has a single codepoint that refers to a character written slightly differently in each language, and the application is supposed to detect which character to display based on the user or document's indicated locale. However, I don't see why that would result in a character not displaying at all...

pne November 9 2010, 12:31:18 UTC
My understanding is that Unicode is not unambiguous for CJK characters -- it has a single codepoint that refers to a character written slightly differently in each language, and the application is supposed to detect which character to display based on the user or document's indicated locale.

That's true, too, due to "Han unification".

Most shapes are identical or at least very similar, but in some cases, there are noticeable differences. (Perhaps a comparison in Latin script terms, which allows for different character shapes, too, is hand-written vs. printed lower-case "a" or "g".)
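To make the "single codepoint" point concrete, here is a small sketch using Python's standard `unicodedata` module. It only shows that each character is one unified code point with no language attached; the helper name `describe` is invented for this example.

```python
import unicodedata


def describe(ch):
    """Return the code point and Unicode name of a single character."""
    return "U+%04X %s" % (ord(ch), unicodedata.name(ch))


# Each character is a single unified ideograph. Nothing in the string
# itself says whether to draw the Chinese or the Japanese shape.
for ch in "他她":
    print(ch, describe(ch))
```

Which regional glyph gets drawn is therefore up to the font and whatever language hint the application passes along, not the text data.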

However, I don't see why that would result in a character not displaying at all...

Because the font it uses doesn't cover every single Unicode character but only the ones it thinks will be needed?

If that were the reason, though, then it would mean that Amazon-provided eBooks (where the characters show up) must use a different font than self-made ones or plain-text UTF8 files (where certain characters don't).

ghewgill November 9 2010, 18:18:15 UTC
I have a vague understanding of Han unification. The 28 MB CJK Unified Ideographs PDF at unicode.org shows six columns with slightly different renderings for each code point (and many places where a code point might be shown for just one or a few languages).

Perhaps the Kindle prefers to show characters in a consistent style based on the language setting, rather than mixing up different styles because not all characters are available in each style. Since it's a reading device, I can appreciate that.

goulo November 10 2010, 09:50:07 UTC
Surely it's better to see some of the characters in a different style/font/whatever than to see an empty box...? (Like sometimes I visit a poorly designed webpage that has text in one font, except all the Esperanto characters are in some other font. Ugly, but far better than not even showing the Esperanto characters.)

I don't know much about this particular issue. It sounds like a disaster, if there are Unicode characters that don't have a single meaning but depend on having barely documented settings set correctly in one's device. Was there really not a better way for Unicode to be designed? I thought the whole point of it was to avoid this kind of alternate character set problem...?

