[KS] formal question - font issues // tool
hoffmann at koreanstudies.com
Thu May 28 03:18:17 EDT 2015
Hi Andrew, and All:
For the children ... I put up a little converter tool as a
demonstration of EXACTLY what the code conversion does (I will be
leaving it there for good):
It should be self-explanatory, I hope.
On Tue, 26 May 2015 15:11:54 -0700, Frank Hoffmann wrote:
> Andrew, wonderfully argued -- much appreciated.
> As to the issue below (quoted at the end of this message):
> That's easily explained.
> As mentioned, Unicode works like a non-relational database. That means
> every child sits on the same chair at the same desk every day; nobody moves.
> Politically very uncool, indeed. So you have one-to-one links from
> 요 --> 遼
> 료 --> 遼
> First, on your keyboard (or that touchy touch thingy), you enter ㅇ
> followed by ㅛ, OR ㄹ followed by ㅛ. While you type, you already
> see each single letter (as in my line right here). Yet, once you have typed
> both and hit Enter or type on, the FONT that you use does NOT "assemble
> those two letters" -- instead the syllables 요 and 료 are present as a
> picture, as an image in that font. (The same applies to ã, é, ŏ, Ü,
> and other composite characters in other scripts: depending on your
> keyboard setting you have to hit two or three keys to produce those,
> but each is a single image in whatever font you use, at least in
> Unicode fonts; some very outdated fonts work differently.)
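A minimal Python sketch of the "one syllable, one image" point, using only the standard-library `unicodedata` module. In Unicode each precomposed Hangul syllable is a single code point, derived from its jamo by the composition formula in the Unicode Standard (in practice the input method performs this step; the font only supplies the glyph). The function name `compose_syllable` and the constant names are illustrative, and the sketch uses the conjoining jamo (U+1100 block) rather than the compatibility jamo that keyboards usually display while you type.

```python
import unicodedata

# Constants of the Hangul syllable composition algorithm
# (Unicode Standard, "Conjoining Jamo Behavior").
S_BASE = 0xAC00   # 가, first precomposed syllable
L_BASE = 0x1100   # first leading consonant (choseong)
V_BASE = 0x1161   # first vowel (jungseong)
T_BASE = 0x11A7   # one below the first trailing consonant (jongseong)
V_COUNT = 21      # number of vowels
T_COUNT = 28      # number of trailing consonants, plus "none"


def compose_syllable(lead: str, vowel: str, tail: str = "") -> str:
    """Combine conjoining jamo into one precomposed Hangul syllable."""
    l_index = ord(lead) - L_BASE
    v_index = ord(vowel) - V_BASE
    t_index = ord(tail) - T_BASE if tail else 0
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT + t_index)


yo = compose_syllable("\u110B", "\u116D")   # ᄋ + ᅭ
ryo = compose_syllable("\u1105", "\u116D")  # ᄅ + ᅭ
print(yo, f"U+{ord(yo):04X}")    # 요 U+C694
print(ryo, f"U+{ord(ryo):04X}")  # 료 U+B8CC

# NFC normalization applies the same rule, and likewise turns a Latin letter
# plus a combining accent (e + U+0301 combining acute) into one character.
assert unicodedata.normalize("NFC", "\u110B\u116D") == yo
e_acute = unicodedata.normalize("NFC", "e\u0301")
print(e_acute, f"U+{ord(e_acute):04X}")  # é U+00E9
```

Whatever keys were pressed, what gets stored and handed to the font is the single number on the right.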
> So, you have ONE "symbol" (image, picture, whatever you like to call
> it) for 요 and another for 료. The code tables for KOREAN encoding
> within that Unicode font you use then link from 요 to maybe 35 or 40
> Chinese characters -- and these are all specific "Sino-Korean" ones
> (not just ANY set of Chinese characters) -- and from 료 to maybe 10 or 12
> others; some of these are in both groups, others not.
> The same Chinese character (i.e. the Sino-Korean version) is then present
> twice in the actual Unicode font you use, as in the case of:
> 龜 귀 --> #63751
> 龜 균 --> #63752
> So, your computer prints what is at position #63751 (U+F907) OR #63752
> (U+F908), which in the example above is 龜 both times. However, #63751 is
> the 龜 you generated by typing 귀 first, and #63752 is the one you generated
> by typing 균. Two different characters from the point of view of your
> computer -- and the computer knows best :)
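The two numbers are decimal code points in the CJK Compatibility Ideographs block. A short Python check with the standard `unicodedata` module shows that they really are different characters to the computer, yet both are canonically equivalent to the ordinary unified ideograph U+9F9C 龜; the variable names are illustrative.

```python
import unicodedata

via_gwi = chr(63751)   # 龜 as entered through the reading 귀
via_gyun = chr(63752)  # 龜 as entered through the reading 균

for ch in (via_gwi, via_gyun):
    # code point, official character name, canonical decomposition mapping
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))

print(via_gwi == via_gyun)  # False: two different code points

# Canonical normalization (NFC or NFD) folds both into the unified U+9F9C.
unified = {unicodedata.normalize("NFC", ch) for ch in (via_gwi, via_gyun)}
print(unified == {"\u9F9C"})  # True
```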
> In short: if you "drop" one of the two characters somewhere, what the
> computer system "sees" is one of the two numbers, and "re-translating"
> that to Han'gŭl gets you the same Han'gŭl you once typed.
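The "re-translating" step works because the conversion can be undone by a simple reverse lookup on the code point. A minimal sketch, assuming a hypothetical reading-to-hanja table that contains only the two 龜 entries from this message (a real input-method dictionary is far larger, and its format is not described here); `to_hanja` and `to_hangul` are illustrative names, not any real converter's API.

```python
import unicodedata

# Hypothetical excerpt of a reading -> hanja table (only the 龜 entries above).
READING_TO_HANJA = {
    "귀": [chr(63751)],  # U+F907
    "균": [chr(63752)],  # U+F908
}
# Reverse index: each code point remembers the reading it was entered with.
HANJA_TO_READING = {h: r for r, hanja in READING_TO_HANJA.items() for h in hanja}


def to_hanja(reading: str, choice: int = 0) -> str:
    """Pick one of the candidate hanja for a Hangul reading."""
    return READING_TO_HANJA[reading][choice]


def to_hangul(hanja: str):
    """Look the reading back up; returns None if the code point is unknown."""
    return HANJA_TO_READING.get(hanja)


print(to_hangul(to_hanja("균")))  # 균: the round trip keeps the reading

# If some system normalizes the text in between, the reading is lost,
# because the unified U+9F9C is not one of the table's code points.
print(to_hangul(unicodedata.normalize("NFC", to_hanja("균"))))  # None
```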
> And, of course, if you use the new "Noto" fonts we discussed earlier
> (also see my note from yesterday), the ones made by Google, then I
> suppose a copy of your last online pizza order and your last ten Amazon
> orders might also be saved together with that character and handed on
> to the larger computer networks, which will then take better care of
> your needs and also make sure you do not use 료 all too often
> (what would that say about you?).
>> Completely separate point I've noticed about fonts:
>> Korean characters whose initial consonant alternates depending on
>> whether it is at the start of a word or not, namely ㅇ/ㄹ, seem to keep
>> this distinction even after being converted into hanja, and this seems
>> to confuse some databases. e.g. 요동 遼東 / 료동 遼東 - China Text
>> Project is happy with both, but Scripta Sinica will only show results
>> for the second one.
> Frank Hoffmann
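One plausible explanation for the database behavior Andrew describes, offered only as an assumption since neither database's internals are documented here: a search that compares raw code points treats compatibility variants as different text, while one that normalizes query and corpus first treats them as the same. The sketch uses the 龜 pair from this message rather than 遼, because which code points an input method actually produces for 요 versus 료 is not stated above; the corpus string and both function names are hypothetical.

```python
import unicodedata


def raw_search(corpus: str, query: str) -> bool:
    """Match exact code points only."""
    return query in corpus


def normalized_search(corpus: str, query: str) -> bool:
    """Match after canonical (NFC) normalization of both sides."""
    def nfc(s: str) -> str:
        return unicodedata.normalize("NFC", s)
    return nfc(query) in nfc(corpus)


corpus = "\u9F9C\u8239"  # 龜船 stored with the unified ideograph U+9F9C
query = "\uF907\u8239"   # 龜船 typed via the reading 귀 (U+F907)

print(raw_search(corpus, query))         # False
print(normalized_search(corpus, query))  # True
```

Normalizing is the more forgiving choice for searching, but it also discards exactly the distinction that lets a system "re-translate" a hanja back to the Han'gŭl it was typed from.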