[KS] formal question - font issues // tool

Frank Hoffmann hoffmann at koreanstudies.com
Thu May 28 03:18:17 EDT 2015

Hi Andrew, and All:

For the children .... I put up a little converter tool as a 
demonstration for EXACTLY what the code conversion does (will be 
leaving it there for good):


Should be self-explanatory, I hope. 
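
For those who prefer to see it in code, here is a minimal sketch in Python of the kind of conversion meant here. It is not the converter tool itself. It only uses the two numbers from the 龜 example in the quoted message below, and the names READINGS, to_hangul and unify are made up for illustration. A real converter would need the full Sino-Korean reading table.

```python
import unicodedata

# The two code points quoted in the message below (decimal).
# Both are displayed as 龜, but they are separate characters.
GWI = chr(63751)   # U+F907, the 龜 entered via the reading 귀
GYUN = chr(63752)  # U+F908, the 龜 entered via the reading 균

# Hypothetical two-entry reading table, for illustration only.
READINGS = {GWI: "귀", GYUN: "균"}

def to_hangul(text):
    """Replace every character listed in READINGS with its reading."""
    return "".join(READINGS.get(ch, ch) for ch in text)

def unify(text):
    """Fold compatibility ideographs into their unified form (NFC)."""
    return unicodedata.normalize("NFC", text)

print(GWI == GYUN)                        # the computer sees two characters
print(to_hangul(GWI), to_hangul(GYUN))    # each number leads back to its own reading
print(unify(GWI) == unify(GYUN))          # after normalization they coincide
```

The last line hints at why one database may accept both spellings and another only one: a search system that normalizes its input treats the two as the same character, while one that compares raw numbers does not. Whether any particular database works this way is an assumption on my part.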


On Tue, 26 May 2015 15:11:54 -0700, Frank Hoffmann wrote:
> Andrew, wonderfully argued -- much appreciated. 
> To the below issue (quote at end of this message):
> That's easily explained.
> As mentioned, Unicode works like a non-relational database. That means, 
> every child sits on his same chair and desk every day, nobody moves. 
> Politically very uncool, indeed. So you have one-to-one links from 
>   요 --> 遼
>   료 --> 遼
> First, on your keyboard (or that touchy touch thingy), you enter ㅇ 
> followed by ㅛ, OR ㄹ followed by ㅛ. While you type this you already 
> see each single letter (as in my line right here). Yet, once you have typed 
> both and hit Enter or type on, the FONT that you use does NOT "assemble 
> those two letters" -- instead the syllables 요 and 료 are each present as a 
> picture, as an image in that font. (The same applies to ã, é, ŏ, Ü, 
> and other composite characters in other scripts -- depending on your 
> keyboard setting you have to hit two or three keys to produce those, 
> but each exists as a single image in whatever font you use, at least in 
> Unicode fonts -- some very outdated fonts work differently.) 
> So, you have ONE "symbol" (image, picture, whatever you like to call 
> it) for 요 and another for 료. The code tables for KOREAN encoding 
> within that Unicode font you use then link from 요 to maybe 35 or 40 
> Chinese characters -- and these are all specific "Sino-Korean" ones 
> (not just ANY set of Chinese characters), and the 료 to maybe 10 or 12 
> ones -- some of these are in both groups, others not. 
> The same Chinese character (e.g. the Sino-Korean version) is then twice 
> present in the actual Unicode font you use -- see below, in the case of:
>   龜 귀  --> #63751
>   龜 균  --> #63752
> So, your computer prints what is at place #63751 OR #63752 -- which in 
> the above example is 龜 both times. However, #63751 is the 龜 you generated 
> by typing 귀 first, and #63752 is the one you generated by typing 균. Two 
> different characters from the point of view of your computer -- and the 
> computer knows best :)
> In short: if you "drop" one of the two characters somewhere, what the 
> computer system "sees" is one of the two numbers, and "re-translating" 
> that to Han'gŭl gets you the same Han'gŭl you originally entered. 
> And, of course, if you use the new "Noto" fonts we discussed earlier 
> (also see my note from yesterday), those that were done by Google, then 
> ... then I suppose that copy of your last online Pizza order and the 
> last ten Amazon orders might also be saved together with that character 
> and handed on to the larger computer networks that then take better 
> care of your needs, and also to ensure you do not use 료 all too often 
> (what would that say about you?).
>> Completely separate point I've noticed about fonts:
>> Korean characters where the initial consonant alternates depending on 
>> if it's at the start of a word or not, namely ㅇ /ㄹ  seem to keep 
>> this distinction even after being changed into hanja and this seems 
>> to confuse some databases. e.g. 요동 遼東 / 료동  遼東 - China Text 
>> Project is happy with both, but Scripta Sinica will only show results 
>> for the second one.
> Best,
> Frank
> ---------------------------------
> Frank Hoffmann
> http://koreanstudies.com
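
On the point in the quoted message that 요 and 료 are single units rather than two letters assembled on the fly: the sketch below shows the same thing at the level of character codes (not fonts, which are a separate layer). It uses the standard Unicode arithmetic for precomposed Hangul syllables. The function name compose is made up for illustration, and syllables with a final consonant are left out to keep it short.

```python
# Standard constants of the Unicode Hangul syllable block.
S_BASE = 0xAC00   # code point of the first syllable, 가
V_COUNT = 21      # number of medial vowels
T_COUNT = 28      # number of final-consonant slots (including "none")

# Initial consonants and vowels in Unicode order, written with the
# ordinary letters one sees while typing.
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"

def compose(initial, vowel):
    """Return the single precomposed syllable for initial + vowel (no final)."""
    l_index = INITIALS.index(initial)
    v_index = VOWELS.index(vowel)
    return chr(S_BASE + (l_index * V_COUNT + v_index) * T_COUNT)

for initial in "ㅇㄹ":
    syllable = compose(initial, "ㅛ")
    # Two keystrokes in, one character (one number) out.
    print(initial, "+ ㅛ =", syllable, len(syllable), hex(ord(syllable)))
```

Each syllable ends up with exactly one number of its own, which is what the Sino-Korean tables then link from.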

Frank Hoffmann
