[KS] formal question - font issues

Frank Hoffmann hoffmann at koreanstudies.com
Tue May 26 18:11:54 EDT 2015

Andrew, wonderfully argued -- much appreciated. 

To the below issue (quote at end of this message):

That's easily explained.
As mentioned, Unicode works like a non-relational database. That means, 
every child sits on his same chair and desk every day, nobody moves. 
Politically very uncool, indeed. So you have one-to-one links from 
  요 --> 遼
  료 --> 遼
First, on your keyboard (or that touchy touch thingy), you enter ㅇ 
followed by ㅛ, OR ㄹ followed by ㅛ. While you type this you already 
see each single letter (as in my line right here). Yet, once you have 
typed both and hit Enter or type on, the FONT that you use does NOT 
"assemble those two letters" -- instead the syllables 요 and 료 are each 
present as a picture, as an image in that font. (The same applies to ã, 
é, ŏ, Ü, and other composite characters in other scripts -- depending 
on your keyboard setting you have to hit two or three keys to produce 
those, but each exists as a single image in whatever font you use, at 
least in Unicode fonts -- some very outdated fonts work differently.) 
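For those who like to see it in numbers, here is a minimal sketch in 
Python (standard library only) of how 요 and 료 each occupy one fixed 
slot. The helper name compose_syllable and the index values are my own 
labels for illustration; the arithmetic follows the Hangul syllable 
composition formula from the Unicode Standard, where ㅇ is leading 
consonant no. 11, ㄹ is no. 5, and ㅛ is vowel no. 12 (counting from 0).

```python
import unicodedata

S_BASE = 0xAC00            # first precomposed Hangul syllable (가)
V_COUNT, T_COUNT = 21, 28  # number of vowels / final-consonant slots

def compose_syllable(lead_index, vowel_index, tail_index=0):
    """Return the single precomposed syllable for the given jamo indices.

    Hypothetical helper for illustration; indices count from 0.
    """
    offset = (lead_index * V_COUNT + vowel_index) * T_COUNT + tail_index
    return chr(S_BASE + offset)

yo = compose_syllable(11, 12)   # ㅇ + ㅛ
ryo = compose_syllable(5, 12)   # ㄹ + ㅛ

for syllable in (yo, ryo):
    # NFD splits the one stored syllable back into its conjoining jamo.
    parts = unicodedata.normalize("NFD", syllable)
    print(syllable, hex(ord(syllable)), [hex(ord(p)) for p in parts])
```

Each syllable comes out as one character with one number of its own; 
the two letters you typed are only recoverable by decomposing it again.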
So, you have ONE "symbol" (image, picture, whatever you like to call 
it) for 요 and another for 료. The code tables for KOREAN encoding 
within that Unicode font you use then link from 요 to maybe 35 or 40 
Chinese characters -- and these are all specific "Sino-Korean" ones 
(not just ANY set of Chinese characters), and the 료 to maybe 10 or 12 
ones -- some of these are in both groups, others not. 

The same Chinese character (e.g. the Sino-Korean version) is then 
present twice in the actual Unicode font you use -- see below, in the 
case of:
  龜 귀  --> #63751
  龜 균  --> #63752

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cjk.jpg
Type: image/jpeg
Size: 45898 bytes
Desc: not available
URL: <http://koreanstudies.com/pipermail/koreanstudies_koreanstudies.com/attachments/20150526/c93b7945/attachment.jpg>
-------------- next part --------------

So, your computer prints what is at place #63751 OR #63752 -- which in 
the above example is 龜 both times. However, #63751 is the 龜 you 
generated by typing 귀 first, and #63752 is the one you generated by 
typing 균. Two different characters from the point of view of your 
computer -- and the computer knows best :)
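The two places can be checked directly. The following is only a sketch 
in Python (standard library), using the two decimal numbers given above; 
the function name fold is my own invention. It shows that the two 龜 
compare as unequal, and that Unicode's canonical normalization maps both 
onto one and the same unified character. A database that compares the 
raw numbers without such a step would treat them as different 
characters, which is possibly what happens in the search case quoted at 
the end of this message (that part is my assumption, not something I 
have verified).

```python
import unicodedata

gwi = chr(63751)    # the 龜 reached via 귀
gyun = chr(63752)   # the 龜 reached via 균

print(hex(ord(gwi)), hex(ord(gyun)))   # 0xf907 0xf908
print(gwi == gyun)                     # two different code points

def fold(text):
    """Map compatibility ideographs onto their unified counterparts (NFC)."""
    return unicodedata.normalize("NFC", text)

# After folding, both variants become the same character.
print(fold(gwi) == fold(gyun))
print(hex(ord(fold(gwi))))
```

Note the price of folding: once both are mapped to the same number, the 
information about which reading you typed is gone, so it is something to 
apply to a search key rather than to the stored text.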

In short: if you "drop" one of the two characters somewhere, what the 
computer system "sees" is one of the two numbers, and "re-translating" 
that to Han'gŭl gets you the same Han'gŭl you once typed.

And, of course, if you use the new "Noto" fonts we discussed earlier 
(also see my note from yesterday), those that were done by Google, then 
... then I suppose that copy of your last online Pizza order and the 
last ten Amazon orders might also be saved together with that character 
and handed on to the larger computer networks that then take better 
care of your needs, and also to ensure you do not use 료 all too often 
(what would that say about you?).

> Completely separate point I've noticed about fonts:
> Korean characters where the initial consonant alternates depending on 
> if it's at the start of a word or not, namely ㅇ /ㄹ  seem to keep 
> this distinction even after being changed into hanja and this seems 
> to confuse some databases. e.g. 요동 遼東 / 료동  遼東 - China Text 
> Project is happy with both, but Scripta Sinica will only show results 
> for the second one.


Frank Hoffmann
