[KS] unicode

Frank Hoffmann hoffmann at koreanstudies.com
Sat May 30 01:28:10 EDT 2015

This is fun!

Professor Muller, a quick question then:

As Professor Cheong just wrote, the Korean variants (of characters that 
are pronounced in more than one way in Korean) are the ONLY ones that 
are encoded as such in Unicode, by "doubling" the glyphs and then 
putting these grouped together into a separate 'block' within each font 
(and then also assigning the code page accordingly). 
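
For readers who would rather check this from the encoding side than open a font, here is a minimal sketch, assuming Python 3 and its standard unicodedata module. The helper name describe is made up for illustration. It shows that a "doubled" character in the CJK Compatibility Ideographs block (U+F900 onward) has its own code point, but that Unicode also records which ordinary ideograph it duplicates, and that normalization folds it back into that ordinary one.

```python
import unicodedata


def describe(cp: int) -> dict:
    """Summarize one code point (hypothetical helper, for illustration only)."""
    ch = chr(cp)
    return {
        "code_point": f"U+{cp:04X}",
        "name": unicodedata.name(ch, "<unnamed>"),
        # For these Korean duplicates the decomposition is a single hex
        # value: the code point of the ordinary unified ideograph.
        "decomposition": unicodedata.decomposition(ch),
        # NFC normalization replaces the duplicate with that ideograph.
        "nfc": [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", ch)],
    }


# U+F900 is the first character of the block; it duplicates U+8C48.
info = describe(0xF900)
for key, value in info.items():
    print(f"{key}: {value}")
```

Note that this only inspects the Unicode character database that ships with Python. It says nothing about which glyphs a particular font maps to these code points.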

I just checked this, also in the CHINESE font (within Google's "Noto" 
font family that you introduced us to): these extra glyph blocks do 
physically exist in all three fonts: Korean, Japanese, and Chinese. But 
only the KOREAN and the CHINESE fonts have the code page assignments 
that allow us users to access the various versions of these 
"double-pronunciation" characters. In the Japanese font the glyphs are 
there but cannot be accessed, as they are not assigned in the code page.

Through Professor Cheong's mail I now understand that the Japanese (who 
would, in my opinion, benefit the most from this) do not have such 
special glyph blocks for their Kanji with double/triple/... readings 
-- no "CJK Compatibility Ideographs" block within Unicode -- and after 
he said so I now also clearly see that. For those who do not feel like 
opening a font themselves, please see the attached screen shot of that 
"CJK Compatibility Ideographs" block from within one of the Noto fonts, 
so you get a clear idea of what we are talking about. The fonts start 
with ASCII, then all kinds of other special symbols, then the very 
large block (maybe 40,000 or more?) of Chinese characters, and then -- 
you see it in that screen shot -- middle Korean letters, followed by 
the block of "CJK Compatibility Ideographs" (as you see, not too many 
in Hanmun). (See end of message!)

My question is this: The Japanese (or better, the Japanese fonts) 
still have not assigned a code block to those ALREADY EXISTENT Korean 
special characters (to the entire "CJK Compatibility Ideographs" block, 
that is). Why is this, given that the Chinese fonts do so? There is 
most obviously no technical reason for not doing it, since the glyphs 
are already there.

The other, FAR MORE IMPORTANT question:
If this doubling of glyphs is something only the Koreans do (if it only 
found its way into Unicode as some sort of compromise, as we can see 
this as a historical decision based on political decisions of dealing 
with people who did not fully understand the system) -- then I wonder 
if there is another technical means (via the code pages) to do a 
complete reversal? I think there is no other way. If you enter, as in 
Andrew's earlier example, こと and ごと for 事 and both times get the 
same identical glyph assigned (for the same Kanji), with the same 
Unicode encoding, then one obviously cannot go back to こと or ごと 
from 事. So, I do not yet understand why Unicode did not "go the Korean 
way" in this case?
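
To make the reversal problem concrete, here is a rough sketch, again assuming Python 3 with only the standard library. The two-entry table READING_TO_KANJI and the helper names are invented for illustration and are not any real input method. The first part shows that both readings end up as the single code point of 事, so the reverse lookup is ambiguous. The second part lists, for contrast, which ordinary ideographs have separately encoded Korean duplicates in the range U+F900 to U+FA0B.

```python
import unicodedata
from collections import defaultdict

# Hypothetical input table (illustration only): reading -> Kanji.
READING_TO_KANJI = {"こと": "事", "ごと": "事"}


def encode(reading: str) -> str:
    """Return the Kanji stored for a reading; the reading itself is dropped."""
    return READING_TO_KANJI[reading]


def possible_readings(kanji: str) -> list:
    """All readings in the table that could have produced this Kanji."""
    return sorted(r for r, k in READING_TO_KANJI.items() if k == kanji)


def duplicates_by_unified(first: int = 0xF900, last: int = 0xFA0B) -> dict:
    """Group compatibility ideographs by the ordinary ideograph they duplicate."""
    groups = defaultdict(list)
    for cp in range(first, last + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep only plain one-to-one mappings (no "<tag>" prefix).
        if len(parts) == 1 and not parts[0].startswith("<"):
            groups[int(parts[0], 16)].append(cp)
    return dict(groups)


# Same stored code point for both readings, so the way back is ambiguous.
print(f"U+{ord(encode('こと')):04X}", f"U+{ord(encode('ごと')):04X}")
print(possible_readings("事"))

# Ordinary ideographs that have more than one separately encoded duplicate.
groups = duplicates_by_unified()
for unified, dups in sorted(groups.items()):
    if len(dups) > 1:
        print(chr(unified), [f"U+{d:04X}" for d in dups])
```

Even on the Korean side the distinction only survives as long as the text is not normalized, since normalization maps each duplicate back to its ordinary ideograph.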


-------------- next part --------------
A non-text attachment was scrubbed...
Name: CJK-Compatibility-Ideographs.jpg
Type: image/jpeg
Size: 489253 bytes
Desc: not available
URL: <http://koreanstudies.com/pipermail/koreanstudies_koreanstudies.com/attachments/20150529/fb251ce2/attachment.jpg>
-------------- next part --------------

Frank Hoffmann
