[KS] unicode

Fri May 29 19:33:51 EDT 2015

Getting back to your question below, Andrew.

That's indeed interesting.
Your observation seems correct.
It seems that Korean and Chinese fonts do have these double entries ... 
I take again the example you introduced first:

遼 --> yo  --> \uf9c3 --> 63939 --> 遼
遼 --> ryo --> \u907c --> 36988 --> 遼

If you look at this in email you will likely see the code; if you then 
look at this message in the website archives ...
http://koreanstudies.com/pipermail/koreanstudies_koreanstudies.com/2015-May/date.html
... you will see that the last code, the HTML one, shows as a Chinese 
character (as it should).
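
The two entries above can also be inspected without any font or 
website, straight from the Unicode character database. What follows is 
only a minimal sketch in Python; the helper name describe() is made up 
for this illustration. It prints the hex code, the decimal code, the 
official character name, and the canonical decomposition (the code 
point, if any, that Unicode treats the character as equivalent to).

```python
import unicodedata

# The two characters from the example above, written as escapes so that
# no mail client or editor can silently change them.
yo = "\uf9c3"   # the entry reached via the reading "yo"
ryo = "\u907c"  # the entry reached via the reading "ryo"

def describe(ch):
    """Hypothetical helper: (hex code, decimal code, name, decomposition)."""
    return (
        "\\u%04x" % ord(ch),
        ord(ch),
        unicodedata.name(ch),
        unicodedata.decomposition(ch),
    )

for ch in (yo, ryo):
    print(describe(ch))
```

The first entry should come out as a "compatibility" ideograph whose 
decomposition points to 907C, while the second is the ordinary unified 
ideograph with no decomposition of its own.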

Anyway, you can download the DEMO version of the font editing program 
"TypeTool" at 
http://old.fontlab.com/font-editor/typetool/
and open (a) a Chinese or Korean Unicode font such as "Batang" and (b) a 
Japanese font such as MS 明朝 or Osaka. (IMPORTANT: If you play with 
this, put a COPY of the font in some test folder; do NOT play with the 
one in your system's actual font folder!) The program has a "Find" 
function, just like any text editor, and you can search by code, name, 
etc. 

The Korean-made and Chinese-made Unicode fonts do have dual entries in 
cases such as the above one ... you will see that these characters are 
actually there twice when you search for them. But the Japanese font 
has only ONE entry!

Another, real simple test to show the same:
Copy/paste the above yo and ryo characters into MS Word, then format 
them using a JAPANESE font, and afterwards copy them from your MS Word 
document to either the coding tool at 
http://koreanstudies.com/unicode-converter.html or simply into Google 
Translate (https://translate.google.com/) and set the language to 
KOREAN, and you will see that it's now the same, identical character (in a 
technical sense). Do the same test with a CHINESE font (in your MS 
Word) and you will see that afterwards it has still preserved its 
original encoding.
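
The end result of the Japanese-font test, both characters collapsing 
into one code point, can be reproduced without any fonts by Unicode 
normalization. This is only a sketch of the same effect, not a claim 
about what MS Word or a font does internally; fold() is a hypothetical 
helper name, and NFC is just one of the normalization forms that would 
do here.

```python
import unicodedata

def fold(text):
    """Hypothetical helper: replace compatibility ideographs such as
    \\uf9c3 with their unified equivalents via NFC normalization."""
    return unicodedata.normalize("NFC", text)

yo = "\uf9c3"
ryo = "\u907c"

print(yo == ryo)              # comparison of the raw code points
print(fold(yo) == fold(ryo))  # comparison after normalization
print("\\u%04x" % ord(fold(yo)))
```

Applying something like fold() to both the stored text and the search 
term is one way a database search could treat the two entries as the 
same character.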

The answer why the Japanese do not go for this kind of reversibility -- 
hmmmm... that might be explained somewhere in the thousands of pages at 
the Unicode Consortium. My first guess would be that they just have too 
many different pronunciations for so many (of the same) characters that 
this would mean a font with say 60,000 characters would then have to 
hold possibly 250,000 characters. Can't think of another explanation 
now.

Best,
Frank

On Fri, 29 May 2015 21:26:17 +0900, Andrew wrote:
> Dear Frank,
> 
> Thanks for the explanation and illuminating converter tool.
> 
> It seems in Japanese the input system must work differently because, 
> e.g. koto 事, goto 事 and ji 事 all produce the same 事 \u4e8b result.
> 
> Although this may sound like I'm going back on my previous argument, 
> if it is not for a technical reason (as the Japanese example seems to 
> demonstrate), it would be disturbing if Unicode and digital 
> Sino-Korean civilization starts treating 요 遼 and 료 遼 as different 
> characters. That is, as well as being inconvenient for database 
> searches.
> 
> sincerely
> Andrew
> 

--------------------------------------
Frank Hoffmann
http://koreanstudies.com