[KS] unicode
Frank Hoffmann
hoffmann at koreanstudies.com
Fri May 29 19:33:51 EDT 2015
Getting back to your question below, Andrew.
That's indeed interesting.
Your observation seems correct.
It seems that Korean and Chinese fonts do have these double entries ...
Let me take again the example you introduced first:
遼 --> yo --> \uf9c3 --> 63939 --> 遼
遼 --> ryo --> \u907c --> 36988 --> 遼
If you look at this in email you will likely see the code; if you then
look at this message in the website archives ...
http://koreanstudies.com/pipermail/koreanstudies_koreanstudies.com/2015-May/date.html
... you will see that the last item, the HTML code, shows as a Chinese
character (as it should be).
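
For anyone who wants to check the numbers in the two lines above without relying on how a mail client renders them, here is a minimal Python sketch. It assumes that the "HTML code" mentioned above is a decimal numeric character reference of the form &#NNNNN; the names READINGS and describe are made up for this illustration.

```python
import html

# The two code points from the example above, written as escapes so that
# no editor or mail client can silently swap one for the other.
READINGS = {"yo": "\uf9c3", "ryo": "\u907c"}


def describe(ch):
    """Return (escape, decimal value, HTML numeric reference) for one character."""
    cp = ord(ch)
    return (f"\\u{cp:04x}", cp, f"&#{cp};")


for reading, ch in READINGS.items():
    escape, decimal, ref = describe(ch)
    # html.unescape turns the numeric reference back into the same character.
    assert html.unescape(ref) == ch
    print(reading, escape, decimal, ref)
```

The decimal values are simply the hexadecimal code points read in base 10, which is why \uf9c3 and 63939 name the same entry.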
Anyway, you can download the DEMO version of the font editing program
"TypeTool" at
http://old.fontlab.com/font-editor/typetool/
and open (a) a Chinese or Korean Unicode font such as "Batang" and (b) a
Japanese font such as MS 明朝 or Osaka. (IMPORTANT: If you play with
this, put a COPY of the font in some test folder; do NOT play with the
one in your system's actual font folder!) The program has a "Find"
function, just like any text editor, and you can search by code, name,
etc.
The Korean-made and Chinese-made Unicode fonts do have dual entries in
cases such as the above one ... you will see that these characters are
actually there twice when you search for them. But the Japanese font
has only ONE entry!
Another, really simple test to show the same:
Copy/paste the above yo and ryo characters into MS Word, then format
them using a JAPANESE font, and afterwards copy them from your MS Word
document to either the coding tool at
http://koreanstudies.com/unicode-converter.html or simply into Google
Translate (https://translate.google.com/) and set the language to
KOREAN, and you will see that it is now the same, identical character
(in a technical sense). Do the same test with a CHINESE font (in your MS
Word) and you will see that afterwards it has still preserved its
original encoding.
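
The "same, identical character (in a technical sense)" outcome can be reproduced outside MS Word. The Python sketch below relies on the Unicode character database, where U+F9C3 is listed as a compatibility ideograph whose canonical decomposition is U+907C. Whether MS Word arrives at its result by this route is an assumption, not something the test above establishes; the function names are invented for this illustration.

```python
import unicodedata

yo = "\uf9c3"   # compatibility ideograph (the "yo" entry)
ryo = "\u907c"  # unified ideograph (the "ryo" entry)


def same_after_normalization(a, b, form="NFC"):
    """True if two strings are equal once both are normalized."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


def canonical_target(ch):
    """Code point that ch canonically decomposes to, or None.

    Only single-code-point canonical decompositions are reported;
    compatibility mappings (tagged like "<wide>") are ignored.
    """
    d = unicodedata.decomposition(ch)
    if not d or d.startswith("<") or " " in d:
        return None
    return int(d, 16)


print(yo == ryo)                          # raw comparison of the two entries
print(same_after_normalization(yo, ryo))  # comparison after NFC
print(canonical_target(yo), canonical_target(ryo))
```

A search that normalizes both the stored text and the query before comparing would treat the two entries as one, while the raw text still keeps the distinction.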
As for why the Japanese do not go for this kind of reversibility --
hmmmm... that might be explained somewhere in the thousands of pages at
the Unicode Consortium. My first guess would be that they just have too
many different pronunciations for so many (of the same) characters, so
that a font with, say, 60,000 characters would then have to hold
possibly 250,000 characters. I can't think of another explanation right
now.
Best,
Frank
On Fri, 29 May 2015 21:26:17 +0900, Andrew wrote:
> Dear Frank,
>
> Thanks for the explanation and illuminating converter tool.
>
> It seems in Japanese the input system must work differently because,
> e.g. koto 事, goto 事 and ji 事 all produce the same 事 \u4e8b result.
>
> Although this may sound like I'm going back on my previous argument,
> if it is not for a technical reason (as the Japanese example seems to
> demonstrate), it would be disturbing if Unicode and digital
> Sino-Korean civilization start treating 요 遼 and 료 遼 as different
> characters. That is, as well as being inconvenient for database
> searches.
>
> sincerely
> Andrew
>
--------------------------------------
Frank Hoffmann
http://koreanstudies.com