[KS] unicode

Frank Hoffmann hoffmann at koreanstudies.com
Sat May 30 17:41:45 EDT 2015


Otfried Cheong wrote:

> Have you really thought your proposal through?

No! 
It's certainly more of a "thinking while doing" or "thinking about how 
to make the train run faster while still running to catch it."
I leave the 'proposals' to the wiser ones.


Many thanks for taking the time to reply in detail, also to Professor 
Muller.

Let me emphasize that this is a discussion on a Korean studies list, 
and really not, especially in this case, an issue where one interest 
group or personality tries to push for something very specific. The 
same applies to the discussion of Chinese characters (regionalized vs. 
universal) that this came out of. In the longer run, getting input from 
various sides is very educational, and SOME of it may then lead to 
something else, as such discussions almost always lead to a better 
understanding of the situation and the mechanisms as such. So, very 
basic stuff ... but I wanted to say that out loud, so it is not 
misunderstood as some sort of "my position" vs. "your position" game. 
We are all just interested in learning more, and in possibly seeing 
what we can do with that knowledge -- what might be possible.

Your explanations in your last posting now show me very clearly why 
this double/triple inclusion of Kanji was not considered.
Let us leave out CHINESE for now -- your argument that the Chinese do 
not enter characters in computing systems by pronunciation is one I 
had indeed not thought of, a simple oversight.

Your arguments why this was not done -- and why it is not considered 
now -- for Japanese, though, seem ESSENTIALLY to be these:
- (1) Japanese native speakers have no need for that
- (2) people who copy/paste characters that they cannot visually 
differentiate, but that are in fact encoded differently, would 
accidentally create bogus text (bogus NOT visually, but in terms of 
further computing)
- (3) it is against the "spirit" and the aims of Unicode to duplicate 
or triplicate information (ideographs)
- (4) there are just too many characters, some have 5 readings, and 
personal names even have an almost endless variety of readings (many 
not even recorded in Japanese name dictionaries)

The same SHORTER:
  (1) no interest or need
  (2) confusion through copy/paste
  (3) against aims of Unicode
  (4) too many readings/variations

I would *reply* as follows:

  (1) The interests and needs of people change by the minute. Had you 
told anyone about something like 'Facebook' in 1985, I do not think 
anyone would have thought that this would be of interest to anyone, 
even if it could be done. Technical possibilities generate interest and 
new needs.
  (2) The copy/paste-generated confusion will have a self-educating 
effect: those who do that for printed text (e-files that get printed) 
will never know, and those who do it for e-interactions will figure out 
very quickly what's going on and act accordingly. (That would 
ESPECIALLY be the case with Japanese, as compared to Korean, where one 
would only very rarely see this confusion.)
  (3) Unicode is no religion. We do not need to care about Unicode and 
its aims, *if* the rules or aims are in the way of something better. I 
care about "getting things done" in a smooth, logical, democratic, and 
practical way. If something makes sense, it makes sense; if not, then 
not. So, this is a computing issue, a logical issue that can be solved 
by technical means, and it makes no sense to me to invoke some sort of 
rule set or even a "spirit" here.
  (4) The "too many readings" argument was, historically, a very strong 
one until a few years ago -- because of the limited capabilities of 
'personal computers': storage space, RAM, processor speed. At this 
point in time this argument seems null and void to me. The new "Noto" 
font, just as an example, is 115 MB in size in its "all-inclusive" 
version. So, if we triple large numbers of characters for the Kanji, it 
may then be 200 or 300 or 500 MB at most. That would not be any problem 
anymore these days -- that's the size of a quarter of a movie in 
digital format. -- As for the personal names, yes, you are certainly 
right there. Maybe these get left out, as they are not really part of 
the "language."
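
To make the copy/paste point (2) concrete, here is a minimal Python 
sketch. It assumes only the standard unicodedata module and uses one 
pair that Unicode already encodes twice: the unified ideograph U+8C48 
and its look-alike in the CJK Compatibility Ideographs block, U+F900. 
The variable names and the helper same_after_nfc are made up for this 
illustration; nothing here is a proposal for how triple-encoded Kanji 
would actually have to be handled.

```python
import unicodedata

# Two code points that are normally drawn with the same glyph.
unified = "\u8c48"  # CJK UNIFIED IDEOGRAPH-8C48
compat = "\uf900"   # CJK COMPATIBILITY IDEOGRAPH-F900

# A text into which someone pasted the compatibility form.
pasted_text = "abc" + compat + "def"


def same_after_nfc(a: str, b: str) -> bool:
    """Compare two strings after canonical normalization (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Visually identical, yet a raw comparison and a raw search both fail.
print(unified == compat)        # False
print(unified in pasted_text)   # False

# Once both sides are normalized, the two forms compare as equal.
print(same_after_nfc(unified, compat))  # True
```

The text looks right on screen and in print, but search and comparison 
quietly fail until the software normalizes both sides.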

Possible advantages of doing this?
- There might be many more that would come up as a result of the 
technical possibility and that I cannot think of now -- but what comes 
to mind first is certainly translation software, which now works with 
dictionaries, the same as for Chinese and Korean.
I do not have any of these at hand for Japanese -- but they look just 
like those for Korean:
(...)
규정화	規定化
규제력	規制力
규제법	規制法
규제책	規制策
규찰대	糾察隊
(...)
개교기념일	開校記念日
개발도상국	開發途上國
개인주의적	個人主義的
건축사학자	建築史學者
결혼기념일	結婚記念日
경제사학계	經濟史學界
경제사학자	經濟史學者
경험주의적	經驗主義的
(...)
On the one hand these are used for entering phrases -- typing more than 
one syllable into your computing device to then get the Hanja. On the 
other hand translation software has such a phrase dictionary, so that a 
single character gets "evaluated" according to its "environment" -- 
what word or phrase it is part of. 
However, in the case of Japanese, exactly because the pronunciation of 
a Kanji is often different depending on where it appears (and in what 
word), this would be a huge advantage in how translation software could 
work -- it would immediately get much better! (The same does not work 
for Korean, exactly because there are not that many characters with 
pronunciation variants -- so, the more irregular the script system, the 
better.)
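
How such a phrase dictionary "evaluates" a syllable by its environment 
can be sketched in a few lines of Python. This is only an illustration 
under simple assumptions: the entries are copied from the Korean listing 
above, the names PHRASES and to_hanja are hypothetical, and greedy 
longest-match lookup is just one plausible strategy, not a description 
of how any actual input method or translation program works.

```python
# Hangul phrase -> Hanja phrase, copied from the listing above.
PHRASES = {
    "규정화": "規定化",
    "규제력": "規制力",
    "규찰대": "糾察隊",
    "개인주의적": "個人主義的",
    "경제사학자": "經濟史學者",
}


def to_hanja(text: str, phrases: dict[str, str] = PHRASES) -> str:
    """Replace known Hangul phrases with Hanja, longest match first."""
    longest = max(map(len, phrases))
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate starting at position i first.
        for size in range(min(longest, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in phrases:
                out.append(phrases[chunk])
                i += size
                break
        else:
            # No phrase starts here: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


# The same syllable 규 becomes 規 in one word and 糾 in another.
print(to_hanja("규정화 규찰대"))  # 規定化 糾察隊
```

A lone 규 stays as it is, because without a surrounding word the 
dictionary has nothing to decide with.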

Again, there will likely be far more advantages that do not come to 
mind now, or that will only come up once this possibility exists. 

Still just thinking out loud -- NOT making any proposals (other than 
considering this).


Best,
Frank


--------------------------------------
Frank Hoffmann
http://koreanstudies.com

