[KS] unicode
Frank Hoffmann
hoffmann at koreanstudies.com
Sat May 30 17:41:45 EDT 2015
Otfried Cheong wrote:
> Have you really thought your proposal through?
No!
It's certainly more of a "thinking while doing" or "thinking how to
make the train run faster while running to still catch it."
I leave the 'proposals' for the wiser ones.
Many thanks for taking the time to reply in detail, also to Professor
Muller.
Let me emphasize that this is a discussion on a Korean studies list,
and really not, especially in this case, some issue where one interest
group or personality tries to push for something very specific. The
same applies to the Chinese character (regionalized vs. universal)
discussion this came out of. In the longer run getting input from
various sides is very educational, and SOME of it may then lead to
something else, as such discussions almost always lead to a better
understanding of the situation and the mechanisms as such. So, very
basic stuff ... but I wanted to say that out loud, so
it is not misunderstood as some sort of "my position" vs. "your
position" game. We are all just interested in learning more, and in
possibly seeing what we can do with that knowledge -- seeing what might
be possible.
Your explanations in your last posting now very clearly show me why
this double/triple inclusion of Kanji was not considered.
Let us leave out CHINESE for now -- your argument that the Chinese do
not enter characters in computing systems by pronunciation is one I
had indeed not thought of, a simple oversight.
Your arguments why this was not done -- and why it is not considered
now -- for Japanese, though, seem ESSENTIALLY to be these:
- (1) Japanese native speakers have no need for that
- (2) people who copy/paste characters that they cannot visually
differentiate, but which are in fact encoded differently, would then
accidentally create some bogus text (bogus NOT visually, but in terms
of further computing)
- (3) it is against the "spirit" and the aims of Unicode to double and
triple information (ideographs)
- (4) there are just too many characters, some have 5 readings, and
personal names even have an almost endless variety of readings (many
not even recorded in Japanese name dictionaries)
The same SHORTER:
(1) no interest or need
(2) confusion through copy/paste
(3) against aims of Unicode
(4) too many readings/variations
I would *reply* as follows:
(1) Interests and needs of people change by the minute. Had you told
anyone about something like 'Facebook' in 1985, I do not think anyone
would have thought that this would be of interest to anyone, even if it
could be done. Technical possibilities generate interest and new needs.
(2) The copy/paste-generated confusion will have a self-educating
effect: those who do that for printed text (e-files that get printed)
will never know, and those who do it for e-interactions will figure out
very quickly what's going on and act accordingly. (That would
ESPECIALLY be the case with Japanese, as compared to Korean, where one
would only very rarely see this confusion.)
(3) Unicode is no religion. We do not need to care about Unicode and
its aims, *if* the rules or aims are in the way of something better. I
care about "getting things done" in a smooth, logical, democratic, and
practical way. If something makes sense, it makes sense, if not then
not. So, this is a computing issue, a logical issue that can be solved
by technical means, and it makes no sense to me to come with some sort
of rule set or even spirit here.
(4) The "too many readings" argument was, historically, a very strong
one until a few years ago -- because of the limited capabilities of the
'personal computers', of space limitations, of RAM, of processor speed.
At this point in time this argument seems null and void to me. The new
"Noto" fonts, just as an example: the "all-inclusive" version is
115 MB in size. So, if we triple large numbers of characters for
the Kanji, it may then be 200 or 300 or 500 MB at most. That would not
be a problem anymore these days -- that's the size of a quarter of a
movie in digital format. -- The personal names, yes, you are certainly
right there. Maybe these get left out, as they are not really part of
the "language."
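To make the copy/paste worry in (2) concrete: as far as I know, Unicode already contains a small set of visually identical duplicates of this kind, the CJK compatibility ideographs, a number of which were carried over from the Korean standard so that Hanja listed there more than once could round-trip. The following is only a minimal Python sketch, assuming the standard `unicodedata` module; the helper name `same_after_nfc` is made up for illustration. It takes one Hanja that I believe exists at four code points and compares the raw characters and their normalized forms.

```python
import unicodedata

# One Hanja, four code points: the unified ideograph plus three
# compatibility ideographs that normally render identically.
# Escapes are used on purpose, since pasting the glyphs themselves
# may silently normalize them.
UNIFIED = "\u6a02"
COMPAT = ["\uf914", "\uf95c", "\uf9bf"]


def same_after_nfc(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after NFC normalization."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


for ch in COMPAT:
    # Columns: code point, raw equality, equality after normalization.
    print(f"U+{ord(ch):04X}", ch == UNIFIED, same_after_nfc(ch, UNIFIED))
```

The raw comparison fails while the normalized one succeeds, which is exactly the "looks the same, computes differently" situation: a search, sort, or dictionary lookup that skips normalization would treat the pasted character as a different one.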
Possible advantages of doing this?
- There might be many more, which I cannot think of now, that would
come up as a result of the technical possibility -- but what comes to
mind first is certainly translation software, which now works with
dictionaries, the same as for Chinese and Korean.
I do not have any of these at hand for Japanese -- but they look just
like those for Korean:
(...)
규정화 規定化
규제력 規制力
규제법 規制法
규제책 規制策
규찰대 糾察隊
(...)
개교기념일 開校記念日
개발도상국 開發途上國
개인주의적 個人主義的
건축사학자 建築史學者
결혼기념일 結婚記念日
경제사학계 經濟史學界
경제사학자 經濟史學者
경험주의적 經驗主義的
(...)
On the one hand these are used for entering phrases -- typing more than
one syllable into your computing device to then get the Hanja. On the
other hand translation software has such a phrase dictionary, so that a
single character gets "evaluated" according to its "environment"
-- what word or phrase it is part of.
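How such a phrase dictionary "evaluates" a syllable by its environment can be sketched with a longest-match lookup. This is only an illustration in Python, not how any particular input method or translation program actually works: the function name `to_hanja` is made up, most entries are taken from the list above, and the short entry for 규제 is added by way of assumption to show why the longer phrase has to win.

```python
# Tiny phrase dictionary. Most entries come from the list above;
# the two-syllable entry is an added assumption for illustration.
PHRASES = {
    "규제": "規制",  # assumed extra entry, not in the list above
    "규정화": "規定化",
    "규제력": "規制力",
    "규제법": "規制法",
    "결혼기념일": "結婚記念日",
    "경제사학자": "經濟史學者",
}


def to_hanja(text: str, phrases: dict = PHRASES) -> str:
    """Replace Hangul phrases with Hanja, trying the longest match first."""
    max_len = max(len(key) for key in phrases)
    out = []
    i = 0
    while i < len(text):
        # Try the longest possible chunk at position i, then shorter ones.
        for n in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + n]
            if chunk in phrases:
                out.append(phrases[chunk])
                i += n
                break
        else:
            # No phrase starts here: keep the character unchanged.
            out.append(text[i])
            i += 1
    return "".join(out)


print(to_hanja("규제법 규제력"))
```

Because the longest entry is tried first, 규제법 is resolved as one phrase rather than as 규제 followed by a leftover syllable; the surrounding syllables decide which Hanja a given syllable becomes.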
However, in the case of Japanese, exactly because the pronunciation of
a Kanji is often different depending on where it appears (and in what
word), this would be a huge advantage in how translation software could
work -- it would get immediately much better! (The same does not work
for Korean, exactly because there are not that many characters with
pronunciation variants -- so, the more irregular the script system, the
better.)
Again, there will likely be far more advantages that do not come to
mind now, or that will only come up once this possibility exists.
Still just thinking out loud -- NOT making any proposals (other than
considering this).
Best,
Frank
--------------------------------------
Frank Hoffmann
http://koreanstudies.com