[KS] Unicode / breves !!!

Sun Feb 13 18:02:27 EST 2000

Dear all!

Please forgive me for yelling by placing three exclamation marks in the
'subject line'. I do so because we have discussed the transcription issue
here back and forth, and most list members may be tired of it. However, I
think the information I will be giving here is indeed new to most scholars
on this list, so you may at least want to read the next two paragraphs.

Two of the major arguments made for the "need" to introduce a new
transcription (or transliteration) system for Korean were:
(1) Standard fonts used system-wide by most software programs do not
include breve characters needed for McCune-Reischauer.
(2) A one-to-one conversion between Korean script and Latin letters (ASCII)
is needed for easy international conversion between Korean and ASCII --
e.g., for purposes related to database management.

Both of these arguments have been attacked here before. But today I'd like
to point out that with the new millennium both of these arguments are
becoming obsolete, and that they will be completely meaningless by the end
of the year. This is related to the introduction of Unicode standards for
all computer operating systems.

What is Unicode? Unicode (ISO 10646) will make the chaos of mutually
incompatible charsets superfluous because it unifies a superset of all
established charsets and is out to cover all the world's languages. The
Unicode standard is an ISO standard for 16-bit universal character encoding
that replaces the individual script systems' character encodings with one
complete encoding applicable worldwide to the characters of most world
languages. There is no longer any need to declare a script system, because
each character code by itself determines which writing system the character
is part of. Therefore, in Unicode, there is no overlap between Roman
character codes and the codes of the symbols in other fonts (e.g. Korean
fonts). For example, the capital letter "A" is encoded as "0041" in Unicode
while the han'gŭl syllable "han" is encoded as "d55c". You may have two
hundred different fonts on a given PC, but if you use a Unicode savvy
program and type the han'gŭl syllable "han" on your keyboard, the program
will translate this into "d55c," and since "d55c" is only available in a
Korean font, it will either use the installed Korean font or give you a
blank, but not garbage.

In the past, we had different encodings for Korean. For example, if a Mac
user goes to the KOREAN Alta Vista search engine site, he or she will not
be able to read the Korean script of any of the search results when using a
browser like Netscape 4.5 (Mac). However, if the same user uses the new
Unicode savvy Internet Explorer 5.0 (Mac) that will officially be released
later this month, he or she will be able to read these pages (technically
speaking, as for the Web and HTML, this is actually a retranslation of
16-bit Unicode into the 8-bit UTF-8 format).

Now, the good news is that the fonts coming with the new Mac OS 8.5 and 9
are already Unicode encoded, and that they already include the breve and
macron characters we need for transcription of Korean and Japanese.
Furthermore, Microsoft's "Office 2000" will also have Unicode encoded fonts
(and I guess Windows 2000 as well) which will include these special
characters. Further details below.
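
For anyone who wants to check the two codes quoted above, here is a minimal
sketch, assuming a Python 3 interpreter is at hand. The helper name
"describe" is made up for this illustration. It prints the code point, the
official character name, and the UTF-8 bytes for "A", for the syllable
"han", and for the four transcription vowels (o-breve, u-breve, o-macron,
u-macron).

```python
import unicodedata

def describe(ch):
    """Return (code point, Unicode name, UTF-8 bytes as hex) for one character."""
    return (f"U+{ord(ch):04X}", unicodedata.name(ch), ch.encode("utf-8").hex())

# "A", the han'gul syllable "han", then o-breve, u-breve, o-macron, u-macron.
# The characters are written as \u escapes so the script itself stays pure ASCII.
for ch in "A\ud55c\u014f\u016d\u014d\u016b":
    print(ch, *describe(ch))
```

The point of the exercise is only that every one of these characters has
its own fixed code, independent of platform and font; whether a given font
can actually draw it is a separate question.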

M a c i n t o s h :
Since the release of OS 8.5, Macintosh operating systems come with Unicode
encoded fonts -- Times, Arial, Courier, Palatino, etc. Actually, I don't
know of a single Mac program that is Unicode savvy, but I am sure that by
year's end we will see updates and new versions of major programs from the
major software companies that will use Unicode. Because of this
transitional period all these new fonts include two mapping tables
("cmap") -- one Unicode table and one for the traditional Mac keyboard
encoding system. On a practical level that means that for the time being we
have no means to input the special characters above code 255, beyond the
upper ASCII range (where the breve and macron glyphs are located), although
Apple's fonts have now been upgraded from 226+32 glyphs (the standard for
TrueType fonts in the past) to more than 300 glyphs. Fonts such as Times or
Courier, for example, now include 378 glyphs (including non-visible
formatting control characters), and the new font Gadget even has 453
glyphs. In most of these new fonts we now also find the special characters
needed for our transcription purposes when using McCune-Reischauer and
Hepburn. You just aren't YET able to actually type these into your
word processor -- unless I missed something here (I don't have an OS 9
handbook). However, we are on the way, and we won't need to wait too
long. The new Mac edition of Microsoft's Web browser Internet Explorer
(version 5.0) that will officially be released later this month (but is
already available in beta versions) is Unicode savvy and can reproduce 100%
of HTML 4.0 standards -- Netscape 4.7 can only reproduce a little above
60% of all characters in HTML 4.0. I believe that neither the o- and
u-macrons nor the o- and u-breves are in HTML 4, but IE 5.0 (Mac) can
nevertheless reproduce the o- and u-macrons correctly, though not yet the
breves.
Please visit my following Web page to get a *visual* impression of how this
works -- and more info:
http://www.fas.harvard.edu/~hoffmann/unicode.html
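
On the macrons and breves in HTML: the named entities of HTML 4 do not
cover these letters, but HTML 4 takes ISO 10646 as its document character
set, so any character can be written as a numeric character reference.
Below is a minimal sketch, assuming Python 3, of how transcribed text could
be turned into plain ASCII HTML that way. The function name
"to_html_ascii" is hypothetical, and whether a particular browser then
displays the letter still depends on its fonts.

```python
def to_html_ascii(text):
    """Replace every non-ASCII character with a decimal numeric character reference."""
    # "xmlcharrefreplace" is a standard error handler of str.encode().
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")

# \u016d is u-breve; plain ASCII letters pass through unchanged.
print(to_html_ascii("han'g\u016dl"))
```

The result can be pasted into a Web page regardless of the encoding the
page itself is saved in.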

W i n d o w s :
Microsoft will release Windows 2000 this month. I am no IBM person and
don't know much about Windows 2000 -- but Microsoft also offers "Office
2000" (not out yet, I believe). In Office 2000 all fonts are also Unicode
encoded, and I believe that this must be the case with Windows 2000 fonts
as well. Actually, Microsoft has gone a few steps further than Apple (but
Apple will certainly soon catch up). Office 2000 offers language support
for 23 languages -- in one and the same software package. The Unicode
encoded fonts that come with the new Office 2000 also bring larger basic
Roman fonts (please allow me to use plain language here). There is a very
informative short overview article about the language support at
Microsoft's Web site:
http://msdn.microsoft.com/library/officedev/off2krk/85t2_3.htm
This article states that Office 2000 even has one general all-in-one
Unicode font that includes scripts for all 23 languages supported by the
package (called "Arial Unicode MS"). Please also note that another font,
"Batang," includes all glyphs for all European languages + Korean (sic!). I
am not sure how the input systems will work exactly in Windows 2000, but
please note that ONE font for all major scripts together with Unicode means
a very great advance for database development (databases work with
plain unformatted text). Even though I haven't seen this myself, I can
say that this and other Microsoft fonts will also include our famous
transcription characters, since they are part of the standard set by
Unicode 2.0. And since these are Unicode encoded fonts, that guy at the New
York Times who doesn't know a word of Korean and has never seen the o-breve
before will now (that is, later this year, I guess) see it on his computer
monitor (together with the original han'gŭl word, if you like to impress
him) when you as a Korea specialist send him a book review via e-mail, or
if he surfs the Web ...  Okay, everyone got it ... we can keep our breves
and live happily ever after.

(2) Now point two -- the one-to-one conversion issue.
Knowing that in Unicode each syllable of the Korean language has its own
code (I mentioned the example "d55c" for "han"), we now have the recipe for
a rather simple conversion program -- this may already exist, no idea. All
we need to do is to make lists -- a list of all codes ending in a certain
letter (according to McCune-Reischauer), a list of all syllables beginning
with a certain letter, ... and then we need a rule: if list A (that ends
in, for example, "n") kisses list B (that begins with, for example, "k"),
then this will result in "n'g" (and NOT in "nk") ... well, that would just
be one part of the rule set, but you get the idea. I think this is as
simple as it sounds. This is of course also possible without Unicode, but
now there are no problems anymore if people use different platforms
(Windows, Mac, ...). Such a program could be integrated into an operating
system, or it could be an add-on that THE KOREAN GOVERNMENT could be so
friendly to develop and let people download for free.
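
To make the recipe a bit more concrete, here is a rough sketch, assuming
Python 3. It relies on one fact about Unicode: the han'gŭl syllables are
laid out arithmetically from code AC00 onward, so initial consonant, vowel,
and final consonant can be computed from the code instead of being looked
up in long lists. Everything else is my own simplification: the names
(decompose, romanize, JUNCTIONS) are hypothetical, the letter tables give
only the basic McCune-Reischauer values, and the junction table is
deliberately incomplete, holding the "n" + "k" case from above plus two
similar ones. A real converter would need the full rule set.

```python
# Hangul syllables: code = 0xAC00 + (initial * 21 + vowel) * 28 + final
# \u014f is o-breve, \u016d is u-breve.
INITIALS = ["k", "kk", "n", "t", "tt", "r", "m", "p", "pp", "s",
            "ss", "", "ch", "tch", "ch'", "k'", "t'", "p'", "h"]
VOWELS = ["a", "ae", "ya", "yae", "\u014f", "e", "y\u014f", "ye", "o", "wa",
          "wae", "oe", "yo", "u", "w\u014f", "we", "wi", "yu", "\u016d",
          "\u016di", "i"]
# Simplified: each final consonant is reduced to its basic pronunciation.
FINALS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l", "l",
          "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t", "p", "t"]

# (final of previous syllable, initial of next syllable) -> combined spelling.
# Deliberately incomplete; only a few sample rules.
JUNCTIONS = {
    ("n", "k"): "n'g",   # the example from the text: NOT "nk"
    ("n", "t"): "nd",
    ("n", "p"): "nb",
}

def decompose(syllable):
    """Split one precomposed syllable into (initial, vowel, final) indexes."""
    index = ord(syllable) - 0xAC00
    if not 0 <= index < 19 * 21 * 28:
        raise ValueError(f"not a han'gul syllable: {syllable!r}")
    return index // (21 * 28), (index % (21 * 28)) // 28, index % 28

def romanize(word):
    """Romanize a run of syllables, applying JUNCTIONS at syllable borders."""
    out = ""
    pending_final = ""   # final consonant waiting to meet the next initial
    for ch in word:
        lead, vowel, tail = decompose(ch)
        initial = INITIALS[lead]
        out += JUNCTIONS.get((pending_final, initial), pending_final + initial)
        out += VOWELS[vowel]
        pending_final = FINALS[tail]
    return out + pending_final

# U+D55C U+AE00 is the word "han'gul" written in han'gul.
print(romanize("\ud55c\uae00"))
```

The same tables would work unchanged on Windows or Mac, which is exactly
the advantage of having one code per syllable everywhere.
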
On the other hand, I still do not believe that this kind of transcription
utility is that important. With Unicode, Koreans anywhere in the world can
now (later this year) communicate via e-mail and Internet without the
problems that different operating systems created, and Korean will
be available on almost any computer with these new all-in-one Unicode world
font(s) and with the Korean Language Kit included in OS 9.

Sorry, this was somewhat lengthy.

Frank

       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Frank Hoffmann * 4903 Manitoba Dr.#202 * Alexandria, VA 22312 * USA
E-MAIL: hoffmann at fas.harvard.edu
W W W : http://www.fas.harvard.edu/~hoffmann/
