[KS] old MS Word / Korean font question
Frank Hoffmann
hoffmann at koreanstudies.com
Wed Nov 20 10:46:22 EST 2013
Many thanks to Andrew McCullough, who sent me a solution in a private
mail!!
Since he did not post it here, let me at least put it down here, just
in case someone else later looks for such a solution and finds the
messages in the list archives.
The issue that makes Korean text (both Han'gŭl and Hanja) unreadable in
files from MS Word versions of the 1990s (on pre-Intel Macs, classic
Mac OS 7/8/9), Andrew pointed out, is that the "EUC-KR" encoding (known
as 완성형, Wansung, in Korea) is being used. That encoding is still
popular in Korea, and I still see Websites that use it. All the
standard Web browsers have the option to use "EUC-KR"--but MS Word no
longer does.
Andrew's solution--on a PC, under Windows:
Download the free program Notepad++ (http://notepad-plus-plus.org/),
open the old file in it, then go to the "Encoding" menu and select
"Character Sets"->"Korean"->"EUC-KR".
I tried that, and it works! I can now copy/paste that text into e.g. an
MS Word .docx file. The result is PLAIN TEXT, which means no BOLD or
ITALICS, etc. Also, since "EUC-KR" does not have accented characters,
ü, é, etc. will have disappeared. Still, I now have two files: a
regularly formatted text (old MS Word opened in new MS Word) where only
the Korean is missing, and this newly created file, where all the
Han'gŭl and Hanja are there (but some accented characters, italics,
etc. are missing)--far from perfect, but still workable.
On the Mac the same can be done with programs like "BBEdit"
(commercial) and jEdit (free -- Nick Spencer mentioned that before).
However, the Windows program Notepad++ does it more seamlessly (I will
spare you the details here).
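For those who prefer a script: what these editors do when told to read
a file as "EUC-KR" can be sketched in a few lines of Python. This is
only an illustration, not Andrew's method. The function name and the
sample bytes are made up, and the idea that the accented letters were
stored as Mac Roman bytes in the old files is my assumption.

```python
# Sketch only: reinterpret raw bytes as EUC-KR, roughly what a text
# editor does when its encoding is switched to "EUC-KR".

def decode_euc_kr(raw: bytes) -> str:
    """Decode bytes as EUC-KR; bytes that are not valid become U+FFFD."""
    return raw.decode("euc_kr", errors="replace")


# Hypothetical sample: Han'gul and Hanja stored as EUC-KR, while the
# "ü" is stored as a single Mac Roman byte (an assumption about how the
# old Mac files mixed encodings).
sample = (
    "한글 漢字".encode("euc_kr")
    + b" M"
    + "ü".encode("mac_roman")
    + b"ller"
)

text = decode_euc_kr(sample)
print(text)  # the Korean survives, the accented letter does not
```

The Korean comes through intact, while the byte that stood for the
accented letter has no meaning in EUC-KR and is lost, which matches
what I saw in Notepad++.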
Alternatively, if one is ONLY looking for some Hanja in old texts, then
this can be done in a few seconds:
Make a COPY of the old MS Word file and change its extension from .doc
to .html. Then open that copy with a PLAIN TEXT editor and add at the
top:
<html>
<body>
And add at the very end:
</body>
</html>
You can then look at it using any Web browser -- "File" --> "Open File"
and then choose "EUC-KR" as the text encoding. HOWEVER, that does not
give you any line breaks; it shows the text as one endless line.
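The wrapping step can also be scripted. The following Python sketch
puts the raw bytes of a file between the same tags. Two things in it go
beyond what I described and are my own assumptions: a "meta charset"
line, so that the browser may pick "EUC-KR" by itself, and an optional
<pre> element, which might bring back the line breaks. The function
name and the file names in the comment are hypothetical.

```python
# Sketch only: wrap raw file bytes in a minimal HTML page so that a
# Web browser can display them as EUC-KR text.

def wrap_as_html(raw: bytes, keep_line_breaks: bool = False) -> bytes:
    """Return the bytes wrapped in <html><body>...</body></html>."""
    # Assumption: the meta tag lets the browser choose EUC-KR on its own.
    head = b'<html>\n<head><meta charset="euc-kr"></head>\n<body>\n'
    tail = b"\n</body>\n</html>\n"
    if keep_line_breaks:
        # Assumption: <pre> makes the browser honor the line breaks.
        head += b"<pre>\n"
        tail = b"\n</pre>" + tail
    return head + raw + tail


# Small demonstration with made-up content instead of a real file.
page = wrap_as_html("漢字".encode("euc_kr"), keep_line_breaks=True)
print(page)

# With real files (hypothetical names) one would do something like:
#   from pathlib import Path
#   raw = Path("old-copy.doc").read_bytes()
#   Path("old-copy.html").write_bytes(wrap_as_html(raw, True))
```

Since an old .doc file also contains binary data, expect some junk
characters around the actual text either way.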
FURTHER EXPLANATION:
The main reason that the problem occurs seems to be that with Unicode a
document can now be stored in one single encoding--while before, in a
mixed text, you would encode Han'gŭl/Hanja in EUC-KR and anything else
in other encodings. Now you just use Unicode (UTF-8) for all fonts.
Because of that, Microsoft has now left out the "EUC-KR" code page from
MS Word (I am not sure since which version). You can actually see this
if you open the TEST document I uploaded (from 1995), go to the
"Save as.." dialog in MS Word, and choose "PLAIN TEXT .txt" as the
format. When you click the "OK" or "Save" button, you see the encoding
options shown below, and EUC-KR is missing! That also means MS Word
will no longer auto-convert such old texts to Unicode.
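The conversion that MS Word no longer does can be done by hand. Here is
a minimal Python sketch that reads a file as EUC-KR and saves it again
as Unicode (UTF-8). The function name is made up, and writing a byte
order mark ("utf-8-sig") so that newer programs recognize the file as
UTF-8 is my assumption about what helps, not something I tested with
every version of MS Word.

```python
# Sketch only: convert an EUC-KR encoded text file to UTF-8.
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path) -> None:
    """Read src as EUC-KR and write dst as UTF-8 with a byte order mark."""
    raw = src.read_bytes()
    # Bytes that are not valid EUC-KR (accented letters, binary data)
    # are replaced with U+FFFD instead of stopping the conversion.
    text = raw.decode("euc_kr", errors="replace")
    dst.write_text(text, encoding="utf-8-sig")


if __name__ == "__main__":
    import tempfile

    # Demonstration with a temporary file instead of a real old document.
    with tempfile.TemporaryDirectory() as tmp:
        old = Path(tmp) / "old.txt"
        new = Path(tmp) / "new.txt"
        old.write_bytes("한글 漢字".encode("euc_kr"))
        convert_to_utf8(old, new)
        print(new.read_text(encoding="utf-8-sig"))
```

As with the Notepad++ route, the result is plain text only, so all
formatting is gone.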
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SaveAs1.jpg
Type: image/jpeg
Size: 106667 bytes
Desc: not available
URL: <http://koreanstudies.com/pipermail/koreanstudies_koreanstudies.com/attachments/20131120/02a70dbe/attachment.jpg>
-------------- next part --------------
Best,
Frank
--------------------------------------
Frank Hoffmann
http://koreanstudies.com