[KS] old MS Word / Korean font question

Frank Hoffmann hoffmann at koreanstudies.com
Wed Nov 20 10:46:22 EST 2013


Many thanks to Andrew McCullough, who sent me a solution in a private 
mail!!

Since he did not post it here, let me at least put it down here, just 
in case someone else later looks for such a solution and finds the 
messages in the list archives.

The issue that makes Korean text (both Han'gŭl and Hanja) unreadable in 
documents from MS Word versions of the 1990s (on pre-Intel Macs, 
classic Mac OS 7/8/9), Andrew pointed out, is that the "EUC-KR" 
encoding (known as 완성형 in Korea) is being used. That encoding is 
still popular in Korea, and I still see Websites that use it. All the 
standard Web browsers have the option to use "EUC-KR"--but MS Word no 
longer does.

Andrew's solution--on a PC, under Windows:
Download the free program Notepad++ (http://notepad-plus-plus.org/), 
open the old file in it, go to the "Encoding" menu and select 
"Character Sets"->"Korean"->"EUC-KR". I tried that, and it works! I can 
now copy/paste that text into e.g. an MS Word .docx file. The result is 
PLAIN TEXT, that means no BOLD or ITALICS, etc. Also, since "EUC-KR" 
does not have accented characters, ü, é, etc. will have disappeared. 
Still, I now have two files: a regularly formatted text (old MS Word to 
new MS Word) where only the Korean is missing, and this newly created 
file, where all the Han'gŭl and Hanja are there (but some accented 
characters, italics, etc. are missing)--far from perfect, but still 
workable.
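
For those who prefer a script over Notepad++, the same re-reading of 
the bytes as EUC-KR can be sketched in Python. This is only a minimal 
sketch and not Andrew's method: the function names and the file names 
in the usage note are made up for illustration, and I assume that the 
Korean text sits in the old file as plain EUC-KR bytes. Bytes that are 
not valid EUC-KR (Word's binary formatting data, old accented 
characters) come out as replacement characters.

```python
from pathlib import Path


def euc_kr_to_text(raw: bytes) -> str:
    """Decode bytes as EUC-KR; undecodable bytes become U+FFFD."""
    return raw.decode("euc_kr", errors="replace")


def convert_file(src: str, dst: str) -> None:
    """Read src as raw bytes and write the decoded text to dst as UTF-8.

    Both paths are hypothetical examples supplied by the caller.
    """
    raw = Path(src).read_bytes()
    Path(dst).write_text(euc_kr_to_text(raw), encoding="utf-8")
```

A call such as convert_file("old_text.doc", "old_text_utf8.txt") (both 
names hypothetical) would then give a UTF-8 text file that a current 
MS Word can open, with the same limits as described above: plain text 
only, and some junk from the binary parts of the old file.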

On the Mac the same can be done with programs like "BBEdit" 
(commercial) and jEdit (free -- Nick Spencer mentioned that before). 
However, the Windows Notepad++ program does it more seamlessly (I will 
spare you the details here).

Alternatively, if one is ONLY looking for some Hanja in old texts, then 
this can be done in a few seconds:
Make a COPY of the old MS Word file and rename its extension from .doc 
to .html. Then open that file with a PLAIN TEXT editor and add at the 
top:
 <html>
 <body>
And add at the very end:
 </body>
 </html>
You can then look at it using any Web browser -- "File" --> "Open File" 
and then choose "EUC-KR" as the text encoding. HOWEVER, that does not 
get you any line breaks; it shows the text as one endless line.
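
One possible variation on this trick that keeps the line breaks, 
sketched in Python: decode the bytes as EUC-KR, turn the old Mac-style 
carriage returns into newlines, and put everything inside a <pre> 
element. This is my own assumption about what helps, not something 
tested against the original files; the function name wrap_as_html is 
made up, and the <pre> wrapper and the UTF-8 charset declaration are 
additions to the two tags described above.

```python
import html


def wrap_as_html(raw: bytes) -> str:
    """Return an HTML page showing EUC-KR bytes with line breaks kept."""
    # Undecodable bytes (e.g. Word's binary data) become U+FFFD.
    text = raw.decode("euc_kr", errors="replace")
    # Normalize CR and CRLF line endings to LF so <pre> shows them.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return (
        '<html>\n<head><meta charset="utf-8"></head>\n<body>\n<pre>'
        + html.escape(text)  # keep stray < and & from acting as markup
        + "</pre>\n</body>\n</html>\n"
    )
```

The returned string would have to be saved as a UTF-8 file; the browser 
can then open it without any manual choice of the text encoding.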

FURTHER EXPLANATION:
The main reason the problem occurs seems to be that with Unicode a 
document can now be encoded in one single encoding--while before, in a 
mixed text, you would encode Han'gŭl/Hanja in EUC-KR and then anything 
else in other encodings. Now you just use Unicode (UTF-8) for all 
scripts. Because of that, Microsoft has now left out the "EUC-KR" code 
page from MS Word (not sure since which version). You can actually see 
this if you open the TEST document I uploaded (from 1995), go to the 
"Save as.." dialog in MS Word, and choose "PLAIN TEXT .txt" as the 
format. When you click the "OK" or "Save" button, you see the encoding 
options shown in the attached screenshot, and EUC-KR is missing! That 
also means Word will no longer auto-convert such old texts to Unicode. 
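
The difference between the two encodings can be checked with a few 
lines of Python. This is just an illustrative sketch; the helper name 
can_encode and the sample string are made up. It tests whether a given 
text can be stored in a given encoding at all.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Mixed sample: romanization with breve, Han'gŭl, Hanja, accented letter.
sample = "Han'gŭl 한글 漢字 é"

print(can_encode(sample, "utf-8"))
print(can_encode(sample, "euc_kr"))
print(can_encode("한글 漢字", "euc_kr"))
```

The mixed sample fits into UTF-8 but not into EUC-KR, because ŭ and é 
have no place there, while the Han'gŭl and Hanja alone do fit into 
EUC-KR. That is the one-encoding-for-everything point made above.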
-------------- next part --------------
A non-text attachment was scrubbed...
Name: SaveAs1.jpg
Type: image/jpeg
Size: 106667 bytes
Desc: not available
URL: <http://koreanstudies.com/pipermail/koreanstudies_koreanstudies.com/attachments/20131120/02a70dbe/attachment.jpg>
-------------- next part --------------


Best,
Frank




--------------------------------------
Frank Hoffmann
http://koreanstudies.com

