[KS] old MS Word / Korean font question

Andrew McCullough naesung at gmail.com
Wed Nov 20 12:52:00 EST 2013


Thank you, Frank, for posting. Actually I like your solution a good deal
more! One suggestion: use <html><body><pre> ... </pre></body></html>
instead of just the html and body tags and it will preserve the newlines.
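
For anyone who would rather script the wrapping than edit the file by
hand, here is a minimal sketch in Python. The function name and the file
names are made up for illustration. It works on raw bytes, so the EUC-KR
text is left untouched and the browser still has to be told to use
"EUC-KR" as the text encoding.

```python
from pathlib import Path


def wrap_in_pre(src, dst):
    """Copy src to dst, wrapped in <html><body><pre> ... </pre></body></html>.

    The file is handled as raw bytes, so the original encoding (here
    assumed to be EUC-KR) is not changed in any way.
    """
    raw = Path(src).read_bytes()
    html = b"<html><body><pre>" + raw + b"</pre></body></html>"
    Path(dst).write_bytes(html)
    return html
```

Calling wrap_in_pre("old_copy.doc", "old_copy.html") on a COPY of the old
file should give something a browser can open with the line breaks kept.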

Best wishes,

Andrew McCullough


On Wed, Nov 20, 2013 at 10:46 AM, Frank Hoffmann <hoffmann at koreanstudies.com> wrote:

> Many thanks to Andrew McCullough, who sent me a solution in a private
> mail!!
>
> Since he did not post it here, let me at least put it down here, just
> in case someone else might later look for such a solution and find the
> messages in the list archives.
>
> The issue that makes Korean text (both Han'gŭl and Hanja) from the MS
> Word versions of the 1990s (on pre-Intel Macs, classic Mac OS 7/8/9)
> unreadable, Andrew pointed out, is that the "EUC-KR" encoding (known
> as 완성형 in Korea) is being used. It is still popular in Korea, and
> I still see Websites that use it. All the standard Web browsers have
> the option to use "EUC-KR"--but MS Word no longer does.
>
> Andrew's solution--on a PC, under Windows:
> Download the free program Notepad++ (http://notepad-plus-plus.org/),
> open the old file in it, go to the "Encoding" menu and select
> "Character Sets"->"Korean"->"EUC-KR". I tried that, and it works! I
> can now copy/paste that text into, e.g., an MS Word .docx file. That
> is PLAIN TEXT, which means no BOLD or ITALICS, etc. Also, since
> "EUC-KR" does not have accented characters, ü, é, etc. will have
> disappeared. Still, I now have two versions: the regularly formatted
> text (old MS Word opened in new MS Word) where only the Korean is
> missing, and this newly created file, where all the Han'gŭl and Hanja
> are there (but some accented characters, italics, etc. are
> missing)--far from perfect, but still workable.
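
The same conversion can be sketched without a text editor. The Python
snippet below is only an illustration under stated assumptions: the
function name and file names are invented, and it assumes the input is
plain EUC-KR text. Bytes that are not valid EUC-KR are replaced with a
placeholder character instead of stopping the conversion.

```python
from pathlib import Path


def euc_kr_to_utf8(src, dst):
    """Read src as EUC-KR text and save it to dst as UTF-8."""
    raw = Path(src).read_bytes()
    # errors="replace" substitutes a placeholder character for any bytes
    # that are not valid EUC-KR, rather than raising an exception.
    text = raw.decode("euc_kr", errors="replace")
    Path(dst).write_text(text, encoding="utf-8")
    return text
```

For example, euc_kr_to_utf8("old_text.txt", "old_text_utf8.txt") should
produce a file whose contents can be pasted into a current MS Word
document, again as plain text only.
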
>
> On the Mac the same can be done with programs like "BBEdit"
> (commercial) and jEdit (free -- Nick Spencer mentioned that before).
> However, the Windows Notepad++ program does it more seamlessly (I will
> spare you the details here).
>
> Alternatively, if one is ONLY looking for some Hanja in old texts, then
> this can be done in a few seconds:
> Make a COPY of the old MS Word file and change its extension from
> .doc to .html. Then open that file with a PLAIN TEXT editor and add at
> the top:
>  <html>
>  <body>
> And add at the very end:
>  </body>
>  </html>
> You can then look at it using any Web browser -- "File" --> "Open File"
> and then choose "EUC-KR" as the text encoding. HOWEVER, that does not
> get you any line breaks; it shows the text as one endless line.
>
> FURTHER EXPLANATION:
> The main reason the problem occurs seems to be that with Unicode a
> document can now be encoded in one single encoding--while before, in a
> mixed text, you would encode Han'gŭl/Hanja in EUC-KR and then anything
> else in other encodings. Now you just use Unicode (UTF-8) for all
> scripts. Because of that Microsoft has now left out the "EUC-KR" code
> page from MS Word (not sure since which version). You can actually see
> this if you open the TEST document I uploaded (from 1995) and then go
> to the "Save as.." dialog in MS Word and then choose "PLAIN TEXT .txt"
> as the format. When you click the "OK" or "Save" button, you see the
> encoding options for saving, and EUC-KR is missing! That also means
> Word will no longer auto-convert such old texts to Unicode.
>
>
>
> Best,
> Frank
>
>
>
>
> --------------------------------------
> Frank Hoffmann
> http://koreanstudies.com
>