Sunday, 22 October 2006

Manchu Letter LHA

I have been trying to learn literary Mongolian, in an on-and-off fashion, for several years now, and I had to learn Manchu (a much more pleasing language than Mongolian to my mind) when, many years ago, I was researching the Manchu translation of the great Chinese historical novel, Sanguo Yanyi 三國演義 "Romance of the Three Kingdoms" (Ilan Guran i Bithe ᡳᠯᠠᠨ ᡤᡠᡵᡠᠨ ᡳ ᠪᡳᡨᡥᡝ in Manchu). So it was inevitable that I would eventually get round to discussing the Mongolian and Manchu scripts. In Unicode these are unified (together with the "Todo" reformation of the Mongolian script and Sibe extensions for Manchu) as a single "Mongolian" script, although their user communities view Mongolian and Manchu as distinct scripts in their own right.

I have nothing against the unification of Mongolian and Manchu at the character encoding level, but the Byzantine complexity of the Mongolian encoding model that was chosen has in my opinion severely hindered the development of fonts and software support for the Mongolian and Manchu scripts. Although Mongolian has been encoded in Unicode since version 3.0 (1999), up until now there has been no realistic support for Mongolian from any of the major vendors, mainly because the rules defining Mongolian shaping behaviour have never been fully and openly defined. With Vista, for the first time, there will be support for Mongolian, including an almost-working font ("Mongolian Baiti"). However, the shaping behaviour implemented by Microsoft is largely based upon a private and undocumented interpretation of the shaping rules, one that does not always accord with the definition of Standardized Variants in the Unicode Standard (i.e. the font is not conformant to the Unicode Standard). To be fair to Microsoft, they had very little choice if they were to provide some sort of support for Mongolian, given that nobody seemed willing to do the work necessary to define the finer details of Mongolian shaping behaviour.

No doubt I will be returning to the problems of Mongolian shaping behaviour at a future date, but in the meantime if you are interested, do buy a copy of the new Unicode 5.0 book, and take a read of the section on Mongolian (13.2), which has been completely rewritten by me, and is hopefully an improvement on the previous text -- you will notice that we are still hoping to eventually get out a Unicode Technical Report documenting exactly how Mongolian shaping behaviour should work, but it may still be a while yet. And if you do buy the book, don't forget to also have a read of the section on Phags-pa (10.3) which was written by me, and the section on Yi (12.6) which has been thoroughly revised by me for the new edition.

Anyway, today I'm going to discuss the first and only addition to have been proposed for the Mongolian block since it was introduced seven years ago, MONGOLIAN LETTER MANCHU ALI GALI LHA, which is a character required for representing Tibetan LH (as in "Lhasa") in the Manchu script ("ali gali" is a Mongolian term used to refer to special letters that are used for representing Sanskrit and other foreign languages). What is interesting is why this letter was missed from the original repertoire of characters included in the Mongolian block. Well, the main source for "ali gali" letters was, I think, Tongwen Yuntong 同文韻統, a work on the Chinese transcription of Sanskrit and Tibetan that was first published by imperial order in 1749, and later reissued in a much expanded edition. The original 1749 edition does not show Tibetan LHA, but the later edition (I don't know the date) has this entry on Tibetan LHA :

This shows the syllable LHA written, from top to bottom, as :

  • Tibetan ལྷ་
  • Manchu ᠯᡥᠠ
  • Mongolian ᠯᠠᠾᠠ᠋ (!)
  • Chinese 拉

In Mongolian LH is written as a ligature of the letters LA (U+182F) and HA (U+183E), although here it is a bit weird as it seems to be written with an extra tooth (as LAHA rather than LHA). From an encoding point of view, it may be noted that the Mongolian LH ligature is encoded as a distinct letter (U+1840), which is probably unnecessary, and opens up the possibility of multiple spellings for LHA (either <1840 1820> ᡀᠠ᠋ or <182F 183E 1820> ᠯᠾᠠ᠋).
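The two spellings can be compared directly. The following is only a minimal sketch in Python using the standard `unicodedata` module; the variable and function names are illustrative, and any variation selector that appears in the sample strings above is left out for simplicity. It shows that the two spellings are different code point sequences, and that none of the Unicode normalization forms folds one into the other, because U+1840 has no decomposition mapping:

```python
import unicodedata

# The two candidate spellings of Mongolian LHA discussed above
# (illustrative names; variation selectors are deliberately omitted).
lha_single = "\u1840\u1820"          # <1840 1820>: LHA + A
lha_ligature = "\u182F\u183E\u1820"  # <182F 183E 1820>: LA + HA + A


def code_points(text):
    """Return the code points of text as space-separated hex values."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


def same_after_normalization(a, b):
    """True if any standard normalization form makes a and b equal."""
    forms = ("NFC", "NFD", "NFKC", "NFKD")
    return any(
        unicodedata.normalize(form, a) == unicodedata.normalize(form, b)
        for form in forms
    )


print(code_points(lha_single))    # 1840 1820
print(code_points(lha_ligature))  # 182F 183E 1820
print(same_after_normalization(lha_single, lha_ligature))  # False
```

In practice this means that a plain text search for one spelling will not find the other unless the software applies its own, script-specific folding.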

As with Mongolian, the Manchu LH here is a ligature of the letters LA (U+182F) and HA (U+1865). Thus, in Tongwen Yuntong Manchu LHA is not written using a special letter, which I think is the reason why no Manchu letter LHA was encoded originally. However, in other Qing dynasty texts Tibetan LHA is not represented as a ligature of LA and HA, but by means of a special letter created by adding a circle diacritic to the right of the letter LA. For example, in the imperial vocabulary in five scripts (Manchu, Tibetan, Mongolian, Uighur and Chinese), Wuti Qingwen Jian 五體清文鑒, the special letter LHA is used for the Manchu transliteration of Tibetan words, as can be seen in this example showing the Tibetan words lha dril ལྷ་དྲིལ་ "spirit bell" and lha rnga ལྷ་རྔ་ "spirit drum" :

Wuti Qingwen Jian 五體清文鑒 (Beijing: Minzu Chubanshe, 1957) p.662

Notice how in the example from Tongwen Yuntong the syllable LHA is written as the sequence l (the head), h (two teeth and a circle diacritic) and a (the tail), whereas here it is written as the sequence lh (the head, with a circle diacritic on the stem) and a (the tail).

Another example of this special letter can be seen in this Tibetan Buddhist text entitled "Praises to the Green Saviouress [Tara]", which is written in Chinese and Manchu transliteration :

At the top of the last line of the page (i.e. the rightmost line) the Tibetan phrase lha dang lha min ལྷ་དང་ལྷ་མིན་ "gods and demi-gods" is written in Manchu and Chinese transliteration. As in Wuti Qingwen Jian, the syllable LHA is represented by the addition of a circle diacritic next to the letter LA. In fact, the circle diacritic is used productively to generate aspirated letters in Manchu, for example in U+189A MONGOLIAN LETTER MANCHU ALI GALI GHA, U+189D MONGOLIAN LETTER MANCHU ALI GALI JHA, U+189F MONGOLIAN LETTER MANCHU ALI GALI DDHA, U+18A1 MONGOLIAN LETTER MANCHU ALI GALI DHA and U+18A8 MONGOLIAN LETTER MANCHU ALI GALI BHA. This diacritic circle could have been encoded separately, and these letters represented as combining sequences, but the circle diacritic wasn't encoded and these letters are encoded as precomposed letters, so it is necessary to encode a new character to represent Manchu LHA (see N3041). This new letter will be making its début as U+18AA in Unicode 5.1.
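The precomposed approach can be checked against the Unicode character database. This is a minimal sketch in Python, assuming an interpreter whose `unicodedata` tables are at Unicode 5.1 or later (so that U+18AA is assigned); the list name `ASPIRATED` and the helper `describe` are illustrative only. It prints the name of each aspirated letter mentioned above and confirms that none of them decomposes into a base letter plus a combining circle:

```python
import unicodedata

# Code points of the Manchu ali gali aspirated letters listed above,
# plus the new LHA at U+18AA (illustrative list, not exhaustive).
ASPIRATED = [0x189A, 0x189D, 0x189F, 0x18A1, 0x18A8, 0x18AA]


def describe(cp):
    """Return (U+XXXX label, character name, decomposition mapping)."""
    ch = chr(cp)
    return (
        f"U+{cp:04X}",
        unicodedata.name(ch, "<unassigned>"),
        unicodedata.decomposition(ch),  # empty string if atomic
    )


for cp in ASPIRATED:
    label, name, decomposition = describe(cp)
    print(label, name, "(no decomposition)" if not decomposition else decomposition)
```

Because each letter is atomic, a new aspirated letter cannot be composed from existing characters, which is why LHA needs a code point of its own.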


Anonymous said...

Thanks much for providing this report on the current status of representing Mongolian/Manchu in Windows! Tom Gewecke

John Cowan said...

Your text in TUS5.0 is indeed much improved. Tell me, now that it's too late: do you think it was a mistake to code glyphically identical letters such as o, u using different codes for the underlying different letters? We do not, after all, code a dozen versions of U+0061 for all the a-sounds in various Latin-script languages.

Andrew West said...

Your text in TUS5.0 is indeed much improved.

Thanks. As mentioned on page 450 there will (eventually, in the distant future) be a technical report that documents the precise shaping behaviour of the Mongolian script, which should allow standard implementations of Unicode Mongolian to be developed. But as an interim measure TUS now at least clearly indicates what issues are involved.

Tell me, now that it's too late: do you think it was a mistake to code glyphically identical letters such as o, u using different codes for the underlying different letters? We do not, after all, code a dozen versions of U+0061 for all the a-sounds in various Latin-script languages.

Emphatically, YES, it was very wrong in my opinion. Not only is the encoding of different phonetic values of the same abstract character as separate characters inconsistent with the Unicode character encoding philosophy, but it makes life very problematic for users in some situations, as you often only know what Unicode characters to use to represent a particular Mongolian word if you already know how to read it, which you may not if, for example, you are only learning Mongolian. Another example would be a scholar, who is proficient in Mongolian, trying to transcribe an incomplete manuscript text into Unicode -- when he encounters an incomplete or damaged word that cannot be fully recognised he may have to make an arbitrary decision as to whether the vowels are o/ö or u/ü; and then later someone looks at the electronic text and claims that Professor X has read the word in question as, say, "ord[...]" rather than "urt[...]", and what was originally an arbitrary guess enforced by the encoding model becomes textual evidence.

As far as I am aware, before the 20th century, Mongolian grammarians did not consider o/ö or u/ü to be separate letters, and it was only during the past century when they came into contact with Western languages that some (but not all) Mongolians began to believe that if Western scholars transcribed Mongolian using the four vowels o/ö/u/ü then the Mongolian script must indeed have four corresponding letters. I think this is a misguided view, but one that was strongly held by some of those responsible for the Mongolian encoding model, and it may have been difficult or impossible to get Chinese and Mongolian agreement on a Mongolian encoding model that did not distinguish the phonetic values of letters. It took a long time and a lot of hard work to persuade China (actually one influential professor of Mongolian) that the Mongolian phonetic encoding model should not also be applied to Phags-pa.