Wednesday, 20 September 2006

Tibetan Shorthand Contractions

When discussing Balti extensions for Tibetan recently I talked a little about the use of U+0F39 TIBETAN MARK TSA -PHRU for writing shorthand contractions, and whilst I'm still in a Tibetanish mood I thought I might discuss Tibetan shorthand contractions in some more detail, especially as I have just made available a List of Tibetan Shorthand Contractions that I have garnered from various sources.

Shorthand contractions (bskungs yig བསྐུངས་ཡིག་ "concealed writing" or bsdu yig བསྡུ་ཡིག་ "amalgamated writing") are informal contractions of words, created by conjoining two or more "syllable units" into a single unit. For example, the word bkra shis བཀྲ་ཤིས་ "auspicious" (also the common personal name Tashi) may be contracted to bkris བཀྲིས་, which looks like it ought to be an authentic Tibetan word but isn't. Although in this case the resultant shorthand contraction conforms to Tibetan spelling rules, this need not be so, and in very many cases the shorthand contractions break the normal rules of Tibetan spelling, as can be seen in the example below :



This is a Tibetan 1½ srang coin of 1937 in which the word bcu gcig བཅུ་གཅིག་ "eleven" is contracted to bcuig བཅིུག་, with the letter ca taking a 'u' vowel sign below and an 'i' vowel sign above. This is contrary to the rules of Tibetan spelling, under which a consonant can only take a single vowel sign (diphthongs are represented by putting the second vowel sign on a following letter 'a, e.g. spre'u སྤྲེའུ་ "monkey").

In the above example the vowel signs on the two syllables being combined are above and below the consonant, so there is no typographical interaction between the vowels, and the contraction should be rendered correctly by most Tibetan fonts. However, some Tibetan fonts do not cope well with cases where multiple vowel signs occur above the same consonant, as in this example :



This is a detail from a prayer flag in which the common formula ཀི་ཀི་སྭོ་སྭོ་ལྷ་རྒྱལ་ལོ། "All Hail, Glory be to the Gods !" has been contracted to kii soo ཀིི་སོོ་ (kii swoo ཀིི་སྭོོ་ would be the expected form), with two 'i' vowel signs over the letter ka and two 'o' vowel signs over the letter sa. None of the Tibetan fonts on my system render the vowel signs correctly, either overlaying the double vowel signs on each other or rendering the second one on a dotted circle (Jomolhari renders the double 'o' correctly, but not the double 'i'). Note that the double 'o' vowel could be represented by U+0F7D TIBETAN VOWEL SIGN OO, but I prefer to restrict U+0F7D and U+0F7B TIBETAN VOWEL SIGN EE to transliterating Sanskrit au and ai respectively.

This example also illustrates one of the techniques of shorthand contractions, namely representing syllable reduplication by doubling the vowel sign. Another example of syllable repetition is frequently seen on prayer flags, which often end with the word bskyed བསྐྱེད་ "increase, prosper" written once, twice or even three times for added emphasis. In shorthand contractions, the number of vowel signs indicates the number of syllable repetitions, so in the examples below the contraction bskyeed བསྐྱེེད་ (with two 'e' vowels) represents bskyed bskyed བསྐྱེད་བསྐྱེད་ and the contraction bskyeeed བསྐྱེེེད་ (with three 'e' vowels) represents bskyed bskyed bskyed བསྐྱེད་བསྐྱེད་བསྐྱེད་ :





Rules of Contractions

I guess that it is obvious by now to any readers who may have read my posts on Long S and R Rotunda that I am obsessed with orthographic rules, so it should be no surprise that I have attempted to look for order amongst the rule-breaking of Tibetan contractions. However, my first observation is that contractions are often idiosyncratic, and the same word may be contracted differently in different sources. For example, rgya mtsho རྒྱ་མཚོ་ "ocean" is variously contracted as རྒྱོ༹་, རྒྱམོ་ or རྪོ་. In most cases it is not possible to systematically reverse engineer the uncontracted form from a contraction, and contractions may perhaps be best considered as mnemonic abbreviations. What any individual contraction should be expanded to is usually evident from its context. Nevertheless, there are a few general principles that I have gleaned from the examples in my List of Tibetan Shorthand Contractions :

1. If the final letter of the first syllable unit is the same as the first letter of the following syllable unit, then the two letters are combined :

  • mkha' 'gro མཁའ་འགྲོ་ = mkha'gro མཁའགྲོ་
  • lcags sgrog ལྕགས་སྒྲོག་ = lcag.sgrog ལྕགསྒྲོག་
  • gtum mo གཏུམ་མོ་ = gtu.mo གཏུམོ་
  • dpal ldan དཔལ་ལྡན་ = dpa.ldan དཔལྡན་
  • 'od dkar འོད་དཀར་ = 'odkar འོདཀར་
  • gnon nu གནོན་ནུ་ = gno.nu གནོནུ་
  • gzug gin 'dug གཟུག་གིན་འདུག་ = gzu.gin 'dug གཟུགིན་འདུག་
  • khyab bdag ཁྱབ་བདག་ = khyabdag ཁྱབདག་
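Because rule 1 depends only on the letters involved, it can be sketched mechanically on Unicode strings. The Python below is a hypothetical illustration: the function name merge_shared_letter is invented, syllables are assumed to be supplied without their tsheg, and only the base consonants U+0F40..U+0F6C are treated as "letters". Dropping the first letter of the second syllable lets any subjoined letters that followed it (as in dpal ldan) stack under the shared letter at the end of the first syllable.

```python
def is_base_consonant(ch: str) -> bool:
    # Assumption: U+0F40..U+0F6C covers the base (non-subjoined) consonants.
    return "\u0F40" <= ch <= "\u0F6C"

def merge_shared_letter(first: str, second: str):
    """Hypothetical helper for rule 1: if the last letter of `first` equals the
    first letter of `second`, write the shared letter only once.
    Returns None when the rule does not apply."""
    if first and second and first[-1] == second[0] and is_base_consonant(first[-1]):
        # Anything subjoined to the shared letter in the second syllable
        # now stacks under the final letter of the first syllable.
        return first + second[1:]
    return None

mkha, gro = "\u0F58\u0F41\u0F60", "\u0F60\u0F42\u0FB2\u0F7C"  # mkha' 'gro
dpal, ldan = "\u0F51\u0F54\u0F63", "\u0F63\u0FA1\u0F53"       # dpal ldan
print(merge_shared_letter(mkha, gro))   # མཁའགྲོ
print(merge_shared_letter(dpal, ldan))  # དཔལྡན
```

The same call returns None for a pair such as kun bzang, where the letters do not match.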

2. An anusvara sign is used to represent a final letter ma somewhere in the uncontracted word :

  • khams gsum ཁམས་གསུམ་ = ཁམསུཾ་
  • khrums smad ཁྲུམས་སྨད་ = ཁྲུཾད་
  • mnyam nyid མཉམ་ཉིད་ = མཉིཾད་
  • mnyam bzhag མཉམ་བཞག་ = མཉཾག་
  • thams cad ཐམས་ཅད་ = ཐཾད་
  • rnam grangs རྣམ་གྲངས་ = རྣངཾས་
  • lha mtshams ལྷ་མཚམས་ = ལྷ༹ཾས་

3. A tsa 'phru sign is used to indicate a letter tsa ཙ, tsha ཚ, dza ཛ or za ཟ somewhere in the uncontracted word :

  • kun bzang ཀུན་བཟང་ = ཀུན༹ང་
  • kun rdzob ཀུན་རྫོབ་ = ཀོུབ༹་
  • skal bzang སྐལ་བཟང་ = སྐལ༹ང་
  • rgya mtsho རྒྱ་མཚོ་ = རྒྱོ༹་
  • khur tshos ཁུར་ཚོས་ = ཁོུས༹་
  • rgyal mtshan རྒྱལ་མཚན་ = རྒྱལ༹ན་
  • chu tshod ཆུ་ཚོད་ = ཆོུ༹ད་
  • rje btsun རྗེ་བཙུན་ = རྗེུན༹་
  • ting 'dzin ཏིང་འཛིན་ = ཏིངི་ན༹་
  • thugs brtse ཐུགས་བརྩེ་ = ཐེུག༹ས་
  • bdud rtsi བདུད་རྩི་ = བདིུད༹་
  • sno tshogs སྣོ་ཚོགས་ = སྣོག༹ས་
  • phyag 'tshal lo ཕྱག་འཚལ་ལོ་ = ཕྱ༹ལོ་
  • phun tshogs ཕུན་ཚོགས་ = ཕོུག༹ས་
  • bya tshogs བྱ་ཚོགས་ = བྱོ༹གས་
  • sbrang rtsi སྦྲང་རྩི་ = སྦྲིང༹་
  • lha mtshams ལྷ་མཚམས་ = ལྷ༹ཾས་

Note that when tsa 'phru occurs on a stack with a head letter (e.g. rgy^o རྒྱོ༹་), it attaches to the head letter, not the base consonant. I have not seen a single example of tsa 'phru attaching to a consonant that is not the top of the stack (i.e. in Unicode terms, U+0F39 never seems to attach to a subjoined letter <0F90..0FBC>).
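Rules 2 and 3 are one-directional hints (a contraction is never obliged to use an anusvara or a tsa 'phru), but they can still be used to sanity-check entries in a list of contractions. The sketch below is hypothetical: marks_consistent is an invented name, "final letter ma" is assumed to mean a syllable ending in ma or in ma followed by a secondary suffix sa (as in khams and thams), and both base and subjoined forms of tsa, tsha, dza and za are counted.

```python
TSHEG = "\u0F0B"
MA, SA = "\u0F58", "\u0F66"
ANUSVARA = "\u0F7E"   # TIBETAN SIGN RJES SU NGA RO
TSA_PHRU = "\u0F39"   # TIBETAN MARK TSA -PHRU
# tsa, tsha, dza, za, then their subjoined forms
TSA_GROUP = set("\u0F59\u0F5A\u0F5B\u0F5F\u0FA9\u0FAA\u0FAB\u0FAF")

def has_final_ma(expanded: str) -> bool:
    """Assumption: a 'final ma' is a syllable ending in ma, or in ma + sa."""
    return any(syl.endswith(MA) or syl.endswith(MA + SA)
               for syl in expanded.split(TSHEG))

def marks_consistent(expanded: str, contracted: str) -> bool:
    """Hypothetical check of rules 2 and 3: each anusvara or tsa 'phru in the
    contraction must be licensed by the expanded form. The converse is not
    checked, because a contraction need not use either mark."""
    if ANUSVARA in contracted and not has_final_ma(expanded):
        return False
    if TSA_PHRU in contracted and not TSA_GROUP.intersection(expanded):
        return False
    return True

lha_mtshams = "\u0F63\u0FB7\u0F0B\u0F58\u0F5A\u0F58\u0F66"
print(marks_consistent(lha_mtshams, "\u0F63\u0FB7\u0F39\u0F7E\u0F66"))  # True
```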

4. Final -gs གས is represented by a reversed letter ta ཊ :

  • lcags ལྕགས་ = ལྕཊ་
  • chags thogs ཆགས་ཐོགས་ = ཆཊ་ཐོཊ་
  • thugs rje ཐུགས་རྗེ་ = ཐུཊེ་ (also contracted as ཐེུགས་)
  • thugs brtse ཐུགས་བརྩེ་ = ཐེུ༹ཊ་ (also contracted as ཐེུག༹ས་)
  • de bzhin gshegs pa དེ་བཞིན་གཤེགས་པ་ = དེནིཊེ་པ་
  • lhan tshogs ལྷན་ཚོགས་ = ལྷནོ༹ཊ་
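Rule 4 is one of the more regular substitutions, so a minimal sketch is easy to write. The helper contract_gs below is hypothetical. It assumes the reversed ta is encoded as U+0F4A TIBETAN LETTER TTA, and it handles only the -gs part. Further merging, as in thugs rje ཐུཊེ་, is not attempted.

```python
TSHEG = "\u0F0B"
GA_SA = "\u0F42\u0F66"   # syllable-final -gs
TTA = "\u0F4A"           # TIBETAN LETTER TTA, the "reversed ta"

def contract_gs(text: str) -> str:
    """Hypothetical helper for rule 4: replace a syllable-final ga + sa with
    reversed ta in every tsheg-delimited syllable."""
    return TSHEG.join(
        syl[:-len(GA_SA)] + TTA if syl.endswith(GA_SA) else syl
        for syl in text.split(TSHEG)
    )

chags_thogs = "\u0F46\u0F42\u0F66\u0F0B\u0F50\u0F7C\u0F42\u0F66"
print(contract_gs(chags_thogs))   # ཆཊ་ཐོཊ
```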

5. Syllable repetition is represented by multiple vowel signs (as discussed above) :

  • bskyed bskyed བསྐྱེད་བསྐྱེད་ = bskyeed བསྐྱེེད་
  • bskyed bskyed bskyed བསྐྱེད་བསྐྱེད་བསྐྱེད་ = bskyeeed བསྐྱེེེད་
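Rule 5 is the only rule that can be reversed mechanically, and even then only naively. In the sketch below, the names contract_repetition and expand_repetition are hypothetical, and the set of repeatable vowel signs ('i', 'u', 'e', 'o') is an assumption. Each vowel sign repeated n times on a syllable is expanded into n copies of that syllable with a single vowel sign.

```python
import re

TSHEG = "\u0F0B"
# Assumed set of repeatable vowel signs: i, u, e, o.
VOWEL_SIGNS = "\u0F72\u0F74\u0F7A\u0F7C"
# Two or more copies of the same vowel sign in a row.
DOUBLED = re.compile("([" + VOWEL_SIGNS + "])\\1+")

def contract_repetition(syllable: str, times: int) -> str:
    """Write `times` repetitions of a one-vowel syllable as one syllable
    carrying `times` copies of its vowel sign."""
    positions = [i for i, ch in enumerate(syllable) if ch in VOWEL_SIGNS]
    if len(positions) != 1:
        raise ValueError("expected a syllable with exactly one vowel sign")
    i = positions[0]
    return syllable[:i] + syllable[i] * times + syllable[i + 1:]

def expand_repetition(text: str) -> str:
    """Naively expand each tsheg-delimited syllable whose vowel sign is
    repeated n times into n copies with a single vowel sign."""
    out = []
    for syl in text.split(TSHEG):
        m = DOUBLED.search(syl)
        if m is None:
            out.append(syl)
            continue
        single = syl[:m.start()] + m.group(1) + syl[m.end():]
        out.extend([single] * (m.end() - m.start()))
    return TSHEG.join(out)

bskyed = "\u0F56\u0F66\u0F90\u0FB1\u0F7A\u0F51"
print(expand_repetition(contract_repetition(bskyed, 3)))  # བསྐྱེད་བསྐྱེད་བསྐྱེད
```

Applied to the prayer-flag contraction ཀིི་སོོ་, the same function gives ki ki so so rather than ki ki swo swo. The wa subscript dropped by the contraction cannot be recovered, which is exactly why contractions cannot in general be reverse engineered.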


Unicode Issues

Here's what the Unicode Standard has to say about Tibetan Shorthand Abbreviations :

Tibetan Shorthand Abbreviations (bskungs-yig) and Limitations of the Encoding.

A consonant functioning as a word-base (ming-gzhi) is allowed to take only one vowel sign according to Tibetan grammar. The Tibetan shorthand writing technique called bskungs-yig does allow one or more words to be constructed into a single, very unusual combination of consonants and vowels. This construction frequently entails the application of more than one vowel sign to a single consonant or stack, and the composition of the stacks themselves can break the rules of normal Tibetan grammar. For this reason, vowel signs do sometimes interact typographically, which accounts for their particular combining classes.

The Unicode Standard accounts for plain text compounds of Tibetan that contain at most one base consonant, any number of subjoined consonants, followed by any number of vowel signs. This coverage constitutes the vast majority of Tibetan text. Rarely, stacks are seen that contain more than one such consonant-vowel combination in vertical arrangement. These stacks are highly unusual and are considered beyond the scope of plain text rendering. They may be handled by higher-level mechanisms.


What the standard does not say is that the "particular combining classes" cause all sorts of problems for dealing with shorthand contractions.

Firstly, all the vowel signs that are positioned above the stack ('i', 'e', 'double e', 'o' and 'double o') have a CCC (Canonical Combining Class) of 130, whereas the only vowel sign that is positioned below the stack ('u') has a CCC of 132, which means that when normalized a 'u' vowel sign will be reordered after any other vowel sign. However, the logical Tibetan order is to write the 'u' vowel sign first before any vowel signs above, and this is the order expected by most Tibetan fonts, with the result that a word such as bcuig may not be rendered correctly in normalized form (on my computer at least, the normalized version renders incorrectly with whatever font I use, but note that paradoxically on pre-Vista systems without Uniscribe Tibetan support both sequences may render correctly) :

  • bcuig བཅིུག་ <0F56 0F45 0F74 0F72 0F42>
  • bciug བཅིུག་ <0F56 0F45 0F72 0F74 0F42> (NFC/NFD)
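The reordering can be reproduced with Python's standard unicodedata module, whose combining() function returns a character's Canonical Combining Class. This is only an illustrative check of the normalization behaviour described above; describe and hex_seq are helper names invented for the example.

```python
import unicodedata

def describe(text: str) -> list:
    """Each code point as (hex, Canonical Combining Class)."""
    return [(f"{ord(ch):04X}", unicodedata.combining(ch)) for ch in text]

def hex_seq(text: str) -> str:
    return " ".join(f"{ord(ch):04X}" for ch in text)

# bcuig in logical order: BA, CA, VOWEL SIGN U, VOWEL SIGN I, GA
bcuig = "\u0F56\u0F45\u0F74\u0F72\u0F42"

print(describe(bcuig))
# [('0F56', 0), ('0F45', 0), ('0F74', 132), ('0F72', 130), ('0F42', 0)]
print(hex_seq(unicodedata.normalize("NFD", bcuig)))  # 0F56 0F45 0F72 0F74 0F42
print(hex_seq(unicodedata.normalize("NFC", bcuig)))  # 0F56 0F45 0F72 0F74 0F42
```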


As can be seen from the above screenshot, the normalized version renders incorrectly. With all of the Tibetan fonts on my system a dotted circle is inserted into the glyph sequence; I believe that this is done by Uniscribe (version 1.0606.5112.0 on my computer), presumably because it has been taught that a consonant only takes one vowel sign, and so two vowel signs must be invalid. With Jomolhari (my favourite Tibetan font), the problem is doubly bad, as the font also adds a dotted circle of its own into the glyph sequence, with the result that 'u' is accompanied by two dotted circles. Personally, I dislike the Uniscribe philosophy of trying to restrict script-specific rendering logic to the rendering engine, so that OpenType logic in the font is often ignored or circumvented. I would much prefer it if rendering engines such as Uniscribe did not try to impose their interpretation of correct rendering behaviour on the font, but just let the font do what is specified in its OpenType tables (although I understand that Microsoft prefers to keep the logic in the rendering engine so that rendering is uniform across fonts). I suspect that if Uniscribe didn't insert the spurious dotted circle into the glyph sequence in the first place, then perhaps at least some of my Tibetan fonts would deal correctly with the normalized sequence of multiple vowels.

The second problem is that U+0F39 TIBETAN MARK TSA -PHRU has a CCC of 216, which means that when normalized it will be reordered after all vowel signs. However, Tibetan fonts expect the tsa 'phru to occur immediately after the consonant stack that it modifies and before any vowel signs, and thus sequences with one or more vowel signs between a consonant and a tsa 'phru will not render correctly. This is not specifically a problem with shorthand contractions, but is a problem that is most frequently encountered with shorthand contractions due to the common use of tsa 'phru in constructing contractions. For example, the contraction for nyin mtshan ཉིན་མཚན་ "day and night" is ny^in ཉི༹ན་ (^ represents tsa 'phru in EWTS transliteration). The contraction written in logical order and normalized order is shown below, and again, on my system, the normalized form does not render correctly (same caveat as above for pre-Vista systems) :

  • ny^in ཉི༹ན་ <0F49 0F39 0F72 0F53>
  • nyi^n ཉི༹ན་ <0F49 0F72 0F39 0F53> (NFC/NFD)
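The same reordering of tsa 'phru can be reproduced with unicodedata. Because permuting combining marks that have different combining classes yields a canonically equivalent string, one conceivable workaround is to put the marks back into the order the fonts expect just before rendering. The font_order function below is a hypothetical illustration of that idea. It assumes the expected order is tsa 'phru, then 'u', then any other marks, and it is not a description of what Uniscribe or any font actually does.

```python
import unicodedata

TSA_PHRU = "\u0F39"
VOWEL_U = "\u0F74"

def hex_seq(text: str) -> str:
    return " ".join(f"{ord(ch):04X}" for ch in text)

def font_order(text: str) -> str:
    """Hypothetical pre-rendering step: within each run of combining marks,
    put tsa 'phru first, then vowel u, then the rest in their existing order.
    The result is canonically equivalent to the input."""
    def priority(ch: str) -> int:
        if ch == TSA_PHRU:
            return 0
        if ch == VOWEL_U:
            return 1
        return 2

    out, run = [], []
    for ch in text:
        if unicodedata.combining(ch):     # nonzero CCC: part of a mark run
            run.append(ch)
        else:                             # CCC 0 (incl. subjoined letters): flush run
            out.extend(sorted(run, key=priority))  # sorted() is stable
            run = []
            out.append(ch)
    out.extend(sorted(run, key=priority))
    return "".join(out)

ny_in = "\u0F49\u0F39\u0F72\u0F53"        # ny^in in logical order
normalized = unicodedata.normalize("NFD", ny_in)
print(hex_seq(normalized))                # 0F49 0F72 0F39 0F53
print(hex_seq(font_order(normalized)))    # 0F49 0F39 0F72 0F53
```

Normalizing the output of font_order gives back the normalized input, so the workaround changes only the order presented to the font, not the text itself.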


5 comments:

John Cowan said...

Fonts and rendering software that rely on receiving combining characters in a particular order, and don't work with any canonically equivalent order (of which the canonical order is just a particular one chosen for convenience) are clearly not conformant to Unicode. They should be ruthlessly stamped out.

That is my opinion, and it is further my opinion ....

Andrew West said...

Well, I wouldn't disagree. I have now revised the original post slightly to suggest that in this case the rendering engine (Uniscribe) is probably to blame for inserting a dotted circle into the glyph sequence, and without the mistaken dotted circle the fonts would probably render multiple vowels on a single stack correctly whether in logical order or normalized order.

Michael Everson said...

Meitei Mayek can also write PA + I + U for PIPU.

Andrew West said...

Interesting. I also notice in your Meithei Mayek proposal an example with three vowels, pepupā written as p+e+u+ā.

What I can't see is the supposed relationship between Meithei and Gupta Brahmi. The evolution of most letters from early Brahmic to Tibetan is reasonably clear, and letters such as CA, CHA and JA are pretty much the same in all the early Brahmic scripts, but I can't see any obvious correspondence between Gupta Brahmi and Meithei Mayek.

CFynn said...

The Dzongkha Development Commission has recently published a book on chos skad contractions: བསྡུ་ཡིག་གསེར་གྱི་ཨ་ལོང།