Thursday, 23 February 2006

Stacking Diacritics and Complex Tibetan Stacks

Michael Kaplan thinks that stacking diacritics up to the ceiling and down to the basement is really cool. I think so too, and was disappointed to find that it doesn't work with the current release of BabelPad. Well, with a couple of tweaks (allowing large line spacing values and centring the output vertically within the space between the previous and next lines), I've got stacking diacritics to display correctly in BabelPad, as you can see from the following screenshots of the letter a with 72 combining diacritics (33 above and 39 below). In both cases the font is Doulos SIL at 24 points, but the first screenshot shows what you get if you turn off Uniscribe and render everything as spacing characters, whilst the second one shows monumental stacking when you turn on Uniscribe (version 1.420.2600.2180 on my computer) and set BabelPad's line spacing to 12.0.
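For anyone who wants to build a similar torture test, here is a minimal Python sketch. The post does not say which 72 marks were used, so this simply picks marks from the Combining Diacritical Marks block (U+0300..U+036F) by canonical combining class (230 = above, 220 = below). The helper names `marks_with_class` and `stacked` are hypothetical.

```python
import unicodedata

ABOVE = 230  # canonical combining class "Above"
BELOW = 220  # canonical combining class "Below"

def marks_with_class(ccc, start=0x0300, end=0x036F):
    """Combining marks in [start, end] that have the given combining class.

    Marks with a canonical decomposition (e.g. U+0340, U+0341) are skipped,
    so normalization will not replace them with other code points.
    """
    marks = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        if unicodedata.combining(ch) == ccc and not unicodedata.decomposition(ch):
            marks.append(ch)
    return marks

def stacked(base, n_above, n_below):
    """Return base followed by up to n_above marks above, then up to n_below marks below."""
    above = marks_with_class(ABOVE)[:n_above]
    below = marks_with_class(BELOW)[:n_below]
    return base + "".join(above) + "".join(below)

if __name__ == "__main__":
    s = stacked("a", 33, 39)
    print(len(s), " ".join(f"U+{ord(c):04X}" for c in s))
```

Bear in mind that canonical ordering sorts marks by combining class. As a result, `unicodedata.normalize("NFD", s)` moves all the class-220 marks ahead of the class-230 ones. The result is canonically equivalent and should render identically.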

Screenshot 1 : Combining diacritics laid out horizontally (Uniscribe off)

Screenshot 2 : Combining diacritics stacked vertically above and below (Uniscribe on)

Unfortunately the new improved version of BabelPad used for these screenshots won't be coming out until May. It was scheduled for release at the end of March, as soon as Unicode 5.0 was released, but we now hear that the release of Unicode 5.0 has been delayed until May. As my working versions of BabelPad and BabelMap have already been upgraded internally to support 5.0, I can't release them until after 5.0 is out. On the one hand, this is rather annoying, as there are lots of bug fixes and improvements that I want to release as soon as possible; on the other hand, it gives me some desperately needed time to get everything else ready for 5.0, including my suite of Phags-pa fonts.

Anyhow, back to stacking diacritics. The letter a with 72 diacritics is certainly impressive, but not very useful in the real world. However, there is one script that I can think of that does occasionally require multiple stacking characters: Tibetan. Tibetan is normally written horizontally, with consonant clusters stacking vertically (implemented as one full-sized consonant from the range 0F40..0F69 followed by zero or more subjoined consonants from the range 0F90..0FB9). Ordinary Tibetan text has only limited vertical stacking (usually just one subjoined consonant, but sometimes two). It can be rendered correctly using Uniscribe version 1.453.3665.0 or later and a competent OpenType Tibetan font, of which several are now freely available. However, occasionally, in esoteric texts, consonants are piled up (or rather down) like some crazy Yertle stack. With the first version of Uniscribe to support Tibetan OpenType features, Tibetan stacks with many subjoined consonants do not render correctly with any Tibetan OpenType font (I'm using Microsoft's not-yet-released Ximalaya font for these examples).
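The encoding model just described can be sketched in a few lines of Python. This is a simplified illustration of how the code points combine, not of how Uniscribe shapes them. It treats only U+0F40..U+0F69 as base consonants and U+0F90..U+0FB9 as subjoined consonants (the ranges given above), and it ignores vowel signs and other marks. It also assumes that each subjoined letter sits 0x50 above its full-sized counterpart. The code double-checks that assumption against the character names rather than trusting it. All function names are hypothetical.

```python
import unicodedata

BASE = range(0x0F40, 0x0F6A)       # U+0F40..U+0F69: full-sized consonants
SUBJOINED = range(0x0F90, 0x0FBA)  # U+0F90..U+0FB9: subjoined consonants
OFFSET = 0x50                      # assumed distance from a letter to its subjoined form

def to_subjoined(letter):
    """Map a full-sized Tibetan consonant to its subjoined counterpart."""
    cp = ord(letter)
    if cp not in BASE:
        raise ValueError(f"U+{cp:04X} is not in U+0F40..U+0F69")
    sub = chr(cp + OFFSET)
    # Verify the offset assumption: "TIBETAN LETTER X" -> "TIBETAN SUBJOINED LETTER X".
    expected = unicodedata.name(letter).replace("LETTER", "SUBJOINED LETTER", 1)
    if unicodedata.name(sub, "") != expected:
        raise ValueError(f"no subjoined form found for U+{cp:04X}")
    return sub

def build_stack(base, *below):
    """One full-sized consonant followed by zero or more subjoined consonants."""
    return base + "".join(to_subjoined(c) for c in below)

def split_stacks(text):
    """Group each base consonant with the subjoined consonants that follow it.

    Any other character (vowel sign, tsheg, punctuation) closes the current
    stack and is otherwise ignored; stray subjoined letters are also ignored.
    """
    stacks = []
    open_stack = False
    for ch in text:
        cp = ord(ch)
        if cp in BASE:
            stacks.append(ch)
            open_stack = True
        elif cp in SUBJOINED and open_stack:
            stacks[-1] += ch
        else:
            open_stack = False
    return stacks
```

For example, `build_stack("\u0F66", "\u0F44")` gives སྔ (SA with subjoined NGA), and `split_stacks("སྔགས་")` yields the three stacks སྔ, ག and ས. A "crazy Yertle stack" is just a longer `*below` argument. Nothing in the encoding limits the depth; only the font and the shaping engine do.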

Screenshot 3 : Complex non-standard Tibetan stacks (Uniscribe version 1.453.3665.0)

But with the latest versions of Uniscribe (version 1.468.4011.0 or later), in conjunction with the Ximalaya font, it is possible to correctly render Tibetan stacks with many subjoined consonants. It's not terribly pretty, but I think it is pretty amazing.

Screenshot 4 : Complex non-standard Tibetan stacks (Uniscribe version 1.468.4011.0)

The above are all real examples of complex stacking, taken from sngags kyi klog thabs shes rab mig 'byed སྔགས་ཀྱི་ཀློག་ཐབས་ཤེས་རབ་མིག་འབྱེད. However, there are still some complex stacks that cannot yet be rendered in plain text. For example, in some of the complex stacks in this text there is also a horizontal element, where one or more of the subjoined letters is followed horizontally by the letter NGA to make a subjoined syllable such as yang ཡང (the YA is vertically in line with the stack, but the NGA protrudes forward). At present there is no way of indicating horizontal progression at the subjoined level (and there probably never will be).

Also, with the Ximalaya font (or at least my version of it; perhaps the version shipping with Vista [called Microsoft Himalaya] will have been improved), non-standard multiple vowel signs do not work well. For example, two i vowel signs overlay each other, when they should each occupy their own space. As double i vowel signs are found in some abbreviations (which do not use abnormal stacking), the failure to render multiple vowel signs correctly is a little disappointing.

Wednesday, 15 February 2006

When is a Swastika not a Swastika?

When it's encoded in Unicode as a CJK Unified Ideograph ... or rather as two CJK ideographs, to be precise:

  • U+534D CJK UNIFIED IDEOGRAPH-534D 卍 (left-facing or anticlockwise swastika)
  • U+5350 CJK UNIFIED IDEOGRAPH-5350 卐 (right-facing or clockwise swastika)

This comes as a surprise to most people, who do not naturally associate the swastika with the Chinese script. Of course, the swastika is not a Chinese invention, but was originally an ancient Indian religious symbol. It was introduced into China along with Buddhism, as the swastika was supposed to be one of the thirty-two marks of a Buddha. In the year 693 Empress Wu decreed that the swastika should henceforth be regarded as a Chinese character, to be pronounced the same as the character 萬 wàn "ten thousand".

The swastika thus entered the vast corpus of Chinese characters. The left-facing form is most common in Chinese usage, but both forms are found, as there was some disagreement amongst Chinese authorities as to which form was correct. The swastika, in either or both forms, is duly recorded in most large modern dictionaries (although only the left-facing form is found in the Kangxi Dictionary 康熙字典, where it has a very meagre entry). The two swastika characters were included in early Chinese encodings such as CNS 11643-1986, and so also included in the earliest version of Unicode as part of the CJK unified ideograph repertoire derived from the various legacy encodings.

The swastika character in Chinese does not have any meaning other than its own shape as an auspicious symbol. It is therefore usually only used in the compound word wàn zì 卍字 (also often written as 萬字) "swastika character" to describe the swastika motif in the decorative arts. The following excerpt from the great 18th century novel Honglou Meng 紅樓夢 "A Dream of Red Mansions" illustrates the use of the swastika character in running text (the novel also includes a maid with the name of Wan'er 卍兒):


Yesterday when I opened the storeroom I saw quite a few rolls of vermilion cicada-wing patterned gauze in some big chests. There were all sorts of designs with sprigs of flowers, as well as designs with floating clouds and patterns of swastika and good fortune characters, and designs with butterflies fluttering amongst the flowers. The colours are bright, and the gauze is soft and light, the like of which I have never seen before.

Honglou Meng 紅樓夢 (Beijing: Renmin Wenxue Chubanshe, 1982) ch.40 p.547.

N.B. In some editions the word biānfú 蝙蝠 "bat" is found in place of wànfú 卍福 "swastika and good fortune", the bat also being an auspicious emblem in Chinese. The name of the maid Wan'er 卍兒 is also written 萬兒 in some editions.

The swastika is also an important symbol in other cultures, particularly in Tibet. There the right-facing swastika 卐 is a symbol of changelessness and eternity for Buddhists, and the left-facing swastika 卍 is the main emblem of the native Bön བོན religion. The most common name for the swastika symbol in Tibetan is g.yung drung གཡུང་དྲུང་ (silent initial g), which is a word of uncertain etymology. By themselves, g.yung གཡུང་ means a cross between a cow and a yak, and drung དྲུང་ means "near to, in front of or beside". Taken literally, the word g.yung drung would therefore mean something like "in front of the cow-yak", which obviously makes no sense. However, in the ancient Zhang Zhung language that is partially preserved in the Bön tradition, the word for the swastika is drung mu. This obviously has some relationship to Tibetan g.yung drung, although the etymology of the Zhang Zhung word is equally obscure (mu means "sky, heaven" in Zhang Zhung, but the root meaning of drung is not clear).

As the swastika is not confined to Han usage, but is a symbol used by many other cultures, some would argue that the swastika signs should be encoded in Unicode as symbols for general usage, in the same way that U+262F YIN YANG ☯, U+262A STAR AND CRESCENT ☪, U+262D HAMMER AND SICKLE ☭, U+2629 CROSS OF JERUSALEM ☩ and many other such religious or political symbols are, and that U+534D and U+5350 should then be restricted to Han usage. This is unlikely to happen due to sensitivities over the misuse of the swastika symbol by one particular culture. Nevertheless, there are several problems that I see with only encoding the swastika ideographs and not encoding swastika symbols in their own right.

Firstly, the swastika ideographs are given a Unicode script property of "Han", which indicates that they are only intended for use in a Han ideographic context. However, other scripts have a legitimate claim to the use of the swastika, and the Unicode Standard explicitly states that the Tibetan script uses U+534D and U+5350 (TUS 4.0 p.257). This suggests to me that, out of the 70,000+ CJK ideographs currently encoded, U+534D and U+5350 alone should perhaps be given a script property of "Common". Michael Kaplan has suggested that this reveals a deficiency in the Unicode script property. A character must either belong to a single script or else belong to all scripts, so it is not possible to specify that a character belongs to a particular subset of scripts, such as "Han and Tibetan" in the case of U+534D and U+5350. I guess that for many characters it is difficult to define the boundaries of script usage, and it is a lot simpler to just use "Common" rather than a potentially controversial or changing list of scripts.

Secondly, the glyphs for the ideographic swastikas are often drawn in an ideographic style which may not be suitable for non-Han usage.

Thirdly, because U+534D and U+5350 are hidden amongst the thousands of anonymous CJK ideographs, it is not easy for users to find them if they do not already know where to look. For example, searching for "swastika" in either Windows Character Map or BabelMap will not produce any results (though this will change in the next version of BabelMap), which would probably lead most people to suppose that there are no swastika symbols encoded in Unicode ... and perhaps they would be half right.
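The discoverability problem is easy to reproduce with Python's standard `unicodedata` module. A naive search over character names finds U+262F by its name YIN YANG. It never finds the swastika ideographs, because CJK unified ideograph names are derived mechanically from their code points. The `find_by_name` helper is hypothetical. Results for other queries depend on the Unicode version bundled with your Python.

```python
import sys
import unicodedata

def find_by_name(word):
    """Return every assigned character whose Unicode name contains `word`."""
    word = word.upper()
    hits = []
    for cp in range(sys.maxunicode + 1):
        # Unassigned code points (and surrogates) have no name; "" never matches.
        if word in unicodedata.name(chr(cp), ""):
            hits.append(chr(cp))
    return hits

if __name__ == "__main__":
    for ch in "\u534D\u5350":
        # The name carries no meaning, only the code point itself.
        print(f"U+{ord(ch):04X}", unicodedata.name(ch))
    print([f"U+{ord(c):04X}" for c in find_by_name("yin yang")])
```

A character-picker could close this gap with its own keyword table that maps "swastika" to U+534D and U+5350, independently of the formal character names.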

[This blog follows on from Michael Kaplan's recent post Every character has a story #17: U+534d and U+5350]


Looking through some old files, I have just rediscovered some images of Bön head marks, which are formed from the left-facing swastika. [2007-05-27: these head marks are actually used in the sMar-chen script, as discussed in my Zhang Zhung Scripts post.]

These marks are the equivalent of the head mark character U+0F04 TIBETAN MARK INITIAL YIG MGO MDUN MA, or perhaps more accurately the recently proposed archaic-style head mark character, pencilled in as U+0FD3 TIBETAN MARK INITIAL BRDA RNYING YIG MGO MDUN MA, with the curl styled into a swastika. They are used in Bön religious texts. (I think that Tibetan head marks should perhaps be the topic for my next Tibetan Extensions blog.)