Sunday, 26 December 2010

The Mystery of Two Khitan Scripts

The Khitan people who lived in northern China during the 10th through 13th centuries spoke the now extinct, and poorly understood, Khitan language. For me, the most intriguing thing about the Khitan language is that it was written using two different writing systems.

Khitan Large Script (契丹大字)

The "Khitan large script" is a logographic/syllabic script derived from and imitating Chinese characters (see How Complex is Tangut ? for a discussion of the relative complexities of the Tangut, Jurchen and Khitan large scripts). It comprises several hundred characters that each have a distinct logographic meaning or syllabic pronunciation. These characters are not only similar in form and construction to Chinese characters, but up to 30% of the known Khitan large characters are borrowed directly from Chinese for use as phonetic borrowings or to represent borrowed words from Chinese (see for example the borrowed characters 皇帝 "emperor" and 囯 "country" in the Memorial of the Prince of the North shown below). Some of the characters in the later Jurchen script appear to be derived from Khitan large characters, and have a common pronunciation or meaning.

Part of a rubbing of the memorial stone of the Prince of the North (Khitan large script)

Khitan Small Script (契丹小字)

The "Khitan small script" is a mixed writing system that mostly comprises phonetic elements that are joined together in a rectangular phonographic block that represents the pronunciation of a word, together with a relatively small number of logographic characters that are used to represent frequently used vocabulary such as numbers and the cardinal directions. The phonetic elements that make up phonograms appear to be derived from Chinese characters, and some of them are the same as Khitan large characters, although none of the Khitan small script logographic characters are the same as the corresponding Khitan large script logographic characters.

Part of a rubbing of the memorial tablet of Emperor Daozong (Khitan small script)

The phonetic elements are arranged in groups of one through seven characters as shown below.

Arrangement of phonetic elements making up phonograms in the Khitan small script

What's the Difference ?

Our understanding of the two Khitan scripts, and the language they were used to write, is severely limited by the lack of any contemporary dictionaries or glossaries of the Khitan language. The Tangut people were contemporaneous with the Khitan, but they lived in the more arid region to the west of the Khitan territory, and large numbers of manuscripts and printed texts in the Tangut script have been found, buried in the sands of the ruins of the Tangut fortress city of Khara-Khoto and elsewhere, including several dictionaries and glossaries that have enabled the Tangut language to be largely deciphered. It is almost certain that the Khitans would also have produced dictionaries and phonological texts, but due to the accidents of geography not a single manuscript or printed text in either Khitan script has survived [not correct — see Addendum A at the end of this post]. What have survived, however, are a fairly large number of stone memorial tablets for members of the Khitan nobility, as well as a number of short inscriptions on various portable artefacts.

There are about fifty known monumental inscriptions in the two Khitan scripts, of which about 17 are in the Khitan large script and about 33 are in the Khitan small script, which suggests that the small script was more widely used than the large script, but it is not known why the Khitan people used these two different scripts, or what determined the choice of which script to use. Japanese uses multiple different scripts (kanji, hiragana and katakana), but these are differentiated functionally, and are normally used in conjunction within the same text; whereas the two Khitan scripts appear to be mutually exclusive as they never occur together on the same monument or artefact. Why then are there two Khitan scripts ?

Hypothesis A : Chronological Variation

The first idea that springs to mind is the possibility that the two scripts were not used at the same time. Perhaps one script was used first, but was later displaced by the other script. According to the History of the Liao (see juan 2, 75 and 89), the Khitan large script was created by order of Emperor Taizu of Liao with the assistance of Yelü Tulübu 耶律突呂不 and Yelü Lubugu 耶律魯不古, and was introduced at the start of the year 920. The small script was reputedly devised four or five years later, influenced by the Uyghur script, by Yelü Diela 耶律迭剌, the younger brother of Emperor Taizu. We might therefore expect that the large script was used during the reign of Emperor Taizu, and the phonetic small script gradually became more widely used after the death of the emperor in 926, eventually displacing the more cumbersome large script. However, this is not borne out by the extant corpus of inscriptions.

Khitan Large Script Inscriptions by Date
986: Memorial for Yelü Yanning 耶律延寧 (946–985)
1041: Memorial for the Prince of the North 北大王 (Yelü Wanxin 耶律萬辛, 972–1041)
1056: Memorial for an unknown person
1056: Memorial for the Grand Preceptor (太師)
1058: Stone inscription from Dornogovi Province, Mongolia
1062: Memorial for Yelü Changyun 耶律昌允 (1000–1061)
1072: Memorial at Jing'an Temple (靜安寺) erected by the Lady of Lanling Commandery (蘭陵郡夫人)
1081: Memorial for Lord Dorlipun 多羅里本郎君 (1037–1080)
1084: Stone inscription from Khentii Province, Mongolia, commemorating a battle victory by Hutenu (Yelü Zhaosan 耶律趙三)
1087: Memorial for the Princess of Yongning Commandery 永寧郡公主
1089: Memorial for Xiao Xiaozhong 蕭孝忠 (d.1089)
1090: Memorial for Xiao Paolu 蕭袍魯 (1018–1089)
1094: Bronze seal from Panshan
1108: Memorial for Yelü Ji 耶律褀 (1033–1108)
1114: Memorial for Yelü Xinie 耶律習涅 (1063–1114)
1176: Memorial for Lord Li Ai 李爱郎君

Khitan Small Script Inscriptions by Date
1053: Memorial for Yelü Zongjiao 耶律宗教 (992–1053)
1055: Memorial for Emperor Xingzong of Liao 興宗 (1015–1054)
1057: Memorial for Xiao Linggong 蕭令公 (Xiao Fuliu 蕭富留)
1058: Memorial for an unknown person
1068: Memorial for Xiao Tuguci 蕭圖古辭
1072: Memorial for Yelü Renxian 耶律仁先
1076: Memorial for Empress Renyi 仁懿皇后 (?–1076)
1076: Memorial for Yelü Gaoshi 耶律高十 or Han Gaoshi 韓高十 (1015–?)
1078: Memorial for Madam Han 韓氏, second wife of the imperial son-in-law, Xiao Temei 蕭特每
1082: Memorial for Yelü Cite 耶律慈特 (1043–1081)
1085: Memorial for Lord Yelü Yongning 耶律永寧郎君 (1059–1085)
1092: Memorial for Yelü Dilie 耶律迪烈 or Han Dilie 韓迪烈 (1026–1092)
1094: Memorial for Yelü Zhixian 耶律智先
1099: Memorial for Yelü Nu 耶律奴 (1041–1098)
1100: Memorial for the Grand Preceptor Shilu 室魯太師
1100: Memorial for Yelü Hongbian 耶律弘辨 or Yelü Hongyong 耶律弘用
1101: Memorial for Emperor Daozong of Liao 道宗 (1032–1101)
1101: Memorial for Empress Xuanyi 宣懿皇后 (1040–1075)
1101: Memorial for Yelü Dilie 耶律敵烈 or Han Dilie 韓敵烈 (1034–1100)
1102: Memorial for Yelü Jiuli 耶律糺里 or Yelü Gui 耶律貴 (1061–1102)
1102: Memorial for Yelü Fubushu 耶律副部署 or Yelü Fushu 耶律副署 (1031–1077)
1105: Memorial for the Prince of Xu 許王
1107: Memorial for the Prince of Liang 梁國王
1108: Memorial for the Inspector of Zezhou 澤州刺史
1110: Memorial for the Imperial Consort of Song and Wei 宋魏國妃 (?–1090)
1110: Memorial for Yelü Hongben 耶律弘本 (1041–1110), the Imperial Grand Uncle 皇太叔祖
1115: Memorial for Madam Yelü 耶律氏 (Yelü Tabuye 耶律挞不也)
1134: Record of the Younger Brother of the Emperor of the Great Jin Dynasty (Da Jin huangdi dutong jinglüe Langjun xingji 大金皇弟都統經略郎君行記)
1150: Memorial for Xiao Zhonggong 蕭仲恭
1171: Memorial for the Jin Dynasty Defense Commissioner of Bozhou 金代博州防禦使

The first noticeable feature of the above tables is that there are no dated Khitan inscriptions dating to the time of Emperor Taizu or any time soon after. Except for a single large Khitan inscription dating to 986, the earliest dated inscriptions only date back to the mid 11th century, over a hundred years after the recorded creation of both scripts. Clearly the large script was not displaced soon after the death of Emperor Taizu. On the contrary, the two scripts seem to have coexisted happily for well over a hundred years, from the mid 11th century, through the fall of the Liao dynasty (907–1125), and into the first half of the Jin dynasty (1115–1234). Both scripts seem to have continued in use up to at least the 1170s, with neither displacing the other, and it was only with the proscription of Khitan by the Jurchen court in 1191–1192 that both scripts finally fell out of use.

Distribution of Khitan Inscriptions by Date
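
The distribution can also be tallied directly from the two tables above. The following is only a minimal Python sketch: the year lists are transcribed from the tables, while the 50-year bin width and the names LARGE_SCRIPT_YEARS, SMALL_SCRIPT_YEARS and tally_by_period are illustrative choices, not anything used to produce the original chart.

```python
from collections import Counter

# Years transcribed from the two tables of dated inscriptions above.
LARGE_SCRIPT_YEARS = [
    986, 1041, 1056, 1056, 1058, 1062, 1072, 1081,
    1084, 1087, 1089, 1090, 1094, 1108, 1114, 1176,
]
SMALL_SCRIPT_YEARS = [
    1053, 1055, 1057, 1058, 1068, 1072, 1076, 1076, 1078, 1082,
    1085, 1092, 1094, 1099, 1100, 1100, 1101, 1101, 1101, 1102,
    1102, 1105, 1107, 1108, 1110, 1110, 1115, 1134, 1150, 1171,
]


def tally_by_period(years, width=50):
    """Count inscriptions per period, keyed by the first year of each period."""
    return Counter((year // width) * width for year in years)


if __name__ == "__main__":
    large = tally_by_period(LARGE_SCRIPT_YEARS)
    small = tally_by_period(SMALL_SCRIPT_YEARS)
    for start in sorted(set(large) | set(small)):
        print(f"{start}-{start + 49}: large={large[start]:2d} small={small[start]:2d}")
```

Binned this way, both scripts peak in the second half of the 11th century and both still have entries after 1150, which is consistent with the overlap described above.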

Hypothesis B : Geographic Variation

If the Khitan scripts do not show any significant chronological variation, then perhaps they show a different geographical distribution, with the Khitan small script used in one part of the Khitan territory, and the large script in another part of the Khitan territory. But this does not appear to be supported by the distribution map shown below (click on the map to explore it in greater detail). Although there does seem to be some clustering of small script inscriptions, there is no obvious geographical distinction between the two scripts.

Location of Khitan Inscriptions (yellow = large script, green = small script)

Hypothesis C : Different Functional Usage

Perhaps the two scripts had different functions, for example one for writing religious texts and one for writing secular texts, or one for writing official and court documents and one for writing private and personal documents ? But as both scripts were commonly used for exactly the same function (writing memorials for the dead) this theory seems to be a non-starter.

Hypothesis D : Different Social Usage

Maybe the two different scripts were used by two different sections of the Khitan population. Was one script used by men and the other script used by women ? This seems not to be the case, as both scripts are used to write memorials for both men and women. Was one script used by royalty and nobility, and the other script used by commoners ? Probably not, as there are memorials to princes and princesses in both scripts, although the only memorials to emperors and empresses found so far are in the small script. Were the scripts used by different clans ? Again, there is no evidence for this, as both scripts were used to write memorials for members of the Yelü 耶律 clan.

Hypothesis E : Different Linguistic Usage

A final possibility is that the two scripts were used to write two different languages or dialects. Although there is no evidence that the Khitans spoke more than a single language, it is a possibility that cannot be discounted. But it is a theory that is difficult to prove or disprove as most of the Khitan words that have been identified in the small script are borrowings from Chinese, and almost all the large Khitan script words for which a reading has been proposed are also borrowings from Chinese.

Having looked at and discounted the various possibilities outlined above, we seem to be none the wiser about why there were two completely different ways of writing the Khitan language. Both scripts are complex enough to require a considerable investment of time and effort to learn to read and write, so how is it possible that both scripts managed to coexist and flourish for so long ? Did the Khitan education system require students to learn both scripts, or were Khitan scholars only able to read and write one or other of the two scripts ? It makes no sense to me ...

Addendum A [2011-10-15]

Unbeknownst to me at the time I wrote this post, less than a month earlier, on 29 November 2010, Viacheslav Zaytsev of the Institute of Oriental Manuscripts [IOM] in Saint Petersburg had announced his identification of a 100+ page manuscript codex as being written in the Large Khitan script. This manuscript had been held at the IOM for many years, but as it was written in a cursive hand no-one had been able to identify the script with certainty. Most experts who had seen the manuscript had thought it was probably written in the Jurchen script, but by carefully comparing the text of the manuscript with memorial inscriptions written in Large Khitan, Zaytsev had been able to identify stretches of text that occurred in both, and he was thereby able to prove for the first time that the manuscript was written in the Large Khitan script. This is the first and only manuscript written in either Large or Small Khitan to have been identified.

Addendum B [2011-10-21]

Viacheslav Zaytsev has drawn my attention to the fact that a fragment of a Khitan large script inscription was identified by Wang Ding in 2002. I discuss this fragment in Khitan Miscellanea 1.

Sunday, 24 October 2010

BabelPad and BabelMap Version

New versions of BabelPad and BabelMap that support Unicode 6.0 have been released today, and can be downloaded directly :

  • (simply unzip the file BabelPad.exe and run it from wherever you like)
  • (simply unzip the file BabelMap.exe and run it from wherever you like)

This screenshot of BabelMap is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA-3.0) by Andrew West.

Important Technical Note

BabelPad and BabelMap were scheduled for release on 11 October, to coincide with the release of Unicode 6.0 on that day, but their release was delayed due to a Blue Screen of Death crash that occurred with the beta versions of both BabelMap and BabelPad when the Windows function ExtTextOutW is called within a path bracket, and the selected font is Symbola font version 6.00, and the ETO_GLYPH_INDEX flag is set, and the glyph index passed to the function corresponds to U+1F5FD STATUE OF LIBERTY (this problem only occurs in BabelPad when in Simple Rendering mode, which bypasses Microsoft's Uniscribe rendering engine). The glyph for U+1F5FD in the Symbola font has a mega-complex glyph outline (which oddly enough is the glyph for an angel, whilst the glyph for the Statue of Liberty is actually at U+FFFED), which probably results in a buffer overrun somewhere within Windows GDI. In order to work around this problem I have had to rewrite, refactor and retest core sections of the source code.

The newly released versions of BabelPad and BabelMap fix the problem described above, and should be safe for use with the Symbola font under normal usage scenarios, but the glyphs for U+1F5FB (MOUNT FUJI) through U+1F5FF (MOYAI) are rendered very slowly because of their extreme complexity (several thousand points for each glyph), resulting in sluggish response in BabelMap when scrolling through the Miscellaneous Symbols And Pictographs block, and potentially extremely sluggish performance in BabelPad.

Moreover, the Symbola font may still cause a Blue Screen of Death crash (reporting an infinite loop) on some systems when rendering U+1F5FD STATUE OF LIBERTY at high point sizes with standard Windows applications such as Notepad (my test case is to set Notepad to use the Symbola font at 72 points, and then paste in a string comprising twelve instances of U+1F5FD — my XP machine then blue screens, although my Vista machine is OK). This general Windows-level vulnerability to Symbola version 6.00 means that BabelMap may still blue screen if you insert multiple instances of U+1F5FD into the BabelMap edit buffer, and BabelPad may still blue screen if you attempt to display a document with multiple instances of U+1F5FD at a large point size in Complex Rendering mode (i.e. using Uniscribe). For this reason, you are strongly advised not to install Symbola version 6.00, but if you do install this font I cannot be responsible for any loss or damage incurred due to a system crash when running either BabelPad or BabelMap.

BabelPad Enhancements

  • BabelPad now emulates the Alt-X functionality found in Microsoft Word and WordPad (position the caret after a hexadecimal code point value and hit Alt-X to convert it to the corresponding Unicode character; and position the caret after a Unicode character and hit Alt-X to convert it to its corresponding hexadecimal code point value)
  • Convert Unicode character names to their corresponding Unicode character (due to the difficulty of disambiguating strings such as "bell symbol for bell with cancellation stroke" where "bell", "bell symbol", "symbol for bell" and "bell with cancellation stroke" are all Unicode character names, the selected text must be an exact Unicode name or formal alias, and not a partial name or a longer text string containing a Unicode name; although you can use the contextual convert utility to convert structured data such as <UnicodeName>Vulgar Fraction Three Quarters</UnicodeName> to <UnicodeName>¾</UnicodeName>) ["Convert : Unicode Name to Character" from the main menu or the right-click menu]
  • Import Shift-JIS encoded documents with emoji extensions defined by DoCoMo, KDDI or SoftBank [select "Shift-JIS plus DoCoMo/KDDI/SoftBank emoji" from the "Encoding" dropdown list of the "Open File" dialog]
  • Convert Han ideographs to their pinyin or jyutping readings (not perfect as characters with multiple readings are converted to a slash-separated list of readings, even when one reading is considerably more common than another, but this feature may be useful for some users in some situations)
  • Title casing options for either Script Neutral title casing (e.g. The Owl And The Pussy-Cat Went To Sea) or English title casing (e.g. The Owl and the Pussy-Cat Went to Sea) ["Options : Title Casing" from the menu]
  • The default script colours when colour coding by script is selected ["Options : Display Colours : Colour Code by Script" from the menu] have been harmonized with the default script colours used for BabelMap, and an option to reset all script colours to their default values has been added to the "Configure Script Colours" dialog (this needs to be selected for the new default colours to be used).
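
To make the Alt-X and name-conversion behaviour described above concrete, here is a minimal Python sketch. It is not BabelPad's implementation: the function names alt_x and name_to_char are hypothetical, and the rule that only runs of four to six hexadecimal digits are treated as a code point value is a simplifying assumption made for this example. The name lookup relies on the standard unicodedata.lookup function, which accepts only exact character names or aliases, mirroring the exact-match restriction noted in the list.

```python
import unicodedata

HEX_DIGITS = set("0123456789abcdefABCDEF")


def alt_x(text, caret):
    """Toggle hex code point <-> character just before the caret.

    Returns (new_text, new_caret).
    """
    # Collect up to six hex digits immediately before the caret.
    start = caret
    while start > 0 and caret - start < 6 and text[start - 1] in HEX_DIGITS:
        start -= 1
    digits = text[start:caret]
    # Assumption for this sketch: 4-6 digits are read as a code point value.
    if len(digits) >= 4:
        value = int(digits, 16)
        if value <= 0x10FFFF and not 0xD800 <= value <= 0xDFFF:
            return text[:start] + chr(value) + text[caret:], start + 1
    if caret == 0:
        return text, caret
    # Otherwise replace the preceding character with its hex code point value.
    hex_value = format(ord(text[caret - 1]), "04X")
    new_text = text[:caret - 1] + hex_value + text[caret:]
    return new_text, caret - 1 + len(hex_value)


def name_to_char(name):
    """Return the character with exactly this Unicode name or alias, else None."""
    try:
        return unicodedata.lookup(name.strip().upper())
    except KeyError:
        return None


if __name__ == "__main__":
    print(alt_x("x00BE", 5))
    print(name_to_char("Vulgar Fraction Three Quarters"))
```

Because the whole string must match a name, a phrase such as "bell symbol for bell with cancellation stroke" simply returns None here rather than being split into several candidate names.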

BabelMap Enhancements

  • Script colours when colour coding by script has been selected are now user configurable ["Options : Customize Colours..." from the menu]
  • When colour coding of characters has been selected, the character with focus is no longer highlighted in red
  • The character with focus in the character grid is now indicated by its cell having an inset appearance
  • Option to rotate or not rotate the glyphs for vertical scripts (Mongolian and Phags-pa) where the selected font has rotated glyphs for vertical layout (in previous versions of BabelMap the glyphs are always rotated) ["Options : Other Options : Rotate Vertical Scripts" from the menu]
  • The Export Font Glyphs utility has been improved to ensure glyphs are not accidentally clipped in some cases
  • The Han Radical Lookup utility has been updated to cover CJK-D (now covers all 74,616 CJK unified ideographs)
  • The Advanced Character Search utility now has an option to only give the total number of characters matching the selected criteria, and not list them all (this makes searches which return a large number of results, for example when querying how many characters were introduced in a particular version of Unicode, very fast)

Monday, 24 May 2010

Prototyping Tangut IMEs, or Why Windows 7 Sucks

Why Windows 7 No Longer Sucks [2011-03-01]

On 22nd February 2011 Windows 7 Service Pack 1 (SP1) was released, and I am very pleased to say that all the rendering issues discussed below are now solved.

Typing PUA Tangut under Windows 7 plus SP1

Internet Explorer 8 under Windows 7 plus SP1

Original Post [2010-05-24]

In many ways Windows 7 is a great improvement on Vista, but this is the sad story of why my children have the use of my shiny new Windows 7 laptop, and I am sticking to the old, not very user-friendly and not very reliable Vista laptop. I hope that one day I will be able to write a blog extolling the virtues of Windows 7, but given the contents of the forthcoming Service Pack 1 it seems very unlikely to happen any time soon, and at the current rate of (lack of) progress, I am afraid that Microsoft will lose more and more of the few remaining loyal customers like myself who find it impossible to do cutting edge Unicode stuff with an operating system that values gimmicks over functionality, and for every step forward takes two steps backwards.

Prototyping Tangut IMEs

In anticipation of the eventual encoding of the Tangut script in Unicode, I have been prototyping a couple of Input Methods for Tangut that use the table driven text service that is available in Windows Vista and Windows 7 (see Michael Kaplan's twelve-part series Behold the Table Driven Text Service for a tutorial).

I have created two mapping tables for Tangut :

Installing on my Windows Vista laptop I get the following results (the icons, StrokeCode.ico and Alphacode.ico, are a little degraded in the jpgs) :

Tangut Stroke Code IME under Windows Vista

Tangut Alphacode IME under Windows Vista

Hmm, the IMEs both work just fine, but the Tangut characters in the candidate list show up as little squares, which means that if two or more characters share the same alphabetic code sequence you have to guess which character to choose, and even if it is a unique alphabetic code sequence it would be nice to see what the character looks like. Unfortunately, for Vista there is no way to specify what font to use for the candidate window, but as explained by Michael Kaplan in Can't I pick the candidate list font if I don't speak fluent square box?, Windows 7 introduces new FontFaceName and FontSize parameters for the TableTextService file format. So let's install these two IMEs (with Unicode Tangut specified at 16 points) and the Unicode Tangut font on my Windows 7 laptop and see what happens.

Why Windows 7 Sucks

Tangut Stroke Code IME under Windows 7 (using BabelPad)

Tangut Alphacode IME under Windows 7 (using BabelPad)

D'oh, that's one step forward and two steps backwards. The candidate window is now using the Unicode Tangut font as specified, but both in the candidate window and in BabelPad the Tangut characters (currently reserved code points) are displayed as little square boxes, in fact two square boxes per character, which suggests that surrogate code points are being rendered separately rather than combined as a single character. But perhaps this is a problem with BabelPad; let's see what it looks like with Notepad :

Tangut Stroke Code IME under Windows 7 (using Notepad)

Hmm, that's no better. Just to show that there is nothing intrinsically wrong with the Unicode Tangut font or the table driven text service, here's a screenshot from Windows 7 of a Tangut Components IME that maps the Tangut components listed here to PUA codepoints (TableTextServiceTangutRadicalsPUA.txt and TangutRadicals.ico) :

Tangut Component PUA IME under Windows 7
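
The "two square boxes per character" symptom noted earlier is what one would expect if each UTF-16 code unit of a supplementary-plane character is drawn on its own. The following minimal Python sketch shows the arithmetic; U+17000 is used purely as an illustrative supplementary-plane code point (the post does not say which code points the prototype tables use), and the function names are hypothetical.

```python
def to_surrogates(code_point):
    """Return the UTF-16 (high, low) surrogate pair for a supplementary code point."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the 20-bit offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the 20-bit offset
    return high, low


def from_surrogates(high, low):
    """Recombine a surrogate pair into the single code point it encodes."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def utf16_units(text):
    """List the 16-bit code units that make up the text in UTF-16."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    high, low = to_surrogates(0x17000)   # illustrative code point only
    print(f"U+17000 -> {high:04X} {low:04X}")
    print([f"{unit:04X}" for unit in utf16_units(chr(0x17000))])
```

A renderer that treats the two code units as one character draws one glyph; a renderer that treats them separately has nothing to draw for either half, hence two boxes.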

Now, at this point there will be some people who will be saying, "of course your so-called Tangut text doesn't display properly, because you are using unassigned Unicode codepoints". Ignoring the fact that it does display OK in Windows Vista, Windows XP and even Windows 2000 if the Unicode Tangut font is installed (as Tangut is not a complex script from a rendering perspective, it does not need support from Uniscribe to render correctly), let's take a look and see how Windows 7 copes with a recently-encoded script like Egyptian Hieroglyphs which does have officially assigned Unicode characters (NB Egyptian Hieroglyphs render fine under Windows Vista with a font like Aegyptus) :

Egyptian Hieroglyphs rendered in Notepad under Windows 7

Egyptian Hieroglyphs rendered in BabelPad (in Complex Rendering mode) under Windows 7

Well, that's not any good. How does BabelMap cope?

Egyptian Hieroglyphs rendered in BabelMap under Windows 7

The Egyptian hieroglyphs render OK in the character grid and in the popup window, because BabelMap does not use Uniscribe, but renders characters directly using their glyph IDs, as read from the font's CMAP table. But the edit buffer is a standard Windows edit control, which uses Uniscribe, and the Egyptian characters render as square boxes. Let's try again with BabelPad, this time with "Simple Rendering" mode selected, which uses the same method as BabelMap to render characters :

Egyptian Hieroglyphs rendered in BabelPad (in Simple Rendering mode) under Windows 7

That's better! Let's do the same thing for my unofficial Tangut text :

Tangut text rendered in BabelPad (in Simple Rendering mode) under Windows 7

Tangut text (one character per line) rendered in BabelPad (in Simple Rendering mode) under Windows 7

Hmm, that's weird, it only renders the first character in each line correctly. And exactly the same problem is seen in Windows Vista (screenshot omitted), so it is almost certainly a bug in BabelPad (fixed in version released 2010-06-06). But what we have learnt is that if you use Uniscribe under Windows 7 (whether in Notepad or in an edit control or in BabelPad), then you won't see any Egyptian Hieroglyphs. The bottom line is that Windows 7 proudly supports Unicode 5.1, but is not forward compatible with later versions of Unicode, including Unicode 5.2 which was released in the same month that Windows 7 was released to the general public. Thus, for example, Phaistos Disc symbols (encoded in Unicode 5.1) render OK under Windows 7 (as evidenced by the fact that they display in the edit buffer of BabelMap) :

Phaistos Disc symbols (encoded in Unicode 5.1) rendered in BabelMap under Windows 7

All previous versions of Uniscribe have passively allowed text encoded in Unicode characters that it does not recognise to render OK as long as there is font support, but the version of Uniscribe that ships with Windows 7 appears to actively disallow Unicode text that it does not recognise ... or at least, characters in Unicode ranges that it does not recognise (post-Unicode 5.1 characters in existing Unicode blocks will be rendered OK under Windows 7 if there is font support). There is, however, one exception to this: CJK unified ideograph blocks added to the Supplementary Ideographic Plane (SIP) post Unicode 5.1 will render OK if there is font support (presumably Uniscribe treats the SIP as a single range) :

CJK Unified Ideographs Extension C (encoded in Unicode 5.2) rendered with the BabelStone Han font in Notepad under Windows 7

I wonder if Internet Explorer 8 does any better on Windows 7 than Notepad?

Internet Explorer 8 under Windows 7

Nope, just like in Notepad, Unicode 5.1 scripts and CJK Unified Ideographs Extension C render OK, but Egyptian Hieroglyphs and currently reserved character ranges come out as little square boxes. So there you have it, if you want to write in any of the fifteen new scripts added in Unicode 5.2 (Avestan, Bamum, Egyptian Hieroglyphs, Imperial Aramaic, Inscriptional Pahlavi, Inscriptional Parthian, Javanese, Kaithi, Lisu, Meetei Mayek, Old South Arabian, Old Turkic, Samaritan, Tai Tham, and Tai Viet) or any of the various new scripts and symbol blocks that will be added in the forthcoming Unicode 6.0 (Mandaic, Batak and Brahmi scripts, and Playing Cards, Miscellaneous Pictographic Symbols, Emoticons, Transport and Map symbols, and Alchemical Symbols), then my recommendation is to avoid Windows 7.

Phags-pa Rendering on Windows 7

Whilst we are on the subject of Windows 7, let's have a quick look at the rendering of the Phags-pa script in Windows Vista and Windows 7.

Phags-pa is a complex script in rendering terms, and Windows Vista does not actively support the script. Nevertheless, under Windows Vista, Unicode Phags-pa text renders correctly in all respects (joining, contextual shaping and variation sequences) in BabelPad and Notepad using my BabelStone Phags-pa Book font :

Phags-pa text rendered in BabelPad with the BabelStone Phags-pa Book font under Windows Vista

However, under Windows 7, the font is next to useless, as no joining or shaping behaviour is applied :

Phags-pa text rendered in BabelPad with the BabelStone Phags-pa Book font under Windows 7

On the other hand, the same Phags-pa text does render correctly using the Microsoft PhagsPa font that ships with Windows 7 :

Phags-pa text rendered in BabelPad with the Microsoft PhagsPa font under Windows 7

Now, the Microsoft PhagsPa font is in many respects (and not coincidentally) very similar to my BabelStone Phags-pa Book font, but the one crucial difference between the two fonts is the set of OpenType features that are used to control the joining and shaping behaviour of characters. The BabelStone Phags-pa Book font uses the Contextual Ligatures <clig> and Glyph Composition Decomposition <ccmp> features to enable it to do all the joining and shaping stuff, including variation sequences, internally without any need for assistance from Uniscribe. On the other hand, the Microsoft PhagsPa font uses the Initial Forms <init>, Medial Forms <medi> and Terminal Forms <fina> features to do the joining behaviour, and these features rely on Uniscribe. For this reason, the Microsoft PhagsPa font won't work correctly under Windows Vista (no Uniscribe support for Phags-pa), and conversely, the BabelStone Phags-pa Book font won't work correctly under Windows 7 (too much Uniscribe support for Phags-pa). I can't really complain about this, as Microsoft support for Phags-pa would almost inevitably mean making Uniscribe instrumental in the rendering process and using a different set of OpenType features than I used (of necessity) in my font. What I will do, when and if I ever get some free time from Tangut, is release new versions of my Phags-pa fonts that use the same OpenType features as the Microsoft PhagsPa font does.

But there is one added complication. Starting with Windows 7, Microsoft now use the newly defined Format 14 cmap subtable (Unicode Variation Sequences) to process variation sequences, thus by-passing OpenType entirely. In Windows Vista and earlier, variation sequences would work without any special support from Uniscribe by defining glyph substitutions in the font under the Glyph Composition Decomposition <ccmp> OpenType feature. Thus, under Windows Vista it is possible to correctly render Mathematical Variation Sequences by using James Kass' Code2000 font, or Phags-pa variation sequences using my Phags-pa fonts. But under Windows 7, variation sequences no longer render correctly using these fonts. Instead, under Windows 7, Microsoft's Cambria Math font supports Mathematical Variation Sequences, and Microsoft PhagsPa supports Phags-pa variation sequences, by including variation sequence mappings in an additional Format 14 cmap subtable which is accessed by Uniscribe. In my opinion, the use of a cmap subtable to apply variation sequences rather than use simple OpenType features is a very bad idea, as it overcomplicates what is essentially a very simple task, and makes variation sequence support not backwards compatible with versions of Windows prior to Windows 7. Moreover (and from my perspective, more importantly), there is not yet widespread support for the new Format 14 cmap subtable, and the font editor that I use has no short-term plans to add support for this subtable, which makes it difficult for amateur font developers like myself to create fonts that use the Windows 7 model for variation sequences.

Finally, the screenshot above shows a variation sequence <U+A86A U+A85E U+FE00> (ꡪꡞ︀) rendered correctly with the Microsoft PhagsPa font on BabelPad (NB this only works on BabelPad version or later, as applications need to set an undocumented flag in Uniscribe [SCRIPT_CONTROL.fMergeNeutralItems = TRUE] for the Format 14 cmap substitutions to work), but take a look at what happens when we display the same text on Internet Explorer 8 under Windows 7 :

Phags-pa text rendered in Internet Explorer 8 with the Microsoft PhagsPa font under Windows 7

... the variation sequence (highlighted) is rendered incorrectly as two disconnected glyphs. Looks like Internet Explorer 8 does not yet support the new Format 14 cmap subtable for variation sequences; yet one more example of Microsoft's disconnected thinking across different development teams, and the appalling lack of testing that seems to be par for the course with Microsoft.
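
For readers unfamiliar with variation sequences, the text <U+A86A U+A85E U+FE00> discussed above is three code points but only two display units, because U+FE00 (VARIATION SELECTOR-1) modifies the character before it. The following is a minimal Python sketch of that grouping, with hypothetical function names; as a simplifying assumption it only recognises the generic selectors U+FE00..U+FE0F and U+E0100..U+E01EF, and it says nothing about how Uniscribe or any font actually maps the sequence to a glyph.

```python
def is_variation_selector(ch):
    """True for the generic variation selectors VS1-VS16 and VS17-VS256."""
    cp = ord(ch)
    return 0xFE00 <= cp <= 0xFE0F or 0xE0100 <= cp <= 0xE01EF


def split_variation_sequences(text):
    """Group each variation selector with the character it follows."""
    units = []
    for ch in text:
        if is_variation_selector(ch) and units:
            units[-1] += ch       # attach the selector to its base character
        else:
            units.append(ch)
    return units


def describe(unit):
    """Format a unit as a space-separated list of U+XXXX code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in unit)


if __name__ == "__main__":
    text = "\uA86A\uA85E\uFE00"   # the Phags-pa sequence from the post
    for unit in split_variation_sequences(text):
        print(describe(unit))
```

A renderer that supports the sequence should draw the second unit as a single variant glyph; drawing the base character and the selector as separate glyphs gives the kind of broken result shown in the Internet Explorer 8 screenshot.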

Tuesday, 27 April 2010

Untangling the Web of Characters

Notes for an introductory talk on the Tangut script given at SOAS on 21st May 2009

2.1 The Sea of Characters

𘝞𗗚 ·jwɨr ŋjow
文海 wén hǎi

  • The mid 12th century monoglot Tangut rhyming dictionary, the "Sea of Characters" (Wén Hǎi 文海 in Chinese), provides a compositional analysis of each character
  • It explains each character in terms of other Tangut characters from which its constituent elements have been borrowed
  • E.g. explains Character A as being derived from the left side of Character X and the right side of Character Y
  • The source characters from which an element is said to be derived may have a phonetic or a semantic relationship with the target character (note it is the source character which has the semantic or phonetic function, not the element itself)
  • Creates a network of interrelated characters—a web of characters rather than a sea of characters

The four characters under the large head character give the character's structural composition, using the following terms (and sometimes others) to indicate what part of the source character is being referred to (as no more than four characters are ever used to describe a character's structural composition, often one or more of these terms are elided) :

  • 𘓳 *ŋowr = "whole"
  • 𘊱 *pha̱ = "left"
  • 𘁝 *nji̱j = "middle"
  • 𗡼 *bji̱r = "right"
  • 𗥦 *ɣu = "head, top"
  • 𗘡 *tśhjɨj = "bottom"
  • 𘍞 *iọ = "surrounding, enclosing"
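One way to think about these entries is as small records: a target character, plus a list of (part, source character) borrowings. The following Python sketch models that under my own naming (`Part`, `Borrowing` and `Composition` are illustrative, not terms from the dictionary), using the abstract "Character A from the left of X and the right of Y" example rather than real glyphs.

```python
from dataclasses import dataclass
from enum import Enum


class Part(Enum):
    """The positional terms listed above."""
    WHOLE = "whole"
    LEFT = "left"
    MIDDLE = "middle"
    RIGHT = "right"
    TOP = "head, top"
    BOTTOM = "bottom"
    SURROUNDING = "surrounding, enclosing"


@dataclass(frozen=True)
class Borrowing:
    part: Part    # which part of the source character is taken
    source: str   # the source character itself


@dataclass(frozen=True)
class Composition:
    target: str
    borrowings: tuple

    def describe(self):
        return " + ".join(f"{b.part.name.lower()} of {b.source}" for b in self.borrowings)


entry = Composition("A", (Borrowing(Part.LEFT, "X"), Borrowing(Part.RIGHT, "Y")))
print(entry.describe())  # left of X + right of Y
```

Keeping the source character in the record, rather than just the borrowed element, matters because it is the source character that carries the phonetic or semantic function.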

The description of a character's composition does not always make sense.

Questions :

  • Can the Sea of Characters analysis be relied on ?
  • Does its analysis of the structure of Tangut characters accurately reflect the principles by which the Tangut script’s creator or creators devised the individual characters of the script ?
  • Or is it a later, spurious attempt to explain and rationalize the structure of characters ?

2.2 A Case Study : The "Sun" Radical 𘤊

This component is Nishida Tatsuo's Radical No. 211, which he calls the "sun radical" 日部 (see Seikago no kenkyū 西夏語の研究 [A Study of the Hsi-Hsia Language] page 244). However, very few characters with this component are in any way related to the sun, and so Nishida's radical name is a misnomer (by far the largest semantic group of characters with this component is the "Bird-related" group, but Nishida already has a "bird" radical). As we shall see below, unlike most Chinese radicals, Tangut radicals do not have a single fixed meaning, and so giving names to them (as Nishida and others have done) is at best not very useful, and at worst misleading.

  • A total of 219 characters (in N3797) with this component as a primary component :
  • 1 character with this component as its only component
  • 1 character with this component at the bottom
  • 119 characters with this component on the left hand side
  • 51 characters with this component on the right hand side
  • 47 characters with this component in the middle
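As a quick sanity check on these figures, the following Python lines add up the positional counts and express each as a share of the total. The numbers are exactly those listed above; only the dictionary labels are mine.

```python
positions = {
    "only component": 1,
    "bottom": 1,
    "left": 119,
    "right": 51,
    "middle": 47,
}

total = sum(positions.values())
shares = {name: round(100 * count / total, 1) for name, count in positions.items()}

print(total)            # 219
print(shares["left"])   # 54.3
```

So the left-hand side is the most common position, but it accounts for only a little over half of the characters.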

The Sea of Characters dictionary has head entries for 102 of these characters :

Columns: Wén Hǎi reference; LFW2008 number; glyph; reconstruction and gloss; structural composition (given on the lines following each entry)
P 8.243 3040 𗊌 *nju "sweat"

Left side of 𗊻 *śjo "sweat"

Right side of 𘔳 *lwew "steam, smoke"

P 8.252 0399 𘃷 *nju [a surname]

Middle of 𗨛 *rjɨr "to go out, to give birth"

Right side of 𗍊 *sju "as, like"

P 9.222 3808 𗾶 *xju "empty"

Left side of 𗾙 *lew "little bird"

Right side of 𗥪 *rjɨj "to teach"

P 9.223 3673 𗿉 *ɣju "smoke, mist"

Right side of 𘔳 *lwew "steam, smoke"

Right side of 𗞦 *kjur "to smoke sth."

P 9.231 3600 𗾤 *ɣju "to ask, to call"

Left side of 𗿄 *khju "to request, to ask"

Right side of 𗄼 *lja "to come"

P 10.222 2267 𗿻 *ku "phoenix"

Left side of 𗿼 *dźjwow "bird"

Right side of 𘜶 *ljịj "big"

Right side of 𘎃 *·we "bird"

P 10.253 3435 𗼍 *ɣu "god, supernatural being"

Left side of 𗼙 *ɣu "emperor"

Left side of 𗿼 *dźjwow "bird"

P 11.162 5083 𘛲 *gu̱ "to patrol"

Whole of 𘛯 *gu̱ [a surname]

Right side of 𘕂 *dźjij "to go"

P 11.172 5551 𘛯 *gu̱ [a surname]

Left side of 𘛴 *gu̱ "a spirit"

Middle of 𘛲 *gu̱ "to patrol"

P 12.151 2288 𗿺 *nju̱ "smoke"

Whole of 𗿉 *ɣju "smoke, mist"

Right side of 𘃠 *du̱ "to store"

P 12.272 3242 𗾇 *be "mad"

Left side of 𗾷 *dzjị "owlet"

Right side of 𗕶 *ɣạ "crazy"

P 13.111 2868 𗾸 *be "illness"

Whole of 𗾇 *be "mad"

Right side of 𗥓 *ŋo "disease"

P 15.153 3170 𗿜 *tśhji "shame, disgrace"

Left side of 𗾹 *tshwu "shame, disgrace"

Middle of 𗏣 *ljijr "direction"

Right side of 𗼊 *sew "shy, bashful"

P 17.212 3278 𗣣 *tshji "food"

Left side of 𗢯 *lhjwa "tongue"

Right side of 𗮘 *śjwi "food"

P 18.172 3243 𗾎 *kjwi "turtledove"

Left side of 𗿼 *dźjwow "bird"

Left side of 𗰰 *kjir [a surname]

P 18.252 1566 𘇭 *sjwi "to tie"

Left side of 𗝊 *sjwi "roof beam"

Right side of 𘌤 *djɨ̣ "ribbon"

P 19.242 1327 𘝀 *phji̱ "to fly"

Left side of 𘝋 *dzjwɨ "wing"

Left side of 𗿼 *dźjwow "bird"

P 20.113 5840 𗗪 *kji̱ "commerce, trade"

Left side of 𗗥 *źjị "to buy and sell"

Middle of 𘒨 *phjij "to express oneself"

Right side of 𗍋 *khjɨ̱ "to gather"

P 20.142 5985 𗈹 *sji̱ "to inspect"

Left side of 𗈲 *khwa "far"

Left side of 𘕨 *sji̱ "to cry, to wail, to lament"

P 20.253 2873 𗿡 *·wẽ [a place name]

Left side of 𗿼 *dźjwow "bird"

Right side of 𗩇 *·wẽ [a surname]

P 22.172 5966 𘂕 *ta "swallow"

Right side of 𗾙 *lew "little bird"

Left side of 𗿼 *dźjwow "bird"

P 23.262 2767 𗿵 *ɣa [a surname]

Middle of 𗍃 *·jiw [a place name]

Right side of 𗪙 *mur "vulgar"

P 24.122 2322 𘓠 *ɣa "sorrow"

Right side of 𗤶 *nji̱j "heart, mind"

Right side of 𗪆 *sjwɨ̱ "to think"

P 24.231 5911 𗈲 *khwa "far"

Left side of 𗈱 *rjar [a participle]

Middle of 𗈹 *sji̱ "to inspect"

Right side of 𗎘 *bju "border, side"

P 25.122 5969 𘖁 *tsha "empty bag"

Right side of 𗻍 *bu "reed-mace, cattail"

Middle (?) of 𗮺 *tsə̣ "lungs"

Middle of 𗍊 *sju "as, like"

P 26.151 2718 𗹼 *khiwa "kidney"

Whole of 𗹭 *bjij "high"

Right side of 𗥛 *rjɨr "bone"

P 27.241 0538 𗍘 *pja "butterfly"

Surrounding part of 𗍎 *pja "dark green"

Left side of 𗿼 *dźjwow "bird"

P 27.242 1272 𗍜 *pja "broad, shallow"

Left side of 𗍘 *pja "butterfly"

Right side of 𗼗 *djɨj "shallow"

P 28.121 3334 𗿦 *mja "female [of human or animal]"

Left side of 𗿽 *mja "a type of bird"

Right side of 𘓱 *me̱ "swallow" or *ŋwə "heaven, emperor"

P 28.122 2270 𗿽 *mja "a type of bird"

Left side of 𗿦 *mja "female"

Right side of 𗿼 *dźjwow "bird"

P 28.253 0618 𗉅 *tsja "hot"

Bottom of 𗜐 *mə̱ "fire"

Whole of 𗾔 *be "sun"

P 29.222 3698 𗿍 *śja̱ "a type of bird"

Left side of 𗿼 *dźjwow "bird"

Left or right side of 𗉋 *tśiow "to assemble"

Right side of 𗰛 *dzjịj "to cross, to pass"

P 30.141 3436 𗼍 *sa̱ "close relative"

Bottom of 𗒂 *njạ "marriage"

Left side of 𗿅 *·jɨ "marriage"

P 31.111 3672 𗿛 *bã "goose"

Left side of 𗿼 *dźjwow "bird"

Right side of 𗞢 *bã "tray"

P 32.111 3330 𗦌 *swã [a surname]

Left side of 𗤳 * [a surname]

Middle of 𗪆 *sjwɨ̱ "to think"

P 32.272 3311 𗿮 * "elder, senior"

Left side of 𗿒 *khwej "big"

Right side of 𗲟 * "ore"

P 35.111 1107 𘓱 *ŋwə "heaven, emperor"

Left side of 𘓺 *ŋwər "heaven"

Left side of 𗾈 *me̱ "virtuous person"

P 36.271 2811 𘚷 *ljɨ "round bone (?)"

Left side of 𘚶 *ljɨ "wind"

Right side of 𗥛 *rjɨr "bone"

P 36.273 2079 𗾓 *ljɨ "noon"

Left side of 𗿳 *dzjɨj "time"

Right side of 𘆂 *ljij "noon"

P 37.213 2454 𗿁 *phjɨ "to hear"

Left side of 𗾤 *ɣju "to ask, to call"

Right side of 𗣦 *śjwiw "to follow"

Bottom of 𗓁 *mji "to listen, to hear"

P 38.122 0826 𗱁 *thjɨ "to call, to speak"

Left side of 𗱌 *thu "to release"

Right side of 𗱃 *thjɨ "east, end"

P 39.112 2209 𗾲 *tshjɨ "name of a star"

Left side of 𗿘 *tshjɨ "a type of bird"

Right side of 𘛶 *tśjɨ̱r "star, constellation"

P 39.121 3127 𗿘 *tshjɨ "a type of bird"

Left side of 𗿼 *dźjwow "bird"

Right side of 𗅻 *tshjɨ "lamb"

P 39.152 3657 𗿅 *·jɨ "marriage"

Right side of 𗼍 *sa̱ "close relative"

Left side of 𗒂 *njạ "marriage"

P 39.251 3694 𗿬 *kjwɨ "turtledove"

Left side of 𗿼 *dźjwow "bird"

Whole of 𗯢 *gjwɨ "to cut, to break"

P 42.131 3171 𗿭 *mjɨ̱ "pheasant"

Left side of 𗿼 *dźjwow "bird"

Right side of 𗩫 *mjɨ̱ "woman"

P 42.251 2164 𗪆 *sjwɨ̱ "to think"

Left side of 𗤶 *nji̱j "heart, mind"

Middle of 𗾫 *sji̱j "thought"

Right side of 𗍊 *sju "as, like"

P 45.171 3297 𗿌 *tśjij "a type of bird"

Left side of 𗿼 *dźjwow "bird"

Right side of 𘓫 *tśjij [a surname]

P 50.262 3421 𗬴 *ləj "equal, even"

Middle of 𗅋 *mji "not"

Right side of 𗣫 *tsəj "small, little, young"

Left side of 𗿒 *khwej "big"

P 51.142 3647 𗿨 *kiwəj "cuckoo"

Left side of 𗿼 *dźjwow "bird"

Left side of 𗔤 *kiwe "dark"

P 53.242 3299 𘔳 *lwew "steam, smoke"

Left side of 𘔺 *khji "gas, steam"

Left side of 𗿉 *ɣju "smoke, mist"

P 54.271 3960 𘀉 *źjiw "bird"

Left side of 𘀐 *źjiw "six, sixth"

Left side of 𗿼 *dźjwow "bird"

P 55.142 2086 𗾝 *zji̱w "to hang"

Whole of 𗾆 *dzjiw "waist"

Right side of 𗭍 *dźjịj "to go, to send"

P 57.172 3906 𘀚 *tśhio "origin, source"

Left side of 𘀗 *tshjwu "sky, heaven"

Right side of 𘏨 *ljɨ̣ "treasure"

Middle of 𗿀 *tser "land, soil"

P 57.251 2759 𗾦 *tśjo "chaotic"

Right side of 𗊌 *nju "sweat"

Right side of 𗿎 *lew "confused"

P 57.272 2816 𗼝 *ljo "round bone"

Left side of 𗼕 *ljo "good fortune"

Right side of 𗥛 *rjɨr "bone"

P 62.162 5087 𗭴 *·jow [a surname]

Middle of 𗂽 *·jij "sheep"

Left side of 𗿼 *dźjwow "bird"

Right side of 𗿀 *tser "land, soil"

P 63.211 1569 𗹐 *twụ "loyal"

Left side of 𗹑 *tśjɨj "upright"

Left side of 𗾈 *me̱ "virtuous person"

P 65.271 2114 𗿶 *liẹj "crow"

Left side of 𗿼 *dźjwow "bird"

Whole of 𗰞 *nja̱ "black"

P 71.212 2742 𗾋 *tẹ "[bird] shit"

Left side of 𗿼 *dźjwow "bird"

Middle of 𗏡 *kụ "behind"

P 72.211 2268 𗿾 *·wjị "east, tail end"

Left side of 𗾔 *be "sun"

Whole of 𘙎 *lhji "to give birth to"

P 73.212 3298 𗿯 *djị "to tread"

Surrounding part of 𗿞 *djị "to mate"

Right side of 𘈷 *gji "son, child"

P 73.221 3312 𗿞 *djị "to mate"

Surrounding part of 𗿯 *djị "to tread"

Right side of 𗄬 *dzjɨj "sexual intercourse"

P 74.231 3633 𗿃 *də̣ "beautiful [of a bird?]"

Left side of 𗿼 *dźjwow "bird"

Left side of 𘕡 *zewr "graceful, elegant"

P 75.121 3348 𗻲 *tswə̣ "dung"

Left side of 𗺕 *kji̱ "grass"

Right side of 𗆑 *gja̱ "to swallow"

P 75.132 3612 𗿧 *tsə̣ "medicine"

Right side of 𗣣 *tshji "food"

Right side of 𗼫 *sju "medicine"

P 75.172 5655 𘏨 *ljɨ̣ "treasure"

Left side of 𘐱 *dew "true, real"

Right side of 𗾟 *wạ "vast, wide"

P 75.212 2140 𗿥 *·wjɨ̣ "old, aged"

Left side of 𗿦 *mja "female"

Right side of 𘒺 *nar "old"

P 76.141 1454 𘆃 *bjɨ̣ "gibbon"

Middle of 𘂶 *wjị "monkey"

Right side of 𘜶 *ljịj "big"

P 77.211 2087 𗾮 *zjɨ̣ "what time"

Left side of 𗿳 *dzjɨj "time"

Right side of 𗤄 *·jɨr "to ask, inquire"

P 79.151 3645 𗾌 *wejr "a type of bird"

Left side of 𗿼 *dźjwow "bird"

Left side of 𗯿 *wejr "flourishing"

P 81.272 3851 𘀕 *tser "spot"

Middle of 𗿀 *tser "land, soil"

Right side of 𗙾 *kiwəj "golden"

P 82.121 2107 𗿀 *tser "land, soil"

Middle of 𗦴 *me̱ "god, deity"

Right side of 𗼻 *ljɨ̣ "land, soil"

Right side of 𗼱 *dzjiw "land, soil"

P 82.212 3109 𗾏 *·wer "crane"

Left side of 𗿼 *dźjwow "bird"

Right side of 𘉤 *wer "to meet"

P 83.241 3151 𗪻 *mar "oath"

Left side of 𗥛 *rjɨr "bone"

Right side of 𗡔 *ŋwụ "to swear an oath"

P 88.122 3571 𗿑 *xwər "crane"

Left side of 𗿼 *dźjwow "bird"

Right side of 𗼄 *tśier "benefit, interest"

Right side of 𗩯 *sjwij "clear"

P 89.231 2778 𗥛 *rjɨr "bone"

Right side of 𗌄 *low "skeleton"

Right side of 𗧜 *lhu̱ "marrow"

P 90.231 2766 𗿇 *kjiwr "wild duck"

Left side of 𗿼 *dźjwow "bird"

Right side of 𗖬 *kjiwr "urgent"

P 91.162 2383 𗂖 *bowr "bag"

Middle of 𘅌 *bju "to crawl"

Whole of 𗾡 *bowr "bee"

P 91.171 2462 𗾡 *bowr "bee"

Left side of 𗿼 *dźjwow "bird"

Whole of 𘊏 *kjij "insect"

P 91.172 3608 𗿴 *bowr "woman's breast"

Whole of 𗾡 *bowr "bee"

Right side of 𗁮 *tśhji "flesh"

P 92.151 3302 𗩀 *kjwɨr "a type of bird"

Left side of 𗪌 *kjwɨ̱r "Xiongnu"

Left side of 𗿼 *dźjwow "bird"

P 92.211 2975 𘉀 *tsji̱r "an official"

Right side of 𘛅 *dzjɨ̣ "official title"

Left side of 𗿼 *dźjwow "bird"

P 92.241 2628 𗾖 *go̱r "male"

Right side of 𘜶 *ljịj "big"

Middle of 𗸱 *no "son"

P 93.111 3052 𗌜 *njo̱r "water, dew"

Left side of 𗋽 *zjɨ̱r "water"

Right side of 𘚖 *lwo "damp"

Right side of 𘌤 *djɨ̣ "ribbon"

Z 11.112 2801 𗧜 *lhu̱ "marrow"

Left side of 𗤶 *nji̱j "heart, mind"

Right side of 𗥛 *rjɨr "bone"

Z 12.111 2238 𗾐 *lhjwị [a surname]

Left side of 𗿊 *low "body"

Middle of 𗏥 *lju̱ [a surname]

Z 12.132 2240 𗿓 *lhə "a type of insect"

Right side of 𘝀 *phji̱ "to fly"

Right side of 𗿼 *dźjwow "bird"

Z 14.111 2655 𗾣 *dzjị "tall, high"

Left side of 𗿼 *dźjwow "bird"

Right side of 𘖎 *wjịj "short, brief"

Z 17.231 3126 𗿷 *dźjij "to have, to possess"

Right side of 𘜶 *ljịj "big"

Whole of 𘟣 *dju "to have, to possess"

Z 18.242 1324 𗳥 *dźju̱ "to forbear"

Right side of 𘓯 *khjow "to give, to bestow"

Left side of 𗿒 *khwej "big"

Z 2.112 5420 𘍿 *n— "eagle"

Left side of 𘎃 *·we "bird"

Bottom of 𗔟 *n— [a surname]

Z 4.122 2669 𗿝 *dze "wild goose"

Left side of 𗿼 *dźjwow "bird"

Right side of 𗨜 *dze "longevity"

Z 4.232 3589 𗿳 *dzjɨj "time"

Left side of 𗾞 *njɨ̱ "sun, day"

Right side of 𗑝 *tsewr "section"

Z 5.251 1935 𘉂 *dzjɨj "ditch, moat"

Right side of 𗽂 *ɣew "trench"

Right side of 𘉀 *tsji̱r "an official"

Right side of 𗿀 *tser "land, soil"

Z 6.133 1329 𘞃 *dźjow "flag, banner"

Left side of 𘞄 *ljɨ̱ "flag"

Right side of 𘜶 *ljịj "big"

Z 7.242 3087 𗾆 *dzjiw "waist"

Bottom of 𘗴 *kjir "waist"

Z 7.251 2262 𗿼 *dźjwow "bird"

Left side of 𗿤 *dźjwow "mating"

Left side of 𘝋 *dzjwɨ "wing"

Z 7.252 2260 𗿤 *dźjwow "mating [of birds]"

Left side of 𗿼 *dźjwow "bird"

Right side of 𘟢 *·we "mating"

Z 8.162 3332 𗩳 *dźiwe "to pull, to drag"

Left side of 𗨟 *dź— "oblique, awry"

Right side of 𘆗 *śiə "to rotate"

Z 9.151 0774 𗙫 *·a [used for Sanskrit transcription]

Left side of 𗙏 *ɣiẹ "sound, noise"

Whole of 𗿢 *zur "edict, order"

Z 20.162 1387 𘗾 *lhjij "to sacrifice, to butcher"

Left side of 𘘀 *śji̱ "livestock"

Right side of 𗣣 *tshji "food"

The Chinese Radical model does not seem to apply to Tangut. In Chinese a particular radical has a single semantic determinative function, whereas for Tangut the same radical may have many different semantic functions depending upon the source character from which it is taken. Or, more accurately, a particular "radical" does not have an inherent semantic determinative function, but rather, its semantic function in any given character depends on the source character that the radical is derived from. This helps explain why there are so many characters with the "person" radical 𘢌 (about 20% of all Tangut characters include this element; see Marc Miyake's How Many People are in the Tangut Script?): this element does not have an inherent sense of "person" in the same way that the Chinese ⼈ *rén "person" radical has, but can mean almost anything depending upon the character that it is derived from.

It is possible to group characters with the 𘤊 radical into several different categories (as shown below), based on its source character as given in the Sea of Characters. By far the largest category of characters are those related to birds and flying, but as this is only one of several semantic categories covered by this radical, you cannot assume that any character with this radical is related to birds or flying (unlike Chinese, for which you can assume that almost any character with the ⿃ *niǎo "bird" or ⾶ *fēi "flying" radical is related to birds or flying respectively). In fact, of the 102 characters in the above table, only about half of them can be grouped into the semantic categories shown below. The other half are idiosyncratic, and have to be considered one at a time.
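The grouping described here can be sketched in a few lines of Python. This is a simplification under stated assumptions: it uses only six entries copied from the table above, it treats any citation of a category's source character as membership, and it ignores which part of the source is borrowed. The constant and function names are mine.

```python
BIRD, BONE, FOOD, TIME = "𗿼", "𗥛", "𗣣", "𗿳"

CATEGORY_BY_SOURCE = {
    BIRD: "birds and flying",
    BONE: "bone",
    FOOD: "food",
    TIME: "time",
}

# Head character -> source characters named in its Sea of Characters entry.
ENTRIES = {
    "𗿛": [BIRD, "𗞢"],   # "goose" = "bird" + "tray"
    "𗿶": [BIRD, "𗰞"],   # "crow" = "bird" + "black"
    "𗧜": ["𗤶", BONE],   # "marrow" = "heart, mind" + "bone"
    "𗿧": [FOOD, "𗼫"],   # "medicine" = "food" + "medicine"
    "𗾓": [TIME, "𘆂"],   # "noon" = "time" + "noon"
    "𗾶": ["𗾙", "𗥪"],   # "empty" = "little bird" + "to teach"
}


def group_by_category(entries, category_by_source):
    """Bucket head characters by the first recognised source character, if any."""
    groups = {}
    for head, sources in entries.items():
        labels = [category_by_source[s] for s in sources if s in category_by_source]
        label = labels[0] if labels else "uncategorised"
        groups.setdefault(label, []).append(head)
    return groups


groups = group_by_category(ENTRIES, CATEGORY_BY_SOURCE)
for label, heads in groups.items():
    print(label, len(heads))
```

The "uncategorised" bucket corresponds to the idiosyncratic characters that have to be considered one at a time.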

A. Characters Related to Birds and Flying (𘤊 = 𗿼 *dźjwow "bird")

Of the 32 characters in this category, 24 have the radical on the left, 6 on the right, and 3 in the middle (one character has the radical on the left and in the middle), so although the left-hand side is the most common position for the radical, it is not fixed, and may occur on the right-hand side or in the middle of the character as well.

  • 𗿼 *dźjwow "bird" = 𗿤 *dźjwow "mating" + 𘝋 *dzjwɨ "wing"
  • 𘝀 *phji̱ "to fly" = 𗿼 *dźjwow "bird" + 𘝋 *dzjwɨ "wing"
  • 𗿻 *ku "phoenix" = 𗿼 *dźjwow "bird" + 𘜶 *ljịj "big" + 𘎃 *·we "bird"
  • 𗾎 *kjwi "turtledove" = 𗿼 *dźjwow "bird" + 𗰰 *kjir [a surname]
  • 𘂕 *ta "swallow" = 𗿼 *dźjwow "bird" + 𗾙 *lew "little bird"
  • 𗿽 *mja "a type of bird" = 𗿼 *dźjwow "bird" + 𗿦 *mja "female"
  • 𗿍 *śja̱ "a type of bird" = 𗿼 *dźjwow "bird" + 𗉋 *tśiow "to assemble" + 𗰛 *dzjịj "to cross, to pass"
  • 𗿛 *bã "goose" = 𗿼 *dźjwow "bird" + 𗞢 *bã "tray"
  • 𗿘 *tshjɨ "a type of bird" = 𗿼 *dźjwow "bird" + 𗅻 *tshjɨ "lamb"
  • 𗿬 *kjwɨ "turtledove" = 𗿼 *dźjwow "bird" + 𗯢 *gjwɨ "to cut, to break"
  • 𗿭 *mjɨ̱ "pheasant" = 𗿼 *dźjwow "bird" + 𗩫 *mjɨ̱ "woman"
  • 𗿌 *tśjij "a type of bird" = 𗿼 *dźjwow "bird" + 𘓫 *tśjij [a surname]
  • 𗿨 *kiwəj "cuckoo" = 𗿼 *dźjwow "bird" + 𗔤 *kiwe "dark"
  • 𘀉 *źjiw "bird" = 𗿼 *dźjwow "bird" + 𘀐 *źjiw "six, sixth"
  • 𗿶 *liẹj "crow" = 𗿼 *dźjwow "bird" + 𗰞 *nja̱ "black"
  • 𗾌 *wejr "a type of bird" = 𗿼 *dźjwow "bird" + 𗯿 *wejr "flourishing"
  • 𗾏 *·wer "crane" = 𗿼 *dźjwow "bird" + 𘉤 *wer "to meet"
  • 𗿑 *xwər "crane" = 𗿼 *dźjwow "bird" + 𗼄 *tśier "benefit, interest" + 𗩯 *sjwij "clear"
  • 𗿇 *kjiwr "wild duck" = 𗿼 *dźjwow "bird" + 𗖬 *kjiwr "urgent"
  • 𗿝 *dze "wild goose" = 𗿼 *dźjwow "bird" + 𗨜 *dze "longevity"
  • 𗩀 *kjwɨr "a type of bird" = 𗿼 *dźjwow "bird" + 𗪌 *kjwɨ̱r "Xiongnu"
  • 𗍘 *pja "butterfly" = 𗿼 *dźjwow "bird" + 𗍎 *pja "dark green"
  • 𗾡 *bowr "bee" = 𗿼 *dźjwow "bird" + 𘊏 *kjij "insect"
  • 𗿓 *lhə "a type of [flying?] insect" = 𗿼 *dźjwow "bird" + 𘝀 *phji̱ "to fly"
  • 𗼍 *ɣu "god, supernatural being [that can fly?]" = 𗿼 *dźjwow "bird" + 𗼙 *ɣu "emperor"
  • 𗿃 *də̣ "beautiful [of a bird?]" = 𗿼 *dźjwow "bird" + 𘕡 *zewr "graceful, elegant"
  • 𗿤 *dźjwow "mating [of birds]" = 𗿼 *dźjwow "bird" + 𘟢 *·we "mating"
  • 𗾋 *tẹ "[bird] shit" = 𗿼 *dźjwow "bird" + 𗏡 *kụ "behind"
  • 𘉀 *tsji̱r "an official [a high-flyer?]" = 𗿼 *dźjwow "bird" + 𘛅 *dzjɨ̣ "official title"
  • 𗾣 *dzjị "tall, high" = 𗿼 *dźjwow "bird" + 𘖎 *wjịj "short, brief"
  • 𗿡 *·wẽ [a place name] = 𗿼 *dźjwow "bird" + 𗩇 *·wẽ [a surname]
  • 𗭴 *·jow [a surname] = 𗿼 *dźjwow "bird" + 𗂽 *·jij "sheep" + 𗿀 *tser "land, soil"

B. Characters Related to Bone (𘤊𘠢 = 𗥛 *rjɨr "bone")

As discussed in Part 1, modern Tangut dictionaries use systems of radical indexing that are based on arbitrary, artificial radicals. However, it should be possible to generate a list of natural radicals used in the Sea of Characters. The bone-related characters in this category all use the two right-hand elements of the character 𗥛 *rjɨr "bone", and so 𘤊𘠢 would form a single natural radical in a hypothetical Sea of Characters radical system.

  • 𗥛 *rjɨr "bone" = 𗌄 *low "skeleton" + 𗧜 *lhu̱ "marrow"
  • 𘚷 *ljɨ "round bone (?)" = 𗥛 *rjɨr "bone" + 𘚶 *ljɨ "wind"
  • 𗼝 *ljo "round bone" = 𗥛 *rjɨr "bone" + 𗼕 *ljo "good fortune"
  • 𗧜 *lhu̱ "marrow" = 𗥛 *rjɨr "bone" + 𗤶 *nji̱j "heart, mind"
  • 𗹼 *khiwa "kidney" = 𗥛 *rjɨr "bone" + 𗹭 *bjij "high"

C. Characters Related to Smoke and Steam (𘤊 = 𗿉 *ɣju "smoke, mist" or 𘔳 *lwew "steam, smoke")

  • 𗿉 *ɣju "smoke, mist" = 𘔳 *lwew "steam, smoke" + 𗞦 *kjur "to smoke sth."
  • 𗊌 *nju "sweat" = 𘔳 *lwew "steam, smoke" + 𗊻 *śjo "sweat"
  • 𘔳 *lwew "steam, smoke" = 𗿉 *ɣju "smoke, mist" + 𘔺 *khji "gas, steam"
  • 𗿺 *nju̱ "smoke" = 𗿉 *ɣju "smoke, mist" + 𘃠 *du̱ "to store"

D. Characters Related to Food (𘤊 = 𗣣 *tshji "food")

  • 𗣣 *tshji "food" = 𗢯 *lhjwa "tongue" + 𗮘 *śjwi "food"
  • 𗿧 *tsə̣ "medicine" = 𗣣 *tshji "food" + 𗼫 *sju "medicine"
  • 𘗾 *lhjij "to sacrifice, to butcher" = 𗣣 *tshji "food" + 𘘀 *śji̱ "livestock"

E. Characters Related to Time (𘤊 = 𗿳 *dzjɨj "time")

  • 𗿳 *dzjɨj "time" = 𗾞 *njɨ̱ "sun, day" + 𗑝 *tsewr "section"
  • 𗾓 *ljɨ "noon" = 𗿳 *dzjɨj "time" + 𘆂 *ljij "noon"
  • 𗾮 *zjɨ̣ "what time" = 𗿳 *dzjɨj "time" + 𗤄 *·jɨr "to ask, inquire"

F. Characters Related to Virtuousness (𘤊 = 𗾈 *me̱ "virtuous person")

  • 𘓱 *ŋwə "heaven, emperor" = 𘓺 *ŋwər "heaven" + 𗾈 *me̱ "virtuous person"
  • 𗹐 *twụ "loyal" = 𗹑 *tśjɨj "upright" + 𗾈 *me̱ "virtuous person"

G. Characters Related to Marriage (𘤊 = 𗼍 *sa̱ "close relative")

  • 𗼍 *sa̱ "close relative" = 𗒂 *njạ "marriage" + 𗿅 *·jɨ "marriage"
  • 𗿅 *·jɨ "marriage" = 𗼍 *sa̱ "close relative" + 𗒂 *njạ "marriage"

H. Characters Related to the Waist (𘤊 = 𗾆 *dzjiw "waist")

  • 𗾆 *dzjiw "waist" = 𘗴 *kjir "waist"
  • 𗾝 *zji̱w "to hang" = 𗾆 *dzjiw "waist" + 𗭍 *dźjịj "to go, to send"

Whilst we can classify some of the characters with the 𘤊 radical according to semantic categories, as shown above, we can more usefully classify characters according to the functions of the component elements that comprise each character.

A. Phonetic plus Semantic Constructions

These constructions are similar to Chinese Radical plus Phonetic constructions, but whereas Chinese phonetic elements usually have a narrow range of phonetic values, and the reading of an unknown character can often be guessed from its phonetic element, Tangut phonetic elements do not have a fixed phonetic value, but represent the phonetic value of the character from which the element is derived. Thus the element 𘤏 represents *źjiw in the character 𘀉, but represents *tser in the character 𘀕, and so if an unknown character were to include the element 𘤏, we could not guess what phonetic value it represented ... or even whether it had a phonetic function or a semantic function.

  • 𗿼 *dźjwow "bird" = 𗿤 *dźjwow "mating" + 𘝋 *dzjwɨ "wing"
  • 𗿽 *mja "a type of bird" = 𗿦 *mja "female" + 𗿼 *dźjwow "bird"
  • 𗾎 *kjwi "turtledove" = 𗰰 *kjir [a surname] + 𗿼 *dźjwow "bird"
  • 𗿛 *bã "goose" = 𗞢 *bã "tray" + 𗿼 *dźjwow "bird"
  • 𗿘 *tshjɨ "a type of bird" = 𗅻 *tshjɨ "lamb" + 𗿼 *dźjwow "bird"
  • 𗿬 *kjwɨ "turtledove" = 𗯢 *gjwɨ "to cut, to break" + 𗿼 *dźjwow "bird"
  • 𗿭 *mjɨ̱ "pheasant" = 𗩫 *mjɨ̱ "woman" + 𗿼 *dźjwow "bird"
  • 𗿌 *tśjij "a type of bird" = 𘓫 *tśjij [a surname] + 𗿼 *dźjwow "bird"
  • 𗿨 *kiwəj "cuckoo" = 𗔤 *kiwe "dark" + 𗿼 *dźjwow "bird"
  • 𘀉 *źjiw "bird" = 𘀐 *źjiw "six, sixth" + 𗿼 *dźjwow "bird"
  • 𗾌 *wejr "a type of bird" = 𗯿 *wejr "flourishing" + 𗿼 *dźjwow "bird"
  • 𗾏 *·wer "crane" = 𘉤 *wer "to meet" + 𗿼 *dźjwow "bird"
  • 𗿇 *kjiwr "wild duck" = 𗖬 *kjiwr "urgent" + 𗿼 *dźjwow "bird"
  • 𗿝 *dze "wild goose" = 𗨜 *dze "longevity" + 𗿼 *dźjwow "bird"
  • 𗩀 *kjwɨr "a type of bird" = 𗪌 *kjwɨ̱r "Xiongnu" + 𗿼 *dźjwow "bird"
  • 𗍘 *pja "butterfly" = 𗍎 *pja "dark green" + 𗿼 *dźjwow "bird"
  • 𗼍 *ɣu "god, supernatural being [that can fly?]" = 𗼙 *ɣu "emperor" + 𗿼 *dźjwow "bird"
  • 𗿤 *dźjwow "mating [of birds]" = 𗿼 *dźjwow "bird" + 𘟢 *·we "mating"
  • 𗿡 *·wẽ [a place name] = 𗩇 *·wẽ [a surname] + 𗿼 *dźjwow "bird"
  • 𘚷 *ljɨ "round bone (?)" = 𘚶 *ljɨ "wind" + 𗥛 *rjɨr "bone"
  • 𗼝 *ljo "round bone" = 𗼕 *ljo "good fortune" + 𗥛 *rjɨr "bone"
  • 𘍿 *n— "eagle" = 𗔟 *n— [a surname] + 𘎃 *·we "bird"
  • 𘛲 *gu̱ "to patrol" = 𘛯 *gu̱ [a surname] + 𘕂 *dźjij "to go"
  • 𗾸 *be "illness" = 𗾇 *be "mad" + 𗥓 *ŋo "disease"
  • 𘇭 *sjwi "to tie" = 𗝊 *sjwi "roof beam" + 𘌤 *djɨ̣ "ribbon"
  • 𗈹 *sji̱ "to inspect" = 𘕨 *sji̱ "to cry, to wail, to lament" + 𗈲 *khwa "far"
  • 𗍜 *pja "broad, shallow" = 𗍘 *pja "butterfly" + 𗼗 *djɨj "shallow"
  • 𗿦 *mja "female [of human or animal]" = 𗿽 *mja "a type of bird" + 𘓱 *me̱ "swallow" or *ŋwə "heaven, emperor" (?)
  • 𗿮 * "elder, senior" = 𗲟 * "ore" + 𗿒 *khwej "big"
  • 𗱁 *thjɨ "to call, to speak" = 𗱃 *thjɨ "east, end" + 𗱌 *thu "to release"
  • 𗾲 *tshjɨ "name of a star" = 𗿘 *tshjɨ "a type of bird" + 𘛶 *tśjɨ̱r "star, constellation"
  • 𗿯 *djị "to tread" = 𗿞 *djị "to mate" + 𘈷 *gji "son, child" (!)
  • 𗿞 *djị "to mate" = 𗿯 *djị "to tread" + 𗄬 *dzjɨj "sexual intercourse"
  • 𘀕 *tser "spot, mark [on a deer]" = 𗿀 *tser "land, soil" + 𗙾 *kiwəj "golden"
  • 𗂖 *bowr "bag" = 𗾡 *bowr "bee" + 𘅌 *bju "to crawl" (!)
  • 𗿴 *bowr "woman's breast" = 𗾡 *bowr "bee" + 𗁮 *tśhji "flesh"
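A crude way to spot candidate phonetic elements in entries like those listed above is to look for a source character whose reconstructed reading is identical to that of the target. The Python sketch below does just that, identifying sources by their English gloss for readability. It is a heuristic of my own, not a method from the Sea of Characters, and it deliberately shows its main weakness: near-homophones such as *kjwi and *kjir are missed by an exact comparison.

```python
import unicodedata


def _norm(reading):
    # Normalise so that precomposed and combining diacritics compare equal.
    return unicodedata.normalize("NFC", reading)


def homophone_sources(target_reading, sources):
    """Return the glosses of sources whose reading exactly matches the target reading."""
    target = _norm(target_reading)
    return [gloss for gloss, reading in sources if _norm(reading) == target]


# "goose" *bã = "tray" *bã + "bird" *dźjwow
goose = homophone_sources("bã", [("tray", "bã"), ("bird", "dźjwow")])
# "turtledove" *kjwi = [a surname] *kjir + "bird" *dźjwow
turtledove = homophone_sources("kjwi", [("a surname", "kjir"), ("bird", "dźjwow")])

print(goose)       # ['tray']
print(turtledove)  # []
```

A fuller treatment would need some measure of phonetic similarity, which depends on the reconstruction being used.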

B. Phonetic plus Phonetic Constructions

In some cases where a character has no intrinsic meaning, for example family or clan names, the character may be constructed from two homophonous phonetic elements.

  • 𘛯 *gu̱ [a surname] = 𘛴 *gu̱ "a spirit" + 𘛲 *gu̱ "to patrol"

C. Semantic Constructions

Many characters do not have a phonetic element at all, but comprise two or more elements with a semantic function, which taken together explain the meaning of the character, either directly (e.g. "black" + "bird" = "crow") or indirectly (e.g. "bird" + "wing" = "to fly"). In some cases the semantic elements may help us better understand the meaning of a character. For example, the known meaning of 𗿍 is only "a type of bird", but as its middle component element comes from a character that means "to assemble", it is probable that the type of bird is one that is usually found in large flocks. As another example, the Lǐ Fànwén dictionary definition for the two characters 𗿷 *dźjij and 𘟣 *dju is the same (有 = "to have" or "to possess"), but the fact that the left side of 𗿷 is derived from the character 𘜶 *ljịj "big" suggests that it may mean something more like "to have much" or "to have everything", and so is subtly different in meaning to 𘟣 (this impression is strengthened by the fact that when reduplicated 𗿷 means "all, every").

  • 𘝀 *phji̱ "to fly" = 𗿼 *dźjwow "bird" + 𘝋 *dzjwɨ "wing"
  • 𗿻 *ku "phoenix" = 𘜶 *ljịj "big" + 𗿼 *dźjwow "bird" + 𘎃 *·we "bird"
  • 𘂕 *ta "swallow" = 𗿼 *dźjwow "bird" + 𗾙 *lew "little bird"
  • 𗿍 *śja̱ "a type of [flocking?] bird" = 𗿼 *dźjwow "bird" + 𗉋 *tśiow "to assemble" + 𗰛 *dzjịj "to cross, to pass"
  • 𗿶 *liẹj "crow" = 𗰞 *nja̱ "black" + 𗿼 *dźjwow "bird"
  • 𗾡 *bowr "bee" = 𗿼 *dźjwow "bird" + 𘊏 *kjij "insect"
  • 𗿓 *lhə "a type of [flying?] insect" = 𗿼 *dźjwow "bird" + 𘝀 *phji̱ "to fly"
  • 𗿃 *də̣ "beautiful [of a bird?]" = 𘕡 *zewr "graceful, elegant" + 𗿼 *dźjwow "bird"
  • 𗾋 *tẹ "[bird] shit" = 𗿼 *dźjwow "bird" + 𗏡 *kụ "behind, bottom"
  • 𘉀 *tsji̱r "an official [a high-flyer?]" = 𗿼 *dźjwow "bird" + 𘛅 *dzjɨ̣ "official title"
  • 𗧜 *lhu̱ "marrow" = 𗥛 *rjɨr "bone" + 𗤶 *nji̱j "heart, mind"
  • 𗊌 *nju "sweat" = 𘔳 *lwew "steam, smoke" + 𗊻 *śjo "sweat"
  • 𗣣 *tshji "food" = 𗢯 *lhjwa "tongue" + 𗮘 *śjwi "food"
  • 𗿧 *tsə̣ "medicine" = 𗣣 *tshji "food" + 𗼫 *sju "medicine"
  • 𘗾 *lhjij "to sacrifice, to butcher" = 𗣣 *tshji "food" + 𘘀 *śji̱ "livestock"
  • 𗿳 *dzjɨj "time" = 𗾞 *njɨ̱ "sun, day" + 𗑝 *tsewr "section"
  • 𗾓 *ljɨ "noon" = 𗿳 *dzjɨj "time" + 𘆂 *ljij "noon"
  • 𗾮 *zjɨ̣ "what time" = 𗿳 *dzjɨj "time" + 𗤄 *·jɨr "to ask, inquire"
  • 𘓱 *ŋwə "heaven, emperor" = 𘓺 *ŋwər "heaven" + 𗾈 *me̱ "virtuous person"
  • 𗹐 *twụ "loyal" = 𗹑 *tśjɨj "upright" + 𗾈 *me̱ "virtuous person"
  • 𗼍 *sa̱ "close relative" = 𗒂 *njạ "marriage" + 𗿅 *·jɨ "marriage"
  • 𗿅 *·jɨ "marriage" = 𗼍 *sa̱ "close relative" + 𗒂 *njạ "marriage"
  • 𗾝 *zji̱w "to hang" = 𗾆 *dzjiw "waist" + 𗭍 *dźjịj "to go, to send"
  • 𗾤 *ɣju "to ask, to call" = 𗿄 *khju "to request, to ask" + 𗄼 *lja "to come"
  • 𗗪 *kji̱ "commerce, trade" = 𗗥 *źjị "to buy and sell" + 𘒨 *phjij "to express oneself" + 𗍋 *khjɨ̱ "to gather"
  • 𘓠 *ɣa "sorrow" = 𗤶 *nji̱j "heart, mind" + 𗪆 *sjwɨ̱ "to think"
  • 𗈲 *khwa "far" = 𗈱 *rjar [a participle] + 𗈹 *sji̱ "to inspect" + 𗎘 *bju "border, side"
  • 𘖁 *tsha "empty bag" = 𗻍 *bu "reed-mace, cattail" + 𗮺 *tsə̣ "lungs" + 𗍊 *sju "as, like"
  • 𗉅 *tsja "hot" = 𗜐 *mə̱ "fire" + 𗾔 *be "sun"
  • 𗿁 *phjɨ "to hear" = 𗾤 *ɣju "to ask, to call" + 𗣦 *śjwiw "to follow" + 𗓁 *mji "to listen, to hear"
  • 𗪆 *sjwɨ̱ "to think" = 𗤶 *nji̱j "heart, mind" + 𗾫 *sji̱j "thought" + 𗍊 *sju "as, like"
  • 𗬴 *ləj "equal, even" = 𗅋 *mji "not" + 𗣫 *tsəj "small, little, young" + 𗿒 *khwej "big"
  • 𗿾 *·wjị "east, tail end" = 𘙎 *lhji "to give birth to" + 𗾔 *be "sun"
  • 𗻲 *tswə̣ "dung" = 𗺕 *kji̱ "grass" + 𗆑 *gja̱ "to swallow"
  • 𗿥 *·wjɨ̣ "old, aged [of a woman]" = 𗿦 *mja "female" + 𘒺 *nar "old"
  • 𘆃 *bjɨ̣ "gibbon" = 𘜶 *ljịj "big" + 𘂶 *wjị "monkey"
  • 𗾖 *go̱r "male" = 𘜶 *ljịj "big" + 𗸱 *no "son"
  • 𗿷 *dźjij "to have, to possess [a lot?]" = 𘜶 *ljịj "big" + 𘟣 *dju "to have, to possess"
  • 𗳥 *dźju̱ "to forbear" = 𘓯 *khjow "to give, to bestow" + 𗿒 *khwej "big"
  • 𘉂 *dzjɨj "ditch, moat" = 𗽂 *ɣew "trench" + 𘉀 *tsji̱r "an official" + 𗿀 *tser "land, soil"
  • 𘞃 *dźjow "flag, banner" = 𘜶 *ljịj "big" + 𘞄 *ljɨ̱ "flag"
  • 𗩳 *dźiwe "to pull, to drag" = 𗨟 *dź— "oblique, awry" + 𘆗 *śiə "to rotate"

D. Synonym Constructions

Some characters are composed of elements taken from one or more characters whose meaning is the same as, or very similar to, their own.

  • 𗾆 *dzjiw "waist" = 𘗴 *kjir "waist"
  • 𗥛 *rjɨr "bone" = 𗌄 *low "skeleton" + 𗧜 *lhu̱ "marrow"
  • 𗿉 *ɣju "smoke, mist" = 𘔳 *lwew "steam, smoke" + 𗞦 *kjur "to smoke sth."
  • 𘔳 *lwew "steam, smoke" = 𗿉 *ɣju "smoke, mist" + 𘔺 *khji "gas, steam"
  • 𗿜 *tśhji "shame, disgrace" = 𗾹 *tshwu "shame, disgrace" + 𗼊 *sew "shy, bashful" + 𗏣 *ljijr "direction"
  • 𗿀 *tser "land, soil" = 𗼻 *ljɨ̣ "land, soil" + 𗼱 *dzjiw "land, soil" + 𗦴 *me̱ "god, deity"
  • 𗌜 *njo̱r "water, dew" = 𗋽 *zjɨ̱r "water" + 𘚖 *lwo "damp" + 𘌤 *djɨ̣ "ribbon"

E. Obscure Constructions

Some constructions are difficult to understand, especially family names and place names. Do we lose something in translation ? Or maybe textual corruption in the Sea of Characters has resulted in the correct source character being replaced by a similar but different character ?

  • 𘃷 *nju [a surname] = 𗨛 *rjɨr "to go out, to give birth" + 𗍊 *sju "as, like"
  • 𗭴 *·jow [a surname] = 𗿼 *dźjwow "bird" + 𗂽 *·jij "sheep" + 𗿀 *tser "land, soil"
  • 𗿵 *ɣa [a surname] = 𗍃 *·jiw [a place name] + 𗪙 *mur "vulgar"
  • 𗦌 *swã [a surname] = 𗤳 * [a surname] + 𗪆 *sjwɨ̱ "to think"
  • 𗾐 *lhjwị [a surname] = 𗿊 *low "body" + 𗏥 *lju̱ [a surname]
  • 𗙫 *·a [used for Sanskrit transcription] = 𗙏 *ɣiẹ "sound, noise" + 𗿢 *zur "edict, order"
  • 𗿑 *xwər "crane" = 𗿼 *dźjwow "bird" + 𗼄 *tśier "benefit, interest" + 𗩯 *sjwij "clear"
  • 𗾣 *dzjị "tall, high" = 𗿼 *dźjwow "bird" + 𘖎 *wjịj "short, brief"
  • 𗿺 *nju̱ "smoke" = 𗿉 *ɣju "smoke, mist" + 𘃠 *du̱ "to store"
  • 𗹼 *khiwa "kidney" = 𗥛 *rjɨr "bone" + 𗹭 *bjij "high"
  • 𗾶 *xju "empty" = 𗾙 *lew "little bird" + 𗥪 *rjɨj "to teach"
  • 𗾇 *be "mad" = 𗾷 *dzjị "owlet" + 𗕶 *ɣạ "crazy"
  • 𘀚 *tśhio "origin, source" = 𘀗 *tshjwu "sky, heaven" + 𘏨 *ljɨ̣ "treasure" + 𗿀 *tser "land, soil"
  • 𗾦 *tśjo "chaotic" = 𗊌 *nju "sweat" + 𗿎 *lew "confused"
  • 𗪻 *mar "oath" = 𗥛 *rjɨr "bone" + 𗡔 *ŋwụ "to swear an oath"
  • 𘏨 *ljɨ̣ "treasure" = 𘐱 *dew "true, real" + 𗾟 *wạ "vast, wide"

Based on this study of a single radical, it seems to me that Tangut does have "radicals", but that they are very different to Chinese radicals. Firstly, each component element of a Tangut character is a radical, so most Tangut characters have two or three radicals. Secondly, Tangut radicals do not have a fixed semantic meaning or phonetic value, but are used to connect one character to another character, so that each character is related through its radicals to two or three other characters, forming a network of interrelated characters. However, as the source character for any given radical is not explicit, but has to be looked up in the Sea of Characters (or some other long lost Tangut reference book), the radicals cannot be used to guess the meaning or pronunciation of an unknown character. At best — and this is perhaps their original intent — radicals can be used as mnemonic devices to help the learner read and write Tangut characters.
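The "network" idea can be made concrete with a small graph. In the Python sketch below, each character points to the source characters named in its entry, and a reverse index records which characters cite it in turn. The three entries are copied from the table above; the constant and function names are my own, and this is only an illustration of the data structure, not a full model of the dictionary.

```python
from collections import defaultdict

BIRD, WING, FLY, MATING = "𗿼", "𘝋", "𘝀", "𗿤"

# Character -> source characters given in its Sea of Characters entry.
SOURCES = {
    FLY: [BIRD, WING],        # "to fly" = "bird" + "wing"
    BIRD: [MATING, WING],     # "bird" = "mating" + "wing"
    MATING: [BIRD, "𘟢"],     # "mating [of birds]" = "bird" + "mating"
}


def dependents(sources):
    """Reverse index: source character -> set of characters that cite it."""
    index = defaultdict(set)
    for char, cited in sources.items():
        for src in cited:
            index[src].add(char)
    return index


def neighbours(char, sources):
    """Characters directly linked to char: its own sources plus every character citing it."""
    linked = set(sources.get(char, [])) | dependents(sources)[char]
    linked.discard(char)
    return linked


print(len(neighbours(BIRD, SOURCES)))  # 3
print(len(neighbours(WING, SOURCES)))  # 2
```

Note that the forward lookup (from a character to its sources) is only possible because the dictionary entry is available; nothing in the glyph itself identifies the source.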

2.3 Characters with Unitary Composition

The vast majority of Tangut characters are composed of at least two distinct components, and their compositional analysis in the Sea of Characters describes them as the product of two or more other characters. However, there are a few characters with a unitary composition (30-40, depending upon how you count them), for example the character 𗾆 "waist", which is composed only of the radical 𘤊. In the Sea of Characters, the composition of this and other unitary characters with head entries is described in subtractive terms as deriving from a more complex element with one part removed :

  • 𗢨 *dzjwo "person" (Z 15.221) = top removed from 𘑘 *śji "celestial being"
    • 𘑘 *śji "celestial being" (P 15.212) = top of 𘑗 *ŋər "hill, mountain" and whole of 𗢨 *dzjwo "person"
  • 𗸕 *khwə "half" (P 34.271) = top of 𘓳 *ŋowr "whole" and right of 𘏄 *tjị "to get rid of"
  • 𗾆 *dzjiw "waist" (Z 7.242) = top removed from 𘗴 *kjir "waist"
  • 𘂆 *tsjɨ "little" (P 38.261) = left removed from 𗀹 *zji "little, young [bird or animal]"
    • 𗀹 *zji "little, young [bird or animal]" (P 18.151) = right of 𗏑 *no̱ "weak" and left of 𘂎 *la "small"
  • 𘂪 *dzjij "single" (Z 15.172) = [left of] 𗅋 *mji "not" removed from 𗄴 *twe̱ "pair, couple"
  • 𘇰 *tśhji "old" (P 21.171) = [left of] 𗺕 *kji̱ "grass" removed from 𘇵 *tśhjɨ "reed-mace, cattail"
  • 𘉋 *·jar "eight" (P 85.232) = top removed from 𗒹 *śjạ "seven"
    • 𗒹 *śjạ "seven" (P 70.171) = uses 𗑗 *sej "clean" (!) and the whole of 𘉋 *·jar "eight"
  • 𘉟 *pjụ "to compel" (P 65.112) = top [left] of 𘉡 *pjụ "power" and right of 𘉐 *·iow "contribution, achievement"
    • 𘉡 *pjụ "power" (P 64.273) = whole of 𘉟 *pjụ "to compel" and left of 𘉍 *bji "bright"
    • 𘉐 *·iow "contribution, achievement" (P 63.111) = right of 𗤓 *thjo̱ "beautiful" and left of 𗵺 *wạ "to win"
  • 𘌢 *zu "belt, band" (P 6.243) = top [right] of 𗥕 *zu "to tie up" and bottom [left] of 𘌥 *bej "to tie up"
    • 𘌥 *bej "to tie up" (P 43.241) = right of 𗥕 *zu "to tie up" and left of 𘏛 *bej "rope"
  • 𘔧 *gjụ "seat, post, stick" (P 64.251) = top [right] of 𗹢 *djọ "to build" and bottom [right] of 𗥍 *gjụ "post, pillar, seat"
  • 𘗤 *tsjɨ̱r "fifth son" (P 92.111) = left removed from 𗉨 *tśjɨ̱r "five"
  • 𘘤 *dźjɨ "skin" (Z 6.171) = [whole of] 𗼻 *ljɨ̣ "land" removed from 𗽸 *lhjɨ̱ "epidermis"
    • 𗽸 *lhjɨ̱ "epidermis" (P 41.262) = [left of] 𗽖 *tshji "east", [right of] 𗹨 *·jɨj "tent, building" and whole of 𘘤 *dźjɨ "skin"
  • 𘙬 *tow "insect" (P 59.243) = left removed from 𗏚 *bjij "dung beetle"
  • 𘝣 *niəj "turbid, muddy" (P 51.131) = made from 𗒊 (𗇻) *niəj "dirt" with the top removed
  • 𘞆 *bji "thin" (P 16.222) = is the left side of 𗻬 *źjɨ̣r "thin"

Most of these characters are described using the formula "[component] X of [character] Y removed", which would seem to imply that the simpler unitary character is derived from the more complex character. However, in most of the cases where the source character also has a head entry in the Sea of Characters, we find that there is a circular derivation. For example, the character 𗢨 "person" is stated to derive from the character 𘑘 "celestial being" by the removal of its top component, but 𘑘 "celestial being" is stated to derive from the character 𗢨 "person" plus the top of the character 𘑗 *ŋər "mountain". The latter analysis makes a lot of sense as it parallels the construction of the corresponding Chinese character, 仙 xiān "celestial being", which is constructed from the character 人 rén "person" plus the character 山 shān "mountain". On the other hand, there is no obvious explanation why the Tangut character for "person" should be derived from "celestial being" minus the "mountain" top. As another example, the character 𘂪 *dzjij "single" is stated to derive from the character 𗄴 *twe̱ "pair, couple" with the left side removed, where the left side of 𗄴 is defined as the left side of the character 𗅋 *mji "not". This is a very confused definition ("single" = "pair" minus "not"), but from it we can reconstruct a very sensible compositional definition for the character 𗄴 *twe̱ "pair, couple" (which does not have a head entry in the Sea of Characters) : 𗄴 *twe̱ "pair, couple" = left side of 𗅋 *mji "not" plus whole of 𘂪 *dzjij "single" (i.e. "pair, couple" = "not" + "single"). In at least these two cases, it seems to me that the compositional definition of the unitary character is simply the inverse of the compositional definition of a complex character that is composed from the unitary character.

Other compositional descriptions of unitary characters are also very contrived and implausible, such as those for the characters 𘌢 *zu "belt, band" and 𘔧 *gjụ "seat, post, stick". In both cases the unitary character is said to be assembled from two halves of the very component that corresponds to it, taken from two different complex characters. Thus, I think that unitary characters are not secondary derivations from complex characters as implied by their analysis in the Sea of Characters, but that the compositional definitions of unitary characters are secondary back-formations from the compositional definitions of other characters, perhaps simply intended as mnemonics rather than as a true analysis of the derivation of such characters.
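
The check for inverse definitions can be made mechanical. The sketch below is a rough illustration rather than a finished tool: it records, for the unitary characters listed above whose source characters are also quoted with their own analyses, which characters each analysis names, and then looks for pairs that are each defined in terms of the other. English glosses are used as keys for readability (with the reading added where two characters share a gloss), and the names SOURCES and mutual_derivations are hypothetical.

```python
# Source characters named in each analysis quoted above, keyed by English gloss.
# Where two characters share a gloss, the reading is added in brackets.
# Keying on glosses is an assumption of this sketch, made for readability only.
SOURCES = {
    "person": ["celestial being"],
    "celestial being": ["hill, mountain", "person"],
    "little": ["little, young"],
    "little, young": ["weak", "small"],
    "eight": ["seven"],
    "seven": ["clean", "eight"],
    "to compel": ["power", "contribution"],
    "power": ["to compel", "bright"],
    "contribution": ["beautiful", "to win"],
    "belt": ["to tie up (zu)", "to tie up (bej)"],
    "to tie up (bej)": ["to tie up (zu)", "rope"],
    "skin": ["land", "epidermis"],
    "epidermis": ["east", "tent", "skin"],
}


def mutual_derivations(sources):
    """Return pairs of characters that are each analysed in terms of the other."""
    pairs = set()
    for character, parts in sources.items():
        for part in parts:
            # A pair is circular when the named source is itself analysed
            # as containing the character it is supposed to explain.
            if character in sources.get(part, []):
                pairs.add(tuple(sorted((character, part))))
    return sorted(pairs)


if __name__ == "__main__":
    for a, b in mutual_derivations(SOURCES):
        print(f"{a} <-> {b}")
```

Run against the entries above, the check flags "person" and "celestial being", "eight" and "seven", "to compel" and "power", and "skin" and "epidermis". It does not flag "little" or "belt", whose source characters are analysed without reference back to the unitary character.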

2.4 Characters with Circular Composition

The compositional analysis in the Sea of Characters often includes pairs of characters with circular derivations, such as the following two examples :

  • Left side of 𗿼 *dźjwow "bird" ⇔ left side of 𗿤 *dźjwow "mating [of birds]"
  • Top and left side of 𗿞 *djị "to mate" ⇔ top and left side of 𗿯 *djị "to tread"

These circular constructions are disturbing, as they appear to suggest that the Sea of Characters analyses are flawed.

2.5 Mapping the Web of Characters

In conclusion, the compositional analysis of Tangut characters given in the Sea of Characters does help us to understand how characters were constructed, and I think that in most cases the analyses are correct, and do reflect the mechanisms by which the characters were created. However, there are problems with the analyses of unitary characters and of many characters with circular derivations, which suggest that the analyses in the Sea of Characters cannot be relied on uncritically.

To better understand the mechanisms by which Tangut characters were created, it would be necessary to fully map out the structural relationships between all the characters in the Sea of Characters. This would not only allow us to better understand the process by which the Tangut script was created, but it might also enable us to better understand the meanings of individual characters, or in some cases even to reconstruct or correct the pronunciations of characters.

Appendix I: Note on the Positional Forms of Components

Many or most Tangut components are not positionally fixed, but can occur at the left, middle, right, top or bottom of a character. For example the most common Tangut component, 𘢌 (Nishida's 'person' radical) occurs in all these positions :

  • L2603 𗣦 (left)
  • L4593 𗡋 (middle)
  • L2386 𗁑 (right)
  • L1883 𗬌 (top)
  • L1148 𗕌 (bottom)
  • L3779 𗫾 (left and under)

The 'person' component takes basically the same form in whatever position it occurs, but some components have slightly different forms for the left hand side (or middle) and for the right hand side, with the right hand form often having a bent vertical stroke and sometimes with an additional short slanting stroke at the end :

  • L1273 𘄟 (left hand form = 𘤒) and L5599 𘂱 (right hand form = 𘤸)
  • L5064 𘁗 (left hand form = 𘤜) and L3569 𗤣 (right hand form = 𘦙)
  • L3940 𘞣 (left hand form = 𘫉) and L4447 𘜹 (right hand form = 𘫊)
  • L5629 𗰈 (middle form = 𘠌) and L5167 𗰆 (right hand form = 𘠴)

In some cases it is difficult to see that different components are actually different positional forms of the same basic component, and it is only by studying the compositional analysis in the Sea of Characters that we are able to equate seemingly different components. For example, Nishida's 'water' radical 𘠣 occurs in the form 𘠅 on the right hand side of a character, and in the form 𘡍 on top of a character (see Marc Miyake's Which Way Water?), as demonstrated by the Sea of Characters analyses of L2931 and L5809 :

  • L2931 𗕆 = left hand side of L2699 𗄻 + middle of L3898 𘀼 + left hand side of L2414 𗋒
  • L5809 𗕆 = left hand side of L3073 𗊉 + bottom of L4845 𗒿
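
For anyone comparing components programmatically, positional forms such as these need to be grouped before two components can be treated as the same. The following Python sketch shows one possible way to do this, using only the forms quoted in this appendix. The group labels ("water", "pair-1" and so on) and the names POSITIONAL_FORMS, component_group and same_component are hypothetical, and the table is far from complete.

```python
# Positional variants quoted in this appendix, grouped by underlying component.
# The group labels are hypothetical names used for this sketch only.
POSITIONAL_FORMS = {
    "water": {"radical": "𘠣", "right": "𘠅", "top": "𘡍"},
    "pair-1": {"left": "𘤒", "right": "𘤸"},
    "pair-2": {"left": "𘤜", "right": "𘦙"},
    "pair-3": {"left": "𘫉", "right": "𘫊"},
    "pair-4": {"middle": "𘠌", "right": "𘠴"},
}


def component_group(form, table=POSITIONAL_FORMS):
    """Return the label of the group that contains `form`, or None if it is not listed."""
    for label, forms in table.items():
        if form in forms.values():
            return label
    return None


def same_component(a, b, table=POSITIONAL_FORMS):
    """True if both forms are listed as positional variants of one component."""
    group = component_group(a, table)
    return group is not None and group == component_group(b, table)


if __name__ == "__main__":
    water = POSITIONAL_FORMS["water"]
    # The right hand and top forms of 'water' belong to the same group.
    print(same_component(water["right"], water["top"]))
    # Forms from different groups are not equated.
    print(same_component(water["right"], POSITIONAL_FORMS["pair-1"]["left"]))
```

A component that does not appear in the table, such as the 'person' component, simply returns no group, so the sketch never equates components on the basis of visual similarity alone.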

[Revised: 2010-05-03 and 2010-05-21]

Last modified: 2017-01-01 (updated with Unicode Tangut characters)

If Tangut characters do not display correctly, please download and install the Tangut Yinchuan font.