Monday, 29 August 2011

The Myth of the Tangut Ritual Language

I have previously discussed how complex the Tangut script is, and how Tangut characters are constructed like interconnected jigsaw puzzles using a method of character construction that I believe is not used for any other known writing system. But it is not only the Tangut script that is difficult and mysterious; the Tangut language is also hard to fathom, and has features that are difficult to explain. One of the most puzzling features of the Tangut language is that the surviving monolingual Tangut dictionaries define about twice as many characters as are actually used in almost all extant Tangut texts; that is to say, about half of the approximately 6,000 Tangut characters defined in dictionaries do not seem to be needed for writing actual Tangut text, but appear to have been designed solely with the intention of teasing future generations of linguists.

However, in 1986 (see bibliography at the bottom of the post) the Japanese Tangutologist Nishida Tatsuo noted that some of the unused characters were to be found in use in several ritual poems or odes (five odes preserved in a woodblock print held at the Institute of Oriental Studies in St Petersburg [Tang. 125, No. 121]). In these poems each line was written twice, once using ordinary Tangut vocabulary (Nishida's "Vocabulary II") and once using an unusual vocabulary that in many cases was constructed from the otherwise unused characters in the Tangut dictionaries (Nishida's "Vocabulary I"). Nishida hypothesised that Vocabulary I represents the language of the "black-headed people" who he considers were a nomadic people that formed the ruling class of Tangut society; and that Vocabulary II represents the language of the "red-faced people" who he considers were a sedentary, agricultural people that formed the bulk of Tangut society. In other words, Vocabulary I may represent a linguistic substratum that was only preserved in the odes of the ruling class.

In 1996 the Russian Tangutologist Ksenia Kepping formulated the terms "common language" and "ritual language" to refer to these two forms of vocabulary, suggesting that the ritual language was an artificial language (lacking grammatical morphemes) created by Tangut shamans in ancient times, before the adoption of Buddhism, for ritual purposes. She supposes that this ancient ritual language only survived in writing in a few ancient ritual odes.

Recently Marc Miyake has been discussing Ritual Tangut [RT] and Common Tangut [CT] in a series of blog posts about the Tangut words for "camel", which has encouraged me to take a closer look at the one ritual ode that I have the text for (given in Nishida 1997), the Ode on Monthly Pleasures (translated into Chinese as 月月樂詩). My preliminary study of the text of this ode has caused me to doubt the theory that it was originally written in a special ritual language, and although I haven't yet gone through the entire Tangut text of the ode, I think that there is some value in sharing my initial impressions.

My first observation is that the "common language" version of the ode does not seem to me to be a translation of the "ritual language" version as Kepping suggests (Kepping 1996 page 28). Rather, I would suggest that the "common language" version represents the original text of the ode, and that the "ritual language" version is a gloss on this version, similar in nature to the interlinear vernacular glosses in some medieval Latin manuscripts. For example, in the 8th century Vespasian Psalter the Latin text of Psalm 67:2, exsurgat Deus et dissipentur inimici eius et fugiant qui oderunt eum a facie eius ["Let God arise, let his enemies be scattered: let them also that hate him flee before him"], is glossed word for word, in Latin word order, in Old English as a-rise god ⁊ sien to-strogdne feond his ⁊ flen from on-siene his ða fiodun hine. The resultant Old English is unnatural in much the same way that the Tangut "ritual language" appears unnatural and grammatically awkward compared with the common Tangut language.

In the only extant edition of the Odes, for each line of the ode the "ritual language" text precedes the "common language" text, which gives the impression that the "ritual language" text is the primary version of the ode, and the "common language" text is a secondary or derived version of the text, which on the surface would seem to argue against my gloss theory. However, this layout may simply reflect the fact that the "ritual language" interlinear gloss was originally written in small-sized characters to the right of the main "common language" text (in the same way that interlinear annotations to Chinese novels are written), and that during the course of textual transmission the small-sized interlinear glosses were transformed into ordinary-sized text running parallel to, and to the right of (i.e. preceding), the main text.

At this point it is worth taking a detailed look at a couple of lines from the Ode on Monthly Pleasures. My first example is the first line of the ode, the first of two introductory lines that precede the verse relating to the First Month.

Ode on Monthly Pleasures Introduction line 1B (common language)
L2511 | rjɨr | "to arise"

Ode on Monthly Pleasures Introduction line 1A (ritual language)
L1846 | ka | "month" [RT]
L5288 | khjij | "happy" [CT]
L2480 | bie̱j | "entertainment" [CT]
L2750 | ɣu | "head", "start" [CT]
L2082 | ·jɨr | "to ask" [CT]

The "common language" version of this opening line reads perfectly naturally, and can be translated as "How did the celebrations for each month arise?", referring to the various seasonal activities that take place throughout the calendar year. On the other hand, the "ritual language" version is awkward and difficult to translate without filling in some gaps: "[Someone] asks [what] the origin [of the] happy entertainment [for each] month [is]". The ordinary Common Tangut verb "to ask" in "ritual language" text corresponding to questions in the "common language" version has been taken as a ritual interrogative marker, but I think it is simpler to take it as an explanatory gloss on the "common language" question: "[the text] asks". Likewise, instead of trying to form the string of preceding nouns into a coherent sentence, I find it more reasonable to see them as a sequence of disconnected glosses on the corresponding words in the "common language" text: "month [means] month"; "entertainment [means] happy entertainment"; "origin [means] start"; "how did it arise asks [a question]".

This line also illustrates an unexpected feature of the text, namely that the "ritual language" version of the ode is not exclusively composed of "ritual" vocabulary, but also includes a great deal of ordinary Tangut vocabulary. In this line only the first word, ka "month", can be considered to be a ritual word; all the other words in the "ritual language" version of this line are ordinary Tangut words that also occur in ordinary Tangut texts. Thus the "ritual language" version of the ode cannot be said to be written in a special ritual language, merely that it includes certain vocabulary items that are not found in other Tangut texts.

My second example is the first line of the verse relating to the third lunar month.

Ode on Monthly Pleasures Month 3 line 1B (common language)
L3911 | pu | "pigeon" (borrowing from Chinese 鵓[鴿] bó[gē])
L4176 | tju | "turtledove" (borrowing from Chinese 鳩[鴿] jiū[gē])
L0795 | rjɨr | adverbal prefix
L4519 | bji | "to call"
L0140 | lhejr | "peaceful and happy"

Ode on Monthly Pleasures Month 3 line 1A (ritual language)
L4344 | lhejr | "three" [RT]
L1846 | ka | "month" [RT]
L0673 | thə | "wing" [CT]
L5598 | gjwi | "clothes" [CT]
L4027 | njɨ̱ | "two" [CT]
L5932 | | "kind", "variety" [CT]
L4246 | lhejr | "woods" [RT] (L2769 phjo "divination" may be a mistake for L3890 bo which forms a collocation with L4246 meaning "woods")
L4980 | tśhji̱w | "to speak" [CT]
L3092 | djij | adverbal suffix [CT]
L2029 | low | "country" [CT]
L1529 | lụ | "happy" [RT]

The "common language" version translates as "In the third month the pigeons and turtledoves call among the trees, and the country is at peace". The "ritual language" version has a couple of interesting features that are worth noting.

The two specific bird names in the main text (pigeons and turtledoves) are glossed as "two kinds of birds", where "birds" is represented by a kenning of two ordinary Tangut words ("wing-clothes" = "bird") and "two kinds" is written with ordinary Tangut characters, using the ordinary Tangut word for "two" rather than the "ritual language" word for "two". It makes sense to imagine that the glossist could not think of any other words for these two specific bird names, and so simply glossed them as "two kinds of birds"; but on the other hand it is not plausible that "pigeons and turtledoves" is a translation of "two kinds of birds", which is strong evidence against Kepping's theory that the "common language" version is a translation of an original "ritual language" text.

Kepping has noted that the "ritual language" version lacks grammatical morphemes, which is generally true, and to be expected if the "ritual language" version comprises glosses on the words of the "common language" version; but, as can be seen in this example, grammatical morphemes are sometimes used in the glosses. The adverbal prefix rjɨr in the "common language" version corresponds to an adverbal suffix djij in the "ritual language" version. Both are ordinary Tangut words, and both function to indicate a continuative mode (the birds in the woods called continuously, not just once).

Kepping has also noted that the "ritual language" version favours two-syllable words over one-syllable words (other than verbs) in the "common language" text (Kepping 1996 page 27), and this does seem to be generally true, although, as can be seen from both the above examples, single-syllable nouns are not uncommon in the "ritual language" version. My explanation for the greater number of disyllabic words in the "ritual language" version is simply that two-syllable words are less ambiguous and thus provide a more certain gloss. This explains the use of two-syllable "ritual language" words that are composed of two ordinary Tangut words (e.g. "happy entertainment" as a gloss for "entertainment"), but it does not explain how words that are unique to the "ritual language" text and which cannot be understood by reference to ordinary Tangut words were able to be understood by readers as glosses on the corresponding ordinary Tangut word. For example, how would a Tangut reader know that ka ·o means "moon", and why would this word be more understandable than the ordinary word for "month"? Indeed, if "ritual language" words such as this had such extremely restricted usage, and were only very rarely encountered, how would anyone ever even learn to read them in the first place?

I have doubts that "ritual language" words such as ka ·o are the remnants of a linguistic substratum or that they represent an artificial ritual language, but I do have an alternative theory to explain this otherwise unattested vocabulary. I think it is possible that they are in fact archaic words from a culturally important and universally known Tangut text (something equivalent to the Bible in European culture) that we do not know about simply because no copies of it have survived. A text such as this could have been required reading for all Tangut students, and its archaisms could have been as familiar to the Tangut people as biblical or Shakespearean expressions are to us. If this were the case, an otherwise unattested word such as ka ·o for "moon" could have been an obvious and unambiguous gloss for an ordinary, modern Tangut word. Compared with Khitan and Jurchen, a wealth of Tangut manuscripts and printed texts have survived down to the present day, but this just highlights how much more must have been lost forever. We study Tangut through the distorted prism of the sands of Kharakhoto, only able to see a fraction of all the Tangut books that there must once have been. One lost book is all it takes to confuse us into believing in the myth of an ancient ritual language.

Tangut Numbers

For the rest of this post I am going to look in more detail at one particular category of words used in the Ode on Monthly Pleasures, the numbers one to ten (see also Nishida 1997 pages 141–145 where he covers the same topic). Tangut numbers are relatively easy, and are clearly cognate to numbers in other Tibeto-Burman languages such as Nuosu (Liangshan Yi).

Tangut Numbers 1-10
Number | LFW No. | Reconstruction | Nuosu | Notes
1 | L0100 | lew | cyp [ʦhɿ²¹] | L4855 dzjij "single" is probably the TB cognate for "one"
2 | L4027 | njɨ̱ | nyip [ȵi²¹] |
3 | L5865 | sọ | suo [sɔ³³] |
4 | L2205 | ljɨr | ly [lɿ³³] |
5 | L1999 | ŋwə | nge [ŋɯ³³] |
6 | L3200 | tśhjiw | fut [fu⁵⁵] | Nuosu [fu⁵⁵] is anomalous; other Yi dialects have e.g. [tɕhɔ¹³]
7 | L4778 | śjạ | shyp [ʂɿ²¹] |
8 | L4602 | ·jar | hxit [hi⁵⁵] |
9 | L3113 | gjɨ̱ | ggu [gu³³] |
10 | L1084 | ɣạ | ci [ʦhi³³] | L3231 dźjɨ̣ "ten" is probably the TB cognate for "ten"

The ordinary month names are equally unproblematic, composed of a number followed by the word lhjị "month" (cf. Nuosu ꆪ hlep [ɬɯ²¹] "moon", "month"), but with special words for the first and last lunar month.

Ordinary Tangut Month Names
L2105/L2814 | tśjow lhjị | first lunar month (正月)
L4027/L2814 | njɨ̱ lhjị | second lunar month (二月)
L5865/L2814 | sọ lhjị | third lunar month (三月)
L2205/L2814 | ljɨr lhjị | fourth lunar month (四月)
L1999/L2814 | ŋwə lhjị | fifth lunar month (五月)
L3200/L2814 | tśhjiw lhjị | sixth lunar month (六月)
L4778/L2814 | śjạ lhjị | seventh lunar month (七月)
L4602/L2814 | ·jar lhjị | eighth lunar month (八月)
L3113/L2814 | gjɨ̱ lhjị | ninth lunar month (九月)
L1084/L2814 | ɣạ lhjị | tenth lunar month (十月)
L1084/L0100/L2814 | ɣạ lew lhjị | eleventh lunar month (十一月)
L4082/L2814 | rejr lhjị | twelfth lunar month (臘月)

These are the month names that are used in the "common language" version of the Ode on Monthly Pleasures, but the corresponding month names given in the "ritual language" version are very different, using a different, two-syllable word, for "month" (ka ·o), and a different set of numbers.

Special Tangut Month Names
L3305/L3457/L1846/L0863 | kjiw sjiw ka ·o | "new year month" = first lunar month
L0795/L5855/L1846/L0863 | rjɨr lọ ka ·o | second lunar month
L4344/L5565/L1846/L0863 | lhejr gju ka ·o | third lunar month
L1341/L4362/L1846/L0863 | kwej ŋwər ka ·o | fourth lunar month
L1783/L1615/L1846/L0863 | tśjɨ̱r lu ka ·o | fifth lunar month
L3849/L5081/L1846/L0863 | źjiw we ka ·o | sixth lunar month
L0332/L1347/L1846/L0863 | ŋwər kạ ka ·o | seventh lunar month
L4027/L2205/L1846/L0863 | njɨ̱ ljɨr ka ·o | "two four month" = eighth lunar month
L2205/L1999/L1846/L0863 | ljɨr ŋwə ka ·o | "four five month" = ninth lunar month
L4027/L1999/L1846/L0863 | njɨ̱ ŋwə ka ·o | "two five month" = tenth lunar month
L1999/L3200/L1846/L0863 | ŋwə tśhjiw ka ·o | "five six month" = eleventh lunar month
L0804/L4051/L1846/L0863 | djɨ kjiwr ka ·o | "cold month" = twelfth lunar month

The first and last lunar month have special names ("new year month" and "cold month" respectively, written with the ordinary Tangut words for "new", "year" and "cold"), and for the 8th through 11th months combinations of two ordinary Tangut numbers are used to represent the numbers eight through eleven disyllabically ("two [times] four" for "eight"; "four [plus] five" for "nine"; "two [times] five" for "ten"; "five [plus] six" for "eleven"). The remaining months, 2nd through 7th, prefix the word "month" (ka ·o) with a special set of disyllabic numbers that are (as far as I can tell) unique to the odes. Elsewhere in the odes, cardinal numbers are sometimes written using the ordinary ("common language") Tangut number characters (as seen in the first line of the verse for the 3rd Month, given as an example above), and they are sometimes written with the special disyllabic words (e.g. lhejr gju for "three" in the 6th line of the verse for the 8th Month; and njɨ̱ ŋwə [2×5] for "ten" in the 7th line of the verse for the 6th Month).
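The arithmetic reading of the composite month numbers can be checked with a short Python sketch. This is only an illustration: the names `ORDINARY`, `COMPOSITES`, `composite_value` and `check_composites` are hypothetical, and the "times" and "plus" labels encode the interpretation given in square brackets above, since no operator is written in the Tangut text itself.

```python
# Readings of the ordinary Tangut numbers that occur in the composite month names.
TWO, FOUR, FIVE, SIX = "njɨ̱", "ljɨr", "ŋwə", "tśhjiw"
ORDINARY = {TWO: 2, FOUR: 4, FIVE: 5, SIX: 6}

# Composite numbers used for the 8th to 11th months in the "ritual language" version.
# Assumption: the operation is an interpretation; it is not marked in the text.
COMPOSITES = {
    8: (TWO, FOUR, "times"),
    9: (FOUR, FIVE, "plus"),
    10: (TWO, FIVE, "times"),
    11: (FIVE, SIX, "plus"),
}


def composite_value(first, second, operation):
    """Combine two ordinary numbers under the assumed operation."""
    a, b = ORDINARY[first], ORDINARY[second]
    if operation == "times":
        return a * b
    if operation == "plus":
        return a + b
    raise ValueError(f"unknown operation: {operation}")


def check_composites():
    """Map each month number to whether its composite reading yields that number."""
    return {
        month: composite_value(*parts) == month
        for month, parts in COMPOSITES.items()
    }


if __name__ == "__main__":
    for month, (first, second, operation) in COMPOSITES.items():
        value = composite_value(first, second, operation)
        print(f"month {month}: {first} {operation} {second} = {value}")
```

The sketch also makes one point explicit: the pair of numbers alone does not determine the operation. Read additively, "two four" would give six rather than eight, so a reader would have to know from context which reading was intended.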

The special numbers for two through eight used in the odes are paralleled by special ordinal numbers used (I'm not sure where, in one of the other odes perhaps) to indicate the relative seniority of sons, where the "common language" character ·jiw meaning "man" or "son" is prefixed by a single character to form a disyllabic word meaning "eldest son" through "eighth son".

Special Words for Sons
L2645/L1448 | da ·jiw | eldest son
L5914/L1448 | lọ ·jiw | second son
L2465/L1448 | rjɨj ·jiw | third son
L4934/L1448 | ŋwər ·jiw | fourth son
L5053/L1448 | tsjɨ̱r ·jiw | fifth son
L3649/L1448 | we ·jiw | sixth son
L1423/L1448 | ŋwər ·jiw | seventh son
L1257/L1448 | ·jar ·jiw | eighth son

The character da used in the word "eldest son" may be a borrowing from Chinese 大 "big", "eldest"; and the character for "eighth", L1257 ·jar, is phonetically identical to the ordinary Tangut word for "eight", and so can be considered to be simply "eight" written with a different character (constructed from the left hand side of L0384 L0384 ljịj "child" and the whole of L4602 ·jar "eight"). The other six characters, corresponding to "second" through "seventh" are not related to the ordinary Tangut number characters, but do correspond phonetically to one of the characters in the disyllabic words for "two" through "seven" in the names for the second through seventh months in the "ritual language".

Month Numbers and Son Numbers
Number | Month Numbers | Son Numbers
two | L0795/L5855 rjɨr lọ | L5914 lọ
three | L4344/L5565 lhejr gju | L2465 rjɨj
four | L1341/L4362 kwej ŋwər | L4934 ŋwər
five | L1783/L1615 tśjɨ̱r lu | L5053 tsjɨ̱r
six | L3849/L5081 źjiw we | L3649 we
seven | L0332/L1347 ŋwər kạ | L1423 ŋwər

The son numbers for "two" (lọ), "four" (ŋwər) and "six" (we) are phonetically identical to the second character of the corresponding month names; whereas the son numbers for "three" (rjɨj), "five" (tsjɨ̱r) and "seven" (ŋwər) are phonetically identical or similar to the first character of the corresponding month names. This correspondence cannot be a coincidence, and the only reasonable explanation is that the son numbers two through seven are cognate to the month numbers two through seven, or indeed, in four out of six cases they are essentially just different ways of writing the same number. The characters for the son numbers two through seven are bound characters, and do not appear to occur other than modifying the word ·jiw "son"; but five of the six corresponding characters from the month numbers do occur by themselves with their numeric meaning in combination with ordinary ("common language") words in various Tangut texts:

  • L3057/L5855 źju lọ "double fish" (雙魚) [a constellation]
  • L4344/L4730 lhejr ·ụ "Tripitaka" (三藏) [Buddhist scriptures and the name of the famous monk]
  • L4362/L0435 ŋwər kjạ "pipa" (琵琶) [the pipa is a musical instrument with four strings, hence the Tangut name which literally means "four stringed musical instrument" (四琴)]
  • L1783/L4783/L4871 tśjɨ̱r njir ŋər "Five Platforms Mountain" (五臺山) [Wutai Shan, the famous Buddhist mountain]
  • L0332/L2777 ŋwər ŋewr "seven sounds" (七鳴) [seven claps of thunder or bursts of music]
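The correspondence between the month numbers and the son numbers described above can also be tabulated mechanically. The following Python sketch is only an illustration using the reconstructed readings from the tables in this post; the names `MONTH_NUMBERS`, `SON_NUMBERS` and `matching_syllable` are hypothetical, and the comparison is a plain string match on the reconstructions, so it only detects exact identity and says nothing about phonetic similarity.

```python
import unicodedata

# Disyllabic month numbers (reconstructed readings), keyed by numeric value.
MONTH_NUMBERS = {
    2: ("rjɨr", "lọ"),
    3: ("lhejr", "gju"),
    4: ("kwej", "ŋwər"),
    5: ("tśjɨ̱r", "lu"),
    6: ("źjiw", "we"),
    7: ("ŋwər", "kạ"),
}

# Monosyllabic son numbers (reconstructed readings), keyed by numeric value.
SON_NUMBERS = {2: "lọ", 3: "rjɨj", 4: "ŋwər", 5: "tsjɨ̱r", 6: "we", 7: "ŋwər"}


def _nfc(text):
    """Normalise to NFC so that equivalent diacritic encodings compare equal."""
    return unicodedata.normalize("NFC", text)


def matching_syllable(number):
    """Return 1 or 2 if the son number is identical to the first or second
    syllable of the month number, or None if there is no exact match."""
    son = _nfc(SON_NUMBERS[number])
    for position, syllable in enumerate(MONTH_NUMBERS[number], start=1):
        if _nfc(syllable) == son:
            return position
    return None


def exact_matches():
    """Map each number from two to seven to its matching syllable position."""
    return {number: matching_syllable(number) for number in SON_NUMBERS}


if __name__ == "__main__":
    for number, position in exact_matches().items():
        print(number, SON_NUMBERS[number], MONTH_NUMBERS[number], position)
```

On these readings, "two", "four" and "six" match the second syllable exactly and "seven" matches the first, which gives the four out of six identical cases mentioned above; "three" and "five" are reported as non-matching because their resemblance to the first syllable is one of similarity rather than identity.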

Even in the Odes the shortened monosyllable form of the month number is sometimes used instead of the disyllabic word, for example, in the 4th line of the verse for the 12th Month the character L4344 lhejr is used by itself to mean "three". On the other hand, the disyllabic month numbers do not appear to occur outside the odes, and would seem to be expansions of the simple monosyllabic numbers. Quite why or how the single syllable numbers have been expanded to two syllables is unclear; except for the first character of the disyllabic word for "two" (L0795 rjɨr), all the other halves of the disyllabic numbers (L5565 gju, L1341 kwej, L1615 lu, L3849 źjiw, L1347 kạ) are bound characters and do not seem to occur by themselves or in combination with any other character anywhere else. The Wen Hai dictionary defines these five characters as their corresponding number when used as a disyllabic collocation, and their character construction also implies that they are numbers, but it does not seem likely that they were originally monosyllabic numbers, so I will not discuss them further.

This leaves us with six monosyllabic words for the numbers two through seven that can be used instead of the normal Tangut numbers:

  • L5855 lọ "two"
  • L4344 lhejr "three"
  • L4362 ŋwər "four"
  • L1783 tśjɨ̱r "five"
  • L5081 we "six"
  • L0332 ŋwər "seven"

These alternative numbers are used both in "common language" and "ritual language" texts, and do not seem to me to have any particular connection with a pre-Buddhist ritual language that Kepping has posited; indeed, some of the words that incorporate these numbers, such as Tripitaka and Wutai Shan, are overtly Buddhist. But why are these alternative numbers used sometimes? The mountain name "Five Platforms Mountain" (Wutai Shan) is written with the ordinary Tangut words for "platform" and "mountain", so why is "five" alone written with the non-standard word for "five"? What, then, are the origins of this series of numbers?

  • Do they represent a linguistic substratum?
  • Are they loans from another language?
  • Why are the words for "four" and "seven" pronounced exactly the same?

At the present time I don't have an answer to these questions, but would note that I have been unable to find cognates for these numbers in any other language, or find any language with identical or very similar words for four and seven that are phonetically similar to ŋwər.


  • Kepping, K. B. (Кепинг, Ксения Борисовна), Tangut Ritual Language. Paper presented at the 29th International conference on Sino-Tibetan languages and linguistics, Leiden, 10–13 October 1996.
  • Nishida Tatsuo (西田龍雄), Seikago 'tsukizuki rakushi' no kenkyū 西夏語『月々楽詩』の研究 [Study of the Tangut language 'Poem on Pleasure of Every Month']; in Kyoto Daigaku Bungakubu Kiyō 京都大学文学部紀要 [Memoirs of the Faculty of Letters Kyoto University] vol. 25 (1986) pages 1–116. Reprinted in Seika Ōkoku no gengo to bunka 西夏王国の言語と文化 [Language and Culture of the Tangut kingdom] (Tokyo: Iwanami Shoten, 1997) pages 112–205.

Addendum A

Coincidentally, at the same time that I was writing this post, Marc Miyake has been discussing the different words for "two" on his blog.

Addendum B

On 25 April 2013 a Day of Tangut Studies was held at SOAS, University of London, and I presented a talk entitled "The Ode on Monthly Pleasures—a new interpretation" which elaborates on the theme of this blog post. A PDF of my presentation may be downloaded from academia.