Saturday, 7 April 2007

Numbers that Don't Add Up : Tibetan Half Digits

One set of numbers that has caused no end of discussion since it was first encoded in Unicode 2.0 (July 1996) is the ten Tibetan half digits [U+0F2A..U+0F33], which are forms of the Tibetan digits zero through nine [U+0F20..U+0F29] with a hooked slash through them :


The problem with these characters is that it is very hard to get hold of any examples of their usage, and even respected Tibetan experts (and most Tibetans) are not familiar with them other than from the Unicode character charts. Tibetan reference books are universally silent on these characters, and as no other Indic scripts have similar half digits it is not immediately evident what they are meant to represent.

As I understand it, the original Tibetan encoding proposal from 1995 included a combining slash character that could be used in combination with one or more of the Tibetan digit characters, but this was rejected by the UTC in favour of the ten precomposed half digits. This was in line with the position of the Chinese national body that only these ten slashed digits were required.

In Unicode 2.0 and 2.1 none of these ten characters had a numeric value assigned to them in the Unicode Data files; then exactly ten years ago to the day Tim Greenwood asked on the Unicore mailing list why the Tibetan half digits did not have a numeric property. The response from Lee Collins, who was one of the key players in the reintroduction of Tibetan into Unicode after it had been banished following the merger with ISO/IEC 10646, was that they don't have a fixed value, but are used to represent fractions. For the next couple of years the Unicode Data files remained the same with respect to these characters, but then in August 2000 the ten half digits finally got assigned numeric values (½, 1½, 2½ and so on up to 8½ for half one through half nine, and -½ for half zero) in Unicode 3.0.1. There is nothing in the UTC minutes or the Unicode document register to indicate where the impetus to add these numeric values came from or what the evidence for the values assigned to them was -- the values simply appeared in the final beta of 3.0.1, and have remained unchanged ever since.

But every now and then on the Tibetan mailing lists that I am subscribed to there is heated and inconclusive discussion as to what these characters represent and whether we really need them or not.

The values assigned to the half digit characters in the Unicode Data file are one half less than the corresponding whole digit, but there are some Tibetan experts who believe that these values are wrong, and should be corrected or removed. There are two different theories of what slashed digits represent numerically.
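For anyone who wants to check the "half less" values for themselves, here is a minimal sketch in Python. It assumes that the Unicode database bundled with your Python installation carries the same numeric values as the Unicode Data file; the helper name half_digit_table is my own invention and not part of any library.

```python
import unicodedata

# Half digits occupy U+0F2A..U+0F33 (HALF ONE .. HALF NINE, then HALF ZERO).
HALF_DIGITS = [chr(cp) for cp in range(0x0F2A, 0x0F34)]

def half_digit_table():
    """Pair each half digit with the value of its unslashed counterpart."""
    rows = []
    for half in HALF_DIGITS:
        name = unicodedata.name(half)  # e.g. 'TIBETAN DIGIT HALF EIGHT'
        # Dropping 'HALF ' from the name gives the name of the whole digit.
        whole = unicodedata.lookup(name.replace("HALF ", ""))
        rows.append((name, unicodedata.digit(whole), unicodedata.numeric(half)))
    return rows

for name, whole_value, half_value in half_digit_table():
    print(f"{name}: {whole_value} - 0.5 = {half_value}")
```

Note that on this scheme TIBETAN DIGIT HALF ZERO comes out as minus one half.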

Firstly, there is some supporting evidence for the Unicode Data position that the digits represent one half less than the unslashed digit. When I say some, I mean the solitary example of a single Tibetan postage stamp :

This stamp is one of a set of five stamps first issued in 1933 with the following values (1 tranka ཊཾ = 1 ½ zho ཞོ = 15 skar སྐར) :

  • 7 ½ skar = ½ tranka
  • 1 zho = 10 skar = ⅔ tranka
  • 1 tranka
  • 2 tranka
  • 4 tranka
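The conversions in this list can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the rates stated above (15 skar to the tranka, 10 skar to the zho); the constant and function names are made up for illustration.

```python
from fractions import Fraction

SKAR_PER_TRANKA = 15  # 1 tranka = 15 skar
SKAR_PER_ZHO = 10     # 1 zho = 10 skar

def skar_to_tranka(skar):
    """Convert a value in skar to tranka as an exact fraction."""
    return Fraction(skar) / SKAR_PER_TRANKA

# Face values of the five stamps, expressed in skar.
stamps_in_skar = [Fraction(15, 2), SKAR_PER_ZHO, 15, 30, 60]

for skar in stamps_in_skar:
    print(f"{skar} skar = {skar_to_tranka(skar)} tranka")
```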

The value of the 7 ½ skar stamp is given on the stamp as ༱ (on the right panel) and སྐར (on the left panel), using U+0F31 TIBETAN DIGIT HALF EIGHT to represent the value of 7 ½. The slashed digit is unfortunately incomplete on the example given above from my own collection, but in this page from a stamp catalogue that I photocopied many years ago the figure (labelled "1/2 t.") is shown quite clearly to be identical to the representative glyph for U+0F31 in the Unicode charts :

Unfortunately this is the total extent of evidence that has thus far been adduced in favour of the "half less" usage of slashed digits. There is a 7 ½ skar Tibetan coin that was minted 1918-1925, but the value is given in words as skar phyed brgyad སྐར་ཕྱེད་བརྒྱད "seven and a half skar". Note that the written Tibetan for "seven and a half" (as with other half values) is "half [less than] eight", so it is easy to understand why slashing a number would be used to indicate a value of half less than the unslashed number.

The competing theory is that slashed digits are used in Tibetan art when drawing thangkas to indicate the proportions of Buddha figures and chörtens (stupas), indicating a dimension half that of the unslashed number. It is claimed that in this usage the slash may be applied to a single digit or to a group of digits representing a larger number, although one well-informed source has stated that he has only ever seen a slashed digit one in these contexts, and of course the value of a slashed digit one is a half whichever theory you subscribe to.
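To make the difference between the two theories concrete, here is a small Python sketch comparing them for the single digits zero through nine. The function names half_less and half_of are simply my labels for the two interpretations.

```python
from fractions import Fraction

def half_less(n):
    """'Half less' theory: the slashed digit is worth n minus one half."""
    return Fraction(n) - Fraction(1, 2)

def half_of(n):
    """'Half of' theory: the slashed digit is worth half of n."""
    return Fraction(n) / 2

# Digits for which the two readings give the same value.
agree = [n for n in range(10) if half_less(n) == half_of(n)]

for n in range(10):
    print(f"slashed {n}: half less = {half_less(n)}, half of = {half_of(n)}")
print("theories agree for:", agree)
```

The two readings coincide only for the digit one, which is why an example showing nothing but a slashed one cannot settle the question.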

I would love to be able to give an example of this usage of slashed digits, but am unable to do so at present. If any of my readers have some pictures of thangkas showing half digits that I can use please let me know.

Clearly the difficulty in finding examples of their usage indicates that the Tibetan Half Digit characters are not particularly needed by anyone other than philatelists and possibly scholars of thangkas, and it is of very little consequence whether the numeric value assigned to these characters by Unicode is correct or not. Indeed for the handful of users who will ever want to use a Tibetan half digit in real life it is probably totally irrelevant what value the character has according to Unicode -- all that matters is that the character is rendered correctly. So, all in all, given that the "half less than" usage is attested by the 7 ½ skar stamp, I personally think that the Unicode Data values for these characters are perfectly OK, and should not be changed.
