Saturday, 25 March 2006

Unicode Character Names Part 2 : A Name is for Life

As discussed in my post on Good, Bad and Ugly Character Names, there are some Unicode characters with wrong or misleading names. Some people get very worked up about bad character names (or names that they perceive to be bad), and insist that Unicode must change the name. However, for reasons of stability with other standards, which may refer to Unicode characters by name rather than by code point, character names once assigned cannot under any circumstances be changed.

Nevertheless, the names for 1,944 characters introduced in Unicode 1.0 are different from their current names (in the vast majority of cases the changes are very minor). These name changes were required by the merger between the developing Unicode and ISO/IEC 10646 standards in 1993. One of the most noticeable differences between the 1.0 names (pre-merger) and the 1.1 names (post-merger) is that the 1.0 names reflect American English (because Unicode is in origin a consortium of American companies), whereas the 1.1 names have a more British English flavour (because, or so I am told, this was insisted upon by Bruce Paterson, who is British and was the editor of ISO/IEC 10646 until 2000).


Differences in Names between Unicode 1.0 and 1.1
Code Point  Unicode 1.0 Name                          Unicode 1.1 Name
002E        PERIOD                                    FULL STOP
002F        SLASH                                     SOLIDUS
005C        BACKSLASH                                 REVERSE SOLIDUS
00B6        PARAGRAPH SIGN                            PILCROW SIGN
02D2        MODIFIER LETTER CENTERED RIGHT HALF RING  MODIFIER LETTER CENTRED RIGHT HALF RING
02D3        MODIFIER LETTER CENTERED LEFT HALF RING   MODIFIER LETTER CENTRED LEFT HALF RING
271B        OPEN CENTER CROSS                         OPEN CENTRE CROSS
271C        HEAVY OPEN CENTER CROSS                   HEAVY OPEN CENTRE CROSS
272B        OPEN CENTER BLACK STAR                    OPEN CENTRE BLACK STAR
272C        BLACK CENTER WHITE STAR                   BLACK CENTRE WHITE STAR
2732        OPEN CENTER ASTERISK                      OPEN CENTRE ASTERISK
273C        OPEN CENTER TEARDROP-SPOKED ASTERISK      OPEN CENTRE TEARDROP-SPOKED ASTERISK
2742        CIRCLED OPEN CENTER EIGHT POINTED STAR    CIRCLED OPEN CENTRE EIGHT POINTED STAR
32A5        CIRCLED IDEOGRAPH CENTER                  CIRCLED IDEOGRAPH CENTRE
FE4A        SPACING CENTERLINE OVERSCORE              CENTRELINE OVERLINE
FE4E        SPACING CENTERLINE UNDERSCORE             CENTRELINE LOW LINE
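
Because names are frozen, the right-hand column can still be checked against any current Unicode character database. The following is a minimal sketch using Python's standard `unicodedata` module; the `ROWS` sample and the helper names `current_name` and `check_rows` are my own illustrative choices, and only a handful of rows from the table are included.

```python
import unicodedata

# (code point, Unicode 1.0 name, Unicode 1.1 name): a sample of rows
# copied from the table above.
ROWS = [
    (0x002E, "PERIOD", "FULL STOP"),
    (0x002F, "SLASH", "SOLIDUS"),
    (0x005C, "BACKSLASH", "REVERSE SOLIDUS"),
    (0x00B6, "PARAGRAPH SIGN", "PILCROW SIGN"),
    (0x271B, "OPEN CENTER CROSS", "OPEN CENTRE CROSS"),
    (0xFE4A, "SPACING CENTERLINE OVERSCORE", "CENTRELINE OVERLINE"),
]


def current_name(code_point):
    """Return the name the interpreter's Unicode database gives a code point."""
    return unicodedata.name(chr(code_point))


def check_rows(rows):
    """Return the rows whose current name differs from the listed 1.1 name."""
    return [row for row in rows if current_name(row[0]) != row[2]]


if __name__ == "__main__":
    for cp, old_name, _ in ROWS:
        print(f"U+{cp:04X}  1.0: {old_name}  now: {current_name(cp)}")
    print("rows that no longer match the 1.1 name:", check_rows(ROWS))
```

For these rows the list of mismatches should be empty, since none of them has been renamed since Unicode 1.1.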

Even so, Unicode 1.1 did preserve a couple of American English spellings from Unicode 1.0:

  • U+3238 PARENTHESIZED IDEOGRAPH LABOR
  • U+3298 CIRCLED IDEOGRAPH LABOR

Since Unicode 1.1 the character names have remained predominantly British English, for example U+1D355 TETRAGRAM FOR LABOURING and a further seven characters with CENTRE in their names. However, two American spellings did slip in with Unicode 3.0:

  • U+2F7E KANGXI RADICAL PLOW
  • U+2F8A KANGXI RADICAL COLOR
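
Spelling survivors like these can be found by scanning every character name for a given word. The sketch below is one way to do that with Python's `unicodedata` module; the helper `names_containing` is hypothetical, it matches whole space-separated words only (so LABOR does not match LABOURING), and the results depend on which Unicode version the interpreter bundles (see `unicodedata.unidata_version`).

```python
import sys
import unicodedata


def names_containing(word):
    """Yield (code point, name) for each character whose name contains
    `word` as a whole space-separated word."""
    for cp in range(sys.maxunicode + 1):
        # Unnamed code points (unassigned, surrogates, controls) yield "".
        name = unicodedata.name(chr(cp), "")
        if word in name.split():
            yield cp, name


if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for word in ("LABOR", "COLOR", "PLOW"):
        for cp, name in names_containing(word):
            print(f"U+{cp:04X} {name}")
```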

Since the merger between Unicode and ISO/IEC 10646 only two characters have ever changed their name, namely U+00C6 and U+00E6. They were originally called LATIN CAPITAL LETTER A E and LATIN SMALL LETTER A E in Unicode 1.0, were changed to LATIN CAPITAL LIGATURE AE and LATIN SMALL LIGATURE AE in Unicode 1.1 after the merger with ISO/IEC 10646, and were finally changed to their current names, LATIN CAPITAL LETTER AE and LATIN SMALL LETTER AE, in Unicode 2.0. The latter change was due to representations by the Danish Standards Association, who considered these two characters to be letters rather than ligatures; but it caused so much trouble and acrimony that the respective committees of Unicode and ISO/IEC 10646 resolved never again to make any name changes, regardless of the severity of the mistake or the triviality of the change required (see the Unicode Standard Stability Policy).

