Monday, 24 July 2006

R Rotunda Part 2

Yesterday I gave an overview of the history and usage of r rotunda. Today I will be exploring some of the issues relating to the proposed encoding of r rotunda in ISO/IEC 10646 and Unicode.



Encoding R Rotunda

As r rotunda is a contextual glyph variant of the lowercase letter 'r' and not a distinct letter with its own semantics, in principle it need not be encoded as a separate character, but could be dealt with at the font level; for example a blackletter OpenType font could contextually substitute an r rotunda glyph whenever 'r' comes after the letters 'B', 'D', 'O', 'P', 'V', 'W', 'b', 'd', 'h', 'o', 'p', 'v', 'w' and 'y' (or whatever rule the font designer decides to apply).
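The example rule above can be sketched in a few lines of Python. This is only an illustration of the rule itself: a real font would express it as an OpenType contextual substitution over glyphs, not as a change to the text. The function name `apply_r_rotunda` and the `TRIGGERS` set are hypothetical names made up for this sketch, and U+A75B is used merely as a convenient stand-in for the r rotunda glyph.

```python
# Letters after which the example rule swaps in r rotunda
# (taken from the list in the paragraph above).
TRIGGERS = set("BDOPVWbdhopvwy")

# Stand-in for the r rotunda glyph: U+A75B LATIN SMALL LETTER R ROTUNDA.
R_ROTUNDA = "\uA75B"


def apply_r_rotunda(text: str) -> str:
    """Replace 'r' with r rotunda wherever it follows a trigger letter."""
    out = []
    prev = ""  # the previous character of the original text
    for ch in text:
        if ch == "r" and prev in TRIGGERS:
            out.append(R_ROTUNDA)
        else:
            out.append(ch)
        prev = ch
    return "".join(out)


if __name__ == "__main__":
    # 'r' after 'y' and after 'o' is replaced; 'r' after 'a' is not.
    print(apply_r_rotunda("yr bord art"))
```

A fixed rule like this is exactly what cannot reproduce the idiosyncratic usage of a particular manuscript or printed book, which is the point of the next paragraph.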

However, in practice this approach cannot satisfy the requirements of textual scholars, who need to be able to represent exactly the often idiosyncratic usage of r rotunda in a particular text. A font-based contextual rule for r rotunda is fine for users who just want to display some text in a blackletter typeface for aesthetic or pseudo-archaic purposes, but, as we have seen, medieval manuscripts and early printed books are very rarely consistent in their use of r rotunda, and so textual scholars need a mechanism for marking a letter as r rotunda at the text level.

Some scripts, such as Mongolian and Phags-pa, deal with contextual glyph variants very successfully with variation selectors, which may be applied to a character in order to override a default contextual rule or to display a glyph variant in isolation. However, I am probably in the minority in considering variation selectors to be a useful way to deal with contextual glyph variation in general (note that I do not advocate the use of variation selectors to deal with simple or historic glyph variants), and given that variation selectors have never previously been applied to the Latin script, I do not think that it would be an acceptable solution in this case. Therefore I have to agree that the only practical solution is to encode r rotunda as a distinct character in its own right.

N3027 proposed to encode the following r rotunda-related characters, which have been accepted into the second Proposed Draft Amendment for Amd.3 of ISO/IEC 10646 (PDAM 3.2):

  • 1DE3 COMBINING LATIN SMALL LETTER R ROTUNDA
  • A75A LATIN CAPITAL LETTER R ROTUNDA
  • A75B LATIN SMALL LETTER R ROTUNDA
  • A75C LATIN CAPITAL LETTER RUM ROTUNDA
  • A75D LATIN SMALL LETTER RUM ROTUNDA

I welcome the encoding of the lowercase forms of r and rum rotunda, but am dubious of the necessity or usefulness of encoding an uppercase form of r rotunda.



Capital R Rotunda

In their justification for encoding capital R rotunda the authors of N3027 claim that:

"The case-pairing LATIN CAPITAL LETTER R ROTUNDA is attested in texts from the 15th century (Figure 68 shows it in RUM ROTUNDA form, but it does occur on its own)."

In other words, no evidence is provided to justify the encoding of capital R rotunda other than a single example of capital Rum rotunda. The omission of any supporting evidence for capital R rotunda is particularly remarkable when we consider that every reference book I have consulted agrees that r rotunda is a lowercase-only letterform. Overturning received opinion would normally require substantial evidence rather than simply a vague assertion that a particular character exists, and I think that my friends on the UTC were remiss in accepting capital R rotunda without any supporting evidence.

So does capital R rotunda exist or not? My own extensive examination of medieval manuscripts and early printed books has failed to uncover a single instance of capital R rotunda. Indeed, the capital form corresponding to lowercase r rotunda is the ordinary capital 'R', as can be seen from this example from the Welsh Bible, where 'yr' is written with r rotunda in lowercase, but is written as 'YR' in capitals:


Y Beibl Cyssegr-lan Matthew chapter 12 (London, 1588)


Having failed to find any evidence of capital R rotunda myself, I turned to the MUFI character recommendation, from which the characters proposed in N3027 are drawn. Capital R rotunda was not in version 1.0 of the MUFI recommendation, but suddenly appears in version 2.0 with the note "Added for reasons of case pairing"; then in later editions of version 2.0 the justification is changed to "Attested by epigraphical usage", and in the most recent edition of version 2.0 it is included with no comment at all. None of this filled me with any confidence, so I turned to the member of MUFI who was responsible for initially suggesting capital R rotunda for inclusion in MUFI version 2.0, and he very kindly provided me with these scans as evidence for capital R rotunda:


The Lorsch Sacramentary (?) folio 9r (mid 11th century)

from Bibliotheca Palatina (Heidelberg, 1986)

To be read as "PER OMNIA SAECULA SAECULORUM"


Inscription of Pope Alexander IV (1256)

from Degering, Die Schrift (1929)

All examples of Rum rotunda


Bulla Gregorii XI de basilicae Lateranensis dignitate (1372)

from Degering, Die Schrift (1929) [N3027 fig.68]

An example of Rum rotunda


Unfortunately, to my eye these images only provide evidence for capital Rum rotunda, and not capital R rotunda per se.

The arguments that I have heard in favour of encoding capital R rotunda, despite the paucity of evidence, include:

  • It is required for casing operations
  • Capital Rum rotunda is attested and so capital R rotunda may also be attested (even if there are no examples to hand)
  • There is no harm in encoding a spurious uppercase form of r rotunda
  • Other non-existent capital letters have been encoded in the past

None of these is convincing to me.

The problem with encoding capital R rotunda simply in order to provide a casing pair of letters is that lowercase r rotunda already has a corresponding uppercase form, U+0052 LATIN CAPITAL LETTER R. Users expect that if you uppercase a digital text with r rotunda in it, then r rotunda will case to 'R'; 'R' would then lowercase to 'r', but there is not much you can do about that. The situation with respect to r rotunda is exactly the same as for U+017F LATIN SMALL LETTER LONG S (ſ), which has a compatibility mapping to 's', and uppercases to 'S'.
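The long s behaviour described above can be checked with a short Python sketch, assuming an interpreter that applies the default Unicode case mappings (as Python 3's `str.upper()` and `str.lower()` do). The helper name `case_round_trip` is hypothetical, introduced only for this illustration.

```python
import unicodedata

LONG_S = "\u017F"  # U+017F LATIN SMALL LETTER LONG S


def case_round_trip(text: str) -> str:
    """Uppercase and then lowercase a string with the default mappings."""
    return text.upper().lower()


if __name__ == "__main__":
    # Long s uppercases to the ordinary capital 'S'.
    print(LONG_S.upper() == "S")
    # 'S' lowercases to ordinary 's', so the long s is lost on the way back.
    print(case_round_trip("congre" + LONG_S + "s"))
    # Compatibility normalisation also folds long s to 's'.
    print(unicodedata.normalize("NFKC", LONG_S) == "s")
```

The loss of the long s on the round trip is the "not much you can do about that" case: the uppercase form matches what a reader expects, at the price of not being reversible.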

If lowercase and capital R rotunda are encoded with case mapping to each other (which is the assumed default behaviour, although N3027 does not include proposed Unicode properties as is now expected of all character proposals), then casing operations are going to run contrary to user expectations; therefore it would seem to me to be a very bad idea to encode capital R rotunda simply "for reasons of case pairing". In the recent past I have argued strongly in favour of encoding casing pairs of Latin letters unless there is a good reason not to do so. That r rotunda already has an expected casing behaviour is, in my opinion, a very good reason not to encode an unsubstantiated capital form in this instance.



Addendum I [2006-10-08]

I submitted a document (L2/06-252) on the issues associated with encoding capital R Rotunda to the Unicode Technical Committee, which was considered at the August 2006 UTC meeting, but no consensus was reached. Then at the September 2006 meeting of WG2 in Tokyo I had long discussions with Michael Everson on the subject, and in the end the UK dropped its objections to encoding capital R Rotunda, on the basis that the user community's desire for r rotunda to have roundtrip casing should be taken into consideration.

Well, I guess that scholars who are using Unicode to faithfully reproduce the content of texts written or printed with r rotunda are not going to be applying casing operations to their electronic texts anyway (and r rotunda isn't even affected by title casing), so the argument about what a capital r rotunda should look like is probably academic.

At the same meeting it was also decided to encode a number of "insular" letterforms. The UK had previously objected to the encoding of insular letterforms as distinct characters, but on the basis of a new proposal by Michael Everson which demonstrated significant use of insular letters for Cornish and Welsh orthography and phonetic notation, the UK withdrew its objections, and these letters are now on the ISO/IEC 10646:2003 Amd.3 ballot (which will correspond to Unicode 5.1).



Addendum II [2008-04-04]

R rotunda and Rum rotunda, in both lowercase and uppercase forms, have now been encoded in Unicode 5.1:

  • U+A75A Ꝛ LATIN CAPITAL LETTER R ROTUNDA
  • U+A75B ꝛ LATIN SMALL LETTER R ROTUNDA
  • U+A75C Ꝝ LATIN CAPITAL LETTER RUM ROTUNDA
  • U+A75D ꝝ LATIN SMALL LETTER RUM ROTUNDA
  • U+1DE3 ᷣ COMBINING LATIN SMALL LETTER R ROTUNDA
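The characters listed above can be inspected from Python's `unicodedata` module. This is a minimal sketch, assuming an interpreter whose bundled Unicode Character Database is version 5.1 or later (which should be the case for any Python 3 release); the names `ROTUNDA_CODE_POINTS` and `describe` are hypothetical. In the database as published, the small and capital forms are case-paired with each other, so uppercasing U+A75B gives U+A75A rather than the ordinary 'R'.

```python
import unicodedata

# Code points taken from the list above.
ROTUNDA_CODE_POINTS = [0xA75A, 0xA75B, 0xA75C, 0xA75D, 0x1DE3]


def describe(cp: int) -> str:
    """Return the code point, character name and general category."""
    ch = chr(cp)
    return f"U+{cp:04X} {unicodedata.name(ch)} [{unicodedata.category(ch)}]"


if __name__ == "__main__":
    for cp in ROTUNDA_CODE_POINTS:
        print(describe(cp))

    small_r_rotunda = chr(0xA75B)
    capital_r_rotunda = chr(0xA75A)
    # Roundtrip casing: small <-> capital R rotunda, not 'r' <-> 'R'.
    print(small_r_rotunda.upper() == capital_r_rotunda)
    print(capital_r_rotunda.lower() == small_r_rotunda)
```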
