Sunday, 30 July 2006

What's That ?

N3027 ("Proposal to add medievalist characters to the UCS") proposes to encode a wide range of abbreviation letters used in medieval manuscripts and early printed books. At present many early texts cannot be transcribed into Unicode, because special abbreviation letters are so common in them, so I am very pleased to see that these letters are finally being encoded. However, there is one proposed letter that I have a little quibble with. According to N3027 :

LATIN LETTER THORN WITH STROKE is used for Old Norse þat, þess, þor-, þæt (Figures 29, 32, 33, 40, 73, 79).

Which is true enough as far as it goes, but I suspect that most of my readers will be more familiar with the letter thorn with a stroke through the ascender in the context of Old English, where it is the ubiquitous abbreviation for þæt (and unlike Old Norse, only þæt).

It is an odd thing about the proposal that Latin, Old Norse, Irish, Welsh and even Cornish are frequently cited as languages using a particular proposed character, but Old English is only cited for a single character (COMBINING DOUBLE CIRCUMFLEX ABOVE, which is an editorial mark used in some editions of Old English poetry) and there are only two other mentions of Old English in the entire 51 pages of the document, when quite a few of the proposed characters are applicable to Old English (three primarily used for OE), and six of the examples provided are actually of Old English text (figs. 29, 30, 31, 37, 39, 40). In fact two of the six examples cited for LATIN LETTER THORN WITH STROKE are Old English, contrary to what the casual reader might assume.

Not only does the proposal not mention Old English in relation to the proposed LATIN LETTER THORN WITH STROKE, but it omits the crucial piece of information that the glyph forms of Old Norse and Old English letter thorn with stroke are quite different from each other. The Old Norse form has a short horizontal stroke through the ascender, whereas the Old English form has a longer diagonal stroke through the ascender. This difference can be seen in the examples given in N3027, where the Old Norse examples (figs. 32, 33, 42, 73 and 79) all use the former letterform and the Old English examples (figs. 29 and 40) both use the latter letterform. Although the examples show that Old Norse and Old English use distinct glyph forms, the text of the proposal does not make any mention of the fact that this character occurs in two distinct glyph forms, which I think is an important detail that should have been made explicit.

The following is an example of an early 12th century Old Norse manuscript :


Elucidarius (AM 674a folio 17r)

Thorn with stroke on lines 2 and 7


And this is an example of an Old English manuscript dated to about the year 1000 :


The Cædmon Manuscript [part of the Old English verse rendition of Genesis] (Bodleian Junius MS 11 folio 14)

Thorn with stroke on lines 3 and 11


These manuscripts exemplify the differences between the Old Norse and Old English forms of thorn with stroke. This difference is preserved in most modern typeset editions, with editions of Old Norse texts normally using a short horizontal stroke, and editions of Old English texts normally using a longer diagonal stroke. The following are a few examples that show the Old English form of the letter (see N3027 figs. 32, 33, 73 and 79 for some ON examples) :


Plummer and Earle, Two of the Saxon Chronicles Parallel (Oxford: Oxford University Press, 1889) p.69


A. Campbell, An Old English Grammar (Oxford: Oxford University Press, 1959) p.12


C.L. Wrenn (ed.), Beowulf (London: Harrap, 1973) p.210


The question then arises, should the Old Norse and Old English forms be encoded separately (LATIN LETTER THORN WITH STROKE and LATIN LETTER THORN WITH DIAGONAL STROKE) or should they be considered to be glyph variants of the same abstract character ? According to N3027 it would seem that they should be encoded as a single character, although the latest version of the MUFI character recommendation treats the two glyph forms as separate characters :


MUFI Character Recommendation Version 2.0 f (12 January 2006)


My inclination is to agree with MUFI on this one, although I suspect that I am wrong. According to Unicode encoding principles, language-specific glyph variations should be dealt with at the font level (i.e. in a font designed for Old Norse the glyph for LATIN LETTER THORN WITH STROKE would have a horizontal stroke, whereas a font designed for Old English would have a glyph with a diagonal stroke). However, there are plenty of precedents for encoding language-specific letterforms as separate characters.

My feeling is that in N3027, the proposed LATIN LETTER VEND is really only an Old Norse glyph variant of the already encoded LATIN LETTER WYNN used in Old English, and so if Vend and Wynn should be distinguished at the character level, why not Thorn with a horizontal stroke and Thorn with a diagonal stroke ?

An example from further afield that has been discussed recently is the proposed MYANMAR LETTER MON JHA (N3044), which is acknowledged to be a glyph variant of the already encoded MYANMAR LETTER JHA used for writing Mon, but which Michael Everson is proposing to encode as a distinct character because there is a requirement for a single "plain-text monofont" that covers all of the Myanmar-script languages of the Union of Myanmar, and so language-specific glyph variants must be dealt with at the character level rather than the font level.

This has a bearing on the encoding of Thorn with a stroke, as fonts that are intended for use by medievalists (Alphabetum, Andron Scriptor, Cardo, Junicode, Leeds Uni) are general fonts that cover the characters required for all languages. Thus, users will generally not be using fonts specifically designed for Old Norse or Old English, but will be using a single "medievalist" font with a single glyph for LATIN LETTER THORN WITH STROKE, which will cater either for Old Norse or for Old English, but not for both. I think that this is a pretty good argument for saying that as Old English Thorn with a diagonal stroke "is a language-specific variant which differs significantly from the 'default' letter" (ME's justification for MYANMAR LETTER MON JHA), it should be encoded separately from LATIN LETTER THORN WITH STROKE.

Anyhow, those are just my thoughts. It would be interesting to hear what other people think on this issue.


Monday, 24 July 2006

R Rotunda Part 2

Yesterday I gave an overview of the history and usage of r rotunda. Today I will be exploring some of the issues relating to the proposed encoding of r rotunda in ISO/IEC 10646 and Unicode.



Encoding R Rotunda

As r rotunda is a contextual glyph variant of the lowercase letter 'r' and not a distinct letter with its own semantics, in principle it need not be encoded as a separate character, but could be dealt with at the font level; for example a blackletter OpenType font could contextually substitute an r rotunda glyph whenever 'r' comes after the letters 'B', 'D', 'O', 'P', 'V', 'W', 'b', 'd', 'h', 'o', 'p', 'v', 'w' and 'y' (or whatever rule the font designer decides to apply).
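A font-level rule of this kind can be sketched in a few lines of Python. This is only an illustration of the contextual logic, not OpenType feature code: a real font would swap glyphs and leave the underlying 'r' untouched, whereas this sketch rewrites the text using U+A75B (encoded in Unicode 5.1). The names `ROTUNDA_TRIGGERS` and `apply_r_rotunda` are invented for the example, and the trigger list is simply the one given above.

```python
# Letters after which this hypothetical rule swaps in r rotunda
# (the list given in the text; a font designer might choose differently).
ROTUNDA_TRIGGERS = set("BDOPVWbdhopvwy")

R_ROTUNDA = "\uA75B"  # LATIN SMALL LETTER R ROTUNDA


def apply_r_rotunda(text: str) -> str:
    """Replace 'r' with r rotunda wherever it directly follows a trigger letter."""
    out = []
    prev = ""
    for ch in text:
        if ch == "r" and prev in ROTUNDA_TRIGGERS:
            out.append(R_ROTUNDA)
        else:
            out.append(ch)
        prev = ch  # the context is the original letter, not the substituted one
    return "".join(out)


print(apply_r_rotunda("probauerunt corde three"))
```

Because the rule is purely mechanical, it produces perfectly regular output, which is fine for decorative blackletter but not for reproducing the irregular usage of a real manuscript or book.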

However, in practice this approach cannot satisfy the requirements of textual scholars, who need to be able to exactly represent the often idiosyncratic usage of r rotunda in a particular text. A font-based contextual rule for r rotunda is fine for users who just want to display some text in a blackletter typeface for aesthetic or pseudo-archaic purposes, but as we have seen medieval manuscripts and early printed books are very rarely consistent in their use of r rotunda, and so textual scholars need a mechanism for marking a letter as r rotunda at the text level.

Some scripts, such as Mongolian and Phags-pa, deal with contextual glyph variants very successfully with variation selectors, which may be applied to a character in order to override a default contextual rule or to display a glyph variant in isolation. However, I am probably in the minority in considering variation selectors to be a useful way to deal with contextual glyph variation in general (note that I do not advocate the use of variation selectors to deal with simple or historic glyph variants), and given that variation selectors have never previously been applied to the Latin script, I do not think that it would be an acceptable solution in this case. Therefore I have to agree that the only practical solution is to encode r rotunda as a distinct character in its own right.

N3027 proposed to encode the following r rotunda related characters, which have been accepted into the second Proposed Draft Amendment for Amd.3 of ISO/IEC 10646 (PDAM 3.2) :

  • 1DE3 COMBINING LATIN SMALL LETTER R ROTUNDA
  • A75A LATIN CAPITAL LETTER R ROTUNDA
  • A75B LATIN SMALL LETTER R ROTUNDA
  • A75C LATIN CAPITAL LETTER RUM ROTUNDA
  • A75D LATIN SMALL LETTER RUM ROTUNDA

I welcome the encoding of the lowercase forms of r and rum rotunda, but am dubious of the necessity or usefulness of encoding an uppercase form of r rotunda.



Capital R Rotunda

In their justification for encoding capital R rotunda the authors of N3027 claim that :

The case-pairing LATIN CAPITAL LETTER R ROTUNDA is attested in texts from the 15th century (Figure 68 shows it in RUM ROTUNDA form, but it does occur on its own).

In other words, no evidence is provided to justify the encoding of capital R rotunda other than a single example of capital Rum rotunda. The omission of any supporting evidence for capital R rotunda is particularly remarkable when we consider that every reference book I have consulted agrees that r rotunda is a lowercase only letterform. Overturning received opinion would normally require substantial evidence rather than simply a vague assertion that a particular character exists; and I think that my friends on the UTC were remiss in accepting capital R rotunda without any supporting evidence.

So does capital R rotunda exist or not ? Well, extensive examination of medieval manuscripts and early printed books by myself has failed to uncover a single instance of capital R rotunda, and indeed, the corresponding capital form of lowercase r rotunda is the ordinary capital 'R', as can be seen from this example from the Welsh Bible, where 'yr' is written with r rotunda in lowercase, but is written as 'YR' in capitals :


Y Beibl Cyssegr-lan Matthew chapter 12 (London, 1588)


Having failed to find any evidence of capital R rotunda myself, I turned to the MUFI character recommendation, from which the characters proposed in N3027 are drawn. Capital R rotunda was not in version 1.0 of the MUFI recommendation, but suddenly appears in version 2.0 with the note "Added for reasons of case pairing"; then in later editions of version 2.0, the justification is changed to "Attested by epigraphical usage", and in the most recent edition of version 2.0 it is included with no comment at all. None of this filled me with any confidence, so I turned to the member of MUFI who was responsible for initially suggesting R rotunda for inclusion in MUFI version 2.0, and he very kindly provided me with these scans as evidence for capital R rotunda :


The Lorsch Sacramentary (?) folio 9r (mid 11th century)

from Bibliotheca Palatina (Heidelberg, 1986)

To be read as "PER OMNIA SAECULA SAECULORUM"


Inscription of Pope Alexander IV (1256)

from Degering, Die Schrift (1929)

All examples of Rum rotunda


Bulla Gregorii XI de basilicae Lateranensis dignitate (1372)

from Degering, Die Schrift (1929) [N3027 fig.68]

An example of Rum rotunda


Unfortunately, to my eye these images only provide evidence for capital Rum rotunda, and not capital R rotunda per se.

The arguments in favour of encoding capital R rotunda, despite the paucity of evidence, that I have heard include :

  • It is required for casing operations
  • Capital Rum rotunda is attested and so capital R rotunda may also be attested (even if there are no examples to hand)
  • There is no harm in encoding a spurious uppercase form of r rotunda
  • Other non-existent capital letters have been encoded in the past

None of which are convincing to me.

The problem with encoding capital R rotunda simply in order to provide a casing pair of letters is that lowercase r rotunda already has a corresponding uppercase form, U+0052 LATIN CAPITAL LETTER R. User expectations are that if you uppercase a digital text with r rotunda in it, then r rotunda will case to 'R'; 'R' would then lowercase to 'r', but there is not much you can do about that. The situation with respect to r rotunda is exactly the same as for U+017F LATIN SMALL LETTER LONG S ſ , which has a compatibility mapping to 's', and uppercases to 'S'.
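The long s behaviour described here can be checked with Python's built-in string methods and the standard `unicodedata` module. This is just a small demonstration; the variable names are arbitrary.

```python
import unicodedata

LONG_S = "\u017F"  # LATIN SMALL LETTER LONG S

upper = LONG_S.upper()                          # uppercases to ordinary 'S'
round_trip = upper.lower()                      # 'S' lowercases to 's', not long s
compat = unicodedata.normalize("NFKC", LONG_S)  # compatibility mapping to 's'

print(upper, round_trip, compat)
```

The casing is deliberately one-way: once long s has been uppercased, nothing in the text records that it was ever a long s.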

If lowercase and capital R rotunda are encoded with case mapping to each other (which is the assumed default behaviour, although N3027 does not include proposed Unicode properties as is now expected of all character proposals), then casing operations are going to run contrary to user expectations; therefore it would seem to me to be a very bad idea to encode capital R rotunda simply "for reasons of case pairing". In the recent past I have argued strongly in favour of encoding casing pairs of Latin letters unless there is a good reason not to do so. That r rotunda already has an expected casing behaviour is, in my opinion, very good reason not to encode an unsubstantiated capital form in this instance.



Addendum I [2006-10-08]

I submitted a document (L2/06-252) on the issues associated with encoding capital R Rotunda to the Unicode Technical Committee, which was considered at the August 2006 UTC meeting, but no consensus was reached. Then at the September 2006 meeting of WG2 in Tokyo I had long discussions with Michael Everson on the subject, and in the end the UK dropped its objections to encoding capital R Rotunda, on the basis that the user community's desire for r rotunda to have roundtrip casing should be taken into consideration.

Well, I guess that scholars who are using Unicode to faithfully reproduce the content of texts written or printed with r rotunda are not going to be applying casing operations to their electronic texts anyway (and r rotunda isn't even affected by title casing), so the argument about what a capital r rotunda should look like is probably academic.

At the same meeting it was also decided to encode a number of "insular" letterforms. The UK had previously objected to the encoding of insular letterforms as distinct characters, but on the basis of a new proposal by Michael Everson which demonstrated significant use of insular letters for Cornish and Welsh orthography and phonetic notation, the UK withdrew its objections, and these letters are now on the ISO/IEC 10646:2003 Amd.3 ballot (which will correspond to Unicode 5.1).



Addendum II [2008-04-04]

R rotunda and Rum rotunda, in both lowercase and uppercase forms, have now been encoded in Unicode 5.1 :

  • U+A75A Ꝛ LATIN CAPITAL LETTER R ROTUNDA
  • U+A75B ꝛ LATIN SMALL LETTER R ROTUNDA
  • U+A75C Ꝝ LATIN CAPITAL LETTER RUM ROTUNDA
  • U+A75D ꝝ LATIN SMALL LETTER RUM ROTUNDA
  • U+1DE3 ᷣ COMBINING LATIN SMALL LETTER R ROTUNDA

Sunday, 23 July 2006

R Rotunda Part 1

Having recently discussed long s (in some detail), I want to turn my attention to r rotunda, in the second of a series of posts related directly or indirectly to the Proposal to add medievalist characters to the UCS (N3027) co-authored by Michael Everson and members of the Medieval Unicode Font Initiative (MUFI). This is a very important proposal to encode many letters and abbreviations used in medieval manuscripts or used by scholars of medieval manuscripts. I very much support the encoding of characters which will finally allow textual scholars and pedants such as myself to faithfully reproduce the contents of medieval manuscripts and early printed books in electronic text format; although, as some of my readers already know, I do have concerns over some of the characters proposed in N3027.



The Origins of R Rotunda

R rotunda is a '2'-shaped variant form of the lowercase letter 'r' used in medieval manuscripts and early printed books. Like the long-s, r rotunda only occurs in lower case, but whereas long-s is normally used initially and medially within a word but not finally, r rotunda is used medially and finally in a word after certain letters but never initially. Furthermore, r rotunda is normally only used in blackletter styles of typeface (or Gothic styles of manuscript). Therefore, when roman typefaces replaced blackletter for printing during the sixteenth and seventeenth centuries, r rotunda disappeared from the scene. As far as I can tell, in recent centuries, German books set in Fraktur do not use r rotunda.

R rotunda apparently developed from a ligature of the letters O and R (minuscule 'r' being written as 'R' in half uncial scripts) where the lefthand stroke of the 'R' merged with the 'O'. I'm not sure exactly when or where this happened, although I'm told that r rotunda is first seen in the southern Italian Beneventan script that developed during the 8th century. You can certainly see r rotunda and its cousin rum rotunda (r rotunda with a stroke, used as an abbreviation for Latin -rum) in British Library MS Burney 284, but this is comparatively late in date (late 11th to early 12th century) and so not particularly significant. Unfortunately, I have found it hard to find any examples of early Beneventan manuscripts, so I can't really confirm or deny the theory that r rotunda originated in the Beneventan script.

Whatever their origins, r rotunda and rum rotunda can be seen in some late 10th and early 11th century manuscripts written in the Carolingian script, such as British Library MSS Cotton Cleopatra C. VIII (late 10th century) and Arundel 375 (late 10th or early 11th century). The example below is from a manuscript written by the scribe Eadui Basan at Canterbury between 1012 and 1023 :


Eadui Psalter (British Library MS Arundel 155 folio 11v)

"adoret" (line 5), "iſtorum" (line 5), "pſalmorum" (line 5), etc.


When the Gothic script came to replace the Carolingian script in the 12th century, it inherited the r rotunda form of the letter 'r' after the letter 'o'. At this stage, the r rotunda is still joined to the preceding letter, and may still be considered to be a ligature of the two letters, but by the end of the medieval period the ligature had become broken, and r rotunda became perceived as a separate form of the letter 'r', in the same way that long-s was perceived as a distinct form of the letter 's'. Although initially restricted in position to following the letter 'o', r rotunda soon came to be used after any letter that (in Gothic script) ended with a rounded stroke, as can be seen in this page from the Luttrell Psalter (circa 1325-1335) :


Luttrell Psalter folio 171v

"probauerunt" (line 3), "Quadraginta" (line 4), "corde" (line 6).


Early printed books retained all the peculiar letters and abbreviations used in manuscripts, including r rotunda, which can be seen (in various glyph forms) in books set in blackletter typefaces up until the early 17th century :


A most strange and true report of a monsterous fish page A3 (London, 1604)

"februarie" (line 1), "three" (line 2), "ſhores" (line 5), "ſpright" (line 12), "droue" (line 13), etc.



The Rules of R Rotunda

In order to try to come to an understanding of the rules for using r rotunda in printed books, I have roughly analysed the usage of r rotunda in a small selection of books written in various languages that were printed in blackletter type between the mid 15th and early 17th centuries. The results are tabulated below. With hindsight I should have noted for each book which letters ordinary 'r' follows as well as which letters r rotunda follows, otherwise it is not obvious whether a letter is not listed as preceding r rotunda for a particular book because the letter only precedes an ordinary 'r' or because there are no examples of any sort of 'r' following that letter in the book. Although I do not have the time or energy to go back and repair this omission at the present time, in a few cases I have explicitly indicated that a particular letter is not followed by r rotunda by enclosing it in square brackets (thus [y] for The Canterbury Tales means that 'y' is followed by ordinary 'r'). An asterisk is used to indicate cases where a particular letter is mostly followed by ordinary 'r', but occasionally by r rotunda. Single anomalous occurrences of r rotunda where ordinary 'r' would be expected or ordinary 'r' where r rotunda would be expected are generally ignored.


Title | Published | Language | R Rotunda After
Bulla turcorum (The Calixtus Bull) | Mainz, 1456 | Latin | [r rotunda is not used]
The Canterbury Tales (2nd ed.) [Prologue only] | London, 1483 | English | B O P; b d h o p w [y]
Schönsperger-Bibel (The Schönsperger Bible) [Genesis only] | Augsburg, 1490 | German | b d e* h o p r
The solempnities & triumphes doon & made at the spousells and mariage of the kyngs doughter the Ladye Marye to the Prynce of Castile Archeduke | London, c.1508 | English | B G P; b d h o p w [y]
La tryumphante et solemnelle entree faicte sur le nouuel et ioyeux aduenement de treshault trespuissant et tresexcellent prince monsieur Charles prince des Hespaignes archiduc daustrice duc de Bourgongne conte de Flandres | Paris, 1515 | French | B O P; b d h o p v y
Prima e seconda coronatione di Carlo Quinto sacratissimo imperatore re de Romani, fatta in Bologna | Bologna, 1530 | Italian | B O P; b d* h o p
The maner of the tryumphe at Caleys & Buleyn | London, 1532 | English | B D; b d o p w y
Die incoemste der twee seer hoochgeboren ghesusteren ons alder genadichsten keysers Karolus dye vijfde van dyen name, dye Coninginne van Vranckrijck | Antwerp, 1535 | Dutch | O P V; b d o p v w
Yny lhyvyr hwnn | London, 1546 | Welsh | B D G* O P; a* b d g* h m* o p v w y
Y Drych Cristianogawl | Rouen, 1585 | Welsh | B D G* W [Y]; a* b d ḋ e* f* g* h i* m* o p r-rotunda t* u* v w y
Y Beibl Cyssegr-lan (The Welsh Bible) | London, 1588 | Welsh | B C* D G* O P W Y; b d h o p w y
A most strange and true report of a monsterous fish | London, 1604 | English | B D P; b d h o p w y

Note on Y Drych Cristianogawl: despite the Rouen, 1585 imprint, it was actually printed secretly in a cave on the estate of Robert Pue of Penrhyn Creuddyn during 1586 and 1587.

The following is a summary of my findings :

  • There is a core set of letters with a final rounded stroke (B, D, O, P, V, W, b, h, o, p, v, w) after which r rotunda is almost always used, regardless of place or date of publication.
  • The letter 'd' is normally followed by r rotunda when it has a bent back, but not if it has a straight back.
  • The letter 'y' mostly takes r rotunda, but not in all books.
  • The only letter that consistently takes r rotunda where ordinary 'r' would normally be expected is the letter 'r' in the Schönsperger Bible, where der herr "the Lord" (and inflexions) is written with an ordinary 'r' followed by r rotunda (see picture in rules for long s).
  • An apostrophe can intervene between r rotunda and a preceding letter, thus Welsh o'r is written with r rotunda.

What is also striking is that in almost every book examined the rules of r rotunda are not applied uniformly, and there are cases where ordinary 'r' is found after a rounded letter such as 'o', as well as cases where r rotunda is found after letters such as 'a' or 'i' that should be followed by ordinary 'r'.

Y Drych Cristianogawl is an extreme example of erratic usage of r rotunda. It starts off with fairly well-defined rules, but about half way through the book we start to see r rotunda popping up after all sorts of letters that should not take r rotunda -- presumably one of the hazards of printing illegal Catholic tracts secretly in a cave (other than being burnt at the stake) is that you may have to rely on unskilled typesetters.

In the case of capital 'G', it is not at all clear which form of letter 'r' it should be followed by, as almost all the books which have examples of G plus r rotunda also have counter examples showing G plus ordinary 'r'.

Thus, it seems to me, the rules for r rotunda are less well defined and less strictly enforced than the rules for long s.
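For anyone with an electronic transcription to hand, the sort of tally behind the table above (including the count of letters preceding ordinary 'r', which the analysis above omits) could be sketched as follows. This assumes the transcription marks r rotunda as U+A75B; the function name and the sample string are invented for illustration, and cases such as an intervening apostrophe are not handled.

```python
from collections import Counter

R_ROTUNDA = "\uA75B"  # LATIN SMALL LETTER R ROTUNDA


def tally_preceding(text):
    """Count which letters precede r rotunda and which precede ordinary 'r'."""
    before_rotunda, before_plain = Counter(), Counter()
    for prev, ch in zip(text, text[1:]):
        if not prev.isalpha():
            continue  # skip word-initial r and r after punctuation
        if ch == R_ROTUNDA:
            before_rotunda[prev] += 1
        elif ch == "r":
            before_plain[prev] += 1
    return before_rotunda, before_plain


# Made-up sample, not taken from any of the books in the table.
rotunda, plain = tally_preceding("co\uA75Bde b\uA75Bing are her")
print(sorted(rotunda.items()))
print(sorted(plain.items()))
```

Keeping the two counters side by side makes it possible to tell "this letter only takes ordinary r" apart from "this letter never occurs before any r in the sample".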


More on r rotunda tomorrow.



Addendum I [2008-04-04]

R rotunda, in both its normal, lowercase form and the abnormal uppercase form, has now been encoded in Unicode 5.1 :

  • U+A75A Ꝛ LATIN CAPITAL LETTER R ROTUNDA
  • U+A75B ꝛ LATIN SMALL LETTER R ROTUNDA

The Unicode case mappings mean that if you uppercase U+A75B ꝛ LATIN SMALL LETTER R ROTUNDA you will get U+A75A Ꝛ LATIN CAPITAL LETTER R ROTUNDA (a virtually non-existent letter) rather than the expected ordinary uppercase letter R. In my opinion, this is an abomination of the same order as it would be to have long s uppercase to an artificial capital long s.
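The mapping is easy to confirm in any Python 3, whose character database includes Unicode 5.1 or later. The snippet below is only a quick check; the variable names are arbitrary.

```python
import unicodedata

small = "\uA75B"         # LATIN SMALL LETTER R ROTUNDA
capital = small.upper()  # follows the simple uppercase mapping in UnicodeData

print(unicodedata.name(small))
print(unicodedata.name(capital))
print(capital == "R", capital.lower() == small)
```

So the pair round-trips with itself and never touches ordinary 'R', which is exactly the behaviour objected to above.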



Addendum II [2008-06-21]

I have now discovered that the rules for r rotunda (ragged r as he calls it) are laid out in John Smith's Printer's Grammar, first published in 1755 :



Black Letter conſiſts of as many Sorts as a Com-
mon Fount of Roman ; ſave that the firſt has two dif-
ferent r's, one of which is called the ragged r [ꝛ], and
is particularly uſed after letters that round off behind,
whether they be Capitals or Lower-caſe Sorts. Thus
they are properly put after the following Capitals, viz.
B D G O P U W ; and after theſe Lower-caſe let-
ters, viz. b d h o p and w.

The ragged r, of which we have taken this ſhort
notice, witneſſeth, that the German letters owe their
being to the Gothic or Black characters that were firſt
uſed for Printing : for the Germans have a ragged r,
which they call the round r ; but which, in modelizing
their letters to the preſent ſhape, they have castrated, by
depriving it of its comely tail. But that they do not
know the proper application of that letter, may be ga-
thered from their uſing it in very cloſe lines, inſtead of
common r's, thereby to gain the room of a thin Hair-
ſpace. Which obſervation we have made on purpoſe
to aſſiſt thoſe who delight to exercise themſelves in that
painful ſtudy which attends writing De Origine rerum.

The Printer's Grammar (London, 1787) page 112.


The rules stated by Smith accord fairly well with my empirical observations, with the exception that the letter y is also usually followed by r rotunda.


Saturday, 15 July 2006

BabelMap Version 5.0.0.1

Unicode 5.0 was finally released yesterday (although it won't be published in book form until later this year), several months after its original anticipated date of release (see What's New in Unicode 5.0 for a summary of what's new). This is a small triumph for me as I am responsible for the introduction of one of the new scripts now covered by Unicode, the historic 'Phags-pa script that was used for writing Chinese, Mongolian and other languages during the 13th and 14th centuries (there is a worthwhile story here about the long and sometimes fraught passage from initial proposal to final encoding of the script, but it will have to wait for another day).

A new version of Unicode inevitably means the release of new versions of my flagship software products, BabelMap and BabelPad, and so I am pleased to announce that BabelMap version 5.0.0.1 is now available for download. Up until a few days ago a new Unicode 5.0 enabled version of BabelPad was also ready for release, but as usual I couldn't leave things well alone, and decided to add in just one more feature; and of course this feature required me to entirely disembowel the code, so that it is now in a wretched and lifeless state (as my friends in the programming fraternity know, I am a keen exponent of the art of eXtreme reFactoring) ... but hopefully BabelPad (with many great new features) will be released before the end of the month.

New BabelMap Features

My number one question from new BabelMap users is "Why is such-and-such a character displayed as a little square box ?" or "Why doesn't BabelMap support such-and-such a script ?" The reason for such questions is almost invariably that the characters they want to see are not available in the default font that BabelMap uses when it is first started (Tahoma), and they do not realise that they have to select an appropriate font to see a particular character. For me it is obvious that any given font only supports a particular subset of the Unicode repertoire (due to the 64K glyph limit for TrueType fonts, it is physically impossible for any font to cover the entire Unicode repertoire of 99,089 characters), and so you may need to select different fonts to display different characters; but for many people this is not at all evident. I have therefore changed BabelMap so that you can either select a single font to display all characters (good for seeing what a particular font covers) or use a user-defined virtual, composite font in which each Unicode block is mapped to a particular font on your system, with the result that different Unicode blocks will be rendered using different fonts (good if you are more interested in characters than fonts). By default BabelMap will use a composite font when run for the first time, so that most characters in the BMP should be displayed OK if you are running Vista, and hopefully I should get fewer questions from new users about little square boxes.
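The idea of a composite font can be pictured as a lookup table from Unicode blocks to font names. The following Python sketch is purely illustrative and says nothing about how BabelMap is actually implemented: the table `BLOCK_FONTS`, the function `font_for` and the "Example ..." font names are all made up, and only the block ranges and the Tahoma default come from Unicode and the text respectively.

```python
from bisect import bisect_right

# (first code point, last code point, font name). The font choices are
# placeholders, not BabelMap's real defaults.
BLOCK_FONTS = [
    (0x0000, 0x007F, "Tahoma"),                   # Basic Latin
    (0x0900, 0x097F, "Example Devanagari Font"),  # Devanagari
    (0xA840, 0xA87F, "Example Phags-pa Font"),    # Phags-pa
]
BLOCK_STARTS = [first for first, _, _ in BLOCK_FONTS]


def font_for(ch):
    """Return the font mapped to the block containing ch, or None if unmapped."""
    cp = ord(ch)
    i = bisect_right(BLOCK_STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0:
        first, last, font = BLOCK_FONTS[i]
        if first <= cp <= last:
            return font
    return None


for ch in ("A", "\u0915", "\uA840", "\u4E00"):
    print(f"U+{ord(ch):04X}", font_for(ch))
```

A character whose block has no entry comes back as `None`, which corresponds to the "little square box" case when no suitable font has been assigned.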

Composite Font Mappings Dialog




Future Enhancements

My OpenType Analysis Tool is still half-finished, and with no time to work on it, it won't be available until sometime next year.

I am also planning to add in the ability to take a picture of a character as rendered using the selected font, which will be made available to the clipboard as a bitmap image ... useful if you want to display a character on a web page in situations where you doubt that the end user will have an appropriate font.

P.S. You may notice that I have done away with the arbitrary version numbering system that I previously employed (which never got beyond version 1 and never would have), and replaced it with a four-digit version number that is linked to the version of Unicode that the particular release of BabelMap/BabelPad supports. The first three digits of the version number now correspond to the Unicode version supported, and the last digit is the version of the BabelMap/BabelPad released for this version of Unicode. Thus, the new release of BabelMap is version 5.0.0.1, as it is the first release supporting Unicode 5.0.0.
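The numbering scheme is simple enough to take apart mechanically. Here is a minimal sketch, assuming the version is always four dot-separated integers as described; `split_version` is a made-up helper, not anything shipped with BabelMap.

```python
def split_version(version):
    """Split 'U.U.U.R' into (Unicode version string, release number)."""
    parts = [int(p) for p in version.split(".")]
    if len(parts) != 4:
        raise ValueError(f"expected four numeric fields, got {version!r}")
    unicode_version = ".".join(str(p) for p in parts[:3])
    return unicode_version, parts[3]


print(split_version("5.0.0.1"))  # ('5.0.0', 1)
```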



Addendum [2006-07-20]

Following hot on the heels of the announcement of the release of the Unicode 5.0 character database (but not the publication of the actual Unicode 5.0 standard) on 2006-07-14 comes the notice of publication of the corresponding ISO/IEC 10646: 2003 Amendment 2, two weeks earlier (on 2006-07-01).

It's a bit of a chicken and egg relationship between Unicode and ISO/IEC 10646, further confused by the fact that although ISO/IEC 10646 Amd.2 was published before Unicode 5.0, Unicode 5.0 includes four characters (U+097B, U+097C, U+097E and U+097F) from ISO/IEC 10646 Amd.3, which won't be published until next year ... along with Unicode 5.1. And by that time we'll be well into the work of Amd.4 (corresponding to Unicode 5.2 or 6.0), which should finally include Egyptian Hieroglyphs (or at least the Gardiner subset).


Sunday, 9 July 2006

The Long and the Short of the Letter S

Commenting on the Rules of Long S, Conrad Roth said :

I have definitely seen Renaissance English texts (though I can't remember which particular examples) where in a double s the first is long and the second short. I was under the impression that this was the origin of the German Eszett, which looks like a long-s followed by a short-s, though maybe I'm wrong about this.

To which Uncle Jazzbeau responded :

The German eszett is a long-s followed by a z.

Wikipedia does provide quite a good overview of German eszett ß, although I personally find it all a little bit confusing. As my previous post on long-s left much unsaid about the origins of the long-s, which should perhaps have preceded any discussion of its rules of usage, I shall endeavour to give a brief, illustrated history of the letter "s", culminating with my take on the eszett issue. Western paleography isn't my area of expertise (although I do profess a certain dilettantish interest), so I expect my more learned readers to correct me where I may have inadvertently strayed from the truth.



The Origins of the Long S

The long-s originated at a very early date in cursive Roman scripts, and can be seen in both Old Roman Cursive (1st to 3rd centuries AD) and New Roman Cursive (late 3rd century to 7th century). The form used in Old Roman Cursive is written in two strokes, a vertical downstroke followed by a horizontal or diagonal cross-stroke (see Vindolanda I Fig.11). The following example from one of the famous Vindolanda tablets clearly illustrates this early form of the Roman cursive letter s :


Vindolanda Tablet 248

"suo" (line 2), "salutem" (line 2), "simum" (line 5), "sit" (line 5), etc.


In New Roman Cursive the way the letter was written changed so that the letter starts with a vertical downstroke but is followed by a curving upstroke (see Vindolanda I Fig.10), resulting in a letter that looks similar to our modern letter "r". Most of the early medieval scripts that developed from the Roman cursive tradition, such as Merovingian (developed in France during the 7th century), Visigothic (developed in Spain during the late 7th century), Beneventan (developed in southern Italy during the 8th century) and Carolingian (developed at the court of Charlemagne at the end of the 8th century), inherited this form of the letter s.

In Roman uncial and half-uncial scripts the letter s often followed the form of the Roman capital S, and so early Latin texts written in insular half-uncials such as the Lindisfarne Gospels (circa 700) and the Book of Kells (circa 800) mostly use short-s (although long-s is sometimes used, for example in the "st" ligature). However, as the insular script developed, a distinctive long-s form, that must ultimately be derived from Roman cursive, came to be employed. Shown below are two manuscripts in which the insular form of the long-s is very clear, the first an Old English manuscript dating to about the year 1000, and the second an Old Irish manuscript which was written between the 11th and 15th centuries (the insular script died out in England after the Norman Conquest, but the insular tradition was preserved in Ireland where it evolved into the Gaelic script used for writing Irish up to modern times) :


The Cædmon Manuscript [part of the Old English verse rendition of Genesis] (Bodleian Junius MS 11 folio 14)

"se" (line 1), "his" (line 3), "giongorscipe" (line 3), "gesceop" (line 5), "swa" (lines 6-10), etc.


The Annals of Inisfallen [part of the entry for 1192] (Bodleian MS. Rawl. B. 503 folio 40r)

"senad" (line 4), "lascud" (line 6), "durlais" (line 8), "casstel" (line 9).


In the influential Carolingian script the hook of the first stroke is far less pronounced than in the insular letter s, forming a residual knob at the top of the vertical section of the letter, which is the precursor of the left cross-stroke on the 18th century long-s. This Carolingian long-s is beautifully illustrated in the late 10th century English manuscript shown below, which was described by Edward Johnston as "an almost perfect model for a modern formal hand" :


Psalter (British Library Harl. MS 2904)

Alfred Fairbank, A Book of Scripts (Penguin Books, 1949) Plate 8

"eiuſ" (line 2), "ſcientiam" (line 5), "ſermoneſ" (line 6).



Positional Differentiation of Long S and Short S

In all the examples given above long-s is used exclusively, and I have not been able to find any examples showing a consistent positional distinction between long-s and short-s prior to the 12th century, although my guess is that a positional distinction between the two forms of the letter "s" first arose sometime during the 11th century.

A positional distinction, with short-s used finally and long-s used initially and medially, can be seen in this early 12th century Italian manuscript written in a Carolingian script :


Homilies and Lessons (British Library Harl. MS 7183)

Alfred Fairbank, A Book of Scripts (Penguin Books, 1949) Plate 9

"poſſimus" (line 14), etc.


During the high medieval period (13th-15th centuries) the angular Gothic script became ubiquitous, pushing the Carolingian script out of common use. As far as I can tell, almost all of the manuscripts from this period that are written in a Gothic script differentiate the long-s and short-s by position; see for example the beautiful Luttrell Psalter (circa 1325-1335). However, as I am more interested in vernacular fiction than liturgical texts, I will give as my example a section from Sir Gawain and the Green Knight :


Sir Gawain and the Green Knight (British Library Cotton MS Nero A X folio 101a).

"ſayned" (line 7), "ſelf" (line 7), "ſegge" (line 7), "diches" (line 10), "caſtel" (line 11), "palays" (line 13), "Ieſus" (line 18), etc.


Eventually, during the Renaissance, the simpler and cleaner Carolingian script was revived under the name littera antiqua. The example below is from a manuscript written by Ciriagio at Florence in 1454. Note how in this manuscript long-s is used exclusively in all positions, imitating original Carolingian practice; but by the end of the 15th century the rule of using short-s finally and long-s only initially and medially had become firmly established.


De Dignitate et Excellentia Hominis (British Library Harl. MS 2593)

Alfred Fairbank, A Book of Scripts (Penguin Books, 1949) Plate 13

"philoſophuſ" (line 1), etc.


As an aside, on its page devoted to the Old English script Omniglot gives an image of the prologue from Beowulf written in an Anglo-Saxon insular script, but it has obviously been reconstructed from the modern transcription without any reference to the original manuscript, as it uses long-s initially and medially but short-s finally, when in fact the Beowulf manuscript normally uses the long-s in all positions (interestingly, the long-s in the manuscript is Carolingian rather than Insular) :


Beowulf [lines 1-11] (British Library Cotton MS Vitellius A XV folio 132)

"æþelingaſ" (line 3), "ſcyld" (line 4), "wæſ" (line 11); but note the anomalous "syððan" (line 6).



The Double S Ligature

During the 15th century cursive forms of the revived Carolingian style script (Antiqua) started to develop in Italy, including the Chancery hand that was approved for use in the Vatican by Pope Eugenius IV (1431-1447). The distinction between roman and italic hands is apparent in this 16th century Italian manuscript. What is interesting to us is that in the roman script double-s is written using a ligature of long-s and long-s, as is the case with 18th century roman typefaces, but in the italic script double-s (whether medial or final) is written using a ligature of long-s and short-s, which is very similar in form to the German sharp s ß.


La Paraphrasi (British Library Harl. MS 3541)

Alfred Fairbank, A Book of Scripts (Penguin Books, 1949) Plate 17

"paßi" (line 4), "adeßo" (line 5), "aßediato" (line 6), "santißima" (line 11); "eſſaudiſci" (line 14), "compaſſione" (line 15).


And in this beautiful handwriting from an influential book of scripts published by G.B. Palatino in 1544 you can see some more examples of the double s ligature.


Libro nel qual s'insegna a scrivere (Rome, 1544)

Alfred Fairbank, A Book of Scripts (Penguin Books, 1949) Plate 20

"appreßo" (line 4), "eßer" (line 6).


Books such as Palatino's helped spread the use of italic scripts to the rest of Europe, which is why, as Conrad pointed out, you may well see a long-s short-s ligature (ß) in English texts from the 16th and 17th centuries. I was hoping to use as my example something from the hand of Queen Elizabeth (who had a beautiful hand as a princess) but was unable to find anything suitable, so instead I have chosen a printed text, Queen Elizabeth's Letters Patent for the 1560 Book of Common Prayer in Latin (Liber Precum Publicarum) :


Liber Precum Publicarum [Queen Elizabeth's Letters Patent] (1560)

"aßensum" (line 7), "paßim" (line 10), but "eſſe" (line 19).



German Eszett Ligature

Whilst Antiqua typefaces had come to dominate the printing trade in most of Europe by the early 17th century, blackletter typefaces, derived from the Medieval Gothic script, continued as the normal typeface for printed books in Germany until the 20th century.

What may perhaps be surprising to those unfamiliar with German set in blackletter typefaces is that where German words in modern type are written with an eszett or sharp s (ß), in blackletter typefaces a ligature of the letters long-s and z is employed. My first example comes from Johann Schönsperger's 1490 German Bible, which is set in the Schwabacher typeface favoured during the late 15th and early 16th centuries. Here the "ſz" ligature looks quite similar to the "ſs" ligature, as the top of the 3-shaped letter "z" ligates with the top of the long-s.


Schönsperger-Bibel (Augsburg: Johann Schönsperger, 1490) page b3b

"heiſz" (line 1), "hauſz-fraw" (line 4), "auſz" (lines 7 and 12), "laſze" (line 9), "lieſz" (line 12), "paradeÿſz" (line 13).


My second example, dating from the early 19th century, is set in a traditional Fraktur typeface. In the Fraktur typeface the letter "z" does not join to the top of the long-s, so that the ligature is much more clearly an "ſz" ligature, not an "ſs" ligature.


Der Aufstand der Braunschweiger (Braunschweig: Friedrich Vieweg und Sohn, 1830) p.38

"daſz" (line 1), "groſzen" (line 3), "entſchloſz" (line 5), "gröſzter" (line 6), "Schloſzgarten" (line 8), "Garſzen" (line 10).


My final example is from an early 20th century book, set in what I would describe as a very modern and readable rounded semi-Fraktur typeface.


Hermann Hesse, Aus Indien (Berlin: S. Fischer, 1913) p.11

"groſze" (lines 5 and 8), "daſz" (line 7), "weiſz" (line 10).


When German books were printed in Antiqua typefaces, as they occasionally were from the 17th century onwards, the "ſz" ligature was replaced by the "sharp s" ß. What I'm not entirely certain about is whether the ß in such books was graphically an "ſz" ligature modelled after the Schwabacher form of the "ſz" ligature, or whether it was a borrowing of the "ſs" ligature from the Italic tradition. Unfortunately I haven't got access to any early non-blackletter German books, which might have helped shed some light on the matter. Anyhow, whether the ß was originally conceived of as an "ſz" ligature or an "ſs" ligature, it later became identified as an "ſs" ligature, which is why ß normally uppercases to "SS" rather than "SZ" as might have been expected.
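The "SS" uppercasing can be checked in any environment that implements Unicode's full case mappings. The short Python sketch below assumes a Python 3 interpreter, whose str.upper() applies those mappings. The sample word is taken from the Hesse example above, and the variable names are arbitrary.

```python
import unicodedata

SHARP_S = "\u00df"  # ß
LONG_S = "\u017f"   # ſ

# The formal character names do not commit to either ligature theory.
print(unicodedata.name(SHARP_S))  # LATIN SMALL LETTER SHARP S
print(unicodedata.name(LONG_S))   # LATIN SMALL LETTER LONG S

# ß uppercases to two letters, "SS", not "SZ".
print(SHARP_S.upper())            # SS

# The same word in modern spelling and in a long-s + z transcription.
modern = "gro" + SHARP_S + "e"
blackletter = "gro" + LONG_S + "ze"
print(modern.upper())             # GROSSE
print(blackletter.upper())        # GROSZE
```

The last two lines show the practical consequence: a transcription that spells out the blackletter ligature as long-s plus z uppercases differently from the modern spelling with ß.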



Addendum [2006-07-17]

I have been browsing through some of the many Italian books published during the 16th and 17th centuries that are available on-line from the British Library at Renaissance Festival Books, and I was surprised at how relatively uncommon the ligatured long-s short-s (ß) is in these works. It is found in some books, but only in text printed in italic typeface (e.g. grandißima, meßa, etc. [1539], dignißimi [1549], reuerendiß [1574], deuotißimo [1579], appreßo [1587], neceßaria [1600], Serenißimo [1613]), and in most cases alongside words spelled with double long-s or unligatured long-s short-s or even short-s long-s; there does not appear to be any clear rule as to when ligatured long-s short-s is used in preference to double long-s.

On the other hand, many of the books, especially from the middle of the 16th century onwards, do use a common rule for double s in text set in roman typeface; namely unligatured long-s short-s at the end of a word and in the middle of a word before a letter 'i', but double long-s in the middle of a word before any other vowel. This rule is illustrated in the example below.
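Stated as a procedure, the rule for double s in roman type might be sketched in Python as follows. This is only an illustration under stated assumptions: the function name render_double_s is hypothetical, the input is a modern spelling with "ss", single s is left untouched, and a double s before a consonant (a case the rule above does not cover) is assumed to fall back to double long-s.

```python
import re

LONG_S = "\u017f"  # ſ


def render_double_s(word: str) -> str:
    """Rewrite each 'ss' in a modern spelling according to the rule above."""

    def replace(match):
        # The character immediately after the double s ('' at end of string).
        following = word[match.end():match.end() + 1]
        if not following.isalpha() or following.lower() == "i":
            # End of word (or abbreviation), or before 'i': long-s short-s.
            return LONG_S + "s"
        # Before any other vowel: double long-s.
        # Assumption: consonants are treated the same way.
        return LONG_S + LONG_S

    return re.sub("ss", replace, word)


for modern in ("amatissimo", "Duchessa", "Illustriss."):
    print(modern, "->", render_double_s(modern))
```

The three sample words are the modern spellings of the forms cited from the 1566 book, and the sketch reproduces the printed forms for all three.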


Descrizione della entrata delle serenissima regina Giovanna d'Austria ... (Florence, 1566) p.6

"amatiſsimo" (line 1), "Ducheſſa" (line 2), "Illustriſs." (line 3), etc.