Sunday, 6 April 2008

BabelMap Version 5.1.0.0

To coincide with Friday's release of Unicode 5.1.0 I am releasing an updated version of BabelMap which supports all 100,713 characters encoded in Unicode 5.1 (1,624 new characters and 11 new scripts).

In addition to the support for Unicode 5.1 this version also has the following improvements (most of which I only added in the last week, which is why it was released two days late). However, I am still working on a major new version for release later in the year which will solve (what I consider to be) the main problem with BabelMap: the fact that the "edit buffer" only supports a single font, and so text in multiple scripts may display badly or as boxes.

1. A new "Font Info" dialog box has been added (available from the Tools menu or as a button in the Font Analysis utility). This gives detailed information about the currently selected font: at present, all the information from the font's NAME table (for all platforms, encodings and languages supported by the font) and a list of all CMAP subtables in the font. This is my first experiment in providing information directly from the font tables, and in the future I might include more information from other tables if there is a demand. You can find out some very interesting things about your fonts from this dialog; for example I was very surprised to see just how many fonts there are that have a Unicode 1.0 or 1.1 semantics CMAP subtable, even though I very much doubt that the subtable mappings really do accord with Unicode 1.0 or 1.1 (i.e. Hangul symbols are mapped to where CJK-A now is).
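For readers who want to look at the CMAP subtable list themselves, here is a minimal Python sketch (not BabelMap's code) that reads the table directory of a single TrueType/OpenType file and lists the platform ID, encoding ID and format of each CMAP subtable. The function names are made up for illustration, and the sketch assumes a plain .ttf/.otf file rather than a TrueType Collection. The encoding names for platform 0 follow the OpenType specification, where encoding IDs 0 and 1 are the "Unicode 1.0 semantics" and "Unicode 1.1 semantics" entries mentioned above.

```python
import struct

# Encoding IDs for platform 0 (Unicode), as listed in the OpenType spec.
UNICODE_PLATFORM_ENCODINGS = {
    0: "Unicode 1.0 semantics",
    1: "Unicode 1.1 semantics",
    2: "ISO/IEC 10646 semantics",
    3: "Unicode 2.0+ (BMP only)",
    4: "Unicode 2.0+ (full repertoire)",
}


def list_cmap_subtables(font: bytes):
    """Return (platform_id, encoding_id, format) for each cmap subtable.

    Assumes `font` holds a single sfnt font (not a .ttc collection).
    """
    # Offset table: sfntVersion (4 bytes), numTables (2), then 6 more bytes.
    num_tables = struct.unpack_from(">H", font, 4)[0]
    for i in range(num_tables):
        # Each table record: tag, checksum, offset, length (16 bytes).
        tag, _checksum, offset, _length = struct.unpack_from(
            ">4sIII", font, 12 + 16 * i)
        if tag == b"cmap":
            break
    else:
        return []  # no cmap table in this font

    # cmap header: version (2 bytes), numTables (2 bytes).
    _version, count = struct.unpack_from(">HH", font, offset)
    subtables = []
    for i in range(count):
        # Encoding record: platformID, encodingID, offset from cmap start.
        platform_id, encoding_id, sub_offset = struct.unpack_from(
            ">HHI", font, offset + 4 + 8 * i)
        # Every subtable begins with a 16-bit format number.
        fmt = struct.unpack_from(">H", font, offset + sub_offset)[0]
        subtables.append((platform_id, encoding_id, fmt))
    return subtables


def describe_encoding(platform_id: int, encoding_id: int) -> str:
    """Give a readable label; only platform 0 is spelled out here."""
    if platform_id == 0:
        return UNICODE_PLATFORM_ENCODINGS.get(encoding_id, "other Unicode encoding")
    return f"platform {platform_id}, encoding {encoding_id}"
```

To try it on a real font, read the file into memory first, e.g. `list_cmap_subtables(open("SomeFont.ttf", "rb").read())`, where the file name is only a placeholder.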



2. The Composite Font configuration dialog ("Configure" button next to the "Composite Font" radio button) has been improved and simplified (largely in response to suggestions by John Cowan). There is now a simple correspondence between a single Unicode block selected and a list of fonts that are available for mapping to that block. This makes the configuration tool much easier to use, although it does mean that it is no longer possible to map a single font to multiple Unicode blocks in a single operation. The list of fonts covering a particular Unicode block is now also sortable by name or by number of characters that they cover, which should make it easier to find the font with the best coverage for any block.



I have also added an "Auto" button that will attempt to automatically configure the best composite font by mapping the font with the best coverage for each block, whilst at the same time using as few different fonts as possible. The results produced may not always be brilliant because the number of characters in a font's CMAP table is not necessarily the best indicator that the font has good coverage and support for a particular Unicode subset, especially for complex scripts. Another problem is that some fonts distort their actual coverage by including explicit blank or not defined glyphs for characters that they don't cover, which may make them seem as if they have good coverage, when in fact they don't (for example "Ming(for ISO10646)" has mappings for all 6,582 CJK-A characters but only a handful of them are non-blank). To avoid running the risk of getting every block mapped to a last resort font, I have explicitly excluded from the auto-configuration process any font which includes the string "LastResort" or "fallback" in its name.
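The general idea of the "Auto" button can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code BabelMap uses: the input dictionaries, the function names, the case-insensitive name check and the tie-breaking rule (prefer a font that has already been chosen for another block) are all my own simplifications of "best coverage with as few fonts as possible".

```python
from typing import Dict, Set, Tuple

Block = Tuple[int, int]  # (first code point, last code point), inclusive

# Fonts whose names contain these strings are skipped (matched
# case-insensitively here, which is an assumption of this sketch).
EXCLUDED_SUBSTRINGS = ("lastresort", "fallback")


def is_excluded(font_name: str) -> bool:
    lowered = font_name.lower()
    return any(s in lowered for s in EXCLUDED_SUBSTRINGS)


def block_coverage(mapped: Set[int], block: Block) -> int:
    """Count the code points in `block` that the font's cmap maps."""
    first, last = block
    return sum(1 for cp in mapped if first <= cp <= last)


def auto_configure(blocks: Dict[str, Block],
                   fonts: Dict[str, Set[int]]) -> Dict[str, str]:
    """Greedily pick one font per block.

    For each block the font mapping the most characters wins; ties go to
    a font already used for an earlier block, to keep the font count low.
    """
    candidates = {n: m for n, m in fonts.items() if not is_excluded(n)}
    chosen: Dict[str, str] = {}
    used: Set[str] = set()
    for block_name, block in blocks.items():
        best_name, best_key = None, None
        for name in sorted(candidates):
            count = block_coverage(candidates[name], block)
            if count == 0:
                continue  # font does not touch this block at all
            key = (count, name in used)
            if best_key is None or key > best_key:
                best_name, best_key = name, key
        if best_name is not None:
            chosen[block_name] = best_name
            used.add(best_name)
    return chosen
```

Note that this counts cmap entries only, so it inherits exactly the weakness described above: a font that maps a block to blank glyphs will still look like it has excellent coverage.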

And as a final touch I have added coverage statistics for the current configuration—a prize to the first person to achieve 100% coverage!


3. Related to the changes in the way the Font Configuration dialog works, I have also improved the way that the default font mappings are assigned the first time that the application is run. This means that there may be a delay of several seconds the first time BabelMap is run (and also the first time it is run after upgrading from a version of BabelMap that supports a prior version of Unicode). This time is used to auto-configure the composite font and determine which font on your system has the greatest coverage, so that it can be set as the initial single font.


4. The Character Properties dialog (the "?" button or F9) has been extended to include the following additional information about characters:

  • XID_Start and XID_Continue have been added to the list of binary properties for each character.
  • Joining Type and Joining Group (for Arabic and related scripts) have been added.
  • All ideographic variation sequences (IVS) that are defined in the Ideographic Variation Database (currently only the Adobe-Japan1 collection) are listed under both the relevant base character [a CJK unified ideograph] and the relevant variation selector [VS17 through VS31] (e.g. <9089 E010E> [Adobe-Japan1 CID+20233] is listed under both U+9089 and U+E010E).
  • All currently defined named sequences and provisional named sequences are listed under the first character in the sequence (e.g. <0045 0329> LATIN CAPITAL LETTER E WITH VERTICAL LINE BELOW is listed under U+0045 LATIN CAPITAL LETTER E).
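The sequence notation used in the two examples above is easy to reproduce. The following Python sketch (the function names are hypothetical, and this is not taken from BabelMap) formats a sequence of code points in the <XXXX XXXX> style and works out which numbered variation selector a code point is, using the two variation selector ranges defined in Unicode: VS1 to VS16 at U+FE00..U+FE0F and VS17 to VS256 at U+E0100..U+E01EF.

```python
from typing import Iterable, Optional


def variation_selector_number(cp: int) -> Optional[int]:
    """Return N for VARIATION SELECTOR-N, or None if cp is not one."""
    if 0xFE00 <= cp <= 0xFE0F:       # VS1..VS16 (BMP)
        return cp - 0xFE00 + 1
    if 0xE0100 <= cp <= 0xE01EF:     # VS17..VS256 (supplement)
        return cp - 0xE0100 + 17
    return None


def format_sequence(code_points: Iterable[int]) -> str:
    """Format code points as hex (at least four digits) in angle brackets."""
    return "<" + " ".join(f"{cp:04X}" for cp in code_points) + ">"


def sequence_to_text(code_points: Iterable[int]) -> str:
    """Build the actual string for a sequence, e.g. to paste into an editor."""
    return "".join(chr(cp) for cp in code_points)
```

For the IVS example above, `variation_selector_number(0xE010E)` gives 31, i.e. VS31, and `format_sequence([0x9089, 0xE010E])` gives the same `<9089 E010E>` notation used in the dialog.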

5. The character grid font size can now be adjusted from the new "Fonts" menu. Generally speaking, most glyphs for most fonts fit in their cell comfortably at the default font size, but some fonts have glyphs that are smaller or larger than typical at the default font size, and may not display well. This new feature allows you to adjust the font size used for the character grid display if you are having display problems.



Latest Version [2008-06-12]

BabelMap Version 5.1.0.4 incorporates a number of minor bug fixes and improvements to the user interface, as well as workarounds for fonts with invalid data (e.g. Caslon Roman/Italic, Matisse ITC 1.00). The most important change is to fix a bug that causes BabelMap to become unresponsive if you have installed Apple's Last Resort Font. If you are considering installing the Last Resort Font you should first upgrade to the latest versions of BabelMap and BabelPad.


38 comments:

John Cowan said...

Well, I am no longer on Windows, but BabelMap works pretty well on Wine as well. One thing that doesn't work, ironically, is the composite font feature: it isn't able to find any fonts that contain the block you ask about -- presumably because some Windows API is unimplemented.

Andrew West said...

That's a bummer. The most likely culprit is GetFontData(), but if this isn't implemented in Wine then I can imagine a lot of things in BabelMap not working properly.

I am trying to get away from a reliance on the Windows API, and in the next version the code will be far more portable, though I'm afraid that it'll never run natively on Linux. But anyway, please do keep me informed of things that work or don't under Wine.

jedi787plus said...

OK, now someone just needs a BMP-only version of BabelMap for his/her Pocket PC, since all those Pocket PC character-map utilities out there either lack functionality (font selection, codepoint-value/ISO-10646-name info, etc.) or are simply rubbish. Any idea when such a "BabelMap Mobile" might become available for Pocket PCs? (All Pocket PCs have been Unicode-based ever since their inception back in 2000, unlike Palm PDAs, for which I still doubt there is any Unicode support today.)

Andrew West said...

Leroy,

I'll look into the possibility of a simplified version of BabelMap for Pocket PC (surrogates should not be a problem).

Andrew West said...

I have fixed the main problem of BabelMap running under Wine (a redundant reliance on the Unicode coverage bits in the OS/2 table, which the Wine implementation of the function I call always returns as all being set to zero).

However there is one more problem with the display under Wine that needs fixing before I release an update -- this is where BabelMap gets font metrics indirectly via the Windows API ... but if I read the data directly from the font table using GetFontData() it should work I think.

అనుపమ said...

I have downloaded it and checked for the Characters 0C58 and 0C59 and could not find them. In their places only the rectangle box is appearing. Further also I could not find any of the new Characters introduced for this language Telugu. Why is it so.

Andrew West said...

You need to have a font installed that supports these characters.

Please download and install the latest version of the Code2000 font which supports many of the new Unicode 5.1 characters, including Telugu.

Once Code2000 1.17 is installed, run BabelMap, and either:

a) click on "Single Font" and select "Code2000" from the font list

or

b) click on "Composite Font", click on the "Configure" button and either manually set Code2000 as the font to use for Telugu or simply click on the "Auto" button (and then hit "OK" to close the dialog)

If you do this you should be able to see the new Telugu characters.

asmodai said...

I love the coverage checker Andrew!

jedi787plus said...

The new version of BabelMap fails to run on my Windows Vista computer. Whenever I try to run it (even right-clicking and choosing to run it as administrator), I get the infamous "has stopped working" error message.

jedi787plus said...

OK, it was a problem with Windows, not with BabelMap. I re-installed Windows Vista and the new BabelMap works fine. Sorry for the alarm.

Andrew West said...

I'm glad to hear it. Sorry about not replying to your comment but I don't currently have a Vista system to test on, so I couldn't really say anything. Anyhow, I ordered a new laptop a few days ago, so I have all the delights of Vista to look forward to!

Chris Weber (Casaba Security) said...

Hi Andrew, I just wanted to say thanks for making such a killer tool. I've used BabelMap quite a bit in security testing, and to help me understand and research Unicode. It's also helped me find some flaws recently in popular web browsers which the vendor should release advisories about. Thanks for making BabelMap available.

tty01 said...

99.6%!? I need your CompositeFont.xml :-)

Andrew West said...

Before 5.1 was released I was going to write a post entitled How to Get 100% Unicode Coverage with links to all the fonts that I use (freeware and shareware only). But then 5.1 came along, and my coverage went right down (not that I could actually achieve 100% coverage with 5.0 -- I have yet to find a single Unicode Balinese font).

Now that there are quite a few fonts that cover 5.1 (see footnote to this post) perhaps it is time to revise my font list and publish the post.

jedi787plus said...

Yes, I haven't found any Unicode fonts for Balinese either, and even now with Unicode 5.1, I still have to see fonts for Sundanese and Lepcha. Also, I still need to find fonts that cover: the 28 Uni5.1 additions to the Combining Diacritic Marks Supplement region; the complete Cyrillic Extension A repertoire; the Uni5.1 additions to the Arabic and Arabic Supplement areas; 8 Tibetan codepoints not covered in all the other Tibetan fonts; and the single Uni5.1 char added to Mongolian. I would also like an Arabic font with all Unicode-defined presentation forms (especially those complex yet beautiful ligatures; Arial Unicode MS's Arabic presentation ligatures are not proper ligatures at all).

Andrew West said...

It's still a bit too soon to be expecting good support for Unicode 5.1. I'm sure James Kass will be working on Lepcha and Sundanese, but unfortunately he does not release new versions of Code2000 very frequently. Your best bet for combining diacritical marks and Cyrillic extensions would be to put in a request to the DejaVu project.

Andrew West said...

The new version of Everson Mono released today has complete 5.1 coverage of Combining Diacritical Marks and Cyrillic Extended A and B.

jedi787plus said...

I stumbled upon a new serifed font that covers all Latin/Greek/Cyrillic Uni5.1 characters (including Cyrillic Extensions A & B):

http://kodeks.uni-bamberg.de/AKSL/Schrift/RomanCyrillicStd.htm

Well, actually, it's a pair of fonts that are essentially the same except for the name.

Latest version of DejaVu still has no Cyrillic Extension A and its Extension B coverage is very small at 24 chars. So for sans-serifed fonts I'm out of luck, will have to stick to serifed CampusRoman Std for full Cyrillic. I just hope Bill Gates's successor adds full Cyrillic for "Windows 7" (or shall I call it Windows Vista Plus? It's really Windows ver.6.1, not Windows ver.7.0 as someone would assume, just like Windows Mobile 6.1 is actually Windows CE 5.2 Pocket PC Edition)

Still no luck with Balinese, Lepcha, Sundanese, or Arabic Uni5.1 additions, though...

Andrew West said...

Support for the Cyrillic extensions seems to be growing. I maintain a list of fonts with Unicode 5.1 coverage at the bottom of my What's new in Unicode 5.1 ? page, which includes a link to the pair of fonts that you mention.

The problem with minor scripts such as Balinese, Lepcha and Sundanese is that if the proposers of the script in question do not release a Unicode font then nobody other than James Kass will. If I had my way, new characters would only be accepted for encoding on the condition that the proposers put a Unicode font for the proposed new characters in the public domain.

PapaNadin said...

Hello,

As for Sundanese font, please try this implementation:
http://jamparing.sytes.net/aksara/files/font/SundaneseUnicode-1.0.3.ttf

Andrew West said...

Thanks, that is very useful. I have now added it to my list of Unicode 5.1 fonts in What's new in Unicode 5.1 ?

Vlatko said...

Man your program is amazing. Portable and has all possible info you need to play with fonts.

I really like that it is possible to look for fonts with Cyrillic page included.

What I wanted to ask...can you..you can..but please..will you? do a small font viewer..something like the normal MS Viewer in XP but with custom unicode string? and maybe some more info on the top. It would be great for a new downloaded fonts..fast inspection on double click! XP Viewer is very limited and does not show Cyrillic letters

THANKS

jedi787plus said...

I fear Andrew is too busy working on a small Windows-Mobile version of BabelMap, if that is the case.

Also, the Sundanese Aksara font seems to be no longer available - the link no longer works.

Andrew West said...

I'm afraid that I have been too busy on other stuff (basically just Tangut and Tangut) over the last six months to even look at BabelMap or BabelPad. There is zero chance of me being able to create any BabelMap derivatives, and the best I can hope for is that I will be able to get the new version of BabelMap ready in time for the release of Unicode 5.2 in the Fall. I'm hoping to be able to put aside six months from May to work on BabelMap and BabelPad, but I'm getting squeezed from all sides, so I really don't know whether it will work out or not ...

Andrew West said...

a small font viewer..something like the normal MS Viewer in XP but with custom unicode string? and maybe some more info on the top.

I did have plans for a font explorer application that would do as you wanted, and about a year back started work on it, but it has all fallen by the wayside. At present I really can't say whether I will be able to get back to work on it or not.

Incidentally, the most frequent application request I get is for a command line version of BabelPad (i.e. a command line app that does all the encoding conversions and text transformations that BabelPad supports).

Andrew West said...

the Sundanese Aksara font seems to be no longer available - the link no longer works.

Try http://sabilulungan.org/aksara/ (I will update the link on my Unicode 5.1 page)

Vlatko said...

Ok I will make my request MUCH easier. When you will make the new version of the program (or semi next)...add this..please...option to accept font file as parameter.. %1 thing. Then treat that file as file/add uninstalled font..it will make double clicking on a font file to auto open in babel map...and an option ..maybe just external plain utf-8 text file that will contain a sentence that will be reopened in edit buffer on every new font. It will do the job. No need of registry settings.

THANKS

hotaru said...

And as a final touch I have added coverage statistics for the current configuration—a prize to the first person to achieve 100% coverage!

i've managed to get 99.8% coverage... the only blocks that aren't completely covered with my current configuration are the Variation Selectors Supplement and the two Supplementary Private Use Areas... and i'm pretty sure those shouldn't count anyway.

Andrew West said...

That's pretty good -- the best I can get is 99.7%. The PUA areas don't count, but the Variation Selectors Supplement does (there is at least one free font that covers it). In addition, some fonts shouldn't count, for example unifont has 100% coverage of the BMP, but has low quality bitmap glyphs which are not really usable.

I'm planning a blog post on how to get 100% Unicode font coverage soon ... or at least sometime before Unicode 5.2 comes out in the autumn.

hotaru said...

That's pretty good -- the best I can get is 99.7%. The PUA areas don't count, but the Variation Selectors Supplement does (there is at least one free font that covers it).

i'd really like to know what that font is... the best i've been able to find is the various YOzFont* fonts, which only have 16 characters in that block.

In addition, some fonts shouldn't count, for example unifont has 100% coverage of the BMP, but has low quality bitmap glyphs which are not really usable.

ah, i am using unifont for a few blocks... arabic (235/250 without it), Arabic Presentation Forms-A (593/595 without it), Arabic Supplement (30/48 without it), Mongolian (155/156 without it), and Balinese and Lepcha (which i can't find any other fonts for).
if unifont doesn't count then i'm down to 99.5%... i agree that it's not really usable for most things, but for the balinese and lepcha blocks in babelpad it's very useful... unless there are other free fonts that cover those blocks that i don't know of.

Mrvnhc said...

Seem to have found a bug with the latest version (5.1.0.5).
The CompositeFont.xml will corrupt if the fonts contain Chinese characters.

Andrew West said...

Thanks for the bug report. I have just tested this, and can confirm that it is indeed a bug in BabelMap/BabelPad. It will be fixed in the next version of BabelMap, which I anticipate will be released at the end of October, when Unicode 5.2 is released.

Klortho said...

Is there any way to look up, for a *specific character* which fonts on the current system cover that character? I see that I can get lists of which fonts cover various blocks, but since "cover a block" doesn't necessarily mean "has every character in that block", that's not exactly what I want.

Andrew West said...

Not at present. That is one of the most popular feature requests, but I am afraid I have not got round to implementing it yet. It won't be in the new version of BabelMap scheduled for release later today, but I will try to add such a feature to the next update (sometime before the end of this year).

Juergen said...

Having two, or more, different fonts of the same name. One installed, the other(s) not, of course. Is there a way to look at the uninstalled fonts? When I open an uninstalled font I appear to be looking at the installed font. Am I right?

Andrew West said...

If you use File > Add Uninstalled Fonts... then the font you choose will be temporarily installed for the current instance of BabelMap only. If a font with exactly the same name already exists then they will both be viewable from BabelMap -- the newly added font will be the one that is currently selected.

BUT, there is no way for either the user or BabelMap to know for sure which of the two identically named fonts the characters shown by BabelMap come from. BabelMap gets the font data from Windows using the font name, so it is quite possible (or even probable) that the font data returned will always be for the permanently installed font. The same is true if you have two permanently installed identically named fonts (this can happen if you have an independent font, e.g. batang.ttf installed as well as the same-named font as part of a True Type Collection, e.g. batang.ttc).

Theoretically BabelMap could get round this by reading the font data directly from file rather than getting it from Windows, but that would be a lot of extra work for me, for very little benefit.

In summary, you are advised not to install two or more fonts with the same name at the same time.

jedi787plus said...

I once saw a utility that changed the name (not the filename) of a TrueType or OpenType font, maybe you can search for it (unfortunately I don't remember what it was called, that was about three or four years ago); however, I don't know if it worked for copyrighted fonts (or fonts with special restrictions/licenses built-in)

Juergen said...

This was just meant as a simple question since a Help file does not exist.

I know I can use other programs(*) or uninstall or even rename it.
But in this case I am using the font and, because of another program, am forced to keep the name.
The font is still being developed, so I wanted to see what is being changed as I get newer versions. Uninstalling and re-installing is just a hassle I wanted to avoid.

(*)But as this font goes beyond Latin (it has Greek, Hebrew and other) characters, and many glyph view programs show only 128 characters, I haven't found the perfect program yet.