Saturday, 16 April 2011

Unicode 6.0 — One character at a time

A recent YouTube video by jörg piringer that scrolls through "all" 49,571 Unicode characters in 33 minutes and 16 seconds (about 25 characters a second) has been doing the rounds, but I'm afraid that I was not impressed. The 49,571 characters in the video only cover the Basic Multilingual Plane (BMP), and even then the video is 5,000 characters short: it misses out most of the characters that have been added to Unicode over the past ten years, and misses out entirely some scripts that have been in Unicode since Year Zero.

Unicode version 6.0 (released October 2010) actually defines 109,384 characters (109,244 graphic and 140 format characters). How many of them you are able to see depends upon your operating system, your browser, and whether you have additional fonts installed covering obscure and recently encoded scripts and characters (and whether your browser will actually apply those fonts or not). On my Windows 7 SP1 machine, with no additional fonts installed, I can see 95,372 of these 109,384 characters. That is 87.2% of the total number of characters, but only 66 of the 203 blocks are fully covered, and 85 blocks have no coverage at all.

Anyway, I've made my own attempt at a JavaScript-based "video" that goes through the entire 109,384 characters in Unicode 6.0 at this page, which you can launch in a new window by clicking on the image below. Then if you have 3 hours 2 minutes and 18 seconds to spare (at the default 10 characters per second) just hit the "Start" button, and see how well your system does. By default it lets your browser choose what fonts to use. If you have additional fonts installed covering obscure scripts and recent Unicode additions, but your browser is not applying them, then try checking the "Use Custom Fonts" box. The page will then apply a custom list of pan-Unicode and script-specific fonts for each block, which can give up to 99% coverage of the 109,384 characters if you have the appropriate fonts installed.
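
For anyone who wants to check the timings quoted above, here is a small sketch of the arithmetic. The function names `playingTime` and `charactersPerSecond` are made up for this illustration and are not taken from the slide show page itself.

```javascript
// Time needed to show `count` characters at `rate` characters per second.
// Any fraction of a second left over is dropped.
function playingTime(count, rate) {
  const totalSeconds = Math.floor(count / rate);
  const hours = Math.floor(totalSeconds / 3600);
  const minutes = Math.floor((totalSeconds % 3600) / 60);
  const seconds = totalSeconds % 60;
  return { hours, minutes, seconds };
}

// Average display rate for `count` characters shown over `durationSeconds`.
function charactersPerSecond(count, durationSeconds) {
  return count / durationSeconds;
}

// All of Unicode 6.0 at the default 10 characters per second.
console.log(playingTime(109384, 10)); // { hours: 3, minutes: 2, seconds: 18 }

// The 49,571 characters of the video, shown in 33 minutes 16 seconds.
console.log(charactersPerSecond(49571, 33 * 60 + 16).toFixed(1)); // 24.8
```

So the video's "25 characters a second" is a rounded figure, and the three-hour running time follows directly from the character count and the default rate.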

[2011-04-17 Update: to scroll through random characters from random blocks check the "Random Characters" checkbox before hitting "Start"; to view a single random character hit the "Random Character" button; to view a specific Unicode character enter its hexadecimal code point value in the text box and hit "Go To".]
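
Turning a hexadecimal code point value into a character might look something like the sketch below. It is only an illustration: `characterFromHex` is a hypothetical name, accepting an optional "U+" prefix is an assumption, and `String.fromCodePoint` is a standard function in modern JavaScript, so this is not necessarily how the page does it.

```javascript
// Parse a hexadecimal code point such as "A1B2" or "U+1F600" and return the
// corresponding character as a string, or null if the input is not usable.
function characterFromHex(input) {
  const hex = input.trim().replace(/^U\+/i, ""); // optional "U+" prefix (an assumption)
  if (!/^[0-9A-Fa-f]{1,6}$/.test(hex)) return null; // not a hexadecimal number
  const codePoint = parseInt(hex, 16);
  if (codePoint > 0x10FFFF) return null; // beyond the Unicode code space
  if (codePoint >= 0xD800 && codePoint <= 0xDFFF) return null; // surrogate code points
  return String.fromCodePoint(codePoint);
}

console.log(characterFromHex("41"));             // A
console.log(characterFromHex("U+1F600").length); // 2 (stored as a surrogate pair)
console.log(characterFromHex("D800"));           // null
```

Characters outside the BMP occupy two UTF-16 code units in a JavaScript string, which is why the length in the second example is 2 even though it is a single character.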

[2011-04-19 Update: to find a character by whole or partial name (regular expressions not currently supported) enter the name in the text box and hit "Search For"; if not searching by exact name, keep hitting "Search For" to find the next matching character.]
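
The "keep hitting to find the next match" behaviour can be sketched as a plain substring search that remembers where the last match was. Everything here is assumed for illustration: the four-entry `NAMES` table stands in for the full list of character names, `findNext` is a hypothetical name, and ignoring case and wrapping round to the start are guesses rather than documented behaviour of the page.

```javascript
// A tiny illustrative table of [code point, character name] pairs.
const NAMES = [
  [0x0041, "LATIN CAPITAL LETTER A"],
  [0x0061, "LATIN SMALL LETTER A"],
  [0x0391, "GREEK CAPITAL LETTER ALPHA"],
  [0x03B1, "GREEK SMALL LETTER ALPHA"],
];

// Return the index of the next entry whose name contains `query`, starting
// just after `lastIndex` and wrapping round to the start of the table.
// Returns -1 if the query is empty or nothing matches.
function findNext(query, lastIndex = -1) {
  const needle = query.trim().toUpperCase();
  if (needle === "") return -1;
  for (let step = 1; step <= NAMES.length; step++) {
    const i = (lastIndex + step) % NAMES.length;
    if (NAMES[i][1].includes(needle)) return i;
  }
  return -1;
}

const first = findNext("small");         // index 1: LATIN SMALL LETTER A
const second = findNext("small", first); // index 3: GREEK SMALL LETTER ALPHA
console.log(NAMES[first][1], "/", NAMES[second][1]);
```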

[2011-04-21 Update: fixed skip blocks bug; added formal aliases; slide show page now accepts parameters to initially show a particular character (?char=A1B2), show a random character (?char=random), search for a given character name or part of a name (?find=string), or find an exact character name (?name=character name).]
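
As a rough idea of how a page might interpret these parameters, here is a sketch using the standard `URLSearchParams` interface. The function name `parseRequest`, the shape of the objects it returns, and the order in which the parameters are checked are all assumptions for the sake of the example.

```javascript
// Decide what to show from a query string such as "?char=A1B2".
function parseRequest(queryString) {
  const params = new URLSearchParams(queryString);
  if (params.has("char")) {
    const value = params.get("char");
    if (value.toLowerCase() === "random") return { action: "random" };
    // parseInt gives NaN for non-hexadecimal input; callers should check for that.
    return { action: "show", codePoint: parseInt(value, 16) };
  }
  if (params.has("find")) return { action: "find", text: params.get("find") };
  if (params.has("name")) return { action: "name", text: params.get("name") };
  return { action: "none" };
}

console.log(parseRequest("?char=A1B2"));   // { action: 'show', codePoint: 41394 }
console.log(parseRequest("?char=random")); // { action: 'random' }
console.log(parseRequest("?name=LATIN%20SMALL%20LETTER%20A").text); // LATIN SMALL LETTER A
```

`URLSearchParams` takes care of percent-decoding, so a character name containing spaces arrives as ordinary text.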