Thursday, 24 November 2005

How many Unicode characters are there ?

Otto Stolz asked on the Unicode List how many Unicode characters there were, classified as control characters, format characters, graphic characters, private use characters, noncharacters, surrogate code points, etc. Now I love Unicode facts, figures and trivia, so I can't resist trying to answer this question.

The "Unicode Version History" utility of BabelMap provides precisely the information requested by Otto for all versions of Unicode from 1.0.0 up to the current version (4.1 when I first wrote this post). This information is tabulated below :

I have now moved the detailed tables and charts of Unicode data to my website at "How many Unicode characters are there ?". If you don't want to go there, the short answer is 128,172 for Unicode 9.0 (released June 2016).

[Last updated : 2016-06-22]


Suz said...

What about Phoenician? Is it in or out?


Andrew said...

In. Please see today's blog post "What's new in Unicode 5.0".

crasshopper said...

Awesome. I was just thinking about ways to come up with easy-to-imagine large numbers. Lots of people know what Unicode is, so now I can say there are approx 120,000 Unicode symbols, and the number of possible permutations of the Unicode "alphabet" would be 120,000!, a number with 557,389 digits!

Goodbye, protein example!
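
The size of 120,000! can be estimated without computing the factorial itself. The following is a minimal sketch, assuming Python's standard `math.lgamma`, which returns ln(n!) when called with n + 1; the helper name `factorial_digit_count` is made up for illustration. By this estimate log10(120,000!) is roughly 557,389.35, which would give 557,390 decimal digits, one more than the figure quoted above, so that figure may be the power of ten rather than the digit count.

```python
import math


def factorial_digit_count(n):
    """Estimate the number of decimal digits in n! (hypothetical helper)."""
    # lgamma(n + 1) is ln(n!); dividing by ln(10) converts it to log10(n!).
    log10_factorial = math.lgamma(n + 1) / math.log(10)
    # A positive integer x has floor(log10(x)) + 1 decimal digits.
    return math.floor(log10_factorial) + 1


if __name__ == "__main__":
    print(factorial_digit_count(120_000))
```

Working with logarithms avoids building a number of more than half a million digits just to measure its length.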

Taro said...

According to the Unicode website, there are 107,156 graphic characters in Unicode 5.2.

In your article, the number is 107,154.

Do you know the reason for this difference?

Andrew West said...

Hi Taro,

I have checked the figures, and I can confirm that my table is correct.

My table gives 107,154 graphic characters and 142 format characters for Unicode 5.2, whereas the Unicode page gives 107,156 graphic characters and 140 format characters, which is the same total but a different distribution between graphic and format characters. The Unicode web site's figure of 140 format characters only takes into account the 140 characters with general category = Cf, whereas the Unicode Standard ch. 2 Table 2-3 defines format characters as those characters with gc=Cf|Zl|Zp. My figure of 142 for format characters is calculated as 140 Cf + 1 Zl + 1 Zp, and my figure for graphic characters excludes the two Zl/Zp characters.

I will report the issue with the statistics to Unicode.
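
The two ways of counting format characters can be compared with a short script. This is only a sketch, assuming Python's standard `unicodedata` module; it reports figures for whichever Unicode version that module bundles (see `unicodedata.unidata_version`), not for Unicode 5.2, so it will not reproduce the 140 and 142 above. The function name `count_general_categories` is made up for illustration.

```python
import sys
import unicodedata
from collections import Counter


def count_general_categories():
    """Count code points per general category (hypothetical helper)."""
    counts = Counter()
    for code_point in range(sys.maxunicode + 1):
        counts[unicodedata.category(chr(code_point))] += 1
    return counts


counts = count_general_categories()

# Counting gc=Cf only, versus the Table 2-3 definition gc=Cf|Zl|Zp.
cf_only = counts["Cf"]
format_chars = counts["Cf"] + counts["Zl"] + counts["Zp"]

if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    print("Cf only:", cf_only)
    print("Cf + Zl + Zp:", format_chars)
```

Zl and Zp each contain a single character (U+2028 LINE SEPARATOR and U+2029 PARAGRAPH SEPARATOR), so the two totals should differ by exactly two, matching the difference between 140 and 142.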

Andrew West said...

Just to clarify, the Unicode 5.2 Table 2-3 also defines format characters as Cf + Zl + Zp.

Interestingly, the Unicode 6.0 page gives the correct figures for format characters, and notes that they are derived from Cf + Zl + Zp.

Taro said...

I understood very well.
Thank you so much.

Tomi Adewole said...

Awesome...but is there a page where one can actually see (and thus copy-and-paste) these unicode images? Thanks in advance...

Andrew West said...

PDF code charts covering all Unicode characters are available from the Unicode Consortium website (here).