Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world’s writing systems. Developed in conjunction with the Universal Coded Character Set (UCS) standard and published as The Unicode Standard, the latest version of Unicode contains a repertoire of more than 128,000 characters covering 135 modern and historic scripts, as well as multiple symbol sets. The standard consists of a set of code charts for visual reference, an encoding method and set of standard character encodings, a set of reference data files, and a number of related items, such as character properties, rules for normalization, decomposition, collation, rendering, and bidirectional display order (for the correct display of text containing both right-to-left scripts, such as Arabic and Hebrew, and left-to-right scripts).[1] As of June 2016, the most recent version is Unicode 9.0. The standard is maintained by the Unicode Consortium.
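The character properties, decomposition and normalization rules, and bidirectional classes mentioned above ship as reference data files (the Unicode Character Database). Python's standard `unicodedata` module exposes several of them, so a minimal sketch can show what they look like. One caveat: the module reflects whichever Unicode version the interpreter was built with (`unicodedata.unidata_version`), which may be newer than 9.0.

```python
import unicodedata

# Version of the Unicode Character Database bundled with this Python build.
print("UCD version:", unicodedata.unidata_version)

e_acute = "\u00e9"  # é as a single precomposed code point

# Character properties: formal name and general category.
print(unicodedata.name(e_acute))      # LATIN SMALL LETTER E WITH ACUTE
print(unicodedata.category(e_acute))  # Ll (lowercase letter)

# Canonical decomposition: base letter e plus COMBINING ACUTE ACCENT.
print(unicodedata.decomposition(e_acute))  # 0065 0301

# Normalization: NFD splits the precomposed form, NFC recombines it.
nfd = unicodedata.normalize("NFD", e_acute)
nfc = unicodedata.normalize("NFC", "e\u0301")
print(len(nfd), nfc == e_acute)  # 2 True

# Bidirectional class consulted by the bidi algorithm:
# L (left-to-right Latin), R (Hebrew), AL (Arabic letter).
for ch in ("a", "\u05d0", "\u0627"):
    print(f"U+{ord(ch):04X}", unicodedata.bidirectional(ch))
```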
Unicode’s success at unifying character sets has led to its widespread and predominant use in the internationalization and localization of computer software. The standard has been implemented in many recent technologies, including modern operating systems, XML, Java (and other programming languages), and the .NET Framework.
Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8, UTF-16 and the now-obsolete UCS-2. UTF-8 uses one byte for any ASCII character, all of which have the same code values in both UTF-8 and ASCII encoding, and up to four bytes for other characters. UCS-2 uses a 16-bit code unit (two 8-bit bytes) for each character but cannot encode every character in the current Unicode standard. UTF-16 extends UCS-2, using one 16-bit unit for the characters that were representable in UCS-2 and two 16-bit units (4 × 8 bits) to handle each of the additional characters.
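The byte layouts described above take only a few lines to write out. The two helpers below encode a single code point by hand. Their names, `utf8_encode` and `utf16_units`, are hypothetical and not part of any library. In real code you would call `str.encode("utf-8")` or `str.encode("utf-16-be")`, and the results here should match those codecs.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value as 1 to 4 UTF-8 bytes."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:      # 0xxxxxxx: ASCII keeps its one-byte value
        return bytes([cp])
    if cp < 0x800:     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def utf16_units(cp: int) -> list[int]:
    """Return the 16-bit UTF-16 code units for one scalar value."""
    if cp < 0x10000:   # BMP: one unit, the same value UCS-2 would use
        return [cp]
    v = cp - 0x10000   # 20 bits, split 10/10 into a surrogate pair
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]


for ch in ("A", "\u00e9", "\u20ac", "\U0001F600"):
    cp = ord(ch)
    units = " ".join(f"{u:04X}" for u in utf16_units(cp))
    print(f"U+{cp:04X}  UTF-8: {utf8_encode(cp).hex(' ')}  UTF-16: {units}")
```

For U+1F600, which lies outside UCS-2's range, this gives the four UTF-8 bytes f0 9f 98 80 and the UTF-16 surrogate pair D83D DE00.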

The Unicode Consortium

The Unicode Consortium enables people around the world to use computers in any language. Our freely available specifications and data form the foundation for software internationalization in all major operating systems, search engines, applications, and the World Wide Web. An essential part of our mission is to educate and engage academic and scientific communities, and the general public.


Unicode Character Range

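The table lists the bit assignments of the 128-bit Unicode range field (ulUnicodeRange1 to ulUnicodeRange4) in the OS/2 table of OpenType fonts. A row without a bit number belongs to the nearest numbered row above it. All ranges are hexadecimal code points.

Bit  Block name  Code point range (hex)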
0 Basic Latin 0000-007F
1 Latin-1 Supplement 0080-00FF
2 Latin Extended-A 0100-017F
3 Latin Extended-B 0180-024F
4 IPA Extensions 0250-02AF
Phonetic Extensions 1D00-1D7F
Phonetic Extensions Supplement 1D80-1DBF
5 Spacing Modifier Letters 02B0-02FF
Modifier Tone Letters A700-A71F
6 Combining Diacritical Marks 0300-036F
Combining Diacritical Marks Supplement 1DC0-1DFF
7 Greek and Coptic 0370-03FF
8 Coptic 2C80-2CFF
9 Cyrillic 0400-04FF
Cyrillic Supplement 0500-052F
Cyrillic Extended-A 2DE0-2DFF
Cyrillic Extended-B A640-A69F
10 Armenian 0530-058F
11 Hebrew 0590-05FF
12 Vai A500-A63F
13 Arabic 0600-06FF
Arabic Supplement 0750-077F
14 NKo 07C0-07FF
15 Devanagari 0900-097F
16 Bengali 0980-09FF
17 Gurmukhi 0A00-0A7F
18 Gujarati 0A80-0AFF
19 Oriya 0B00-0B7F
20 Tamil 0B80-0BFF
21 Telugu 0C00-0C7F
22 Kannada 0C80-0CFF
23 Malayalam 0D00-0D7F
24 Thai 0E00-0E7F
25 Lao 0E80-0EFF
26 Georgian 10A0-10FF
Georgian Supplement 2D00-2D2F
27 Balinese 1B00-1B7F
28 Hangul Jamo 1100-11FF
29 Latin Extended Additional 1E00-1EFF
Latin Extended-C 2C60-2C7F
Latin Extended-D A720-A7FF
30 Greek Extended 1F00-1FFF
31 General Punctuation 2000-206F
Supplemental Punctuation 2E00-2E7F
32 Superscripts And Subscripts 2070-209F
33 Currency Symbols 20A0-20CF
34 Combining Diacritical Marks For Symbols 20D0-20FF
35 Letterlike Symbols 2100-214F
36 Number Forms 2150-218F
37 Arrows 2190-21FF
Supplemental Arrows-A 27F0-27FF
Supplemental Arrows-B 2900-297F
Miscellaneous Symbols and Arrows 2B00-2BFF
38 Mathematical Operators 2200-22FF
Supplemental Mathematical Operators 2A00-2AFF
Miscellaneous Mathematical Symbols-A 27C0-27EF
Miscellaneous Mathematical Symbols-B 2980-29FF
39 Miscellaneous Technical 2300-23FF
40 Control Pictures 2400-243F
41 Optical Character Recognition 2440-245F
42 Enclosed Alphanumerics 2460-24FF
43 Box Drawing 2500-257F
44 Block Elements 2580-259F
45 Geometric Shapes 25A0-25FF
46 Miscellaneous Symbols 2600-26FF
47 Dingbats 2700-27BF
48 CJK Symbols And Punctuation 3000-303F
49 Hiragana 3040-309F
50 Katakana 30A0-30FF
Katakana Phonetic Extensions 31F0-31FF
51 Bopomofo 3100-312F
Bopomofo Extended 31A0-31BF
52 Hangul Compatibility Jamo 3130-318F
53 Phags-pa A840-A87F
54 Enclosed CJK Letters And Months 3200-32FF
55 CJK Compatibility 3300-33FF
56 Hangul Syllables AC00-D7AF
57 Non-Plane 0 * D800-DFFF
58 Phoenician 10900-1091F
59 CJK Unified Ideographs 4E00-9FFF
CJK Radicals Supplement 2E80-2EFF
Kangxi Radicals 2F00-2FDF
Ideographic Description Characters 2FF0-2FFF
CJK Unified Ideographs Extension A 3400-4DBF
CJK Unified Ideographs Extension B 20000-2A6DF
Kanbun 3190-319F
60 Private Use Area (plane 0) E000-F8FF
61 CJK Strokes 31C0-31EF
CJK Compatibility Ideographs F900-FAFF
CJK Compatibility Ideographs Supplement 2F800-2FA1F
62 Alphabetic Presentation Forms FB00-FB4F
63 Arabic Presentation Forms-A FB50-FDFF
64 Combining Half Marks FE20-FE2F
65 Vertical Forms FE10-FE1F
CJK Compatibility Forms FE30-FE4F
66 Small Form Variants FE50-FE6F
67 Arabic Presentation Forms-B FE70-FEFF
68 Halfwidth And Fullwidth Forms FF00-FFEF
69 Specials FFF0-FFFF
70 Tibetan 0F00-0FFF
71 Syriac 0700-074F
72 Thaana 0780-07BF
73 Sinhala 0D80-0DFF
74 Myanmar 1000-109F
75 Ethiopic 1200-137F
Ethiopic Supplement 1380-139F
Ethiopic Extended 2D80-2DDF
76 Cherokee 13A0-13FF
77 Unified Canadian Aboriginal Syllabics 1400-167F
78 Ogham 1680-169F
79 Runic 16A0-16FF
80 Khmer 1780-17FF
Khmer Symbols 19E0-19FF
81 Mongolian 1800-18AF
82 Braille Patterns 2800-28FF
83 Yi Syllables A000-A48F
Yi Radicals A490-A4CF
84 Tagalog 1700-171F
Hanunoo 1720-173F
Buhid 1740-175F
Tagbanwa 1760-177F
85 Old Italic 10300-1032F
86 Gothic 10330-1034F
87 Deseret 10400-1044F
88 Byzantine Musical Symbols 1D000-1D0FF
Musical Symbols 1D100-1D1FF
Ancient Greek Musical Notation 1D200-1D24F
89 Mathematical Alphanumeric Symbols 1D400-1D7FF
90 Private Use (plane 15) F0000-FFFFD
Private Use (plane 16) 100000-10FFFD
91 Variation Selectors FE00-FE0F
Variation Selectors Supplement E0100-E01EF
92 Tags E0000-E007F
93 Limbu 1900-194F
94 Tai Le 1950-197F
95 New Tai Lue 1980-19DF
96 Buginese 1A00-1A1F
97 Glagolitic 2C00-2C5F
98 Tifinagh 2D30-2D7F
99 Yijing Hexagram Symbols 4DC0-4DFF
100 Syloti Nagri A800-A82F
101 Linear B Syllabary 10000-1007F
Linear B Ideograms 10080-100FF
Aegean Numbers 10100-1013F
102 Ancient Greek Numbers 10140-1018F
103 Ugaritic 10380-1039F
104 Old Persian 103A0-103DF
105 Shavian 10450-1047F
106 Osmanya 10480-104AF
107 Cypriot Syllabary 10800-1083F
108 Kharoshthi 10A00-10A5F
109 Tai Xuan Jing Symbols 1D300-1D35F
110 Cuneiform 12000-123FF
Cuneiform Numbers and Punctuation 12400-1247F
111 Counting Rod Numerals 1D360-1D37F
112 Sundanese 1B80-1BBF
113 Lepcha 1C00-1C4F
114 Ol Chiki 1C50-1C7F
115 Saurashtra A880-A8DF
116 Kayah Li A900-A92F
117 Rejang A930-A95F
118 Cham AA00-AA5F
119 Ancient Symbols 10190-101CF
120 Phaistos Disc 101D0-101FF
121 Carian 102A0-102DF
Lycian 10280-1029F
Lydian 10920-1093F
122 Domino Tiles 1F030-1F09F
Mahjong Tiles 1F000-1F02F
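123-127 Reserved for process-internal usage

* Setting bit 57 indicates that the font supports at least one code point beyond the Basic Multilingual Plane. D800-DFFF is the surrogate range that UTF-16 uses to encode those code points.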
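Each bit number in the table is one bit of a 128-bit field in the OpenType OS/2 table. That field is stored as four 32-bit values: ulUnicodeRange1 holds bits 0-31, ulUnicodeRange2 holds bits 32-63, and so on. The sketch below computes those four values for a sample string. `RANGES` is a hypothetical, hand-copied subset of the table above, and the function names are illustrative, not part of any font library.

```python
# Hypothetical subset of the table above: (bit, first code point, last code point).
# A real tool would carry every row.
RANGES = [
    (0, 0x0000, 0x007F),     # Basic Latin
    (1, 0x0080, 0x00FF),     # Latin-1 Supplement
    (7, 0x0370, 0x03FF),     # Greek and Coptic
    (9, 0x0400, 0x04FF),     # Cyrillic
    (9, 0x0500, 0x052F),     # Cyrillic Supplement (shares bit 9)
    (33, 0x20A0, 0x20CF),    # Currency Symbols
    (59, 0x4E00, 0x9FFF),    # CJK Unified Ideographs
    (59, 0x20000, 0x2A6DF),  # CJK Unified Ideographs Extension B (shares bit 59)
]


def range_bit(cp):
    """Return the bit whose block contains cp, or None if cp is not in this subset."""
    for bit, lo, hi in RANGES:
        if lo <= cp <= hi:
            return bit
    return None


def unicode_range_fields(text):
    """Build the four 32-bit ulUnicodeRange1..4 values for the characters in text."""
    mask = 0  # one 128-bit integer; bit n matches row n of the table
    for ch in text:
        cp = ord(ch)
        bit = range_bit(cp)
        if bit is not None:
            mask |= 1 << bit
        if cp > 0xFFFF:
            mask |= 1 << 57  # bit 57: at least one code point beyond the BMP
    # ulUnicodeRange1 = bits 0-31, ulUnicodeRange2 = bits 32-63, and so on.
    return tuple((mask >> (32 * i)) & 0xFFFFFFFF for i in range(4))


fields = unicode_range_fields("A\u00e9\u20ac\u0416\u4e2d\U00020000")
print([f"{v:#010x}" for v in fields])
# ['0x00000203', '0x0a000002', '0x00000000', '0x00000000']
```

The sample sets bits 0, 1 and 9 in the first word. In the second word it sets bit 33 (the euro sign) and bit 59 (CJK). The Extension B character also sets bit 57, because it lies outside the BMP.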

GNU Unifont Glyphs


GNU Unifont is part of the GNU Project. Its latest release provides glyphs for every printable code point in the Unicode 9.0 Basic Multilingual Plane (BMP). The BMP occupies the first 65,536 code points of the Unicode space, denoted U+0000..U+FFFF. There is also growing coverage of the Supplementary Multilingual Plane (SMP), in the range U+010000..U+01FFFF, and of Michael Everson’s ConScript Unicode Registry (CSUR).
These font files are licensed under the GNU General Public License, either Version 2 or (at your option) a later version, with the exception that embedding the font in a document does not in itself constitute a violation of the GNU GPL.
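The BMP and SMP ranges quoted above follow directly from the code point number. The plane is whatever lies above the low 16 bits, and each plane holds 65,536 code points. A minimal sketch, with illustrative function names:

```python
def plane(cp: int) -> int:
    """Return the plane number (0-16) of a code point: everything above the low 16 bits."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"outside the Unicode code space: {cp:#x}")
    return cp >> 16


def plane_bounds(p: int) -> tuple[int, int]:
    """First and last code point of plane p; each plane holds 65,536 code points."""
    return p << 16, (p << 16) | 0xFFFF


# Only the two planes discussed above get names here; others are labeled generically.
NAMES = {0: "Basic Multilingual Plane", 1: "Supplementary Multilingual Plane"}

for cp in (0x0041, 0xFFFD, 0x10900, 0x1F600):
    p = plane(cp)
    lo, hi = plane_bounds(p)
    print(f"U+{cp:04X} is in plane {p} ({NAMES.get(p, 'other')}), U+{lo:04X}..U+{hi:04X}")
```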


Using special characters from Windows Glyph List 4 (WGL4) in HTML

Microsoft has defined a pan-European character set known as Windows Glyph List 4 (WGL4). It contains the characters required for Western, Central and Eastern European languages, including the Cyrillic and Greek alphabets, as well as many characters for which Monotype’s Symbol font was previously required. WGL4 incorporates codepages 1250 (Central and Eastern European), 1251 (Cyrillic), 1252 (Western European, the default "ANSI" codepage on US English Windows), 1253 (Greek) and 1254 (Turkish). WGL4 contains 652 characters, compared with at most 256 in each of the older single-byte codepages, and identifies each character by its Unicode code point.
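WGL4 uses Unicode numbering, so the repertoire of each legacy codepage can be mapped onto Unicode code points. The rough sketch below assumes that Python's built-in `cp1250` to `cp1254` codecs match Microsoft's tables. It decodes every byte of the five codepages named above and collects the printable characters. The size of the union shows how the combined repertoire outgrows any single 256-slot codepage.

```python
import unicodedata

# The five Windows codepages named above as part of WGL4.
CODEPAGES = ["cp1250", "cp1251", "cp1252", "cp1253", "cp1254"]


def codepage_repertoire(codec: str) -> set[str]:
    """Decode each byte value 0-255 and keep the non-control characters."""
    chars = set()
    for b in range(256):
        try:
            ch = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            continue  # byte value is unassigned in this codepage
        if unicodedata.category(ch) != "Cc":  # skip C0/C1 control characters
            chars.add(ch)
    return chars


union: set[str] = set()
for name in CODEPAGES:
    rep = codepage_repertoire(name)
    print(f"{name}: {len(rep)} printable characters")
    union |= rep

print("union of all five:", len(union))
# Byte 0x89 in cp1252 is the per mille sign, whose Unicode number is U+2030.
print(f"U+{ord(bytes([0x89]).decode('cp1252')):04X}")  # U+2030
```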
The new characters can easily be used in Web pages in two ways. You can write &# in front of the character's decimal code point and ; after it (e.g. &#8240; for the per mille sign, ‰). Alternatively, you can use the HTML 4 character entity references such as &permil;, although not all WGL4 characters have corresponding entities. However, support for the new characters varies between browsers. The tables below use decimal numeric character references to display the characters in the first column, and can be used to determine which characters your browser and fonts support.
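A numeric character reference is just the decimal code point wrapped in &# and ;, so such references are easy to generate. In the minimal sketch below, `to_ncr` is a hypothetical helper. It leaves ASCII untouched, so markup characters such as < and & still need escaping separately, for example with `html.escape`. Python's `xmlcharrefreplace` error handler produces the same references.

```python
import html


def to_ncr(text: str) -> str:
    """Replace each non-ASCII character with a decimal numeric character reference."""
    # ASCII passes through unchanged, so run html.escape() first if the text
    # may contain markup-significant characters such as < or &.
    return "".join(ch if ord(ch) < 128 else f"&#{ord(ch)};" for ch in text)


print(to_ncr("5\u2030 of 100 \u20ac"))  # 5&#8240; of 100 &#8364;

# Python's xmlcharrefreplace error handler produces the same references.
print("5\u2030".encode("ascii", "xmlcharrefreplace"))  # b'5&#8240;'

# Going the other way: both the numeric reference and the HTML 4 entity decode to U+2030.
print(html.unescape("&#8240;") == html.unescape("&permil;") == "\u2030")  # True
```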
In Word 97, Word 2000, Word 2002, Word 2003 and Word 2007, the new characters can be inserted by choosing Symbol on the Insert menu (the Insert tab of the ribbon in Word 2007) and selecting a character from the dialog box.
Many Unicode fonts include the WGL4 characters.

