Mapping of Unicode character planes - AbsoluteAstronomy.com

In the Unicode system, planes are groups of numerical values that point to specific characters. Unicode code points are logically divided into 17 planes, each with 65,536 (= 2¹⁶) code points. Planes are identified by the numbers 0 to 16 (decimal), which correspond to the possible values 00–10 (hexadecimal) of the first two positions in the six-position format (hhhhhh). Six of these planes also have names.
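The relationship between a code point and its plane can be sketched in a few lines of Python. This is only an illustration: the helper names `plane_of` and `six_digit_form` are invented here and are not part of any standard API.

```python
PLANE_SIZE = 0x10000   # 65,536 code points per plane
PLANE_COUNT = 17       # planes 0 through 16


def plane_of(code_point: int) -> int:
    """Return the number (0-16) of the plane that contains code_point."""
    if not 0 <= code_point < PLANE_SIZE * PLANE_COUNT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    # Dropping the low 16 bits leaves the plane number.
    return code_point >> 16


def six_digit_form(code_point: int) -> str:
    """Format a code point as six hex digits (hhhhhh)."""
    return f"{code_point:06X}"


for cp in (0x0041, 0x1D11E, 0x20000, 0x10FFFF):
    text = six_digit_form(cp)
    # The first two hex digits of the six-digit form are the plane number.
    print(text, "is in plane", plane_of(cp), "with prefix", text[:2])
```

The shift by 16 bits and the two-digit hex prefix are the same operation seen from two sides: each plane spans exactly four hex digits.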

Currently, about ten percent of the potential space is used. Furthermore, ranges of characters have been tentatively mapped out for every current and ancient writing system (script) the Unicode Consortium has been able to identify. While Unicode may eventually need to use another of the spare 11 planes for ideographic characters, other planes remain. Even if previously unknown scripts with tens of thousands of characters are discovered, the limit of 1,114,112 code points is unlikely to be reached in the near future. The Unicode Consortium has stated that this limit will never be changed.

The odd-looking limit (it is not a power of 2) is due to the design of UTF-16. In UTF-16, a "surrogate pair" of two 16-bit words is used to encode 2²⁰ code points (16 planes), and single words are used to encode plane 0. The limit is not due to UTF-8, which was designed with a limit of 2³¹ code points (32,768 planes) and can encode 2²¹ code points (32 planes) even if limited to 4 bytes.
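The arithmetic behind these limits can be checked directly. The sketch below assumes the usual UTF-8 bit layouts (a four-byte sequence carries 3 + 3×6 payload bits, and the original six-byte form carried 1 + 5×6), which the text above does not spell out; the variable names are illustrative only.

```python
PLANE = 2 ** 16

# UTF-16: plane 0 uses single 16-bit words; every other code point uses
# one of 1,024 high surrogates followed by one of 1,024 low surrogates.
bmp_code_points = 2 ** 16
surrogate_pair_code_points = 2 ** 10 * 2 ** 10   # = 2**20

utf16_limit = bmp_code_points + surrogate_pair_code_points
print(utf16_limit)             # 1114112
print(utf16_limit // PLANE)    # 17

# UTF-8 limited to four bytes: 3 bits in the lead byte plus 6 bits in
# each of the three continuation bytes.
utf8_four_byte_limit = 2 ** (3 + 3 * 6)
print(utf8_four_byte_limit // PLANE)   # 32

# UTF-8 as originally designed (up to six bytes): 1 + 5 * 6 = 31 bits.
utf8_original_limit = 2 ** (1 + 5 * 6)
print(utf8_original_limit // PLANE)    # 32768
```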

Sometimes, the terms “astral plane” and “astral characters” are used informally to refer to the planes above the Basic Multilingual Plane (planes 1–16) and their characters.

Basic Multilingual Plane

The first plane (plane 0), the Basic Multilingual Plane (BMP), is where most characters have been assigned so far. The BMP contains characters for almost all modern languages, and a large number of special characters. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing. Most of the allocated code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.

The High Surrogates (U+D800..U+DBFF) and Low Surrogates (U+DC00..U+DFFF) codes are reserved for encoding non-BMP characters in UTF-16 by using a pair of 16-bit codes: one High Surrogate and one Low Surrogate. A single surrogate code point will never be assigned a character.
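As a rough illustration of how a High Surrogate and a Low Surrogate combine, the following Python sketch converts a non-BMP code point to a surrogate pair and back. The function names are hypothetical; real programs would normally rely on a UTF-16 codec rather than hand-written arithmetic.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a code point in planes 1-16 into (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points above the BMP use surrogate pairs")
    offset = code_point - 0x10000          # a 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low


def from_surrogate_pair(high: int, low: int) -> int:
    """Recombine a (high, low) surrogate pair into a code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x1D11E)   # a Musical Symbols code point
print(f"{high:04X} {low:04X}")           # D834 DD1E
print(f"{from_surrogate_pair(high, low):X}")   # 1D11E
```

Because each half contributes 10 bits, the pair covers exactly 2²⁰ code points, which is where the 16 supplementary planes come from.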

The BMP comprises the following blocks:

C0 Controls and Basic Latin (Basic Latin) (0000–007F)
C1 Controls and Latin-1 Supplement (0080–00FF)
Latin Extended-A (0100–017F)
Latin Extended-B (0180–024F)
IPA Extensions (0250–02AF)
Spacing Modifier Letters (02B0–02FF)
Combining Diacritical Marks (0300–036F)
Greek and Coptic (0370–03FF)
Cyrillic (0400–04FF)
Cyrillic Supplement (0500–052F)
Armenian (0530–058F)
Hebrew (0590–05FF)
Arabic (0600–06FF)
Syriac (0700–074F)
Arabic Supplement (0750–077F)
Thaana (0780–07BF)
NKo (07C0–07FF)
Samaritan (0800–083F)
Mandaic (0840–085F)
Indic scripts:
Devanagari (0900–097F)
Bengali (0980–09FF)
Gurmukhi (0A00–0A7F)
Gujarati (0A80–0AFF)
Oriya (0B00–0B7F)
Tamil (0B80–0BFF)
Telugu (0C00–0C7F)
Kannada (0C80–0CFF)
Malayalam (0D00–0D7F)
Sinhala (0D80–0DFF)
Thai (0E00–0E7F)
Lao (0E80–0EFF)
Tibetan (0F00–0FFF)
Myanmar (1000–109F)
Georgian (10A0–10FF)
Hangul Jamo (1100–11FF)
Ethiopic (1200–137F)
Ethiopic Supplement (1380–139F)
Cherokee (13A0–13FF)
Unified Canadian Aboriginal Syllabics (1400–167F)
Ogham (1680–169F)
Runic (16A0–16FF)
Philippine scripts:
Tagalog (1700–171F)
Hanunoo (1720–173F)
Buhid (1740–175F)
Tagbanwa (1760–177F)
Khmer (1780–17FF)
Mongolian (1800–18AF)
Unified Canadian Aboriginal Syllabics Extended (18B0–18FF)
Limbu (1900–194F)
Tai Le (1950–197F)
New Tai Lue (1980–19DF)
Khmer Symbols (19E0–19FF)
Buginese (1A00–1A1F)
Tai Tham (1A20–1AAF)

Balinese (1B00–1B7F)

Sundanese (1B80–1BBF)

Batak (1BC0–1BFF)

Lepcha (1C00–1C4F)

Ol Chiki (1C50–1C7F)

Vedic Extensions (1CD0–1CFF)

Phonetic Extensions (1D00–1D7F)

Phonetic Extensions Supplement (1D80–1DBF)

Combining Diacritical Marks Supplement (1DC0–1DFF)

Latin Extended Additional (1E00–1EFF)

Greek Extended (1F00–1FFF)

Symbols:

General Punctuation (2000–206F)

Superscripts and Subscripts (2070–209F)

Currency Symbols (20A0–20CF)

Combining Diacritical Marks for Symbols (20D0–20FF)

Letterlike Symbols (2100–214F)

Number Forms (2150–218F)

Arrows (2190–21FF)

Mathematical Operators (2200–22FF)

Miscellaneous Technical (2300–23FF)

Control Pictures (2400–243F)

Optical Character Recognition (2440–245F)

Enclosed Alphanumerics (2460–24FF)

Box Drawing (2500–257F)

Block Elements (2580–259F)

Geometric Shapes (25A0–25FF)

Miscellaneous Symbols (2600–26FF)

Dingbats (2700–27BF)

Miscellaneous Mathematical Symbols-A (27C0–27EF)

Supplemental Arrows-A (27F0–27FF)

Braille Patterns (2800–28FF)

Supplemental Arrows-B (2900–297F)

Miscellaneous Mathematical Symbols-B (2980–29FF)

Supplemental Mathematical Operators (2A00–2AFF)

Miscellaneous Symbols and Arrows (2B00–2BFF)

Glagolitic (2C00–2C5F)

Latin Extended-C (2C60–2C7F)

Coptic (2C80–2CFF)

Georgian Supplement (2D00–2D2F)

Tifinagh (2D30–2D7F)

Ethiopic Extended (2D80–2DDF)

Cyrillic Extended-A (2DE0–2DFF)

Supplemental Punctuation (2E00–2E7F)

East Asian scripts and symbols:

CJK Radicals Supplement (2E80–2EFF)

Kangxi Radicals (2F00–2FDF)

Ideographic Description Characters (2FF0–2FFF)

CJK Symbols and Punctuation (3000–303F)

Hiragana (3040–309F)

Katakana (30A0–30FF)

Bopomofo (3100–312F)

Hangul Compatibility Jamo (3130–318F)

Kanbun (3190–319F)

Bopomofo Extended (31A0–31BF)

CJK Strokes (31C0–31EF)

Katakana Phonetic Extensions (31F0–31FF)

Enclosed CJK Letters and Months (3200–32FF)

CJK Compatibility (3300–33FF)

CJK Unified Ideographs Extension A (3400–4DBF)

Yijing Hexagram Symbols (4DC0–4DFF)

CJK Unified Ideographs (4E00–9FFF)

Yi Syllables (A000–A48F)

Yi Radicals (A490–A4CF)

Lisu (A4D0–A4FF)

Vai (A500–A63F)

Cyrillic Extended-B (A640–A69F)

Bamum (A6A0–A6FF)

Modifier Tone Letters (A700–A71F)

Latin Extended-D (A720–A7FF)

Syloti Nagri (A800–A82F)

Common Indic Number Forms (A830–A83F)

Phags-pa (A840–A87F)

Saurashtra (A880–A8DF)

Devanagari Extended (A8E0–A8FF)

Kayah Li (A900–A92F)

Rejang (A930–A95F)

Hangul Jamo Extended-A (A960–A97F)

Javanese (A980–A9DF)

Cham (AA00–AA5F)

Myanmar Extended-A (AA60–AA7F)

Tai Viet (AA80–AADF)

Ethiopic Extended-A (AB00–AB2F)

Meetei Mayek (ABC0–ABFF)

Hangul Syllables (AC00–D7AF)

Hangul Jamo Extended-B (D7B0–D7FF)

Surrogates:

High Surrogates (D800–DB7F)

High Private Use Surrogates (DB80–DBFF)

Low Surrogates (DC00–DFFF)

Private Use Area (E000–F8FF)

CJK Compatibility Ideographs (F900–FAFF)

Alphabetic Presentation Forms (FB00–FB4F)

Arabic Presentation Forms-A (FB50–FDFF)

Variation Selectors (FE00–FE0F)

Vertical Forms (FE10–FE1F)

Combining Half Marks (FE20–FE2F)

CJK Compatibility Forms (FE30–FE4F)

Small Form Variants (FE50–FE6F)

Arabic Presentation Forms-B (FE70–FEFF)

Halfwidth and Fullwidth Forms (FF00–FFEF)

Specials (FFF0–FFFF)
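A block list like the one above lends itself to a simple range lookup. The following Python sketch uses only a small sample of the BMP blocks listed here, and the names `BLOCKS` and `block_of` are invented for the example; a complete table would be needed for real use.

```python
from bisect import bisect_right

# A small sample of the BMP blocks listed above: (first, last, name),
# sorted by first code point.
BLOCKS = [
    (0x0000, 0x007F, "C0 Controls and Basic Latin"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x3040, 0x309F, "Hiragana"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
    (0xE000, 0xF8FF, "Private Use Area"),
]
STARTS = [first for first, _, _ in BLOCKS]


def block_of(code_point: int):
    """Return the name of the sample block containing code_point, or None."""
    # Find the last block whose first code point is <= code_point.
    i = bisect_right(STARTS, code_point) - 1
    if i >= 0:
        first, last, name = BLOCKS[i]
        if first <= code_point <= last:
            return name
    return None   # falls in a gap of this sample table


print(block_of(ord("A")))     # C0 Controls and Basic Latin
print(block_of(0x03A9))       # Greek and Coptic
print(block_of(0x0100))       # None (Latin Extended-A is not in the sample)
```

Binary search over the sorted start values keeps the lookup fast even when the table holds every block in every plane.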

Supplementary Multilingual Plane

Plane 1, the Supplementary Multilingual Plane (SMP), is mostly used for historic scripts such as Linear B, but is also used for musical and mathematical symbols.

The SMP comprises the following blocks:

Linear B Syllabary (10000–1007F)
Linear B Ideograms (10080–100FF)
Aegean Numbers (10100–1013F)
Ancient Greek Numbers (10140–1018F)
Ancient Symbols (10190–101CF)
Phaistos Disc (101D0–101FF)
Lycian (10280–1029F)
Carian (102A0–102DF)
Old Italic (10300–1032F)
Gothic (10330–1034F)
Ugaritic (10380–1039F)
Old Persian (103A0–103DF)
Deseret (10400–1044F)
Shavian (10450–1047F)
Osmanya (10480–104AF)
Cypriot Syllabary (10800–1083F)

Imperial Aramaic (10840–1085F)

Phoenician (10900–1091F)

Lydian (10920–1093F)

Kharoshthi (10A00–10A5F)

Old South Arabian (10A60–10A7F)

Avestan (10B00–10B3F)

Inscriptional Parthian (10B40–10B5F)

Inscriptional Pahlavi (10B60–10B7F)

Old Turkic (10C00–10C4F)

Rumi Numeral Symbols (10E60–10E7F)

Brahmi (11000–1107F)

Kaithi (11080–110CF)

Cuneiform (12000–123FF)

Cuneiform Numbers and Punctuation (12400–1247F)

Egyptian Hieroglyphs (13000–1342F)

Bamum Supplement (16800–16A3F)

Kana Supplement (1B000–1B0FF)

Byzantine Musical Symbols (1D000–1D0FF)

Musical Symbols (1D100–1D1FF)

Ancient Greek Musical Notation (1D200–1D24F)

Tai Xuan Jing Symbols (1D300–1D35F)

Counting Rod Numerals (1D360–1D37F)

Mathematical Alphanumeric Symbols (1D400–1D7FF)

Mahjong Tiles (1F000–1F02F)

Domino Tiles (1F030–1F09F)

Playing Cards (1F0A0–1F0FF)

Enclosed Alphanumeric Supplement (1F100–1F1FF)

Enclosed Ideographic Supplement (1F200–1F2FF)

Miscellaneous Symbols And Pictographs (1F300–1F5FF)

Emoticons (1F600–1F64F)

Transport And Map Symbols (1F680–1F6FF)

Alchemical Symbols (1F700–1F77F)

Supplementary Ideographic Plane

Plane 2, the Supplementary Ideographic Plane (SIP), is used for Unified Han (CJK) Ideographs that were mostly not included in earlier character encoding standards.

The SIP comprises the following blocks:

CJK Unified Ideographs Extension B (20000–2A6DF)
CJK Unified Ideographs Extension C (2A700–2B73F)
CJK Unified Ideographs Extension D (2B740–2B81F)
CJK Compatibility Ideographs Supplement (2F800–2FA1F)

Tertiary Ideographic Plane

Plane 3, the Tertiary Ideographic Plane (TIP), is reserved for Oracle Bone script, Bronze Script, Small Seal Script, additional CJK unified ideographs, and other historic ideographic scripts.

No characters or blocks are assigned in the TIP.

Unassigned planes

Unicode has not yet assigned any characters to Planes 4 through 13. It is not anticipated that all these planes will be needed, given the total sizes of the known writing systems left to be encoded. However, the number of possible symbol characters that could arise outside of the context of writing systems is potentially limitless.
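One way to see how unevenly the planes are populated is to count assigned code points per plane with Python's standard `unicodedata` module. This is a sketch under two assumptions: "assigned" is taken to mean any general category other than `Cn`, so surrogates and private-use code points are counted; and the numbers reflect the Unicode version bundled with the interpreter, which may be newer than the version this article describes.

```python
import unicodedata
from collections import Counter


def assigned_per_plane() -> Counter:
    """Count code points per plane whose general category is not Cn."""
    counts = Counter()
    for cp in range(0x110000):
        # Cn marks unassigned code points and noncharacters.
        if unicodedata.category(chr(cp)) != "Cn":
            counts[cp >> 16] += 1
    return counts


counts = assigned_per_plane()
print("Unicode data version:", unicodedata.unidata_version)
for plane in range(17):
    print(f"plane {plane:2d}: {counts[plane]:6d} assigned")
```

The exact figures vary between Unicode versions, so no expected output is shown; planes 4 through 13 should still report zero.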

Supplementary Special-purpose Plane

Plane 14 (E in hexadecimal), the Supplementary Special-purpose Plane (SSP), currently contains non-graphical characters. The first block is for language tag characters for use when language cannot be indicated through other protocols (such as the …).

Source: Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.