Mapping of Unicode character planes
In the Unicode system, planes are groups of numerical values that point to specific characters. Unicode code points are logically divided into 17 planes, each with 65,536 (= 2^16) code points. Planes are identified by the numbers 0 to 16 (decimal), which correspond to the possible values 00–10 (hexadecimal) of the first two positions in the six-position format (hhhhhh). Six of these planes also have names.
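
The relationship between a code point and its plane can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the names `plane_of` and `six_position` are made up for this example and are not part of any standard API.

```python
PLANE_SIZE = 1 << 16   # 65,536 code points per plane
PLANE_COUNT = 17       # planes 0 through 16

def plane_of(code_point):
    """Return the plane number (0-16) that contains code_point."""
    if not 0 <= code_point < PLANE_SIZE * PLANE_COUNT:
        raise ValueError("outside the Unicode code space")
    # Dropping the low 16 bits leaves the plane number.
    return code_point >> 16

def six_position(code_point):
    """Format a code point in the six-position hex form (hhhhhh)."""
    return f"{code_point:06X}"
```

Under these assumptions, the first two hex digits returned by `six_position` are the plane number: U+1D11E formats as `01D11E` and lies in plane 1.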

Currently, about ten percent of the potential space is used. Furthermore, ranges of characters have been tentatively mapped out for every current and ancient writing system (script) the Unicode Consortium has been able to identify. While Unicode may eventually need to use another of the 11 spare planes for ideographic characters, other planes remain. Even if previously unknown scripts with tens of thousands of characters are discovered, the limit of 1,114,112 code points is unlikely to be reached in the near future. The Unicode Consortium has stated that the limit will never be changed.

The odd-looking limit (it is not a power of 2) is due to the design of UTF-16. In UTF-16, a "surrogate pair" of two 16-bit words encodes 2^20 code points (16 planes), and single words encode plane 0. It is not due to UTF-8, which was designed with a limit of 2^31 code points (32,768 planes) and can encode 2^21 code points (32 planes) even if limited to 4 bytes.
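
The figures in this paragraph can be checked with a short calculation. The following is a sketch only, and the variable names are chosen for this example.

```python
BMP = 1 << 16                  # code points reachable with a single 16-bit word
SUPPLEMENTARY = 1 << 20        # code points reachable with a surrogate pair
TOTAL = BMP + SUPPLEMENTARY    # size of the whole Unicode code space

PLANES = TOTAL // BMP                    # planes of 65,536 code points each
UTF8_4BYTE_PLANES = (1 << 21) // BMP     # planes a 4-byte UTF-8 form could cover
UTF8_DESIGN_PLANES = (1 << 31) // BMP    # planes under the original UTF-8 design limit

print(TOTAL, PLANES, UTF8_4BYTE_PLANES, UTF8_DESIGN_PLANES)
```

The sum 65,536 + 1,048,576 gives the 1,114,112 code points quoted above, which is exactly 17 planes.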

Sometimes, the terms “astral plane” and “astral characters” are used informally to refer to the planes above the Basic Multilingual Plane (planes 1–16) and their characters.

Basic Multilingual Plane

The first plane (plane 0), the Basic Multilingual Plane (BMP), is where most characters have been assigned so far. The BMP contains characters for almost all modern languages, and a large number of special characters. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing. Most of the allocated code points in the BMP are used to encode Chinese, Japanese, and Korean (CJK) characters.

The High Surrogates (U+D800..U+DBFF) and Low Surrogates (U+DC00..U+DFFF) codes are reserved for encoding non-BMP characters in UTF-16 by using a pair of 16-bit codes: one High Surrogate and one Low Surrogate. A single surrogate code point will never be assigned a character.
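
As an illustration of how a non-BMP code point maps onto a high/low pair, here is a minimal Python sketch of the UTF-16 surrogate arithmetic. The function names `to_surrogates` and `from_surrogates` are hypothetical and chosen for this example.

```python
def to_surrogates(code_point):
    """Split a code point from planes 1-16 into a (high, low) surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points in planes 1-16 use surrogate pairs")
    offset = code_point - 0x10000        # a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits select the high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits select the low surrogate
    return high, low

def from_surrogates(high, low):
    """Recombine a (high, low) surrogate pair into the original code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
```

For example, `to_surrogates(0x1D11E)` yields `(0xD834, 0xDD1E)`, and passing that pair to `from_surrogates` returns `0x1D11E` again. Each surrogate range holds 1,024 values, so the pairs cover 1,024 × 1,024 = 2^20 code points.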



The BMP comprises the following blocks:
  • C0 Controls and Basic Latin (Basic Latin) (0000–007F)
  • C1 Controls and Latin-1 Supplement (0080–00FF)
  • Latin Extended-A (0100–017F)
  • Latin Extended-B (0180–024F)
  • IPA Extensions (0250–02AF)
  • Spacing Modifier Letters (02B0–02FF)
  • Combining Diacritical Marks (0300–036F)
  • Greek and Coptic (0370–03FF)
  • Cyrillic (0400–04FF)
  • Cyrillic Supplement (0500–052F)
  • Armenian (0530–058F)
  • Hebrew (0590–05FF)
  • Arabic (0600–06FF)
  • Syriac (0700–074F)
  • Arabic Supplement (0750–077F)
  • Thaana (0780–07BF)
  • NKo (07C0–07FF)
  • Samaritan (0800–083F)
  • Mandaic (0840–085F)
  • Indic scripts:
  • Devanagari (0900–097F)
  • Bengali (0980–09FF)
  • Gurmukhi (0A00–0A7F)
  • Gujarati (0A80–0AFF)
  • Oriya (0B00–0B7F)
  • Tamil (0B80–0BFF)
  • Telugu (0C00–0C7F)
  • Kannada (0C80–0CFF)
  • Malayalam (0D00–0D7F)
  • Sinhala (0D80–0DFF)
  • Thai (0E00–0E7F)
  • Lao (0E80–0EFF)
  • Tibetan (0F00–0FFF)
  • Myanmar (1000–109F)
  • Georgian (10A0–10FF)
  • Hangul Jamo (1100–11FF)
  • Ethiopic (1200–137F)
  • Ethiopic Supplement (1380–139F)
  • Cherokee (13A0–13FF)
  • Unified Canadian Aboriginal Syllabics (1400–167F)
  • Ogham (1680–169F)
  • Runic (16A0–16FF)
  • Philippine scripts:
  • Tagalog (1700–171F)
  • Hanunoo (1720–173F)
  • Buhid (1740–175F)
  • Tagbanwa (1760–177F)
  • Khmer (1780–17FF)
  • Mongolian (1800–18AF)
  • Unified Canadian Aboriginal Syllabics Extended (18B0–18FF)
  • Limbu (1900–194F)
  • Tai Le (1950–197F)
  • New Tai Lue (1980–19DF)
  • Khmer Symbols (19E0–19FF)
  • Buginese (1A00–1A1F)
  • Tai Tham (1A20–1AAF)
  • Balinese (1B00–1B7F)
  • Sundanese (1B80–1BBF)
  • Batak (1BC0–1BFF)
  • Lepcha (1C00–1C4F)
  • Ol Chiki (1C50–1C7F)
  • Vedic Extensions (1CD0–1CFF)
  • Phonetic Extensions (1D00–1D7F)
  • Phonetic Extensions Supplement (1D80–1DBF)
  • Combining Diacritical Marks Supplement (1DC0–1DFF)
  • Latin Extended Additional (1E00–1EFF)
  • Greek Extended (1F00–1FFF)
  • Symbols:
  • General Punctuation (2000–206F)
  • Superscripts and Subscripts (2070–209F)
  • Currency Symbols (20A0–20CF)
  • Combining Diacritical Marks for Symbols (20D0–20FF)
  • Letterlike Symbols (2100–214F)
  • Number Forms (2150–218F)
  • Arrows (2190–21FF)
  • Mathematical Operators (2200–22FF)
  • Miscellaneous Technical (2300–23FF)
  • Control Pictures (2400–243F)
  • Optical Character Recognition (2440–245F)
  • Enclosed Alphanumerics (2460–24FF)
  • Box Drawing (2500–257F)
  • Block Elements (2580–259F)
  • Geometric Shapes (25A0–25FF)
  • Miscellaneous Symbols (2600–26FF)
  • Dingbats (2700–27BF)
  • Miscellaneous Mathematical Symbols-A (27C0–27EF)
  • Supplemental Arrows-A (27F0–27FF)
  • Braille Patterns (2800–28FF)
  • Supplemental Arrows-B (2900–297F)
  • Miscellaneous Mathematical Symbols-B (2980–29FF)
  • Supplemental Mathematical Operators (2A00–2AFF)
  • Miscellaneous Symbols and Arrows (2B00–2BFF)
  • Glagolitic (2C00–2C5F)
  • Latin Extended-C (2C60–2C7F)
  • Coptic (2C80–2CFF)
  • Georgian Supplement (2D00–2D2F)
  • Tifinagh (2D30–2D7F)
  • Ethiopic Extended (2D80–2DDF)
  • Cyrillic Extended-A (2DE0–2DFF)
  • Supplemental Punctuation (2E00–2E7F)
  • East Asian scripts and symbols:
  • CJK Radicals Supplement (2E80–2EFF)
  • Kangxi Radicals (2F00–2FDF)
  • Ideographic Description Characters (2FF0–2FFF)
  • CJK Symbols and Punctuation (3000–303F)
  • Hiragana (3040–309F)
  • Katakana (30A0–30FF)
  • Bopomofo (3100–312F)
  • Hangul Compatibility Jamo (3130–318F)
  • Kanbun (3190–319F)
  • Bopomofo Extended (31A0–31BF)
  • CJK Strokes (31C0–31EF)
  • Katakana Phonetic Extensions (31F0–31FF)
  • Enclosed CJK Letters and Months (3200–32FF)
  • CJK Compatibility (3300–33FF)
  • CJK Unified Ideographs Extension A (3400–4DBF)
  • Yijing Hexagram Symbols (4DC0–4DFF)
  • CJK Unified Ideographs (4E00–9FFF)
  • Yi Syllables (A000–A48F)
  • Yi Radicals (A490–A4CF)
  • Lisu (A4D0–A4FF)
  • Vai (A500–A63F)
  • Cyrillic Extended-B (A640–A69F)
  • Bamum (A6A0–A6FF)
  • Modifier Tone Letters (A700–A71F)
  • Latin Extended-D (A720–A7FF)
  • Syloti Nagri (A800–A82F)
  • Common Indic Number Forms (A830–A83F)
  • Phags-pa (A840–A87F)
  • Saurashtra (A880–A8DF)
  • Devanagari Extended (A8E0–A8FF)
  • Kayah Li (A900–A92F)
  • Rejang (A930–A95F)
  • Hangul Jamo Extended-A (A960–A97F)
  • Javanese (A980–A9DF)
  • Cham (AA00–AA5F)
  • Myanmar Extended-A (AA60–AA7F)
  • Tai Viet (AA80–AADF)
  • Ethiopic Extended-A (AB00–AB2F)
  • Meetei Mayek (ABC0–ABFF)
  • Hangul Syllables (AC00–D7AF)
  • Hangul Jamo Extended-B (D7B0–D7FF)
  • Surrogates:
  • High Surrogates (D800–DB7F)
  • High Private Use Surrogates (DB80–DBFF)
  • Low Surrogates (DC00–DFFF)
  • Private Use Area (E000–F8FF)
  • CJK Compatibility Ideographs (F900–FAFF)
  • Alphabetic Presentation Forms (FB00–FB4F)
  • Arabic Presentation Forms-A (FB50–FDFF)
  • Variation Selectors (FE00–FE0F)
  • Vertical Forms (FE10–FE1F)
  • Combining Half Marks (FE20–FE2F)
  • CJK Compatibility Forms (FE30–FE4F)
  • Small Form Variants (FE50–FE6F)
  • Arabic Presentation Forms-B (FE70–FEFF)
  • Halfwidth and Fullwidth Forms (FF00–FFEF)
  • Specials (FFF0–FFFF)
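
Because the blocks are contiguous, non-overlapping ranges, finding the block for a code point is a sorted-range lookup. The sketch below uses only a small sample of rows copied from the list above; the names `BMP_BLOCKS` and `block_of` are illustrative, and a real table would carry every block.

```python
from bisect import bisect_right

# Sample rows (start, end, name) taken from the BMP list; deliberately incomplete.
BMP_BLOCKS = [
    (0x0000, 0x007F, "C0 Controls and Basic Latin"),
    (0x0080, 0x00FF, "C1 Controls and Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x4E00, 0x9FFF, "CJK Unified Ideographs"),
    (0xAC00, 0xD7AF, "Hangul Syllables"),
    (0xE000, 0xF8FF, "Private Use Area"),
]
_STARTS = [start for start, _, _ in BMP_BLOCKS]

def block_of(code_point):
    """Return the block name for code_point, or None if it is not in the sample table."""
    # Find the last block whose start is <= code_point, then check its end.
    i = bisect_right(_STARTS, code_point) - 1
    if i >= 0:
        _, end, name = BMP_BLOCKS[i]
        if code_point <= end:
            return name
    return None
```

With this sample table, `block_of(ord("A"))` returns the Basic Latin block, while `block_of(0x0100)` returns `None` because Latin Extended-A was left out of the sample.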

Supplementary Multilingual Plane

Plane 1, the Supplementary Multilingual Plane (SMP), is mostly used for historic scripts such as Linear B, but is also used for musical and mathematical symbols.

The SMP comprises the following blocks:
  • Linear B Syllabary (10000–1007F)
  • Linear B Ideograms (10080–100FF)
  • Aegean Numbers (10100–1013F)
  • Ancient Greek Numbers (10140–1018F)
  • Ancient Symbols (10190–101CF)
  • Phaistos Disc (101D0–101FF)
  • Lycian (10280–1029F)
  • Carian (102A0–102DF)
  • Old Italic (10300–1032F)
  • Gothic (10330–1034F)
  • Ugaritic (10380–1039F)
  • Old Persian (103A0–103DF)
  • Deseret (10400–1044F)
  • Shavian (10450–1047F)
  • Osmanya (10480–104AF)
  • Cypriot Syllabary (10800–1083F)
  • Imperial Aramaic (10840–1085F)
  • Phoenician (10900–1091F)
  • Lydian (10920–1093F)
  • Kharoshthi (10A00–10A5F)
  • Old South Arabian (10A60–10A7F)
  • Avestan (10B00–10B3F)
  • Inscriptional Parthian (10B40–10B5F)
  • Inscriptional Pahlavi (10B60–10B7F)
  • Old Turkic (10C00–10C4F)
  • Rumi Numeral Symbols (10E60–10E7F)
  • Brahmi (11000–1107F)
  • Kaithi (11080–110CF)
  • Cuneiform (12000–123FF)
  • Cuneiform Numbers and Punctuation (12400–1247F)
  • Egyptian Hieroglyphs (13000–1342F)
  • Bamum Supplement (16800–16A3F)
  • Kana Supplement (1B000–1B0FF)
  • Byzantine Musical Symbols (1D000–1D0FF)
  • Musical Symbols (1D100–1D1FF)
  • Ancient Greek Musical Notation (1D200–1D24F)
  • Tai Xuan Jing Symbols (1D300–1D35F)
  • Counting Rod Numerals (1D360–1D37F)
  • Mathematical Alphanumeric Symbols (1D400–1D7FF)
  • Mahjong Tiles (1F000–1F02F)
  • Domino Tiles (1F030–1F09F)
  • Playing Cards (1F0A0–1F0FF)
  • Enclosed Alphanumeric Supplement (1F100–1F1FF)
  • Enclosed Ideographic Supplement (1F200–1F2FF)
  • Miscellaneous Symbols And Pictographs (1F300–1F5FF)
  • Emoticons (1F600–1F64F)
  • Transport And Map Symbols (1F680–1F6FF)
  • Alchemical Symbols (1F700–1F77F)
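
Because the block ranges in this list are contiguous and non-overlapping, finding the block for a given code point reduces to a sorted-range lookup. The following is a minimal Python sketch under that assumption. The table holds only a handful of the blocks listed above, and the names `SMP_BLOCKS` and `block_of` are illustrative, not part of any Unicode library.

```python
from bisect import bisect_right

# A few SMP blocks copied from the list above: (first, last, name).
# The table is deliberately incomplete; it must stay sorted by first code point.
SMP_BLOCKS = [
    (0x10480, 0x104AF, "Osmanya"),
    (0x10800, 0x1083F, "Cypriot Syllabary"),
    (0x10900, 0x1091F, "Phoenician"),
    (0x12000, 0x123FF, "Cuneiform"),
    (0x13000, 0x1342F, "Egyptian Hieroglyphs"),
    (0x1D100, 0x1D1FF, "Musical Symbols"),
    (0x1F000, 0x1F02F, "Mahjong Tiles"),
    (0x1F600, 0x1F64F, "Emoticons"),
]
_STARTS = [first for first, _, _ in SMP_BLOCKS]


def block_of(code_point):
    """Return the block name for code_point, or None if it is not in the table."""
    # Find the last block whose first code point is <= code_point.
    i = bisect_right(_STARTS, code_point) - 1
    if i >= 0:
        first, last, name = SMP_BLOCKS[i]
        if first <= code_point <= last:
            return name
    return None


print(block_of(0x1F600))  # Emoticons
print(block_of(0x12000))  # Cuneiform
print(block_of(0x1F650))  # None (just past the Emoticons block)
```

A binary search is used here only because the list is long; for a table this small a linear scan would serve equally well.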

    Supplementary Ideographic Plane

    Plane 2, the Supplementary Ideographic Plane (SIP), is used for Unified Han (CJK) Ideographs that were mostly not included in earlier character encoding standards.

    Currently, the SIP comprises the following blocks:
    • CJK Unified Ideographs Extension B (20000–2A6DF)
    • CJK Unified Ideographs Extension C (2A700–2B73F)
    • CJK Unified Ideographs Extension D (2B740–2B81F)
    • CJK Compatibility Ideographs Supplement (2F800–2FA1F)
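
The size of each block follows directly from its range: last code point minus first, plus one. The sketch below works this out for the four SIP blocks listed above. It counts code points reserved by the block range, which is not necessarily the number of characters actually assigned. The names `SIP_BLOCKS` and `block_size` are illustrative.

```python
# SIP block ranges copied from the list above: name -> (first, last).
SIP_BLOCKS = {
    "CJK Unified Ideographs Extension B": (0x20000, 0x2A6DF),
    "CJK Unified Ideographs Extension C": (0x2A700, 0x2B73F),
    "CJK Unified Ideographs Extension D": (0x2B740, 0x2B81F),
    "CJK Compatibility Ideographs Supplement": (0x2F800, 0x2FA1F),
}


def block_size(first, last):
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


for name, (first, last) in SIP_BLOCKS.items():
    # Every SIP code point has 2 as its plane number (the bits above the low 16).
    assert first >> 16 == 2 and last >> 16 == 2
    print(f"{name}: {block_size(first, last)} code points")
```

With these ranges, Extension B alone reserves 42,720 of the plane's 65,536 code points.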

    Tertiary Ideographic Plane

    Plane 3, the Tertiary Ideographic Plane (TIP), is reserved for Oracle Bone script, Bronze Script, Small Seal Script, additional CJK unified ideographs, and other historic ideographic scripts.

    Currently, no characters or blocks are assigned in the TIP.

    Unassigned planes

    Unicode has not yet assigned any characters to Planes 4 through 13. It is not anticipated that all these planes will be needed, given the total sizes of the known writing systems left to be encoded. However, the number of possible symbol characters that could arise outside of the context of writing systems is potentially limitless.
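
Since each plane holds 65,536 code points, the plane number of a code point is simply its value divided by 65,536 (equivalently, shifted right by 16 bits). As a rough illustration, the following Python sketch computes the plane of a code point and the range a plane covers. The helper names and the `UNASSIGNED_PLANES` constant are assumptions made for this example; the constant reflects only the statement above about Planes 4 through 13.

```python
PLANE_SIZE = 0x10000       # 65,536 code points per plane
LAST_PLANE = 16            # planes are numbered 0 to 16
MAX_CODE_POINT = 0x10FFFF  # last code point of plane 16

# Planes 4 through 13, per the paragraph above (illustrative constant).
UNASSIGNED_PLANES = range(4, 14)


def plane_of(code_point):
    """Return the plane number (0-16) that contains code_point."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {code_point:#x}")
    return code_point >> 16


def plane_range(plane):
    """Return the (first, last) code points of the given plane."""
    if not 0 <= plane <= LAST_PLANE:
        raise ValueError(f"no such plane: {plane}")
    first = plane * PLANE_SIZE
    return first, first + PLANE_SIZE - 1


first, last = plane_range(4)
print(f"Plane 4 spans {first:06X}-{last:06X}")  # Plane 4 spans 040000-04FFFF
print(plane_of(0x2A6DF))                        # 2
print(plane_of(0x50000) in UNASSIGNED_PLANES)   # True
```

In the six-position hexadecimal format, the plane number is the first two digits, which is why plane 4 prints as `04....`.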

    Supplementary Special-purpose Plane

    Plane 14 (E in hexadecimal), the Supplementary Special-purpose Plane (SSP), currently contains non-graphical characters. The first block is for language tag characters for use when language cannot be indicated through other protocols.

    Source: Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
     