Western Latin character sets (computing)
Several binary representations of character sets for common Western European languages are compared in this article. These encodings were designed for representation of Italian, Spanish, Portuguese, French, German, Dutch, English, Danish, Swedish, Norwegian, and Icelandic, which use the Latin alphabet, a few additional letters and ones with precomposed diacritics, some punctuation, and various symbols (including some Greek letters). Although they are called "Western European", many of these languages are spoken all over the world. These character sets also happen to support many other languages, such as Malay, Swahili, and Classical Latin.
Summary
The ISO-8859 series of 8-bit character sets encodes all Latin character sets used in Europe, although the same code points have multiple uses, which caused some difficulty. The arrival of Unicode, with a unique code point for every character, resolved these issues.
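The "multiple uses" problem can be illustrated with a minimal Python sketch. It assumes that Python's built-in codec names (`iso8859-1`, `iso8859-15`, `cp1252`, `cp437`, `cp850`, `mac_roman`) correspond to the character sets discussed in this article, and it decodes the single byte A4 hex under each of them.

```python
# One byte value, interpreted under several legacy 8-bit encodings.
raw = bytes([0xA4])

# Python codec names, assumed here to correspond to the character sets
# compared in this article.
codecs_to_try = ["iso8859-1", "iso8859-15", "cp1252", "cp437", "cp850", "mac_roman"]

# Decode the same byte with every codec and keep the resulting character.
decoded = {name: raw.decode(name) for name in codecs_to_try}

for name, char in decoded.items():
    print(f"{name:12} 0xA4 -> {char!r} (U+{ord(char):04X})")
```

Without out-of-band knowledge of which encoding was used, the byte alone does not identify the intended character; Unicode avoids this by giving each character its own code point.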
- ISO/IEC 8859-1 or Latin-1 is the most used and also defines the first 256 codes in Unicode.
- ISO/IEC 8859-15 modifies ISO-8859-1 to support Finnish and French and add the euro sign.
- In terms of printable characters Windows-1252 has everything ISO-8859-1 and ISO-8859-15 have and more.
- IBM CP437, being intended for English only, has very little in the way of accented letters but has far more graphics characters than the others and also some Greek characters that are useful as technical symbols.
- IBM CP850 has all the printable characters that ISO-8859-1 has (albeit arranged differently) and still manages to have enough graphics characters to build a usable text-mode user interface.
- IBM CP858 differs from CP850 by only one character: a rarely used dotless i (ı) was replaced by the euro sign (€).
- IBM code pages 037, 500, and 1047 are EBCDIC encodings that include all of the ISO-8859-1 characters.
- The Mac OS Roman character set (often referred to as MacRoman and known by the IANA as simply MACINTOSH) has most, but not all, of the same characters as ISO-8859-1 but in a very different arrangement; it also adds many technical and mathematical characters and more diacritics. Older Macintosh web browsers were known to corrupt the few characters that were in ISO-8859-1 but not in their native Macintosh character set when editing text from Web sites. Conversely, in Web material prepared on an older Macintosh, many characters were displayed incorrectly when read by other operating systems.
- The euro sign post-dates these ISO-8859 specifications; conflicting ways to retrofit it led to significant difficulty until Unicode became more generally adopted.
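Some of the relationships in this list can be checked mechanically. The following is a rough sketch, assuming Python's built-in codecs (`iso8859-1`, `iso8859-15`, `cp1252`, `mac_roman`) faithfully implement the character sets named above; the helper names `printable_high_half` and `unencodable` are made up for this illustration.

```python
# Decoding every byte 0x00-0xFF as ISO-8859-1 should yield the Unicode
# code point with the same number.
all_bytes = bytes(range(256))
latin1_text = all_bytes.decode("iso8859-1")
latin1_is_identity = all(ord(ch) == i for i, ch in enumerate(latin1_text))

def printable_high_half(codec):
    """Characters assigned to bytes 0xA0-0xFF in an ISO-8859 style codec."""
    return {bytes([b]).decode(codec) for b in range(0xA0, 0x100)}

def unencodable(chars, codec):
    """Subset of chars that the given codec cannot represent."""
    missing = set()
    for ch in chars:
        try:
            ch.encode(codec)
        except UnicodeEncodeError:
            missing.add(ch)
    return missing

latin1_chars = printable_high_half("iso8859-1")
latin9_chars = printable_high_half("iso8859-15")

# Windows-1252 is expected to cover both ISO sets; Mac OS Roman only most of Latin-1.
missing_in_cp1252 = unencodable(latin1_chars | latin9_chars, "cp1252")
missing_in_mac = unencodable(latin1_chars, "mac_roman")

print("Latin-1 equals first 256 code points:", latin1_is_identity)
print("Missing from cp1252:", sorted(missing_in_cp1252))
print("Latin-1 characters missing from mac_roman:", sorted(missing_in_mac))
```

The check only covers the printable range A0 to FF hex of the two ISO sets; the control-code positions are deliberately left out, in line with the "printable characters" wording above.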
History
The earlier seven-bit U.S. ASCII encoding has characters sufficient to properly represent only a few languages, such as US English, Latin, and Swahili. It is missing some letters and letter-diacritic combinations used in other Latin-alphabet languages. However, since there was no other choice on most U.S.-supplied computer platforms, ASCII was unavoidable in most of the non-English-speaking world (seven-bit encoding was necessitated by the limitations of early computing networks). The ISO 646 group of encodings replaced some of the symbols in ASCII with local characters, but space was very limited, and some of the replaced symbols were quite common in things like programming languages.
Although seven-bit communication was the norm, most computers internally used eight-bit bytes, and they mostly put some form of characters in the upper 128 byte positions. In the early days most of these were system-specific, but gradually a few standards were settled on.
In recent years, as storage and memory costs have fallen, the complications caused by a given eight-bit code having multiple meanings (there are seven ISO-Latin code sets alone) have ceased to be justified. All major operating systems have moved to Unicode as their main internal representation. However, at least on Windows, many applications continue to use the non-Unicode versions of the API calls.
The euro sign
The coming of the euro introduced significant pressure to support the euro sign (€), and most character sets had to be adapted in some way.
- MacRoman simply replaced the generic currency sign (¤). This caused significant difficulty because organisations had found other uses for it, such as a company logo.
- ISO introduced ISO 8859-15, which replaced the generic currency sign with the euro sign as well as making some other replacements of symbols with letters with diacritics.
- Windows-1252 simply placed the euro sign in a gap (position 80 hex) in the range that ISO 8859-1 reserves for the C1 control codes.
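The different retrofits can be seen by encoding the euro sign under each character set. This is a small sketch that assumes Python's built-in codecs (`iso8859-1`, `iso8859-15`, `cp1252`, `cp850`, `cp858`, `mac_roman`) reflect the euro-updated versions described above; the variable names are chosen only for illustration.

```python
# Where, if anywhere, does each legacy encoding put the euro sign?
euro = "\u20ac"

codec_names = ["iso8859-1", "iso8859-15", "cp1252", "cp850", "cp858", "mac_roman"]

euro_positions = {}
for name in codec_names:
    try:
        # Encode to a single byte and show it as two upper-case hex digits.
        euro_positions[name] = euro.encode(name).hex().upper()
    except UnicodeEncodeError:
        # The character set has no euro sign at all.
        euro_positions[name] = None

for name, position in euro_positions.items():
    print(f"{name:12} {position if position else 'not representable'}")
```

Character sets that predate the euro and were never updated (ISO-8859-1 and CP850 here) raise an encoding error, while each updated set reports a different byte position.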
Comparison table
Code points up to U+007F are not shown in this table, as they are mapped identically in all character sets listed here; the ASCII standard defines the mapping of these first 128 characters (0 to 127). The table is arranged by Unicode code point. Character sets are referred to here by their IANA names in upper case. An empty cell means that the character set does not include the character.
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
---|---|---|---|---|---|---|---|
NBSP | U+00A0 | A0 | A0 | A0 | FF | FF | CA |
¡ | U+00A1 | A1 | A1 | A1 | AD | AD | C1 |
¢ | U+00A2 | A2 | A2 | A2 | 9B | BD | A2 |
£ | U+00A3 | A3 | A3 | A3 | 9C | 9C | A3 |
¤ | U+00A4 | A4 | | A4 | | CF | |
¥ | U+00A5 | A5 | A5 | A5 | 9D | BE | B4 |
¦ | U+00A6 | A6 | | A6 | | DD | |
§ | U+00A7 | A7 | A7 | A7 | | F5 | A4 |
¨ | U+00A8 | A8 | | A8 | | F9 | AC |
© | U+00A9 | A9 | A9 | A9 | | B8 | A9 |
ª | U+00AA | AA | AA | AA | A6 | A6 | BB |
« | U+00AB | AB | AB | AB | AE | AE | C7 |
¬ | U+00AC | AC | AC | AC | AA | AA | C2 |
SHY | U+00AD | AD | AD | AD | | F0 | |
® | U+00AE | AE | AE | AE | | A9 | A8 |
¯ | U+00AF | AF | AF | AF | | EE | F8 |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
° | U+00B0 | B0 | B0 | B0 | F8 | F8 | A1 |
± | U+00B1 | B1 | B1 | B1 | F1 | F1 | B1 |
² | U+00B2 | B2 | B2 | B2 | FD | FD | |
³ | U+00B3 | B3 | B3 | B3 | | FC | |
´ | U+00B4 | B4 | | B4 | | EF | AB |
µ | U+00B5 | B5 | B5 | B5 | E6 | E6 | B5 |
¶ | U+00B6 | B6 | B6 | B6 | | F4 | A6 |
· | U+00B7 | B7 | B7 | B7 | FA | FA | E1 |
¸ | U+00B8 | B8 | | B8 | | F7 | FC |
¹ | U+00B9 | B9 | B9 | B9 | | FB | |
º | U+00BA | BA | BA | BA | A7 | A7 | BC |
» | U+00BB | BB | BB | BB | AF | AF | C8 |
¼ | U+00BC | BC | | BC | AC | AC | |
½ | U+00BD | BD | | BD | AB | AB | |
¾ | U+00BE | BE | | BE | | F3 | |
¿ | U+00BF | BF | BF | BF | A8 | A8 | C0 |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
À | U+00C0 | C0 | C0 | C0 | | B7 | CB |
Á | U+00C1 | C1 | C1 | C1 | | B5 | E7 |
Â | U+00C2 | C2 | C2 | C2 | | B6 | E5 |
Ã | U+00C3 | C3 | C3 | C3 | | C7 | CC |
Ä | U+00C4 | C4 | C4 | C4 | 8E | 8E | 80 |
Å | U+00C5 | C5 | C5 | C5 | 8F | 8F | 81 |
Æ | U+00C6 | C6 | C6 | C6 | 92 | 92 | AE |
Ç | U+00C7 | C7 | C7 | C7 | 80 | 80 | 82 |
È | U+00C8 | C8 | C8 | C8 | | D4 | E9 |
É | U+00C9 | C9 | C9 | C9 | 90 | 90 | 83 |
Ê | U+00CA | CA | CA | CA | | D2 | E6 |
Ë | U+00CB | CB | CB | CB | | D3 | E8 |
Ì | U+00CC | CC | CC | CC | | DE | ED |
Í | U+00CD | CD | CD | CD | | D6 | EA |
Î | U+00CE | CE | CE | CE | | D7 | EB |
Ï | U+00CF | CF | CF | CF | | D8 | EC |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
Ð | U+00D0 | D0 | D0 | D0 | | D1 | |
Ñ | U+00D1 | D1 | D1 | D1 | A5 | A5 | 84 |
Ò | U+00D2 | D2 | D2 | D2 | | E3 | F1 |
Ó | U+00D3 | D3 | D3 | D3 | | E0 | EE |
Ô | U+00D4 | D4 | D4 | D4 | | E2 | EF |
Õ | U+00D5 | D5 | D5 | D5 | | E5 | CD |
Ö | U+00D6 | D6 | D6 | D6 | 99 | 99 | 85 |
× | U+00D7 | D7 | D7 | D7 | | 9E | |
Ø | U+00D8 | D8 | D8 | D8 | | 9D | AF |
Ù | U+00D9 | D9 | D9 | D9 | | EB | F4 |
Ú | U+00DA | DA | DA | DA | | E9 | F2 |
Û | U+00DB | DB | DB | DB | | EA | F3 |
Ü | U+00DC | DC | DC | DC | 9A | 9A | 86 |
Ý | U+00DD | DD | DD | DD | | ED | |
Þ | U+00DE | DE | DE | DE | | E8 | |
ß | U+00DF | DF | DF | DF | E1 | E1 | A7 |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
à | U+00E0 | E0 | E0 | E0 | 85 | 85 | 88 |
á | U+00E1 | E1 | E1 | E1 | A0 | A0 | 87 |
â | U+00E2 | E2 | E2 | E2 | 83 | 83 | 89 |
ã | U+00E3 | E3 | E3 | E3 | | C6 | 8B |
ä | U+00E4 | E4 | E4 | E4 | 84 | 84 | 8A |
å | U+00E5 | E5 | E5 | E5 | 86 | 86 | 8C |
æ | U+00E6 | E6 | E6 | E6 | 91 | 91 | BE |
ç | U+00E7 | E7 | E7 | E7 | 87 | 87 | 8D |
è | U+00E8 | E8 | E8 | E8 | 8A | 8A | 8F |
é | U+00E9 | E9 | E9 | E9 | 82 | 82 | 8E |
ê | U+00EA | EA | EA | EA | 88 | 88 | 90 |
ë | U+00EB | EB | EB | EB | 89 | 89 | 91 |
ì | U+00EC | EC | EC | EC | 8D | 8D | 93 |
í | U+00ED | ED | ED | ED | A1 | A1 | 92 |
î | U+00EE | EE | EE | EE | 8C | 8C | 94 |
ï Ï ', lowercase ', is a symbol used in various languages written with the Latin alphabet and in Ukrainian language which is written with the Cyrillic based Ukrainian alphabet; it can be read as the letter I with diaeresis or I-umlaut.... |
U+00EF | EF | EF | EF | 8B | 8B | 95 |
Character | Code point | ISO-8859-1 | ISO-8859-15 | WINDOWS-1252 | IBM437 | IBM850 | MACINTOSH |
ð | U+00F0 | F0 | F0 | F0 | | D0 | |
ñ | U+00F1 | F1 | F1 | F1 | A4 | A4 | 96 |
ò | U+00F2 | F2 | F2 | F2 | 95 | 95 | 98 |
ó | U+00F3 | F3 | F3 | F3 | A2 | A2 | 97 |
ô | U+00F4 | F4 | F4 | F4 | 93 | 93 | 99 |
õ | U+00F5 | F5 | F5 | F5 | | E4 | 9B |
ö | U+00F6 | F6 | F6 | F6 | 94 | 94 | 9A |
÷ | U+00F7 | F7 | F7 | F7 | F6 | F6 | D6 |
ø | U+00F8 | F8 | F8 | F8 | | 9B | BF |
ù | U+00F9 | F9 | F9 | F9 | 97 | 97 | 9D |
ú | U+00FA | FA | FA | FA | A3 | A3 | 9C |
û | U+00FB | FB | FB | FB | 96 | 96 | 9E |
ü | U+00FC | FC | FC | FC | 81 | 81 | 9F |
ý | U+00FD | FD | FD | FD | | EC | |
þ | U+00FE | FE | FE | FE | | E7 | |
ÿ | U+00FF | FF | FF | FF | 98 | 98 | D8 |
ı | U+0131 | | | | | D5 | F5 |
Œ | U+0152 | | BC | 8C | | | CE |
œ | U+0153 | | BD | 9C | | | CF |
Š | U+0160 | | A6 | 8A | | | |
š | U+0161 | | A8 | 9A | | | |
Ÿ | U+0178 | | BE | 9F | | | D9 |
Ž | U+017D | | B4 | 8E | | | |
ž | U+017E | | B8 | 9E | | | |
ƒ | U+0192 | | | 83 | 9F | 9F | C4 |
ˆ | U+02C6 | | | 88 | | | F6 |
ˇ | U+02C7 | | | | | | FF |
˘ | U+02D8 | | | | | | F9 |
˙ | U+02D9 | | | | | | FA |
˚ | U+02DA | | | | | | FB |
˛ | U+02DB | | | | | | FE |
˜ | U+02DC | | | 98 | | | F7 |
˝ | U+02DD | | | | | | FD |
Γ | U+0393 | | | | E2 | | |
Θ | U+0398 | | | | E9 | | |
Σ | U+03A3 | | | | E4 | | |
Φ | U+03A6 | | | | E8 | | |
Ω | U+03A9 | | | | EA | | BD |
α | U+03B1 | | | | E0 | | |
δ | U+03B4 | | | | EB | | |
ε | U+03B5 | | | | EE | | |
π | U+03C0 | | | | E3 | | B9 |
σ | U+03C3 | | | | E5 | | |
τ | U+03C4 | | | | E7 | | |
φ | U+03C6 | | | | ED | | |
– | U+2013 | | | 96 | | | D0 |
— | U+2014 | | | 97 | | | D1 |
‗ | U+2017 | | | | | F2 | |
‘ | U+2018 | | | 91 | | | D4 |
’ | U+2019 | | | 92 | | | D5 |
‚ | U+201A | | | 82 | | | E2 |
“ | U+201C | | | 93 | | | D2 |
” | U+201D | | | 94 | | | D3 |
„ | U+201E | | | 84 | | | E3 |
† | U+2020 | | | 86 | | | A0 |
‡ | U+2021 | | | 87 | | | E0 |
• | U+2022 | | | 95 | | | A5 |
… | U+2026 | | | 85 | | | C9 |
‰ | U+2030 | | | 89 | | | E4 |
‹ | U+2039 | | | 8B | | | DC |
› | U+203A | | | 9B | | | DD |
⁄ | U+2044 | | | | | | DA |
ⁿ | U+207F | | | | FC | | |
₧ | U+20A7 | | | | 9E | | |
€ | U+20AC | | A4 | 80 | | | DB |
™ | U+2122 | | | 99 | | | AA |
∂ | U+2202 | | | | | | B6 |
∆ | U+2206 | | | | | | C6 |
∏ | U+220F | | | | | | B8 |
∑ | U+2211 | | | | | | B7 |
∙ | U+2219 | | | | F9 | | |
√ | U+221A | | | | FB | | C3 |
∞ | U+221E | | | | EC | | B0 |
∩ | U+2229 | | | | EF | | |
∫ | U+222B | | | | | | BA |
≈ | U+2248 | | | | F7 | | C5 |
≠ | U+2260 | | | | | | AD |
≡ | U+2261 | | | | F0 | | |
≤ | U+2264 | | | | F3 | | B2 |
≥ | U+2265 | | | | F2 | | B3 |
⌐ | U+2310 | | | | A9 | | |
⌠ | U+2320 | | | | F4 | | |
⌡ | U+2321 | | | | F5 | | |
─ | U+2500 | | | | C4 | C4 | |
│ | U+2502 | | | | B3 | B3 | |
┌ | U+250C | | | | DA | DA | |
┐ | U+2510 | | | | BF | BF | |
└ | U+2514 | | | | C0 | C0 | |
┘ | U+2518 | | | | D9 | D9 | |
├ | U+251C | | | | C3 | C3 | |
┤ | U+2524 | | | | B4 | B4 | |
┬ | U+252C | | | | C2 | C2 | |
┴ | U+2534 | | | | C1 | C1 | |
┼ | U+253C | | | | C5 | C5 | |
═ | U+2550 | | | | CD | CD | |
║ | U+2551 | | | | BA | BA | |
╒ | U+2552 | | | | D5 | | |
╓ | U+2553 | | | | D6 | | |
╔ | U+2554 | | | | C9 | C9 | |
╕ | U+2555 | | | | B8 | | |
╖ | U+2556 | | | | B7 | | |
╗ | U+2557 | | | | BB | BB | |
╘ | U+2558 | | | | D4 | | |
╙ | U+2559 | | | | D3 | | |
╚ | U+255A | | | | C8 | C8 | |
╛ | U+255B | | | | BE | | |
╜ | U+255C | | | | BD | | |
╝ | U+255D | | | | BC | BC | |
╞ | U+255E | | | | C6 | | |
╟ | U+255F | | | | C7 | | |
╠ | U+2560 | | | | CC | CC | |
╡ | U+2561 | | | | B5 | | |
╢ | U+2562 | | | | B6 | | |
╣ | U+2563 | | | | B9 | B9 | |
╤ | U+2564 | | | | D1 | | |
╥ | U+2565 | | | | D2 | | |
╦ | U+2566 | | | | CB | CB | |
╧ | U+2567 | | | | CF | | |
╨ | U+2568 | | | | D0 | | |
╩ | U+2569 | | | | CA | CA | |
╪ | U+256A | | | | D8 | | |
╫ | U+256B | | | | D7 | | |
╬ | U+256C | | | | CE | CE | |
▀ | U+2580 | | | | DF | DF | |
▄ | U+2584 | | | | DC | DC | |
█ | U+2588 | | | | DB | DB | |
▌ | U+258C | | | | DD | | |
▐ | U+2590 | | | | DE | | |
░ | U+2591 | | | | B0 | B0 | |
▒ | U+2592 | | | | B1 | B1 | |
▓ | U+2593 | | | | B2 | B2 | |
■ | U+25A0 | | | | FE | FE | |
◊ | U+25CA | | | | | | D7 |
 | U+F8FF | | | | | | F0 |
fi | U+FB01 | | | | | | DE |
fl | U+FB02 | | | | | | DF |
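Individual rows of the table above can be spot-checked programmatically. The following is a minimal sketch, assuming Python's standard codecs are an acceptable stand-in for the six columns; the codec names chosen for each column (for example `mac_roman` for MACINTOSH) and the helper name `table_row` are assumptions made for illustration, and the vendor mapping tables remain the authoritative source where they differ.

```python
# Map each table column to a Python standard-library codec name.
# The pairing of column to codec is an assumption of this sketch.
ENCODINGS = {
    "ISO-8859-1": "iso-8859-1",
    "ISO-8859-15": "iso-8859-15",
    "WINDOWS-1252": "cp1252",
    "IBM437": "cp437",
    "IBM850": "cp850",
    "MACINTOSH": "mac_roman",
}


def table_row(char):
    """Return {column: hex byte} for one character.

    A column is mapped to an empty string when the character
    cannot be encoded in that character set.
    """
    row = {}
    for column, codec in ENCODINGS.items():
        try:
            row[column] = char.encode(codec).hex().upper()
        except UnicodeEncodeError:
            row[column] = ""
    return row


if __name__ == "__main__":
    for ch in "é€ƒ":
        print(ch, f"U+{ord(ch):04X}", table_row(ch))
```

An empty string in the result corresponds to an empty cell in the table, meaning that the character set has no code for that character.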