Windows-1252
Windows-1252 or CP-1252 is a character encoding of the Latin alphabet, used by default in the legacy components of Microsoft Windows in English and some other Western languages. It is one version within the group of Windows code pages. In LaTeX packages, it is referred to as ansinew.
Details
The encoding is a superset of ISO 8859-1, but differs from the IANA's ISO-8859-1 by using displayable characters rather than control characters in the 0x80 to 0x9F range. It is known to Windows by the code page number 1252, and by the IANA-approved name "windows-1252". This code page also contains all the printable characters that are in ISO 8859-15 (though some are mapped to different code points).
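The difference in the 0x80 to 0x9F range, and the relocated ISO 8859-15 characters, can be illustrated with a short Python sketch. It assumes Python's built-in `cp1252`, `latin-1`, and `iso8859_15` codecs are faithful to the respective encodings; the variable names are illustrative only.

```python
# Bytes 0x93 and 0x94 are curly quotes in Windows-1252,
# but C1 control characters in IANA's ISO-8859-1.
raw = bytes([0x93, 0x48, 0x69, 0x94])

as_cp1252 = raw.decode("cp1252")    # left quote, "Hi", right quote
as_latin1 = raw.decode("latin-1")   # U+0093, "Hi", U+0094 (invisible controls)

# The euro sign exists in both Windows-1252 and ISO 8859-15,
# but at different byte values.
euro_cp1252 = "\u20ac".encode("cp1252")
euro_latin9 = "\u20ac".encode("iso8859_15")

# Decode the upper half of ISO 8859-15 and re-encode it as Windows-1252.
# encode() would raise UnicodeEncodeError if any character were missing.
latin9_upper = bytes(range(0xA0, 0x100)).decode("iso8859_15")
reencoded = latin9_upper.encode("cp1252")

# Characters whose byte value differs between the two encodings.
moved = [
    ch
    for ch, latin9_byte in zip(latin9_upper, range(0xA0, 0x100))
    if ch.encode("cp1252")[0] != latin9_byte
]

print(repr(as_cp1252), repr(as_latin1))
print(euro_cp1252, euro_latin9)
print(moved)
```

The `moved` list holds the characters that ISO 8859-15 introduced in place of ISO 8859-1 symbols; Windows-1252 keeps them in the 0x80 to 0x9F block instead.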
It is very common to mislabel Windows-1252 text with the charset label ISO-8859-1. A common result was that all the quotes and apostrophes (produced by "smart quotes" in Microsoft software) were replaced with question marks or boxes on non-Windows operating systems, making text difficult to read. Most modern web browsers and e-mail clients treat the MIME charset ISO-8859-1 as Windows-1252 in order to accommodate such mislabeling. This is now standard behavior in the draft HTML 5 specification, which requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding.
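The accommodation described above can be sketched in Python as a small label-remapping step before decoding. This is only an illustration of the idea, not the algorithm any particular browser uses; the function name `decode_labeled` and the set of aliases it recognizes are assumptions made for this example.

```python
# Labels that this sketch treats as "really Windows-1252".
# The exact alias list is an assumption for illustration.
LATIN1_LABELS = {"iso-8859-1", "iso8859-1", "latin1", "l1"}


def decode_labeled(data: bytes, label: str) -> str:
    """Decode data using its declared charset label, but parse
    anything advertised as ISO-8859-1 as Windows-1252 instead."""
    codec = label.strip().lower()
    if codec in LATIN1_LABELS:
        codec = "cp1252"
    return data.decode(codec)


# Smart quotes written by Windows software, then mislabeled.
mislabeled = "\u201cquoted\u201d".encode("cp1252")

strict = mislabeled.decode("latin-1")                 # quotes become C1 controls
lenient = decode_labeled(mislabeled, "ISO-8859-1")    # quotes survive

print(repr(strict))
print(repr(lenient))
```

Because Windows-1252 agrees with ISO-8859-1 everywhere outside 0x80 to 0x9F, the remapping only changes the result for bytes that would otherwise decode to control characters.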
Historically, the term "ANSI code page" (ACP) has been used in Windows to refer to various code pages considered as native. The intention was that most of these would be ANSI standards such as ISO-8859-1. Even though Windows-1252 was the first and by far the most popular code page named so in Microsoft Windows parlance, the code page has never been an ANSI standard. Microsoft now states that "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community."
Codepage layout
The following table shows Windows-1252. Each character is shown with its Unicode equivalent and its decimal code.
Microsoft's "best fit" Unicode mapping of Windows-1252 also includes the five unmapped C1 code points, as well as Unicode code points that map to 1252 in a lossy fashion.
Legend: yellow cells are control characters, blue cells are punctuation, purple cells are numbers, green cells are ASCII letters, and tan cells are international letters. Differences from ISO-8859-1 are marked with thick green borders.
According to the information on Microsoft's and the Unicode Consortium's websites, positions 81, 8D, 8F, 90, and 9D are unused. However, the Windows API call for converting from code pages to Unicode maps these to the corresponding C1 control codes. The euro character at position 80 was not present in earlier versions of this code page, nor were the S and Z with caron (háček).
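The treatment of the five unused positions can be approximated in Python. The sketch below assumes Python's built-in `cp1252` codec, which leaves those positions undefined and raises an error on them; the helper `decode_cp1252_with_c1` is a hypothetical name, and it only imitates the mapping described above rather than calling any Windows API.

```python
# The five positions listed as unused in Windows-1252.
UNUSED = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def decode_cp1252_with_c1(data: bytes) -> str:
    """Decode Windows-1252, mapping each unused position to the
    C1 control code with the same numeric value."""
    chars = []
    for byte in data:
        if byte in UNUSED:
            chars.append(chr(byte))  # e.g. 0x81 becomes U+0081
        else:
            chars.append(bytes([byte]).decode("cp1252"))
    return "".join(chars)


# Euro sign (0x80), an unused position (0x81),
# then S and Z with caron (0x8A, 0x8E).
sample = bytes([0x80, 0x81, 0x8A, 0x8E])
decoded = decode_cp1252_with_c1(sample)
print([f"U+{ord(ch):04X}" for ch in decoded])

# The strict built-in codec rejects the unused position instead.
try:
    sample.decode("cp1252")
    strict_rejects = False
except UnicodeDecodeError:
    strict_rejects = True
print(strict_rejects)
```

Decoding byte by byte keeps the sketch simple; a lookup table built once would be the more usual choice for large inputs.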