KOI8-U
KOI8-U is an 8-bit character encoding designed to cover Ukrainian, which uses the Cyrillic alphabet. It is based on KOI8-R, which covers Russian and Bulgarian, but replaces eight graphic characters with four Ukrainian letters, Ґ, Є, І, and Ї, in both upper case and lower case.

In Microsoft Windows, KOI8-U is assigned the code page number 21866. In IBM, KOI8-U is assigned code page 1168.

KOI8 remains much more commonly used than ISO 8859-5, which never saw wide adoption. Another common Cyrillic character encoding is Windows-1251. Both may eventually give way to Unicode.

In Russian, KOI8 stands for Kod Obmena Informatsiey, 8 bit (Код Обмена Информацией, 8 бит), which means "Code for Information Exchange, 8 bit".

The KOI8 character sets have the property that the Russian Cyrillic letters are in pseudo-Roman order rather than the natural Cyrillic alphabetical order as in ISO 8859-5. Although this may seem unnatural, it has the useful property that if the 8th bit is stripped, the text can still be read (or at least deciphered) in case-reversed transliteration on an ordinary ASCII terminal. For instance, "Русский Текст" in KOI8-U becomes rUSSKIJ tEKST ("Russian Text") if the 8th bit is stripped.
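The bit-stripping property described above can be tried with a short Python sketch. It assumes the `koi8_u` codec that ships with Python's standard library; the helper name `strip_high_bit` is made up for this illustration.

```python
def strip_high_bit(text: str, encoding: str = "koi8_u") -> str:
    """Encode text, clear the 8th bit of every byte, and read the result as ASCII."""
    raw = text.encode(encoding)
    # Masking with 0x7F clears the top bit, which is what a 7-bit channel would do.
    stripped = bytes(b & 0x7F for b in raw)
    return stripped.decode("ascii")


print(strip_high_bit("Русский Текст"))  # prints: rUSSKIJ tEKST
```

The case reversal comes from the layout itself: lowercase Cyrillic letters sit 0x80 above uppercase Latin letters, and uppercase Cyrillic letters sit 0x80 above lowercase Latin letters.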

Codepage layout

[Table: KOI8-U code page layout, eight rows of 16 cells. The cells in columns 4, 6, 7, and D of the third and fourth rows are outlined with a heavy border.]

In the table above, 20 is the regular SPACE character, and 9A is the NO-BREAK SPACE.
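As a rough way to inspect the layout, the upper half of the code page (80 to FF) can be printed from Python's built-in `koi8_u` codec. This is only a sketch: the function name `layout_rows` and the output format are choices made here, and the mappings shown are those of Python's codec table, not a copy of any particular published chart.

```python
def layout_rows(encoding: str = "koi8_u", start: int = 0x80, stop: int = 0x100):
    """Return one text row per 16 code points, labelled by the high nibble."""
    rows = []
    for high in range(start, stop, 16):
        cells = []
        for code in range(high, high + 16):
            ch = bytes([code]).decode(encoding)
            # Non-printable characters (such as NO-BREAK SPACE at 9A) are
            # shown by code point so they stay visible in the output.
            cells.append(ch if ch.isprintable() else f"<{ord(ch):04X}>")
        rows.append(f"{high >> 4:X}x  " + " ".join(cells))
    return rows


for row in layout_rows():
    print(row)
```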

KOI8-U differs from KOI8-R at the positions 0xA4, 0xA6, 0xA7, 0xAD and 0xB4, 0xB6, 0xB7, 0xBD, which hold the extra letters that do not exist in Russian.
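The positions can be cross-checked by decoding every byte value with both codecs and listing where they disagree. The sketch below relies on the `koi8_r` and `koi8_u` codecs in Python's standard library; `differing_positions` is a hypothetical helper name, and the result reflects Python's mapping tables.

```python
def differing_positions(enc_a: str = "koi8_r", enc_b: str = "koi8_u"):
    """Return (byte value, char in enc_a, char in enc_b) wherever the codecs differ."""
    diffs = []
    for code in range(256):
        byte = bytes([code])
        a = byte.decode(enc_a)
        b = byte.decode(enc_b)
        if a != b:
            diffs.append((code, a, b))
    return diffs


for code, a, b in differing_positions():
    print(f"0x{code:02X}: KOI8-R U+{ord(a):04X} {a}  ->  KOI8-U U+{ord(b):04X} {b}")
```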

Although RFC 2319 says that character 95 should be U+2219 (∙), it may also be mapped to U+2022 (•) to match the bullet character in Windows-1251.

Some references have a typo and incorrectly state that character B4 is U+0403, rather than the correct U+0404. This typo is present in Appendix A of RFC 2319 (but the table in the main text of the RFC gives the correct mapping).
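When in doubt about how a particular implementation maps these two positions, it can be queried directly. The following sketch asks Python's `koi8_u` codec; the helper `describe` is a name invented for this example, and other implementations may answer differently for 95.

```python
import unicodedata


def describe(code: int, encoding: str = "koi8_u") -> str:
    """Return the code point and Unicode name that a byte value decodes to."""
    ch = bytes([code]).decode(encoding)
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for code in (0x95, 0xB4):
    print(f"{code:02X} -> {describe(code)}")
```

For B4, Python reports U+0404 CYRILLIC CAPITAL LETTER UKRAINIAN IE, in line with the table in the main text of the RFC.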


The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 