ISO 11940
ISO 11940 is an ISO standard for the romanization of the Thai alphabet, published in 1998 and updated in September 2003.

Consonants

Thai
ISO k k̄h ḳ̄h kh k̛h ḳh ng c c̄h ch s c̣h
 
Thai  
ISO ṭ̄h ṯh t̛h d t t̄h th ṭh n
 
Thai  
ISO b p p̄h ph f p̣h m
 
Thai
ISO y r v l ł w ṣ̄ s̛̄ x


The transliteration of the pure consonants is derived from their usual pronunciation as an initial consonant. An unmarked h is used to form digraphs denoting aspirated consonants. High and low pairs of consonants are systematically differentiated by applying a macron to the high class consonant. Further differentiation of consonants with identical phonetic function is obtained by leaving the most frequent unmarked, marking the second commonest by a dot below, marking the third commonest by a horn, and marking the fourth commonest by underlining. The use of a dot below has a similar effect to the Indological practice of distinguishing retroflex consonants by a dot below, but there are subtle differences – it is the transliterations of ธ tho thong and ศ so sala that are dotted below, not those of the corresponding retroflex consonants. The transliterations of consonants should be entered in the order base letter, macron if any, and then dot below, horn or "macron below".
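The entry order described above can be sketched in Python. This is only an illustration: the combining code points (U+0304, U+0323, U+031B, U+0331) and the helper name `iso_consonant` are assumptions, since the article names the accents but does not list the code points the standard assigns.

```python
import unicodedata

# Assumed combining marks; the article names the accents, not their code points.
MACRON = "\u0304"        # COMBINING MACRON: high-class member of a pair
DOT_BELOW = "\u0323"     # COMBINING DOT BELOW: second commonest
HORN = "\u031B"          # COMBINING HORN: third commonest
MACRON_BELOW = "\u0331"  # COMBINING MACRON BELOW: fourth commonest


def iso_consonant(base, macron=False, rank_mark="", aspirated=False):
    """Hypothetical helper: base letter, macron if any, then the dot below,
    horn or macron below, then an unmarked h for aspirated consonants."""
    if rank_mark not in ("", DOT_BELOW, HORN, MACRON_BELOW):
        raise ValueError("unexpected distinguishing mark")
    out = base
    if macron:
        out += MACRON
    out += rank_mark
    if aspirated:
        out += "h"
    return out


# Third entry of the consonant table: k, macron, dot below, then h.
third_entry = iso_consonant("k", macron=True, rank_mark=DOT_BELOW, aspirated=True)
print([unicodedata.name(ch) for ch in third_entry])
```

The string is built in typing order and is deliberately left unnormalised, so the stored sequence matches the order the standard asks for.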

Only three consonants have the horn in their transliteration, ฅ kho khon, ฒ tho phuthao and ษ so ruesi, and only one consonant has an underline, ฑ tho nang montho.

Vowels

Thai –ั  ำ –ิ –ี –ึ –ื –ุ –ู ฤๅ ฦๅ
ISO a ā å i ī ụ̄ u ū e æ o ı v ł łɨ y w x


The letter å is the only precomposed character specified in the output of transliteration.

Lakkhangyao (ๅ) has been shown only in combination with the vowel letters ฤ and ฦ. The standard simply lists ฤ and ฦ with the consonants and lakkhangyao with the vowels. An isolated lakkhangyao would also be transliterated by a small letter "i" with stroke (ɨ), but such should not occur in Thai, Pāli or Sanskrit.

The transliterations of ว wo waen and อ o ang have been included here because of their use as complete vowel symbols, but their transliteration does not depend on how they are being used and the standard simply lists them with the consonants.

Compound vowel symbols are transliterated in accordance with their constituents.

Other combining marks

Thai –่ –้ –๊ –๋ –็ –์ –๎ –ํ –ฺ
ISO –̀ –̂ –́ –̌ –̆ –̒ ~ –̊ –̥

Note that yamakkan is represented by a spacing tilde, not a superscript tilde.

Punctuation and Digits

Thai
ISO « ǂ § ǀ ǁ » 0 1 2 3 4 5 6 7 8 9


ISO 11940:1998 distinguishes the abbreviation symbol paiyannoi from the sentence terminator angkhandiao, even though neither the national character standard TIS 620-2533 nor Unicode Version 5.0 distinguishes them. Paiyannoi is transliterated as ǂ and angkhandiao is transliterated as ǀ. Note that paiyannoi, angkhandiao and angkhankhu are transliterated by the letters used for click consonants, not by double dagger, vertical bars or dandas.
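The difference between the click letters and their look-alikes is easy to miss on screen but visible in the Unicode character data. The following Python sketch prints the name and general category of each. The mapping of angkhankhu to U+01C1 is an assumption inferred from the table above, and the dictionary names are illustrative only.

```python
import unicodedata

# Click letters used by the transliteration (angkhankhu entry is an assumption).
click_letters = {
    "paiyannoi": "\u01C2",
    "angkhandiao": "\u01C0",
    "angkhankhu": "\u01C1",
}

# Visually similar characters that the transliteration does NOT use.
lookalikes = {
    "double dagger": "\u2021",
    "vertical bar": "\u007C",
    "danda": "\u0964",
}


def describe(ch):
    """Return code point, Unicode name and general category of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})"


for label, ch in {**click_letters, **lookalikes}.items():
    print(f"{label:14} {describe(ch)}")
```

The click letters are classified as letters (category Lo), whereas the look-alikes are punctuation or symbols, which matters for word-boundary detection and searching.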

Character Sequencing

In general characters are transliterated from left to right and, where characters have the same horizontal position, from top to bottom. The vertical sequencing is in fact simply specified as tone marks and thanthakhat preceding any other marks above or below the consonant. The standard denies at the end of Section 4.2 that the combination of sara u and nikkhahit can occur and then gives an example of it when specifying the transliteration of nikkhahit, but does not show the transliteration of the combination. The effect of these rules is that, except for nikkhahit, all the non-vowel marks attached to a consonant in Thai are attached to the consonant in the Roman transliteration.
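The vertical sequencing rule can be sketched as a stable sort over the marks attached to one consonant. This is a minimal Python illustration under stated assumptions, not the standard's own algorithm: the function name `iso_mark_order` is hypothetical, and "tone marks and thanthakhat" are taken to be U+0E48 to U+0E4C.

```python
# Mai ek, mai tho, mai tri, mai chattawa and thanthakhat (assumed code points).
TONE_MARKS_AND_THANTHAKHAT = set("\u0E48\u0E49\u0E4A\u0E4B\u0E4C")


def iso_mark_order(marks):
    """Put tone marks and thanthakhat first; sorted() is stable, so all
    other marks keep the relative order in which they were stored."""
    return "".join(
        sorted(marks, key=lambda m: m not in TONE_MARKS_AND_THANTHAKHAT)
    )


# Marks stored bottom to top: sara u (below), then mai ek (above).
stored = "\u0E38\u0E48"
print([f"U+{ord(m):04X}" for m in iso_mark_order(stored)])
```

Applied to the stored sequence sara u, mai ek, the sketch moves mai ek in front of sara u and leaves an already conforming sequence unchanged.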

The standard concedes that attempting to transpose preposed vowels and consonants may be comforting to those used to the Roman alphabet, but recommends that preposed vowels not be transposed.

For example, would transliterate to and to , if their preposed vowels were transcribed in orthographic sequence.

Finally, this implementation normalises its outputs using composed characters (Normalisation Form Composed).

Caveats

The standard specifies the order in which the accents should be typed, but not all input systems will record accents in the order in which they are typed. Unicode specifies two normalised forms for letters with multiple accents, and transliterated text is highly likely to be stored in one of these forms. This complicates automatic back-transliteration. As Unicode-compliant processes must handle such variations correctly, the transliterations on this page have been chosen for ease of display – present day rendering systems may display equivalent forms differently.
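The effect of normalisation on accent order can be observed with Python's standard `unicodedata` module. The sketch below assumes the macron and dot below are the combining characters U+0304 and U+0323. It takes a consonant typed in the standard's order and shows how each normalised form stores it.

```python
import unicodedata

# Typed in the standard's order: base letter, macron, dot below, then h.
typed = "k\u0304\u0323h"

nfd = unicodedata.normalize("NFD", typed)
nfc = unicodedata.normalize("NFC", typed)


def codepoints(text):
    """Render a string as a space-separated list of code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


def same_transliteration(a, b):
    """Compare two strings after bringing both into the same normal form."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


print("typed:", codepoints(typed))
print("NFD:  ", codepoints(nfd))
print("NFC:  ", codepoints(nfc))
print(same_transliteration(typed, nfc))
```

Both normalised forms move the dot below ahead of the macron, because canonical ordering sorts marks by combining class, and NFC additionally fuses the base letter and dot below into a single precomposed character. A back-transliterator therefore cannot rely on the typed order and should normalise its input before matching.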

Many fonts display novel combinations of consonants and accents badly. For example, the Institute of the Estonian Language publishes on the web an explanation of the application of the standard to Thai, and with one exception this seems to comply with the standard. The exception is that, except for the macron, accents over consonants are actually offset to the right, giving the impression that they have been entered as the corresponding non-combining characters. The standard specifies the transliterations in codepoints, but someone working from this free explanation could easily deduce that the spacing forms of the tone accents should be used.

ICU (CLDR 1.4.1)

The ICU implementation, recorded in Version 1.4.1 of the Common Locale Data Repository sponsored by Unicode, uses a prime instead of a horn in the transliteration of consonants. This affects the transliteration of ฅ kho khon, ฒ tho phuthao and ษ so ruesi. ฏ to patak is also transliterated differently, as rather than .

This implementation transliterates ำ as  instead of å to avoid ambiguity with the hypothetical Thai script sequence ะํ (sara a, nikkhahit). The ICU implementation transliterates ฺ phinthu as ˌ instead of  ̥  to avoid problems with Unicode normalisation. This has the side effect of improving legibility when applied to an underdotted consonant.

The ICU implementation transliterates ฯ paiyannoi as ‡ (double dagger) and angkhankhu as || (two ASCII vertical bars). As the ICU implementation uses Unicode, it cannot reliably distinguish angkhandiao from paiyannoi without a semantic analysis, and makes no such attempt.

The character sequencing of the ICU implementation is different. It transposes preposed vowels with the following consonant, and processes the marks on a consonant in the order in which they are stored in memory. Most Thai input methods ensure that the marks are stored in bottom to top order.
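The transposition step can be sketched in Python as a simple swap of each preposed vowel with the consonant that follows it. This is a rough illustration only and does not reproduce ICU's actual rules: the function names are hypothetical, the preposed vowels are assumed to be U+0E40 to U+0E44, and the range U+0E01 to U+0E2E is treated as the consonants (it includes ฤ and ฦ, which the standard lists with the consonants).

```python
# Sara e, sara ae, sara o, sara ai maimuan, sara ai maimalai (assumed set).
PREPOSED_VOWELS = set("\u0E40\u0E41\u0E42\u0E43\u0E44")


def is_thai_consonant(ch):
    """True for characters in the Thai consonant range U+0E01..U+0E2E."""
    return "\u0E01" <= ch <= "\u0E2E"


def transpose_preposed(text):
    """Swap each preposed vowel with the consonant immediately after it.
    Marks stored after the consonant are left where they are."""
    chars = list(text)
    i = 0
    while i < len(chars) - 1:
        if chars[i] in PREPOSED_VOWELS and is_thai_consonant(chars[i + 1]):
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
            i += 2  # skip past the pair just swapped
        else:
            i += 1
    return "".join(chars)


# Sara e followed by ko kai becomes ko kai followed by sara e.
print([f"U+{ord(ch):04X}" for ch in transpose_preposed("\u0E40\u0E01")])
```

After the swap, the remaining marks on the consonant would be processed in stored order, as described above. A preposed vowel with no following consonant is left untouched.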

For example, under this implementation transliterates to and to (not ?).

Finally, this implementation generates transliterations in Unicode Normalisation Form C (NFC).

The source of this article is wikipedia, the free encyclopedia.  The text of this article is licensed under the GFDL.
 