European ordering rules
The European ordering rules (EOR / EN 13710) define an ordering for strings written in languages that use the Latin, Greek and Cyrillic alphabets. The standard covers languages used by the European Union, the European Free Trade Association, and parts of the former Soviet Union. It is a tailoring of the Common Tailorable Template of ISO/IEC 14651. EOR can in turn be tailored for different (European) languages, but in inter-European contexts it can be used without further tailoring.
Method
Just as for ISO/IEC 14651, upon which EOR is based, EOR has four levels of weights.
Level 1 sorts the letters. The Latin letters are ordered at this level as follows:
- a b c d e f g h i j k l m n o p q r s t u v w x y z þ
The Greek alphabet has the following order:
- α β γ δ ε Ϝ Ϛ ζ η θ ι κ λ μ ν ξ ο π Ϟ ρ σ τ υ φ χ ψ ω Ϡ
The Cyrillic alphabet has the following order:
- а ӑ ӓ ә ӛ ӕ б в г ғ ҕ д ђ ҙ е ӗ є ж ӝ җ з ӟ s ӡ и ӥ і ї й ј к қ ӄ ҡ ҟ ҝ л љ м н ң ӊ ҥ њ о ӧ ө ӫ п ҧ р с ҫ т ҭ ћ у ў ӱ ӳ ү ұ ф х ҳ һ ц ҵ ч ӵ ҷ ӌ ҹ ҽ ҿ џ ш щ ъ ы ӹ ь э ю я ҩ Ӏ
The order for the three alphabets is:
- Latin alphabet
- Greek alphabet
- Cyrillic alphabet
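The level 1 ordering described above can be sketched as a lookup table that ranks each letter first by script and then by its position within the alphabet. This is only an illustration, not the standard's data file: the names `SCRIPTS`, `RANK` and `level1_key` are hypothetical, the Cyrillic string is a shortened subset of the list above, and characters outside the tables are simply skipped.

```python
# Level 1 orders taken from the lists above. The Cyrillic string is a
# shortened subset, and lowercase forms are used for the archaic Greek letters.
SCRIPTS = [
    "abcdefghijklmnopqrstuvwxyzþ",        # Latin (thorn sorts after z)
    "αβγδεϝϛζηθικλμνξοπϟρστυφχψωϡ",      # Greek
    "абвгдежзийклмнопрстуфхцчшщъыьэюя",   # Cyrillic (subset only)
]

# Map every letter to (script rank, position inside that script).
RANK = {
    ch: (script, pos)
    for script, alphabet in enumerate(SCRIPTS)
    for pos, ch in enumerate(alphabet)
}

def level1_key(word):
    """Return the level 1 weights of a word; unknown characters are skipped."""
    return [RANK[ch] for ch in word.lower() if ch in RANK]

words = ["яблоко", "zebra", "þorn", "αλφα", "apple"]
print(sorted(words, key=level1_key))
```

Because the script rank is the first element of each weight, every Latin word sorts before every Greek word, and every Greek word before every Cyrillic one, whatever letters they contain.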
The Georgian and Armenian alphabets have not been included in ENV 13710. However, they are covered in CR 14400:2001 "European ordering rules – Ordering for Latin, Greek, Cyrillic, Georgian and Armenian scripts". Note also that all scripts encoded in ISO/IEC 10646 and Unicode are covered by ISO/IEC 14651 (and its datafile CTT) as well as the Unicode Collation Algorithm (UCA and the associated DUCET), both of which are available at no charge.
Level 2 is where additions to the letters, such as diacritics and variations, are ordered. Letters with diacritical marks (like ⟨å⟩, ⟨ä⟩, ⟨ö⟩, and ⟨ø⟩) are ordered as variants of the base letter. ⟨æ⟩, ⟨œ⟩, ⟨ĳ⟩ and ⟨ŋ⟩ are ordered as modifications of ⟨ae⟩, ⟨oe⟩, ⟨ij⟩ and ⟨n⟩ respectively, and similar cases are treated the same way.
Level 2 defines the following order of diacritics and other modifications:
- Acute accent (á)
- Grave accent (à)
- Breve (ă)
- Circumflex (â)
- Hacek (háček) (š)
- Ring (å)
- Trema (ä)
- Double acute accent (ő)
- Tilde (ã)
- Dot (ż)
- Cedilla (ş)
- Ogonek (ą)
- Macron (ā)
- With stroke through (ø)
- Modified letter(s) (æ)
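As a rough sketch of how level 2 acts as a tie-breaker, the code below decomposes each letter with Python's `unicodedata` module and ranks the combining marks in the order listed above. Several things here are assumptions made for illustration: the names `MARK_ORDER`, `MARK_RANK` and `split_levels` are hypothetical, base letters are compared by plain code point (adequate only for a to z), the dot is taken to be the dot above, and stroke and modified letters such as ø and æ are left out because they have no canonical decomposition.

```python
import unicodedata

# Combining marks in the level 2 order listed above.
MARK_ORDER = [
    "\u0301",  # acute
    "\u0300",  # grave
    "\u0306",  # breve
    "\u0302",  # circumflex
    "\u030C",  # hacek (caron)
    "\u030A",  # ring
    "\u0308",  # trema
    "\u030B",  # double acute
    "\u0303",  # tilde
    "\u0307",  # dot (assumed to be the dot above)
    "\u0327",  # cedilla
    "\u0328",  # ogonek
    "\u0304",  # macron
]
MARK_RANK = {mark: rank for rank, mark in enumerate(MARK_ORDER, start=1)}

def split_levels(word):
    """Return (base letters, level 2 weights); weight 0 means no diacritic."""
    base, weights = [], []
    for ch in unicodedata.normalize("NFD", word.lower()):
        if ch in MARK_RANK:
            if weights:                      # attach the mark to the previous letter
                weights[-1] = MARK_RANK[ch]  # if several marks stack, the last wins
        else:
            base.append(ch)
            weights.append(0)
    return "".join(base), tuple(weights)

words = ["az", "äb", "ab", "åb", "áb"]
print(sorted(words, key=split_levels))
```

The base letters are compared first, so "äb" still sorts before "az"; the diacritic weights only decide the order among words whose base letters are identical.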
Level 3 makes the distinction between capital and small letters, as in "Polish" and "polish".
Level 4 concerns punctuation and whitespace characters. This level makes the distinction between "MacDonald" and "Mac Donald", "its" and "it's".
An optional, and usually omitted, fifth level can distinguish typographical differences, including whether the text is italic, normal or bold.
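The way the four levels interact can be sketched as a sort key with four parts, where a later part is consulted only when all earlier parts are equal. This is a simplified illustration under stated assumptions, not the EOR tables: `eor_like_key` is a hypothetical name, base letters and diacritics are compared by code point rather than by the orders given above, and placing small letters before capitals is an assumption of this sketch.

```python
import unicodedata

def eor_like_key(text):
    """Build a four-part sort key; earlier parts outrank later ones."""
    letters, marks, case, ignorable = [], [], [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            marks.append((len(letters), ch))       # level 2: diacritics
        elif ch.isalnum():
            letters.append(ch.lower())             # level 1: base letters
            case.append(ch.isupper())              # level 3: small before capital (assumed)
        else:
            ignorable.append((len(letters), ch))   # level 4: punctuation and spaces
    return ("".join(letters), marks, case, ignorable)

words = ["it's", "Polish", "its", "polish", "Mac Donald", "MacDonald"]
print(sorted(words, key=eor_like_key))
```

With this key, "its" and "it's" share levels 1 to 3 and are separated only by the apostrophe at level 4, while "polish" and "Polish" are already separated at level 3.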
See also
- Collation
- ISO/IEC 14651
- Unicode collation algorithm (UCA)
- Common Locale Data Repository (CLDR)
External links
- European Ordering Rules, ENV 13710 – a "European Pre-Standard"