European ordering rules - AbsoluteAstronomy.com

The European ordering rules (EOR / EN 13710) define an ordering for strings written in languages that use the Latin, Greek and Cyrillic alphabets. The standard covers languages used by the European Union, the European Free Trade Association, and parts of the former Soviet Union. It is a tailoring of the Common Tailorable Template (CTT) of ISO/IEC 14651. EOR can in turn be tailored for individual (European) languages, but in inter-European contexts it can be used without further tailoring.

Method

As with ISO/IEC 14651, on which it is based, EOR has four levels of weights.
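The levels act as successive tie-breakers: a later level is consulted only when all earlier levels compare equal. The following Python sketch shows that structure with two toy levels. The helper name `multilevel_key`, the toy levels themselves and the assumption that lowercase sorts before uppercase are illustrative only, not taken from the standard.

```python
from typing import Callable, Sequence


def multilevel_key(levels: Sequence[Callable[[str], object]]) -> Callable[[str], tuple]:
    """Combine per-level keys so that level 2 is consulted only when level 1
    ties, level 3 only when levels 1 and 2 tie, and so on. Python compares
    tuples element by element, which gives exactly this behaviour."""
    def key(s: str) -> tuple:
        return tuple(level(s) for level in levels)
    return key


# Two toy levels (NOT the real EOR weights): letters ignoring case, then case.
# Assumption of this sketch: lowercase sorts before uppercase (False < True).
toy_key = multilevel_key([
    str.casefold,
    lambda s: tuple(ch.isupper() for ch in s),
])

print(sorted(["Polish", "police", "polish"], key=toy_key))
# ['police', 'polish', 'Polish']
```

"police" and "polish" already differ at the first level, so case is never consulted for them; "polish" and "Polish" tie there and are separated only by the second level.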

Level 1 sorts the base letters. The Latin letters at this level are, in order:

a b c d e f g h i j k l m n o p q r s t u v w x y z þ

The Greek alphabet has the following order:

α β γ δ ε Ϝ Ϛ ζ η θ ι κ λ μ ν ξ ο π Ϟ ρ σ τ υ φ χ ψ ω Ϡ

The Cyrillic alphabet has the following order:

а ӑ ӓ ә ӛ ӕ б в г ғ ҕ д ђ ҙ е ӗ є ж ӝ җ з ӟ ѕ ӡ и ӥ і ї й ј к қ ӄ ҡ ҟ ҝ л љ м н ң ӊ ҥ њ о ӧ ө ӫ п ҧ р с ҫ т ҭ ћ у ў ӱ ӳ ү ұ ф х ҳ һ ц ҵ ч ӵ ҷ ӌ ҹ ҽ ҿ џ ш щ ъ ы ӹ ь э ю я ҩ Ӏ

The order for the three alphabets is:

Latin alphabet
Greek alphabet
Cyrillic alphabet
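A minimal Python sketch of level-1 weights, assuming that each base letter's weight is simply its position in the concatenated Latin, Greek and Cyrillic orders above. The table is abbreviated (the 24 classical Greek letters and the Russian subset of the Cyrillic list), accented letters are not handled, and the names `PRIMARY_WEIGHT` and `primary_key` are hypothetical.

```python
# Level-1 base letters in the orders given above (abbreviated in this sketch:
# archaic Greek letters and most non-Russian Cyrillic letters are left out).
LATIN = "abcdefghijklmnopqrstuvwxyzþ"
GREEK = "αβγδεζηθικλμνξοπρστυφχψω"
CYRILLIC = "абвгдежзийклмнопрстуфхцчшщъыьэюя"

# Latin < Greek < Cyrillic, so one running index gives every letter its weight.
PRIMARY_WEIGHT = {ch: i for i, ch in enumerate(LATIN + GREEK + CYRILLIC)}


def primary_key(s: str) -> tuple[int, ...]:
    """Level-1 key: case is ignored (casefold also maps final ς to σ)."""
    try:
        return tuple(PRIMARY_WEIGHT[ch] for ch in s.casefold())
    except KeyError as exc:
        # Accented or unlisted characters have no weight in this sketch.
        raise ValueError(f"no level-1 weight for {exc.args[0]!r} in this sketch") from None


print(sorted(["яблоко", "Zebra", "þorn", "αλφα", "apple"], key=primary_key))
# ['apple', 'Zebra', 'þorn', 'αλφα', 'яблоко']
```

Note that "Zebra" sorts after "apple" here even though "Z" precedes "a" in raw code-point order, because case is ignored at level 1.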

The Georgian and Armenian alphabets are not included in ENV 13710. However, they are covered in CR 14400:2001 "European ordering rules – Ordering for Latin, Greek, Cyrillic, Georgian and Armenian scripts". Note also that all scripts encoded in ISO/IEC 10646 and Unicode are covered by ISO/IEC 14651 (and its data file, the CTT) as well as by the Unicode Collation Algorithm (UCA, with its associated DUCET), both of which are available at no charge.

Level 2 orders the additions made to letters, such as diacritics and other variations. Letters with diacritical marks (like ⟨å⟩, ⟨ä⟩, ⟨ö⟩ and ⟨ø⟩) are ordered as variants of the base letter. ⟨æ⟩, ⟨œ⟩, ⟨ĳ⟩ and ⟨ŋ⟩ are ordered as modifications of ⟨ae⟩, ⟨oe⟩, ⟨ij⟩ and ⟨n⟩ respectively, and similar cases are handled in the same way.
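The reduction to base letters can be sketched in Python with Unicode canonical decomposition (NFD), which splits a precomposed letter such as ⟨å⟩ into ⟨a⟩ plus a combining mark. This sketch is limited to Latin letters and uses a hypothetical table for the letters Unicode does not decompose; letters such as ⟨ł⟩ are left unchanged. Greek and Cyrillic would need their own rules, since the Cyrillic list above treats letters such as ⟨й⟩ and ⟨ў⟩ as letters in their own right rather than as variants.

```python
import unicodedata

# Letters the text treats as modifications of other letters. Unicode gives
# these no canonical decomposition, so this sketch uses an explicit
# (hypothetical, abbreviated) table.
EXPANSIONS = {"æ": "ae", "œ": "oe", "ĳ": "ij", "ŋ": "n", "ø": "o"}


def latin_base_letters(s: str) -> str:
    """Reduce Latin letters to the base letters compared at level 1."""
    out = []
    for ch in s.casefold():
        if ch in EXPANSIONS:
            out.append(EXPANSIONS[ch])
            continue
        base = unicodedata.normalize("NFD", ch)[0]
        if unicodedata.name(base, "").startswith("LATIN"):
            out.append(base)  # drop the marks; they only matter at level 2
        else:
            out.append(ch)  # leave other scripts (e.g. Cyrillic й) untouched
    return "".join(out)


for word in ["Ålesund", "Æble", "Øre", "Ĳssel", "Dvořák", "йод"]:
    print(word, "->", latin_base_letters(word))
```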

Level 2 defines the following order of diacritics and other modifications:

Acute accent (á)
Grave accent (à)
Breve (ă)
Circumflex (â)
Hacek (háček) (š)
Ring (å)
Trema (ä)
Double acute accent (ő)
Tilde (ã)
Dot (ż)
Cedilla (ş)
Ogonek (ą)
Macron (ā)
Stroke through (ø)
Modified letter(s) (æ)
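The level-2 order can be turned into a key by ranking each combining mark by its position in the list above. In this Python sketch the rank table, the treatment of unmarked letters and the small table of stroked and modified letters are assumptions made for illustration.

```python
import unicodedata

# Combining marks in the level-2 order listed above. Rank 0 is left for marks
# the list does not mention; an unmarked letter gets an empty tuple, which
# sorts before any marked one (both are assumptions of this sketch).
MARK_ORDER = [
    "\u0301",  # acute accent       á
    "\u0300",  # grave accent       à
    "\u0306",  # breve              ă
    "\u0302",  # circumflex         â
    "\u030C",  # hacek (caron)      š
    "\u030A",  # ring above         å
    "\u0308",  # trema (diaeresis)  ä
    "\u030B",  # double acute       ő
    "\u0303",  # tilde              ã
    "\u0307",  # dot above          ż
    "\u0327",  # cedilla            ş
    "\u0328",  # ogonek             ą
    "\u0304",  # macron             ā
]
MARK_RANK = {mark: rank for rank, mark in enumerate(MARK_ORDER, start=1)}
STROKE = len(MARK_ORDER) + 1    # letter with stroke through, e.g. ø
MODIFIED = len(MARK_ORDER) + 2  # modified letter, e.g. æ
# Letters without a canonical decomposition (hypothetical, abbreviated table).
SPECIAL = {"ø": STROKE, "đ": STROKE, "ł": STROKE, "æ": MODIFIED, "œ": MODIFIED}


def secondary_key(s: str) -> tuple[tuple[int, ...], ...]:
    """Level-2 key: one tuple of mark ranks per (precomposed) character."""
    key = []
    for ch in unicodedata.normalize("NFC", s.casefold()):
        if ch in SPECIAL:
            key.append((SPECIAL[ch],))
        else:
            marks = unicodedata.normalize("NFD", ch)[1:]
            key.append(tuple(MARK_RANK.get(mark, 0) for mark in marks))
    return tuple(key)


print(sorted(["ä", "a", "á", "å", "à"], key=secondary_key))
# ['a', 'á', 'à', 'å', 'ä']
print(sorted(["ø", "ö", "o"], key=secondary_key))
# ['o', 'ö', 'ø']
```

In a full sort this key only breaks ties left by a level-1 key, for example as the second element of a tuple key. Because ⟨æ⟩ expands to two letters at level 1 but is a single character here, the positions of the two keys do not always line up; a more faithful implementation would derive both from the same expanded sequence.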

Level 3 makes the distinction between capital and small letters, as in "Polish" and "polish".

Level 4 concerns punctuation and whitespace characters. This level distinguishes "MacDonald" from "Mac Donald" and "its" from "it's".
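One way to realize level 4 is to ignore punctuation and whitespace at the lower levels and then compare them, together with their positions, only as a final tie-breaker. The Python sketch below assumes that strategy; the position-based key and the helper names are illustrative choices, not the standard's documented weighting, and levels 2 and 3 are omitted for brevity.

```python
import unicodedata


def is_ignorable(ch: str) -> bool:
    """Punctuation and whitespace: ignored at levels 1-3, compared at level 4."""
    return ch.isspace() or unicodedata.category(ch).startswith("P")


def quaternary_key(s: str) -> tuple[tuple[int, str], ...]:
    # Assumption of this sketch: each ignorable character is recorded with its
    # position, so a string with none sorts before one that has some.
    return tuple((i, ch) for i, ch in enumerate(s) if is_ignorable(ch))


def sort_key(s: str) -> tuple:
    letters = "".join(ch for ch in s if not is_ignorable(ch))
    # Levels 2 and 3 are left out; casefold stands in for level 1.
    return (letters.casefold(), quaternary_key(s))


print(sorted(["Mac Donald", "Macbeth", "MacDonald"], key=sort_key))
# ['Macbeth', 'MacDonald', 'Mac Donald']
print(sorted(["it's", "its", "itself"], key=sort_key))
# ['its', "it's", 'itself']
```

"MacDonald" and "Mac Donald" tie on their letters, so only the space separates them; "Macbeth" is placed first already at level 1.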

An optional, and usually omitted, fifth level can distinguish typographical differences, including whether the text is italic, normal or bold.

External links

European Ordering Rules, ENV 13710 – a "European Pre-Standard"

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
