Apertium

Apertium is a rule-based machine translation platform. It is free software released under the terms of the GNU General Public License.

History

Apertium originated as one of the machine translation engines in the OpenTrad project, which was funded by the Spanish government. It was originally designed to translate between closely related languages, although it has recently been expanded to treat more divergent language pairs. To create a new machine translation system, one only has to develop linguistic data (dictionaries, rules) in well-specified XML formats.
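
As a rough illustration of what dictionary data in XML can look like, the Python sketch below parses a small bilingual dictionary fragment into a lookup table. The element names (`e`, `p`, `l`, `r`, `s`) are modelled on Apertium's bilingual dictionaries, but the fragment is a simplified assumption and should be checked against the project's own format documentation. The sample entries and the helper names `side` and `load_pairs` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified, assumed fragment: each <e> entry pairs a left-side word with a
# right-side word, and <s n="..."/> marks a grammatical tag (here "n" = noun).
SAMPLE = """<dictionary>
  <section id="main" type="standard">
    <e><p><l>gat<s n="n"/></l><r>cat<s n="n"/></r></p></e>
    <e><p><l>casa<s n="n"/></l><r>house<s n="n"/></r></p></e>
  </section>
</dictionary>"""


def side(elem):
    """Return (lemma, tags) for one side of an entry."""
    lemma = elem.text or ""
    tags = tuple(s.get("n") for s in elem.findall("s"))
    return lemma, tags


def load_pairs(xml_text):
    """Build a left-to-right lookup table from the dictionary text."""
    root = ET.fromstring(xml_text)
    table = {}
    for pair in root.iter("p"):
        table[side(pair.find("l"))] = side(pair.find("r"))
    return table


pairs = load_pairs(SAMPLE)
print(pairs[("gat", ("n",))])
```

Keeping the linguistic data in a declarative file like this is what lets a new language pair be added without changing the engine itself.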

Language data developed for it (in collaboration with the Universidade de Vigo, the Universitat Politècnica de Catalunya and the Universitat Pompeu Fabra) currently support, in stable versions, Asturian, Basque, Breton, Bulgarian, Catalan, Danish, English, Esperanto, French, Galician, Icelandic, Italian, Macedonian, Norwegian (Bokmål and Nynorsk), Occitan, Portuguese, Romanian, Spanish, Swedish and Welsh. A full list is available below. Several companies are also involved in the development of Apertium, including Prompsit Language Engineering, Imaxin Software and Eleka Ingeniaritza Linguistikoa.

Apertium is a shallow-transfer machine translation system, which uses finite state transducers for all of its lexical transformations, and hidden Markov models for part-of-speech tagging or word category disambiguation. Constraint Grammar taggers are also used for some language pairs (e.g. Breton-French).
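
The two techniques named above can be sketched in miniature. The Python example below is a toy under stated assumptions, not Apertium's implementation: `LetterTransducer` is a hypothetical finite state transducer that maps surface forms to a lemma plus tags, and `viterbi` picks the most probable tag sequence using invented start and transition probabilities. Emission probabilities are taken as uniform over the tags the analyser allows, which is a simplification of a full hidden Markov model. The tag notation (`<n><pl>`) follows Apertium's style; the vocabulary and numbers are made up.

```python
from collections import defaultdict


class LetterTransducer:
    """Toy finite state transducer whose arcs carry (input, output) pairs."""

    def __init__(self):
        self.arcs = defaultdict(list)  # state -> [(input, output, next_state)]
        self.finals = set()
        self._next_id = 1  # state 0 is the start state

    def add_entry(self, stem, suffix, tags):
        """Add a path that maps stem+suffix to stem+tags."""
        pairs = [(ch, ch) for ch in stem]      # stem letters are copied
        pairs += [(ch, "") for ch in suffix]   # suffix letters are dropped
        pairs.append(("", tags))               # epsilon arc emits the tags
        state = 0
        for pair in pairs:
            for inp, out, nxt in self.arcs[state]:
                if (inp, out) == pair:         # reuse arcs of shared prefixes
                    state = nxt
                    break
            else:
                nxt = self._next_id
                self._next_id += 1
                self.arcs[state].append((pair[0], pair[1], nxt))
                state = nxt
        self.finals.add(state)

    def analyse(self, word):
        """Return every lexical form the transducer accepts for word."""
        results, stack = [], [(0, 0, "")]      # (state, position, output)
        while stack:
            state, pos, out = stack.pop()
            if pos == len(word) and state in self.finals:
                results.append(out)
            for inp, arc_out, nxt in self.arcs[state]:
                if inp == "":                  # epsilon: consume no input
                    stack.append((nxt, pos, out + arc_out))
                elif pos < len(word) and word[pos] == inp:
                    stack.append((nxt, pos + 1, out + arc_out))
        return sorted(results)


def pos_candidates(fst, word):
    """First tag of each analysis, used here as the part of speech."""
    return sorted({a.split("<")[1].rstrip(">") for a in fst.analyse(word)})


def viterbi(candidates, start, trans, floor=1e-6):
    """Most probable tag path; assumes every word has at least one candidate."""
    best = {tag: (start.get(tag, floor), [tag]) for tag in candidates[0]}
    for tags in candidates[1:]:
        step = {}
        for tag in tags:
            prob, path = max(
                ((p * trans.get((prev, tag), floor), pth)
                 for prev, (p, pth) in best.items()),
                key=lambda item: item[0],
            )
            step[tag] = (prob, path + [tag])
        best = step
    return max(best.values(), key=lambda item: item[0])[1]


fst = LetterTransducer()
fst.add_entry("the", "", "<det>")
fst.add_entry("she", "", "<prn>")
fst.add_entry("walk", "", "<n><sg>")
fst.add_entry("walk", "", "<vblex><inf>")
fst.add_entry("walk", "s", "<n><pl>")
fst.add_entry("walk", "s", "<vblex><pri><p3><sg>")

# Invented probabilities, for illustration only.
START = {"det": 0.5, "prn": 0.5}
TRANS = {("det", "n"): 0.6, ("det", "vblex"): 0.01,
         ("prn", "vblex"): 0.5, ("prn", "n"): 0.05}

for sentence in (["the", "walks"], ["she", "walks"]):
    tags = viterbi([pos_candidates(fst, w) for w in sentence], START, TRANS)
    print(sentence, tags)
```

The transducer alone leaves "walks" ambiguous between a noun and a verb reading; the tagger resolves it from context, choosing the noun after a determiner and the verb after a pronoun.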

The project has taken part in the 2009 and 2010 editions of the Google Summer of Code and the 2010 edition of the Google Code-in.

Language pairs

List of currently stable language pairs:
Asturian (⇄)
Basque (→)
Breton (→)
Bulgarian (⇄)
Catalan (⇄) (→) (⇄) (←) (⇄) (⇄) (⇄)
Danish (←)
English (⇄) (⇄) (⇄) (←) (←) (⇄) (←)
Esperanto (←) (⇄) (←) (←)
French (←) (⇄) (→) (⇄)
Galician (⇄) (⇄) (⇄)
Icelandic (→)
Italian (→)
Macedonian (⇄) (→)
Norwegian (Bokmål) (⇄)
Norwegian (Nynorsk) (⇄)
Occitan (⇄) (⇄)
Portuguese (⇄) (⇄) (⇄)
Romanian (←)
Spanish (⇄) (←) (⇄) (⇄) (→) (⇄) (⇄) (⇄) (←)
Swedish (→)
Welsh (→)

See also

  • Comparison of machine translation applications
  • Moses (machine translation)
  • OpenLogos
  • Machine translation
  • Matxin

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.
 