Pivot language
A pivot language, sometimes also called a bridge language, is an artificial or natural language used as an intermediary for translation between many different languages: to translate between any pair of languages A and B, one translates A into the pivot language P, then P into B. Using a pivot language avoids the combinatorial explosion of needing translators for every combination of the supported languages, because the number of language pairs that must be covered grows linearly with the number of languages rather than quadratically. One translator need only know language A and the pivot P (and another translator language B and the pivot P), rather than a different translator being needed for every possible combination of A and B.
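The counting argument can be made concrete with a short Python sketch. It assumes unordered language pairs and that the pivot is itself one of the n supported languages; the function names are illustrative only.

```python
def direct_pairs(n: int) -> int:
    """Unordered language pairs when every pair is translated directly."""
    return n * (n - 1) // 2


def pivot_pairs(n: int) -> int:
    """Unordered pairs when one of the n languages serves as the pivot."""
    return max(n - 1, 0)


# Compare the two counts for a few sizes of language set.
for n in (3, 10, 24):
    print(n, direct_pairs(n), pivot_pairs(n))
# 3 3 2
# 10 45 9
# 24 276 23
```

If each translation direction is counted separately, both figures double, but the contrast between quadratic and linear growth is unchanged.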
The disadvantage of a pivot language is that each step of retranslation introduces possible mistakes and ambiguities: using a pivot language involves two steps, rather than one. For example, when Hernán Cortés communicated with Mesoamerican Indians, he would speak Spanish to Gerónimo de Aguilar, who would speak Mayan to Malintzin, who would speak Nahuatl to the locals.
Examples
English, French, Russian, and Arabic are often used as pivot languages. Interlingua has been used as a pivot language in international conferences and has been proposed as a pivot language for the European Union. Esperanto was proposed as a pivot language in the Distributed Language Translation project and has been used in this way in the Majstro Tradukvortaro at the Esperanto website Majstro.com. The Universal Networking Language is an artificial language specifically designed for use as a pivot language.
In computing
Pivot coding is also a common method of translating data for computer systems. For example, the Internet Protocol, XML and high-level languages are pivot codings of computer data which are then often rendered into internal binary formats for particular computer systems.
Unicode was designed to be usable as a pivot coding between various major existing character encodings, though its widespread adoption as a coding in its own right has made this usage unimportant.
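A minimal Python sketch of Unicode acting as a pivot coding follows. Python's str type holds Unicode code points, so converting between two legacy encodings passes through Unicode in the middle. The helper name transcode is an illustrative choice; the codec names are those of the Python standard library.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Convert bytes between two encodings, using Unicode as the pivot."""
    text = data.decode(source_encoding)   # legacy bytes -> Unicode code points
    return text.encode(target_encoding)   # Unicode code points -> target bytes


# The euro sign is byte 0x80 in Windows-1252 and byte 0xA4 in ISO 8859-15.
euro_cp1252 = b"\x80"
euro_latin9 = transcode(euro_cp1252, "cp1252", "iso8859_15")
print(euro_latin9)  # b'\xa4'
```

Neither encoding needs a dedicated table for the other: each only needs a mapping to and from Unicode.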
In machine translation (MT)
Current statistical machine translation (SMT) systems use parallel corpora for source (s) and target (t) languages to achieve their good results, but good parallel corpora are not available for all languages. A pivot language (p) provides a bridge between two languages for which parallel corpora are entirely or partially unavailable.
Pivot translation can be problematic because information may lose fidelity as it is passed through different corpora. When two bilingual corpora (s-p and p-t) are used to set up the s-t bridge, some linguistic data are inevitably lost. Rule-based machine translation (RBMT) helps the system recover this information, so that the system relies not only on statistics but also on structural linguistic information.
Three basic techniques are used to employ a pivot language in MT: (1) triangulation, which focuses on phrase paralleling between source and pivot (s-p) and between pivot and target (p-t); (2) transfer, which translates the whole sentence of the source language to a pivot language and then to the target language; and (3) synthesis, which builds a corpus of its own for system training.
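The transfer technique amounts to chaining two complete translators. The Python sketch below illustrates the idea with toy word-for-word translators standing in for real s-p and p-t systems. The lexicons and the names make_translator and pivot_translate are invented for illustration and are not taken from any MT toolkit.

```python
def make_translator(lexicon):
    """Build a toy word-for-word translator from a dictionary (illustrative only)."""
    def translate(sentence):
        # Unknown words are passed through unchanged.
        return " ".join(lexicon.get(word, word) for word in sentence.split())
    return translate


def pivot_translate(sentence, to_pivot, from_pivot):
    """Transfer strategy: translate the whole sentence s -> p, then p -> t."""
    return from_pivot(to_pivot(sentence))


# Hypothetical Spanish -> English (pivot) -> French lexicons.
es_to_en = make_translator({"la": "the", "casa": "house", "blanca": "white"})
en_to_fr = make_translator({"the": "la", "house": "maison", "white": "blanche"})

print(pivot_translate("la casa blanca", es_to_en, en_to_fr))  # la maison blanche
```

Because the second translator sees only the pivot sentence, any error or ambiguity introduced in the first step is carried into the final output.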
The triangulation method (also called phrase table multiplication) calculates the probability of both translation correspondences and lexical weight in s-p and p-t, in order to induce a new s-t phrase table. The transfer method (also called sentence translation strategy) simply translates s into p and then p into t, without the probabilistic tests used in triangulation. The synthetic method uses an existing corpus of s and tries to build its own synthetic corpus from it, which the system then uses to train itself. A bilingual s-p corpus is then synthesized to enable a p-t translation.
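Phrase table multiplication can be sketched in Python as follows. The sketch assumes one common formulation, in which the induced probability P(t | s) is the sum over pivot phrases p of P(p | s) · P(t | p). Lexical weights are omitted for brevity. The phrase tables and their probabilities are made-up toy values, and triangulate is a hypothetical helper name.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Induce an s-t phrase table by marginalizing over shared pivot phrases."""
    induced = defaultdict(lambda: defaultdict(float))
    for s, pivots in src_to_pivot.items():
        for p, prob_p_given_s in pivots.items():
            # Pivot phrases absent from the p-t table contribute nothing.
            for t, prob_t_given_p in pivot_to_tgt.get(p, {}).items():
                induced[s][t] += prob_p_given_s * prob_t_given_p
    return {s: dict(targets) for s, targets in induced.items()}


# Toy tables: Spanish -> English (pivot) and English -> French.
es_en = {"casa": {"house": 0.8, "home": 0.2}}
en_fr = {
    "house": {"maison": 0.9, "logement": 0.1},
    "home": {"maison": 0.5, "foyer": 0.5},
}

es_fr = triangulate(es_en, en_fr)
for target, prob in sorted(es_fr["casa"].items()):
    print(target, round(prob, 2))
# foyer 0.1
# logement 0.08
# maison 0.82
```

Unlike the transfer strategy, which commits to a single pivot sentence, this keeps every pivot phrase shared by the two tables and lets their probabilities combine.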
A direct comparison between triangulation and transfer methods for SMT systems has shown that triangulation achieves much better results than transfer.
All three pivot language techniques enhance the performance of SMT systems. However, the synthetic technique does not work well with RBMT, and system performance is lower than expected. Hybrid SMT/RBMT systems achieve better translation quality than strict-SMT systems that rely on bad parallel corpora.
The key role of RBMT systems is that they help fill the gap left in the translation process of s-p → p-t, in the sense that these parallels are included in the SMT model for s-t.