Interlingual machine translation
Interlingual machine translation is one of the classic approaches to machine translation. In this approach, the source language, i.e. the text to be translated, is transformed into an interlingua, i.e. an abstract, language-independent representation. The target language is then generated from the interlingua. Within the rule-based machine translation paradigm, the interlingual approach is an alternative to the direct approach and the transfer approach.
In the direct approach, words are translated directly without passing through an additional representation. In the transfer approach, the source language is transformed into an abstract, less language-specific representation. Linguistic rules specific to the language pair then transform this source-language representation into an abstract target-language representation, from which the target sentence is generated.
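The difference between the direct and interlingual routes can be illustrated with a toy Python sketch. Everything in it is a hypothetical assumption made for illustration, not part of any real system: the three-word English and Japanese (romanized) lexicons, the `Frame` class standing in for the interlingua, and an analyser that only accepts sentences of the form "the X verbs the Y". The transfer approach is not modelled.

```python
from dataclasses import dataclass

# Hypothetical bilingual word list used by the direct approach.
EN_JA_WORDS = {"cat": "neko", "dog": "inu", "sees": "miru"}

# Hypothetical monolingual lexicons: English words -> concepts, concepts -> Japanese words.
EN_CONCEPTS = {"cat": "CAT", "dog": "DOG", "sees": "SEE"}
JA_WORDS = {"CAT": "neko", "DOG": "inu", "SEE": "miru"}


@dataclass(frozen=True)
class Frame:
    """Toy interlingua: an event with its agent and patient concepts."""
    predicate: str
    agent: str
    patient: str


def direct_translate(sentence: str) -> str:
    """Direct approach: word-by-word substitution; English articles are simply dropped."""
    words = [w for w in sentence.lower().split() if w != "the"]
    return " ".join(EN_JA_WORDS[w] for w in words)


def analyse_english(sentence: str) -> Frame:
    """Analysis into the interlingua; assumes the fixed pattern 'the X verbs the Y'."""
    words = [w for w in sentence.lower().split() if w != "the"]
    subject, verb, obj = words
    return Frame(EN_CONCEPTS[verb], EN_CONCEPTS[subject], EN_CONCEPTS[obj])


def generate_japanese(frame: Frame) -> str:
    """Generation from the interlingua: SOV order, 'ga' marks the agent, 'o' the patient."""
    return f"{JA_WORDS[frame.agent]} ga {JA_WORDS[frame.patient]} o {JA_WORDS[frame.predicate]}"


sentence = "The cat sees the dog"
print(direct_translate(sentence))                    # neko miru inu
print(generate_japanese(analyse_english(sentence)))  # neko ga inu o miru
```

Word-by-word substitution keeps the English subject-verb-object order and has nowhere to put the Japanese case particles, whereas generating from the frame places the agent and patient by their roles, so the same frame could in principle be handed to a generator for any other language.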
The interlingual approach to machine translation has both advantages and disadvantages. Its advantages are that it requires fewer components to relate each source language to each target language, it takes fewer components to add a new language, it supports paraphrases of the input in the original language, it allows both the analysers and the generators to be written by monolingual system developers, and it handles languages that are very different from each other (e.g. English and Arabic). The obvious disadvantage is that defining an interlingua is difficult and possibly even impossible for a broader domain. The ideal context for interlingual machine translation is therefore multilingual machine translation in a very specific domain.
History
The first ideas about interlingual machine translation appeared in the 17th century with Descartes and Leibniz, who came up with theories of how to create dictionaries using universal numerical codes. Others, such as Cave Beck, Athanasius Kircher and Johann Joachim Becher, worked on developing an unambiguous universal language based on the principles of logic and iconographs. In 1668, John Wilkins described his interlingua in his "Essay towards a Real Character and a Philosophical Language". In the 18th and 19th centuries many proposals for "universal" international languages were developed, the best known being Esperanto.
That said, the idea of a universal language did not feature in any of the first significant approaches to machine translation; instead, work started on pairs of languages. However, during the 1950s and 1960s, researchers in Cambridge headed by Margaret Masterman, in Leningrad headed by Nikolai Andreev, and in Milan headed by Silvio Ceccato started work in this area. The idea was discussed extensively by the Israeli philosopher Yehoshua Bar-Hillel in 1969.
During the 1970s, noteworthy research was done in Grenoble by researchers attempting to translate physics and mathematics texts from Russian to French, and in Texas a similar project (METAL) was under way for Russian to English. Early interlingual MT systems were also built at Stanford in the 1970s by Roger Schank and Yorick Wilks; the former became the basis of a commercial system for the transfer of funds, and the latter's code is preserved at The Computer Museum in Boston as the first interlingual machine translation system.
In the 1980s, interlingua-based approaches, and knowledge-based approaches to machine translation in general, regained relevance, and much research was carried out in the field. The unifying idea of this research was that high-quality translation required abandoning the goal of total comprehension of the text; instead, translation should be based on linguistic knowledge and on the specific domain in which the system would be used. The most important research of this era was the Distributed Language Translation (DLT) project in Utrecht, which worked with a modified version of Esperanto, and the Fujitsu system in Japan.
Outline
In this method of translation, the interlingua can be thought of as a way of describing the analysis of a text written in a source language such that its morphological, syntactic, semantic (and even pragmatic) characteristics, that is, its "meaning", can be converted into a target language. The interlingua is meant to describe all of the characteristics of all of the languages to be translated, instead of simply translating from one language to another.
Sometimes two interlinguas are used in translation. One of them may cover more of the characteristics of the source language, and the other more of the characteristics of the target language. Translation then proceeds in two stages, converting sentences from the first language into sentences closer to the target language. The system may also be set up so that the second interlingua uses a more specific vocabulary that is closer to, or more aligned with, the target language, which could improve translation quality.
The above-mentioned system is based on the idea of using linguistic proximity to improve translation quality from a text in one original language into many other, structurally similar languages from a single original analysis. This principle is also used in pivot machine translation, where a natural language is used as a "bridge" between two more distant languages, for example when translating from Ukrainian into English using Russian as an intermediate language.
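Pivot translation amounts to chaining two bilingual systems. The Python sketch below is illustrative only: the tiny dictionaries and the `word_by_word` helper are hypothetical, and plain word substitution works here only because Ukrainian and Russian are closely related and the example sentence is trivial.

```python
# Hypothetical word-level dictionaries, one per bilingual leg of the pivot.
UK_TO_RU = {"кіт": "кот", "бачить": "видит", "собаку": "собаку"}
RU_TO_EN = {"кот": "cat", "видит": "sees", "собаку": "dog"}


def word_by_word(dictionary: dict[str, str], sentence: str) -> str:
    """Translate one leg by dictionary lookup; unknown words are passed through unchanged."""
    return " ".join(dictionary.get(w, w) for w in sentence.lower().split())


def pivot_translate(sentence: str) -> tuple[str, str]:
    """Ukrainian -> Russian (pivot) -> English; returns the pivot text and the final text."""
    russian = word_by_word(UK_TO_RU, sentence)
    english = word_by_word(RU_TO_EN, russian)
    return russian, english


russian, english = pivot_translate("Кіт бачить собаку")
print(russian)  # кот видит собаку
print(english)  # cat sees dog
```

The second leg outputs "cat sees dog" because the toy Russian-to-English dictionary has no rule for inserting English articles; any error made in the first leg would likewise be passed on unchanged to the second.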
Translation process
In interlingual machine translation systems, there are two monolingual components: the analysis of the source language into the interlingua, and the generation of the target language from the interlingua. It is, however, necessary to distinguish between interlingual systems using only syntactic methods (for example, the systems developed in the 1970s at the universities of Grenoble and Texas) and those based on artificial intelligence (from 1987 in Japan, and the research at the University of Southern California and Carnegie Mellon University). The first type of system corresponds to that outlined in Figure 1, while the others would be approximated by the diagram in Figure 4.
The following resources are necessary for an interlingual machine translation system:
- Dictionaries (or lexicons) for analysis and generation (specific to the domain and the languages involved).
- A conceptual lexicon (specific to the domain), which is the knowledge base about events and entities known in the domain.
- A set of projection rules (specific to the domain and the languages).
- Grammars for the analysis and generation of the languages involved.
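A minimal Python sketch of how the analysis-side resources might fit together, assuming a one-pattern English grammar and a three-concept domain; all names (`EN_LEXICON`, `CONCEPTS`, `EN_PROJECTION`, `parse`, `analyse`) are hypothetical. The dictionary supplies categories and concepts, the grammar yields syntactic positions, the projection rules map positions to conceptual roles, and the conceptual lexicon rejects readings the domain does not allow. The generation-side dictionary and grammar would mirror these for each target language and are omitted.

```python
from dataclasses import dataclass

# Hypothetical analysis dictionary (language- and domain-specific): word -> (category, concept).
EN_LEXICON = {
    "cat": ("noun", "CAT"),
    "dog": ("noun", "DOG"),
    "stone": ("noun", "STONE"),
    "sees": ("verb", "SEE"),
}

# Hypothetical conceptual lexicon (domain knowledge base about entities and events).
CONCEPTS = {
    "CAT": {"is_a": "ANIMAL"},
    "DOG": {"is_a": "ANIMAL"},
    "STONE": {"is_a": "OBJECT"},
    # Event SEE: the agent must be an animal; the patient may be anything (None).
    "SEE": {"is_a": "EVENT", "agent": "ANIMAL", "patient": None},
}

# Hypothetical projection rules (language-specific): syntactic position -> conceptual role.
EN_PROJECTION = {"subject": "agent", "object": "patient"}


@dataclass
class Frame:
    event: str
    roles: dict[str, str]


def parse(sentence: str) -> dict[str, str]:
    """Toy grammar: accepts only 'NOUN VERB NOUN' once 'the' is dropped."""
    words = [w for w in sentence.lower().split() if w != "the"]
    if any(w not in EN_LEXICON for w in words):
        raise ValueError(f"word not in dictionary: {sentence!r}")
    if [EN_LEXICON[w][0] for w in words] != ["noun", "verb", "noun"]:
        raise ValueError(f"grammar does not cover: {sentence!r}")
    return {"subject": words[0], "verb": words[1], "object": words[2]}


def analyse(sentence: str) -> Frame:
    """Project the parse onto conceptual roles and check them against the domain."""
    tree = parse(sentence)
    event = EN_LEXICON[tree["verb"]][1]
    roles: dict[str, str] = {}
    for position, role in EN_PROJECTION.items():
        concept = EN_LEXICON[tree[position]][1]
        required = CONCEPTS[event][role]
        if required is not None and CONCEPTS[concept]["is_a"] != required:
            raise ValueError(f"{concept} cannot be the {role} of {event}")
        roles[role] = concept
    return Frame(event, roles)


print(analyse("The cat sees the stone"))
# Frame(event='SEE', roles={'agent': 'CAT', 'patient': 'STONE'})
```

By contrast, `analyse("The stone sees the cat")` raises `ValueError`, because the toy conceptual lexicon only allows an animal as the agent of SEE.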
One problem with knowledge-based machine translation systems is that it becomes impossible to create databases for domains larger than very specific areas. Another is that processing these databases is computationally very expensive.
Efficacy
One of the main advantages of this strategy is that it provides an economical way to build multilingual translation systems. With an interlingua it becomes unnecessary to build a translation module for each pair of languages in the system. So instead of creating $n(n-1)$ language pairs, where $n$ is the number of languages in the system, it is only necessary to create $2n$ pairs between the $n$ languages and the interlingua.
The main disadvantage of this strategy is the difficulty of creating an adequate interlingua. It should be both abstract and independent of the source and target languages. The more languages are added to the translation system, and the more different they are, the more expressive the interlingua must be to cover all possible translation directions. Another problem is that it is difficult to extract meaning from texts in the original languages to create the intermediate representation.
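The component counts in the efficiency argument can be checked with a few lines of Python. The sketch assumes, as the argument does, that a pairwise system needs one module per translation direction and an interlingual system one analyser plus one generator per language; the function names are hypothetical, and the count ignores any sharing of resources between modules.

```python
def pairwise_components(n: int) -> int:
    """Direct or transfer design: one module per ordered pair of distinct languages."""
    return n * (n - 1)


def interlingual_components(n: int) -> int:
    """Interlingual design: one analyser and one generator per language."""
    return 2 * n


for n in (2, 3, 5, 12):
    print(n, pairwise_components(n), interlingual_components(n))
# 2 2 4
# 3 6 6
# 5 20 10
# 12 132 24
```

The two counts are equal at n = 3, and the interlingual design needs fewer components for every n greater than 3; with only two languages the pairwise design is actually smaller.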
See also
- Pivot language
- Universal Networking Language
- Knowledge representation and reasoning
External links
- Interlingua Methods
- Slides
- Paper: ftp://ftp.umiacs.umd.edu/pub/bonnie/Interlingual-MT-Dorr-Hovy-Levin.pdf