Automatic Language Translator
IBM's Automatic Translator was a machine translation system that converted Russian documents into English. The Translator used an optical disk that stored 170,000 word-for-word and statement-for-statement translations and a custom computer to look them up at high speed. Built for the US Air Force's Foreign Technology Division, the AN/GSQ-16 (or XW-2), as it was known to the Air Force, was primarily used to convert Soviet technical documents for distribution to western scientists. The Translator was installed in 1959, dramatically upgraded in 1964, and was eventually replaced by a mainframe running SYSTRAN in 1970.
Photoscopic store
The Translator can trace its history to a June 1953 contract from the US Navy to the International Telemeter Corporation (ITC) of Los Angeles. This was not for a translation system, but a pure research and development contract for a high-performance photographic online storage medium consisting of small black rectangles embedded in a plastic disk. When the initial contract ran out, what was then the Rome Air Development Center (RADC) took up further funding from 1954 onwards.
The system was developed by Gilbert King, chief of engineering at ITC, along with a team that included Louis Ridenour. It evolved into a 16-inch plastic disk with data recorded as a series of microscopic black rectangles or clear spots. Only the outermost 4 inches of the disk were used for storage, which increased the linear speed of the portion being accessed. When the disk spun at 2,400 RPM it had an access speed of about 1 Mbit/sec. In total, the system stored 30 Mbits, making it the highest density online system of its era.
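The figures quoted for the photoscopic store can be related to one another with some simple arithmetic. The following Python sketch is illustrative only: it assumes that "16 inch" is the disk diameter, that the "outermost 4 inches" is the radial width of the recording band, and that "Mbit" means one million bits. The derived values (bits per revolution, time to stream the whole store) follow from those assumptions and are not stated in the source.

```python
import math

# Figures taken from the text above.
DISK_DIAMETER_IN = 16.0
RPM = 2400
READ_RATE_BITS_PER_S = 1_000_000   # "about 1 Mbit/sec" (decimal megabit assumed)
CAPACITY_BITS = 30_000_000         # "30 Mbits" (decimal megabit assumed)

# Assumption: the "outermost 4 inches" is the radial width of the recording band.
BAND_WIDTH_IN = 4.0


def linear_speed_in_per_s(radius_in, rpm=RPM):
    """Speed of the disk surface past a fixed point at the given radius."""
    return 2 * math.pi * radius_in * rpm / 60


outer_radius = DISK_DIAMETER_IN / 2
inner_radius = outer_radius - BAND_WIDTH_IN

revs_per_s = RPM / 60                              # revolutions per second
bits_per_rev = READ_RATE_BITS_PER_S / revs_per_s   # bits passing the reader per turn
full_read_s = CAPACITY_BITS / READ_RATE_BITS_PER_S # time to stream the entire store

if __name__ == "__main__":
    print(f"revolutions per second: {revs_per_s:.0f}")
    print(f"bits per revolution:    {bits_per_rev:.0f}")
    print(f"full sequential read:   {full_read_s:.0f} s")
    print(f"surface speed, outer:   {linear_speed_in_per_s(outer_radius):.0f} in/s")
    print(f"surface speed, inner:   {linear_speed_in_per_s(inner_radius):.0f} in/s")
```

Under these assumptions the innermost track of the recording band moves at half the speed of the outermost one, which illustrates why restricting storage to the outer part of the disk kept the linear speed high.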
Mark I
In 1954 IBM gave an influential demonstration of machine translation, known today as the "Georgetown-IBM experiment". Run on an IBM 701 mainframe, the translation system knew only 250 words of Russian, limited to the field of organic chemistry, and only 6 grammar rules for combining them. Nevertheless, the results were extremely promising and were widely reported in the press.
At the time, most researchers in the nascent machine translation field felt that the major challenge to providing reasonable translations was building a large library, as storage devices of the era were both too small and too slow to be useful in this role. King felt that the photoscopic store was a natural solution to the problem, and pitched the idea of an automated translation system based on the photostore to the Air Force. RADC proved interested, and provided a research grant in May 1956. At the same time, the Air Force also provided a grant to researchers at the University of Washington who were working on the problem of producing an optimal translation dictionary for the project.
King advocated a simple word-for-word approach to translations. He thought that the natural redundancies in language would allow even a poor translation to be understood, and that local context alone was enough to provide reasonable guesses when faced with ambiguous terms. He stated that "the success of the human in achieving a probability of .50 in anticipating the words in a sentence is largely due to his experience and the real meanings of the words already discovered." In other words, simply translating the words alone would allow a human to read a document effectively, because the reader would be able to reason out the proper meaning from the context provided by earlier words.
In 1958 King moved to IBM's Thomas J. Watson Research Center, and continued development of the photostore-based translator. Over time, King changed the approach from a pure word-for-word translator to one that stored "stems and endings", which broke words into parts that could be combined back together to form complete words again.
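The "stems and endings" idea can be illustrated with a small Python sketch. It is not a reconstruction of King's system: the tables, the transliterated Russian entries, the longest-stem-first search and the function name `translate_word` are all hypothetical, chosen only to show how splitting words into parts lets a few stored entries cover several word forms.

```python
# Hypothetical toy tables (transliterated Russian), invented for illustration.
STEMS = {
    "knig": "book",    # kniga / knigi
    "stol": "table",   # stol / stoly
}
ENDINGS = {
    "": "",     # bare stem, singular
    "a": "",    # singular ending, no English suffix needed
    "i": "s",   # plural ending -> English plural "s"
    "y": "s",   # plural ending -> English plural "s"
}


def translate_word(word):
    """Split a word into a known stem plus a known ending and recombine.

    Tries the longest candidate stem first. Returns None when no
    stem/ending pair in the toy tables covers the word.
    """
    for cut in range(len(word), 0, -1):
        stem, ending = word[:cut], word[cut:]
        if stem in STEMS and ending in ENDINGS:
            return STEMS[stem] + ENDINGS[ending]
    return None


if __name__ == "__main__":
    for w in ["kniga", "knigi", "stol", "stoly", "sputnik"]:
        print(w, "->", translate_word(w))
```

In this sketch two stems and four endings are enough to handle four distinct word forms, whereas a pure word-for-word dictionary would need one entry per form.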
The first machine, "Mark I", was demonstrated in July 1959 and consisted of a 65,000 word dictionary and a custom tube-based computer to do the lookups. Texts were hand-copied onto punched cards using custom Cyrillic terminals, and then input into the machine for translation. The results were less than impressive, but were enough to suggest that a larger and faster machine would be a reasonable development. In the meantime, the Mark I was applied to translations of the Soviet newspaper Pravda. The results continued to be questionable, but King declared it a success, stating in Scientific American that the system was "...found, in an operational evaluation, to be quite useful by the Government."
Mark II
On 4 October 1957 the USSR launched Sputnik 1, the first artificial satellite. This caused a wave of concern in the US, whose own Project Vanguard was caught flat-footed and then failed repeatedly in spectacular fashion. This embarrassing turn of events led to a huge investment in US science and technology, including the formation of DARPA, NASA and a variety of intelligence efforts that would attempt to avoid being surprised in this fashion again.
After a short period, the intelligence efforts were centralized at Wright-Patterson Air Force Base as the Foreign Technology Division (FTD, now known as the National Air and Space Intelligence Center), run by the Air Force with input from the DIA and other organizations. FTD was tasked with the translation of Soviet and other Warsaw Bloc technical and scientific journals so researchers in the "west" could keep up to date on developments behind the Iron Curtain. Most of these documents were publicly available, but FTD also made a number of one-off translations of other materials upon request.
Believing there was a shortage of qualified translators, the FTD became extremely interested in King's efforts at IBM. Funding for an upgraded machine was soon forthcoming, and work began on a "Mark II" system based around a transistorized computer with a faster and higher-capacity 10-inch glass-based optical disk spinning at 2,400 RPM. Another addition was an optical character reader provided by a third party, which they hoped would eliminate the time-consuming process of copying the Russian text onto machine-readable cards.
In 1960 the Washington team also joined IBM, bringing their dictionary efforts with them. The dictionary continued to expand as additional storage was made available, reaching 170,000 words and terms by the time the system was installed at the FTD. A major software update was also incorporated in the Mark II, which King referred to as "dictionary stuffing". Stuffing was an attempt to deal with the problems of ambiguous words by "stuffing" prefixes onto them from earlier words in the text. These modified words would match with similarly stuffed words in the dictionary, making the matches less ambiguous.
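One possible reading of "dictionary stuffing" is sketched below in Python. The source does not describe the mechanism in detail, so everything here is an assumption made for illustration: the marker scheme, the `MARKER:word` key format, the toy transliterated entries and the function name `translate`. The sketch only shows the general idea that a prefix derived from an earlier word can select a more specific dictionary entry for an ambiguous word.

```python
# Hypothetical toy dictionary. Each entry maps a key to
# (translation, marker handed on to the next word, or None).
# Keys of the form "MARKER:word" are the "stuffed" entries.
DICTIONARY = {
    "ves": ("whole", "W"),
    "za": ("for", "Z"),
    "mir": ("world", None),     # default sense of an ambiguous word
    "Z:mir": ("peace", None),   # stuffed entry: sense chosen after "za"
}


def translate(words):
    """Translate word by word, stuffing a marker from the previous word.

    For each word the stuffed key is tried first; if the dictionary has
    no such entry, the plain word is looked up instead. Unknown words
    are passed through in square brackets.
    """
    output = []
    marker = None
    for word in words:
        entry = DICTIONARY.get(f"{marker}:{word}") if marker else None
        if entry is None:
            entry = DICTIONARY.get(word, (f"[{word}]", None))
        translation, marker = entry
        output.append(translation)
    return " ".join(output)


if __name__ == "__main__":
    print(translate(["za", "mir"]))
    print(translate(["ves", "mir"]))
```

Here the same word receives two different translations depending on the word before it, while the lookup itself remains a plain dictionary match, which is consistent with the description of stuffed words matching similarly stuffed dictionary entries.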
In 1962 King left IBM for Itek, a military contractor in the process of rapidly acquiring new technologies. Development at IBM continued, and the system went fully operational at FTD in February 1964. The system was demonstrated at the 1964 New York World's Fair. The version at the Fair included a 150,000 word dictionary, with about 1/3 of the words in phrases. About 3,500 of these were stored in core memory to improve performance, and an average speed of 20 words per minute was claimed. The results of the carefully selected input text were quite impressive. After its return to the FTD, it was used continually until 1970, when it was replaced by a machine running SYSTRAN.
ALPAC Report
In 1964 the United States Department of Defense commissioned the United States National Academy of Sciences (NAS) to prepare a report on the state of machine translation. The NAS formed the "Automatic Language Processing Advisory Committee", or ALPAC, and published their findings in 1966. The report, Language and Machines: Computers in Translation and Linguistics, was highly critical of the existing efforts, demonstrating that the systems were no faster than human translations, while also demonstrating that the supposed lack of translators was in fact a surplus, and as a result of supply and demand issues, human translation was relatively inexpensive, about $6 per 1,000 words. Worse, the FTD's system was slower as well; tests using physics papers as input demonstrated that a reader working from the Translator's output was "10 percent less accurate, 21 percent slower, and had a comprehension level 29 percent lower than when he used human translation."
The ALPAC report was as influential as the Georgetown experiment had been a decade earlier; in the immediate aftermath of its publication, the US government suspended almost all funding for machine translation research. Ongoing work at IBM and Itek had ended by 1966, leaving the field to the Europeans, who continued development of systems like SYSTRAN and Logos.