Q-systems
Encyclopedia
Q-Systems are a method of directed graph transformations according to given grammar
rules, developed at the Université de Montréal
by Alain Colmerauer
in 1967--70 for use in natural language processing
. The Université de Montréal's machine translation
system, TAUM-73
, used the Q-Systems as its language formalism.
The data structure manipulated by a Q-system is a Q-graph, which is a directed acyclic graph with one entry node and one exit node, where each arc bears a labelled ordered tree. An input sentence is usually represented by a linear Q-graph where each arc bears a word (tree reduced to one node labelled by this word). After analysis, the Q-graph is usually a bundle of 1-arc paths, each arc bearing a possible analysis tree. After generation, the goal is usually to produce as many paths as desired outputs, with again one word per arc.
A Q-System consists of a sequence of Q-treatments, each being a set of Q-rules, of the form
" must appear in the part. Variables are local to rules.
A Q-treatment works in 2 steps, addition and cleaning. It first applies all its rules exhaustively, using instantiation
(one-way unification), thereby adding new paths to the current Q-graph (added arcs and their trees can be used to produce new paths). If and when this addition process halts, all arcs used in some successful rule application are erased, as well as all unused arcs that are no more on any path from the entry node to the exit node. Hence, the result, if any (if the addition step terminates), is again a Q-graph. That allows several Q-Systems to be chained, each of them performing a specialized task, together forming a complex system. For example, TAUM 73 consisted of fifteen chained Q-Systems.
An extension of the basic idea of the Q-Systems, namely to replace instantiation by unification
(to put it simply, allow "new" variables in the right hand side part of a rule, and replace parametrized labelled trees by logical terms) led to Prolog
, designed by Alain Colmerauer
and Philippe Roussel in 1972. Refinements in the other direction (reducing non-determinism and introducing typed labels) by John Chandioux led to GramR, used for programming METEO from 1985 onward.
In 2009, Hong Thai Nguyen of GETALP GETALP, LIG (Laboratoire d'Informatique de Grenoble) reimplemented the Q-language in C, using ANTLR to compile the Q-systems and the Q-graphs, and an algorithm proposed by Christian Boitet (as none had been published and sources of the previous Fortran implementation had been lost). That implementation was corrected, completed and extended (to labels using Unicode characters and not only the printable characters of the CDC6600 of the historical version) by David Cattanéo in 2010-11.
, known as TAUM-METEO. As many authors confuse the prototype with the actual system, a bit of history is in order.
The initial motivation to develop that prototype was that a junior translator came to TAUM to ask for help in doing the extremely boring (and at the same time difficult) job of translating weather bulletins at Environment Canada he had to do at that time.
Indeed, since all official communications emanating from the Canadian government must be available in French and English, because of the official bilingual services act of 1968, and weather bulletins represent a large amount of translation in real time, junior translators had to spend several months of purgatory producing first draft translations, then revised by seniors. That was in fact a quite difficult job, because of the specificities of the English and French sublanguages used, and not very motivating, as the lifetime of a bulletin is only 4 hours.
TAUM proposed to build a prototype MT system, and Environment Canada accepted to fund the project. A prototype was ready after a few months, with a crude integration in the workflow of translation (source and target bulletins travelled over telex lines at the time and MT was performed on a mainframe). The first version of the system (METEO 1) went into operation on a Control Data 7600 supercomputer in March 1977.
John Chandioux then left the TAUM group to manage its operation and improve it, while the TAUM group embarked on a very different project (TAUM-aviation, 1977–81). With Benoit Thouin and one year later alone, he made lots of improvements to the initial prototype, and transformed it into a really operational system. After 3 years, METEO 1 had demonstrated the feasibility of microcomputer-based machine translation to the satisfaction of the Canadian government's Translation Bureau.
METEO 1 was formally adopted in 1981, replacing the junior translators in the workflow. Because of the need for very high quality, the revision step, done by senior translators, was maintained. The quality, measured as the percentage of edit operations (inserting or deleting a word counts as 1, replacing as 2) on the MT results, reached 85% in 1985.
Until that time, the MT part was still implemented as a sequence of Q-systems. The [Q-systems] formalism is a rule-based SLLP (Specialized Language for Linguistic Programming) invented by Alain Colmerauer in 1967 as he was a postdoc coopérant at the TAUM group. (He invented the famous Prolog language in 1972 after returning to France and becoming a university professor in Marseille-Luminy.)
As the engine of the Q-systems is highly non-deterministic, and the manipulated data structures are in some way too simple, without any types such as string or number, J. Chandioux encountered limitations in his efforts to raise translation quality and lower computation time to the point he could run it on microcomputers.
In 1981, he decided to create a new SLLP, or metalanguage for linguistic applications, based on the same basic algorithmic ideas as the Q-systems, but more deterministic, and offering typed labels on tree nodes. Following the advices of Bernard Vauquois and Alain Colmerauer, he created GramR, and developed it for microcomputers.
In 1982, he could start developing in GramR a new system for translating the weather bulletins on a high-end Cromemco
microcomputer. METEO 2 went into operation in 1983. The software then ran in 48Kb of central memory with a 5Mb hard disk for paging. METEO 2 is believed to have been the first MT application to run on a microcomputer.
In 1985, the system had nothing left of the initial prototype, and was officially renamed METEO
. It translated about 20 M words per year from English into French, and 10 M words from French into English, with a quality of 97%. Typically, it took only 4 minutes for a bulletin in English to be sent from Winnipeg and come back in French after MT and human revision.
In 1996, John Chandioux developed a special version of his system (METEO 96) which was used to translate the weather forecasts (different kinds of bulletins) issued by the US Weather Service during the Atlanta Olympic Games.
The latest known version of the system, METEO 5, dates from 1997 and ran on a standard IBM PC network under Windows NT. It translated 10 pages per second, while occupying so little space that it fitted on a 1.44Mb diskette. It seems that Chandioux lost his contract with Environment Canada in 2001 or 2002 to a competitor. The hot legal debate which followed was once described in Wikipedia, but that article seems to have disappeared. No document could be found on Environment Canada website on which system is currently in use (in 2011).
Grammar
In linguistics, grammar is the set of structural rules that govern the composition of clauses, phrases, and words in any given natural language. The term refers also to the study of such rules, and this field includes morphology, syntax, and phonology, often complemented by phonetics, semantics,...
rules, developed at the Université de Montréal
Université de Montréal
The Université de Montréal is a public francophone research university in Montreal, Quebec, Canada. It comprises thirteen faculties, more than sixty departments and two affiliated schools: the École Polytechnique and HEC Montréal...
by Alain Colmerauer
Alain Colmerauer
Alain Colmerauer is a French computer scientist.After completing his Ph.D. at the University of Grenoble, he spent 1967–1970 as Assistant Professor at the University of Montreal, where he created Q-Systems, one of the earliest linguistic formalisms used in the development of the TAUM-METEO machine...
in 1967--70 for use in natural language processing
Natural language processing
Natural language processing is a field of computer science and linguistics concerned with the interactions between computers and human languages; it began as a branch of artificial intelligence....
. The Université de Montréal's machine translation
Machine translation
Machine translation, sometimes referred to by the abbreviation MT is a sub-field of computational linguistics that investigates the use of computer software to translate text or speech from one natural language to another.On a basic...
system, TAUM-73
TAUM system
TAUM is the name of a research group which was set up at the Université de Montréal in 1965...
, used the Q-Systems as its language formalism.
The data structure manipulated by a Q-system is a Q-graph, which is a directed acyclic graph with one entry node and one exit node, where each arc bears a labelled ordered tree. An input sentence is usually represented by a linear Q-graph where each arc bears a word (tree reduced to one node labelled by this word). After analysis, the Q-graph is usually a bundle of 1-arc paths, each arc bearing a possible analysis tree. After generation, the goal is usually to produce as many paths as desired outputs, with again one word per arc.
A Q-System consists of a sequence of Q-treatments, each being a set of Q-rules, of the form
[]. The Q-treatments are applied in sequence, unless one of them produces the empty Q-graph, in which case the result is the last Q-graph obtained. The 3 parts of a rule can contain variables for labels, trees, and forests. All variables after "
" must appear in the A Q-treatment works in 2 steps, addition and cleaning. It first applies all its rules exhaustively, using instantiation
Instantiation
Instantiation or instance*Philosophy:*A modern concept similar to participation in classical Platonism, see the Theory of Forms.*The principle of instantiation, the idea that in order for a property to exist, it must be had by some object or substance.*Universal instantiation and existential...
(one-way unification), thereby adding new paths to the current Q-graph (added arcs and their trees can be used to produce new paths). If and when this addition process halts, all arcs used in some successful rule application are erased, as well as all unused arcs that are no more on any path from the entry node to the exit node. Hence, the result, if any (if the addition step terminates), is again a Q-graph. That allows several Q-Systems to be chained, each of them performing a specialized task, together forming a complex system. For example, TAUM 73 consisted of fifteen chained Q-Systems.
An extension of the basic idea of the Q-Systems, namely to replace instantiation by unification
Unification
Unification, in computer science and logic, is an algorithmic process by which one attempts to solve the satisfiability problem. The goal of unification is to find a substitution which demonstrates that two seemingly different terms are in fact either identical or just equal...
(to put it simply, allow "new" variables in the right hand side part of a rule, and replace parametrized labelled trees by logical terms) led to Prolog
Prolog
Prolog is a general purpose logic programming language associated with artificial intelligence and computational linguistics.Prolog has its roots in first-order logic, a formal logic, and unlike many other programming languages, Prolog is declarative: the program logic is expressed in terms of...
, designed by Alain Colmerauer
Alain Colmerauer
Alain Colmerauer is a French computer scientist.After completing his Ph.D. at the University of Grenoble, he spent 1967–1970 as Assistant Professor at the University of Montreal, where he created Q-Systems, one of the earliest linguistic formalisms used in the development of the TAUM-METEO machine...
and Philippe Roussel in 1972. Refinements in the other direction (reducing non-determinism and introducing typed labels) by John Chandioux led to GramR, used for programming METEO from 1985 onward.
In 2009, Hong Thai Nguyen of GETALP GETALP, LIG (Laboratoire d'Informatique de Grenoble) reimplemented the Q-language in C, using ANTLR to compile the Q-systems and the Q-graphs, and an algorithm proposed by Christian Boitet (as none had been published and sources of the previous Fortran implementation had been lost). That implementation was corrected, completed and extended (to labels using Unicode characters and not only the printable characters of the CDC6600 of the historical version) by David Cattanéo in 2010-11.
History
The METEO System is a Very High Quality Machine Translation system for weather bulletins that has been in operational use at Envinonnement Canada from 1982 to 2001. It stems from a prototype developed in 1975-76 by the TAUM GroupTAUM system
TAUM is the name of a research group which was set up at the Université de Montréal in 1965...
, known as TAUM-METEO. As many authors confuse the prototype with the actual system, a bit of history is in order.
The initial motivation to develop that prototype was that a junior translator came to TAUM to ask for help in doing the extremely boring (and at the same time difficult) job of translating weather bulletins at Environment Canada he had to do at that time.
Indeed, since all official communications emanating from the Canadian government must be available in French and English, because of the official bilingual services act of 1968, and weather bulletins represent a large amount of translation in real time, junior translators had to spend several months of purgatory producing first draft translations, then revised by seniors. That was in fact a quite difficult job, because of the specificities of the English and French sublanguages used, and not very motivating, as the lifetime of a bulletin is only 4 hours.
TAUM proposed to build a prototype MT system, and Environment Canada accepted to fund the project. A prototype was ready after a few months, with a crude integration in the workflow of translation (source and target bulletins travelled over telex lines at the time and MT was performed on a mainframe). The first version of the system (METEO 1) went into operation on a Control Data 7600 supercomputer in March 1977.
John Chandioux then left the TAUM group to manage its operation and improve it, while the TAUM group embarked on a very different project (TAUM-aviation, 1977–81). With Benoit Thouin and one year later alone, he made lots of improvements to the initial prototype, and transformed it into a really operational system. After 3 years, METEO 1 had demonstrated the feasibility of microcomputer-based machine translation to the satisfaction of the Canadian government's Translation Bureau.
METEO 1 was formally adopted in 1981, replacing the junior translators in the workflow. Because of the need for very high quality, the revision step, done by senior translators, was maintained. The quality, measured as the percentage of edit operations (inserting or deleting a word counts as 1, replacing as 2) on the MT results, reached 85% in 1985.
Until that time, the MT part was still implemented as a sequence of Q-systems. The [Q-systems] formalism is a rule-based SLLP (Specialized Language for Linguistic Programming) invented by Alain Colmerauer in 1967 as he was a postdoc coopérant at the TAUM group. (He invented the famous Prolog language in 1972 after returning to France and becoming a university professor in Marseille-Luminy.)
As the engine of the Q-systems is highly non-deterministic, and the manipulated data structures are in some way too simple, without any types such as string or number, J. Chandioux encountered limitations in his efforts to raise translation quality and lower computation time to the point he could run it on microcomputers.
In 1981, he decided to create a new SLLP, or metalanguage for linguistic applications, based on the same basic algorithmic ideas as the Q-systems, but more deterministic, and offering typed labels on tree nodes. Following the advices of Bernard Vauquois and Alain Colmerauer, he created GramR, and developed it for microcomputers.
In 1982, he could start developing in GramR a new system for translating the weather bulletins on a high-end Cromemco
Cromemco
Cromemco was a Mountain View, California microcomputer company known for its high-end Z80-based S-100 bus computers in the early days of the home computer revolution. The Cromemco Dazzler was the first color graphics card available for personal computers....
microcomputer. METEO 2 went into operation in 1983. The software then ran in 48Kb of central memory with a 5Mb hard disk for paging. METEO 2 is believed to have been the first MT application to run on a microcomputer.
In 1985, the system had nothing left of the initial prototype, and was officially renamed METEO
Meteo
Meteo may refer to:*The spelling, without accents, of Météo*Meteo is an asteroid belt in the Star Fox series of video games*Meteo is also a prominent magic spell in some of the Final Fantasy video games, most notably Final Fantasy IV and Final Fantasy VII, where it has plot significance; the name...
. It translated about 20 M words per year from English into French, and 10 M words from French into English, with a quality of 97%. Typically, it took only 4 minutes for a bulletin in English to be sent from Winnipeg and come back in French after MT and human revision.
In 1996, John Chandioux developed a special version of his system (METEO 96) which was used to translate the weather forecasts (different kinds of bulletins) issued by the US Weather Service during the Atlanta Olympic Games.
The latest known version of the system, METEO 5, dates from 1997 and ran on a standard IBM PC network under Windows NT. It translated 10 pages per second, while occupying so little space that it fitted on a 1.44Mb diskette. It seems that Chandioux lost his contract with Environment Canada in 2001 or 2002 to a competitor. The hot legal debate which followed was once described in Wikipedia, but that article seems to have disappeared. No document could be found on Environment Canada website on which system is currently in use (in 2011).
External links
- The birth of Prolog (PDF) (+mirror) — describes the Q-systems in Part II
- http://unldeco.imag.fr/unldeco/SystemsQ.po?localhost=/home/nguyenht/SYS-Q/MONITEUR/] new Q-systems demonstration