Van Wijngaarden grammar
Encyclopedia
In computer science
, a Van Wijngaarden grammar (also vW-grammar or W-grammar) is a two-level grammar
which provides a technique to define potentially infinite context-free grammar
s in a finite number of rules. The formalism was invented by Adriaan van Wijngaarden
to define rigorously some syntactic restrictions which previously had to be formulated in natural language
, despite their essentially syntactical content. Typical applications are the treatment of gender
and number
in natural language
syntax and the well-definedness of identifiers in programming language
s.
The technique was used and developed in the definition of the programming language
ALGOL 68
. It is an example of the larger class of affix grammar
s.
-rules, which are used to derive (possibly infinitely many) production rules from a finite set of hyper-rules. Meta-rules are restricted to those defined by a context-free grammar
. Hyper-rules restrict the admissible contexts at the upper level. Essentially, the consistent substitution used in the derivation process is equivalent to unification, as in Prolog
, as was noted by Alain Colmerauer
.
the language ALGOL 60
was formalised using the context-free Backus–Naur form
. And so the appearance of new context-sensitive
two-level grammar presented a challenge to some readers of the 1968 ALGOL 68
"Final Report". Subsequently the final report was revised by Wijngaarden and his colleagues and published as the 1973 ALGOL 68 "Revised Report".
library prelude option, particular program, exit,
library postlude option, standard postlude, close symbol.
b) standard prelude : declaration prelude sequence.
c) library prelude : declaration prelude sequence.
d) particular program :
label sequence option, strong CLOSED void clause.
e) exit : go on symbol , letter e letter x letter i letter t, label symbol.
f) library postlude : statement interlude.
g) standard postlude : strong void clause train
A) EXTERNAL :: standard ; library ; system ; particular.
B) STOP :: label letter s letter t letter o letter p.
a) program text : STYLE begin token, new LAYER1 preludes,
parallel token, new LAYER1 tasks PACK,
STYLE end token.
b) NEST1 preludes : NEST1 standard prelude with DECS1,
NEST1 library prelude with DECSETY2,
NEST1 system prelude with DECSETY3, where (NEST1) is
(new EMPTY new DECS1 DECSETY2 DECSETY3).
c) NEST1 EXTERNAL prelude with DECSETY1 :
strong void NEST1 series with DECSETY1, go on token ;
where (DECSETY1) is (EMPTY), EMPTY.
d) NEST1 tasks : NEST1 system task list, and also token,
NEST1 user task PACK list.
e) NEST1 system task : strong void NEST1 unit.
f) NEST1 user task : NEST2 particular prelude with DECS,
NEST2 particular program PACK, go on token,
NEST2 particular postlude,
where (NEST2) is (NEST1 new DECS STOP).
g) NEST2 particular program :
NEST2 new LABSETY3 joined label definition
of LABSETY3, strong void NEST2 new LABSETY3
ENCLOSED clause.
h) NEST joined label definition of LABSETY :
where (LABSETY) is (EMPTY), EMPTY ;
where (LABSETY) is (LAB1 LABSETY1),
NEST label definition of LAB1,
NEST joined label definition of$ LABSETY1.
i) NEST2 particular postlude :
strong void NEST2 series with STOP.
.)
, used to constrain the syntax and to specify the semantics. This idea was well known at the time; e.g. Donald Knuth
visited the ALGOL 68 design committee while developing his own version of it, attribute grammar
s. Quite peculiar to W-grammars was their strict treatment of attributes as strings, defined by a context-free grammar, on which concatenation is the only possible operation; in attribute grammars, attributes can be of any data type, and any kind of operation can be applied to them.
After their introduction in the Algol 68 report, W-grammars were widely considered as too powerful and unconstrained to be practical. This was partly a consequence of the way in which they had been applied; the revised Algol 68 report contained a much more readable grammar, without modifying the W-grammar formalism itself.
Meanwhile, it became clear that W-grammars are indeed too powerful.
They are Turing complete, which makes parsing impossible in general: it is an undecidable problem
to decide whether a given string can be generated by a given W-grammar. Their use must be seriously constrained when used for automatic parsing or translation. Restricted and modified variants of W-grammars were developed to address this, e.g.
Dick Grune
created a C
program that would generate all possible productions of a 2-level grammar.
The applications of EAG
s mentioned above can effectively be regarded as applications of W-grammars, since EAGs are so close to W-grammars.
W-grammars have also been proposed for the description of complex human actions in ergonomics
.
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, a Van Wijngaarden grammar (also vW-grammar or W-grammar) is a two-level grammar
Two-level grammar
A two-level grammar is a formal grammar that is used to generate another formal grammar , such as one with an infinite rule set . This is how a Van Wijngaarden grammar was used to specify Algol68 . A context free grammar that defines the rules for a second grammar can yield an effectively infinite...
which provides a technique to define potentially infinite context-free grammar
Context-free grammar
In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....
s in a finite number of rules. The formalism was invented by Adriaan van Wijngaarden
Adriaan van Wijngaarden
Adriaan van Wijngaarden was an important mathematician and computer scientist who is considered by many to have been the founding father of informatica in the Netherlands...
to define rigorously some syntactic restrictions which previously had to be formulated in natural language
Natural language
In the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...
, despite their essentially syntactical content. Typical applications are the treatment of gender
Grammatical gender
Grammatical gender is defined linguistically as a system of classes of nouns which trigger specific types of inflections in associated words, such as adjectives, verbs and others. For a system of noun classes to be a gender system, every noun must belong to one of the classes and there should be...
and number
Grammatical number
In linguistics, grammatical number is a grammatical category of nouns, pronouns, and adjective and verb agreement that expresses count distinctions ....
in natural language
Natural language
In the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...
syntax and the well-definedness of identifiers in programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....
s.
The technique was used and developed in the definition of the programming language
Programming language
A programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....
ALGOL 68
ALGOL 68
ALGOL 68 isan imperative computerprogramming language that was conceived as a successor to theALGOL 60 programming language, designed with the goal of a...
. It is an example of the larger class of affix grammar
Affix grammar
An affix grammar is a kind of formal grammar; it is used to describe the syntax of languages, mainly computer languages, using an approach based on how natural language is typically described....
s.
Overview
A W-grammar consists of a finite set of metaMeta
Meta- , is a prefix used in English to indicate a concept which is an abstraction from another concept, used to complete or add to the latter....
-rules, which are used to derive (possibly infinitely many) production rules from a finite set of hyper-rules. Meta-rules are restricted to those defined by a context-free grammar
Context-free grammar
In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....
. Hyper-rules restrict the admissible contexts at the upper level. Essentially, the consistent substitution used in the derivation process is equivalent to unification, as in Prolog
Prolog
Prolog is a general purpose logic programming language associated with artificial intelligence and computational linguistics.Prolog has its roots in first-order logic, a formal logic, and unlike many other programming languages, Prolog is declarative: the program logic is expressed in terms of...
, as was noted by Alain Colmerauer
Alain Colmerauer
Alain Colmerauer is a French computer scientist.After completing his Ph.D. at the University of Grenoble, he spent 1967–1970 as Assistant Professor at the University of Montreal, where he created Q-Systems, one of the earliest linguistic formalisms used in the development of the TAUM-METEO machine...
.
ALGOL 68 examples
Prior to ALGOL 68ALGOL 68
ALGOL 68 isan imperative computerprogramming language that was conceived as a successor to theALGOL 60 programming language, designed with the goal of a...
the language ALGOL 60
ALGOL 60
ALGOL 60 is a member of the ALGOL family of computer programming languages. It gave rise to many other programming languages, including BCPL, B, Pascal, Simula, C, and many others. ALGOL 58 introduced code blocks and the begin and end pairs for delimiting them...
was formalised using the context-free Backus–Naur form
Backus–Naur form
In computer science, BNF is a notation technique for context-free grammars, often used to describe the syntax of languages used in computing, such as computer programming languages, document formats, instruction sets and communication protocols.It is applied wherever exact descriptions of...
. And so the appearance of new context-sensitive
Context-sensitive grammar
A context-sensitive grammar is a formal grammar in which the left-hand sides and right-hand sides of any production rules may be surrounded by a context of terminal and nonterminal symbols...
two-level grammar presented a challenge to some readers of the 1968 ALGOL 68
ALGOL 68
ALGOL 68 isan imperative computerprogramming language that was conceived as a successor to theALGOL 60 programming language, designed with the goal of a...
"Final Report". Subsequently the final report was revised by Wijngaarden and his colleagues and published as the 1973 ALGOL 68 "Revised Report".
ALGOL 68 as in the 1968 Final Report §2.1
a) program : open symbol, standard prelude,library prelude option, particular program, exit,
library postlude option, standard postlude, close symbol.
b) standard prelude : declaration prelude sequence.
c) library prelude : declaration prelude sequence.
d) particular program :
label sequence option, strong CLOSED void clause.
e) exit : go on symbol , letter e letter x letter i letter t, label symbol.
f) library postlude : statement interlude.
g) standard postlude : strong void clause train
ALGOL 68 as in the 1973 Revised Report §2.2.1, §10.1.1
program : strong void new closed clauseA) EXTERNAL :: standard ; library ; system ; particular.
B) STOP :: label letter s letter t letter o letter p.
a) program text : STYLE begin token, new LAYER1 preludes,
parallel token, new LAYER1 tasks PACK,
STYLE end token.
b) NEST1 preludes : NEST1 standard prelude with DECS1,
NEST1 library prelude with DECSETY2,
NEST1 system prelude with DECSETY3, where (NEST1) is
(new EMPTY new DECS1 DECSETY2 DECSETY3).
c) NEST1 EXTERNAL prelude with DECSETY1 :
strong void NEST1 series with DECSETY1, go on token ;
where (DECSETY1) is (EMPTY), EMPTY.
d) NEST1 tasks : NEST1 system task list, and also token,
NEST1 user task PACK list.
e) NEST1 system task : strong void NEST1 unit.
f) NEST1 user task : NEST2 particular prelude with DECS,
NEST2 particular program PACK, go on token,
NEST2 particular postlude,
where (NEST2) is (NEST1 new DECS STOP).
g) NEST2 particular program :
NEST2 new LABSETY3 joined label definition
of LABSETY3, strong void NEST2 new LABSETY3
ENCLOSED clause.
h) NEST joined label definition of LABSETY :
where (LABSETY) is (EMPTY), EMPTY ;
where (LABSETY) is (LAB1 LABSETY1),
NEST label definition of LAB1,
NEST joined label definition of$ LABSETY1.
i) NEST2 particular postlude :
strong void NEST2 series with STOP.
Implementations
yo-yo parser for van Wijngaarden grammars with example grammars for expressions, eva, sal and Pascal. (The actual ISO 7185 standard for Pascal uses Extended Backus–Naur FormExtended Backus–Naur form
In computer science, Extended Backus–Naur Form is a family of metasyntax notations used for expressing context-free grammars: that is, a formal way to describe computer programming languages and other formal languages. They are extensions of the basic Backus–Naur Form metasyntax notation.The...
.)
History
W-grammars are based on the idea of providing the nonterminal symbols of context-free grammars with attributes (or affixes) that pass information between the nodes of the parse treeParse tree
A concrete syntax tree or parse tree or parsing treeis an ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar. In a parse tree, the interior nodes are labeled by non-terminals of the grammar, while the leaf nodes are labeled by terminals of the...
, used to constrain the syntax and to specify the semantics. This idea was well known at the time; e.g. Donald Knuth
Donald Knuth
Donald Ervin Knuth is a computer scientist and Professor Emeritus at Stanford University.He is the author of the seminal multi-volume work The Art of Computer Programming. Knuth has been called the "father" of the analysis of algorithms...
visited the ALGOL 68 design committee while developing his own version of it, attribute grammar
Attribute grammar
An attribute grammar is a formal way to define attributes for the productions of a formal grammar, associating these attributes to values. The evaluation occurs in the nodes of the abstract syntax tree, when the language is processed by some parser or compiler....
s. Quite peculiar to W-grammars was their strict treatment of attributes as strings, defined by a context-free grammar, on which concatenation is the only possible operation; in attribute grammars, attributes can be of any data type, and any kind of operation can be applied to them.
After their introduction in the Algol 68 report, W-grammars were widely considered as too powerful and unconstrained to be practical. This was partly a consequence of the way in which they had been applied; the revised Algol 68 report contained a much more readable grammar, without modifying the W-grammar formalism itself.
Meanwhile, it became clear that W-grammars are indeed too powerful.
They are Turing complete, which makes parsing impossible in general: it is an undecidable problem
Undecidable problem
In computability theory and computational complexity theory, an undecidable problem is a decision problem for which it is impossible to construct a single algorithm that always leads to a correct yes-or-no answer....
to decide whether a given string can be generated by a given W-grammar. Their use must be seriously constrained when used for automatic parsing or translation. Restricted and modified variants of W-grammars were developed to address this, e.g.
- Extended Affix GrammarExtended Affix GrammarIn computer science, Extended Affix Grammars are a formal grammar formalism for describing the context free and context sensitive syntax of language, both natural language and programming languages....
s, applied to describe the grammars of natural languageNatural languageIn the philosophy of language, a natural language is any language which arises in an unpremeditated fashion as the result of the innate facility for language possessed by the human intellect. A natural language is typically used for communication, and may be spoken, signed, or written...
such as English and Spanish) - Q-systemsQ-systemsQ-Systems are a method of directed graph transformations according to given grammar rules, developed at the Université de Montréal by Alain Colmerauer in 1967--70 for use in natural language processing...
, also applied to natural language processing - the CDLCompiler Description LanguageCompiler Description Language, or CDL, is a Computer language based on affix grammars. It is very similar to Backus–Naur form notation. It was designed for the development of compilers. It is very limited in its capabilities and control flow; and intentionally so. The benefits of these limitations...
series of languages, applied as compiler constructionCompiler constructionCompiler construction is an area of computer science that deals with the theory and practice of developing programming languages and their associated compilers....
languages for programming languageProgramming languageA programming language is an artificial language designed to communicate instructions to a machine, particularly a computer. Programming languages can be used to create programs that control the behavior of a machine and/or to express algorithms precisely....
s
Applications outside of ALGOL 68
Anthony Fisher has written a parser for a large class of W-grammars.Dick Grune
Dick Grune
Dick Grune is a Dutch computer scientist and university lecturer best known for inventing and developing the first version of CVS, the Concurrent Versions System. Grune was involved in the construction of Algol 68 compilers in the 1970s and the Amsterdam Compiler Kit in the 1980s.- Selected...
created a C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
program that would generate all possible productions of a 2-level grammar.
The applications of EAG
Extended Affix Grammar
In computer science, Extended Affix Grammars are a formal grammar formalism for describing the context free and context sensitive syntax of language, both natural language and programming languages....
s mentioned above can effectively be regarded as applications of W-grammars, since EAGs are so close to W-grammars.
W-grammars have also been proposed for the description of complex human actions in ergonomics
Ergonomics
Ergonomics is the study of designing equipment and devices that fit the human body, its movements, and its cognitive abilities.The International Ergonomics Association defines ergonomics as follows:...
.
- A W-Grammar Description for ADA - "W-grammar description of Ada is still useful to computer scientists who need more than a simple understanding of the syntax and rudimentary description of the semantics"
External links
- Use in Algol68
- Tutorial introduction by Steven PembertonSteven PembertonSteven Pemberton is one of the developers of the ABC programming language and of the Views system.He is chair of the W3C XHTML2 and XForms Working Groups and also member of RDFa Taskforce....
.