Wirth syntax notation
Wirth syntax notation (WSN) is a metasyntax, that is, a formal way to describe formal languages. Originally proposed by Niklaus Wirth in 1977 as an alternative to Backus-Naur form (BNF), it has several advantages over BNF in that it can be defined using itself, it contains an explicit iteration construct, and it avoids the use of an explicit symbol for the empty string (such as <empty> or ε).
WSN has been used in several international standards, starting with ISO 10303-21. It was also used to define the syntax of EXPRESS, the data modelling language of STEP.
WSN defined in itself
SYNTAX = { PRODUCTION } .
PRODUCTION = IDENTIFIER "=" EXPRESSION "." .
EXPRESSION = TERM { "|" TERM } .
TERM = FACTOR { FACTOR } .
FACTOR = IDENTIFIER
| LITERAL
| "[" EXPRESSION "]"
| "(" EXPRESSION ")"
| "{" EXPRESSION "}" .
IDENTIFIER = letter { letter } .
LITERAL = """" character { character } """" .
The equals sign indicates a production. The element on the left is defined to be the combination of elements on the right. A production is terminated by a full stop (period).
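Because the notation is defined in itself, the productions above map almost one-to-one onto a recursive-descent parser. The following Python is a minimal sketch of that idea, not a reference implementation. The names `tokenize`, `Parser`, and the tuple-based tree shape (`"alt"`, `"seq"`, `"ref"`, `"lit"`, `"opt"`, `"group"`, `"repeat"`) are assumptions made for illustration. It also assumes that identifiers may contain hyphens and that a doubled quote inside a literal stands for one quote character, as `""""` suggests.

```python
import re

# The self-definition from the text, used here as test input.
WSN_GRAMMAR = '''
SYNTAX = { PRODUCTION } .
PRODUCTION = IDENTIFIER "=" EXPRESSION "." .
EXPRESSION = TERM { "|" TERM } .
TERM = FACTOR { FACTOR } .
FACTOR = IDENTIFIER
       | LITERAL
       | "[" EXPRESSION "]"
       | "(" EXPRESSION ")"
       | "{" EXPRESSION "}" .
IDENTIFIER = letter { letter } .
LITERAL = """" character { character } """" .
'''

# A token is a quoted literal, an identifier, or a punctuation symbol.
TOKEN = re.compile(
    r'\s*(?:("(?:[^"]|"")+")|([A-Za-z][A-Za-z-]*)|([=.|()\[\]{}]))')

# Opening bracket -> (tree tag, closing bracket).
BRACKETS = {"[": ("opt", "]"), "(": ("group", ")"), "{": ("repeat", "}")}


def tokenize(text):
    """Split WSN source into (kind, value) pairs."""
    text, pos, tokens = text.rstrip(), 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        literal, identifier, symbol = m.groups()
        if literal is not None:
            # Drop the outer quotes and undouble any inner ones.
            tokens.append(("LITERAL", literal[1:-1].replace('""', '"')))
        elif identifier is not None:
            tokens.append(("IDENTIFIER", identifier))
        else:
            tokens.append(("SYMBOL", symbol))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return (None, None)

    def expect(self, kind, value=None):
        tok = self.peek()
        if tok[0] != kind or (value is not None and tok[1] != value):
            raise SyntaxError(f"expected {value or kind}, got {tok[1]!r}")
        self.pos += 1
        return tok[1]

    def syntax(self):        # SYNTAX = { PRODUCTION } .
        rules = {}
        while self.peek()[0] is not None:
            name, expr = self.production()
            rules[name] = expr
        return rules

    def production(self):    # PRODUCTION = IDENTIFIER "=" EXPRESSION "." .
        name = self.expect("IDENTIFIER")
        self.expect("SYMBOL", "=")
        expr = self.expression()
        self.expect("SYMBOL", ".")
        return name, expr

    def expression(self):    # EXPRESSION = TERM { "|" TERM } .
        terms = [self.term()]
        while self.peek() == ("SYMBOL", "|"):
            self.pos += 1
            terms.append(self.term())
        return ("alt", terms)

    def starts_factor(self):
        kind, value = self.peek()
        return kind in ("IDENTIFIER", "LITERAL") or (
            kind == "SYMBOL" and value in BRACKETS)

    def term(self):          # TERM = FACTOR { FACTOR } .
        factors = [self.factor()]
        while self.starts_factor():
            factors.append(self.factor())
        return ("seq", factors)

    def factor(self):        # FACTOR = IDENTIFIER | LITERAL | bracketed form .
        kind, value = self.peek()
        if kind == "IDENTIFIER" or kind == "LITERAL":
            self.pos += 1
            return ("ref" if kind == "IDENTIFIER" else "lit", value)
        if kind == "SYMBOL" and value in BRACKETS:
            tag, closer = BRACKETS[value]
            self.pos += 1
            inner = self.expression()
            self.expect("SYMBOL", closer)
            return (tag, inner)
        raise SyntaxError(f"unexpected token {value!r}")


rules = Parser(tokenize(WSN_GRAMMAR)).syntax()
```

Each parsing method carries the production it implements as a comment, which is the main point of the sketch: the grammar is small enough that the parser can be read alongside it line by line.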
- Repetition is denoted by curly brackets, e.g., {a} stands for ε | a | aa | aaa | ….
- Optionality is expressed by square brackets, e.g., [a]b stands for ab | b.
- Parentheses serve for groupings, e.g., (a|b)c stands for ac | bc.
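The three bracket forms can be checked by enumerating the strings they generate. The Python below is a small sketch under stated assumptions: the helper names `lit`, `alt`, `seq`, `opt`, and `rep` are invented here, each expression is modelled as a finite set of strings, and repetition is cut off at a chosen `max_count` because {a} generates an unbounded set.

```python
from itertools import product


def lit(s):
    """A literal generates exactly one string."""
    return {s}


def alt(*exprs):
    """a | b: the union of the alternatives."""
    return set().union(*exprs)


def seq(*exprs):
    """a b: every concatenation of one string from each part."""
    return {"".join(parts) for parts in product(*exprs)}


def opt(expr):
    """[a]: the expression or the empty string."""
    return expr | {""}


def rep(expr, max_count):
    """{a}: zero up to max_count repetitions (a bounded approximation)."""
    result = {""}
    current = {""}
    for _ in range(max_count):
        current = seq(current, expr)
        result |= current
    return result


a, b, c = lit("a"), lit("b"), lit("c")
repetition = rep(a, 3)            # {a}, truncated after aaa
optionality = seq(opt(a), b)      # [a]b
grouping = seq(alt(a, b), c)      # (a|b)c
```

Under these definitions, `optionality` is the set containing ab and b, and `grouping` is the set containing ac and bc, matching the expansions listed above.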
We take these concepts for granted today, but they were novel and even controversial in 1977. Wirth later incorporated some of the concepts (with a different syntax and notation) into Extended Backus-Naur form.
Notice that letter and character are left undefined. This is because numeric characters (digits 0 through 9) may be included in both definitions or excluded from one, depending on the language being defined, e.g.:
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" .
upper-case = "A" | "B" | … | "Y" | "Z" .
lower-case = "a" | "b" | … | "y" | "z" .
letter = upper-case | lower-case .
If character goes on to include digit and other printable ASCII characters, then it diverges even more from letter, which one can assume does not include the digit characters or any of the special (non-alphanumeric) characters.
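One possible reading of these definitions can be written out as sets. This is only a sketch: the text leaves character undefined, so treating it as the 95 printable ASCII characters (letters, digits, punctuation, and space) is an assumption, and `is_identifier` is a hypothetical helper that applies IDENTIFIER = letter { letter } literally.

```python
import string

# The productions from the example, as sets of single characters.
digit = set(string.digits)
upper_case = set(string.ascii_uppercase)
lower_case = set(string.ascii_lowercase)
letter = upper_case | lower_case

# Assumption: character covers all printable ASCII, including digits.
character = letter | digit | set(string.punctuation) | {" "}


def is_identifier(text):
    """IDENTIFIER = letter { letter } : one or more letters, nothing else."""
    return len(text) > 0 and all(ch in letter for ch in text)
```

With this choice, `is_identifier("TERM")` holds while `is_identifier("x1")` does not, because digits belong to character but not to letter.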
Another example
The syntax of BNF can be represented with WSN as follows, based on translating the BNF example of itself:
syntax = rule [ syntax ] .
rule = opt-whitespace "<" rule-name ">" opt-whitespace "::="
opt-whitespace expression line-end .
opt-whitespace = { " " } .
expression = list [ "|" expression ] .
line-end = opt-whitespace EOL | line-end line-end .
list = term [ opt-whitespace list ] .
term = literal | "<" rule-name ">" .
literal = """" text """" | "'" text "'" .
This definition appears overly complicated because the concept of "optional whitespace" must be explicitly defined in BNF, but it is implicit in WSN. Even in this example, text is left undefined, but it is assumed to mean "ASCII-character { ASCII-character }". (EOL is also left undefined.) Notice how the kludge "<" rule-name ">" has been used twice because text was not explicitly defined.
One of the problems with BNF which this example illustrates is that by allowing both single-quote and double-quote characters to be used for a literal, there is an added potential for human error in attempting to create a machine-readable syntax. One of the concepts migrated to later metasyntaxes was the idea that giving the user multiple choices made it harder to write parsers for grammars defined by the syntax, so computer languages in general have become more restrictive in how a quoted-literal is defined.
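The cost of allowing two quote styles can be seen in a small recognizer. The Python below is an illustrative sketch only: the patterns `EITHER_QUOTE` and `DOUBLE_ONLY` and the helper `is_literal` are hypothetical, and text is assumed to be one or more characters other than the enclosing quote.

```python
import re

# BNF-style literal: two spellings, each needing its own matching close.
EITHER_QUOTE = re.compile(r'"[^"]+"|\'[^\']+\'')

# A stricter definition with a single spelling.
DOUBLE_ONLY = re.compile(r'"[^"]+"')


def is_literal(pattern, text):
    """True when the whole of text is one quoted literal under pattern."""
    return pattern.fullmatch(text) is not None


both_styles_ok = (is_literal(EITHER_QUOTE, '"abc"')
                  and is_literal(EITHER_QUOTE, "'abc'"))
mismatched_ok = is_literal(EITHER_QUOTE, '"abc\'')   # opens with ", closes with '
```

The two-style pattern needs one branch per quote character, and a literal that opens with one style and closes with the other is rejected, which is the kind of slip the paragraph above describes.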