![](http://image.absoluteastronomy.com/images//topicimages/noimage.gif)
Chomsky normal form
Encyclopedia
In computer science
, a context-free grammar
is said to be in Chomsky normal form if all of its production rules are of the form:
where
,
and
are nonterminal symbols, α is a terminal symbol (a symbol that represents a constant value),
is the start symbol, and ε is the empty string
. Also, neither
nor
may be the start symbol.
Every grammar in Chomsky normal form is context-free
, and conversely, every context-free grammar can be transformed into an equivalent one which is in Chomsky normal form. Several algorithms for performing such a transformation are known. Transformations are described in most textbooks on automata theory, such as (Hopcroft and Ullman, 1979).
As pointed out by Lange and Leiß, the drawback of these transformations is that they can lead to an undesirable bloat in grammar size. Using
to denote the size of the original grammar
, the size blow-up in the worst case may range from
to
, depending on the transformation algorithm used (Lange and Leiß, 2009).
A formal grammar
is in Chomsky reduced form if all of its production rules are of the form:
where
,
and
are nonterminal symbols, and α is a terminal symbol. When using this definition,
or
may be the start symbol.
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, a context-free grammar
Context-free grammar
In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....
is said to be in Chomsky normal form if all of its production rules are of the form:
-
or
-
or
-
where
![](http://image.absoluteastronomy.com/images/formulas/6/5/1652905-4.gif)
![](http://image.absoluteastronomy.com/images/formulas/6/5/1652905-5.gif)
![](http://image.absoluteastronomy.com/images/formulas/6/5/1652905-6.gif)
![](http://image.absoluteastronomy.com/images/formulas/6/5/1652905-7.gif)
Empty string
In computer science and formal language theory, the empty string is the unique string of length zero. It is denoted with λ or sometimes Λ or ε....
. Also, neither
![](http://image.absoluteastronomy.com/images/formulas/6/5/1652905-8.gif)
![](http://image.absoluteastronomy.com/images/formulas/6/5/1652905-9.gif)
Every grammar in Chomsky normal form is context-free
Context-free grammar
In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....
, and conversely, every context-free grammar can be transformed into an equivalent one which is in Chomsky normal form. Several algorithms for performing such a transformation are known. Transformations are described in most textbooks on automata theory, such as (Hopcroft and Ullman, 1979).
As pointed out by Lange and Leiß, the drawback of these transformations is that they can lead to an undesirable bloat in grammar size. Using
![](http://image.absoluteastronomy.com/images/formulas/6/5/1652905-10.gif)
![](http://image.absoluteastronomy.com/images/formulas/6/5/1652905-11.gif)
![](http://image.absoluteastronomy.com/images/formulas/6/5/1652905-12.gif)
![](http://image.absoluteastronomy.com/images/formulas/6/5/1652905-13.gif)
Alternative definition
Another way to define Chomsky normal form (e.g., Hopcroft and Ullman 1979, and Hopcroft et al. 2006) is:A formal grammar
Formal grammar
A formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...
is in Chomsky reduced form if all of its production rules are of the form:
-
or
-
where
![](http://image.absoluteastronomy.com/images/formulas/6/5/1652905-16.gif)
![](http://image.absoluteastronomy.com/images/formulas/6/5/1652905-17.gif)
![](http://image.absoluteastronomy.com/images/formulas/6/5/1652905-18.gif)
![](http://image.absoluteastronomy.com/images/formulas/6/5/1652905-19.gif)
![](http://image.absoluteastronomy.com/images/formulas/6/5/1652905-20.gif)
Converting a grammar to Chomsky Normal Form
- Introduce
- Introduce a new start variable,
and a new rule
where
is the previous start variable.
- Introduce a new start variable,
- Eliminate all
rules
-
rules are rules of the form
where
and
where V is the CFG
Context-free grammarIn formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....
's variable alphabet. - Remove every rule with
on its right hand side (RHS). For each rule with
in its RHS, add a set of new rules consisting of the different possible combinations of
replaced or not replaced with
. If a rule has
as a singleton on its RHS, add a new rule
unless
has already been removed through this process. For example, examine the following grammar G:
-
- G has one epsilon rule. When the
is removed, we get the following:
-
- Notice that we have to account for all possibilities of
and so we actually end up adding 3 rules.
-
- Eliminate all unit rules
- After all the
rules have been removed, you can begin removing unit rules, or rules whose RHS contains one variable and no terminals (which is inconsistent with CNF.)
- To remove
-
add rule
unless this is a unit rule which has already been removed.
- To remove
- Clean up remaining rules that are not in Chomsky normal form.
- Replace
with
where
are new variables.
- If
, replace
in above rules with some new variable
and add rule
.
- Replace
See also
- CYK algorithmCYK algorithmThe Cocke–Younger–Kasami algorithm is a parsing algorithm for context-free grammars. It employs bottom-up parsing and dynamic programming....
- Backus-Naur form
- Greibach normal formGreibach normal formIn computer science and formal language theory, a context-free grammar is in Greibach normal form if the right-hand sides of all productions start with a terminal symbol, optionally followed by some variables. A non-strict form allows one exception to this format restriction for allowing the empty...
- Kuroda normal formKuroda normal formIn formal language theory, a grammar is in Kuroda normal form if, and only if, all production rules are of the form:Every grammar in Kuroda normal form is monotonic, and therefore, generates a context-sensitive language. Conversely, every context-sensitive language which does not generate the...