Grammar Induction
Encyclopedia
Grammatical induction, also known as grammatical inference or syntactic pattern recognition
, refers to the process in machine learning
of learning a formal grammar
(usually in the form of re-write rules or productions) from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. Grammatical inference is distinguished from traditional decision rules
and other such methods principally by the nature of the resulting model, which in the case of grammatical inference relies heavily on hierarchical substitutions. Whereas a traditional decision rule set is geared toward assessing object classification, a grammatical rule set is geared toward the generation of examples. In this sense, the grammatical induction problem can be said to seek a generative model, while the decision rule problem seeks a descriptive model.
algorithm. The text provide a simple example which nicely illustrates the process, but the feasibility of such an unguided trial-and-error approach for more substantial problems is dubious.
s is the process of evolving a representation of the grammar of a target language through some evolutionary process. Formal grammar
s can easily be represented as a tree structure of production rules that can be subjected to evolutionary operators. Algorithm
s of this sort stem from the genetic programming
paradigm pioneered by John Koza
. Other early work on simple formal languages used the binary string representation of genetic algorithms, but the inherently hierarchical structure of grammars couched in the EBNF
language made trees
a more flexible approach.
Koza represented Lisp programs as trees
. He was able to find analogues to the genetic operators within the standard set of tree operators. For example, swapping sub-trees is equivalent to the corresponding process of genetic crossover, where sub-strings of a genetic code are transplanted into an individual of the next generation. Fitness is measured by scoring the output from the functions
of the lisp code. Similar analogues between the tree structured lisp representation and the representation of grammars as trees, made the application of genetic programming techniques possible for grammar induction.
In the case of Grammar Induction, the transplantation of sub-trees corresponds to the swapping of production rules that enable the parsing of phrases from some language. The fitness operator for the grammar is based upon some measure of how well it performed in parsing some group of sentences from the target language. In a tree representation of a grammar, a terminal symbol of a production rule corresponds to a leaf node of the tree. Its parent nodes corresponds to a non-terminal symbol (e.g. a noun phrase
or a verb phrase
) in the rule set. Ultimately, the root node might correspond to a sentence non-terminal.
s, greedy grammar inference algorithms make, in iterative manner, decisions that seem to be the best at that stage.
These made decisions deal usually with things like the making of a new or the removing of the existing rules, the choosing of the applied rule or the merging of some existing rules.
Because there are several ways to define 'the stage' and 'the best', there are also several greedy grammar inference algorithms.
These context-free grammar
generating algorithms make the decision after every read symbol:
These context-free grammar generating algorithms first read the whole given symbol-sequence and then start to make decisions:
, and have been applied (among many other problems) to morpheme
analysis, and even place name derivations. Grammar induction has also been used for lossless data compression
and statistical inference
via
MML
and MDL
principles.
Syntactic pattern recognition
Syntactic pattern recognition or structural pattern recognition is a form of pattern recognition, in which each object can be represented by a variable-cardinality set of symbolic, nominal features...
, refers to the process in machine learning
Machine learning
Machine learning, a branch of artificial intelligence, is a scientific discipline concerned with the design and development of algorithms that allow computers to evolve behaviors based on empirical data, such as from sensor data or databases...
of learning a formal grammar
Formal grammar
A formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...
(usually in the form of re-write rules or productions) from a set of observations, thus constructing a model which accounts for the characteristics of the observed objects. Grammatical inference is distinguished from traditional decision rules
Decision rules
A set of decision rules is the verbal equivalent of a graphical decision tree, which specifies class membership based on a hierarchical sequence of decisions...
and other such methods principally by the nature of the resulting model, which in the case of grammatical inference relies heavily on hierarchical substitutions. Whereas a traditional decision rule set is geared toward assessing object classification, a grammatical rule set is geared toward the generation of examples. In this sense, the grammatical induction problem can be said to seek a generative model, while the decision rule problem seeks a descriptive model.
Methodologies
There are a wide variety of methods for grammatical inference. Two of the classic sources are and . also devote a brief section to the problem, and cite a number of references. The basic trial-and-error method they present is discussed below.Grammatical inference by trial-and-error
The method proposed in Section 8.7 of suggests successively guessing grammar rules (productions) and testing them against positive and negative observations. The rule set is expanded so as to be able to generate each positive example, but if a given rule set also generates a negative example, it must be discarded. This particular approach can be characterized as "hypothesis testing" and bears some similarity to Mitchel's version spaceVersion space
A version space in concept learning or induction is the subset of all hypotheses that are consistent with the observed training examples. This set contains all hypotheses that have not been eliminated as a result of being in conflict with observed data....
algorithm. The text provide a simple example which nicely illustrates the process, but the feasibility of such an unguided trial-and-error approach for more substantial problems is dubious.
Grammatical inference by genetic algorithms
Grammatical Induction using evolutionary algorithmEvolutionary algorithm
In artificial intelligence, an evolutionary algorithm is a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA uses some mechanisms inspired by biological evolution: reproduction, mutation, recombination, and selection...
s is the process of evolving a representation of the grammar of a target language through some evolutionary process. Formal grammar
Formal grammar
A formal grammar is a set of formation rules for strings in a formal language. The rules describe how to form strings from the language's alphabet that are valid according to the language's syntax...
s can easily be represented as a tree structure of production rules that can be subjected to evolutionary operators. Algorithm
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...
s of this sort stem from the genetic programming
Genetic programming
In artificial intelligence, genetic programming is an evolutionary algorithm-based methodology inspired by biological evolution to find computer programs that perform a user-defined task. It is a specialization of genetic algorithms where each individual is a computer program...
paradigm pioneered by John Koza
John Koza
John R. Koza is a computer scientist and a former consulting professor at Stanford University, most notable for his work in pioneering the use of genetic programming for the optimization of complex problems. He was a cofounder of Scientific Games Corporation, a company which built computer systems...
. Other early work on simple formal languages used the binary string representation of genetic algorithms, but the inherently hierarchical structure of grammars couched in the EBNF
Extended Backus–Naur form
In computer science, Extended Backus–Naur Form is a family of metasyntax notations used for expressing context-free grammars: that is, a formal way to describe computer programming languages and other formal languages. They are extensions of the basic Backus–Naur Form metasyntax notation.The...
language made trees
Tree (data structure)
In computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes.Mathematically, it is an ordered directed tree, more specifically an arborescence: an acyclic connected graph where each node has zero or more children nodes and at...
a more flexible approach.
Koza represented Lisp programs as trees
Tree (data structure)
In computer science, a tree is a widely-used data structure that emulates a hierarchical tree structure with a set of linked nodes.Mathematically, it is an ordered directed tree, more specifically an arborescence: an acyclic connected graph where each node has zero or more children nodes and at...
. He was able to find analogues to the genetic operators within the standard set of tree operators. For example, swapping sub-trees is equivalent to the corresponding process of genetic crossover, where sub-strings of a genetic code are transplanted into an individual of the next generation. Fitness is measured by scoring the output from the functions
Grammatical function
In linguistics, grammatical functions refer to functional relationships between participants in a proposition...
of the lisp code. Similar analogues between the tree structured lisp representation and the representation of grammars as trees, made the application of genetic programming techniques possible for grammar induction.
In the case of Grammar Induction, the transplantation of sub-trees corresponds to the swapping of production rules that enable the parsing of phrases from some language. The fitness operator for the grammar is based upon some measure of how well it performed in parsing some group of sentences from the target language. In a tree representation of a grammar, a terminal symbol of a production rule corresponds to a leaf node of the tree. Its parent nodes corresponds to a non-terminal symbol (e.g. a noun phrase
Noun phrase
In grammar, a noun phrase, nominal phrase, or nominal group is a phrase based on a noun, pronoun, or other noun-like word optionally accompanied by modifiers such as adjectives....
or a verb phrase
Verb phrase
In linguistics, a verb phrase or VP is a syntactic unit composed of at least one verb and the dependents of that verb. One can distinguish between two types of VPs, finite VPs and non-finite VPs . While phrase structure grammars acknowledge both, dependency grammars reject the existence of a...
) in the rule set. Ultimately, the root node might correspond to a sentence non-terminal.
Grammatical inference by greedy algorithms
Like all greedy algorithmGreedy algorithm
A greedy algorithm is any algorithm that follows the problem solving heuristic of making the locally optimal choice at each stagewith the hope of finding the global optimum....
s, greedy grammar inference algorithms make, in iterative manner, decisions that seem to be the best at that stage.
These made decisions deal usually with things like the making of a new or the removing of the existing rules, the choosing of the applied rule or the merging of some existing rules.
Because there are several ways to define 'the stage' and 'the best', there are also several greedy grammar inference algorithms.
These context-free grammar
Context-free grammar
In formal language theory, a context-free grammar is a formal grammar in which every production rule is of the formwhere V is a single nonterminal symbol, and w is a string of terminals and/or nonterminals ....
generating algorithms make the decision after every read symbol:
- Lempel-Ziv-Welch algorithmLZWLempel–Ziv–Welch is a universal lossless data compression algorithm created by Abraham Lempel, Jacob Ziv, and Terry Welch. It was published by Welch in 1984 as an improved implementation of the LZ78 algorithm published by Lempel and Ziv in 1978...
creates a context-free grammar in a deterministic way such that it is necessary to store only the start rule of the generated grammar. - SequiturSEQUITUR algorithmSequitur is a recursive algorithm developed by Craig Nevill-Manning and Ian H. Witten in 1997 that infers a hierarchical structure from a sequence of discrete symbols. The algorithm operates in linear space and time...
and its modifications.
These context-free grammar generating algorithms first read the whole given symbol-sequence and then start to make decisions:
- Byte pair encodingByte pair encodingByte pair encoding or digram coding is a simple form of data compression in which the most common pair of consecutive bytes of data is replaced with a byte that does not occur within that data. A table of the replacements is required to rebuild the original data...
and its optimizations.
Applications
The principle of grammar induction has been applied to other aspects of natural language processingNatural language processing
Natural language processing is a field of computer science and linguistics concerned with the interactions between computers and human languages; it began as a branch of artificial intelligence....
, and have been applied (among many other problems) to morpheme
Morpheme
In linguistics, a morpheme is the smallest semantically meaningful unit in a language. The field of study dedicated to morphemes is called morphology. A morpheme is not identical to a word, and the principal difference between the two is that a morpheme may or may not stand alone, whereas a word,...
analysis, and even place name derivations. Grammar induction has also been used for lossless data compression
Lossless data compression
Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data. The term lossless is in contrast to lossy data compression, which only allows an approximation of the original data to be reconstructed, in exchange...
and statistical inference
Statistical inference
In statistics, statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation...
via
MML
Minimum message length
Minimum message length is a formal information theory restatement of Occam's Razor: even when models are not equal in goodness of fit accuracy to the observed data, the one generating the shortest overall message is more likely to be correct...
and MDL
Minimum description length
The minimum description length principle is a formalization of Occam's Razor in which the best hypothesis for a given set of data is the one that leads to the best compression of the data. MDL was introduced by Jorma Rissanen in 1978...
principles.
See also
- Artificial grammar learningArtificial grammar learningArtificial Grammar Learning is a paradigm of study within cognitive psychology. Its goal is to investigate the processes that underlie human language learning, by testing subjects' ability to learn a made-up language in a laboratory setting...
- Syntactic pattern recognitionSyntactic pattern recognitionSyntactic pattern recognition or structural pattern recognition is a form of pattern recognition, in which each object can be represented by a variable-cardinality set of symbolic, nominal features...
- Inductive inferenceInductive inferenceAround 1960, Ray Solomonoff founded the theory of universal inductive inference, the theory of prediction based on observations; for example, predicting the next symbol based upon a given series of symbols...
- Straight-line grammar
- Kolmogorov complexityKolmogorov complexityIn algorithmic information theory , the Kolmogorov complexity of an object, such as a piece of text, is a measure of the computational resources needed to specify the object...
- Automatic distillation of structureAutomatic distillation of structureAutomatic Distillation of Structure is an algorithm that can analyse source material such as text and come up with meaningful information about the generative structures that gave rise to the source. One application of the algorithm is grammar induction: ADIOS can read a source text and infer...