Link grammar
Link grammar (LG) is a theory of syntax by Davy Temperley and Daniel Sleator which builds relations between pairs of words, rather than constructing constituents in a tree-like hierarchy. There are two basic parameters: directionality and distance. Link grammar is similar to dependency grammar, but dependency grammar includes a head-dependent relationship and lacks directionality in the relations between words. Colored Multiplanar Link Grammar (CMLG) is an extension of LG allowing crossing relations between pairs of words.
For example, in a subject–verb–object language like English, the verb would look left to form a subject link, and right to form an object link. Nouns would look right to complete the subject link, or left to complete the object link.
In a subject–object–verb language like Persian, the verb would look left to form an object link, and farther left to form a subject link. Nouns would look to the right for both subject and object links.
Syntax
Rightward links are represented as a +, and leftward links with a -. Optional links are contained in curly brackets {...}. Undesirable links are contained in any number of square brackets [...]. Multiple links are joined either by a conjunction & or a disjunction or. Each rule ends with a semicolon ;.
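The notation above can be made concrete with a small sketch that expands one rule into the alternative connector sets it allows. This is an illustration only, not the Link Grammar parser's own code: the function names `tokenize` and `expand` are hypothetical, `&` is assumed to bind tighter than `or`, and square brackets are treated as plain grouping (their "undesirable" cost is ignored).

```python
import re
from itertools import product

# Tokens: the word "or", a connector such as S+ or D-, or a punctuation mark.
TOKEN = re.compile(r"\s*(or\b|[A-Z][A-Za-z*]*[+-]|[&{}()\[\];])")
CLOSERS = {"{": "}", "(": ")", "[": "]"}


def tokenize(rule):
    """Split a rule such as '{D-} & S+;' into a list of tokens."""
    rule = rule.strip()
    tokens, pos = [], 0
    while pos < len(rule):
        match = TOKEN.match(rule, pos)
        if not match:
            raise ValueError(f"unexpected text: {rule[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def expand(rule):
    """Return every connector combination the rule permits, as tuples."""
    tokens = tokenize(rule)
    if tokens and tokens[-1] == ";":
        tokens.pop()  # the terminating semicolon carries no meaning here
    pos = 0

    def parse_or():
        nonlocal pos
        alternatives = parse_and()
        while pos < len(tokens) and tokens[pos] == "or":
            pos += 1
            alternatives = alternatives + parse_and()  # either side may be used
        return alternatives

    def parse_and():
        nonlocal pos
        alternatives = parse_atom()
        while pos < len(tokens) and tokens[pos] == "&":
            pos += 1
            right = parse_atom()
            # a conjunction needs one alternative from each side
            alternatives = [a + b for a, b in product(alternatives, right)]
        return alternatives

    def parse_atom():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of rule")
        token = tokens[pos]
        pos += 1
        if token in CLOSERS:
            inner = parse_or()
            if pos >= len(tokens) or tokens[pos] != CLOSERS[token]:
                raise ValueError(f"missing {CLOSERS[token]!r}")
            pos += 1
            # curly brackets are optional: the empty alternative is also allowed
            return [()] + inner if token == "{" else inner
        if token[-1] in "+-":
            return [(token,)]
        raise ValueError(f"unexpected token: {token!r}")

    result = parse_or()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos]!r}")
    return result


print(expand("{D-} & S+;"))  # [('S+',), ('D-', 'S+')]
```

Under these assumptions, an optional connector doubles the number of alternatives, which is why a word with several optional links can take part in many different linkages.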
Example 1
A basic rule file for an SVO language might look like:
<determiner>: D+;
<noun-subject>: {D-} & S+;
<noun-object>: {D-} & O-;
<verb>: S- & {O+};
Thus the English sentence, “The boy painted a picture” would appear as:
           +-----O-----+
 +-D-+--S--+     +--D--+
 |   |     |     |     |
The boy painted  a  picture
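The diagram can be checked mechanically. The following minimal sketch assumes a linkage is valid when every link joins a + connector on the left word to a matching - connector on the right word, every connector is used exactly once, and no two links cross. The names `is_valid_linkage` and `links_cross` are hypothetical, and the check ignores connector ordering and connectivity, both of which a fuller treatment would also consider.

```python
from collections import Counter


def links_cross(a, b):
    """True if two links (left, right, label) cross when drawn above the words."""
    (i, j, _), (k, l, _) = a, b
    return i < k < j < l or k < i < l < j


def is_valid_linkage(connectors, links):
    """connectors[n] holds the connectors chosen for word n;
    links is a list of (left_index, right_index, label) triples."""
    needed = [Counter(chosen) for chosen in connectors]
    used = [Counter() for _ in connectors]
    for left, right, label in links:
        if not 0 <= left < right < len(connectors):
            return False
        used[left][label + "+"] += 1   # the left word must look right
        used[right][label + "-"] += 1  # the right word must look left
    if used != needed:
        return False  # some connector is unmatched or matched twice
    return not any(
        links_cross(a, b) for n, a in enumerate(links) for b in links[n + 1:]
    )


words = ["The", "boy", "painted", "a", "picture"]
connectors = [("D+",), ("D-", "S+"), ("S-", "O+"), ("D+",), ("D-", "O-")]
links = [(0, 1, "D"), (1, 2, "S"), (2, 4, "O"), (3, 4, "D")]

print(is_valid_linkage(connectors, links))  # True
```

Dropping any one link leaves a connector unmatched, so the same call on `links[:-1]` returns `False`.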
Example 2
A rule file for a null-subject SOV language might consist of the following links:
<noun-subject>: S+;
<noun-object>: O+;
<verb>: {O-} & {S-};
And a simple Persian sentence, man nAn xordam (من نان خوردم) 'I ate bread' would look like:
 +-----S-----+
 |     +--O--+
 |     |     |
man   nAn xordam
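As a rough illustration of how the optional connectors on the verb accommodate a dropped subject, the sketch below searches by brute force for linkages of very short sentences. Everything in it is an assumption made for the example: the `LEXICON` table expands the connector expressions S+, O+ and {O-} & {S-} by hand, the function names are hypothetical, connector ordering is ignored, and the exhaustive search is only practical for a handful of words.

```python
from itertools import combinations, product

# Hypothetical lexicon: each word maps to the connector sets it may use.
LEXICON = {
    "man": [("S+",)],                                 # subject noun: S+
    "nAn": [("O+",)],                                 # object noun: O+
    "xordam": [(), ("O-",), ("S-",), ("O-", "S-")],   # verb: {O-} & {S-}
}


def crosses(a, b):
    """True if two links (left, right, label) cross each other."""
    (i, j, _), (k, l, _) = a, b
    return i < k < j < l or k < i < l < j


def linkages(words):
    """Return every planar linkage that uses each chosen connector once."""
    found = []
    for choice in product(*(LEXICON[word] for word in words)):
        total = sum(len(chosen) for chosen in choice)
        if total == 0 or total % 2:
            continue  # each link consumes exactly two connectors
        # every link the chosen connectors could form, left word first
        candidates = [
            (i, j, conn[:-1])
            for i in range(len(words))
            for conn in choice[i] if conn.endswith("+")
            for j in range(i + 1, len(words))
            if conn[:-1] + "-" in choice[j]
        ]
        wanted = sorted((i, conn) for i, chosen in enumerate(choice) for conn in chosen)
        for links in combinations(candidates, total // 2):
            used = sorted(
                [(i, label + "+") for i, _, label in links]
                + [(j, label + "-") for _, j, label in links]
            )
            planar = not any(crosses(a, b) for a, b in combinations(links, 2))
            if used == wanted and planar:
                found.append(list(links))
    return found


print(linkages(["man", "nAn", "xordam"]))  # [[(0, 2, 'S'), (1, 2, 'O')]]
print(linkages(["nAn", "xordam"]))         # [[(0, 1, 'O')]]
```

The second call shows the null-subject case: with no subject noun present, the verb simply leaves its optional S- connector unused.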
Implementations
The Link Grammar syntax parser is a library for natural language processing written in C. It is available under the BSD license, which is compatible with the GNU General Public License. The parser is an ongoing project. Recent versions include improved sentence coverage, various bug and security fixes, and Java language bindings.
There are also Perl, Python, Ruby, Java, OCaml and .NET bindings available.
The link-grammar program, along with rules and word lists for English, may be found in standard Linux distributions, e.g., as a Debian package.
Applications
AbiWord, a free word processor, uses Link Grammar for on-the-fly grammar checking (http://www.abisource.com/projects/link-grammar/). Words that cannot be linked anywhere are underlined in green.
The RelEx semantic relationship extractor, layered on top of the Link Grammar library, generates dependency grammar output by making explicit the semantic relationships between words in a sentence. Its output can be classified as being at a level between that of SSyntR and DSyntR of Meaning-Text Theory. It also provides framing/grounding, anaphora resolution, head-word identification, lexical chunking, part-of-speech identification, and tagging of entities, dates, money, gender, and so on. It includes a compatibility mode that generates dependency output compatible with the Stanford parser, as well as Penn TreeBank-compatible POS tagging.
Link Grammar has also been employed for information extraction from biomedical texts and of events described in news articles, as well as in experimental machine translation systems from English to German and Turkish.
The Link Grammar link dictionary is used to generate sentences and verify their syntactic correctness in two natural language generation systems, NLGen and NLGen2. It is also used as part of the NLP pipeline in the OpenCog AI project.
External links
- The original Link Grammar homepage (which has been replaced by the current project.)
- Online English demonstration (for an older, out-of-date version; many bugs have been fixed since this version.)
- LinkGrammar-WN, lexicon expansion for the Link Grammar Parser (out of date, superseded by recent work that has been incorporated into the link-grammar parser.)
- BioLG, a modification of the Link Grammar Parser adapted for the biomedical domain (many, but not all, BioLG enhancements have been folded back into the main link-grammar distribution).