LOLITA
LOLITA is a natural language processing system developed at Durham University between 1986 and 2000. The name is an acronym for "Large-scale, Object-based, Linguistic Interactor, Translator and Analyzer".
LOLITA was developed by Roberto Garigliano and colleagues between 1986 and 2000. It was designed as a general-purpose tool for processing unrestricted text that could serve as the basis for a wide variety of applications. At its core was a semantic network containing some 90,000 interlinked concepts. Text could be parsed and analysed, then incorporated into the semantic net, where it could be reasoned about (Long and Garigliano, 1993). Fragments of the semantic net could also be rendered back into English or Spanish.
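As a rough illustration of the idea, and not a reconstruction of LOLITA's actual data structures, a semantic network can be sketched as a labelled graph: concepts are nodes, relations are labelled edges, and a simple form of reasoning follows those edges. The class name, the relation labels ("is_a", "has") and the sample concepts below are all hypothetical.

```python
from collections import defaultdict


class SemanticNet:
    """Toy semantic network: concepts joined by labelled, directed links."""

    def __init__(self):
        # concept -> relation label -> set of target concepts
        self.links = defaultdict(lambda: defaultdict(set))

    def add(self, source, relation, target):
        """Record one fact, e.g. add("dog", "is_a", "mammal")."""
        self.links[source][relation].add(target)

    def ancestors(self, concept):
        """Return every concept reachable from `concept` via "is_a" links."""
        seen = set()
        stack = [concept]
        while stack:
            current = stack.pop()
            for parent in self.links[current]["is_a"]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    def holds(self, concept, relation, target):
        """Check a fact directly or by inheritance along "is_a" links."""
        candidates = {concept} | self.ancestors(concept)
        return any(target in self.links[c][relation] for c in candidates)


# Hypothetical sample facts, standing in for analysed text.
net = SemanticNet()
net.add("dog", "is_a", "mammal")
net.add("mammal", "is_a", "animal")
net.add("mammal", "has", "fur")
```

Here `net.holds("dog", "has", "fur")` succeeds even though that fact was never stated directly, because it is inherited through the "is_a" link from "dog" to "mammal". A network on the scale described above would need far richer link types and indexing than this sketch suggests.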
Several applications were built using the system, including financial information analysers and information extraction tools for DARPA’s “Message Understanding Conference Competitions” (MUC-6 and MUC-7). The latter involved processing original Wall Street Journal articles to perform tasks such as identifying key job changes in businesses and summarising articles. LOLITA was one of a small number of systems worldwide to compete in all sections of the tasks. A system description and an analysis of the MUC-6 results were written by Callaghan (Callaghan, 1998).
LOLITA was an early example of a substantial application written in a functional language: it consisted of around 50,000 lines of Haskell, with around 6,000 lines of C. It was also a complex and demanding application, in which many aspects of Haskell proved invaluable during development.
LOLITA was designed to handle unrestricted text, so ambiguity at various levels was unavoidable and significant. Laziness was essential in handling the explosion of syntactic ambiguity resulting from a large grammar, and it was also much used in handling semantic ambiguity. The system used multiple "domain-specific embedded languages" for semantic and pragmatic processing and for generating natural language text from the semantic net. Also important was the ability to work with complex abstractions and to prototype new analysis algorithms quickly.
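The role of laziness can be illustrated with a minimal sketch. It is written in Python, whose generators stand in here for Haskell's lazy evaluation, and it is not LOLITA's grammar or parser: the "grammar" simply allows every binary bracketing of a sentence, so the number of analyses grows very quickly with sentence length. The function names and the example sentence are hypothetical.

```python
from itertools import islice


def parses(words):
    """Lazily yield every binary bracketing of `words` as nested tuples."""
    if len(words) == 1:
        yield words[0]
        return
    for split in range(1, len(words)):
        for left in parses(words[:split]):
            for right in parses(words[split:]):
                yield (left, right)


def count_parses(words):
    """Force the whole space of analyses and count them."""
    return sum(1 for _ in parses(words))


sentence = "I saw the man with the telescope".split()

# Only the first two analyses are ever built; the rest stay unevaluated.
first_two = list(islice(parses(sentence), 2))
```

Calling `count_parses(sentence)` forces all 132 bracketings of the seven-word sentence, whereas `first_two` is produced without constructing the remainder. A later processing stage can therefore consume candidate analyses one at a time and stop as soon as one is acceptable, which is the general benefit the paragraph above attributes to laziness.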
Later systems based on the same design include Concepts and SenseGraph.
External links
- Lolita Progress Report #1 1992
- http://www-fp.dcs.st-and.ac.uk/~kh/papers/ABSTRACTS.html A collection of papers on parallelism in Haskell, in which LOLITA is frequently one of the test cases or the primary one
- Belief Modeling for Discourse Plans (Garagani, 1997)