Computational linguistics
Computational linguistics is an interdisciplinary field dealing with the statistical or rule-based modeling of natural language from a computational perspective.
Traditionally, computational linguistics was performed by computer scientists who had specialized in the application of computers to the processing of a natural language. Computational linguists often work as members of interdisciplinary teams, including linguists (specifically trained in linguistics), language experts (persons with some level of ability in the languages relevant to a given project), and computer scientists. In general, computational linguistics draws upon the involvement of linguists, computer scientists, experts in artificial intelligence, mathematicians, logicians, philosophers, cognitive scientists, cognitive psychologists, psycholinguists, anthropologists and neuroscientists, among others.
Computational linguistics has applied and theoretical components: theoretical computational linguistics takes up issues in theoretical linguistics and cognitive science, while applied computational linguistics focuses on the practical outcome of modelling human language use.
Origins
Computational linguistics as a field predates artificial intelligence, a field under which it is often grouped. Computational linguistics originated with efforts in the United States in the 1950s to use computers to automatically translate texts from foreign languages, particularly Russian scientific journals, into English. Since computers can make arithmetic calculations much faster and more accurately than humans, it was thought to be only a matter of time before the technical details could be worked out that would give them the same remarkable capacity to process language.
When machine translation (also known as mechanical translation) failed to yield accurate translations right away, automated processing of human languages was recognized as far more complex than had originally been assumed. Computational linguistics was born as the name of the new field of study devoted to developing algorithms and software for intelligently processing language data. When artificial intelligence came into existence in the 1960s, the field of computational linguistics became that sub-division of artificial intelligence dealing with human-level comprehension and production of natural languages.
In order to translate one language into another, it was observed that one had to understand the grammar of both languages, including both morphology (the grammar of word forms) and syntax (the grammar of sentence structure). To understand syntax, one also had to understand the semantics and the lexicon (or 'vocabulary'), and even something of the pragmatics of language use. Thus, what started as an effort to translate between languages evolved into an entire discipline devoted to understanding how to represent and process natural languages using computers.
Today, research in computational linguistics is carried out in computational linguistics departments and laboratories, computer science departments, and linguistics departments.
Subfields
Computational linguistics can be divided into major areas depending upon the medium of the language being processed, whether spoken or textual, and upon the task being performed, whether analyzing language (recognition) or synthesizing language (generation).
Speech recognition and speech synthesis deal with how spoken language can be understood or created using computers. Parsing and generation are sub-divisions of computational linguistics dealing respectively with taking language apart and putting it together. Machine translation remains the sub-division of computational linguistics dealing with having computers translate between languages.
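To make "taking language apart" concrete, the following is a minimal sketch of one classic parsing technique: a CYK recognizer for a context-free grammar in Chomsky normal form. The toy grammar, the lexicon and the names `GRAMMAR`, `LEXICON` and `cyk_recognize` are hypothetical illustrations, not taken from any particular system; a practical parser would also build parse trees and rank ambiguous analyses.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form: each binary rule A -> B C
# is stored as a mapping from the pair (B, C) to the set of left-hand sides A.
GRAMMAR: dict[tuple[str, str], set[str]] = {
    ("NP", "VP"): {"S"},
    ("Det", "N"): {"NP"},
    ("V", "NP"): {"VP"},
}

# Hypothetical lexicon: terminal rules A -> word.
LEXICON: dict[str, set[str]] = {
    "the": {"Det"},
    "a": {"Det"},
    "dog": {"N"},
    "cat": {"N"},
    "saw": {"V"},
}


def cyk_recognize(words: list[str], start: str = "S") -> bool:
    """Return True if the toy grammar derives the word sequence from `start`."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that derive words[i : j + 1].
    table: list[list[set[str]]] = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICON.get(word, set()))
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length - 1              # span end (inclusive)
            for k in range(i, j):           # split point between two sub-spans
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= GRAMMAR.get((left, right), set())
    return start in table[0][n - 1]


if __name__ == "__main__":
    print(cyk_recognize("the dog saw a cat".split()))  # True
    print(cyk_recognize("dog the saw cat a".split()))  # False
```

The chart is filled bottom-up from single words to the whole sentence, so each sub-span is analysed once; this is why CYK runs in time cubic in sentence length for a fixed grammar.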
Areas of research studied in computational linguistics include:
- Computational complexity of natural language, largely modeled on automata theory, with the application of context-sensitive grammars and linearly bounded Turing machines.
- Computational semantics, which comprises defining suitable logics for linguistic meaning representation, automatically constructing them and reasoning with them.
- Computer-aided corpus linguistics.
- Design of parsers or chunkers for natural languages.
- Design of taggers such as POS-taggers (part-of-speech taggers).
- Machine translation, one of the earliest and most difficult applications of computational linguistics, which draws on many subfields.
- Simulation and study of language evolution in historical linguistics/glottochronology.
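A part-of-speech tagger has to choose each word's tag in context, since many words are ambiguous. One well-known approach, used here purely as an illustration, is a hidden Markov model decoded with the Viterbi algorithm. The sketch below assumes hand-set toy probabilities: `START`, `TRANS` and `EMIT` are hypothetical values, not estimated from any corpus, whereas a real tagger would learn them from an annotated corpus and smooth unseen words.

```python
import math

TAGS = ["Det", "N", "V"]

# Hypothetical initial probabilities P(tag at position 0).
START = {"Det": 0.6, "N": 0.3, "V": 0.1}

# Hypothetical transition probabilities P(next tag | previous tag).
TRANS = {
    "Det": {"Det": 0.05, "N": 0.9, "V": 0.05},
    "N": {"Det": 0.1, "N": 0.3, "V": 0.6},
    "V": {"Det": 0.6, "N": 0.3, "V": 0.1},
}

# Hypothetical emission probabilities P(word | tag); missing entries count as 0.
EMIT = {
    "Det": {"the": 0.7, "a": 0.3},
    "N": {"dog": 0.4, "cat": 0.4, "saw": 0.2},
    "V": {"saw": 0.7, "barks": 0.3},
}


def log_p(p: float) -> float:
    """Log-probability, mapping 0 to -inf so impossible paths never win."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(words: list[str]) -> list[str]:
    """Return the most probable tag sequence for `words` under the toy HMM."""
    if not words:
        return []
    # best[t][tag]: best log-probability of any tag path ending in `tag` at t.
    best = [{t: log_p(START[t]) + log_p(EMIT[t].get(words[0], 0.0)) for t in TAGS}]
    back: list[dict[str, str]] = [{}]  # back[t][tag]: best previous tag
    for word in words[1:]:
        scores: dict[str, float] = {}
        pointers: dict[str, str] = {}
        for tag in TAGS:
            prev = max(TAGS, key=lambda p: best[-1][p] + log_p(TRANS[p][tag]))
            scores[tag] = (best[-1][prev] + log_p(TRANS[prev][tag])
                           + log_p(EMIT[tag].get(word, 0.0)))
            pointers[tag] = prev
        best.append(scores)
        back.append(pointers)
    # Start from the best final tag and follow the back-pointers to the start.
    tag = max(TAGS, key=lambda t: best[-1][t])
    path = [tag]
    for pointers in reversed(back[1:]):
        tag = pointers[tag]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    print(viterbi("the dog saw a cat".split()))  # ['Det', 'N', 'V', 'Det', 'N']
    print(viterbi("the saw".split()))            # ['Det', 'N']
```

With these made-up numbers, "saw" is tagged as a verb after "the dog" but as a noun directly after "the": the same word receives different tags depending on its neighbours. Working in log space avoids numerical underflow on long sentences.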
The Association for Computational Linguistics defines computational linguistics as:
- ...the scientific study of language from a computational perspective. Computational linguists are interested in providing computational models of various kinds of linguistic phenomena.
See also
- Association for Computational Linguistics
- Collostructional analysis
- Computational lexicology
- Computational Linguistics (journal)
- Computational science
- Computational semiotics
- Computer-assisted reviewing
- Dialog systems
- Grammar induction
- Human speechome project
- Internet linguistics
- National Centre for Text Mining
- Natural language processing
- North American Computational Linguistics Olympiad
- Quantitative linguistics
- Semantic relatedness
- Systemic functional linguistics
- Translation memory
- Ubiquitous Knowledge Processing Lab
- Universal Networking Language
External links
- Association for Computational Linguistics (ACL)
- CICLing annual conferences on Computational Linguistics
- Computational Linguistics – Applications workshop
- Free online introductory book on Computational Linguistics (Internet Archive copy)
- Language Technology World
- Resources for Text, Speech and Language Processing