Lexical choice
Lexical choice is the subtask of natural language generation (NLG) that involves choosing the content words (nouns, verbs, adjectives, adverbs) in a generated text. Function words (determiners, for example) are usually chosen during realisation.
Examples
The simplest type of lexical choice involves mapping a domain concept (perhaps represented in an ontology) to a word. For example, the concept Finger might be mapped to the word finger.
A more complex situation arises when a domain concept is expressed using different words in different situations. For example, the domain concept Value-Change can be expressed in many ways:
- The temperature rose: the verb rose is used for a Value-Change in temperature which increases the value.
- The temperature fell: the verb fell is used for a Value-Change in temperature which decreases the value.
- The rain got heavier: the phrase got heavier is used for a Value-Change in precipitation amount when the precipitation is rain.
Sometimes words can communicate additional contextual information, for example:
- The temperature plummeted: the verb plummeted is used for a Value-Change in temperature which decreases the value, when the change is rapid and large.
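The Value-Change examples above can be sketched as a small rule-based chooser. This is only an illustration, not a description of any particular NLG system: the `ValueChange` class, the `choose_verb` function, the thresholds for a "rapid and large" change, and the phrase "got lighter" for decreasing rain are all assumptions introduced here.

```python
from dataclasses import dataclass


@dataclass
class ValueChange:
    """Hypothetical domain concept: a change in some measured quantity."""
    quantity: str   # e.g. "temperature" or "rain"
    delta: float    # signed change in value
    hours: float    # time over which the change happened


# Assumed thresholds for a "rapid and large" temperature drop.
LARGE_DELTA = 10.0
RAPID_HOURS = 3.0


def choose_verb(change: ValueChange) -> str:
    """Pick a verb or phrase for a Value-Change, depending on its context."""
    if change.delta == 0:
        raise ValueError("no change to describe")
    if change.quantity == "rain":
        # "got lighter" is an assumed counterpart of "got heavier".
        return "got heavier" if change.delta > 0 else "got lighter"
    if change.quantity == "temperature":
        if change.delta > 0:
            return "rose"
        if abs(change.delta) >= LARGE_DELTA and change.hours <= RAPID_HOURS:
            return "plummeted"
        return "fell"
    raise ValueError(f"no lexical rule for quantity {change.quantity!r}")


def describe(change: ValueChange) -> str:
    """Build a minimal sentence around the chosen verb."""
    return f"The {change.quantity} {choose_verb(change)}"
```

For instance, `describe(ValueChange("temperature", -12.0, 2.0))` selects plummeted under these assumed thresholds, whereas the same drop spread over a whole day selects fell.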
Contextual information is especially significant for vague terms such as tall. For example, a man who is 2 m tall is tall, but a horse that is 2 m tall is small.
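One way to picture this is to judge height relative to a reference value for the kind of entity being described. The sketch below is a simplification under loud assumptions: the reference heights in `TYPICAL_HEIGHT_M` are illustrative placeholders chosen only to reproduce the example above, not measurements, and the 10% margin and the function name `size_adjective` are likewise hypothetical.

```python
# Illustrative placeholder reference heights in metres (assumptions, not data).
TYPICAL_HEIGHT_M = {
    "man": 1.75,
    "horse": 2.4,
}

# Assumed margin: more than 10% above or below the reference counts as notable.
MARGIN = 0.10


def size_adjective(kind: str, height_m: float) -> str:
    """Choose a size adjective for an entity relative to others of its kind."""
    reference = TYPICAL_HEIGHT_M[kind]
    if height_m > reference * (1 + MARGIN):
        return "tall"
    if height_m < reference * (1 - MARGIN):
        return "small"
    return "average-sized"
```

With these placeholder values, the same 2 m measurement yields tall for a man and small for a horse, which is the behaviour the example describes.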
Linguistic perspective
Lexical choice modules must be informed by linguistic knowledge of how the system's input data maps onto words. This is a question of semantics, but it is also influenced by syntactic factors (such as collocation effects) and pragmatic factors (such as context).
Hence NLG systems need linguistic models of how meaning is mapped to words in the target domain (genre) of the NLG system. Genre tends to be very important; for example, the verb veer has a very specific meaning in weather forecasts (wind direction is changing in a clockwise direction) which it does not have in general English, and a weather-forecast generator must be aware of this genre-specific meaning.
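The genre-specific meaning of veer can be illustrated with a short sketch that picks a verb for a wind-direction change. The function name `wind_change_verb`, the genre labels, and the fallback verb "changing" are assumptions made for this example. The term "backing" for an anticlockwise change is standard meteorological usage rather than something stated above.

```python
def wind_change_verb(old_deg: float, new_deg: float, genre: str) -> str:
    """Choose a verb for a wind-direction change given in compass degrees.

    Compass bearings increase clockwise, so a positive shortest-path
    difference means the wind direction is moving clockwise.
    """
    # Signed shortest angular difference, in the range [-180, 180).
    diff = (new_deg - old_deg + 180) % 360 - 180
    if diff == 0:
        return "steady"
    # A reversal of exactly 180 degrees has no clockwise/anticlockwise sense.
    if genre == "weather_forecast" and diff != -180:
        return "veering" if diff > 0 else "backing"
    # General English: fall back to a neutral verb (an assumed default).
    return "changing"
```

A change from 350 to 10 degrees is treated as clockwise because the difference is taken along the shortest arc, so the forecast genre yields veering, while the general genre yields the neutral verb.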
In some cases there are major differences in how different people use the same word; for example, some people use by evening to mean 6 PM and others use it to mean midnight. Psycholinguists have shown that when people speak to each other, they agree on a common interpretation via lexical alignment; NLG systems cannot yet do this.
Ultimately, lexical choice must deal with the fundamental issue of how language relates to the non-linguistic world. For example, a system which chose colour terms such as red to describe objects in a digital image would need to know which RGB pixel values can generally be described as red; how this is influenced by visual context (lighting, other objects in the scene) and linguistic context (other objects being discussed); what pragmatic connotations are associated with red (for example, when an apple is called red, it is assumed to be ripe as well as red in colour); and so forth.
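Only the first of these requirements, deciding which RGB values might be called red, is easy to sketch in code. The example below converts a pixel to HSV with Python's standard `colorsys` module and applies hand-picked thresholds; the function name `could_be_called_red` and every threshold are assumptions, and the visual context, linguistic context, and pragmatic connotations discussed above are deliberately not modelled.

```python
import colorsys

# Assumed thresholds; a real system would need to learn or calibrate these.
MAX_HUE_DISTANCE_DEG = 20.0   # how far the hue may be from pure red (0 degrees)
MIN_SATURATION = 0.5          # excludes greys and washed-out pinks
MIN_VALUE = 0.3               # excludes very dark, near-black pixels


def could_be_called_red(r: int, g: int, b: int) -> bool:
    """Context-free guess at whether an 8-bit RGB pixel could be called red."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    hue_deg = h * 360
    # Hue is circular, so 350 degrees is only 10 degrees away from red.
    hue_distance = min(hue_deg, 360 - hue_deg)
    return (
        hue_distance <= MAX_HUE_DISTANCE_DEG
        and s >= MIN_SATURATION
        and v >= MIN_VALUE
    )
```

A fixed rule like this ignores lighting and neighbouring objects, which is exactly the gap the paragraph above points out.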