SemEval
SemEval is an ongoing series of evaluations of computational semantic analysis systems; it evolved from the Senseval word sense evaluation series. The evaluations are intended to explore the nature of meaning in language. While meaning is intuitive to humans, transferring those intuitions to computational analysis has proved elusive.
This series of evaluations provides a mechanism to characterize in more precise terms exactly what is necessary to compute meaning. As such, the evaluations provide an emergent mechanism to identify the problems and solutions for computations with meaning. These exercises have evolved to articulate more of the dimensions that are involved in our use of language. They began with apparently simple attempts to identify word senses computationally. They have evolved to investigate the interrelationships among the elements in a sentence (e.g., semantic role labeling), relations between sentences (e.g., coreference), and the nature of what we are saying (semantic relations and sentiment analysis).
The purpose of the SemEval and Senseval exercises is to evaluate semantic analysis systems. "Semantic analysis" refers to a formal analysis of meaning, and "computational" refers to approaches that in principle support effective implementation.
The first three evaluations, Senseval-1 through Senseval-3, were focused on word sense disambiguation (WSD), each time growing in the number of languages offered in the tasks and in the number of participating teams. Beginning with the fourth workshop, SemEval-2007 (SemEval-1), the nature of the tasks evolved to include semantic analysis tasks outside of word sense disambiguation.
Early evaluation of algorithms for word sense disambiguation
From the earliest days, assessing the quality of word sense disambiguation (WSD) algorithms had been primarily a matter of intrinsic evaluation, and “almost no attempts had been made to evaluate embedded WSD components”. Only very recently (2006) had extrinsic evaluations begun to provide some evidence for the value of WSD in end-user applications. Until 1990 or so, discussions of the sense disambiguation task focused mainly on illustrative examples rather than comprehensive evaluation. The early 1990s saw the beginnings of more systematic and rigorous intrinsic evaluations, including more formal experimentation on small sets of ambiguous words.
Senseval to SemEval
In April 1997, a workshop entitled Tagging with Lexical Semantics: Why, What, and How? was held in conjunction with the Conference on Applied Natural Language Processing. At the time, there was a clear recognition that manually annotated corpora had revolutionized other areas of NLP, such as part-of-speech tagging and parsing, and that corpus-driven approaches had the potential to revolutionize automatic semantic analysis as well. Kilgarriff recalled that there was “a high degree of consensus that the field needed evaluation,” and several practical proposals by Resnik and Yarowsky kicked off a discussion that led to the creation of the Senseval evaluation exercises.
List of Senseval and SemEval Workshops
- Senseval-1 took place in the summer of 1998 for English, French, and Italian, culminating in a workshop held at Herstmonceux Castle, Sussex, England on September 2–4.
- Senseval-2 took place in the summer of 2001, and was followed by a workshop held in July 2001 in Toulouse, in conjunction with ACL 2001. Senseval-2 included tasks for Basque, Chinese, Czech, Danish, Dutch, English, Estonian, Italian, Japanese, Korean, Spanish and Swedish.
- Senseval-3 took place in March–April 2004, followed by a workshop held in July 2004 in Barcelona, in conjunction with ACL 2004. Senseval-3 included 14 different tasks for core word sense disambiguation, as well as identification of semantic roles, multilingual annotations, logic forms, and subcategorization acquisition.
- SemEval-2007 (Senseval-4) took place in 2007, followed by a workshop held in conjunction with ACL in Prague. SemEval-2007 included 18 different tasks targeting the evaluation of systems for the semantic analysis of text. A special issue of Language Resources and Evaluation is devoted to the results.
- SemEval-2010 took place in 2010, followed by a workshop held in conjunction with ACL in Uppsala. SemEval-2010 included 18 different tasks targeting the evaluation of semantic analysis systems.
SemEval Workshop framework
The framework of the SemEval/Senseval evaluation workshops emulates the Message Understanding Conferences (MUCs) and other evaluation workshops run by ARPA (Advanced Research Projects Agency, later renamed the Defense Advanced Research Projects Agency, DARPA).
Stages of SemEval/Senseval evaluation workshops
- Firstly, all likely participants were invited to express their interest and participate in the exercise design.
- A timetable towards a final workshop was worked out.
- A plan for selecting evaluation materials was agreed.
- 'Gold standards' for the individual tasks were acquired; often the judgments of human annotators were taken as the gold standard against which the precision and recall scores of computer systems were measured. These 'gold standards' are what the computational systems strive towards. (In WSD tasks, human annotators were set the task of generating a set of correct WSD answers, i.e., the correct sense for a given word in a given context.)
- The gold standard materials, without answers, were released to participants, who then had a short time to run their programs over them and return their sets of answers to the organizers.
- The organizers then scored the answers, and the scores were announced and discussed at a workshop.
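The scoring step in the stages above can be illustrated with a minimal sketch. It assumes a common convention in which precision is the share of correct answers among the instances a system attempted, and recall is the share of correct answers among all gold-standard instances. The function name `score_wsd`, the instance identifiers, and the sense labels are hypothetical and chosen only for illustration; this is not the organizers' actual scoring software.

```python
from typing import Dict, Optional, Tuple


def score_wsd(
    gold: Dict[str, str],
    answers: Dict[str, Optional[str]],
) -> Tuple[float, float]:
    """Return (precision, recall) for a system's answers against a gold standard.

    gold maps an instance id to its correct sense label.
    answers maps an instance id to the system's sense label, or None when
    the system declined to answer (an unattempted instance).
    """
    # Only instances that exist in the gold standard and received an answer
    # count as attempted.
    attempted = {
        inst: sense
        for inst, sense in answers.items()
        if sense is not None and inst in gold
    }
    correct = sum(1 for inst, sense in attempted.items() if gold[inst] == sense)

    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold standard: the correct sense for each word occurrence.
gold = {
    "bank.1": "bank%river",
    "bank.2": "bank%finance",
    "bass.1": "bass%fish",
    "bass.2": "bass%music",
}

# Hypothetical system output: one wrong answer and one skipped instance.
answers = {
    "bank.1": "bank%river",
    "bank.2": "bank%river",
    "bass.1": "bass%fish",
    "bass.2": None,
}

precision, recall = score_wsd(gold, answers)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this toy run the system attempts three of the four instances and gets two right, so precision is 2/3 and recall is 2/4. Separating the two measures lets a system that abstains on hard instances be distinguished from one that answers everything.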
Semantic evaluation tasks
Senseval-1 and Senseval-2 focused on evaluating WSD systems on major languages for which corpora and computerized dictionaries were available. Senseval-3 looked beyond the lexeme and started to evaluate systems that looked into wider areas of semantics, such as semantic roles (technically known as theta roles in formal semantics) and logic form transformation (in which the semantics of phrases, clauses, or sentences are commonly represented in first-order logic forms). Senseval-3 also explored the performance of semantic analysis on machine translation.
As the types of computational semantic systems grew beyond the coverage of WSD, Senseval evolved into SemEval, where more aspects of computational semantic systems were evaluated. The tables below (1) reflect the workshop growth from Senseval to SemEval and (2) give an overview of which areas of computational semantics were evaluated throughout the Senseval/SemEval workshops.
Overview of Issues in Semantic Analysis
The SemEval exercises provide a mechanism for examining issues in semantic analysis of texts. The topics of interest fall short of the logical rigor that is found in formal computational semantics, attempting to identify and characterize the kinds of issues relevant to human understanding of language. The primary goal is to replicate human processing by means of computer systems. The tasks (shown below) are developed by individuals and groups to deal with identifiable issues, as they take on some concrete form.
The first major area in semantic analysis is the identification of the intended meaning at the word level (taken to include idiomatic expressions). This is word-sense disambiguation (a concept that is evolving away from the notion that words have discrete senses and toward the view that words are characterized by the ways in which they are used, i.e., their contexts). The tasks in this area include lexical sample and all-word disambiguation, multi- and cross-lingual disambiguation, and lexical substitution. Given the difficulties of identifying word senses, other tasks relevant to this topic include word-sense induction, subcategorization acquisition, and evaluation of lexical resources.
The second major area in semantic analysis is the understanding of how different sentence and textual elements fit together. Tasks in this area include semantic role labeling, semantic relation analysis, and coreference resolution. Other tasks in this area look at more specialized issues of semantic analysis, such as temporal information processing, metonymy resolution, and sentiment analysis. The tasks in this area have many potential applications, such as information extraction, question answering, document summarization, machine translation, construction of thesauri and semantic networks, language modeling, paraphrasing, and recognizing textual entailment. In each of these potential applications, the contribution of the types of semantic analysis constitutes the most outstanding research issue.
Senseval and SemEval tasks overview
Workshop | No. of Tasks | Areas of study | Languages of Data Evaluated |
---|---|---|---|
Senseval-1 | 3 | Word Sense Disambiguation (WSD) - Lexical Sample WSD tasks | English, French, Italian |
Senseval-2 | 12 | Word Sense Disambiguation (WSD) - Lexical Sample, All Words, Translation WSD tasks | Basque, Chinese, Czech, Danish, Dutch, English, Estonian, Italian, Japanese, Korean, Spanish, Swedish |
Senseval-3 | 16 (including 2 cancelled tasks) | Logic Form Transformation, Machine Translation (MT) Evaluation, Semantic Role Labelling, WSD | Basque, Catalan, Chinese, English, Italian, Romanian, Spanish |
SemEval-2007 | 19 (including 1 cancelled task) | Cross-lingual, Frame Extraction, Information Extraction, Lexical Substitution, Lexical Sample, Metonymy, Semantic Annotation, Semantic Relations, Semantic Role Labelling, Sentiment Analysis, Time Expression, WSD | Arabic, Catalan, Chinese, English, Spanish, Turkish |
SemEval-2010 | 18 (including 1 cancelled task) | Coreference, Cross-lingual, Ellipsis, Information Extraction, Lexical Substitution, Metonymy, Noun Compounds, Parsing, Semantic Relations, Semantic Role Labeling, Sentiment Analysis, Textual Entailment, Time Expressions, WSD | Catalan, Chinese, Dutch, English, French, German, Italian, Japanese, Spanish |
Areas of evaluation
The major tasks in semantic evaluation include the following areas of natural language processing. This list is expected to grow as the field progresses. The following table shows the areas of study that were involved in Senseval-1 through SemEval-2010:
Areas of Study | Senseval-1 | Senseval-2 | Senseval-3 | SemEval-2007 | SemEval-2010 |
---|---|---|---|---|---|
Coreference Resolution | | | | | ✓ |
Multi-lingual or Cross-lingual Lexical Substitution | | | ✓ | ✓ | ✓ |
Ellipsis | | | | | ✓ |
Keyphrase Extraction (Information Extraction) | | | | | ✓ |
Metonymy (Information Extraction) | | | | ✓ | ✓ |
Noun Compounds (Information Extraction) | | | | | ✓ |
Semantic Relation Identification | | | | ✓ | ✓ |
Semantic Role Labeling | | | ✓ | ✓ | ✓ |
Sentiment Analysis | | | | ✓ | ✓ |
Time Expression | | | | ✓ | ✓ |
Textual Entailment | | | | | ✓ |
Word sense disambiguation (Lexical Sample) | ✓ | ✓ | ✓ | ✓ | ✓ |
Word sense disambiguation (All-Words) | | ✓ | ✓ | ✓ | ✓ |
Word sense induction | | | | ✓ | ✓ |
See also
- Computational semantics
- Natural language processing
- Word sense
- Word sense disambiguation
- Semantic analysis (computational)
External links
- Special Interest Group on the Lexicon (SIGLEX) of the Association for Computational Linguistics (ACL)
- SemEval-2010 - Semantic Evaluation Workshop (endorsed by SIGLEX)
- Senseval - international organization devoted to the evaluation of Word Sense Disambiguation Systems (endorsed by SIGLEX)
- SemEval Portal on the Wiki of the Association for Computational Linguistics
- Senseval / SemEval tasks:
- Senseval-1 - the first evaluation exercise on word sense disambiguation systems; the lexical-sample task was evaluated on English, French and Italian
- Senseval-2 - evaluated word sense disambiguation systems on three types of tasks (the all-words, lexical-sample and the translation task)
- Senseval-3 - included tasks for word sense disambiguation, as well as identification of semantic roles, multilingual annotations, logic forms, and subcategorization acquisition
- SemEval-2007 - included tasks that were more elaborate than those of Senseval, crossing different areas of study in natural language processing
- SemEval-2010 - added tasks from new areas of study in computational semantics, viz., Coreference, Ellipsis, Keyphrase Extraction, Noun Compounds and Textual Entailment