OpenNLP
The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services.
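To make the first two tasks in that list concrete, the following is a minimal sketch of sentence segmentation followed by tokenization. It is an illustration only and does not use OpenNLP's API: the class name SimplePipeline and both methods are hypothetical, and the fixed regular expressions stand in for the trained statistical models that a machine learning toolkit such as OpenNLP would apply.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; NOT part of OpenNLP.
class SimplePipeline {

    // Assumption: a token is a run of letters/digits or one punctuation character.
    private static final Pattern TOKEN =
            Pattern.compile("[\\p{L}\\p{N}]+|[^\\p{L}\\p{N}\\s]");

    // Naive sentence segmentation: split on whitespace that follows '.', '!' or '?'.
    // A real segmenter must also handle abbreviations such as "Dr." or "U.S.".
    static List<String> sentences(String text) {
        List<String> out = new ArrayList<>();
        for (String s : text.trim().split("(?<=[.!?])\\s+")) {
            if (!s.isEmpty()) {
                out.add(s);
            }
        }
        return out;
    }

    // Naive tokenization: collect every match of the TOKEN pattern in order.
    static List<String> tokens(String sentence) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "OpenNLP is a toolkit. It processes natural language text.";
        // Later tasks (tagging, chunking, parsing) would consume these tokens.
        for (String sentence : sentences(text)) {
            System.out.println(tokens(sentence));
        }
    }
}
```

The output of each step feeds the next, which is why tasks such as part-of-speech tagging or named entity extraction typically run after segmentation and tokenization. Fixed rules like these fail on abbreviations and other ambiguous punctuation, which is one motivation for the learned models that a machine learning based toolkit provides.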
See also
- Unstructured Information Management Architecture (UIMA)
- General Architecture for Text Engineering (GATE)
- cTAKES (clinical Text Analysis and Knowledge Extraction System)