National Centre for Text Mining
The National Centre for Text Mining (NaCTeM) was the world’s first publicly funded text mining (TM) centre. It was established to provide support, advice, and information on TM technologies and to disseminate information from the larger TM community, while also providing tailored services and tools in response to the requirements of the United Kingdom academic community.

The software tools and services that NaCTeM supplies allow researchers to apply text mining techniques to problems within their specific areas of interest; examples of these tools are highlighted below. In addition to providing services, the Centre contributes to the text mining research community both nationally and internationally, in initiatives such as UK PubMed Central.

The Centre is located in the Manchester Interdisciplinary Biocentre and is operated and organized by the University of Manchester School of Computer Science in close collaboration with the Tsujii Lab, University of Tokyo. NaCTeM contributes expertise in information extraction, natural language processing, and parallel and distributed data mining systems.

Services

TerMine is a domain-independent method for automatic term recognition that locates the most important terms in a document and ranks them automatically.

AcroMine finds all known expanded forms of acronyms as they have appeared in MEDLINE entries. Conversely, it can be used to find possible acronyms of expanded forms as they have previously appeared in MEDLINE, and it disambiguates them.

Medie is an intelligent search engine for semantic retrieval of sentences containing biomedical correlations from MEDLINE abstracts.

Facta+ is a MEDLINE search engine for finding associations between biomedical concepts.

KLEIO is a faceted semantic information retrieval system based on MEDLINE.

Info-PubMed provides information on, and graphical representations of, biomedical interactions extracted from MEDLINE using deep semantic parsing technology. This is supplemented with a term dictionary consisting of over 200,000 protein/gene names and identification of disease types and organisms.
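
As an illustration of what automatic term recognition and ranking (the task TerMine addresses) can look like, the following is a minimal sketch of the C-value measure, a well-known termhood score for multi-word terms. The article does not state which algorithm TerMine uses, so the choice of C-value here is an assumption, and the function names and sample frequencies are hypothetical. The sketch also assumes that candidate terms and their frequencies have already been extracted from a corpus.

```python
import math


def contains(longer, shorter):
    """Return True if `shorter` occurs as a contiguous word sequence in `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def c_value(candidates):
    """Score candidate terms with the C-value measure.

    `candidates` maps a term (a tuple of words) to its corpus frequency.
    A term nested inside longer candidates has its frequency reduced by the
    mean frequency of those longer candidates. Single-word terms score 0
    because log2(1) == 0; the measure is intended for multi-word terms.
    """
    scores = {}
    for term, freq in candidates.items():
        # Frequencies of longer candidates that contain this term.
        nesting = [f for other, f in candidates.items()
                   if len(other) > len(term) and contains(other, term)]
        adjusted = freq - sum(nesting) / len(nesting) if nesting else freq
        scores[term] = math.log2(len(term)) * adjusted
    return scores


def rank_terms(candidates):
    """Return (term, score) pairs sorted from highest to lowest C-value."""
    return sorted(c_value(candidates).items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical candidate frequencies, for illustration only.
    sample = {
        ("text", "mining"): 10,
        ("biomedical", "text", "mining"): 4,
        ("text", "mining", "tools"): 2,
    }
    for term, score in rank_terms(sample):
        print(" ".join(term), round(score, 3))
```

The nesting adjustment keeps a short phrase from ranking highly merely because it appears inside longer terms, while the length factor favours longer multi-word candidates.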

Resources

BioLexicon is a large-scale terminological resource for the biomedical domain.

GENIA is a collection of reference materials for the development of biomedical text mining systems.
The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.