cTAKES
cTAKES (clinical Text Analysis and Knowledge Extraction System) is an open-source natural language processing system for extracting information from the clinical free text of electronic medical records. It processes clinical notes and identifies several types of clinical named entities: drugs, diseases/disorders, signs/symptoms, anatomical sites and procedures. Each named entity has attributes for the text span, the ontology mapping code, the context (family history of, current, unrelated to patient), and whether or not it is negated.
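The attributes listed above can be pictured as a small record type. The following Python sketch is illustrative only: the class and field names (`NamedEntity`, `EntityType`, `Context`, `ontology_code`) are assumptions made for this example and are not the cTAKES type system, and the ontology code shown is a placeholder rather than a real code.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    """The entity types named in the article."""
    DRUG = "drug"
    DISEASE_DISORDER = "disease/disorder"
    SIGN_SYMPTOM = "sign/symptom"
    ANATOMICAL_SITE = "anatomical site"
    PROCEDURE = "procedure"


class Context(Enum):
    """The context values named in the article."""
    FAMILY_HISTORY = "family history of"
    CURRENT = "current"
    UNRELATED_TO_PATIENT = "unrelated to patient"


@dataclass(frozen=True)
class NamedEntity:
    """Hypothetical record for one entity mention (not the cTAKES API)."""
    begin: int            # start offset of the text span in the note
    end: int              # end offset (exclusive) of the text span
    entity_type: EntityType
    ontology_code: str    # ontology mapping code
    context: Context
    negated: bool

    def covered_text(self, note: str) -> str:
        """Return the text of the note that this mention spans."""
        return note[self.begin:self.end]


note = "Patient denies chest pain."
mention = NamedEntity(
    begin=15,
    end=25,
    entity_type=EntityType.SIGN_SYMPTOM,
    ontology_code="EXAMPLE-0001",  # placeholder, not a real ontology code
    context=Context.CURRENT,
    negated=True,
)
print(mention.covered_text(note))
```

Storing character offsets rather than a copy of the text keeps each mention tied to its exact position in the original note.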
cTAKES was built using the UIMA (Unstructured Information Management Architecture) framework and the OpenNLP natural language processing toolkit. Its components are trained specifically for the clinical domain and create rich linguistic and semantic annotations that can be used by clinical decision support systems and clinical research.
These components include:
- Sentence boundary detector
- Rule-based tokenizer to separate punctuation from words
- Normalizer
- Context-dependent tokenizer
- Part-of-speech tagger
- Phrasal chunker
- Dictionary lookup annotator
- Context annotator
- Negation detector
- Dependency parser
- Module for the identification of patient smoking status
- Drug mention annotator
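To make the idea of chained components concrete, here is a deliberately tiny Python sketch of four of the stages listed above: sentence boundary detection, rule-based tokenization, dictionary lookup and negation detection. It is not cTAKES code and does not use UIMA or OpenNLP. The dictionary entries, placeholder codes, trigger words and function names are all assumptions made for illustration, and the negation rule (any trigger word earlier in the same sentence) is a simplification chosen for brevity.

```python
import re

# Hypothetical toy dictionary: lower-cased term -> (entity type, placeholder code).
DICTIONARY = {
    "chest pain": ("sign/symptom", "EXAMPLE-0001"),
    "aspirin": ("drug", "EXAMPLE-0002"),
}
# Hypothetical negation triggers; a real negation detector uses far richer rules.
NEGATION_TRIGGERS = {"no", "denies", "without"}


def split_sentences(text):
    """Naive sentence boundary detector: split after ., ! or ? plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence):
    """Rule-based tokenizer: words and punctuation marks become separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def annotate(text, max_term_len=3):
    """Run the toy pipeline and return one dict per dictionary hit."""
    entities = []
    for sentence in split_sentences(text):
        tokens = [t.lower() for t in tokenize(sentence)]
        i = 0
        while i < len(tokens):
            # Dictionary lookup: try the longest token window first.
            for n in range(max_term_len, 0, -1):
                term = " ".join(tokens[i:i + n])
                if term in DICTIONARY:
                    entity_type, code = DICTIONARY[term]
                    # Negation: any trigger word earlier in the same sentence.
                    negated = any(t in NEGATION_TRIGGERS for t in tokens[:i])
                    entities.append({"text": term, "type": entity_type,
                                     "code": code, "negated": negated})
                    i += n
                    break
            else:
                i += 1  # no dictionary term starts at this token
    return entities


result = annotate("Patient denies chest pain. She takes aspirin daily.")
for entity in result:
    print(entity)
```

Each stage consumes the output of the previous one, which is the general shape of an annotation pipeline; in this sketch the stages are plain functions rather than framework components.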
History
The development of cTAKES was started in 2006 by a team of physicians, computer scientists and software engineers at the Mayo Clinic. The development team was led by Dr. Guergana Savova and Dr. Christopher Chute. The system was deployed at Mayo, where it is currently an integral part of the clinical data management infrastructure and has processed more than 80 million clinical notes.
Currently, the core development team is based at both Mayo Clinic and Children's Hospital Boston, following Dr. Savova's move to Children's Hospital Boston in early 2010. Additional collaborations with external groups at the University of Colorado, Brandeis University, the University of Pittsburgh and the University of California at San Diego continue to extend the capabilities of cTAKES into areas such as temporal reasoning, clinical question answering and coreference resolution for the clinical domain.
In 2010, cTAKES was adopted by the i2b2 program and is a central component of SHARP Area 4.
External links
- Abstract (JAMIA)
- Strategic Health IT Advanced Research Projects (SHARP) Program
- OpenNLP
- UIMA
- The Automated Retrieval Console (ARC)
- Informatics for Integrating Biology and the Bedside