eTBLAST
eTBLAST is a free text-similarity search engine currently offering access to the MEDLINE database, the National Institutes of Health (NIH) CRISP database, the Institute of Physics (IOP) database, Wikipedia, arXiv, the NASA technical reports database, Virginia Tech class descriptions and a variety of databases of clinical interest. It is continuously expanding with additional text-based databases. eTBLAST searches citation databases and databases containing full text, such as PubMed. The eTBLAST server compares a user's natural-text query to target databases using a hybrid search algorithm consisting of a low-sensitivity, weighted, keyword-based first pass followed by a novel sentence-alignment-based second pass. eTBLAST is a free web-based service of The Innovation Laboratory at the Virginia Bioinformatics Institute.
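The two-pass design described above can be illustrated with a minimal sketch. The source does not specify eTBLAST's keyword weights or its sentence-alignment method, so everything here is an assumption: the first pass uses inverse-document-frequency weights, the second pass uses Python's `difflib.SequenceMatcher` as a stand-in for sentence alignment, and the function names, the `shortlist` parameter and the sample texts are hypothetical.

```python
import math
import re
from collections import Counter
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase a text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def idf_weights(docs):
    """Weight each word by inverse document frequency (an assumed scheme)."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(tokenize(doc)))
    return {w: math.log((n + 1) / (c + 1)) + 1.0 for w, c in df.items()}


def keyword_score(query, doc, weights):
    """First pass: sum the weights of distinct words shared by query and doc."""
    shared = set(tokenize(query)) & set(tokenize(doc))
    return sum(weights.get(word, 0.0) for word in shared)


def alignment_score(query, doc):
    """Second pass stand-in: word-level alignment ratio between 0 and 1."""
    matcher = SequenceMatcher(None, tokenize(query), tokenize(doc), autojunk=False)
    return matcher.ratio()


def hybrid_search(query, docs, shortlist=3):
    """Shortlist documents by keyword score, then re-rank them by alignment."""
    weights = idf_weights(docs)
    ranked = sorted(
        range(len(docs)),
        key=lambda i: keyword_score(query, docs[i], weights),
        reverse=True,
    )
    candidates = ranked[:shortlist]
    rescored = [(alignment_score(query, docs[i]), i) for i in candidates]
    return sorted(rescored, reverse=True)  # list of (score, document index)


# Invented sample texts, not records from any real database.
docs = [
    "Gene expression profiling in breast cancer patients",
    "Text similarity search finds duplicate abstracts in the biomedical literature",
    "Orbital mechanics of low thrust spacecraft trajectories",
    "Keyword search with Boolean operators in bibliographic databases",
]
query = "Text similarity search finds duplicate abstracts in the biomedical literature"
results = hybrid_search(query, docs, shortlist=2)
print(results)
```

The cheap first pass keeps the more expensive alignment step from running against every record, which is the general motivation for a two-stage search of this kind.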
eTBLAST, as a text-similarity engine, made possible a large study of duplicate publications and potential plagiarism in the biomedical literature. Thousands of randomly sampled MEDLINE abstracts were submitted to eTBLAST, and those with the highest similarity were studied and entered into an online database. This study is ongoing, with the database maturing as the entries are manually inspected and classified. This work revealed several trends, including an increasing rate of duplication in the biomedical literature, as reported in the journals Bioinformatics, Anaesthesia and Intensive Care, Clinical Chemistry, Urologic Oncology, Nature, and Science.
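The sampling workflow of the study can be sketched roughly as follows. This is an illustration under stated assumptions, not the study's actual procedure: the similarity measure (`difflib.SequenceMatcher` over words), the `threshold` value, the function names and the sample abstracts are all hypothetical, and flagged pairs are only candidates that would still need manual inspection.

```python
import random
from difflib import SequenceMatcher


def similarity(a, b):
    """Word-level similarity ratio between two abstracts (assumed measure)."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split(), autojunk=False).ratio()


def flag_candidates(abstracts, sample_size, threshold, seed=0):
    """Sample abstracts at random and flag each one whose closest match
    in the collection reaches the similarity threshold."""
    rng = random.Random(seed)
    sample = rng.sample(range(len(abstracts)), sample_size)
    flagged = []
    for i in sample:
        # Find the most similar other abstract for the sampled one.
        best_score, best_j = max(
            (similarity(abstracts[i], abstracts[j]), j)
            for j in range(len(abstracts))
            if j != i
        )
        if best_score >= threshold:
            flagged.append((i, best_j, round(best_score, 3)))
    return flagged  # candidates for manual inspection, not confirmed duplicates


# Invented abstracts for illustration only.
abstracts = [
    "aspirin reduces the risk of stroke in elderly patients",
    "aspirin reduces the risk of stroke in older patients",
    "quantum dots emit light at tunable wavelengths",
    "soil erosion accelerates after heavy seasonal rainfall",
]
flagged = flag_candidates(abstracts, sample_size=len(abstracts), threshold=0.8)
print(flagged)
```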
Interface
Because eTBLAST is a text-similarity engine rather than a simple keyword-based search tool, it is claimed that the user need not identify and manipulate query keywords and Boolean operators, as must be done for other search engines. eTBLAST aims to help the user rapidly find references, evaluate novelty, find experts and journals in a given topical area, and track the popularity of the topic as defined by the user’s query. The results as a set also carry information beyond that found in individual 'hits'. eTBLAST can also infer possible hypotheses by inspecting implicit keywords found within the most similar 'hits'. A similarity matrix and a heat map are also displayed for the most similar 'hits'.
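As a rough illustration of the set-level views described above, the sketch below counts authors, journals and publication years across a list of hits and computes a pairwise similarity matrix, which is the kind of data a heat map would display. The record fields (`authors`, `journal`, `year`, `abstract`), the function names and the sample hits are hypothetical, and `difflib.SequenceMatcher` is an assumed stand-in for eTBLAST's own similarity score.

```python
from collections import Counter
from difflib import SequenceMatcher


def summarize_hits(hits):
    """Count authors, journals and years across the result set."""
    authors = Counter(name for hit in hits for name in hit["authors"])
    journals = Counter(hit["journal"] for hit in hits)
    years = Counter(hit["year"] for hit in hits)
    return authors, journals, years


def similarity_matrix(hits):
    """Pairwise word-level similarity between the abstracts of the hits."""
    texts = [hit["abstract"].lower().split() for hit in hits]
    return [
        [SequenceMatcher(None, a, b, autojunk=False).ratio() for b in texts]
        for a in texts
    ]


# Invented hits for illustration only, not real bibliographic records.
hits = [
    {"authors": ["Author A", "Author B"], "journal": "Journal X", "year": 2008,
     "abstract": "text similarity search for duplicate citations"},
    {"authors": ["Author A"], "journal": "Journal X", "year": 2009,
     "abstract": "similarity search engines for biomedical abstracts"},
    {"authors": ["Author C"], "journal": "Journal Y", "year": 2009,
     "abstract": "duplicate publication trends in clinical journals"},
]

authors, journals, years = summarize_hits(hits)
matrix = similarity_matrix(hits)

print(authors.most_common(1))   # most frequent author among the hits
print(journals.most_common(1))  # most frequent journal among the hits
print(sorted(years.items()))    # hits per year, a crude popularity trend
for row in matrix:
    print(" ".join(f"{value:.2f}" for value in row))
```

Frequent authors and journals among the top hits suggest candidate experts and venues, and the per-year counts give a simple view of how a topic's popularity changes over time.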
A typical query of 120 words takes less than 10 seconds to return results after a comparison to MEDLINE, which as of 8/1/2011 contained over 20 million records.