CiteSeerX
CiteSeerX is a public search engine, digital library, and repository for scientific and academic papers, with a focus on computer and information science. It is loosely based on the earlier CiteSeer search engine and digital library, and is built on a new open-source infrastructure, SeerSuite, together with new algorithms and their implementations. It was developed by researchers Dr. Isaac Councill and Dr. C. Lee Giles at the College of Information Sciences and Technology, Pennsylvania State University. It continues to support the goals outlined by CiteSeer: actively crawling and harvesting academic and scientific documents on the public web, and using a citation index to permit querying by citations and ranking documents by the impact of their citations. Lee Giles, Prasenjit Mitra, Susan Gauch, Min-Yen Kan, Pradeep Teregowda, Juan Pablo Fernández Ramírez, Pucktada Treeratpituk, and Shuyi Zheng are or have been actively involved in its development. Recently, a table search feature was introduced.
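A citation index inverts each paper's reference list so that, for any paper, one can look up the later papers that cite it; the same inverted map also yields a crude measure of citation impact. The following Python sketch illustrates that idea under stated assumptions: the paper IDs and toy corpus are hypothetical, and ranking by raw citation count is a simplification chosen for illustration, not CiteSeerX's actual ranking method.

```python
from collections import defaultdict


def build_citation_index(references):
    """Invert a mapping of citing paper -> cited papers into cited paper -> citing papers."""
    cited_by = defaultdict(set)
    for citing, cited_list in references.items():
        for cited in cited_list:
            cited_by[cited].add(citing)
    return cited_by


def papers_citing(cited_by, paper_id):
    """Query by citation: return the papers that cite paper_id, sorted for stable output."""
    # .get() avoids inserting empty entries into the defaultdict.
    return sorted(cited_by.get(paper_id, set()))


def rank_by_citations(cited_by, candidates):
    """Rank candidates by raw citation count (descending), breaking ties by ID.

    Assumption: raw in-degree stands in for "impact"; a real system may weight
    citations differently.
    """
    return sorted(candidates, key=lambda p: (-len(cited_by.get(p, set())), p))


# Hypothetical toy corpus: each key lists the papers it cites.
references = {
    "A": ["C", "D"],
    "B": ["C"],
    "D": ["C", "E"],
    "E": [],
}

cited_by = build_citation_index(references)
print(papers_citing(cited_by, "C"))                     # ['A', 'B', 'D']
print(rank_by_citations(cited_by, ["A", "E", "D", "C"]))  # ['C', 'D', 'E', 'A']
```

Inverting the reference lists once up front makes each "who cites this paper?" query a single dictionary lookup instead of a scan over every document's bibliography.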
CiteSeerX continues to rank among the world's top repositories and was ranked number 1 in July 2010. It currently has over 1.5 million documents, with nearly 1.5 million unique authors and 30 million citations.
CiteSeerX also shares its software, data, databases, and metadata with other researchers, currently via rsync. Its new modular, open-source architecture and software (available on SourceForge) are built on Apache Solr and other Apache and open-source tools, which allows it to serve as a testbed for new algorithms in document harvesting, ranking, indexing, and information extraction.
See also
- Citation index
- The Collection of Computer Science Bibliographies
- DBLP (Digital Bibliography & Library Project)
- Disciplinary repository
- getCITED
- Google Scholar
- Institute for Scientific Information's Web of Science
- Libra (Academic Search)
- List of academic databases and search engines
- Scirus
- Scopus