Xapian
Encyclopedia
Xapian is an open source
probabilistic information retrieval
library, released under the GNU General Public License
(GPL). It is a full text search engine
library for programmers.
It is written in C++
, with bindings to allow use from Perl
, Python
, PHP
, Java
, Tcl
, C#, Ruby,
and Lua. Xapian is highly portable and runs on Linux
, Mac OS X
, FreeBSD
, NetBSD
, OpenBSD
, Solaris
, HP-UX
, Tru64, IRIX
, Microsoft Windows
, GNU Hurd
, and OS/2
.
Xapian is designed to be a highly adaptable toolkit to allow developers to easily add advanced indexing and search facilities to their own applications.
A growing number of organisations and projects are known to be using Xapian including Debian
, Gmane
, Die Zeit
, Delicious, MoinMoin
, and One Laptop per Child.
Open source
The term open source describes practices in production and development that promote access to the end product's source materials. Some consider open source a philosophy, others consider it a pragmatic methodology...
probabilistic information retrieval
Information retrieval
Information retrieval is the area of study concerned with searching for documents, for information within documents, and for metadata about documents, as well as that of searching structured storage, relational databases, and the World Wide Web...
library, released under the GNU General Public License
GNU General Public License
The GNU General Public License is the most widely used free software license, originally written by Richard Stallman for the GNU Project....
(GPL). It is a full text search engine
Search engine
A search engine is an information retrieval system designed to help find information stored on a computer system. The search results are usually presented in a list and are commonly called hits. Search engines help to minimize the time required to find information and the amount of information...
library for programmers.
It is written in C++
C++
C++ is a statically typed, free-form, multi-paradigm, compiled, general-purpose programming language. It is regarded as an intermediate-level language, as it comprises a combination of both high-level and low-level language features. It was developed by Bjarne Stroustrup starting in 1979 at Bell...
, with bindings to allow use from Perl
Perl
Perl is a high-level, general-purpose, interpreted, dynamic programming language. Perl was originally developed by Larry Wall in 1987 as a general-purpose Unix scripting language to make report processing easier. Since then, it has undergone many changes and revisions and become widely popular...
, Python
Python (programming language)
Python is a general-purpose, high-level programming language whose design philosophy emphasizes code readability. Python claims to "[combine] remarkable power with very clear syntax", and its standard library is large and comprehensive...
, PHP
PHP
PHP is a general-purpose server-side scripting language originally designed for web development to produce dynamic web pages. For this purpose, PHP code is embedded into the HTML source document and interpreted by a web server with a PHP processor module, which generates the web page document...
, Java
Java (Sun)
Java refers to several computer software products and specifications from Sun Microsystems, a subsidiary of Oracle Corporation, that together provide a system for developing application software and deploying it in a cross-platform environment...
, Tcl
Tcl
Tcl is a scripting language created by John Ousterhout. Originally "born out of frustration", according to the author, with programmers devising their own languages intended to be embedded into applications, Tcl gained acceptance on its own...
, C#, Ruby,
and Lua. Xapian is highly portable and runs on Linux
Linux
Linux is a Unix-like computer operating system assembled under the model of free and open source software development and distribution. The defining component of any Linux system is the Linux kernel, an operating system kernel first released October 5, 1991 by Linus Torvalds...
, Mac OS X
Mac OS X
Mac OS X is a series of Unix-based operating systems and graphical user interfaces developed, marketed, and sold by Apple Inc. Since 2002, has been included with all new Macintosh computer systems...
, FreeBSD
FreeBSD
FreeBSD is a free Unix-like operating system descended from AT&T UNIX via BSD UNIX. Although for legal reasons FreeBSD cannot be called “UNIX”, as the direct descendant of BSD UNIX , FreeBSD’s internals and system APIs are UNIX-compliant...
, NetBSD
NetBSD
NetBSD is a freely available open source version of the Berkeley Software Distribution Unix operating system. It was the second open source BSD descendant to be formally released, after 386BSD, and continues to be actively developed. The NetBSD project is primarily focused on high quality design,...
, OpenBSD
OpenBSD
OpenBSD is a Unix-like computer operating system descended from Berkeley Software Distribution , a Unix derivative developed at the University of California, Berkeley. It was forked from NetBSD by project leader Theo de Raadt in late 1995...
, Solaris
Solaris Operating System
Solaris is a Unix operating system originally developed by Sun Microsystems. It superseded their earlier SunOS in 1993. Oracle Solaris, as it is now known, has been owned by Oracle Corporation since Oracle's acquisition of Sun in January 2010....
, HP-UX
HP-UX
HP-UX is Hewlett-Packard's proprietary implementation of the Unix operating system, based on UNIX System V and first released in 1984...
, Tru64, IRIX
IRIX
IRIX is a computer operating system developed by Silicon Graphics, Inc. to run natively on their 32- and 64-bit MIPS architecture workstations and servers. It was based on UNIX System V with BSD extensions. IRIX was the first operating system to include the XFS file system.The last major version...
, Microsoft Windows
Microsoft Windows
Microsoft Windows is a series of operating systems produced by Microsoft.Microsoft introduced an operating environment named Windows on November 20, 1985 as an add-on to MS-DOS in response to the growing interest in graphical user interfaces . Microsoft Windows came to dominate the world's personal...
, GNU Hurd
GNU Hurd
GNU Hurd is a free software Unix-like replacement for the Unix kernel, released under the GNU General Public License. It has been under development since 1990 by the GNU Project of the Free Software Foundation...
, and OS/2
OS/2
OS/2 is a computer operating system, initially created by Microsoft and IBM, then later developed by IBM exclusively. The name stands for "Operating System/2," because it was introduced as part of the same generation change release as IBM's "Personal System/2 " line of second-generation personal...
.
Xapian is designed to be a highly adaptable toolkit to allow developers to easily add advanced indexing and search facilities to their own applications.
A growing number of organisations and projects are known to be using Xapian including Debian
Debian
Debian is a computer operating system composed of software packages released as free and open source software primarily under the GNU General Public License along with other free software licenses. Debian GNU/Linux, which includes the GNU OS tools and Linux kernel, is a popular and influential...
, Gmane
Gmane
Gmane is an e-mail to news gateway. It allows users to access electronic mailing lists as if they were Usenet newsgroups, and also through a variety of web interfaces. Gmane is an archive; it never expires messages...
, Die Zeit
Die Zeit
Die Zeit is a German nationwide weekly newspaper that is highly respected for its quality journalism.With a circulation of 488,036 and an estimated readership of slightly above 2 million, it is the most widely read German weekly newspaper...
, Delicious, MoinMoin
MoinMoin
MoinMoin is a wiki engine implemented in Python, initially based on the PikiPiki wiki engine. The MoinMoin code is licensed under the GNU General Public License v2, or any later version .A number of organizations use MoinMoin to run public wikis,...
, and One Laptop per Child.
Features
- Transactions: if database update fails in the middle of a transaction, the database is guaranteed to remain in a consistent state.
- Simultaneous search and update, with new documents being immediately visible.
- Support for large databases: Xapian has been proven to be scalable to hundreds of millions of documents.
- Accurate probabilistic ranking: more relevant documents are listed first.
- Phrase and proximity searching.
- Relevance feedbackRelevance feedbackRelevance feedback is a feature of some information retrieval systems. The idea behind relevance feedback is to take the results that are initially returned from a given query and to use information about whether or not those results are relevant to perform a new query...
, which improves ranking and can expand a query, find related documents, categorise documents etc. - Structured Boolean queries, e.g. "race AND condition NOT horse"
- Wildcard search, e.g. "wiki*"
- Spelling correction
- Synonyms
- Omega, a packaged solution for adding a search engine to a web site or intranet. Omega can easily be extended and adapted to fit changing requirements.
External links
- http://xapian.org is the Xapian project website.
- Xappy is a set of Python bindings for Xapian.
- Flax is an open-source enterprise search engine based on Xapian.
- Recoll is a desktop search tool based on Xapian.
- Full Text Search options in Ruby using Xapian.