Federated search
Federated search is an information retrieval technology that allows the simultaneous search of multiple searchable resources. A user makes a single query request which is distributed to the search engines participating in the federation. The federated search then aggregates the results that are received from the search engines for presentation to the user.
Purpose
Federated search came about to meet the need of searching multiple disparate content sources with one query. It lets a user search multiple databases at once in real time; the system arranges the results from the various databases into a useful form and presents them to the user.
Process
As described by Peter Jacso (2004), federated searching consists of (1) transforming a query and broadcasting it to a group of disparate databases or other web resources, with the appropriate syntax, (2) merging the results collected from the databases, (3) presenting them in a succinct and unified format with minimal duplication, and (4) providing a means, performed either automatically or by the portal user, to sort the merged result set.
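The four steps above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the in-memory SOURCES dictionary stands in for remote databases, the record fields (title, url, score) are hypothetical, and duplicates are detected simply by matching URLs. A real portal would call each source's own search interface and use its own ranking.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "sources"; a real portal would query remote systems.
SOURCES = {
    "catalog": [
        {"title": "Deep Web Search", "url": "http://example.org/a", "score": 0.9},
        {"title": "Library Portals", "url": "http://example.org/b", "score": 0.4},
    ],
    "journals": [
        {"title": "Deep Web Search", "url": "http://example.org/a", "score": 0.7},
        {"title": "Metasearch Basics", "url": "http://example.org/c", "score": 0.6},
    ],
}


def search_source(name, query):
    """Return the records of one source whose title contains the query."""
    q = query.lower()
    return [dict(r, source=name) for r in SOURCES[name] if q in r["title"].lower()]


def federated_search(query):
    # (1) Broadcast the query to every source in parallel.
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(lambda n: search_source(n, query), SOURCES))
    # (2) Merge the lists and (3) drop duplicates, keeping the
    # best-scored copy of each URL.
    best = {}
    for results in result_lists:
        for r in results:
            if r["url"] not in best or r["score"] > best[r["url"]]["score"]:
                best[r["url"]] = r
    # (4) Sort the merged result set, highest score first.
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


hits = federated_search("search")
```

Here the record found in both sources appears once in the merged list, attributed to the source that scored it higher.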
Federated search portals, either commercial or open access, generally search public access bibliographic databases, public access Web-based library catalogues (OPACs), Web-based search engines like Google, and/or open-access, government-operated or corporate data collections. These individual information sources send back to the portal's interface a list of results from the search query. The user can review this hit list. Some portals will merely screen scrape the actual database results and not directly allow a user to enter the information source's application. More sophisticated ones will de-dupe the results list by merging and removing duplicates. There are additional features available in many portals, but the basic idea is the same: to improve the accuracy and relevance of individual searches as well as reduce the amount of time required to search for resources.
This process gives federated search some key advantages over crawler-based search engines. Federated search need not place any requirements or burdens on owners of the individual information sources, other than handling increased traffic. Federated searches are inherently as current as the individual information sources, as they are searched in real time.
Implementation
One application of federated searching is the metasearch engine; however, this is not a complete solution, as many documents are not currently indexed. These documents are on what is known as the deep Web, or invisible Web. Many more information sources are not yet stored in electronic form. Google Scholar is one example of many projects trying to address this.
When the search vocabulary or data model of the search system is different from the data model of one or more of the foreign target systems, the query must be translated into the vocabulary or data model of each foreign target system. This can be done using simple data-element translation or may require semantic translation.
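As a rough illustration of simple data-element translation, the Python sketch below renames the fields of a portal query to the field names each target system expects. The target names (catalog_a, catalog_b) and their field vocabularies are hypothetical, and semantic translation, which would also have to convert the values themselves, is not attempted.

```python
# Hypothetical mapping from the portal's field names to each target's names.
FIELD_MAPS = {
    "catalog_a": {"author": "au", "title": "ti", "year": "py"},
    "catalog_b": {"author": "creator", "title": "name", "year": "date"},
}


def translate_query(query, target):
    """Rename each field of a portal query into the target's vocabulary."""
    mapping = FIELD_MAPS[target]
    translated = {}
    for field, value in query.items():
        if field not in mapping:
            # Fail loudly rather than silently dropping a search constraint.
            raise KeyError(f"{target} has no equivalent for field {field!r}")
        translated[mapping[field]] = value
    return translated


portal_query = {"author": "Jacso", "year": 2004}
query_a = translate_query(portal_query, "catalog_a")
query_b = translate_query(portal_query, "catalog_b")
```

Raising an error for an unmapped field is one possible design choice; a portal could instead skip that target or fall back to a keyword search.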
A challenge faced in the implementation of federated search engines is scalability, in other words, the performance of the site as the number of information sources comprising the federated search engine increases. One federated search engine that has begun to address this issue is WorldWideScience, hosted by the U.S. Department of Energy's Office of Scientific and Technical Information. WorldWideScience is composed of more than 40 information sources, several of which are federated search portals themselves. One such portal is Science.gov, which itself federates more than 30 information sources representing most of the R&D output of the U.S. Federal government. Science.gov returns its highest ranked results to WorldWideScience, which then merges and ranks these results with the results returned by the other information sources that comprise WorldWideScience. This approach of cascaded federated search enables a large number of information sources to be searched via a single query.
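The cascading idea can be sketched in Python by giving a federation the same search interface as a plain source, so that one federation can be a member of another. This is only an illustration under stated assumptions: the Source and Federation classes, the (score, title) records, and the substring matching are hypothetical and do not describe how WorldWideScience or Science.gov are implemented.

```python
import heapq


class Source:
    """A leaf information source holding (score, title) records."""

    def __init__(self, records):
        self.records = records

    def search(self, query, limit):
        hits = [r for r in self.records if query.lower() in r[1].lower()]
        return heapq.nlargest(limit, hits)


class Federation:
    """A portal whose members are sources or other federations."""

    def __init__(self, members):
        self.members = members

    def search(self, query, limit):
        merged = []
        for member in self.members:
            # Each member returns only its highest-ranked results.
            merged.extend(member.search(query, limit))
        # Merge and re-rank the partial lists, then cut to the limit.
        return heapq.nlargest(limit, merged)


inner = Federation([
    Source([(0.9, "energy a"), (0.2, "energy b")]),
    Source([(0.5, "energy c")]),
])
outer = Federation([inner, Source([(0.7, "energy d")])])
top = outer.search("energy", 2)
```

Because each level forwards only its top results, the outer portal merges a small list per member regardless of how many sources sit beneath that member.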
Another application, Sesam, running in both Norway and Sweden, has been built on top of an open-source platform specialised for federated search solutions. Sesat, an acronym for Sesam Search Application Toolkit, is a platform that provides much of the framework and functionality required for handling parallel and pipelined searches and displaying them elegantly in a user interface, allowing engineers to focus on index and database configuration tuning.
Challenges
When federated search is performed against secure data sources, the user's credentials must be passed on to each underlying search engine, so that appropriate security is maintained. If the user has different login credentials for different systems, there must be a means to map their login ID to each search engine's security domain.
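One simple way to picture the mapping is a lookup table from a portal login to the login ID used in each engine's security domain, as in the Python sketch below. The user names, engine names, and the CREDENTIAL_MAP table are hypothetical; a production system would more likely rely on a directory or single sign-on service and would never store passwords in plain data structures.

```python
# Hypothetical table: portal login -> login ID in each engine's domain.
CREDENTIAL_MAP = {
    "alice": {"hr_db": "asmith", "wiki": "alice.s"},
}


def credentials_for(portal_user, engines):
    """Return the per-engine login IDs known for this portal user.

    Engines without a mapping are left out, so the user is never
    searched under an identity that is not their own.
    """
    known = CREDENTIAL_MAP.get(portal_user, {})
    return {engine: known[engine] for engine in engines if engine in known}


logins = credentials_for("alice", ["hr_db", "crm"])
```

In this sketch the portal would query hr_db as asmith and skip crm, for which no mapping exists.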
Another challenge is mapping results list navigators into a common form. Suppose three real-estate sites are searched, and each provides a list of hyperlinked city names to click on to see matches only in that city. Ideally these facets would be combined into one set, but that presents additional technical challenges. The system also needs to understand "next page" links if it is going to allow the user to page through the combined results.
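A minimal Python sketch of combining such facets follows. It assumes, beyond what the text states, that each site reports a hit count per city and that city names differ only in letter case and surrounding whitespace; real sites may spell or group locations in ways that need far more careful matching.

```python
from collections import Counter


def merge_facets(facet_lists):
    """Combine per-site {city: count} facets into one set of facets."""
    total = Counter()
    for facets in facet_lists:
        for city, count in facets.items():
            # Normalise spelling so "oslo " and "Oslo" become one facet.
            total[city.strip().title()] += count
    return dict(total)


site_facets = [
    {"Oslo": 3, "bergen": 1},
    {"oslo ": 2},
    {"Bergen": 4},
]
combined = merge_facets(site_facets)
```

Clicking a combined facet would then have to be translated back into each site's own city link, which is part of the additional technical challenge noted above.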
Related links
- Linoski, Alexis; Walczyk, Tine. Federated Search 101. Library Journal, Net Connect, Summer 2008, Vol. 133.
- Cox, Christopher N. Federated Search: Solution or Setback for Online Library Services. Binghamton, NY: Haworth Information Press, 2007.
- Lederman, S. Federated Search Primer. AltSearchEngines, January 2009.
- Shokouhi, Milad; Si, Luo. Federated Search. Foundations and Trends® in Information Retrieval, Vol. 5, No. 1, pp. 1-102. http://dx.doi.org/10.1561/1500000010
See also
- Federated content
- Metasearch engine
- Funnelback
- Search aggregator
- Deep Web