Google Search Appliance
The Google Search Appliance is a rack-mounted device providing document indexing functionality that can be integrated into an intranet, document management system or web site using a Google search-like interface for end-user retrieval of results. The operating system is based on CentOS. The software is produced by Google and the hardware is manufactured by Dell Computers and is based on Dell's PowerEdge R710.
The device is supplied in three models: a model capable of indexing up to 300,000 documents (Google Mini), a 2U model (GB-7007) capable of indexing up to 10 million documents, and a 5U (2U plus 3U storage) model (GB-9009) capable of indexing up to 30 million documents. Later versions of the software allow multiple appliances to be connected so that "millions or billions" of documents can be searched. Sales operate on a licensing scheme which starts as a two-year contract for maintenance, support and software updates.
Features
The Google Search Appliance contains Google search technologies and a means of configuring and customizing the appliance. The appliance also comes with a T-shirt. Other features include:
- it supports Google Analytics and Google Sitemaps functionality
- its search capabilities include searching web content, other file types (e.g. HTML, PDF, office documents), databases (Oracle, MySQL, Microsoft SQL Server, IBM DB2, Sybase) and content management systems (EMC Documentum, FileNet, Open Text LiveLink, Microsoft SharePoint)
- indexing (crawling) of searchable content can be configured by specifying URLs to crawl. Search patterns can also be included to limit the information that is being searched, and searching can be customized by using the OneBox API
- the result set is displayed with a Google-like appearance. The default behavior can be customized by using XSL Transformations (XSLT)
- keyword matches return a specific result when specific keywords are used. Example: associate “cell phone” with http://SampleCellProvider.com so that whenever someone searches for “cell phone” your link appears at the top of the results, no matter where it would normally appear in the result set
- synonyms give alternate terms for a search. E.g. when a user types “cell phone”, the search adds suggestions such as “mobile phone” to the result set
- cached results: each result item includes a "cached" link. By clicking on it, the user can view an HTML version of the page or document, which means that the actual document does not need to be opened
- the result set also contains the number of results returned, the duration of the search, and each document's title, URL and date modified
- search terms are highlighted to show search hits, which lets you see words in context without having to open documents
- similar results are grouped to hide duplicates
- document types are shown
- result sets can be sorted by date or relevance
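The keyword-match and synonym behaviors in the list above can be pictured with a small sketch. It is only an illustration of the described behavior, not the appliance's implementation or API: the names `KEYMATCHES`, `SYNONYMS` and `search` are hypothetical, and the example.com URLs are placeholders.

```python
# Hypothetical illustration only; none of these names come from the appliance.
KEYMATCHES = {"cell phone": "http://SampleCellProvider.com"}
SYNONYMS = {"cell phone": ["mobile phone"]}


def search(query, organic_results):
    """Return (results, suggestions) for a query.

    A keyword match is placed first regardless of where (or whether) it
    appears in the organic results; synonyms come back as suggestions.
    """
    key = query.strip().lower()
    results = list(organic_results)
    promoted = KEYMATCHES.get(key)
    if promoted is not None:
        # Drop any organic copy so the promoted link appears exactly once.
        results = [promoted] + [url for url in results if url != promoted]
    suggestions = SYNONYMS.get(key, [])
    return results, suggestions


organic = [
    "http://example.com/a",
    "http://SampleCellProvider.com",
    "http://example.com/b",
]
results, suggestions = search("Cell Phone", organic)
print(results[0])    # http://SampleCellProvider.com
print(suggestions)   # ['mobile phone']
```

Queries with no configured keyword match or synonym pass through unchanged, which mirrors the idea that these features only adjust the default result set.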
Google Mini
The Google Mini is a smaller and lower-cost solution for small and medium-sized businesses to set up a search engine that allows them to index and search up to 300,000 documents.
Google Search Appliance
The Google Search Appliance can be purchased in two separate versions based on the number of documents being indexed. Model GB-7007, a 2U appliance, can index up to 10,000,000 documents. The GB-9009 5U appliance can index up to 30,000,000 documents.
Software version 6.0 was released in June 2009. This software runs on some hardware versions of the GB-1001 model (all units with an "S5" prefix in their "Appliance ID"), and all GB-7007 and GB-9009 models. New features available in this software include:
- Customized and enhanced relevancy tuning to bias certain nodes’ and collections’ results.
- Administration APIs for .net and Java programmers to automate tasks.
- Early binding to increase serving performance
- Customization of SAML authentication and authorization
- User-added results in search results.
- Search-as-you-Type functionality.
- Query translation to 40 different languages.
- Replication of search results
- Clustering multiple GSAs by using a new technology called (GSA)n makes it possible to index up to 1 billion documents
Older appliances
Google used to sell a 2U appliance (GB-1001) capable of indexing up to 5,000,000 documents, a half-rack cluster (GB-5005) of five 2U nodes capable of indexing up to 10,000,000 documents, and a full-rack cluster (GB-8008) of eight, and later twelve, nodes capable of indexing up to 30,000,000 documents. Some models were based on Dell PowerEdge 2950 2U rackmount servers.
Scalability
- Multiple appliances can be linked together to scale to billions of documents
- Physical hardware can be distributed across multiple locations
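As a rough sizing aid, the per-model limits quoted in this article can be turned into a back-of-the-envelope estimate. The sketch assumes that capacity adds linearly across linked appliances and ignores any replication or failover overhead; the article does not state either point, and `MODEL_CAPACITY` and `appliances_needed` are hypothetical names.

```python
import math

# Per-appliance document limits quoted in the article.
MODEL_CAPACITY = {"GB-7007": 10_000_000, "GB-9009": 30_000_000}


def appliances_needed(total_documents, model):
    """Estimate how many appliances of one model cover a corpus.

    Assumes linear scaling across linked appliances, which is a
    simplifying assumption rather than a documented guarantee.
    """
    capacity = MODEL_CAPACITY[model]
    return math.ceil(total_documents / capacity)


print(appliances_needed(1_000_000_000, "GB-9009"))  # 34
print(appliances_needed(25_000_000, "GB-7007"))     # 3
```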
Administration
Minimal support infrastructure and administrative staff are needed; as quoted on Google's web site, the appliance “…doesn’t need a tech support baby-sitter. You simply plug it in, configure it, and let it run…”. The device does come with a web-based admin console that can be used to make configuration changes where needed. Additional customisation is possible through a Representational State Transfer (REST) based admin API that allows for automation of tasks. There are also existing admin modules that can be used for customization.
Product availability
The Google Search Appliance is available in the United States, Canada, Europe, Japan, parts of Asia, the Middle East, North Africa and South America. Customers who want to use the Google Search Appliance in another region can deploy it at a location or data center in the US, Canada, or Europe.