Web search engine
A web search engine is designed to search for information on the World Wide Web and FTP servers. The search results are generally presented in a list often referred to as SERPs, or "search engine results pages". The information may consist of web pages, images, and other types of files. Some search engines also mine data available in databases or open directories. Unlike web directories, which are maintained only by human editors, search engines also maintain real-time information by running an algorithm on a web crawler.
During the early development of the web, there was a list of webservers edited by Tim Berners-Lee and hosted on the CERN webserver. One historical snapshot from 1992 remains. As more webservers went online, the central list could not keep up. On the NCSA site, new servers were announced under the title "What's New!"
The very first tool used for searching on the Internet was Archie. The name stands for "archive" without the "v". It was created in 1990 by Alan Emtage, Bill Heelan and J. Peter Deutsch, computer science students at McGill University in Montreal. The program downloaded the directory listings of all the files located on public anonymous FTP (File Transfer Protocol) sites, creating a searchable database of file names; however, Archie did not index the contents of these sites, since the amount of data was so limited that it could be readily searched manually.
The rise of Gopher (created in 1991 by Mark McCahill at the University of Minnesota) led to two new search programs, Veronica and Jughead. Like Archie, they searched the file names and titles stored in Gopher index systems. Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) provided a keyword search of most Gopher menu titles in the entire Gopher listings. Jughead (Jonzy's Universal Gopher Hierarchy Excavation And Display) was a tool for obtaining menu information from specific Gopher servers. While the name of the search engine "Archie" was not a reference to the Archie comic book series, "Veronica" and "Jughead" are characters in the series, thus referencing their predecessor.
In the summer of 1993, no search engine existed yet for the web, though numerous specialized catalogues were maintained by hand. Oscar Nierstrasz at the University of Geneva wrote a series of Perl scripts that would periodically mirror these pages and rewrite them into a standard format, which formed the basis for W3Catalog, the web's first primitive search engine, released on September 2, 1993.
In June 1993, Matthew Gray, then at MIT, produced what was probably the first web robot, the Perl-based World Wide Web Wanderer, and used it to generate an index called 'Wandex'. The purpose of the Wanderer was to measure the size of the World Wide Web, which it did until late 1995. The web's second search engine, Aliweb, appeared in November 1993. Aliweb did not use a web robot, but instead depended on being notified by website administrators of the existence at each site of an index file in a particular format.
JumpStation (released in December 1993) used a web robot to find web pages and to build its index, and used a web form as the interface to its query program. It was thus the first WWW resource-discovery tool to combine the three essential features of a web search engine (crawling, indexing, and searching) as described below. Because of the limited resources available on the platform on which it ran, its indexing and hence searching were limited to the titles and headings found in the web pages the crawler encountered.
One of the first "full text" crawler-based search engines was WebCrawler, which came out in 1994. Unlike its predecessors, it let users search for any word in any webpage, which has become the standard for all major search engines since. It was also the first one to be widely known by the public. Also in 1994, Lycos (which started at Carnegie Mellon University) was launched and became a major commercial endeavor.
Soon after, many search engines appeared and vied for popularity. These included Magellan, Excite, Infoseek, Inktomi, Northern Light, and AltaVista. Yahoo! was among the most popular ways for people to find web pages of interest, but its search function operated on its web directory rather than on full-text copies of web pages. Information seekers could also browse the directory instead of doing a keyword-based search.
In 1996, Netscape was looking to give a single search engine an exclusive deal to be the featured search engine on Netscape's web browser. There was so much interest that Netscape instead struck a deal with five of the major search engines: for $5 million per year, each search engine would be in rotation on the Netscape search engine page. The five engines were Yahoo!, Magellan, Lycos, Infoseek, and Excite.
Search engines were also known as some of the brightest stars in the Internet investing frenzy that occurred in the late 1990s. Several companies entered the market spectacularly, receiving record gains during their initial public offerings. Some, such as Northern Light, have taken down their public search engine and are marketing enterprise-only editions. Many search engine companies were caught up in the dot-com bubble, a speculation-driven market boom that peaked in 1999 and ended in 2001.
Around 2000, Google's search engine rose to prominence. The company achieved better results for many searches with an innovation called PageRank. This iterative algorithm ranks web pages based on the number and PageRank of other web sites and pages that link there, on the premise that good or desirable pages are linked to more than others. Google also maintained a minimalist interface to its search engine. In contrast, many of its competitors embedded a search engine in a web portal.
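The iterative ranking described above can be sketched in a few lines of Python. This is a minimal illustration, not Google's implementation: the damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the toy link graph are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages from a link graph.

    `links` maps each page to the list of pages it links to.
    `damping` and `iterations` are illustrative defaults (assumptions).
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on, split among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # rank evenly over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three other pages.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph the page with the most incoming links ends up with the highest score, and a page that nothing links to ends up with the lowest, which matches the premise that desirable pages are linked to more than others.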
By 2000, Yahoo! was providing search services based on Inktomi's search engine. Yahoo! acquired Inktomi in 2002, and Overture (which owned AlltheWeb and AltaVista) in 2003. Yahoo! switched to Google's search engine and used it until 2004, when it launched its own search engine based on the combined technologies of its acquisitions.
Microsoft first launched MSN Search in the fall of 1998 using search results from Inktomi. In early 1999 the site began to display listings from Looksmart blended with results from Inktomi, except for a short time in 1999 when results from AltaVista were used instead. In 2004, Microsoft began a transition to its own search technology, powered by its own web crawler (called msnbot).
Microsoft's rebranded search engine, Bing, was launched on June 1, 2009. On July 29, 2009, Yahoo! and Microsoft finalized a deal in which Yahoo! Search would be powered by Microsoft Bing technology.
Web search engines work by storing information about many web pages, which they retrieve from the HTML itself. These pages are retrieved by a Web crawler (sometimes also known as a spider), an automated Web browser which follows every link on the site. Exclusions can be made by the use of robots.txt. The contents of each page are then analyzed to determine how it should be indexed (for example, words are extracted from the titles, headings, or special fields called meta tags). Data about web pages are stored in an index database for use in later queries. A query can be a single word. The purpose of an index is to allow information to be found as quickly as possible.
Some search engines, such as Google, store all or part of the source page (referred to as a cache) as well as information about the web pages, whereas others, such as AltaVista, store every word of every page they find. The cached page always holds the actual search text, since it is the one that was actually indexed, so it can be very useful when the content of the current page has been updated and the search terms are no longer in it. This problem might be considered a mild form of linkrot, and Google's handling of it increases usability by satisfying the user's expectation that the search terms will be on the returned webpage, in keeping with the principle of least astonishment. Increased search relevance makes these cached pages very useful, even beyond the fact that they may contain data that may no longer be available elsewhere.
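The indexing step described above (extracting words from titles, headings and meta tags, then storing them for fast lookup) can be sketched as follows. This is a simplified illustration under stated assumptions, not how any particular engine works: the class name `FieldExtractor`, the choice of indexed tags, the tokenizer, and the example.org pages are all hypothetical, and the pages are supplied as strings rather than fetched by a crawler. The lookup structure used here, a mapping from each word to the pages that contain it, is commonly called an inverted index.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Assumption: only these elements (plus the "keywords" meta tag) are indexed.
INDEXED_TAGS = {"title", "h1", "h2", "h3"}


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldExtractor(HTMLParser):
    """Collects words from titles, headings and the keywords meta tag."""

    def __init__(self):
        super().__init__()
        self._depth = 0   # how many indexed elements are currently open
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in INDEXED_TAGS:
            self._depth += 1
        elif tag == "meta":
            attr = dict(attrs)
            if (attr.get("name") or "").lower() == "keywords":
                self.words.extend(tokenize(attr.get("content") or ""))

    def handle_endtag(self, tag):
        if tag in INDEXED_TAGS and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        # Text is kept only while inside a title or heading element.
        if self._depth > 0:
            self.words.extend(tokenize(data))


def build_index(pages):
    """Map each extracted word to the set of URLs it was found on."""
    index = defaultdict(set)
    for url, html in pages.items():
        extractor = FieldExtractor()
        extractor.feed(html)
        extractor.close()
        for word in extractor.words:
            index[word].add(url)
    return index


# Hypothetical pages standing in for what a crawler would have retrieved.
pages = {
    "http://example.org/a": (
        "<html><head><title>Gopher archives</title>"
        "<meta name='keywords' content='veronica, jughead'></head>"
        "<body><h1>Searching Gopher</h1><p>Body text is skipped.</p></body></html>"
    ),
    "http://example.org/b": (
        "<html><head><title>FTP archives</title></head>"
        "<body><h2>Archie</h2></body></html>"
    ),
}
index = build_index(pages)
print(sorted(index.get("archives", set())))
```

Looking up a word is then a single dictionary access, which is the sense in which an index allows information to be found quickly. An engine that stores every word of every page would index the body text as well.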
When a user enters a query into a search engine (typically by using key words), the engine examines its index and provides a listing of best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text. The index is built from the information stored with the data and the method by which the information is indexed. Unfortunately, there are currently no known public search engines that allow documents to be searched by date. Most search engines support the use of the boolean operators AND, OR and NOT to further specify the search query. Boolean operators are for literal searches that allow the user to refine and extend the terms of the search. The engine looks for the words or phrases exactly as entered. Some search engines provide an advanced feature called proximity search, which allows users to define the distance between keywords. There is also concept-based searching, which applies statistical analysis to pages containing the words or phrases searched for. Finally, natural language queries allow the user to type a question in the same form one would ask it of a human; Ask.com is an example of such a site.
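As a rough illustration of how the boolean operators above can be evaluated against an index, the sketch below treats each term's index entry as a set of document identifiers and combines the sets with intersection (AND), union (OR) and difference (NOT). The function names, the keyword-argument query form, and the sample documents are assumptions for the example; real engines parse a query syntax and use far more compact data structures.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase word to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def lookup(index, term):
    """Return the documents that contain `term` exactly as entered."""
    return set(index.get(term.lower(), set()))


def boolean_search(index, all_docs, must=(), any_of=(), must_not=()):
    """Combine term lookups with AND, OR and NOT semantics."""
    results = set(all_docs)
    for term in must:                 # AND: every term has to match
        results &= lookup(index, term)
    if any_of:                        # OR: at least one term has to match
        either = set()
        for term in any_of:
            either |= lookup(index, term)
        results &= either
    for term in must_not:             # NOT: drop documents with the term
        results -= lookup(index, term)
    return results


# Hypothetical mini-collection of documents.
docs = {
    1: "archie indexed ftp file names",
    2: "veronica searched gopher menu titles",
    3: "jughead searched specific gopher servers",
    4: "webcrawler indexed full text of web pages",
}
index = build_index(docs)

print(boolean_search(index, docs, must=["searched", "gopher"]))
print(boolean_search(index, docs, must=["searched"], must_not=["veronica"]))
print(boolean_search(index, docs, any_of=["archie", "webcrawler"]))
```

Because each operator is a set operation on precomputed entries, the cost of a query depends on how many documents match the individual terms rather than on the size of the whole collection.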
The usefulness of a search engine depends on the relevance of the result set it gives back. While there may be millions of web pages that include a particular word or phrase, some pages may be more relevant, popular, or authoritative than others. Most search engines employ methods to rank the results to provide the "best" results first. How a search engine decides which pages are the best matches, and what order the results should be shown in, varies widely from one engine to another. The methods also change over time as Internet usage changes and new techniques evolve. There are two main types of search engine that have evolved: one is a system of predefined and hierarchically ordered keywords that humans have programmed extensively. The other is a system that generates an "inverted index" by analyzing texts it locates. This second form relies much more heavily on the computer itself to do the bulk of the work.
Most Web search engines are commercial ventures supported by advertising revenue and, as a result, some employ the practice of allowing advertisers to pay money to have their listings ranked higher in search results. Those search engines which do not accept money for their search engine results make money by running search-related ads alongside the regular search engine results. The search engines make money every time someone clicks on one of these ads.
Google's worldwide market share peaked at 86.3% in April 2010. Yahoo!, Bing and other search engines are more popular in the US than in Europe.
According to Hitwise, market share in the U.S. for October 2011 was Google 65.38%, Bing-powered (Bing and Yahoo!) 28.62%, and the remaining 66 search engines 6%. However, an Experian Hitwise report released in August 2011 gave the "success rate" of searches sampled in July: over 80 percent of Yahoo! and Bing searches resulted in the users visiting a web site, while Google's rate was just under 68 percent.
In the People's Republic of China, Baidu held a 61.6% market share for web search in July 2009.
results), and political processes (e.g., the removal of search results in order to comply with local laws). Google Bombing is one example of an attempt to manipulate search results for political, social or commercial reasons.
History
Timeline (full list)
Year | Engine | Current status
---|---|---
1993 | W3Catalog | Closed
1993 | Aliweb | Closed
1993 | JumpStation | Closed
1994 | WebCrawler | Active, Aggregator
1994 | Go.com | Active, Yahoo Search
1994 | Lycos | Active
1995 | AltaVista | Bought and operated by Yahoo!
1995 | Daum | Active
1995 | Magellan | Closed
1995 | Excite | Active
1995 | SAPO | Active
1995 | Yahoo! | Active, Launched as a directory
1996 | Dogpile | Active, Aggregator
1996 | Inktomi | Acquired by Yahoo!
1996 | HotBot | Active (lycos.com)
1996 | Ask Jeeves | Active (ask.com, Jeeves went away)
1997 | Northern Light | Closed
1997 | Yandex | Active
1998 | Google | Active
1998 | MSN Search | Active as Bing
1999 | AlltheWeb | Closed (URL redirected to Yahoo!)
1999 | GenieKnows | Active, rebranded Yellowee.com
1999 | Naver | Active
1999 | Teoma | Active
1999 | Vivisimo | Closed
2000 | Baidu | Active
2000 | Exalead | Acquired by Dassault Systèmes
2002 | Inktomi | Acquired by Yahoo!
2003 | Info.com | Active
2004 | Yahoo! Search | Active, Launched own web search (see Yahoo! Directory, 1995)
2004 | A9.com | Closed
2004 | Sogou | Active
2005 | AOL Search | Active
2005 | Ask.com | Active
2005 | GoodSearch | Active
2005 | SearchMe | Closed
2006 | wikiseek | Active
2006 | Quaero | Active
2006 | Ask.com | Active
2006 | Live Search | Active as Bing, Launched as rebranded MSN Search
2006 | ChaCha | Active
2006 | Guruji.com | Active
2007 | wikiseek | Closed
2007 | Sproose | Closed
2007 | Wikia Search | Closed
2007 | Blackle.com | Active
2008 | Powerset | Acquired by Microsoft
2008 | Picollator | Closed
2008 | Viewzi | Closed
2008 | Boogami | Active
2008 | LeapFish | Closed
2008 | Forestle | Active
2008 | VADLO | Active
2008 | Duck Duck Go | Active, Aggregator
2009 | Bing | Active, Launched as rebranded Live Search
2009 | Yebol | Active
2009 | Search2.net | Active
2009 | Mugurdy | Closed due to a lack of funding
2009 | Goby | Active
2010 | Yandex | Active, Launched global (English) search
2010 | Cuil | Closed
2010 | Blekko | Active
2010 | Yummly | Active
2010 | Solusee | Active
2011 | Interred | Active
During the early development of the web, there was a list of webservers edited by Tim Berners-Lee and hosted on the CERN webserver. One historical snapshot from 1992 remains. As more webservers went online the central list could not keep up. On the NCSA site new servers were announced under the title "What's New!"
The very first tool used for searching on the Internet was Archie.
The name stands for "archive" without the "v". It was created in 1990 by Alan Emtage, Bill Heelan and J. Peter Deutsch, computer science students at McGill University in Montreal. The program downloaded the directory listings of all the files located on public anonymous FTP (File Transfer Protocol) sites, creating a searchable database of file names; however, Archie did not index the contents of these sites since the amount of data was so limited it could be readily searched manually.
The rise of Gopher (created in 1991 by Mark McCahill at the University of Minnesota) led to two new search programs, Veronica and Jughead. Like Archie, they searched the file names and titles stored in Gopher index systems. Veronica (Very Easy Rodent-Oriented Net-wide Index to Computerized Archives) provided a keyword search of most Gopher menu titles in the entire Gopher listings. Jughead (Jonzy's Universal Gopher Hierarchy Excavation And Display) was a tool for obtaining menu information from specific Gopher servers. While the name of the search engine "Archie" was not a reference to the Archie comic book series, "Veronica" and "Jughead" are characters in the series, thus referencing their predecessor.
In the summer of 1993, no search engine existed yet for the web, though numerous specialized catalogues were maintained by hand. Oscar Nierstrasz at the University of Geneva wrote a series of Perl scripts that would periodically mirror these pages and rewrite them into a standard format which formed the basis for W3Catalog, the web's first primitive search engine, released on September 2, 1993.
In June 1993, Matthew Gray, then at MIT, produced what was probably the first web robot, the Perl-based World Wide Web Wanderer, and used it to generate an index called 'Wandex'. The purpose of the Wanderer was to measure the size of the World Wide Web, which it did until late 1995. The web's second search engine, Aliweb, appeared in November 1993. Aliweb did not use a web robot, but instead depended on being notified by website administrators of the existence at each site of an index file in a particular format.
JumpStation (released in December 1993) used a web robot to find web pages and to build its index, and used a web form as the interface to its query program. It was thus the first WWW resource-discovery tool to combine the three essential features of a web search engine (crawling, indexing, and searching) as described below. Because of the limited resources available on the platform on which it ran, its indexing and hence searching were limited to the titles and headings found in the web pages the crawler encountered.
One of the first "full text" crawler-based search engines was WebCrawler, which came out in 1994. Unlike its predecessors, it let users search for any word in any webpage, which has become the standard for all major search engines since. It was also the first one to be widely known by the public. Also in 1994, Lycos (which started at Carnegie Mellon University) was launched and became a major commercial endeavor.
Soon after, many search engines appeared and vied for popularity. These included Magellan, Excite, Infoseek, Inktomi, Northern Light, and AltaVista. Yahoo! was among the most popular ways for people to find web pages of interest, but its search function operated on its web directory, rather than full-text copies of web pages. Information seekers could also browse the directory instead of doing a keyword-based search.
In 1996, Netscape was looking to give a single search engine an exclusive deal to be the featured search engine on Netscape's web browser. There was so much interest that instead a deal was struck with Netscape by five of the major search engines, where for $5 million per year each search engine would be in rotation on the Netscape search engine page. The five engines were Yahoo!, Magellan, Lycos, Infoseek, and Excite.
Search engines were also known as some of the brightest stars in the Internet investing frenzy that occurred in the late 1990s. Several companies entered the market spectacularly, receiving record gains during their initial public offerings. Some have taken down their public search engine, and are marketing enterprise-only editions, such as Northern Light. Many search engine companies were caught up in the dot-com bubble, a speculation-driven market boom that peaked in 1999 and ended in 2001.
Around 2000, Google's search engine rose to prominence. The company achieved better results for many searches with an innovation called PageRank. This iterative algorithm ranks web pages based on the number and PageRank of other web sites and pages that link there, on the premise that good or desirable pages are linked to more than others. Google also maintained a minimalist interface to its search engine. In contrast, many of its competitors embedded a search engine in a web portal.
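The iterative idea behind PageRank can be illustrated with a minimal sketch. This is not Google's implementation: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from pages without outgoing links, and the four-page toy graph are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Return a rank for every page in a small link graph.

    `links` maps each page to the list of pages it links to.
    The damping factor and iteration count are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on in equal parts to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page without outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "c" is linked to by three pages, "d" by none.
toy_web = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(toy_web)
```

In this toy graph the page with the most incoming links ends up with the highest rank and the page nobody links to ends up with the lowest, which matches the premise that desirable pages are linked to more than others.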
By 2000, Yahoo! was providing search services based on Inktomi's search engine. Yahoo! acquired Inktomi in 2002, and Overture (which owned AlltheWeb and AltaVista) in 2003. Yahoo! switched to Google's search engine and used it until 2004, when it launched its own search engine based on the combined technologies of its acquisitions.
Microsoft first launched MSN Search in the fall of 1998 using search results from Inktomi. In early 1999 the site began to display listings from Looksmart blended with results from Inktomi except for a short time in 1999 when results from AltaVista were used instead. In 2004, Microsoft began a transition to its own search technology, powered by its own web crawler (called msnbot).
Microsoft's rebranded search engine, Bing, was launched on June 1, 2009. On July 29, 2009, Yahoo! and Microsoft finalized a deal in which Yahoo! Search would be powered by Microsoft Bing technology.
How web search engines work
A search engine operates in the following order:
- Web crawling
- Indexing
- Searching
Web search engines work by storing information about many web pages, which they retrieve from the HTML itself. These pages are retrieved by a Web crawler (sometimes also known as a spider), an automated Web browser which follows every link on the site. Exclusions can be made by the use of robots.txt. The contents of each page are then analyzed to determine how it should be indexed (for example, words are extracted from the titles, headings, or special fields called meta tags). Data about web pages are stored in an index database for use in later queries. A query can be a single word. The purpose of an index is to allow information to be found as quickly as possible. Some search engines, such as Google, store all or part of the source page (referred to as a cache) as well as information about the web pages, whereas others, such as AltaVista, store every word of every page they find. This cached page always holds the actual search text since it is the one that was actually indexed, so it can be very useful when the content of the current page has been updated and the search terms are no longer in it. This problem might be considered to be a mild form of linkrot, and Google's handling of it increases usability by satisfying user expectations that the search terms will be on the returned webpage. This satisfies the principle of least astonishment since the user normally expects the search terms to be on the returned pages. Increased search relevance makes these cached pages very useful, even beyond the fact that they may contain data that may no longer be available elsewhere.
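The crawl-then-index step described above can be sketched with Python's standard library. The sketch makes several assumptions: it skips network access and works on an in-memory page, it indexes only words found in `title` and `h1` to `h3` elements, and the names `TitleHeadingExtractor`, `index_page` and the user agent `ExampleBot` are hypothetical. `urllib.robotparser` is used only to apply the robots.txt exclusion rule.

```python
import re
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Assumption: only these elements contribute words to the index.
INDEXED_TAGS = {"title", "h1", "h2", "h3"}


class TitleHeadingExtractor(HTMLParser):
    """Collect lower-cased words that appear inside titles and headings."""

    def __init__(self):
        super().__init__()
        self._depth = 0
        self.words = []

    def handle_starttag(self, tag, attrs):
        if tag in INDEXED_TAGS:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in INDEXED_TAGS and self._depth > 0:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth > 0:
            self.words.extend(re.findall(r"[a-z0-9]+", data.lower()))


def index_page(url, html, robots_lines, index, user_agent="ExampleBot"):
    """Add a page's title and heading words to `index` unless robots.txt excludes it."""
    robots = RobotFileParser()
    robots.parse(robots_lines)
    if not robots.can_fetch(user_agent, url):
        return False  # excluded by robots.txt, so nothing is stored
    extractor = TitleHeadingExtractor()
    extractor.feed(html)
    extractor.close()
    for word in extractor.words:
        index.setdefault(word, set()).add(url)
    return True


# Hypothetical robots.txt and page content for illustration.
robots_lines = ["User-agent: *", "Disallow: /private/"]
page = "<html><head><title>Search Engine History</title></head><body><h1>Archie</h1><p>body text</p></body></html>"

index = {}
public_ok = index_page("http://example.com/history.html", page, robots_lines, index)
private_ok = index_page("http://example.com/private/notes.html", page, robots_lines, index)
```

After both calls, only the public URL appears in `index`, and words that occur only in the body paragraph are absent because this sketch deliberately ignores body text.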
When a user enters a query into a search engine (typically by using keywords), the engine examines its index and provides a listing of best-matching web pages according to its criteria, usually with a short summary containing the document's title and sometimes parts of the text. The index is built from the information stored with the data and the method by which the information is indexed. Unfortunately, there are currently no known public search engines that allow documents to be searched by date. Most search engines support the use of the boolean operators AND, OR and NOT to further specify the search query. Boolean operators are for literal searches that allow the user to refine and extend the terms of the search. The engine looks for the words or phrases exactly as entered. Some search engines provide an advanced feature called proximity search which allows users to define the distance between keywords. There is also concept-based searching, which applies statistical analysis to pages containing the words or phrases searched for. Natural language queries, in turn, allow the user to type a question in the same form one would ask it of a human; Ask.com is an example of such a site.
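Boolean operators and proximity search map naturally onto set operations over an index that records word positions. The following is a minimal sketch under stated assumptions: the helper names `build_positional_index`, `docs_with` and `near` are hypothetical, the three documents are made up, and distance is measured in whole words.

```python
import re
from collections import defaultdict


def build_positional_index(docs):
    """Map each word to {doc_id: [positions]} for a dict of doc_id -> text."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
            index[word][doc_id].append(position)
    return index


def docs_with(index, word):
    """Return the set of documents that contain `word`."""
    return set(index.get(word.lower(), {}))


def near(index, first, second, max_distance):
    """Proximity search: documents where the two words occur within max_distance words."""
    hits = set()
    for doc_id in docs_with(index, first) & docs_with(index, second):
        first_positions = index[first.lower()][doc_id]
        second_positions = index[second.lower()][doc_id]
        if any(abs(i - j) <= max_distance
               for i in first_positions for j in second_positions):
            hits.add(doc_id)
    return hits


# Hypothetical documents for illustration.
docs = {
    1: "web search engine history",
    2: "search the web with a crawler engine",
    3: "gopher menu search",
}
index = build_positional_index(docs)

web_and_engine = docs_with(index, "web") & docs_with(index, "engine")   # AND
web_or_gopher = docs_with(index, "web") | docs_with(index, "gopher")    # OR
search_not_web = docs_with(index, "search") - docs_with(index, "web")   # NOT
adjacent = near(index, "search", "engine", 1)
```

Here AND, OR and NOT become set intersection, union and difference, while `near` only accepts documents in which the two keywords are close enough together.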
The usefulness of a search engine depends on the relevance of the result set it gives back. While there may be millions of web pages that include a particular word or phrase, some pages may be more relevant, popular, or authoritative than others. Most search engines employ methods to rank the results to provide the "best" results first. How a search engine decides which pages are the best matches, and what order the results should be shown in, varies widely from one engine to another. The methods also change over time as Internet usage changes and new techniques evolve. There are two main types of search engine that have evolved: one is a system of predefined and hierarchically ordered keywords that humans have programmed extensively. The other is a system that generates an "inverted index" by analyzing texts it locates. This second form relies much more heavily on the computer itself to do the bulk of the work.
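A toy version of the second type, an inverted index generated from text and used to order results, might look as follows. The scoring rule (summing how often each query word occurs in a document) is an assumption chosen for brevity and is not how any particular engine ranks pages; the function names and sample documents are likewise hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Split text into lower-cased alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each word to a Counter of {doc_id: occurrences}."""
    index = defaultdict(Counter)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word][doc_id] += 1
    return index


def search(index, query):
    """Return doc ids ordered by a simple score: total occurrences of the query words."""
    scores = Counter()
    for word in tokenize(query):
        for doc_id, count in index.get(word, {}).items():
            scores[doc_id] += count
    # Highest score first; ties are broken by doc id for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical documents for illustration.
docs = {
    "a": "archie indexed ftp file names",
    "b": "veronica indexed gopher menu titles",
    "c": "gopher servers and gopher menus",
}
index = build_inverted_index(docs)
gopher_results = search(index, "gopher")
```

The index is built entirely by the program from the texts it is given, and the ranking step is where engines differ most; replacing the scoring rule changes the order of results without touching the index.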
Most Web search engines are commercial ventures supported by advertising revenue and, as a result, some employ the practice of allowing advertisers to pay money to have their listings ranked higher in search results. Those search engines which do not accept money for their search engine results make money by running search related ads alongside the regular search engine results. The search engines make money every time someone clicks on one of these ads.
Market share and wars
Search engine | Market share in December 2010 |
---|---|
Google | |
Yahoo! | |
Baidu | |
Bing | |
Ask | |
AOL | |
Google's worldwide market share peaked at 86.3% in April 2010. Yahoo!, Bing and other search engines are more popular in the US than in Europe.
According to Hitwise, market share in the U.S. for October 2011 was Google 65.38%, Bing-powered (Bing and Yahoo!) 28.62%, and the remaining 66 search engines 6%.
An Experian Hitwise report released in August 2011 gave the "success rate" of searches sampled in July. Over 80 percent of Yahoo! and Bing searches resulted in the users visiting a web site, while Google's rate was just under 68 percent.
In the People's Republic of China, Baidu held a 61.6% market share for web search in July 2009.
Search engine bias
Although search engines are programmed to rank websites based on their popularity and relevancy, empirical studies indicate various political, economic, and social biases in the information they provide. These biases could be a direct result of economic and commercial processes (e.g., companies that advertise with a search engine can also become more popular in its organic search results), and political processes (e.g., the removal of search results in order to comply with local laws). Google Bombing is one example of an attempt to manipulate search results for political, social or commercial reasons.
Further reading
- For a more detailed history of early search engines, see Search Engine Birthdays (from Search Engine Watch), Chris Sherman, September 2003.
- Bing Liu (2007), Web Data Mining: Exploring Hyperlinks, Contents and Usage Data. Springer, ISBN 3540378812
- Bar-Ilan, J. (2004). The use of Web search engines in information science research. ARIST, 38, 231-288. ISBN 978-0-910965-76-7