Learning to rank
Learning to rank or machine-learned ranking (MLR) is a type of supervised or semi-supervised machine learning problem in which the goal is to automatically construct a ranking model from training data. Training data consists of lists of items with some partial order specified between items in each list. This order is typically induced by giving a numerical or ordinal score or a binary judgment (e.g. "relevant" or "not relevant") for each item. The ranking model's purpose is to rank, i.e. produce a permutation of, items in new, unseen lists in a way that is "similar" to the rankings in the training data.
Learning to rank is a relatively new research area which has emerged in the past decade.
In information retrieval
Ranking is a central part of many information retrieval problems, such as document retrieval, collaborative filtering, sentiment analysis, and computational advertising (online ad placement).
A possible architecture of a machine-learned search engine is shown in the figure to the right.
Training data consists of queries and documents matching them, together with the relevance degree of each match. It may be prepared manually by human assessors (or raters, as Google calls them), who check results for some queries and determine the relevance of each result. It is not feasible to check the relevance of all documents, and so typically a technique called pooling is used: only the top few documents, retrieved by some existing ranking models, are checked. Alternatively, training data may be derived automatically by analyzing clickthrough logs (i.e. search results which received clicks from users), query chains, or search engine features such as Google's SearchWiki.
Training data is used by a learning algorithm to produce a ranking model which computes relevance of documents for actual queries.
Typically, users expect a search query to complete in a short time (such as a few hundred milliseconds for web search), which makes it impossible to evaluate a complex ranking model on each document in the corpus, and so a two-phase scheme is used. First, a small number of potentially relevant documents are identified using simpler retrieval models which permit fast query evaluation, such as the vector space model, the Boolean model, weighted AND, or BM25. This phase is called top-k document retrieval, and many good heuristics have been proposed in the literature to accelerate it, such as using a document's static quality score and tiered indexes. In the second phase, a more accurate but computationally expensive machine-learned model is used to re-rank these documents.
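As an illustration, the two-phase scheme can be sketched as follows. Everything here is a hypothetical stand-in: a simple term-overlap score plays the role of the fast first-phase model, and a slightly costlier function plays the role of the machine-learned re-ranker.

```python
import heapq

def cheap_score(query, doc):
    # Phase 1: a fast term-overlap score, standing in for BM25 or a
    # vector space model (hypothetical placeholder).
    return len(set(query.split()) & set(doc.split()))

def expensive_score(query, doc):
    # Phase 2: stands in for a machine-learned model; here a toy
    # function that also rewards shorter documents.
    return cheap_score(query, doc) + 1.0 / (1 + len(doc.split()))

def two_phase_rank(query, corpus, k=3):
    # Top-k retrieval with the cheap model, then re-ranking of only
    # those k documents with the expensive one.
    top_k = heapq.nlargest(k, corpus, key=lambda d: cheap_score(query, d))
    return sorted(top_k, key=lambda d: expensive_score(query, d), reverse=True)

corpus = [
    "learning to rank tutorial",
    "rank of a matrix",
    "machine learning for search ranking",
    "cooking recipes",
]
print(two_phase_rank("learning to rank", corpus, k=2))
```

The point of the sketch is the cost structure: the expensive scorer is evaluated on only k documents, not on the whole corpus.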
In other areas
Learning to rank algorithms have been applied in areas other than information retrieval:
- In machine translation, for ranking a set of hypothesized translations;
- In computational biology, for ranking candidate 3-D structures in the protein structure prediction problem;
- In proteomics, for the identification of frequent top scoring peptides;
- In recommender systems, for identifying a ranked list of related news articles to recommend to a user after he or she has read a current news article.
Feature vectors
For the convenience of MLR algorithms, query-document pairs are usually represented by numerical vectors, which are called feature vectors. Such an approach is sometimes called the bag of features and is analogous to the bag-of-words model and vector space model used in information retrieval for the representation of documents.
Components of such vectors are called features, factors or ranking signals. They may be divided into three groups (features from document retrieval are shown as examples):
- Query-independent or static features: those which depend only on the document, but not on the query. For example, PageRank or a document's length. Such features can be precomputed in off-line mode during indexing. They may be used to compute a document's static quality score (or static rank), which is often used to speed up search query evaluation.
- Query-dependent or dynamic features: those which depend both on the contents of the document and the query, such as TF-IDF score or other non-machine-learned ranking functions.
- Query features: those which depend only on the query. For example, the number of words in a query.
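As a toy illustration of how such a feature vector might be assembled for a single query-document pair (all features and values here are invented for the example):

```python
def feature_vector(query, doc, static_rank):
    """Assemble a toy feature vector for one query-document pair."""
    q_terms = query.split()
    d_terms = doc.split()
    overlap = sum(1 for t in q_terms if t in d_terms)
    return [
        static_rank,                     # query-independent: PageRank-like score
        len(d_terms),                    # query-independent: document length
        overlap,                         # query-dependent: raw term matches
        overlap / max(len(d_terms), 1),  # query-dependent: normalized overlap
        len(q_terms),                    # query feature: words in the query
    ]

print(feature_vector("learning to rank", "a tutorial on learning to rank", 0.42))
```

A real system would use hundreds of such signals; the vector layout, not the particular features, is what MLR algorithms consume.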
Some examples of features used in the well-known LETOR dataset:
- TF, TF-IDF, BM25, and language modeling scores of a document's zones (title, body, anchor text, URL) for a given query;
- Lengths and IDF sums of a document's zones;
- A document's PageRank, HITS ranks and their variants.
Selecting and designing good features is an important area of machine learning, called feature engineering.
Evaluation measures
There are several measures (metrics) which are commonly used to judge how well an algorithm is doing on training data and to compare the performance of different MLR algorithms. Often a learning-to-rank problem is reformulated as an optimization problem with respect to one of these metrics.
Examples of ranking quality measures:
- Mean average precision (MAP);
- DCG and NDCG;
- Precision@n, NDCG@n, where "@n" denotes that the metrics are evaluated only on the top n documents;
- Mean reciprocal rank (MRR);
- Kendall's tau.
DCG and its normalized variant NDCG are usually preferred in academic research when multiple levels of relevance are used. Other metrics, such as MAP, MRR and precision, are defined only for binary judgments.
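For reference, DCG and NDCG can be computed in a few lines. This sketch uses the common log2 position discount with the raw graded relevance as gain (an exponential gain, 2^rel − 1, is another widely used variant):

```python
import math

def dcg(relevances):
    # DCG = sum over 1-based positions i of rel_i / log2(i + 1).
    return sum(rel / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))

def ndcg(relevances):
    # Normalize by the DCG of the ideal (best possible) ordering.
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

ranking = [3, 2, 3, 0, 1]   # graded relevance of results, in ranked order
print(round(dcg(ranking), 3))
print(round(ndcg(ranking), 3))
```

NDCG is bounded in [0, 1], which makes it comparable across queries with different numbers of relevant documents.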
Recently, several new evaluation metrics have been proposed which claim to model the user's satisfaction with search results better than the DCG metric:
- Expected reciprocal rank (ERR);
- Yandex's pfound.
Both of these metrics are based on the assumption that the user is more likely to stop looking at search results after examining a more relevant document than after a less relevant one.
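A sketch of ERR under this cascade assumption, following the usual definition in which a document of grade g satisfies the user with probability (2^g − 1)/2^g_max:

```python
def err(grades, g_max):
    # R_i: probability the user is satisfied by the document at rank i.
    probs = [(2 ** g - 1) / 2 ** g_max for g in grades]
    score, p_continue = 0.0, 1.0
    for rank, r in enumerate(probs, start=1):
        score += p_continue * r / rank   # user reaches this rank and stops here
        p_continue *= (1 - r)            # user was not satisfied, continues down
    return score

print(round(err([3, 2, 0, 1], g_max=3), 4))
```

The 1/rank factor is the reciprocal-rank utility of stopping at that position; the running product models the user scanning down the list.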
Approaches
Tie-Yan Liu of Microsoft Research Asia, in his paper "Learning to Rank for Information Retrieval" and in talks at several leading conferences, has analyzed existing algorithms for learning-to-rank problems and categorized them into three groups by their input representation and loss function:
Pointwise approach
In this case it is assumed that each query-document pair in the training data has a numerical or ordinal score. The learning-to-rank problem can then be approximated by a regression problem: given a single query-document pair, predict its score.
A number of existing supervised machine learning algorithms can be readily used for this purpose. Ordinal regression and classification algorithms can also be used in the pointwise approach when they are applied to predict the score of a single query-document pair and the score takes a small, finite number of values.
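A minimal pointwise sketch: a linear model trained with squared loss on individual query-document examples via stochastic gradient descent, then used to sort documents by predicted score. The feature vectors and labels below are made up for the example.

```python
def fit_pointwise(X, y, lr=0.05, epochs=500):
    # Linear model trained with squared loss on individual examples --
    # the pointwise reduction of ranking to regression.
    w = [0.0] * len(X[0])
    for _ in range(epochs):
        for x, target in zip(X, y):
            pred = sum(wi * xi for wi, xi in zip(w, x))
            grad = pred - target
            w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
    return w

def rank(docs, w):
    score = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    return sorted(docs, key=score, reverse=True)

# Hypothetical feature vectors: [term overlap, static quality score]
X = [[3, 0.9], [1, 0.2], [2, 0.5], [0, 0.8]]
y = [3, 1, 2, 0]    # graded relevance labels
w = fit_pointwise(X, y)
print(rank(X, w))   # documents sorted by predicted relevance
```

Note that the loss never compares two documents; each example is fit in isolation, which is exactly what distinguishes the pointwise approach from the pairwise and listwise ones.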
Pairwise approach
In this case the learning-to-rank problem is approximated by a classification problem: learning a binary classifier which can tell which document is better in a given pair of documents. The goal is to minimize the average number of inversions in ranking.
Listwise approach
These algorithms try to directly optimize the value of one of the above evaluation measures, averaged over all queries in the training data. This is difficult because most evaluation measures are not continuous functions with respect to the ranking model's parameters, and so continuous approximations or bounds on evaluation measures have to be used.
List of methods
A partial list of published learning-to-rank algorithms is shown below with the year of first publication of each method:
Year | Name | Type | Notes |
---|---|---|---|
2000 | Ranking SVM (RankSVM) | pairwise | A more recent exposition is in, which describes an application to ranking using clickthrough logs. |
2002 | Pranking | pointwise | Ordinal regression. |
2003 | RankBoost | pairwise | |
2005 | RankNet | pairwise | |
2006 | IR-SVM | pairwise | Ranking SVM with query-level normalization in the loss function. |
2006 | LambdaRank | listwise | RankNet in which pairwise loss function is multiplied by the change in the IR metric caused by a swap. |
2007 | AdaRank | listwise | |
2007 | FRank | pairwise | Based on RankNet, uses a different loss function - fidelity loss. |
2007 | GBRank | pairwise | |
2007 | ListNet | listwise | |
2007 | McRank | pointwise | |
2007 | QBRank | pairwise | |
2007 | RankCosine | listwise | |
2007 | RankGP | listwise | |
2007 | RankRLS | pairwise | Regularized least-squares based ranking. The work is extended in to learning to rank from general preference graphs. |
2007 | SVMmap | listwise | |
2008 | [ftp://ftp.research.microsoft.com/pub/tr/TR-2008-109.pdf LambdaMART] | listwise | Winning entry in the recent Yahoo Learning to Rank competition used an ensemble of LambdaMART models. |
2008 | ListMLE | listwise | Based on ListNet. |
2008 | PermuRank | listwise | |
2008 | SoftRank | listwise | |
2008 | Ranking Refinement | pairwise | A semi-supervised approach to learning to rank that uses Boosting. |
2008 | SSRankBoost | pairwise | An extension of RankBoost to learn with partially labeled data (semi-supervised learning to rank) |
2008 | SortNet | pairwise | SortNet, an adaptive ranking algorithm which orders objects using a neural network as a comparator. |
2009 | MPBoost | pairwise | Magnitude-preserving variant of RankBoost. The idea is that the more unequal are labels of a pair of documents, the harder should the algorithm try to rank them. |
2009 | BoltzRank | listwise | Unlike earlier methods, BoltzRank produces a ranking model that looks during query time not just at a single document, but also at pairs of documents. |
2009 | BayesRank | listwise | Based on ListNet. |
2010 | NDCG Boost | listwise | A boosting approach to optimize NDCG. |
2010 | GBlend | pairwise | Extends GBRank to the learning-to-blend problem of jointly solving multiple learning-to-rank problems with some shared features. |
2010 | IntervalRank | pairwise & listwise | |
2010 | CRR | pointwise & pairwise | Combined Regression and Ranking. Uses stochastic gradient descent to optimize a linear combination of a pointwise quadratic loss and a pairwise hinge loss from Ranking SVM. |
Note: as most supervised learning algorithms can be applied to the pointwise case, only those methods which are specifically designed with ranking in mind are shown above.
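The pairwise approach listed above reduces ranking to binary classification; a common transformation forms difference vectors from preference pairs, as in the Ranking SVM and RankNet family. A sketch with invented data:

```python
from itertools import combinations

def to_pairs(features, labels):
    """Turn one query's labeled documents into binary preference examples.

    Each example is (x_i - x_j, +1) when document i should rank above
    document j, and (x_i - x_j, -1) for the opposite preference. Any
    binary classifier can then be trained on the result."""
    pairs = []
    for (xi, yi), (xj, yj) in combinations(zip(features, labels), 2):
        if yi == yj:
            continue  # ties carry no preference information
        diff = [a - b for a, b in zip(xi, xj)]
        pairs.append((diff, 1 if yi > yj else -1))
    return pairs

X = [[3, 0.9], [1, 0.2], [2, 0.5]]  # hypothetical feature vectors
y = [2, 0, 1]                       # graded relevance labels
print(to_pairs(X, y))
```

A linear classifier trained on such difference vectors yields a scoring function whose pairwise ordering minimizes ranking inversions on the training pairs.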
History
C. Manning et al. trace the earliest work on the learning-to-rank problem to papers in the late 1980s and early 1990s. They suggest that these early works achieved limited results in their time due to little available training data and poor machine learning techniques.
In the mid-1990s, Berkeley researchers used logistic regression to train a successful ranking function at the TREC conference.
Several conferences, such as NIPS, SIGIR and ICML, have had workshops devoted to the learning-to-rank problem since the mid-2000s, and this has stimulated much academic research.
Practical usage by search engines
Commercial web search engines began using machine-learned ranking systems in the 2000s. One of the first search engines to start using it was AltaVista (later its technology was acquired by Overture, and then Yahoo), which launched a gradient boosting-trained ranking function in April 2003.
Bing's search is said to be powered by the RankNet algorithm, which was invented at Microsoft Research in 2005.
In November 2009 the Russian search engine Yandex announced that it had significantly increased its search quality due to the deployment of a new proprietary MatrixNet algorithm, a variant of gradient boosting which uses oblivious decision trees. Recently it has also sponsored a machine-learned ranking competition, "Internet Mathematics 2009", based on its own search engine's production data. Yahoo announced a similar competition in 2010.
As of 2008, Google's Peter Norvig denied that their search engine relies exclusively on machine-learned ranking. Cuil's CEO, Tom Costello, suggests that they prefer hand-built models because these can outperform machine-learned models when measured against metrics like click-through rate or time on landing page; this is because machine-learned models "learn what people say they like, not what people actually like".
External links
Competitions and public datasets
- LETOR: A Benchmark Collection for Research on Learning to Rank for Information Retrieval
- Yandex's Internet Mathematics 2009
- Yahoo! Learning to Rank Challenge
- Microsoft Learning to Rank Datasets
Open Source code