Learning to rank (or machine-learned ranking, MLR) [1] is a class of supervised machine learning tasks that consists in automatically constructing a ranking model from a training sample made up of many lists, with a partial order given on the elements of each list. The partial order is usually specified by assigning a grade to each element (for example, "relevant" or "not relevant"; more than two grades may be used). The purpose of the ranking model is to approximate and generalize, in some suitable sense, the ranking implicit in the training sample to new data.
Learning to rank is still a rather young, rapidly developing field of research that arose in the 2000s, when the information retrieval community became interested in applying machine learning methods to ranking problems.
Application in Information Retrieval
In the context of search engines, each list is a set of documents matching a particular search query.
A training sample consists of a selection of search queries, a subset of the documents matching them, and assessments of the relevance of each document to its query. These assessments can be prepared either manually, by specially trained people (search quality raters, or assessors), or automatically, from an analysis of user clicks [2] or of search engine tools such as Google's SearchWiki.
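To make the structure of such a training sample concrete, the sketch below shows one common way to represent it in code; the field names and example values are illustrative assumptions, not a fixed format.

```python
from dataclasses import dataclass

@dataclass
class JudgedDocument:
    """One document judged against a query; `relevance` uses graded labels
    (e.g. 0 = not relevant, 1 = relevant, 2 = highly relevant)."""
    doc_id: str
    relevance: int

@dataclass
class TrainingQuery:
    """A query together with its judged documents: one 'list' in the sample."""
    query: str
    documents: list[JudgedDocument]

# A tiny illustrative training sample (queries, documents and labels are made up).
sample = [
    TrainingQuery("learning to rank", [
        JudgedDocument("doc-17", relevance=2),
        JudgedDocument("doc-42", relevance=0),
    ]),
    TrainingQuery("pagerank algorithm", [
        JudgedDocument("doc-03", relevance=1),
    ]),
]
```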
Ranking features
Both during training and during operation of the ranking model, each query-document pair is translated into a numerical vector of ranking features (also called ranking factors or signals) that characterize the properties of the document, of the query, and of their relationship. These features can be divided into three groups:
- Query-independent, or static, features, which depend only on the document and not on the query. For example, PageRank or document length. Such features are usually computed at indexing time and are often combined into a static quality score of a document, which is used to increase the efficiency of search engines. [3] [4]
- Features that depend only on the query. For example, whether or not the query relates to adult content.
- Query-dependent, or dynamic, features, which depend on both the document and the query. For example, the TF-IDF score measuring how well the document matches the query.
Some examples of ranking features used in the well-known LETOR dataset follow (a minimal feature-computation sketch is given after the list): [5]
- Values of the TF, TF-IDF, BM25, and language-model scores matching the query against different zones of the document (title, URL, body text, anchor text);
- Lengths and IDF sums of document zones;
- Document ranks produced by different variants of link analysis algorithms such as PageRank and HITS.
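As a rough illustration of how a query-dependent feature such as TF-IDF might be computed per document zone, here is a minimal sketch; the tokenization and the exact TF-IDF variant are simplifying assumptions rather than LETOR's actual definitions.

```python
import math

def tf_idf_features(query: str, zones: dict[str, str],
                    doc_freq: dict[str, int], n_docs: int) -> list[float]:
    """Compute a summed TF-IDF score of the query terms for each document
    zone (title, URL, body text, anchor text), one feature per zone."""
    features = []
    for zone_text in zones.values():
        tokens = zone_text.lower().split()
        score = 0.0
        for term in query.lower().split():
            tf = tokens.count(term)
            # Smoothed IDF; the exact formula varies between systems.
            idf = math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1))
            score += tf * idf
        features.append(score)
    return features

# Illustrative call: one feature per zone of a single document.
zones = {"title": "Learning to rank", "body": "Ranking models are trained on lists"}
print(tf_idf_features("learning to rank", zones,
                      doc_freq={"learning": 120, "rank": 80}, n_docs=1000))
```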
Ranking Quality Metrics
Several metrics are used to evaluate and compare the performance of ranking algorithms on a sample against assessor judgments. The parameters of a ranking model are often tuned so as to maximize the value of one of these metrics.
Examples of metrics (a small NDCG computation sketch follows the list):
- DCG and NDCG;
- Precision@n and NDCG@n (where @n means the metric is computed only over the n top-ranked documents);
- MAP;
- Mean reciprocal rank;
- pFound, a metric developed by Yandex. [6]
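For concreteness, here is a small sketch of computing DCG@n and NDCG@n from graded relevance labels; it uses the common 2^rel − 1 gain and logarithmic discount, which is one of several variants in use.

```python
import math

def dcg_at_n(relevances: list[int], n: int) -> float:
    """DCG@n with gain 2^rel - 1 and logarithmic position discount."""
    return sum(
        (2 ** rel - 1) / math.log2(i + 2)  # positions are 0-indexed here
        for i, rel in enumerate(relevances[:n])
    )

def ndcg_at_n(relevances: list[int], n: int) -> float:
    """NDCG@n: DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_n(sorted(relevances, reverse=True), n)
    return dcg_at_n(relevances, n) / ideal if ideal > 0 else 0.0

# Relevance labels of documents in ranked order (made-up example).
print(ndcg_at_n([2, 0, 1, 2, 0], n=5))
```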
Algorithm Classification
In his paper "Learning to Rank for Information Retrieval" [1] and in talks at conferences on the subject, Tie-Yan Liu of Microsoft Research Asia analyzed the then-existing methods for learning to rank and proposed classifying them into three approaches, according to the input representation and the loss function used:
Pointwise Approach
The pointwise approach assumes that each query-document pair is assigned a numerical grade. The learning-to-rank task then reduces to regression: for each individual query-document pair, predict its grade.
Within this approach, many machine learning algorithms for regression can be applied. When the grades can take only a few values, ordinal regression and classification algorithms can also be used.
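A minimal sketch of the pointwise approach, assuming scikit-learn is available: each query-document pair becomes one regression example mapping its feature vector to its relevance grade, and documents are then ranked by predicted score.

```python
import numpy as np
from sklearn.linear_model import Ridge

# X: one feature vector per query-document pair; y: its relevance grade.
# Both are made-up illustrative data.
X = np.array([[0.9, 3.1], [0.1, 0.4], [0.5, 1.7], [0.8, 2.6]])
y = np.array([2.0, 0.0, 1.0, 2.0])

model = Ridge(alpha=1.0).fit(X, y)

# At query time, rank the candidate documents by predicted score, descending.
candidates = np.array([[0.7, 2.0], [0.2, 0.5]])
order = np.argsort(-model.predict(candidates))
print(order)  # indices of candidates from best to worst
```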
Pairwise Approach
The pairwise approach reduces learning to rank to building a binary classifier that takes two documents matching the same query and determines which of the two is better (a minimal sketch follows the list of examples below).
Examples of algorithms: [1] RankNet, FRank, RankBoost, RankSVM, IR-SVM.
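The sketch below illustrates the pairwise reduction in its simplest linear form (close in spirit to RankSVM, though not its actual implementation): each pair of documents for the same query with different grades yields one training example whose features are the difference of the two feature vectors, and a binary classifier learns its sign.

```python
import numpy as np
from sklearn.svm import LinearSVC

def make_pairs(X, y):
    """Build difference vectors for every ordered pair with unequal grades."""
    diffs, signs = [], []
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                diffs.append(X[i] - X[j]); signs.append(1)
                diffs.append(X[j] - X[i]); signs.append(-1)
    return np.array(diffs), np.array(signs)

# Feature vectors and grades for the documents of ONE query (illustrative data).
X = np.array([[0.9, 3.1], [0.1, 0.4], [0.5, 1.7]])
y = np.array([2, 0, 1])

Xp, yp = make_pairs(X, y)
clf = LinearSVC().fit(Xp, yp)

# The learned weight vector induces a scoring function; rank by it, descending.
scores = X @ clf.coef_.ravel()
print(np.argsort(-scores))
```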
Listwise Approach
The listwise approach builds a model that takes all documents matching a query at once and outputs their permutation. Model parameters are tuned to directly maximize one of the ranking metrics listed above. This is often difficult, because ranking metrics are usually neither continuous nor differentiable with respect to the model parameters, so smoothed approximations or lower bounds of the metrics are maximized instead (the sketch after the examples below shows one such smoothed objective).
Examples of algorithms: [1] SoftRank, SVM-MAP, AdaRank, RankGP, ListNet, ListMLE.
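As an illustration of such smoothed objectives, here is a sketch of the ListNet top-one loss in NumPy: the cross-entropy between the softmax distributions induced by the model scores and by the relevance grades. Only the loss is shown, not the full training loop.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())  # shift for numerical stability
    return e / e.sum()

def listnet_top1_loss(scores, relevances):
    """ListNet top-one loss: cross-entropy between the grade-induced and
    score-induced top-one probability distributions over documents."""
    p_true = softmax(np.asarray(relevances, dtype=float))
    p_model = softmax(np.asarray(scores, dtype=float))
    return -np.sum(p_true * np.log(p_model))

# Model scores and graded labels for the documents of one query (made up).
print(listnet_top1_loss(scores=[2.3, 0.1, 1.4], relevances=[2, 0, 1]))
```

Because this loss is continuous and differentiable in the scores, a model producing them can be trained by ordinary gradient descent, unlike the ranking metrics themselves.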
Practical Application
Large Search Engines
The ranking systems of many modern web search engines, including Yandex, Yahoo [7], and Bing, use ranking models built with machine learning methods. Bing's search uses the RankNet algorithm. [8] The most recent machine-learned ranking algorithm developed and deployed in the Yandex search engine is called MatrixNet; [9] Yandex also sponsored the Internet Mathematics 2009 contest [10], in which participants built ranking algorithms on a set of its own data.
In an interview in early 2008, Peter Norvig, director of research at Google, said that their search engine was not yet ready to entrust ranking entirely to machine learning algorithms. He argued, first, that automatically learned models can behave unpredictably on new classes of queries unlike those in the training set, compared with models created by human experts; second, the creators of Google's current ranking algorithm were confident that their model could solve the problem more effectively than machine learning. [11] The first reason is of particular interest, since it not only echoes the well-known, counterintuitive paradox of inductive logic formulated by the German-born philosopher C. G. Hempel (the statement "all ravens are black" is logically equivalent to the statement "all non-black objects are not ravens"), but also brings us back to a number of issues left unresolved by F. Rosenblatt, creator of the world's first neural network capable of perception and of forming a response to a perceived stimulus, the single-layer perceptron. [12] The criticism of Rosenblatt's elementary perceptron reveals the vulnerability of the ranking model that Google's experts point to: can artificial systems generalize their individual experience to a wide class of situations for which no response was communicated to them in advance? No: in practice, the individual experience of artificial systems is always limited and never complete. Nevertheless, machine learning tools can solve the problem of search spam with a fairly high degree of effectiveness. [13]
Notes
- [1] Tie-Yan Liu (2009), Learning to Rank for Information Retrieval, Foundations and Trends in Information Retrieval: Vol. 3, No. 3, pp. 225-331, ISBN 978-1-60198-244-5, DOI 10.1561/1500000016. Slides of T.-Y. Liu's talk at the WWW 2009 conference are available (archived March 31, 2010).
- [2] Optimizing Search Engines using Clickthrough Data.
- [3] Static quality scores and ordering.
- [4] Richardson, M.; Prakash, A.; Brill, E. (2006). "Beyond PageRank: Machine Learning for Static Ranking". Proceedings of the 15th International World Wide Web Conference: 707-715.
- [5] LETOR 3.0. A Benchmark Collection for Learning to Rank for Information Retrieval.
- [6] Gulin A., Karpovich P., Raskovalov D., Segalovich I. Yandex at ROMIP'2009: Optimization of ranking algorithms by machine learning methods.
- [7] Yahoo! Launches World's Largest Hadoop Production Application. Archived December 21, 2009, at the Wayback Machine.
- [8] Bing Search Blog: User Needs, Features and the Science behind Bing.
- [9] Roem.ru: Yandex launched the new Snezhinsk formula, now with a thousand variables instead of 250.
- [10] Internet Mathematics 2009.
- [11] Are Machine-Learned Models Prone to Catastrophic Errors?
- [12] Perceptrons: An Associative Learning Network.
- [13] Search spam detection. Part 15: The use of artificial neural networks (in Russian).