F1 Score - AbsoluteAstronomy.com

In statistics, the F₁ score (also F-score or F-measure) is a measure of a test's accuracy. It considers both the precision p and the recall r of the test to compute the score: p is the number of correct results divided by the number of all returned results, and r is the number of correct results divided by the number of results that should have been returned. The F₁ score can be interpreted as a weighted average of the precision and recall; it reaches its best value at 1 and its worst at 0.
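As an illustration of these two definitions, the following minimal sketch computes p and r from a set of returned results and a set of results that should have been returned. The function name `precision_recall`, the document identifiers, and the choice to report 0.0 when a set is empty are assumptions made for the example, not part of the definition.

```python
def precision_recall(returned, relevant):
    """Return (p, r) for the returned results and the results that should have been returned."""
    returned, relevant = set(returned), set(relevant)
    correct = len(returned & relevant)  # correct results: returned and relevant
    # Assumption: report 0.0 instead of dividing by zero when a set is empty.
    p = correct / len(returned) if returned else 0.0
    r = correct / len(relevant) if relevant else 0.0
    return p, r


# Hypothetical search: four documents returned, three were relevant, two overlap.
p, r = precision_recall({"d1", "d2", "d3", "d4"}, {"d1", "d2", "d5"})
print(p, r)
```

Here two of the four returned documents are correct, so p = 2/4, and two of the three relevant documents were found, so r = 2/3.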

The traditional F-measure or balanced F-score (F₁ score) is the harmonic mean of precision and recall:

$$F_1 = 2 \cdot \frac{p \cdot r}{p + r}$$

The general formula for positive real β is:

$$F_\beta = (1 + \beta^2) \cdot \frac{p \cdot r}{\beta^2 \cdot p + r}$$
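The general formula can be sketched in a few lines of Python, assuming the standard definition F_β = (1 + β²)·p·r / (β²·p + r). The function name `f_beta` and the convention of returning 0.0 when both p and r are zero are assumptions of this sketch.

```python
def f_beta(p, r, beta=1.0):
    """F-beta score from precision p and recall r; beta=1 gives the balanced F1 score."""
    if beta <= 0:
        raise ValueError("beta must be a positive real number")
    b2 = beta * beta
    denom = b2 * p + r
    if denom == 0:
        return 0.0  # assumption: define the score as 0 when p and r are both 0
    return (1 + b2) * p * r / denom


# Hypothetical test with high precision and low recall.
p, r = 0.8, 0.4
for beta in (0.5, 1.0, 2.0):
    print(beta, round(f_beta(p, r, beta), 4))
```

With precision higher than recall, the score rises as β falls below 1 and drops as β rises above 1, since larger β gives recall more weight.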

The formula in terms of Type I and type II errors, writing TP for the number of true positives, FP for the number of false positives (Type I errors) and FN for the number of false negatives (Type II errors), is:

$$F_\beta = \frac{(1 + \beta^2) \cdot TP}{(1 + \beta^2) \cdot TP + \beta^2 \cdot FN + FP}$$
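The count-based form can be checked against the precision and recall form with a short sketch. It assumes TP, FP and FN denote the counts of true positives, false positives (Type I errors) and false negatives (Type II errors), with p = TP / (TP + FP) and r = TP / (TP + FN); the function name and the example counts are hypothetical.

```python
import math


def f_beta_from_counts(tp, fp, fn, beta=1.0):
    """F-beta from counts: (1+b^2)*TP / ((1+b^2)*TP + b^2*FN + FP)."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0  # assumption: 0 when all counts are 0


# Hypothetical counts: 2 true positives, 2 false positives, 1 false negative.
tp, fp, fn = 2, 2, 1
p = tp / (tp + fp)  # precision
r = tp / (tp + fn)  # recall

for beta in (0.5, 1.0, 2.0):
    b2 = beta * beta
    from_pr = (1 + b2) * p * r / (b2 * p + r)
    # Both forms should agree up to floating-point rounding.
    print(beta, math.isclose(from_pr, f_beta_from_counts(tp, fp, fn, beta)))
```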

Two other commonly used F measures are the F₂ measure, which weights recall higher than precision, and the F₀.₅ measure, which puts more emphasis on precision than recall.
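A short worked example may make the effect of β concrete. It assumes the standard definition F_β = (1 + β²)·p·r / (β²·p + r) and a hypothetical test with precision p = 0.5 and recall r = 1:

```latex
\begin{align*}
F_1     &= \frac{2 \cdot 0.5 \cdot 1}{0.5 + 1}              = \frac{1}{1.5}       \approx 0.667 \\
F_2     &= \frac{5 \cdot 0.5 \cdot 1}{4 \cdot 0.5 + 1}      = \frac{2.5}{3}       \approx 0.833 \\
F_{0.5} &= \frac{1.25 \cdot 0.5 \cdot 1}{0.25 \cdot 0.5 + 1} = \frac{0.625}{1.125} \approx 0.556
\end{align*}
```

Because recall is the stronger of the two values in this example, the recall-weighted F₂ is the highest of the three scores and the precision-weighted F₀.₅ is the lowest.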

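The F-measure was derived so that F_β "measures the effectiveness of retrieval with respect to a user who attaches β times as much importance to recall as precision". It is based on van Rijsbergen's effectiveness measure

$$E = 1 - \left(\frac{\alpha}{p} + \frac{1 - \alpha}{r}\right)^{-1}.$$

Their relationship is

$$F_\beta = 1 - E, \quad \text{where} \quad \alpha = \frac{1}{1 + \beta^2}.$$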
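The relationship can be checked numerically. The sketch below assumes the usual forms E = 1 − 1 / (α/p + (1 − α)/r) and F_β = (1 + β²)·p·r / (β²·p + r), with α = 1 / (1 + β²); the function names and sample values are hypothetical, and p and r must be nonzero for E to be defined.

```python
import math


def effectiveness(p, r, alpha):
    """van Rijsbergen's E, assuming E = 1 - 1 / (alpha/p + (1 - alpha)/r)."""
    return 1.0 - 1.0 / (alpha / p + (1.0 - alpha) / r)


def f_beta(p, r, beta):
    """F-beta, assuming F = (1 + beta^2) * p * r / (beta^2 * p + r)."""
    b2 = beta * beta
    return (1.0 + b2) * p * r / (b2 * p + r)


p, r = 0.8, 0.4  # hypothetical precision and recall, both nonzero
for beta in (0.5, 1.0, 2.0):
    alpha = 1.0 / (1.0 + beta * beta)
    # F_beta and 1 - E should agree up to floating-point rounding.
    print(beta, math.isclose(f_beta(p, r, beta), 1.0 - effectiveness(p, r, alpha)))
```

Substituting α = 1 / (1 + β²) gives α/p + (1 − α)/r = (r + β²·p) / ((1 + β²)·p·r), whose reciprocal is exactly F_β, which is why the two values agree.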

Applications

The F-score is often used in the field of information retrieval for measuring search, document classification, and query classification performance. Earlier works focused primarily on the F₁ score, but with the proliferation of large-scale search engines, performance goals changed to place more emphasis on either precision or recall, and so F_β is seen in wide application.

The F-score is also used in machine learning. Note, however, that the F-measures do not take true negatives into account, and that measures such as the Matthews correlation coefficient may be preferable to assess the performance of a binary classifier.
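The following sketch illustrates the point with two hypothetical confusion matrices that differ only in the number of true negatives. It assumes the usual definitions F₁ = 2·TP / (2·TP + FP + FN) and MCC = (TP·TN − FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)); returning 0.0 when a denominator is zero is a common convention adopted here as an assumption.

```python
import math


def f1(tp, fp, fn, tn=0):
    """F1 from counts; tn is accepted only to show that it is ignored."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumption: report 0.0 when any marginal total is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Same positives and errors; only the number of true negatives changes.
for tn in (0, 1000):
    print(tn, round(f1(10, 5, 5, tn), 4), round(mcc(10, 5, 5, tn), 4))
```

The F₁ value is identical in both cases, while the Matthews correlation coefficient moves from negative to clearly positive once the classifier is also credited with many correctly rejected negatives.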


See also