Frequency list
In computational linguistics, a frequency list is a sorted list of words (word types) together with their frequency, where frequency usually means the number of occurrences in a given corpus. A short example might look like this:
| Word | Frequency |
|---|---|
| the | 3789654 |
| he | 2098762 |
| [...] | |
| king | 57897 |
| boy | 56975 |
| [...] | |
| outragious [sic] | 76 |
| [...] | |
| stringyfy | 5 |
| [...] | |
| transducionalify | 1 |
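Here is a minimal sketch of how such a list can be produced. The Python function counts word tokens in a text and sorts the word types by descending count. Two things are assumptions made for illustration: the tokenization rule (lowercased runs of word characters, matched with the regular expression `\w+`) and the name `frequency_list`. Real corpora usually need a more careful tokenizer.

```python
import re
from collections import Counter


def frequency_list(text: str) -> list[tuple[str, int]]:
    """Return (word type, count) pairs sorted by descending count.

    Assumption (illustrative only): tokens are runs of word characters,
    compared case-insensitively.
    """
    tokens = re.findall(r"\w+", text.lower())
    # most_common() with no argument returns every item, highest count first.
    return Counter(tokens).most_common()


if __name__ == "__main__":
    corpus = "The king saw the boy and the boy saw the king"
    for word, count in frequency_list(corpus):
        print(f"{word}\t{count}")
```

Ties are not broken alphabetically here. `Counter.most_common` keeps equal-count items in the order they were first seen.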
Zipf's law appears to hold for frequency lists drawn from longer texts in any natural language. Frequency lists are a prerequisite for building an electronic dictionary, which is itself a prerequisite for a wide range of applications in computational linguistics.
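In its simplest form, Zipf's law says that a word's frequency is roughly inversely proportional to its rank in the frequency list. The product rank × frequency should therefore stay roughly constant. The hypothetical helper below computes that product for each rank. The counts in the usage example are made-up values chosen to follow the law closely; they are not data from a real corpus.

```python
def rank_frequency_products(frequencies: list[int]) -> list[tuple[int, int, int]]:
    """Return (rank, frequency, rank * frequency) triples.

    Frequencies are ranked in descending order, with rank 1 for the most
    frequent item. Under Zipf's law the third element stays roughly constant.
    """
    ranked = sorted(frequencies, reverse=True)
    return [(rank, freq, rank * freq) for rank, freq in enumerate(ranked, start=1)]


if __name__ == "__main__":
    # Hypothetical counts chosen to follow 1/rank closely (not real corpus data).
    counts = [200, 1000, 333, 500, 250]
    for rank, freq, product in rank_frequency_products(counts):
        print(rank, freq, product)
```

On real frequency lists the products are only approximately constant, so this is a rough diagnostic rather than a test of the law.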
German linguists define the Häufigkeitsklasse (frequency class) $N$ of an item in the list using the base-2 logarithm of the ratio between its frequency and the frequency of the most frequent item. The most common item belongs to frequency class 0 (zero), and any item that is approximately half as frequent belongs to class 1. In the example list above, the misspelled word outragious has a ratio of 76/3789654 and belongs to class 16.
$$N = \left\lfloor 0.5 - \log_2\left(\frac{f}{f_{\max}}\right) \right\rfloor$$
Here $f$ is the frequency of the item, $f_{\max}$ is the frequency of the most common item, and $\lfloor \cdot \rfloor$ is the floor function. Adding 0.5 before taking the floor rounds to the nearest class. For outragious, $-\log_2(76/3789654) \approx 15.61$, so $N = \lfloor 16.11 \rfloor = 16$.
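The frequency class can be computed directly. This is a minimal sketch that assumes the rounding form N = ⌊0.5 − log2(f / f_max)⌋. That form reproduces the classes stated in the text: 0 for the most common item, 1 for an item half as frequent, and 16 for outragious. The function name `frequency_class` is hypothetical, and the counts come from the example list.

```python
import math


def frequency_class(freq: int, max_freq: int) -> int:
    """Frequency class N = floor(0.5 - log2(freq / max_freq)).

    freq is the item's count; max_freq is the count of the most common item.
    """
    if not 0 < freq <= max_freq:
        raise ValueError("expected 0 < freq <= max_freq")
    return math.floor(0.5 - math.log2(freq / max_freq))


if __name__ == "__main__":
    # Counts taken from the example frequency list; "the" is the most common item.
    max_freq = 3789654
    examples = [("the", 3789654), ("he", 2098762), ("king", 57897),
                ("outragious", 76), ("stringyfy", 5), ("transducionalify", 1)]
    for word, freq in examples:
        print(word, frequency_class(freq, max_freq))
```

Without the 0.5 offset, the floor would truncate rather than round, and outragious would land in class 15 instead of 16.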
See also
- Letter frequencies
- Most common words in English
- The Long Tail