Language guessing
Language identification or language guessing is the process of automatically determining the (natural) language in which a document or piece of text is written.
Algorithms for this subtask of natural language processing (NLP) include those for the more general task of text classification, but one of the most popular is a specialized algorithm devised by Cavnar and Trenkle, based on character-level n-gram statistics. An older method by Grefenstette was based on the prevalence of certain function words (e.g., *the* in English).
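The two approaches can be contrasted in a minimal Python sketch. The n-gram part follows the general idea attributed to Cavnar and Trenkle: build a ranked profile of the most frequent character n-grams for each language, build the same kind of profile for the input text, and choose the language whose profile is closest under an "out-of-place" rank-difference measure. The function-word part simply counts occurrences of a few very common words per language. Everything concrete here is an assumption chosen for illustration, not taken from either author's description: the toy training sentences, the function-word lists, the n-gram lengths 1 to 5, the profile size of 300, the `_` word-boundary padding, and the penalty for n-grams missing from a profile. The names `ngram_profile`, `out_of_place`, `identify_ngram` and `identify_function_words` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical toy training samples (assumption, for illustration only);
# a practical system would build each profile from a large corpus.
TRAINING = {
    "en": "the quick brown fox jumps over the lazy dog and the cat is sleeping in the house",
    "fr": "le renard brun rapide saute par-dessus le chien paresseux et le chat dort dans la maison",
    "de": "der schnelle braune fuchs springt über den faulen hund und die katze schläft im haus",
}

# Hypothetical function-word lists (assumption): a few frequent words per language.
FUNCTION_WORDS = {
    "en": {"the", "and", "of", "is", "to", "a"},
    "fr": {"le", "la", "les", "et", "de", "un", "une", "est"},
    "de": {"der", "die", "das", "und", "ist", "ein", "eine", "nicht"},
}


def words(text):
    """Lower-cased runs of letters (Unicode-aware; digits and '_' excluded)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def ngram_profile(text, max_n=5, top_k=300):
    """Map each of the top_k most frequent character n-grams (n = 1..max_n) to its rank."""
    counts = Counter()
    for word in words(text):
        padded = f"_{word}_"  # '_' marks the word boundaries (assumed padding scheme)
        for n in range(1, max_n + 1):
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    # Sort by descending count; break ties alphabetically so ranks are deterministic.
    ranked = sorted(counts, key=lambda g: (-counts[g], g))
    return {gram: rank for rank, gram in enumerate(ranked[:top_k])}


def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences; n-grams absent from lang_profile get a fixed penalty."""
    max_penalty = len(lang_profile)  # assumed penalty value for missing n-grams
    return sum(
        abs(rank - lang_profile[gram]) if gram in lang_profile else max_penalty
        for gram, rank in doc_profile.items()
    )


# One ranked n-gram profile per language, built from the toy samples.
PROFILES = {lang: ngram_profile(sample) for lang, sample in TRAINING.items()}


def identify_ngram(text):
    """Pick the language whose profile is closest under the out-of-place measure."""
    doc = ngram_profile(text)
    return min(PROFILES, key=lambda lang: out_of_place(doc, PROFILES[lang]))


def identify_function_words(text):
    """Pick the language whose function words occur most often in the text."""
    tokens = words(text)
    scores = {lang: sum(t in fw for t in tokens) for lang, fw in FUNCTION_WORDS.items()}
    return max(scores, key=scores.get)


if __name__ == "__main__":
    sample = "Die Katze ist nicht im Haus und der Hund schläft"
    print(identify_ngram(sample), identify_function_words(sample))
```

With training text this small, the profiles are tiny and results on unseen sentences are unreliable; the sketch only illustrates the mechanics. The n-gram scorer degrades gracefully on short or unusual text because every word contributes character statistics, whereas the function-word scorer returns an arbitrary answer when none of the listed words occur.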