Lexicostatistics
Lexicostatistics is an approach to comparative linguistics that involves quantitative comparison of lexical cognates. Lexicostatistics is related to the comparative method but does not reconstruct a proto-language. It is to be distinguished from glottochronology, which attempts to use lexicostatistical methods to estimate the length of time since two or more languages diverged from a common earlier proto-language. This is merely one application of lexicostatistics, however; other applications of it may not share the assumption of a constant rate of change for basic lexical items.
The term "lexicostatistics" is misleading in that mathematical equations are used but not statistics. Features of a language other than the lexicon may also be used, though this is not usual. Whereas the comparative method uses identified shared innovations to determine sub-groups, lexicostatistics does not identify these. Lexicostatistics is a distance-based method, whereas the comparative method considers language characters directly. The lexicostatistical method is simple and fast relative to the comparative method but has limitations that are discussed below. It can be validated by cross-checking the trees produced by both methods.
History
Lexicostatistics was developed by Morris Swadesh in a series of articles in the 1950s, based on earlier ideas. The concept's first known use was by Dumont d'Urville in 1834, who compared various "Oceanic" languages and proposed a method for calculating a coefficient of relationship. Hymes (1960) and Embleton (1986) both review the history of lexicostatistics.
Create word list
The aim is to generate a list of universally used meanings (hand, mouth, sky, I). Words are then collected for these meaning slots for each language being considered. Swadesh originally reduced a larger set of meanings to 200. He later found it necessary to reduce the list further, while including some meanings that were not in his original list, giving his later 100-item list. The Swadesh list in Wiktionary gives the total 207 meanings in a number of languages. Alternative lists have been generated for particular purposes; for example, Dyen, Kruskal and Black have 200 meanings for 84 Indo-European languages in digital form.
Determine cognacies
Cognacy decisions need to be made by a trained and experienced linguist, and they may need to be refined as the state of knowledge increases. However, lexicostatistics does not rely on all the decisions being correct. For each pair of lists, the cognacy of a form can be positive, negative or indeterminate. Sometimes a language has two words for one meaning, e.g. small and little for not big.
Calculate lexicostatistic percentages
This percentage is based on the proportion of meanings for a particular language pair that are cognate, relative to the total number of meanings whose cognacy is not indeterminate. This value is entered into an N x N table of distances, where N is the number of languages being compared. When complete, this table is half-filled, in triangular form. The higher the proportion of cognacy, the more closely the languages are related.
Create family tree
Creation of the language tree is based solely on the table found above. Various sub-grouping methods can be used, but that adopted by Dyen, Kruskal and Black was:
- all lists are placed in a pool
- the two closest members are removed and form a nucleus which is placed in the pool
- this step is repeated
- under certain conditions a nucleus becomes a group
- this is repeated until the pool only contains one group.
Lexical percentages also need to be calculated for each nucleus and group.
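The percentage calculation and the pooling steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of Dyen, Kruskal and Black. The function names, the language labels A, B and C, and their percentages are invented for the example. Every merge is kept as a binary nucleus, because the conditions under which a nucleus becomes a group are not specified here. The percentage between two nuclei is assumed to be the mean over all cross pairs of their member languages.

```python
from itertools import combinations


def cognate_percentage(judgments):
    """Percentage of cognate meanings among the determinate judgments.

    judgments: iterable of True (cognate), False (not cognate),
    or None (indeterminate, excluded from the total).
    """
    decided = [j for j in judgments if j is not None]
    if not decided:
        raise ValueError("no determinate cognacy judgments")
    return 100.0 * sum(decided) / len(decided)


def build_tree(table):
    """Greedy pooling over a half-filled table of percentages.

    table: dict mapping frozenset({lang_a, lang_b}) -> percentage.
    Returns a nested tuple of language names.
    """
    languages = sorted({lang for pair in table for lang in pair})
    # Each pool entry maps a subtree to the languages it contains.
    pool = {lang: [lang] for lang in languages}

    def score(x, y):
        # Assumption: two pool members are compared by the mean
        # percentage over all cross pairs of their member languages.
        pairs = [table[frozenset((a, b))] for a in pool[x] for b in pool[y]]
        return sum(pairs) / len(pairs)

    while len(pool) > 1:
        # Remove the two closest members (highest percentage) ...
        x, y = max(combinations(pool, 2), key=lambda p: score(*p))
        members = pool.pop(x) + pool.pop(y)
        # ... and place the merged nucleus back in the pool.
        pool[(x, y)] = members
    return next(iter(pool))


# Hypothetical data: three languages with invented percentages.
table = {
    frozenset(("A", "B")): cognate_percentage([True, True, True, False, None]),
    frozenset(("A", "C")): 40.0,
    frozenset(("B", "C")): 30.0,
}
print(build_tree(table))
```

With these invented figures, A and B share the highest percentage, so they form the first nucleus, and C joins at the next step. Any other linkage rule could be substituted in `score` without changing the pooling loop.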
Applications
A leading exponent of the application of lexicostatistics has been Isidore Dyen. He used lexicostatistics to classify Austronesian languages as well as Indo-European ones. A major study of the latter was reported by Dyen, Kruskal and Black (1992). Studies have also been carried out of Amerindian and African languages.
Criticisms
Hoijer (1956) and others have shown that there are difficulties in finding equivalents to the meaning items, and many have found it necessary to modify Swadesh's lists. Gudschinsky (1956) questioned whether it was possible to obtain a universal list.
Factors such as borrowing, tradition and taboo can skew the results, as with other methods. Sometimes lexicostatistics has been applied using lexical similarity rather than cognacy to find resemblances. This is then equivalent to mass comparison.
The choice of meaning slots is subjective, as is the choice of synonyms.
Improved methods
Some modern computational statistical hypothesis-testing methods can be regarded as improvements on lexicostatistics in that they use similar word lists and distance measures.
See also
- Swadesh list
- Intercontinental Dictionary Series
- Glottochronology
- Mass lexical comparison
- Basic English
- Historical linguistics
- Proto-language
- Cognate
- Indo-European studies
- Comparative linguistics
- Comparative method
- Linguistic distance