Glottochronology
Glottochronology is that part of lexicostatistics
dealing with the chronological relationship between languages.
The idea was developed by Morris Swadesh under two assumptions: first, that there exists a relatively stable "basic vocabulary" (hence the term "Swadesh lists") in all languages of the world; and second, that replacements happen in a way analogous to radioactive decay, at a constant percentage per unit of time elapsed. Many different methods now exist, some of them extensions of the Swadesh method and an increasing number built on biological assumptions about replacements in genes. However, Swadesh's technique is so well known that, for many people, 'glottochronology' refers to it alone.
Word list
The original method presumed that the core vocabulary of a language is replaced at a constant (or near-constant) rate across all languages and cultures, and can therefore be used to measure the passage of time. The process makes use of a list of lexical terms compiled by Morris Swadesh and assumed to be resistant to borrowing (originally designed as a list of 200 items; however, the refined 100-word list in Swadesh (1955) is much more common among modern-day linguists). This core vocabulary was designed to encompass concepts common to every human language (such as personal pronouns, body parts, heavenly bodies, verbs of basic actions, the numerals 'one' and 'two', etc.), eliminating concepts that are specific to a particular culture or time. It has been found that this ideal is not in fact attainable and that the meaning set may need to be tailored to the languages being compared.
The percentage of cognates (words that have a common origin) in these word lists is then measured. The larger the percentage of cognates, the more recently the two languages being compared are presumed to have separated.
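The measurement step can be illustrated with a minimal sketch. It assumes the cognacy judgements have already been made by a linguist and are supplied as a mapping from meaning to True, False or None; the function name `cognate_percentage` and the sample data are hypothetical and serve only to show the arithmetic.

```python
def cognate_percentage(judgements):
    """Return the percentage of comparable meanings judged cognate.

    `judgements` maps each meaning to True (cognate), False (not cognate),
    or None (no usable word in one of the lists, so the meaning is skipped).
    """
    scored = [j for j in judgements.values() if j is not None]
    if not scored:
        raise ValueError("no comparable meanings")
    return 100.0 * sum(scored) / len(scored)


# Hypothetical judgements for a handful of meanings; a real comparison
# would cover the full 100- or 200-item list.
example = {
    "I": True,
    "two": True,
    "hand": True,
    "sun": False,
    "yellow": False,
    "bark": None,  # missing in one list, so excluded from the count
}

print(cognate_percentage(example))  # 60.0
```

Skipping meanings with no usable word, rather than counting them as non-cognate, is one possible convention; counting them as mismatches would lower the percentage and push the presumed separation further back.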
Glottochronologic constant
Robert Lees obtained a value for the "glottochronological constant" of words by considering the known changes in 13 pairs of languages using the 200-word list. He obtained a value of 0.806 ± 0.0176 with 90% confidence. For the 100-word list Swadesh obtained a value of 0.86, the higher value reflecting the elimination of semantically unstable words. This constant may be related to the rate of replacement by:
$$L = -\ln(r)$$
where L is the rate of replacement, ln is the logarithm to base e, and r is the glottochronological constant.
Divergence time
The basic formula of glottochronology in its shortest form is:
$$t = \frac{\ln(c)}{-L}$$
where t = a given period of time from one stage of the language to another, c = proportion of wordlist items retained at the end of that period, and L = rate of replacement for that word list.
By testing historically verifiable cases where t is known from non-linguistic data (e.g. the approximate distance from Classical Latin to the modern Romance languages), Swadesh arrived at an empirical value of approximately 0.14 for L (meaning that around 14 words from the 100-word list are replaced per millennium).
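As a worked illustration, the following sketch evaluates the basic relation in the form t = ln(c) / (−L), which is how the definitions of t, c and L above are read here. The function name `elapsed_time` and the retention figure of 70% are hypothetical; the default L = 0.14 is Swadesh's value quoted above, with t measured in millennia.

```python
import math


def elapsed_time(c, L=0.14):
    """Time t in millennia from t = ln(c) / (-L).

    c is the proportion of list items retained (0 < c <= 1) and
    L is the replacement rate per millennium.
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("c must be a proportion in (0, 1]")
    return math.log(c) / -L


# Hypothetical case: 70% of the list is retained.
t = elapsed_time(0.70)
print(round(t, 2))  # 2.55
```

Because the relation is logarithmic, small errors in c matter most when c is close to 1, that is, for recently separated stages.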
Results
Glottochronology was found to work in the case of Indo-European, accounting for 87% of the variance. It is also postulated to work for Hamito-Semitic (Fleming 1973), Chinese (Munro 1978) and Amerind (Stark 1973; Baumhoff and Olmsted 1963). For the latter, correlations have been obtained with radiocarbon dating and blood groups as well as archaeology.
Note that the approach of Gray and Atkinson has, in their own words, nothing to do with "glottochronology".
Discussion
The concept of language change is old, and its history is reviewed in Hymes (1973) and Wells (1973). Glottochronology itself dates back to the mid-20th century (see Lees 1953; Swadesh 1955, 1972). An introduction to the subject is given in Embleton (1986) and in McMahon and McMahon (2005).
Glottochronology has been controversial ever since, partly owing to issues of accuracy and partly owing to the question of whether its basis is sound (see e.g. Bergsland 1958; Bergsland and Vogt 1962; Fodor 1961; Chrétien 1962; Guy 1980). These concerns have been addressed by Dobson et al. (1972), Dyen (1973) and Kruskal, Dyen and Black (1973). The assumption of a single-word replacement rate can distort the divergence-time estimate when borrowed words are included (Thomason and Kaufman 1988). Chrétien purported to disprove the mathematics of the Swadesh model. At a conference at Yale in 1971 his criticisms were shown to be invalid. The same conference saw the application of the theory to creole languages (Wittmann 1973).
An overview of recent arguments can be obtained from the papers of a conference held at the McDonald Institute in 2000 (Renfrew, McMahon and Trask 2002). These presentations vary from "Why linguists don't do dates" to the one by Starostin discussed below.
Since its inception, glottochronology has been rejected by many linguists, mostly Indo-Europeanists of the school of the traditional comparative method. Criticisms have been answered in particular around three points of discussion.
- Criticism levelled against the higher stability of lexemes in Swadesh lists alone (Haarmann 1990) misses the point, because a certain amount of loss is precisely what enables the computations (Sankoff 1970).
- Traditional glottochronology did presume that language changes at a stable rate.
- Thus, in Bergsland & Vogt (1962), the authors make an impressive demonstration, on the basis of actual language data verifiable by extra-linguistic sources, that the "rate of change" for Icelandic constituted around 4% per millennium, whereas for the closely related Riksmal (Literary Norwegian) it would amount to as much as 20%. (Swadesh's proposed "constant rate" was supposed to be around 14% per millennium.)
- This and several other similar examples effectively proved that Swadesh's formula would not work on all available material, a serious accusation considering that the evidence that can be used to "calibrate" the value of L (i.e. language history recorded over prolonged periods of time) is not overwhelmingly large in the first place.
- It is highly likely that the chance of replacement is in fact different for every word or feature ("each word has its own history", as many sources put it).
- This global assumption has been modified and downgraded to single words even in single languages in many newer attempts (see below).
- A serious argument is that language change arises from socio-historical events which are of course unforeseeable and, therefore, uncomputable.
- New methods developed by Gray & Atkinson avoid these issues, but are still seen as controversial, primarily since they support the Anatolian origin of the Indo-European people over the more popular Kurgan hypothesis.
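The size of the distortion implied by the Bergsland & Vogt figures can be sketched numerically. The example assumes, for illustration only, that the quoted percentages are the fractions of the list actually replaced over one millennium, and that the analyst nevertheless applies t = ln(c) / (−L) with Swadesh's L = 0.14. The function name `apparent_time` is hypothetical.

```python
import math

ASSUMED_L = 0.14  # Swadesh's proposed rate per millennium


def apparent_time(replaced_fraction, L=ASSUMED_L):
    """Millennia reported by t = ln(c) / (-L) when a fraction
    `replaced_fraction` of the list has actually been replaced."""
    c = 1.0 - replaced_fraction
    return math.log(c) / -L


# Replacement over one real millennium, using the percentages quoted
# from Bergsland & Vogt (1962) in the list above.
for name, replaced in [("Icelandic", 0.04), ("Riksmal", 0.20)]:
    print(f"{name}: {apparent_time(replaced):.2f} millennia apparent, 1.00 actual")
```

Under these assumptions the fixed-rate formula reports roughly 0.3 millennia for Icelandic and roughly 1.6 for Riksmal, although one millennium has elapsed in both cases.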
Modified glottochronology
Somewhere in between the original concept of Swadesh and the rejection of glottochronology in its entirety lies the idea that glottochronology as a formal method of linguistic analysis becomes valid with the help of several important modifications. Thus, inhomogeneities in the replacement rate were dealt with by Van der Merwe (1966) by splitting the word list into classes each with their own rate, while Dyen, James and Cole (1967) allowed each meaning to have its own rate. Simultaneous estimation of divergence time and replacement rate was studied by Kruskal, Dyen and Black. Brainard (1970) allowed for chance cognation, and drift effects were introduced by Gleason (1959). Sankoff (1973) suggested introducing a borrowing parameter and allowed synonyms.
A combination of these various improvements is given in Sankoff's "Fully Parameterised Lexicostatistics". In 1972 Sankoff developed, in a biological context, a model of genetic divergence of populations. Embleton (1981) derives a simplified version of this in a linguistic context. She carries out a number of simulations using this, which are shown to give good results.
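The idea of splitting the list into classes with separate rates can be sketched as a mixture of exponential decays. This is an illustration of the general idea only, not a reconstruction of Van der Merwe's or Sankoff's actual procedures; the class weights, the rates and the function names `retained` and `solve_time` are all hypothetical.

```python
import math


def retained(t, classes):
    """Expected proportion retained after t millennia, where `classes`
    is a list of (weight, rate) pairs whose weights sum to 1."""
    return sum(w * math.exp(-rate * t) for w, rate in classes)


def solve_time(c, classes, hi=100.0, tol=1e-9):
    """Find t in [0, hi] with retained(t) == c by bisection.

    retained() decreases as t grows, so bisection is sufficient.
    """
    lo = 0.0
    if not retained(hi, classes) <= c <= 1.0:
        raise ValueError("c is outside the range reachable on [0, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if retained(mid, classes) > c:
            lo = mid  # too much retained, so more time must have passed
        else:
            hi = mid
    return (lo + hi) / 2.0


# Hypothetical split: half the list is stable, half is replaced quickly.
classes = [(0.5, 0.05), (0.5, 0.25)]
t = solve_time(0.70, classes)
print(round(t, 2))
```

With a single class the result reduces to the basic formula t = ln(c) / (−L); with several classes no closed form exists in general, which is why a numerical root search is used here.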
Improvements in statistical methodology related to a completely different branch of science (changes in DNA over time) have sparked renewed interest. These methods are more robust than the earlier ones because they calibrate points on the tree with known historical events and smooth the rates of change across these. As such, they no longer require the assumption of a constant rate of change (Gray & Atkinson 2003).
Starostin's method
Another attempt to introduce such modifications was made by the Russian linguist Sergei Starostin, who proposed that:
- systematic loanwords, borrowed from one language into another, are a disruptive factor and have to be eliminated from the calculations; the one thing that really matters is the "native" replacement of items by items from the same language. The failure to notice this factor was a major reason for Swadesh's original estimate of the replacement rate at under 14 words from the 100-word list per millennium, when the real rate is, in fact, much slower (around 5 or 6). Introducing this correction effectively cancels out the "Bergsland & Vogt" argument, since a thorough analysis of the Riksmal data shows that its basic wordlist includes about 15–16 borrowings from other Germanic languages (mostly Danish); exclusion of these elements from the calculations brings the rate down to the expected rate of 5–6 "native" replacements per millennium;
- the rate of change is not really constant, but actually depends on the time period during which the word has existed in the language (i.e. the chances of lexeme X being replaced by lexeme Y increase in direct proportion to the time elapsed, the so-called "aging of words", empirically understood as the gradual "erosion" of the word's primary meaning under the weight of acquired secondary ones);
- individual items on the 100-word list have different stability rates (for instance, the word "I" generally has a much lower chance of being replaced than the word "yellow", etc.).
The resulting formula, taking into account both the time dependence and the individual stability quotients, looks as follows:
$$t = \sqrt{\frac{\ln(c)}{-Lc}}$$
In this formula, −Lc reflects the gradual slowing down of the replacement process due to different individual rates (the less stable elements are the first and the quickest to be replaced), whereas the square root represents the reverse trend – acceleration of replacement as items in the original wordlist "age" and become more prone to shifting their meaning. The formula is obviously more complicated than Swadesh's original one, but, as shown in Starostin's work, yields more credible results than the former (and more or less agrees with all the cases of language separation that can be confirmed by historical knowledge). On the other hand, it shows that glottochronology can really only be used as a serious scientific tool on language families the historical phonology of which has been meticulously elaborated (at least to the point of being able to clearly distinguish between cognates and loanwords).
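A small numerical sketch may help in reading the formula. It assumes the form t = sqrt(ln(c) / (−L·c)), which is how the description above (a −Lc term under a square root) is read here, and it takes L = 0.05 as an illustrative value based on the "around 5 or 6" native replacements per millennium mentioned earlier. The function name `starostin_time` and the 70% retention figure are hypothetical.

```python
import math


def starostin_time(c, L=0.05):
    """Time in millennia from t = sqrt(ln(c) / (-L * c)).

    c is the proportion of list items retained (0 < c <= 1);
    L = 0.05 is an assumed rate of native replacement per millennium.
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("c must be a proportion in (0, 1]")
    return math.sqrt(math.log(c) / (-L * c))


# Hypothetical case: 70% of the list is retained after loanwords
# have been excluded.
print(round(starostin_time(0.70), 2))  # 3.19
```

For the same 70% retention, the basic formula with L = 0.14 gives about 2.55 millennia, so the modified formula with the slower rate places the separation noticeably earlier.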
Time-depth estimation
The problem of time-depth estimation was the subject of a conference held by the McDonald Institute in 2000. The published papers (Renfrew, McMahon and Trask 2002) give an idea of the views on glottochronology at the time. These vary from "Why linguists don't do dates" to the one by Starostin discussed above. Note that in the referenced paper, Gray and Atkinson hold that their methods cannot be called "glottochronology", thereby incorrectly confining this term to its original method.
See also
- Lexicostatistics
- Swadesh list
- Mass lexical comparison
- Basic English
- Historical linguistics
- Proto-language
- Cognate
- Indo-European studies