Detrended Correspondence Analysis
Detrended correspondence analysis (DCA) is a multivariate statistical
technique widely used by ecologists to find the main factors or gradients in large, species-rich but usually sparse data matrices that typify ecological community data. For example, Hill and Gauch (1980, p. 55) analyse the data of a vegetation survey of southeast England including 876 species in 3270 relevés. After eliminating outliers, DCA is able to identify two main axes: the first axis goes from dry to wet conditions, and the second axis from woodland to weed communities.
History of DCA
DCA was created in 1979 by Mark Hill of the United Kingdom's Institute for Terrestrial Ecology (now merged into the Centre for Ecology and Hydrology) and implemented in a FORTRAN code package called DECORANA (Detrended Correspondence Analysis), a correspondence analysis method. DCA is sometimes erroneously referred to as DECORANA; however, DCA is the underlying algorithm, while DECORANA is a tool for implementing it.
The problems solved by DCA
According to Hill and Gauch (1980), DCA is used to suppress two artifacts inherent in most other multivariate analyses when applied to gradient data. An example is a time-series of plant species colonising a new habitat; early successional species are replaced by mid-successional species, then by late successional ones (see example below). When such data are analysed by a standard ordination such as a correspondence analysis:
- the ordination scores of the samples will exhibit the edge effect, i.e. the variance of the scores at the beginning and the end of a regular succession of species will be considerably smaller than that in the middle,
- when presented as a graph the points will be seen to follow a horseshoe-shaped curve rather than a straight line (arch effect), even though the process under analysis is a steady and continuous change that human intuition would prefer to see as a linear trend.
Outside ecology, the same artifacts occur when gradient data are analysed (e.g. soil properties along a transect running between two different geologies, or behavioural data over the lifespan of an individual), because the curved projection is an accurate representation of the shape of the data in multivariate space.
Ter Braak and Prentice (1987, p. 121) cite a simulation study of two-dimensional species packing models in which DCA performed better than CA.
How DCA solves the problems
DCA is an iterative algorithm that has proved to be a highly reliable and useful tool for data exploration and summary in community ecology (Shaw 2003). It starts by running a standard ordination (CA or reciprocal averaging) on the data, producing the initial horseshoe curve in which the first ordination axis distorts into the second axis. It then divides the first axis into segments (default = 26) and shifts the scores within each segment so that their mean on the second axis is zero, which effectively squashes the curve flat. It also rescales the axis so that the ends are no longer compressed relative to the middle, so that one DCA unit approximates the same rate of turnover all the way through the data; the rule of thumb is that 4 DCA units mean that there has been a total turnover in the community. Ter Braak and Prentice (1987, p. 122) warn against the non-linear rescaling of the axes due to robustness issues and recommend using detrending-by-polynomials only.
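The segment step described above can be sketched in plain Python. This is a simplified illustration rather than the DECORANA code: it applies the detrending once to scores that have already been computed, it does not perform the non-linear rescaling, and the function name `detrend_by_segments` and the synthetic arch used as input are assumptions made for the example.

```python
def detrend_by_segments(axis1, axis2, n_segments=26):
    """Subtract the per-segment mean of axis2, with segments cut along axis1."""
    lo, hi = min(axis1), max(axis1)
    if hi == lo:
        raise ValueError("axis1 has no spread, so it cannot be segmented")
    width = (hi - lo) / n_segments

    # Segment index of every point; the largest score falls into the last segment.
    segment = [min(int((a - lo) / width), n_segments - 1) for a in axis1]

    # Mean axis2 score inside each segment.
    sums = [0.0] * n_segments
    counts = [0] * n_segments
    for s, b in zip(segment, axis2):
        sums[s] += b
        counts[s] += 1
    means = [sums[s] / counts[s] if counts[s] else 0.0 for s in range(n_segments)]

    # Shift every point so that its segment has mean zero on axis2.
    return [b - means[s] for s, b in zip(segment, axis2)]


# Synthetic arch (hypothetical data): axis2 is a quadratic function of axis1.
axis1 = [-1 + 2 * k / 199 for k in range(200)]
axis2 = [a * a for a in axis1]
flat = detrend_by_segments(axis1, axis2)
```

On this input the detrended scores in `flat` stay close to zero along the whole first axis, whereas the original second-axis scores span almost a full unit. With fewer points than segments, every point would sit alone in its segment and the second axis would collapse to zero, which is one reason the number of segments matters.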
The drawbacks of DCA
No significance tests are available with DCA, although there is a constrained (canonical) version called DCCA in which the axes are forced by multiple linear regression to correlate optimally with a linear combination of other (usually environmental) variables; this allows testing of a null model by Monte-Carlo permutation analysis.
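The general idea of a Monte-Carlo permutation test can be illustrated with a short Python sketch. It is not DCCA: it simply tests the association between one set of axis scores and one environmental variable by shuffling the environmental values, and the statistic (absolute Pearson correlation), the function names and the data are all assumptions chosen for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(scores, env, n_perm=999, seed=0):
    """Share of shuffles whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson(scores, env))
    shuffled = list(env)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between scores and env
        if abs(pearson(scores, shuffled)) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: an environmental variable that closely tracks the axis scores.
scores = [float(j) for j in range(20)]
env = [2.0 * s + (1.0 if j % 2 else -1.0) for j, s in enumerate(scores)]
p_value = permutation_p_value(scores, env)
```

A small `p_value` indicates that an association as strong as the observed one rarely arises when the environmental values are assigned to samples at random, which is the null model being tested.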
Example
The example shows an ideal data set: species are in rows, samples in columns. For each sample along the gradient a new species is introduced while another species is no longer present. The result is a sparse matrix in which ones indicate the presence of a species in a sample. Except at the edges, each sample contains five species.

Species | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SP1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
SP2 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
SP3 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
SP4 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
SP5 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
SP6 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
SP7 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
SP8 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
SP9 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
SP10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
SP11 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
SP12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
SP13 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
SP14 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
SP15 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 |
SP16 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 0 |
SP17 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
SP18 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 |
SP19 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
SP20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |
The plot of the first two axes of the correspondence analysis result on the right-hand side clearly shows the disadvantages of this procedure: the edge effect (the points are clustered at the edges of the first axis) and the arch effect.
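Both effects can be reproduced on the ideal data set above with plain Python. The sketch below is an illustration only: it rebuilds the 20 × 20 presence matrix and extracts the first two correspondence analysis axes by reciprocal averaging, implemented as a simple power iteration. The function names, the starting vectors and the iteration count are assumptions of this sketch, not part of DECORANA.

```python
def build_gradient_matrix(n=20, half_width=2):
    """Species i is present in samples i-2 .. i+2, clipped at the edges."""
    return [[1 if abs(i - j) <= half_width else 0 for j in range(n)]
            for i in range(n)]


def weighted_standardize(x, w):
    """Centre x to weighted mean 0 and scale it to weighted variance 1."""
    total = sum(w)
    mean = sum(wi * xi for wi, xi in zip(w, x)) / total
    x = [xi - mean for xi in x]
    sd = (sum(wi * xi * xi for wi, xi in zip(w, x)) / total) ** 0.5
    return [xi / sd for xi in x]


def ca_axis(Y, start, earlier_axes=(), n_iter=500):
    """Sample scores of one CA axis by reciprocal averaging."""
    n_species, n_samples = len(Y), len(Y[0])
    row_tot = [sum(row) for row in Y]
    col_tot = [sum(Y[i][j] for i in range(n_species)) for j in range(n_samples)]
    x = weighted_standardize(start, col_tot)
    for _ in range(n_iter):
        # Species score = average score of the samples the species occurs in.
        u = [sum(Y[i][j] * x[j] for j in range(n_samples)) / row_tot[i]
             for i in range(n_species)]
        # Sample score = average score of the species present in the sample.
        x = [sum(Y[i][j] * u[i] for i in range(n_species)) / col_tot[j]
             for j in range(n_samples)]
        # Remove the part already explained by earlier axes (weighted projection).
        for a in earlier_axes:
            proj = (sum(w * xi * ai for w, xi, ai in zip(col_tot, x, a))
                    / sum(w * ai * ai for w, ai in zip(col_tot, a)))
            x = [xi - proj * ai for xi, ai in zip(x, a)]
        x = weighted_standardize(x, col_tot)
    return x


Y = build_gradient_matrix()
axis1 = ca_axis(Y, start=list(range(20)))
axis2 = ca_axis(Y, start=[(j - 9.5) ** 2 for j in range(20)],
                earlier_axes=[axis1])
```

Plotting `axis2` against `axis1` should show the samples bending into an arch, with neighbouring samples much closer together at both ends of the first axis than in the middle.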
See also
- Eigenanalysis
- Ordination (statistics)
- Seriation (archaeology), including additional examples for the arch effect
External links
- PAST (PAlaeontological STatistics): free software including DCA with modifications according to Oksanen and Minchin (1997)
- WINBASP: free software including DCA with detrending-by-polynomials according to Ter Braak and Prentice (1988)