Correlation ratio
In statistics, the correlation ratio is a measure of the relationship between the statistical dispersion within individual categories and the dispersion across the whole population or sample. The measure is defined as the ratio of two standard deviations representing these types of variation. The context here is the same as that of the intraclass correlation coefficient, whose value is the square of the correlation ratio.
Definition
Suppose each observation is $y_{xi}$, where $x$ indicates the category that the observation is in and $i$ is the label of the particular observation. Let $n_x$ be the number of observations in category $x$, and let

$$\bar{y}_x = \frac{\sum_i y_{xi}}{n_x} \quad\text{and}\quad \bar{y} = \frac{\sum_x n_x \bar{y}_x}{\sum_x n_x},$$

where $\bar{y}_x$ is the mean of category $x$ and $\bar{y}$ is the mean of the whole population. The correlation ratio η (eta) is defined so as to satisfy

$$\eta^2 = \frac{\sum_x n_x (\bar{y}_x - \bar{y})^2}{\sum_{x,i} (y_{xi} - \bar{y})^2},$$

which can be written as

$$\eta^2 = \frac{\sigma_{\bar{y}}^2}{\sigma_y^2}, \quad\text{where}\quad \sigma_{\bar{y}}^2 = \frac{\sum_x n_x (\bar{y}_x - \bar{y})^2}{\sum_x n_x} \quad\text{and}\quad \sigma_y^2 = \frac{\sum_{x,i} (y_{xi} - \bar{y})^2}{\sum_x n_x},$$

i.e. the weighted variance of the category means divided by the variance of all samples.
If the relationship between values of $x$ and values of $\bar{y}_x$ is linear (which is certainly true when there are only two possibilities for $x$), this gives the same result as the square of the Pearson correlation coefficient; otherwise the correlation ratio will be larger in magnitude. It can therefore be used for judging non-linear relationships.
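The definition can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `eta_squared` and `pearson_r` and the sample data are made up for this example and do not come from any library. The snippet also checks the two-category case, where the squared correlation ratio should coincide with the squared Pearson correlation coefficient.

```python
from math import sqrt


def eta_squared(groups):
    """Squared correlation ratio for a mapping {category: list of observations}.

    Hypothetical helper written for this illustration.
    """
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(all_values) / len(all_values)
    # Numerator: category size times squared deviation of the category mean
    # from the overall mean, summed over categories.
    between = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    # Denominator: squared deviations of every observation from the overall mean.
    total = sum((y - grand_mean) ** 2 for y in all_values)
    if total == 0:
        raise ValueError("eta is undefined when all observations are equal")
    return between / total


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient (hypothetical helper)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Made-up data with only two categories, coded 0 and 1.
two_groups = {0: [1.0, 2.0, 4.0], 1: [3.0, 5.0, 7.0, 9.0]}
xs = [x for x, ys in two_groups.items() for _ in ys]
ys = [y for group in two_groups.values() for y in group]

eta2 = eta_squared(two_groups)
r = pearson_r(xs, ys)
print(eta2, r ** 2)  # the two values agree up to rounding error
```

With more than two categories and a non-linear pattern in the category means, `eta2` would exceed `r ** 2`.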
Range
The correlation ratio $\eta$ takes values between 0 and 1. The limit $\eta = 0$ represents the special case of no dispersion among the means of the different categories, while $\eta = 1$ refers to no dispersion within the respective categories. Note further that $\eta$ is undefined when all data points of the complete population take the same value.
Example
Suppose there is a distribution of test scores in three topics (categories):
- Algebra: 45, 70, 29, 15 and 21 (5 scores)
- Geometry: 40, 20, 30 and 42 (4 scores)
- Statistics: 65, 95, 80, 70, 85 and 73 (6 scores).
Then the subject averages are 36, 33 and 78, with an overall average of 52.
The sums of squares of the differences from the subject averages are 1952 for Algebra, 308 for Geometry and 600 for Statistics, adding to 2860, while the overall sum of squares of the differences from the overall average is 9640. The difference between these, 6780, is also the weighted sum of the squares of the differences between the subject averages and the overall average:

$$5(36 - 52)^2 + 4(33 - 52)^2 + 6(78 - 52)^2 = 6780.$$

This gives

$$\eta^2 = \frac{6780}{9640} \approx 0.7033,$$

suggesting that most of the overall dispersion is a result of differences between topics, rather than within topics. Taking the square root gives

$$\eta = \sqrt{\frac{6780}{9640}} \approx 0.8386.$$
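The arithmetic of this example can be reproduced with a short Python sketch. The variable names are chosen for this illustration only. The scores are the ones listed above.

```python
from math import sqrt

# Test scores from the example, keyed by topic.
scores = {
    "Algebra": [45, 70, 29, 15, 21],
    "Geometry": [40, 20, 30, 42],
    "Statistics": [65, 95, 80, 70, 85, 73],
}

all_scores = [s for group in scores.values() for s in group]
overall_mean = sum(all_scores) / len(all_scores)            # 52.0
means = {topic: sum(v) / len(v) for topic, v in scores.items()}

# Sum of squared differences from each topic's own average.
within_ss = {
    topic: sum((s - means[topic]) ** 2 for s in v) for topic, v in scores.items()
}
# Sum of squared differences from the overall average.
total_ss = sum((s - overall_mean) ** 2 for s in all_scores)  # 9640.0
# Weighted sum of squared differences between topic averages and overall average.
between_ss = sum(
    len(v) * (means[topic] - overall_mean) ** 2 for topic, v in scores.items()
)                                                            # 6780.0

eta_sq = between_ss / total_ss
eta = sqrt(eta_sq)
print(means, within_ss, total_ss, between_ss)
print(round(eta_sq, 4), round(eta, 4))
```

The check `total_ss - sum(within_ss.values()) == between_ss` confirms the decomposition of the overall sum of squares into its within-topic and between-topic parts.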
Observe that for $\eta = 1$ the overall sample dispersion is purely due to dispersion among the categories and not at all due to dispersion within the individual categories. For a quick illustration, imagine all Algebra, Geometry and Statistics scores being the same within each topic, e.g. five scores of 36, four scores of 33 and six scores of 78.
The limit $\eta = 0$ refers to the case in which dispersion among the categories contributes nothing to the overall dispersion. The trivial requirement for this extreme is that all category means are the same.
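Both limiting cases can be checked numerically. The sketch below is illustrative only: `eta_sq_of` is a hypothetical helper, the first data set is the constant-score scenario described above, and the second is invented so that all category means equal 50.

```python
def eta_sq_of(groups):
    """Squared correlation ratio for {category: observations} (hypothetical helper)."""
    values = [y for ys in groups.values() for y in ys]
    grand_mean = sum(values) / len(values)
    between = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    total = sum((y - grand_mean) ** 2 for y in values)
    return between / total


# No dispersion within categories: every score equals its topic average.
no_within = {"Algebra": [36] * 5, "Geometry": [33] * 4, "Statistics": [78] * 6}

# No dispersion among category means: each category averages 50 (invented data).
equal_means = {"A": [40, 60], "B": [50, 50], "C": [30, 50, 70]}

upper = eta_sq_of(no_within)    # all dispersion lies between categories
lower = eta_sq_of(equal_means)  # all dispersion lies within categories
print(upper, lower)
```

Passing a data set in which every observation is identical would divide by zero, which matches the remark that the ratio is undefined in that case.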
Pearson v. Fisher
The correlation ratio was introduced by Karl Pearson as part of analysis of variance. Ronald Fisher commented:

"As a descriptive statistic the utility of the correlation ratio is extremely limited. It will be noticed that the number of degrees of freedom in the numerator of $\eta^2$ depends on the number of the arrays"

to which Egon Pearson (Karl's son) responded by saying

"Again, a long-established method such as the use of the correlation ratio [§45 The "Correlation Ratio" η] is passed over in a few words without adequate description, which is perhaps hardly fair to the student who is given no opportunity of judging its scope for himself."