Quantitative parasitology
Counting parasites
Quantifying parasites in a sample of hosts or comparing measures of infection across two or more samples can be challenging. The parasitic infection of a sample of hosts inherently exhibits a complex pattern that cannot be adequately quantified by a single statistical measure. Because two or more separate indices are advisable, two or more separate statistical tests are needed to compare infections reliably across different samples of hosts.
A few of the available statistical measures have markedly different biological interpretations, while others have more or less overlapping interpretations or no clear interpretation at all. Therefore, one should apply measures that have clear and separate biological interpretations and thus do not predict each other.
Parasite individuals typically exhibit an aggregated (right-skewed) distribution among host individuals: most hosts harbour few if any parasites, and a few hosts harbour many of them. This quantitative feature of parasitism renders many traditional statistical methods inappropriate and requires the use of more advanced, computer-intensive statistical methods.
How to describe the parasitic infection of a sample of hosts
Always give the host sample size. In most cases, this is expressed as the number of host individuals examined. (Exceptionally, other units may be used in special cases.)
Describe prevalence. This is the proportion of infected hosts among all the hosts examined. Give the confidence interval (CI) of prevalence (either as a Clopper-Pearson interval or as an adjusted Wald or Sterne's interval) to indicate the accuracy of the estimate; 95% confidence intervals are advisable.
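As an illustration of the prevalence step, the following is a minimal sketch that computes prevalence and a Clopper-Pearson interval using only the Python standard library. The function names (`binom_cdf`, `bisect_root`, `clopper_pearson`) and the example counts are hypothetical, and the limits are found by simple bisection on the binomial distribution rather than by any particular published routine.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect_root(f, lo=0.0, hi=1.0, iters=60):
    """Root of a function that is positive near lo and negative near hi."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(infected, n, conf=0.95):
    """Exact (Clopper-Pearson) confidence interval for prevalence."""
    tail = (1 - conf) / 2
    # Lower limit: the p at which P(X >= infected) equals the tail probability.
    lower = 0.0 if infected == 0 else bisect_root(
        lambda p: tail - (1 - binom_cdf(infected - 1, n, p)))
    # Upper limit: the p at which P(X <= infected) equals the tail probability.
    upper = 1.0 if infected == n else bisect_root(
        lambda p: binom_cdf(infected, n, p) - tail)
    return lower, upper

# Hypothetical parasite counts, one value per host examined.
counts = [0, 0, 0, 0, 1, 1, 2, 3, 5, 18]
n_hosts = len(counts)
n_infected = sum(c > 0 for c in counts)
print("prevalence:", n_infected / n_hosts)
print("95% CI:", clopper_pearson(n_infected, n_hosts))
```

The Clopper-Pearson interval tends to be conservative (wider than necessary), which is one reason the alternatives named above may be preferred.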
Describe mean intensity. This is the mean number of parasites found in the infected hosts (the zeros of uninfected hosts are excluded). Given the sample size and prevalence, mean intensity determines the total quantity of parasites found in the sample of hosts. Because of the typical aggregated (right-skewed) distribution of parasites, its actual value is highly dependent on a few extremely infected hosts. Also give a CI to indicate the accuracy of the estimate. Use the bias-corrected and accelerated bootstrap (BCa bootstrap) to obtain this confidence interval.
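The BCa bootstrap interval for mean intensity can be sketched as follows. This is a compact standard-library illustration, not a reference implementation: the function name `bca_ci`, the number of replicates, the fixed seed, and the example data are all assumptions made for the example.

```python
import random
from statistics import NormalDist, mean

def bca_ci(data, stat=mean, n_boot=2000, conf=0.95, seed=0):
    """Bias-corrected and accelerated (BCa) bootstrap confidence interval."""
    data = list(data)
    n = len(data)
    rng = random.Random(seed)
    theta = stat(data)
    # Bootstrap replicates: resample hosts with replacement.
    boots = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    nd = NormalDist()
    # Bias correction z0 from the share of replicates below the estimate
    # (clamped so that the normal quantile stays finite).
    share = sum(b < theta for b in boots) / n_boot
    share = min(max(share, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    z0 = nd.inv_cdf(share)
    # Acceleration a from leave-one-out (jackknife) estimates.
    jack = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    jm = mean(jack)
    den = 6 * sum((jm - j) ** 2 for j in jack) ** 1.5
    a = sum((jm - j) ** 3 for j in jack) / den if den else 0.0

    def adjusted(q):
        # Map a nominal quantile level to its BCa-adjusted level.
        z = nd.inv_cdf(q)
        return nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def pick(q):
        return boots[min(n_boot - 1, max(0, int(q * n_boot)))]

    alpha = 1 - conf
    return pick(adjusted(alpha / 2)), pick(adjusted(1 - alpha / 2))

# Hypothetical counts; intensity uses infected hosts only (zeros excluded).
counts = [0, 0, 0, 1, 1, 2, 2, 3, 3, 4, 5, 7, 9, 14, 38]
intensities = [c for c in counts if c > 0]
print("mean intensity:", mean(intensities))
print("95% BCa CI:", bca_ci(intensities))
```

With right-skewed data the resulting interval is usually asymmetric around the mean, extending further on the upper side.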
Describe median intensity. This is the median number of parasites found in infected hosts (the zeros of uninfected hosts are excluded). Median intensity shows a typical level of infection among the infected hosts. Give an exact CI to indicate the accuracy of the estimate.
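One common way to obtain an exact CI for a median is to pick a pair of order statistics whose coverage, computed from the Binomial(n, 1/2) distribution, is at least the requested level. The sketch below assumes that construction; the function name `median_exact_ci` and the example data are hypothetical.

```python
from math import comb

def median_exact_ci(values, conf=0.95):
    """Order-statistic CI for the median with coverage of at least conf.

    Returns (lower, upper, achieved_coverage).
    """
    xs = sorted(values)
    n = len(xs)
    alpha = 1 - conf
    cdf = 0.0
    k = 0
    # Find the largest k with P(B <= k - 1) <= alpha / 2, B ~ Binomial(n, 1/2).
    for i in range(n + 1):
        cdf += comb(n, i) / 2**n
        if cdf > alpha / 2:
            break
        k = i + 1
    if k == 0:
        raise ValueError("sample too small for the requested confidence level")
    coverage = 1 - 2 * sum(comb(n, i) for i in range(k)) / 2**n
    # The interval runs from the k-th smallest to the k-th largest value.
    return xs[k - 1], xs[n - k], coverage

# Hypothetical intensities (infected hosts only).
intensities = [1, 1, 2, 2, 3, 3, 4, 5, 7, 9]
print(median_exact_ci(intensities))
```

Because only certain coverage levels are attainable with a finite sample, the achieved coverage is returned alongside the interval and is typically somewhat above the nominal level.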
In certain cases one may prefer to use mean abundance instead of mean intensity. This is the mean number of parasites found in all hosts (including the zero values of uninfected hosts). Give a BCa bootstrap confidence interval to indicate the accuracy of this estimate. This measure combines two of the former ones: prevalence and mean intensity. Do not use it unless you have a clearly specified reason to prefer it.
Describing mean crowding (intensity values averaged across parasite individuals) and its confidence interval is essential only for those who study density-dependent characters of parasites. A BCa bootstrap CI can be used to indicate the accuracy of the estimate.
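To make the definition of mean crowding concrete: if each parasite is assigned the intensity of the host it lives in, averaging over parasites weights each host by its own intensity. The short sketch below assumes that reading of the definition; the function name `mean_crowding` and the example counts are hypothetical.

```python
def mean_crowding(counts):
    """Intensity averaged across parasite individuals.

    A host with c parasites contributes c parasites, each experiencing
    an intensity of c, so the average is sum(c * c) / sum(c).
    """
    total = sum(counts)
    if total == 0:
        raise ValueError("no parasites in the sample")
    return sum(c * c for c in counts) / total

# Hypothetical counts; uninfected hosts (zeros) do not affect the result.
counts = [0, 0, 1, 2, 3]
print(mean_crowding(counts))
```

In an aggregated distribution, mean crowding is larger than mean intensity because most parasites live in the few heavily infected hosts.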
Finally, quantify the level of skewness of the parasites' distribution among hosts. Three indices are widely used for this purpose, but their interpretations are quite similar. They predict each other rather well, so it is not necessary to use all three of them.
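The text does not name the three indices. As an assumption made purely for illustration, the sketch below uses the variance-to-mean ratio, a commonly used index of aggregation; the function name and the counts are hypothetical.

```python
from statistics import mean, variance

def variance_to_mean_ratio(counts):
    """Sample variance divided by the mean, computed over all hosts."""
    return variance(counts) / mean(counts)

# Hypothetical counts, one value per host (zeros included).
counts = [0, 0, 0, 0, 1, 1, 2, 3, 5, 18]
print(variance_to_mean_ratio(counts))
```

Values well above 1 indicate that parasites are more aggregated among hosts than a random (Poisson) distribution would predict.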
How to compare the parasite burdens across two or more samples
Compare prevalences by Fisher's exact test. This will show whether the proportion of infected individuals differs significantly between the two (or more) samples. The computation time of this test may increase dramatically when several samples are involved. Using the chi-squared test for the same purpose may be advisable in such cases.
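For the two-sample case, Fisher's exact test can be sketched with the standard library alone. The version below handles only a 2x2 table (infected and uninfected hosts in two samples) and uses the common two-sided convention of summing the probabilities of all tables that are no more likely than the observed one; the function name and the example numbers are hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    a, b: infected and uninfected hosts in sample 1
    c, d: infected and uninfected hosts in sample 2
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x infected hosts in sample 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum over all tables at most as probable as the observed one
    # (the small tolerance guards against floating-point ties).
    p = sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)

# Hypothetical data: 12 of 40 hosts infected versus 25 of 42.
print(fisher_exact_2x2(12, 28, 25, 17))
```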
Compare mean intensities by a bootstrap t-test. This will show whether parasite quantities differ significantly between the infected subsets of the two samples.
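One common form of the two-sample bootstrap t-test shifts both samples to a common mean (so that the null hypothesis holds), resamples each, and compares the resampled Welch t statistics with the observed one. The sketch below assumes that form; the function names, the number of replicates, the seed, and the data are hypothetical.

```python
import random
from statistics import mean, variance

def welch_t(x, y):
    """Welch t statistic, or None if the standard error is zero."""
    se = (variance(x) / len(x) + variance(y) / len(y)) ** 0.5
    return (mean(x) - mean(y)) / se if se > 0 else None

def bootstrap_t_test(x, y, n_boot=2000, seed=0):
    """Two-sided bootstrap p-value for a difference in means."""
    x, y = list(x), list(y)
    rng = random.Random(seed)
    t_obs = welch_t(x, y)
    if t_obs is None:
        raise ValueError("both samples are constant")
    # Shift both samples to the pooled mean so the null hypothesis holds.
    pooled = mean(x + y)
    x0 = [v - mean(x) + pooled for v in x]
    y0 = [v - mean(y) + pooled for v in y]
    hits = valid = 0
    for _ in range(n_boot):
        t = welch_t(rng.choices(x0, k=len(x0)), rng.choices(y0, k=len(y0)))
        if t is None:
            continue  # skip degenerate resamples with zero variance
        valid += 1
        hits += abs(t) >= abs(t_obs)
    return hits / valid

# Hypothetical intensities (infected hosts only) from two host samples.
sample_a = [1, 2, 1, 3, 2, 1, 2, 4, 1, 2]
sample_b = [20, 35, 18, 50, 27, 31, 22, 40, 26, 33]
print(bootstrap_t_test(sample_a, sample_b))
```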
Compare median intensities by Mood's median test. This will show whether the typical level of infection differs significantly between the infected subsets of the two samples.
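Mood's median test classifies every value as above or not above the pooled median and then tests the resulting contingency table. The sketch below covers two samples and uses the chi-squared approximation with one degree of freedom and no continuity correction; those choices, the function name, and the data are assumptions made for the example.

```python
from math import erfc, sqrt
from statistics import median

def moods_median_test(x, y):
    """Mood's median test for two samples; returns (chi2, p_value)."""
    grand = median(list(x) + list(y))
    # Rows: samples. Columns: values above / not above the pooled median.
    table = [[sum(v > grand for v in s), sum(v <= grand for v in s)]
             for s in (x, y)]
    n = len(x) + len(y)
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in cols:
        raise ValueError("all values fall on one side of the pooled median")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with 1 df.
    return stat, erfc(sqrt(stat / 2))

# Hypothetical intensities (infected hosts only) from two host samples.
print(moods_median_test([1, 1, 2, 2, 3], [5, 6, 7, 8, 9]))
```

With small samples, an exact test on the same 2x2 table may be preferable to the chi-squared approximation.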
One can also compare the frequency distributions of intensities by a Stochastic equality test. It compares several random pairs of individual values taken from the two samples to test whether or not there is a significant tendency to get higher values from one sample than from the other.
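The quantity behind stochastic equality can be illustrated by estimating the probability that a value drawn from one sample exceeds a value drawn from the other, counting ties as one half; under stochastic equality this probability is 0.5. The sketch below is not the published test procedure: it simply computes the estimate over all pairs and attaches a percentile bootstrap interval, and all names, settings, and data are hypothetical.

```python
import random

def stochastic_superiority(x, y):
    """Estimate of P(X > Y) + 0.5 * P(X = Y) over all pairs of values."""
    wins = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    return wins / (len(x) * len(y))

def percentile_ci(x, y, n_boot=2000, conf=0.95, seed=0):
    """Percentile bootstrap interval for the estimate above."""
    x, y = list(x), list(y)
    rng = random.Random(seed)
    reps = sorted(
        stochastic_superiority(rng.choices(x, k=len(x)),
                               rng.choices(y, k=len(y)))
        for _ in range(n_boot))
    alpha = 1 - conf
    return reps[int(alpha / 2 * n_boot)], reps[int((1 - alpha / 2) * n_boot) - 1]

# Hypothetical intensities (infected hosts only) from two host samples.
sample_a = [1, 1, 2, 3, 3, 5, 8, 21]
sample_b = [2, 4, 6, 9, 12, 15, 30, 44]
print(stochastic_superiority(sample_a, sample_b))
print(percentile_ci(sample_a, sample_b))
```

An interval that excludes 0.5 suggests a tendency for one sample to yield higher values than the other.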
In certain cases, one may also decide to compare mean abundances by a Bootstrap t-test. This will show whether parasite quantities differ significantly between two samples. This comparison unifies two of the former ones: the comparison of prevalences and the comparison of mean intensities.
Finally, mean crowding can be compared across samples by a simple method: provided that the two 97.5% confidence intervals do not overlap, we conclude that the two values differ at the 5% level of significance.
Avoid typical mistakes
Do not use the geometric mean, because this measure is hard to interpret biologically.
Do not use the usual arithmetic mean ± standard deviation (mean ± SD) to describe levels of infection, because this is useful only for normal distributions and not for the aggregated (right-skewed) distributions that characterize parasites. Use confidence intervals to quantify the accuracy of estimates.
Avoid overstatements when interpreting the results.
Literature
- Lotka AJ 1923. Contribution to quantitative parasitology. Journal of the Washington Academy of Sciences, 13, 152-158. (a population dynamics model, not biostatistics)
- Morales G, Arelis Pino L 1987. Parasitología cuantitativa. Acta Científica Venezolana, p 132.
- Bush AO, Lafferty KD, Lotz JM, Shostak AW 1997. Parasitology meets ecology on its own terms: Margolis et al. revisited. Journal of Parasitology, 83, 575–583.
- Crofton HD 1971. A quantitative approach to parasitism. Parasitology, 62, 179-193.
- Rózsa L, Reiczigel J, Majoros G 2000. Quantifying parasites in samples of hosts. Journal of Parasitology, 86, 228-232.
- Reiczigel J 2003. Confidence intervals for the binomial parameter: some new considerations. Statistics in Medicine, 22, 611-621.
- Neuhauser M, Poulin R 2004. Comparing parasite numbers between samples of hosts. Journal of Parasitology, 90, 689-691.
- Reiczigel J, Lang Z, Rózsa L, Tóthmérész B 2005. Properties of crowding indices and statistical tools to analyze crowding data. Journal of Parasitology, 91, 245-252.
- Reiczigel J, Zakariás I, Rózsa L 2005. A Bootstrap Test of Stochastic Equality of Two Populations. The American Statistician, 59, 156-161.