Volcano plot (statistics)
In statistics, a volcano plot is a type of scatter plot used to quickly identify changes in large datasets composed of replicate data. It plots significance on the y-axis against fold change on the x-axis. These plots are increasingly common in omics experiments such as genomics, proteomics, and metabolomics, where one often has many thousands of replicate data points measured under two conditions and wishes to quickly identify the most meaningful changes. A volcano plot combines the result of a statistical test (e.g., a p-value, such as one obtained from ANOVA) with the magnitude of the change, enabling quick visual identification of those data points (genes, etc.) that display large-magnitude changes that are also statistically significant.
A volcano plot is constructed by plotting the negative logarithm of the p-value (usually base 10) on the y-axis. As a result, data points with low p-values (highly significant) appear towards the top of the plot. The x-axis is the logarithm of the fold change between the two conditions. The logarithm is used so that changes in both directions (up and down) appear equidistant from the center; for example, with base-2 logarithms a two-fold increase maps to +1 and a two-fold decrease maps to -1. Plotting points in this way produces two regions of interest: points towards the top of the plot that lie far to either the left- or the right-hand side. These represent values that display large-magnitude fold changes (hence lying left or right of center) as well as high statistical significance (hence lying towards the top).
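The construction above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names, the choice of base 2 for the fold-change axis, and the cutoffs (a two-fold change and p < 0.05) are assumptions made for the example and do not come from the article.

```python
import math

def volcano_coordinates(fold_changes, p_values):
    """Return (x, y) pairs with x = log2(fold change) and y = -log10(p-value).

    Base 2 on the x-axis is an assumption for this sketch; the article only
    says "the log of the fold change".
    """
    return [(math.log2(fc), -math.log10(p))
            for fc, p in zip(fold_changes, p_values)]

def regions_of_interest(points, min_abs_log2_fc=1.0, max_p=0.05):
    """Label each point "up", "down", or "ns" (not in a region of interest).

    The default cutoffs are hypothetical and chosen only for illustration.
    """
    y_cut = -math.log10(max_p)  # points above this line have p <= max_p
    labels = []
    for x, y in points:
        if y >= y_cut and x >= min_abs_log2_fc:
            labels.append("up")      # top right: large, significant increase
        elif y >= y_cut and x <= -min_abs_log2_fc:
            labels.append("down")    # top left: large, significant decrease
        else:
            labels.append("ns")
    return labels

# Four made-up data points: fold change and p-value for each.
points = volcano_coordinates([4.0, 0.25, 1.0, 2.0], [0.001, 0.001, 0.5, 0.2])
labels = regions_of_interest(points)
```

In this made-up input, the four-fold increase and the four-fold decrease land at x = 2 and x = -2, equidistant from the center, and both sit high on the y-axis because their p-values are small.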
Additional information can be added by coloring the points according to a third dimension of data (such as signal intensity), but this is not uniformly employed.
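One way to picture the coloring step is as a mapping from the third variable onto a color scale. The sketch below assumes a simple linear blue-to-red scale; the function name `intensity_to_hex` and the choice of endpoint colors are hypothetical, and plotting libraries usually offer their own colormaps for this purpose.

```python
def intensity_to_hex(values, low=(0, 0, 255), high=(255, 0, 0)):
    """Map each value linearly onto a hex color between `low` and `high` (RGB).

    The smallest value gets the `low` color and the largest gets `high`.
    """
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    colors = []
    for v in values:
        t = (v - lo) / span  # position of v within the range, from 0 to 1
        r, g, b = (round(a + t * (c - a)) for a, c in zip(low, high))
        colors.append(f"#{r:02x}{g:02x}{b:02x}")
    return colors

# Made-up signal intensities for three data points.
colors = intensity_to_hex([10.0, 20.0, 30.0])
```

Each returned color can then be assigned to the corresponding point on the plot, so that position encodes fold change and significance while color encodes the third quantity.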