History of statistics
The history of statistics can be said to start around 1749, although the interpretation of the word statistics has changed over time. In early times, the meaning was restricted to information about states. This was later extended to include collections of information of all types, and later still to include the analysis and interpretation of such data. In modern terms, "statistics" means both sets of collected information, as in national accounts and temperature records, and analytical work which requires statistical inference.

Statistical activities are often associated with models expressed using probabilities, and require probability theory for them to be put on a firm theoretical basis: see History of probability.

A number of statistical concepts have had an important impact on a wide range of sciences. These include the design of experiments and approaches to statistical inference such as Bayesian inference, each of which can be considered to have its own sequence in the development of the ideas underlying modern statistics.

Introduction

By the 18th century, the term "statistics" designated the systematic collection of demographic and economic data by states. In the early 19th century, the meaning of "statistics" broadened to include the discipline concerned with the collection, summary, and analysis of data. Today statistics is widely employed in government, business, and all the sciences. Electronic computers have expedited statistical computation, and have allowed statisticians to develop "computer-intensive" methods.

The term "mathematical statistics" designates the mathematical theories of probability and statistical inference, which are used in statistical practice. The relation between statistics and probability theory developed rather late, however. In the 19th century, statistics increasingly used probability theory, whose initial results were found in the 17th and 18th centuries, particularly in the analysis of games of chance (gambling). By 1800, astronomy used probability models and statistical theories, particularly the method of least squares, which was invented by Legendre and Gauss. Early probability theory and statistics were systematized and extended by Laplace; following Laplace, probability and statistics have been in continual development. In the 19th century, social scientists used statistical reasoning and probability models to advance the new sciences of experimental psychology and sociology; physical scientists used statistical reasoning and probability models to advance the new sciences of thermodynamics and statistical mechanics. The development of statistical reasoning was closely associated with the development of inductive logic and the scientific method.

Statistics is not a field of mathematics but an autonomous mathematical science, like computer science or operations research. Unlike mathematics, statistics had its origins in public administration and maintains a special concern with demography and economics. Being concerned with the scientific method and inductive logic, statistical theory has a close association with the philosophy of science; with its emphasis on learning from data and making best predictions, statistics has great overlap with decision science and microeconomics. With its concern with data, statistics overlaps with information science and computer science.

Etymology

The term statistics is ultimately derived from the New Latin statisticum collegium ("council of state") and the Italian word statista ("statesman" or "politician"). The German Statistik, first introduced by Gottfried Achenwall (1749), originally designated the analysis of data about the state, signifying the "science of state" (then called political arithmetic in English). It acquired the meaning of the collection and classification of data generally in the early 19th century. It was introduced into English in 1791 by Sir John Sinclair when he published the first of 21 volumes titled Statistical Account of Scotland.

Thus, the original principal purpose of Statistik was to supply data for use by governmental and (often centralized) administrative bodies. The collection of data about states and localities continues, largely through national and international statistical services. In particular, censuses provide regular information about the population.

The first book to have "statistics" in its title was "Contributions to Vital Statistics" by Francis G. P. Neison, actuary to the Medical Invalid and General Life Office (1st ed., 1845; 2nd ed., 1846; 3rd ed., 1857).

Origins in probability

The earliest writing on statistics is found in a 9th-century book entitled "Manuscript on Deciphering Cryptographic Messages", written by Al-Kindi (801–873 AD). In his book, Al-Kindi gave a detailed description of how to use statistics and frequency analysis to decipher encrypted messages. This was the birth of both statistics and cryptanalysis.
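
The idea behind frequency analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reconstruction of Al-Kindi's procedure: it assumes English text, a simple Caesar (shift) cipher, and that the most frequent plaintext letter is "e". The function names `letter_frequencies`, `guess_caesar_shift`, and `caesar_shift` are hypothetical.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Return the relative frequency of each letter a-z found in text."""
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    total = len(letters)
    if total == 0:
        return {}
    counts = Counter(letters)
    return {ch: n / total for ch, n in counts.items()}


def caesar_shift(text, shift):
    """Shift every ASCII letter by `shift` positions, leaving other characters alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)


def guess_caesar_shift(ciphertext, most_common_plain="e"):
    """Guess the shift by assuming the most frequent cipher letter
    stands for `most_common_plain` (an assumption about the language)."""
    freqs = letter_frequencies(ciphertext)
    if not freqs:
        raise ValueError("ciphertext contains no letters")
    top = max(freqs, key=freqs.get)
    return (ord(top) - ord(most_common_plain)) % 26


if __name__ == "__main__":
    secret = caesar_shift("meet me here, the eleven geese need feed", 3)
    shift = guess_caesar_shift(secret)
    print(caesar_shift(secret, -shift))
```

The guess is only as good as the assumption about letter frequencies; on short or unusual texts the most frequent letter may not be "e", and the recovered shift will be wrong.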

The "Nuova Cronica", a 14th-century history of Florence by the Florentine banker and official Giovanni Villani, includes much statistical information on population, ordinances, commerce and trade, education, and religious facilities, and has been described as the first introduction of statistics as a positive element in history, though neither the term nor the concept of statistics as a specific field yet existed. This description, however, was shown to be incorrect after the rediscovery of Al-Kindi's book on frequency analysis.

The mathematical methods of statistics emerged from probability theory, which can be dated to the correspondence of Pierre de Fermat and Blaise Pascal (1654). Christiaan Huygens (1657) gave the earliest known scientific treatment of the subject. Jakob Bernoulli's Ars Conjectandi (posthumous, 1713) and Abraham de Moivre's The Doctrine of Chances (1718) treated the subject as a branch of mathematics. See Ian Hacking's The Emergence of Probability and James Franklin's The Science of Conjecture: Evidence and Probability Before Pascal for histories of the early development of the very concept of mathematical probability. In the modern era, the work of Kolmogorov has been instrumental in formulating the fundamental model of probability theory, which is used throughout statistics.

The theory of errors may be traced back to Roger Cotes's Opera Miscellanea (posthumous, 1722), but a memoir prepared by Thomas Simpson in 1755 (printed 1756) first applied the theory to the discussion of errors of observation. The reprint (1757) of this memoir lays down the axioms that positive and negative errors are equally probable, and that there are certain assignable limits within which all errors may be supposed to fall; continuous errors are discussed and a probability curve is given.

Pierre-Simon Laplace (1774) made the first attempt to deduce a rule for the combination of observations from the principles of the theory of probabilities. He represented the law of probability of errors by a curve. He deduced a formula for the mean of three observations. He also gave (1781) a formula for the law of facility of error (a term due to Joseph Louis Lagrange, 1774), but one which led to unmanageable equations. Daniel Bernoulli (1778) introduced the principle of the maximum product of the probabilities of a system of concurrent errors.

The method of least squares, which was used to minimize errors in data measurement, was published independently by Adrien-Marie Legendre (1805), Robert Adrain (1808), and Carl Friedrich Gauss (1809). Gauss had used the method in his famous 1801 prediction of the location of the dwarf planet Ceres. Further proofs were given by Laplace (1810, 1812), Gauss (1823), Ivory (1825, 1826), Hagen (1837), Bessel (1838), Donkin (1844, 1856), Herschel (1850), Crofton (1870), and Thiele (1880, 1889).
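
As a minimal sketch of what the method of least squares does, the following Python code fits a straight line y = a + b·x by choosing a and b to minimize the sum of squared residuals. The straight-line model, the closed-form solution used here, and the function names `fit_line` and `sum_squared_residuals` are illustrative assumptions in modern notation; they are not taken from Legendre's, Adrain's, or Gauss's own presentations.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line through the points."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Return the quantity that least squares minimizes for a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up illustrative measurements
    a, b = fit_line(xs, ys)
    print(a, b, sum_squared_residuals(xs, ys, a, b))
```

Any other choice of intercept and slope gives a sum of squared residuals at least as large as the fitted one, which is the sense in which the method "minimizes errors".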

Other contributors were Ellis (1844), De Morgan (1864), Glaisher (1872), and Giovanni Schiaparelli (1875). Peters's (1856) formula for the "probable error" of a single observation was widely used and inspired early robust statistics (resistant to outliers).

In the 19th century, authors on statistical theory included Laplace, S. Lacroix (1816), Littrow (1833), Dedekind (1860), Helmert (1872), Laurent (1873), Liagre, Didion, De Morgan, Boole, Edgeworth, and K. Pearson.

Adolphe Quetelet (1796–1874), another important founder of statistics, introduced the notion of the "average man" (l'homme moyen) as a means of understanding complex social phenomena such as crime rates, marriage rates, or suicide rates.

Design of experiments

In 1747, while serving as surgeon on HMS Salisbury, James Lind carried out a controlled experiment to develop a cure for scurvy. In this study his subjects' cases "were as similar as I could have them"; that is, he imposed strict entry requirements to reduce extraneous variation. The men were paired, which provided blocking. From a modern perspective, the main thing missing is randomized allocation of subjects to treatments.

A theory of statistical inference was developed by Charles S. Peirce in "Illustrations of the Logic of Science" (1877–1878) and "A Theory of Probable Inference" (1883), two publications that emphasized the importance of randomization-based inference in statistics.

In another study, Peirce randomly assigned volunteers to a blinded, repeated-measures design to evaluate their ability to discriminate weights.
Peirce's experiment inspired other researchers in psychology and education, who developed a research tradition of randomized experiments in laboratories and specialized textbooks in the 1800s.

Charles S. Peirce also contributed the first English-language publication on an optimal design for regression models in 1876. A pioneering optimal design for polynomial regression was suggested by Gergonne in 1815. In 1918 Kirstine Smith published optimal designs for polynomials of degree six (and less).

The use of a sequence of experiments, where the design of each may depend on the results of previous experiments, including the possible decision to stop experimenting, was pioneered by Abraham Wald in the context of sequential tests of statistical hypotheses. Herman Chernoff wrote an overview of optimal sequential designs, while adaptive designs have been surveyed by S. Zacks. One specific type of sequential design is the "two-armed bandit", generalized to the multi-armed bandit, on which early work was done by Herbert Robbins in 1952.

A methodology for designing experiments was proposed by Ronald A. Fisher in his innovative book The Design of Experiments (1935). As an example, he described how to test the hypothesis that a certain lady could distinguish by flavour alone whether the milk or the tea was first placed in the cup. While this sounds like a frivolous application, it allowed him to illustrate the most important ideas of experimental design: see Lady tasting tea.
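The logic of the tea-tasting test can be illustrated with a short calculation. The sketch below assumes the set-up usually given for this experiment (eight cups, four with milk poured first, and a taster asked to pick out the four milk-first cups); these numbers are not stated in the text above and are used only for illustration, and the function name prob_correct is hypothetical. Under the null hypothesis of pure guessing, the number of correctly identified cups follows a hypergeometric distribution.

```python
from math import comb


def prob_correct(k, cups=8, milk_first=4):
    """Chance of picking exactly k of the milk-first cups by guessing alone.

    Assumed design (illustrative): `cups` cups in total, `milk_first` of them
    prepared milk-first, and the taster selects `milk_first` cups.
    """
    ways_right = comb(milk_first, k)                      # correct picks
    ways_wrong = comb(cups - milk_first, milk_first - k)  # incorrect picks
    return ways_right * ways_wrong / comb(cups, milk_first)


# Probability of a perfect score under guessing.
p_all_correct = prob_correct(4)

# Probability of three or more correct under guessing.
p_at_least_three = prob_correct(3) + prob_correct(4)

print(f"P(all four correct)   = {p_all_correct:.4f}")
print(f"P(at least three)     = {p_at_least_three:.4f}")
```

Under these assumptions a perfect score occurs by chance in only 1 of 70 equally likely selections, whereas three or more correct would occur in 17 of 70, which is why only a perfect score would count as notable evidence in a design of this size.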

Inference

Charles S. Peirce (1839–1914) formulated frequentist theories of estimation and hypothesis testing in publications of 1877–1878 and 1883, in which he introduced "confidence". Peirce also introduced blinded, controlled randomized experiments with a repeated measures design. Peirce invented an optimal design for experiments on gravity.

Bayesian statistics

The term Bayesian refers to Thomas Bayes (1702–1761), who proved a special case of what is now called Bayes' theorem. However, it was Pierre-Simon Laplace (1749–1827) who introduced a general version of the theorem and used it to approach problems in celestial mechanics, medical statistics, reliability, and jurisprudence. When insufficient knowledge was available to specify an informed prior, Laplace used uniform priors, according to his "principle of insufficient reason". Laplace also introduced primitive versions of conjugate priors and the theorem of von Mises and Bernstein, according to which the posteriors corresponding to initially differing priors ultimately agree, as the number of observations increases. This early Bayesian inference, which used uniform priors following Laplace's principle of insufficient reason, was called "inverse probability" (because it infers backwards from observations to parameters, or from effects to causes).
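The agreement of posteriors from differing priors can be sketched for the simplest case: success/failure observations with a Beta prior, where the uniform prior is Beta(1, 1). This is an illustrative assumption and not a reconstruction of Laplace's own calculations; the data (70% successes), the second "sceptical" prior Beta(2, 8), and all function names are invented for the demonstration.

```python
def beta_posterior(a, b, successes, failures):
    """Conjugate update: a Beta(a, b) prior plus binomial data gives a Beta posterior."""
    return a + successes, b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


UNIFORM = (1, 1)    # Laplace-style uniform prior on the success probability
SCEPTICAL = (2, 8)  # hypothetical prior that expects few successes


def posterior_mean_gap(n):
    """Gap between the two posterior means after n trials with 70% successes."""
    successes = 7 * n // 10
    failures = n - successes
    mean_uniform = beta_mean(*beta_posterior(*UNIFORM, successes, failures))
    mean_sceptical = beta_mean(*beta_posterior(*SCEPTICAL, successes, failures))
    return abs(mean_uniform - mean_sceptical)


sample_sizes = (10, 100, 10_000)
gaps = [posterior_mean_gap(n) for n in sample_sizes]
for n, gap in zip(sample_sizes, gaps):
    print(f"n = {n:>6}: difference in posterior means = {gap:.5f}")
```

With the uniform prior the posterior mean after s successes in n trials is (s + 1) / (n + 2), and the difference between the two posterior means shrinks steadily as n grows.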

After the 1920s, inverse probability was largely supplanted by a collection of methods that were developed by Ronald A. Fisher, Jerzy Neyman and Egon Pearson. Their methods came to be called frequentist statistics. Fisher rejected the Bayesian view, writing that "the theory of inverse probability is founded upon an error, and must be wholly rejected". At the end of his life, however, Fisher expressed greater respect for the essay of Bayes, which Fisher believed to have anticipated his own fiducial approach to probability; Fisher still maintained that Laplace's views on probability were "fallacious rubbish". Neyman started out as a "quasi-Bayesian", but subsequently developed confidence intervals (a key method in frequentist statistics) because "the whole theory would look nicer if it were built from the start without reference to Bayesianism and priors".
The word Bayesian appeared in the 1930s, and by the 1960s it became the term preferred by those dissatisfied with the limitations of frequentist statistics.

In the 20th century, the ideas of Laplace were further developed in two different directions, giving rise to objective and subjective currents in Bayesian practice. In the objectivist stream, the statistical analysis depends only on the model assumed and the data analysed; no subjective decisions need to be involved. In contrast, "subjectivist" statisticians deny the possibility of fully objective analysis for the general case.

In the further development of Laplace's ideas, subjective ideas predate objectivist positions. The idea that 'probability' should be interpreted as 'subjective degree of belief in a proposition' was proposed, for example, by John Maynard Keynes in the early 1920s. This idea was taken further by Bruno de Finetti in Italy (Fondamenti Logici del Ragionamento Probabilistico, 1930) and Frank Ramsey in Cambridge (The Foundations of Mathematics, 1931). The approach was devised to solve problems with the frequentist definition of probability but also with the earlier, objectivist approach of Laplace. The subjective Bayesian methods were further developed and popularized in the 1950s by L.J. Savage.

Objective Bayesian inference was further developed by Harold Jeffreys, whose seminal book "Theory of Probability" first appeared in 1939. In 1957, Edwin Jaynes promoted the concept of maximum entropy for constructing priors, which is an important principle in the formulation of objective methods, mainly for discrete problems. In 1965, Dennis Lindley's two-volume work "Introduction to Probability and Statistics from a Bayesian Viewpoint" brought Bayesian methods to a wide audience. In 1979, José-Miguel Bernardo introduced reference analysis, which offers a generally applicable framework for objective analysis. Other well-known proponents of Bayesian probability theory include I.J. Good, B.O. Koopman, Howard Raiffa, Robert Schlaifer and Alan Turing.
In the 1980s, there was a dramatic growth in research and applications of Bayesian methods, mostly attributed to the discovery of Markov chain Monte Carlo methods, which removed many of the computational problems, and an increasing interest in nonstandard, complex applications. Despite the growth of Bayesian research, most undergraduate teaching is still based on frequentist statistics. Nonetheless, Bayesian methods are widely accepted and used, for example in the field of machine learning.
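To give a sense of how Markov chain Monte Carlo sidesteps the computational difficulty, the following minimal sketch uses a random-walk Metropolis sampler, one simple member of that family. It draws from the posterior of a success probability after 7 successes and 3 failures under a uniform prior. The data, step size, seed and function names are assumptions chosen for illustration; the exact posterior here is Beta(8, 4), so the sampler is not actually needed and serves only to show the mechanism.

```python
import math
import random


def log_posterior(theta, successes=7, failures=3):
    """Unnormalised log posterior under a uniform prior on (0, 1)."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior density outside the unit interval
    return successes * math.log(theta) + failures * math.log(1.0 - theta)


def metropolis(n_samples, step=0.2, start=0.5, seed=0):
    """Random-walk Metropolis: propose a nearby value, accept by density ratio."""
    rng = random.Random(seed)
    theta = start
    current = log_posterior(theta)
    samples = []
    for _ in range(n_samples):
        proposal = theta + rng.uniform(-step, step)  # symmetric proposal
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


samples = metropolis(20_000)
kept = samples[1_000:]  # discard an initial burn-in stretch
estimate = sum(kept) / len(kept)
print(f"Estimated posterior mean: {estimate:.3f} (exact value is 8/12)")
```

Only ratios of the posterior density are used, so the normalising constant, which is the hard part to compute in complex models, never has to be evaluated.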

Statistics today

During the 20th century, the creation of precise instruments for agricultural research, public health concerns (epidemiology, biostatistics, etc.), industrial quality control, and economic and social purposes (unemployment rate, econometrics, etc.) necessitated substantial advances in statistical practices.

Today the use of statistics has broadened far beyond its origins. Individuals and organizations use statistics to understand data and make informed decisions throughout the natural and social sciences, medicine, business, and other areas.

Statistics is generally regarded not as a subfield of mathematics but rather as a distinct, albeit allied, field. Many universities maintain separate mathematics and statistics departments. Statistics is also taught in departments as diverse as psychology, education, and public health.

The source of this article is Wikipedia, the free encyclopedia. The text of this article is licensed under the GFDL.