Statistical randomness
A numeric sequence is said to be statistically random when it contains no recognizable patterns or regularities; sequences such as the results of an ideal dice roll, or the digits of π, exhibit statistical randomness.
Statistical randomness does not necessarily imply "true" randomness, i.e., objective unpredictability. Pseudorandomness is sufficient for many uses, such as statistics, hence the name statistical randomness.
Global randomness and local randomness are different. Most philosophical conceptions of randomness are global, because they are based on the idea that "in the long run" a sequence looks truly random, even if certain sub-sequences would not look random. In a "truly" random sequence of numbers of sufficient length, for example, it is probable there would be long sequences of nothing but zeros, though on the whole the sequence might be random. Local randomness refers to the idea that there can be minimum sequence lengths in which random distributions are approximated. Long stretches of the same digits, even those generated by "truly" random processes, would diminish the "local randomness" of a sample (for example, a sample might be locally random only over sequences of 10,000 digits, while sub-sequences of fewer than 1,000 digits might not appear random at all).
A sequence exhibiting a pattern is not thereby proved not statistically random. According to principles of Ramsey theory, sufficiently large objects must necessarily contain a given substructure ("complete disorder is impossible").
Legislation concerning gambling imposes certain standards of statistical randomness on slot machines.
Tests
The first tests for random numbers were published by M.G. Kendall and Bernard Babington Smith in the Journal of the Royal Statistical Society in 1938. They were built on statistical tools such as Pearson's chi-squared test that were developed to distinguish whether experimental phenomena matched their theoretical probabilities. Pearson developed his test originally by showing that a number of dice experiments by W.F.R. Weldon did not display "random" behavior.
Kendall and Smith's original four tests were hypothesis tests, which took as their null hypothesis the idea that each number in a given random sequence had an equal chance of occurring, and that various other patterns in the data should also be distributed equiprobably.
- The frequency test was very basic: it checked that there were roughly the same number of 0s, 1s, 2s, 3s, etc.
- The serial test did the same thing but for sequences of two digits at a time (00, 01, 02, etc.), comparing their observed frequencies with the frequencies predicted if they were equally distributed.
- The poker test tested for certain sequences of five numbers at a time (aaaaa, aaaab, aaabb, etc.), based on hands in the game poker.
- The gap test looked at the distances between zeroes (00 would be a distance of 0, 030 would be a distance of 1, 02250 would be a distance of 3, etc.).
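The frequency, serial, and gap tests described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Kendall and Smith's original procedure: the function names are hypothetical, the serial test is assumed to use non-overlapping pairs, and the functions return raw chi-squared statistics or gap lengths rather than a pass/fail verdict.

```python
from collections import Counter


def chi_squared_uniform(counts, categories):
    """Pearson's chi-squared statistic against a uniform expectation.

    `counts` maps category -> observed count; `categories` lists every
    category that could occur (absent ones count as zero).
    """
    categories = list(categories)
    total = sum(counts.get(c, 0) for c in categories)
    expected = total / len(categories)
    return sum((counts.get(c, 0) - expected) ** 2 / expected for c in categories)


def frequency_statistic(digits):
    """Frequency test: are the ten digits roughly equally common?"""
    return chi_squared_uniform(Counter(digits), range(10))


def serial_statistic(digits):
    """Serial test: are the 100 possible digit pairs roughly equally common?

    Assumption: pairs are taken without overlap (positions 0-1, 2-3, ...).
    """
    pairs = Counter(zip(digits[0::2], digits[1::2]))
    categories = [(a, b) for a in range(10) for b in range(10)]
    return chi_squared_uniform(pairs, categories)


def zero_gaps(digits):
    """Gap test input: count of non-zero digits between successive zeros."""
    positions = [i for i, d in enumerate(digits) if d == 0]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]
```

With these definitions, `zero_gaps([0, 2, 2, 5, 0])` yields a single gap of 3, matching the 02250 example in the list. A perfectly balanced digit sequence gives a frequency statistic of 0, and the statistic grows as the digit counts drift away from equality.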
If a given sequence was able to pass all of these tests within a given degree of significance (generally 5%), then it was judged to be, in their words, "locally random". Kendall and Smith differentiated "local randomness" from "true randomness" in that many sequences generated with truly random methods might not display "local randomness" to a given degree: very large sequences might contain many rows of a single digit. Such a row might be "random" on the scale of the entire sequence, but a smaller block containing it would not be "random" (it would not pass their tests), and would be useless for a number of statistical applications.
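The difference between passing a test over the whole sequence and passing it block by block can be illustrated with a small sketch. The block length, the function names, and the choice of the frequency test alone are assumptions made for illustration; the constant 16.919 is the upper 5% point of the chi-squared distribution with 9 degrees of freedom.

```python
from collections import Counter

# Upper 5% point of the chi-squared distribution with 9 degrees of freedom
# (ten digit categories minus one).
CHI2_CRITICAL_9DF_5PCT = 16.919


def frequency_statistic(digits):
    """Chi-squared statistic for the ten digit counts against uniformity."""
    counts = Counter(digits)
    expected = len(digits) / 10
    return sum((counts.get(d, 0) - expected) ** 2 / expected for d in range(10))


def failing_blocks(digits, block_length):
    """Return start offsets of full blocks that fail the frequency test at 5%."""
    failures = []
    for start in range(0, len(digits) - block_length + 1, block_length):
        block = digits[start:start + block_length]
        if frequency_statistic(block) > CHI2_CRITICAL_9DF_5PCT:
            failures.append(start)
    return failures


# A contrived sequence: 100 zeros, then 100 ones, and so on up to 100 nines.
digits = [d for d in range(10) for _ in range(100)]
whole_statistic = frequency_statistic(digits)    # balanced over the whole sequence
bad_block_starts = failing_blocks(digits, 100)   # every block of 100 fails
```

The contrived sequence is perfectly balanced over its full length, yet every block of 100 digits fails. It would of course fail the serial test globally; it only serves to show that the scale at which a test is applied matters. With genuinely random digits, roughly 5% of blocks would be expected to exceed the threshold by chance alone.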
As random number sets became more and more common, more tests of increasing sophistication were used. Some modern tests plot random digits as points in three-dimensional space, which can then be rotated to look for hidden patterns. In 1995, the statistician George Marsaglia created a set of tests known as the diehard tests, which he distributed with a CD-ROM of 5 billion pseudorandom numbers.
Pseudorandom number generators can be verified only by such tests, because their output is produced not by "truly random" processes but by deterministic algorithms. Over the history of random number generation, many sources of numbers thought to appear "random" under testing have later been discovered to be very non-random when subjected to certain types of tests. The notion of quasi-random numbers was developed to circumvent some of these problems, though pseudorandom number generators are still extensively used in many applications (even ones known to be extremely "non-random"), as they are "good enough" for most applications.
Other tests:
- The Monobit test treats each output bit of the random number generator as a coin flip and determines whether the observed numbers of heads and tails are close to the expected 50% frequency. The number of heads in a coin-flip trial follows a binomial distribution.
- The Wald–Wolfowitz runs test counts the bit transitions between 0 bits and 1 bits, comparing the observed frequency with the expected frequency for a random bit sequence.
- Information entropy
- Autocorrelation test
- Kolmogorov–Smirnov test
- The Spectral Test
- Maurer's Universal Statistical Test
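The first two tests in this list are simple enough to sketch directly. The following is a minimal illustration under stated assumptions, not a reference implementation from any test suite: the function names are hypothetical, both functions return a z-score based on the normal approximation, and the input is assumed to be a list of 0/1 integers long enough for that approximation to be reasonable.

```python
import math


def monobit_z(bits):
    """Monobit test: z-score of the number of 1 bits under a fair-coin model.

    Under the null hypothesis the count of ones is Binomial(n, 1/2), with
    mean n/2 and standard deviation sqrt(n)/2.
    """
    n = len(bits)
    ones = sum(bits)
    return (ones - n / 2) / (math.sqrt(n) / 2)


def runs_z(bits):
    """Wald-Wolfowitz runs test: z-score of the observed number of runs.

    Uses the normal approximation, conditional on the observed counts of
    ones (n1) and zeros (n0). Assumes more than two bits, so that the
    variance below is positive.
    """
    n1 = sum(bits)
    n0 = len(bits) - n1
    n = n0 + n1
    if n0 == 0 or n1 == 0:
        raise ValueError("runs test needs at least one 0 and one 1")
    # A new run starts at every transition between neighbouring bits.
    runs = 1 + sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    mean = 2 * n0 * n1 / n + 1
    variance = 2 * n0 * n1 * (2 * n0 * n1 - n) / (n ** 2 * (n - 1))
    return (runs - mean) / math.sqrt(variance)
```

An absolute z-score above about 1.96 corresponds to rejection at the 5% level in a two-sided test. The strictly alternating sequence `[0, 1] * 50` passes the monobit test with a z-score of 0 but has far too many runs, which shows why a single test is not sufficient on its own.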
See also
- Checking if a coin is fair
- Normal number
- Randomness
- Random number
- Statistical hypothesis testing
- One-time pad
- Randomness tests
- Seven states of randomness
- Complete spatial randomness
External links
- DieHarder: A free (GPL) C Random Number Test Suite.
- Generating Normal Distributed Random Numbers