Approximate counting algorithm
Encyclopedia
The approximate counting algorithm allows the counting of a large number of events using a small amount of memory. Invented in 1977 by Robert Morris of Bell Labs
, it uses probabilistic techniques
to increment the counter
.
estimate" of the actual count. The approximation is mathematically unbiased.
In order to increment the counter, a pseudo-random event is used, such that the incrementing is a probabilistic event. In order to save space, only the exponent is kept. For example, in base 2, the counter can estimate the count to be 1, 2, 4, 8, 16, 32, and all of the powers of two. The memory requirement is simply to hold the exponent.
As an example, to increment from 4 to 8, a pseudo-random number would be generated such that a probability of .25 generates a positive change in the counter. Otherwise, the counter remains at 4.
The table below illustrates some of the potential values of the counter:
If the counter holds the value of 101, which equates to an exponent of 5 (the decimal equivalent of 101), then the estimated count is 2^5, or 32. There is a very low probability that the actual count of increment events was 5 (which would imply that an extremely rare event occurred with the pseudo-random number generator, the same probability as getting 10 consecutive heads in 10 coin flips). The actual count of increment events is likely to be around 32, but it could be infinitely high (with decreasing probabilities for actual counts above 32).
This can be done programmatically by generating "c" pseudo-random bits (where "c" is the current value of the counter), and using the logical AND function on all of those bits. The result is a zero if any of those pseudo-random bits are zero, and a one if they are all ones. Simply add the result to the counter. This procedure should be executed each time the request is made to increment the counter.
, sight and sound recognition, and other artificial intelligence
applications.
Bell Labs
Bell Laboratories is the research and development subsidiary of the French-owned Alcatel-Lucent and previously of the American Telephone & Telegraph Company , half-owned through its Western Electric manufacturing subsidiary.Bell Laboratories operates its...
, it uses probabilistic techniques
Randomized algorithm
A randomized algorithm is an algorithm which employs a degree of randomness as part of its logic. The algorithm typically uses uniformly random bits as an auxiliary input to guide its behavior, in the hope of achieving good performance in the "average case" over all possible choices of random bits...
to increment the counter
Counter
In digital logic and computing, a counter is a device which stores the number of times a particular event or process has occurred, often in relationship to a clock signal.- Electronic counters :...
.
Theory of operation
Using Morris' algorithm, the counter represents an "order of magnitudeOrder of magnitude
An order of magnitude is the class of scale or magnitude of any amount, where each class contains values of a fixed ratio to the class preceding it. In its most common usage, the amount being scaled is 10 and the scale is the exponent being applied to this amount...
estimate" of the actual count. The approximation is mathematically unbiased.
In order to increment the counter, a pseudo-random event is used, such that the incrementing is a probabilistic event. In order to save space, only the exponent is kept. For example, in base 2, the counter can estimate the count to be 1, 2, 4, 8, 16, 32, and all of the powers of two. The memory requirement is simply to hold the exponent.
As an example, to increment from 4 to 8, a pseudo-random number would be generated such that a probability of .25 generates a positive change in the counter. Otherwise, the counter remains at 4.
The table below illustrates some of the potential values of the counter:
Stored Binary Value of Counter | Approximation | Range of Possible Values for the Actual Count |
---|---|---|
0 | 1 | 0, or initial value |
1 | 2 | 1 or more |
10 | 4 | 2 or more |
11 | 8 | 3 or more |
100 | 16 | 4 or more |
101 | 32 | 5 or more |
If the counter holds the value of 101, which equates to an exponent of 5 (the decimal equivalent of 101), then the estimated count is 2^5, or 32. There is a very low probability that the actual count of increment events was 5 (which would imply that an extremely rare event occurred with the pseudo-random number generator, the same probability as getting 10 consecutive heads in 10 coin flips). The actual count of increment events is likely to be around 32, but it could be infinitely high (with decreasing probabilities for actual counts above 32).
Algorithm
When incrementing the counter, simply "flip a coin" the number of times of the counter's current value. If it comes up "Heads" each time, then increment the counter. Otherwise, do not increment it.This can be done programmatically by generating "c" pseudo-random bits (where "c" is the current value of the counter), and using the logical AND function on all of those bits. The result is a zero if any of those pseudo-random bits are zero, and a one if they are all ones. Simply add the result to the counter. This procedure should be executed each time the request is made to increment the counter.
Applications
The algorithm is useful in examining large data streams for patterns. This is particularly useful in applications of data compressionData compression
In computer science and information theory, data compression, source coding or bit-rate reduction is the process of encoding information using fewer bits than the original representation would use....
, sight and sound recognition, and other artificial intelligence
Artificial intelligence
Artificial intelligence is the intelligence of machines and the branch of computer science that aims to create it. AI textbooks define the field as "the study and design of intelligent agents" where an intelligent agent is a system that perceives its environment and takes actions that maximize its...
applications.
Sources
- Morris, R. Counting large numbers of events in small registers. Communications of the ACM 21, 10 (1977), 840–842
- Flajolet, P. Approximate Counting: A Detailed Analysis. BIT 25, (1985), 113-134 http://algo.inria.fr/flajolet/Publications/Flajolet85c.pdf