Marzullo's algorithm
Marzullo's algorithm, invented by Keith Marzullo for his Ph.D. dissertation in 1984, is an agreement algorithm used to select sources for estimating accurate time from a number of noisy time sources. A refined version of it, renamed the "intersection algorithm", forms part of the modern Network Time Protocol.
Purpose
Marzullo's algorithm is efficient in terms of time for producing an optimal value from a set of estimates with confidence intervals where the actual value may be outside the confidence interval for some sources. In this case the best estimate is taken to be the smallest interval consistent with the largest number of sources.
If we have the estimates 10 ± 2, 12 ± 1 and 11 ± 1 then these intervals are [8,12], [11,13] and [10,12] which intersect to form [11,12] or 11.5 ± 0.5 as consistent with all three values.
If instead the ranges are [8,12], [11,13] and [14,15] then there is no interval consistent with all these values but [11,12] is consistent with the largest number of sources — namely, two of them.
Finally, if the ranges are [8,9], [8,12] and [10,12] then both the intervals [8,9] and [10,12] are consistent with the largest number of sources.
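The three examples above can be checked with a brute-force sketch of the definition. This is not the efficient method, only a quadratic-time illustration; the function name best_intervals and the (lo, hi) pair representation are assumptions made for this example.

```python
def best_intervals(ranges):
    """Brute force: every smallest interval shared by the most sources.

    ranges is a list of (lo, hi) pairs; returns (count, [intervals]).
    """
    best = 0
    results = []
    # A region of maximal overlap always begins at some range's lower bound.
    for lo, _ in ranges:
        covering = [(a, b) for a, b in ranges if a <= lo <= b]
        hi = min(b for _, b in covering)  # overlap ends at the earliest upper bound
        count = len(covering)
        if count > best:
            best, results = count, [(lo, hi)]
        elif count == best and (lo, hi) not in results:
            results.append((lo, hi))  # a tie for greatest overlap
    return best, results


print(best_intervals([(8, 12), (11, 13), (10, 12)]))  # (3, [(11, 12)])
print(best_intervals([(8, 12), (11, 13), (14, 15)]))  # (2, [(11, 12)])
print(best_intervals([(8, 9), (8, 12), (10, 12)]))    # (2, [(8, 9), (10, 12)])
```

The third call returns both tied intervals, matching the last example.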
This procedure determines an interval. If the desired result is a best value from that interval then a naive approach would be to take the center of the interval as the value, which is what was specified in the original Marzullo algorithm. A more sophisticated approach would recognize that this could be throwing away useful information from the confidence intervals of the sources and that a probabilistic model of the sources could return a value other than the center.
Note that the computed value is probably better described as "optimistic" rather than "optimal". For example, consider three intervals [10,12], [11, 13] and [11.99,13]. The algorithm described below computes [11.99, 12] or 11.995 ± 0.005 which is a very precise value. If we suspect that one of the estimates might be incorrect, then at least two of the estimates must be correct. Under this condition, the best estimate is [11,13] since this is the largest interval that always intersects at least two estimates. The algorithm described below is easily parameterized with the maximum number of incorrect estimates.
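The text does not spell out how the parameterization works. One possible reading, sketched here as an assumption, is to return the span from the lowest to the highest point covered by at least n − f sources, where f is the maximum number of incorrect estimates. The name tolerant_interval and the max_faulty parameter are hypothetical.

```python
def tolerant_interval(ranges, max_faulty):
    """Span of all points covered by at least len(ranges) - max_faulty sources.

    Assumed interpretation of the parameterized algorithm, not a quoted one.
    Returns None if no point is covered by enough sources.
    """
    needed = len(ranges) - max_faulty
    table = sorted([(lo, -1) for lo, _ in ranges] + [(hi, +1) for _, hi in ranges])
    cnt = 0
    start = end = None
    for offset, kind in table:
        if kind == -1:            # a range begins
            cnt += 1
            if cnt >= needed and start is None:
                start = offset    # first point with enough agreement
        else:                     # a range ends
            if cnt >= needed:
                end = offset      # last point seen so far with enough agreement
            cnt -= 1
    return None if start is None else (start, end)


ranges = [(10, 12), (11, 13), (11.99, 13)]
print(tolerant_interval(ranges, 0))  # (11.99, 12)
print(tolerant_interval(ranges, 1))  # (11, 13)
```

With no faulty sources allowed the sketch reproduces the "optimistic" result; allowing one gives the wider [11,13]. Note that the returned span may contain points covered by fewer than n − f sources if the well-covered regions are disjoint.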
Method
Marzullo's algorithm begins by preparing a table of the sources, sorting it and then searching (efficiently) for the intersections of intervals. For each source there is a range [c−r,c+r] defined by c ± r. For each range the table will have two tuples of the form ⟨offset,type⟩. One tuple will represent the beginning of the range, marked with type −1 as ⟨c−r,−1⟩, and the other will represent the end with type +1 as ⟨c+r,+1⟩.
The description of the algorithm uses the following variables: best (largest number of overlapping intervals found), cnt (current number of overlapping intervals), beststart and bestend (the beginning and end of best interval found so far), i (an index), and the table of tuples.
0) Build the table of tuples.
1) Sort the table by the offset. (If two tuples with the same offset but opposite types exist, indicating that one interval ends just as another begins, then a method of deciding which comes first is necessary. Such an occurrence can be considered an overlap with no duration, which can be found by the algorithm by putting type −1 before type +1. If such pathological overlaps are considered objectionable they can be avoided by putting type +1 before −1 in this case.)
2) [initialize] best=0 cnt=0
3) [loop] go through each tuple in the table in ascending order
- 4) [current number of overlapping intervals] cnt=cnt-type[i]
- 5) if cnt>best then best=cnt beststart=offset[i] bestend=offset[i+1]
- commentary: the next tuple, at [i+1], will either be an end of an interval (type=+1) in which case it ends this best interval, or it will be a beginning of an interval (type=−1) and in the next step will replace best.
- ambiguity: what to do if best=cnt is left unspecified. This is a tie for greatest overlap. The decision can be made either to take the smaller of bestend−beststart and offset[i+1]−offset[i], or just to take an arbitrary one of the two equally good entries.
6) [end loop] return [beststart,bestend] as optimal interval. The number of false sources (ones which do not overlap the optimal interval returned) is the number of sources minus the value of best.
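Steps 0 to 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function name marzullo and the (lo, hi) input format are choices made here, zero-duration overlaps are counted (type −1 sorts before type +1 at equal offsets), and on a tie the first interval found is kept.

```python
def marzullo(ranges):
    """Return (best, beststart, bestend) for a list of (lo, hi) ranges."""
    # 0) Build the table of (offset, type) tuples.
    table = []
    for lo, hi in ranges:
        table.append((lo, -1))  # beginning of a range
        table.append((hi, +1))  # end of a range
    # 1) Sort by offset; tuple ordering puts type -1 before +1 on equal offsets.
    table.sort()
    # 2) Initialize.
    best = cnt = 0
    beststart = bestend = None
    # 3) Go through each tuple in ascending order.
    for i, (offset, kind) in enumerate(table):
        cnt -= kind             # 4) -1 opens an interval, +1 closes one
        if cnt > best:          # 5) strictly greater: the first tie wins
            best = cnt
            beststart = offset
            # cnt only rises on a start tuple, and the last tuple is always
            # an end, so table[i + 1] exists here.
            bestend = table[i + 1][0]
    # 6) The number of false sources is len(ranges) - best.
    return best, beststart, bestend


print(marzullo([(8, 12), (11, 13), (10, 12)]))  # (3, 11, 12)
print(marzullo([(8, 12), (11, 13), (14, 15)]))  # (2, 11, 12)
print(marzullo([(8, 9), (8, 12), (10, 12)]))    # (2, 8, 9)
```

To avoid zero-duration overlaps, the sort key could be changed to `key=lambda t: (t[0], -t[1])`, which places type +1 before −1 at equal offsets.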
Efficiency
Marzullo's algorithm is efficient in both space and time. The asymptotic space usage is O(n), where n is the number of sources. In considering the asymptotic time requirement the algorithm can be considered to consist of building the table, sorting it and searching it. Sorting can be done in O(n log n) time, and this dominates the building and searching phases which can be performed in linear time. Therefore the time efficiency of Marzullo's algorithm is O(n log n).
Once the table has been built and sorted it is possible to update the interval for one source (when new information is received) in linear time. Therefore, updating data for one source and finding the best interval can be done in O(n) time.
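One way such an update might look, assuming the sorted table is kept as a Python list of (offset, type) tuples: the two old tuples are removed and the two new ones are inserted in order, each a linear-time list operation. The name update_source is hypothetical; bisect.insort is from the standard library.

```python
import bisect


def update_source(table, old, new):
    """Replace one source's range in a sorted table of (offset, type) tuples.

    old and new are (lo, hi) ranges; the table is modified in place.
    Raises ValueError if the old range is not in the table.
    """
    for entry in ((old[0], -1), (old[1], +1)):
        table.remove(entry)          # linear scan plus shift
    for entry in ((new[0], -1), (new[1], +1)):
        bisect.insort(table, entry)  # binary search, then linear shift
    return table


table = sorted([(8, -1), (12, +1), (11, -1), (13, +1)])
update_source(table, old=(11, 13), new=(9, 10))
print(table)  # [(8, -1), (9, -1), (10, 1), (12, 1)]
```

After the update, a single linear pass over the table (steps 2 to 6) finds the new best interval without re-sorting.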