Version vector
Encyclopedia
A version vector is a mechanism for tracking changes to data in a distributed system, where multiple agents might update the data at different times. The version vector allows the participants to determine if one update preceded another (happened-before
), followed it, or if the two updates happened concurrently (and therefore might conflict with each other). In this way, version vectors enable causality
tracking among data replicas and are a basic mechanism for optimistic replication
. In mathematical terms, the version vector generates a preorder
that tracks the events that precede, and may therefore influence, later updates.
Version vectors maintain state identical to that in a vector clock, but the update rules differ slightly; in this example, replicas can either experience local updates (e.g., the user editing a file on the local node), or can synchronize with another replica:
Pairs of replicas, , , can be compared by inspecting their version vectors and determined to be either: identical (), concurrent (), or ordered ( or ). The ordered relation is defined as: Vector if and only if every element of is less than its corresponding element in , and at least one of the elements is strictly less than. If neither or , but the vectors are not identical, then the two vectors must be concurrent.
Version vectors or variants are used to track updates in many distributed file systems, such as Coda (file system)
and Ficus, and are the main data structure behind optimistic replication .
Happened-before
In computer science, the happened-before relation is a means of ordering events based on the potential causal relationship of pairs of events in a concurrent system, especially asynchronous distributed systems...
), followed it, or if the two updates happened concurrently (and therefore might conflict with each other). In this way, version vectors enable causality
Causality
Causality is the relationship between an event and a second event , where the second event is understood as a consequence of the first....
tracking among data replicas and are a basic mechanism for optimistic replication
Optimistic replication
Optimistic replication is a strategy for replication in which replicas are allowed to diverge.Traditional pessimistic replication systems try to guarantee from the beginning that all of the replicas are identical to each other, as if there was only a single copy of the data all along...
. In mathematical terms, the version vector generates a preorder
Preorder
In mathematics, especially in order theory, preorders are binary relations that are reflexive and transitive.For example, all partial orders and equivalence relations are preorders...
that tracks the events that precede, and may therefore influence, later updates.
Version vectors maintain state identical to that in a vector clock, but the update rules differ slightly; in this example, replicas can either experience local updates (e.g., the user editing a file on the local node), or can synchronize with another replica:
- Initially all vector counters are zero.
- Each time a replica experiences a local update event, it increments its own counter in the vector by one.
- Each time two replicas synchronize, they both set the elements in their copy of the vector to the maximum of the element across both counters: . After synchronization, the two replicas have identical version vectors.
Pairs of replicas, , , can be compared by inspecting their version vectors and determined to be either: identical (), concurrent (), or ordered ( or ). The ordered relation is defined as: Vector if and only if every element of is less than its corresponding element in , and at least one of the elements is strictly less than. If neither or , but the vectors are not identical, then the two vectors must be concurrent.
Version vectors or variants are used to track updates in many distributed file systems, such as Coda (file system)
Coda (file system)
Coda is a distributed file system developed as a research project at Carnegie Mellon University since 1987 under the direction of Mahadev Satyanarayanan. It descended directly from an older version of AFS and offers many similar features. The InterMezzo file system was inspired by Coda...
and Ficus, and are the main data structure behind optimistic replication .
Other Mechanisms
- Hash Histories avoid the use of counters by keeping a set of hashes of each updated version and comparing those sets by set inclusion. However this mechanism can only give probabilistic guaranties.
- Concise Version Vectors allow significant space savings when handling multiple replicated items, such as in directory structures in filesystems.
- Version Stamps allow tracking of a variable number of replicas and do not resort to counters. This mechanism can depict scalability problems in some settings, but can be replaced by Interval Tree Clocks.
- Interval Tree Clocks generalize version vectors and vector clocks and allows dynamic numbers of replicas/processes.
- Bounded Version Vectors allow a bounded implementation, with bounded size counters, as long as replica pairs can be atomically synchronized.