Snapshot algorithm
Encyclopedia
The snapshot algorithm is an algorithm
used in distributed systems for recording a consistent global state of an asynchronous
system. It is also known as Chandy-Lamport Algorithm for the determination of consistent global states and obtaining consistent cuts, after Leslie Lamport
and K. Mani Chandy
.
. He posed the problem to me over dinner, but we had both had too much wine to think about it right then. The next morning, in the shower, I came up with the solution. When I arrived at Chandy's office, he was waiting for me with the same solution.”
It was defined in a paper titled “Distributed Snapshots: Determining Global States of a Distributed System.”
The algorithm works using marker messages. Each process that wants to initiate a snapshot records its local state and sends a marker on each of its outgoing channels. All the other processes, upon receiving a marker, record their local state, the state of the channel from which the marker just came as empty, and send marker messages on all of their outgoing channels. If a process receives a marker after having recorded its local state, it records the state of the incoming channel from which the marker came as carrying all the messages received since it first recorded its local state.
Some of the assumptions of the algorithm can be facilitated using a more reliable communication protocol such as TCP/IP
. The algorithm can be adapted so that there could be multiple snapshots occurring simultaneously.
From this, the observer builds up a complete snapshot: a saved state for each process and all messages “in the ether” are saved.
Algorithm
In mathematics and computer science, an algorithm is an effective method expressed as a finite list of well-defined instructions for calculating a function. Algorithms are used for calculation, data processing, and automated reasoning...
used in distributed systems for recording a consistent global state of an asynchronous
Asynchronous communication
In telecommunications, asynchronous communication is transmission of data without the use of an external clock signal, where data can be transmitted intermittently rather than in a steady stream. Any timing required to recover data from the communication symbols is encoded within the symbols...
system. It is also known as Chandy-Lamport Algorithm for the determination of consistent global states and obtaining consistent cuts, after Leslie Lamport
Leslie Lamport
Leslie Lamport is an American computer scientist. A graduate of the Bronx High School of Science, he received a B.S. in mathematics from the Massachusetts Institute of Technology in 1960, and M.A. and Ph.D. degrees in mathematics from Brandeis University, respectively in 1963 and 1972...
and K. Mani Chandy
K. Mani Chandy
Kanianthra Mani Chandy is the Simon Ramo Professor of Computer Science at the California Institute of Technology. He has been the Executive Officer of the Computer Science Department twice, and he has been a professor at Caltech since 1989....
.
History
According to Leslie Lamport's website, “The distributed snapshot algorithm described here came about when I visited Chandy, who was then at the University of Texas in AustinUniversity of Texas at Austin
The University of Texas at Austin is a state research university located in Austin, Texas, USA, and is the flagship institution of the The University of Texas System. Founded in 1883, its campus is located approximately from the Texas State Capitol in Austin...
. He posed the problem to me over dinner, but we had both had too much wine to think about it right then. The next morning, in the shower, I came up with the solution. When I arrived at Chandy's office, he was waiting for me with the same solution.”
It was defined in a paper titled “Distributed Snapshots: Determining Global States of a Distributed System.”
Definition
The assumptions of the algorithm are as follows:- There are no failures and all messages arrive intact and only once
- The communication channels are unidirectional and FIFOFIFOFIFO is an acronym for First In, First Out, an abstraction related to ways of organizing and manipulation of data relative to time and prioritization...
ordered - There is a communication path between any two processes in the system
- Any process may initiate the snapshot algorithm
- The snapshot algorithm does not interfere with the normal execution of the processes
- Each process in the system records its local state and the state of its incoming channels
The algorithm works using marker messages. Each process that wants to initiate a snapshot records its local state and sends a marker on each of its outgoing channels. All the other processes, upon receiving a marker, record their local state, the state of the channel from which the marker just came as empty, and send marker messages on all of their outgoing channels. If a process receives a marker after having recorded its local state, it records the state of the incoming channel from which the marker came as carrying all the messages received since it first recorded its local state.
Some of the assumptions of the algorithm can be facilitated using a more reliable communication protocol such as TCP/IP
Internet protocol suite
The Internet protocol suite is the set of communications protocols used for the Internet and other similar networks. It is commonly known as TCP/IP from its most important protocols: Transmission Control Protocol and Internet Protocol , which were the first networking protocols defined in this...
. The algorithm can be adapted so that there could be multiple snapshots occurring simultaneously.
Working
The snapshot algorithm works like this:- The observer process (the process taking a snapshot):
- Saves its own local state
- Sends a snapshot request message bearing a snapshot token to all other processes
- A process receiving the snapshot token for the first time on any message:
- Sends the observer process its own saved state
- Attaches the snapshot token to all subsequent messages (to help propagate the snapshot token)
- Should a process that has already received the snapshot token receive a message that does not bear the snapshot token, this process will forward that message to the observer process. This message was obviously sent before the snapshot “cut off” (as it does not bear a snapshot token and thus must have come from before the snapshot token was sent out) and needs to be included in the snapshot.
From this, the observer builds up a complete snapshot: a saved state for each process and all messages “in the ether” are saved.