Consensus (computer science)
Encyclopedia
Consensus is a problem in distributed computing
that encapsulates the task of group agreement in the presence of faults.
In particular, any process
in the group may fail at any time. Consensus is fundamental to core techniques in fault tolerance, such as state machine replication
.
, the typical consensus problem does not label any single process as a "sender". Every process "proposes" a value; the goal of the protocol is for all correct processes to choose a single value from among those proposed. Valid consensus protocols must provide important guarantees to all processes involved. All correct processes must eventually decide the same value, for example, and that value must be one of those proposed. A correct process is therefore guaranteed that the value it decides was also decided by all other correct processes, and can act on that value accordingly.
More precisely, a consensus protocol must satisfy the four formal properties below.
Termination: every correct process decides some value.
Validity: if all processes propose the same value , then every correct process decides .
Integrity: every correct process decides at most one value, and if it decides some value , then must have been proposed by some process.
Agreement: if a correct process decides , then every correct process decides .
The possibility of faults in the system makes these properties more difficult to satisfy. A simple but invalid consensus protocol might have every process broadcast its proposal to all others, and have a process decide on the smallest value received. Such a protocol, as described, does not satisfy Agreement if faults can occur: if a process crashes after sending its proposal to some processes, but before sending it to others, then the two sets of processes may decide different values.
of distributed computing.
In an asynchronous system, where processes have no common clock and run at arbitrarily varying speeds, the problem is impossible to solve if one process may crash
and processes communicate by sending messages to one another. The technique used to prove this result is sometimes called an FLP impossibility proof, named after its creators, Michael J. Fischer
, Nancy A. Lynch
and Michael S. Paterson, who won the Dijkstra Prize
for this result. The technique has been widely used to prove other impossibility results. For example, a similar proof can be used to show that consensus is also impossible in asynchronous systems where processes communicate by reading and writing shared variables if one process may crash.
A notable exception to the FLP impossibility proof is in quantum computing
, where it has been shown that asynchronous consensus can always be achieved even in the presence of faults, crashes, or deliberate efforts to undermine the consensus by participating processes.
The FLP result does not state that consensus can never be reached: merely that under the model's assumptions, no algorithm can always reach consensus in bounded time. There exist algorithms, even under the asynchronous model, that can reach consensus with probability one. The FLP proof hinges on demonstrating the existence of an order of message receipts that causes the system to never reach consensus. This "bad" input however may be vanishingly unlikely in practice.
In a synchronous system, where all processes run at the same speed, consensus is impossible if processes communicate by sending messages to one another and one third of the processes can experience Byzantine failures
.
. In this scheme, Chubby clients communicate with the Paxos master in order to access/update the replicated log; i.e., read/write to the files.
Bitcoin
uses proof of work to maintain consensus in its peer-to-peer
network. Nodes in the bitcoin network attempt to solve a cryptographic proof-of-work problem, where probability of finding the solution is proportional to the computational effort, in hashes per second, expended, and the node that solves the problem has their version of the block of transactions added to the peer-to-peer distributed timestamp server accepted by all of the other nodes. As any node in the network can attempt to solve the proof-of-work problem, a Sybil attack
becomes unfeasible unless the attacker has over 50% of the computational resources of the network.
Distributed computing
Distributed computing is a field of computer science that studies distributed systems. A distributed system consists of multiple autonomous computers that communicate through a computer network. The computers interact with each other in order to achieve a common goal...
that encapsulates the task of group agreement in the presence of faults.
In particular, any process
Process (computing)
In computing, a process is an instance of a computer program that is being executed. It contains the program code and its current activity. Depending on the operating system , a process may be made up of multiple threads of execution that execute instructions concurrently.A computer program is a...
in the group may fail at any time. Consensus is fundamental to core techniques in fault tolerance, such as state machine replication
State machine replication
-State Machine Definition:For the subsequent discussion a State Machine will be defined as the following tuple of values :* A set of States* A set of Inputs* A set of Outputs...
.
Problem description
A process is called "correct" if it does not fail at any point during its execution. Unlike Terminating Reliable BroadcastTerminating Reliable Broadcast
Terminating Reliable Broadcast is a problem in distributed computing that encapsulates the task of broadcasting a message to a set of receiving processes in the presence of faults...
, the typical consensus problem does not label any single process as a "sender". Every process "proposes" a value; the goal of the protocol is for all correct processes to choose a single value from among those proposed. Valid consensus protocols must provide important guarantees to all processes involved. All correct processes must eventually decide the same value, for example, and that value must be one of those proposed. A correct process is therefore guaranteed that the value it decides was also decided by all other correct processes, and can act on that value accordingly.
More precisely, a consensus protocol must satisfy the four formal properties below.
Termination: every correct process decides some value.
Validity: if all processes propose the same value , then every correct process decides .
Integrity: every correct process decides at most one value, and if it decides some value , then must have been proposed by some process.
Agreement: if a correct process decides , then every correct process decides .
The possibility of faults in the system makes these properties more difficult to satisfy. A simple but invalid consensus protocol might have every process broadcast its proposal to all others, and have a process decide on the smallest value received. Such a protocol, as described, does not satisfy Agreement if faults can occur: if a process crashes after sending its proposal to some processes, but before sending it to others, then the two sets of processes may decide different values.
Failure model
The consensus problem is challenging primarily because one or more of the processes involved may fail at any time. Consensus protocols typically make one of two assumptions about how processes may fail:- In a fail-stop or fail-crash model, a failed process simply stops participating in the protocol. This models a communication failure (e.g., network partition) or a system crash. A variant is the crash-recovery model, where halted processes can restart and restore their state previous to the crash, through stable storageStable storageStable storage is a classification of computer data storage technology that guarantees atomicity for any given write operation and allows software to be written that is robust against some hardware and power failures...
. - In a Byzantine failureByzantine fault toleranceByzantine fault tolerance is a sub-field of fault tolerance research inspired by the Byzantine Generals' Problem, which is a generalized version of the Two Generals' Problem....
model, a failed process may behave in an arbitrary fashion. In particular, a failed process may collaborate with other failed processes in order to deliberately subvert the operations of the consensus protocol. Because failed processes are allowed to behave arbitrarily and to collaborate, this models both software bugs and the possibility of attack by a hostile adversary.
Impossibility
Consensus has been shown to be impossible to solve in several modelsModel of computation
In computability theory and computational complexity theory, a model of computation is the definition of the set of allowable operations used in computation and their respective costs...
of distributed computing.
In an asynchronous system, where processes have no common clock and run at arbitrarily varying speeds, the problem is impossible to solve if one process may crash
Crash (computing)
A crash in computing is a condition where a computer or a program, either an application or part of the operating system, ceases to function properly, often exiting after encountering errors. Often the offending program may appear to freeze or hang until a crash reporting service documents...
and processes communicate by sending messages to one another. The technique used to prove this result is sometimes called an FLP impossibility proof, named after its creators, Michael J. Fischer
Michael J. Fischer
Michael John Fischer is a computer scientist who works in the fields of distributed computing, parallel computing, cryptography, algorithms and data structures, and computational complexity.-Career:...
, Nancy A. Lynch
Nancy Lynch
Nancy Ann Lynch is a professor at the Massachusetts Institute of Technology. She is the NEC Professor of Software Science and Engineering in the EECS department and heads the Theory of Distributed Systems research group at MIT's Computer Science and Artificial Intelligence Laboratory.She is the...
and Michael S. Paterson, who won the Dijkstra Prize
Dijkstra Prize
The Edsger W. Dijkstra Prize in Distributed Computing is given for outstanding papers on the principles of distributed computing, whose significance and impact on the theory and/or practice of distributed computing has been evident for at least a decade...
for this result. The technique has been widely used to prove other impossibility results. For example, a similar proof can be used to show that consensus is also impossible in asynchronous systems where processes communicate by reading and writing shared variables if one process may crash.
A notable exception to the FLP impossibility proof is in quantum computing
Quantum computer
A quantum computer is a device for computation that makes direct use of quantum mechanical phenomena, such as superposition and entanglement, to perform operations on data. Quantum computers are different from traditional computers based on transistors...
, where it has been shown that asynchronous consensus can always be achieved even in the presence of faults, crashes, or deliberate efforts to undermine the consensus by participating processes.
The FLP result does not state that consensus can never be reached: merely that under the model's assumptions, no algorithm can always reach consensus in bounded time. There exist algorithms, even under the asynchronous model, that can reach consensus with probability one. The FLP proof hinges on demonstrating the existence of an order of message receipts that causes the system to never reach consensus. This "bad" input however may be vanishingly unlikely in practice.
In a synchronous system, where all processes run at the same speed, consensus is impossible if processes communicate by sending messages to one another and one third of the processes can experience Byzantine failures
Byzantine fault tolerance
Byzantine fault tolerance is a sub-field of fault tolerance research inspired by the Byzantine Generals' Problem, which is a generalized version of the Two Generals' Problem....
.
Important consensus protocols
Google has implemented a distributed lock service library called Chubby. Chubby maintains lock information in small files which are stored in a replicated database to achieve high availability in the face of failures. The database is implemented on top of a fault-tolerant log layer which is based on the Paxos consensus algorithmPaxos algorithm
Paxos is a family of protocols for solving consensus in a network of unreliable processors.Consensus is the process of agreeing on one result among a group of participants...
. In this scheme, Chubby clients communicate with the Paxos master in order to access/update the replicated log; i.e., read/write to the files.
Bitcoin
Bitcoin
Bitcoin is a decentralized, peer-to-peer network over which users make transactions that are tracked and verified through this network. The word Bitcoin also refers to the digital currency implemented as the currency medium for user transactions over this network...
uses proof of work to maintain consensus in its peer-to-peer
Peer-to-peer
Peer-to-peer computing or networking is a distributed application architecture that partitions tasks or workloads among peers. Peers are equally privileged, equipotent participants in the application...
network. Nodes in the bitcoin network attempt to solve a cryptographic proof-of-work problem, where probability of finding the solution is proportional to the computational effort, in hashes per second, expended, and the node that solves the problem has their version of the block of transactions added to the peer-to-peer distributed timestamp server accepted by all of the other nodes. As any node in the network can attempt to solve the proof-of-work problem, a Sybil attack
Sybil attack
The Sybil attack in computer security is an attack wherein a reputation system is subverted by forging identities in peer-to-peer networks. It is named after the subject of the book Sybil, a fictional case study of a woman with multiple personality disorder...
becomes unfeasible unless the attacker has over 50% of the computational resources of the network.
- Chandra-Toueg consensus algorithmChandra-Toueg consensus algorithmThe Chandra–Toueg consensus algorithm is an algorithm for solving consensus in a network of unreliable processes equipped with an eventually strong failure detector. The failure detector is an abstract version of timeouts; it signals to each process when other processes may have crashed...
- Randomized consensus
See also
- Uniform ConsensusUniform ConsensusIn computer science, Uniform Consensus is a distributed computing problem that is a similar to the consensus problem with one more condition which is no two processes decide differently....
- Quantum Byzantine agreementQuantum Byzantine AgreementByzantine fault tolerant protocols are algorithms that are robust to arbitrary types of failures in distributed algorithms. With the advent and popularity of the Internet, there is a need to develop algorithms that do not require any centralized control that have some guarantee of always working...
- Byzantine fault toleranceByzantine fault toleranceByzantine fault tolerance is a sub-field of fault tolerance research inspired by the Byzantine Generals' Problem, which is a generalized version of the Two Generals' Problem....