Atomic commit
Encyclopedia
An atomic commit is an operation in which a set of distinct changes is applied as a single operation. If the changes are applied then the atomic commit is said to have succeeded. If there is a failure before the atomic commit can be completed then all of the changes completed in the atomic commit are reversed. This ensures that the system is always left in a consistent state. The other key property of isolation comes from their nature as atomic
operations. Isolation ensures that only one atomic commit is processed at a time. The most common uses of atomic commits are in database systems and revision control systems
.
The problem with atomic commits is that they require coordination between multiple systems. As computer networks are unreliable services this means no algorithm can coordinate with all systems as proven in the Two Generals Problem
. As databases become more and more distributed this coordination will increase the difficulty of making truly atomic commits.
This example is complicated by a transaction to check the balance of account Y during a transaction for transferring 100 dollars from account X to Y. To start, first 100 dollars is removed from account X. Second, 100 dollars is added to account Y. If the entire operation is not completed as one atomic commit several problems could occur. If the system fails in the middle of the operation, after removing the money from X and before adding into Y, then 100 dollars has just disappeared. Another issue is if the balance of Y is checked before the 100 dollars is added. The wrong balance for Y will be reported.
With atomic commits neither of these cases can happen, in the first case of the system failure, the atomic commit would be rolled back and the money returned to X. In the second case, the request of the balance of Y cannot occur until the atomic commit is fully completed.
, atomicity and consistency. Consistency is only achieved if each change in the atomic commit is consistent.
As shown in the example atomic commits are critical to multistep operations in databases. Due to modern hardware design the physical disk
on which the database resides true atomic commits cannot exist. The smallest area that can be written to on disk is known as a sector. A single database entry may span several different sectors. Only one sector can be written at a time. This writing limit is why true atomic commits are not possible. After the database entries in memory
have been modified they are queued up to be written to disk. This means the same problems identified in the example have reoccurred. Any algorithmic solution to this problem will still encounter the Two Generals’ Problem. The two-phase commit protocol
and three-phase commit protocol
attempt to solve this and some of the other problems associated with atomic commits.
The two-phase commit protocol requires a coordinator to maintain all the information needed to recover the original state of the database if something goes wrong. As the name indicates there are two phases, voting and commit.
During the voting phase each node writes the changes in the atomic commit to its own disk. The nodes then report their status to the coordinator. If any node does not report to the coordinator or their status message is lost the coordinator assumes the node’s write failed. Once all of the nodes have reported to the coordinator the second phase begins.
During the commit phase the coordinator sends a commit message to each of the nodes to record in their individual logs. Until this message is added to a node's log, any changes made will be recorded as incomplete. If any of the nodes reported a failure the coordinator will instead send a rollback message. This will remove any changes the nodes have written to disk.
The three-phase commit protocol seeks to remove the main problem with the two phase commit protocol which occurs if a coordinator and another node fail at the same time during the commit phase neither can tell what action should occur. To solve this problem a third phase is added to the protocol. The prepare to commit phase occurs after the voting phase and before the commit phase.
In the voting phase, similar to the two-phase commit, the coordinator requests that each node is ready to commit. If any node fails the coordinator will timeout while waiting for the failed node. If this happens the coordinator sends an abort message to every node. The same action will be undertaken if any of the nodes return a failure message.
Upon receiving success messages from each node in the voting phase the prepare to commit phase begins. During this phase the coordinator sends a prepare message to each node. Each node must acknowledge the prepare message and reply. If any reply is missed or any node return that they are not prepared then the coordinator sends an abort message. Any node that does not receive a prepare message before the timeout expires aborts the commit.
After all nodes have replied to the prepare message then the commit phase begins. In this phase the coordinator sends a
commit message to each node. When each node receives this message is performs the actual commit. If the commit message does not reach a node due to the message being lost or the coordinator fails they will perform the commit if the timeout expires. If the coordinator fails upon recovery it will send a commit message to each node.
systems. This allows multiple modified files to be uploaded and merged into the source. Most revision control systems support atomic commits (CVS
and VSS
are the major exceptions).
Like database systems commits may fail due to a problem in applying the changes on disk. Unlike a database system which overwrites any existing data with the data from the changeset
, revision control systems merge
the modification in the changeset into the existing data. If the system cannot complete the merge then the commit will be rejected. If a merge cannot be resolved by the revision control software it is up to the user to merge the changes. For revision control systems that support atomic commits, this failure in merging would result in a failed commit.
Atomic commits are crucial for maintaining a consistent state in the repository. Without atomic commits some changes a developer has made may be applied but other changes may not. If these changes have any kind of coupling this will result in errors. Atomic commits prevent this by not applying partial changes which would create these errors. Note that if the changes already contain errors, atomic commits offer no fix.
The greater understandability comes from the small size and focused nature of the commit. It is much easier to understand what is changed and reasoning behind the changes if you are only looking for one kind of change. This becomes especially important when making format changes to the source code. If format and functional changes are combined it becomes very difficult to identify useful changes. Imagine if the spacing in a file is changed from using tabs to three spaces every tab in the file will show as having been changed. This becomes critical if some functional changes are also made as a reviewer may simply not see the functional changes.
If only atomic commits are made then commits that introduce errors become much simpler to identify. You are not required to look though every commit to see if it was the cause of the error, only the commits dealing with that functionality need to be examined. If the error is to be rolled back atomic commits again make the job much simpler. Instead of having to revert
to the offending revision and remove the changes manually before integrating any later changes; the developer can simply revert any changes in the identified commit. This also means that a developer will not remove changes that did not cause the error by accident.
Atomic commits also allow bug fixes to be easily reviewed if only a single bug fixes committed at a time. Instead of having to check multiple potentially unrelated files the reviewer must only check files and changes that directly impact the bug being fixed. This also means that bug fixes can be easily packaged for testing as only the changes that fix the bug are in the commit.
Atomic
Atomic may refer to:* Of or relating to the Atom, the smallest particle of a chemical element that retains its chemical properties* Atomic Age, also known as the "Atomic Era"* Atomic , an Australian computing and technology magazine...
operations. Isolation ensures that only one atomic commit is processed at a time. The most common uses of atomic commits are in database systems and revision control systems
Revision control
Revision control, also known as version control and source control , is the management of changes to documents, programs, and other information stored as computer files. It is most commonly used in software development, where a team of people may change the same files...
.
The problem with atomic commits is that they require coordination between multiple systems. As computer networks are unreliable services this means no algorithm can coordinate with all systems as proven in the Two Generals Problem
Two Generals' Problem
In computing, the Two Generals' Problem is a thought experiment meant to illustrate the pitfalls and design challenges of attempting to coordinate an action by communicating over an unreliable link...
. As databases become more and more distributed this coordination will increase the difficulty of making truly atomic commits.
Necessity for Atomic Commits
Atomic commits are essential for multi-step updates to data. This can be clearly shown in a simple example of a money transfer between two checking accounts.This example is complicated by a transaction to check the balance of account Y during a transaction for transferring 100 dollars from account X to Y. To start, first 100 dollars is removed from account X. Second, 100 dollars is added to account Y. If the entire operation is not completed as one atomic commit several problems could occur. If the system fails in the middle of the operation, after removing the money from X and before adding into Y, then 100 dollars has just disappeared. Another issue is if the balance of Y is checked before the 100 dollars is added. The wrong balance for Y will be reported.
With atomic commits neither of these cases can happen, in the first case of the system failure, the atomic commit would be rolled back and the money returned to X. In the second case, the request of the balance of Y cannot occur until the atomic commit is fully completed.
Database System
Atomic commits in database systems fulfil two of the key properties of ACIDACID
In computer science, ACID is a set of properties that guarantee database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction...
, atomicity and consistency. Consistency is only achieved if each change in the atomic commit is consistent.
As shown in the example atomic commits are critical to multistep operations in databases. Due to modern hardware design the physical disk
Data storage device
thumb|200px|right|A reel-to-reel tape recorder .The magnetic tape is a data storage medium. The recorder is data storage equipment using a portable medium to store the data....
on which the database resides true atomic commits cannot exist. The smallest area that can be written to on disk is known as a sector. A single database entry may span several different sectors. Only one sector can be written at a time. This writing limit is why true atomic commits are not possible. After the database entries in memory
Computer memory
In computing, memory refers to the physical devices used to store programs or data on a temporary or permanent basis for use in a computer or other digital electronic device. The term primary memory is used for the information in physical systems which are fast In computing, memory refers to the...
have been modified they are queued up to be written to disk. This means the same problems identified in the example have reoccurred. Any algorithmic solution to this problem will still encounter the Two Generals’ Problem. The two-phase commit protocol
Two-phase commit protocol
In transaction processing, databases, and computer networking, the two-phase commit protocol is a type of atomic commitment protocol . It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort the transaction...
and three-phase commit protocol
Three-phase commit protocol
In computer networking and databases, the three-phase commit protocol is a distributed algorithm which lets all nodes in a distributed system agree to commit a transaction. Unlike the two-phase commit protocol however, 3PC is non-blocking. Specifically, 3PC places an upper bound on the amount...
attempt to solve this and some of the other problems associated with atomic commits.
The two-phase commit protocol requires a coordinator to maintain all the information needed to recover the original state of the database if something goes wrong. As the name indicates there are two phases, voting and commit.
During the voting phase each node writes the changes in the atomic commit to its own disk. The nodes then report their status to the coordinator. If any node does not report to the coordinator or their status message is lost the coordinator assumes the node’s write failed. Once all of the nodes have reported to the coordinator the second phase begins.
During the commit phase the coordinator sends a commit message to each of the nodes to record in their individual logs. Until this message is added to a node's log, any changes made will be recorded as incomplete. If any of the nodes reported a failure the coordinator will instead send a rollback message. This will remove any changes the nodes have written to disk.
The three-phase commit protocol seeks to remove the main problem with the two phase commit protocol which occurs if a coordinator and another node fail at the same time during the commit phase neither can tell what action should occur. To solve this problem a third phase is added to the protocol. The prepare to commit phase occurs after the voting phase and before the commit phase.
In the voting phase, similar to the two-phase commit, the coordinator requests that each node is ready to commit. If any node fails the coordinator will timeout while waiting for the failed node. If this happens the coordinator sends an abort message to every node. The same action will be undertaken if any of the nodes return a failure message.
Upon receiving success messages from each node in the voting phase the prepare to commit phase begins. During this phase the coordinator sends a prepare message to each node. Each node must acknowledge the prepare message and reply. If any reply is missed or any node return that they are not prepared then the coordinator sends an abort message. Any node that does not receive a prepare message before the timeout expires aborts the commit.
After all nodes have replied to the prepare message then the commit phase begins. In this phase the coordinator sends a
commit message to each node. When each node receives this message is performs the actual commit. If the commit message does not reach a node due to the message being lost or the coordinator fails they will perform the commit if the timeout expires. If the coordinator fails upon recovery it will send a commit message to each node.
Revision Control
The other area where atomic commits are employed is revision controlRevision control
Revision control, also known as version control and source control , is the management of changes to documents, programs, and other information stored as computer files. It is most commonly used in software development, where a team of people may change the same files...
systems. This allows multiple modified files to be uploaded and merged into the source. Most revision control systems support atomic commits (CVS
Concurrent Versions System
The Concurrent Versions System , also known as the Concurrent Versioning System, is a client-server free software revision control system in the field of software development. Version control system software keeps track of all work and all changes in a set of files, and allows several developers ...
and VSS
Microsoft Visual SourceSafe
Microsoft Visual SourceSafe is a source control software package oriented towards small software development projects. Like most source control systems, SourceSafe creates a virtual library of computer files...
are the major exceptions).
Like database systems commits may fail due to a problem in applying the changes on disk. Unlike a database system which overwrites any existing data with the data from the changeset
Changeset
In revision control, a changeset is a way to group a number of modifications that are relevant to each other in one atomic package, that may be canceled or propagated as needed. This is one synchronization model....
, revision control systems merge
Merge (revision control)
Merging in revision control, is a fundamental operation that reconciles multiple changes made to a revision-controlled collection of files. Most often, it is necessary when a file is modified by two people on two different computers at the same time...
the modification in the changeset into the existing data. If the system cannot complete the merge then the commit will be rejected. If a merge cannot be resolved by the revision control software it is up to the user to merge the changes. For revision control systems that support atomic commits, this failure in merging would result in a failed commit.
Atomic commits are crucial for maintaining a consistent state in the repository. Without atomic commits some changes a developer has made may be applied but other changes may not. If these changes have any kind of coupling this will result in errors. Atomic commits prevent this by not applying partial changes which would create these errors. Note that if the changes already contain errors, atomic commits offer no fix.
Atomic Commit Convention
When using a revision control systems a common convention is to use small commits. These are sometimes referred to as atomic commits as they (ideally) only affect a single aspect of the system. These atomic commits allow for greater understandability, less effort to rollback changes, easier bug identification.The greater understandability comes from the small size and focused nature of the commit. It is much easier to understand what is changed and reasoning behind the changes if you are only looking for one kind of change. This becomes especially important when making format changes to the source code. If format and functional changes are combined it becomes very difficult to identify useful changes. Imagine if the spacing in a file is changed from using tabs to three spaces every tab in the file will show as having been changed. This becomes critical if some functional changes are also made as a reviewer may simply not see the functional changes.
If only atomic commits are made then commits that introduce errors become much simpler to identify. You are not required to look though every commit to see if it was the cause of the error, only the commits dealing with that functionality need to be examined. If the error is to be rolled back atomic commits again make the job much simpler. Instead of having to revert
Reversion (software development)
In software development , reversion or reverting is the abandonment of one or more recent changes in favor of a return to a previous version of the material at hand In software development (and by extension in content editing environments, especially wikis, that make use of the software development...
to the offending revision and remove the changes manually before integrating any later changes; the developer can simply revert any changes in the identified commit. This also means that a developer will not remove changes that did not cause the error by accident.
Atomic commits also allow bug fixes to be easily reviewed if only a single bug fixes committed at a time. Instead of having to check multiple potentially unrelated files the reviewer must only check files and changes that directly impact the bug being fixed. This also means that bug fixes can be easily packaged for testing as only the changes that fix the bug are in the commit.
See also
- Two-phase commit protocolTwo-phase commit protocolIn transaction processing, databases, and computer networking, the two-phase commit protocol is a type of atomic commitment protocol . It is a distributed algorithm that coordinates all the processes that participate in a distributed atomic transaction on whether to commit or abort the transaction...
- Three-phase commit protocolThree-phase commit protocolIn computer networking and databases, the three-phase commit protocol is a distributed algorithm which lets all nodes in a distributed system agree to commit a transaction. Unlike the two-phase commit protocol however, 3PC is non-blocking. Specifically, 3PC places an upper bound on the amount...
- Commit (data management)Commit (data management)In the context of computer science and data management, commit refers to the idea of making a set of tentative changes permanent. A popular usage is at the end of a transaction. A commit is an act of committing.-Data management:...
- Atomic operation