Isolation (computer science)
Encyclopedia
In database
systems, isolation is a property that defines how/when the changes made by one operation become visible to other concurrent operations. Isolation is one of the ACID
(Atomicity, Consistency, Isolation, Durability) properties.
properties in a DBMS
(Database Management System), the isolation property is the one most often relaxed. When attempting to maintain the highest level of isolation, a DBMS usually acquires locks
on data or implements multiversion concurrency control
, which may result in a loss of concurrency
. This requires adding additional logic for the application to function correctly.
Most DBMS's offer a number of transaction isolation levels, which control the degree of locking that occurs when selecting data. For many database applications, the majority of database transactions can be constructed to avoid requiring high isolation levels (e.g. SERIALIZABLE level), thus reducing the locking overhead for the system. The programmer must carefully analyze database access code to ensure that any relaxation of isolation does not cause software bugs that are difficult to find. Conversely, if higher isolation levels are used, the possibility of deadlock
is increased, which also requires careful analysis and programming techniques to avoid.
The isolation levels defined by the ANSI
/ISO
SQL
standard are listed as follows.
With a lock-based concurrency control
DBMS implementation, serializability
requires read and write locks (acquired on selected data) to be released at the end of the transaction. Also range-locks must be acquired when a SELECT
query uses a ranged WHERE clause, especially to avoid the phantom reads phenomenon (see below).
When using non-lock based concurrency control, no locks are acquired; however, if the system detects a write collision among several concurrent transactions, only one of them is allowed to commit. See snapshot isolation
for more details on this topic.
DBMS implementation keeps read and write locks (acquired on selected data) until the end of the transaction. However, range-locks are not managed, so the phantom reads phenomenon can occur (see below).
DBMS implementation keeps write locks (acquired on selected data) until the end of the transaction, but read locks are released as soon as the SELECT
operation is performed (so the non-repeatable reads phenomenon can occur in this isolation level, as discussed below). As in the previous level, range-locks are not managed.
's varies quite widely. Most databases that feature transactions allow the user to set any isolation level. Some DBMS's also require additional syntax when performing a SELECT statement to acquire locks (e.g. SELECT ... FOR UPDATE to acquire exclusive write locks on accessed rows).
However, the definitions above have been criticised in the paper A Critique of ANSI SQL Isolation Levels as being ambiguous, and as not accurately reflecting the isolation provided by many databases:
There are also other criticisms concerning ANSI SQL's isolation definition, in that it encourages implementors to do "bad things":
In the following examples, two transactions take place. In the first, Query 1 is performed. Then, in the second transaction, Query 2 is performed and committed. Finally, in the first transaction, Query 1 is performed again.
The queries use the following data table:
Dirty reads work similarly to non-repeatable reads; however, the second transaction would not need to be committed for the first query to return a different result. The only thing that may be prevented in the READ UNCOMMITTED isolation level is updates appearing out of order in the results; that is, earlier updates will always appear in a result set before later updates.
In our example, Transaction 2 changes a row, but does not commit the changes. Transaction 1 then reads the uncommitted data. Now if Transaction 2 rolls back his changes (already read by Transaction 1) or updates different changes to the database, then the view of the data may be wrong in the records of Transaction 1.
Non-repeatable reads phenomenon may occur in a lock-based concurrency control method when read locks are not acquired when performing a SELECT
, or when the acquired locks on affected rows are released as soon as the SELECT
operation is performed. Under the multiversion concurrency control
method, non-repeatable reads may occur when the requirement that a transaction affected by a commit conflict must roll back is relaxed.
In this example, Transaction 2 commits successfully, which means that its changes to the row with id 1 should become visible. However, Transaction 1 has already seen a different value for age in that row. At the SERIALIZABLE and REPEATABLE READ isolation levels, the DBMS must return the old value. At READ COMMITTED and READ UNCOMMITTED, the DBMS may return the updated value; this is a non-repeatable read.
There are two basic strategies used to prevent non-repeatable reads. The first is to delay the execution of Transaction 2 until Transaction 1 has committed or rolled back. This method is used when locking is used, and produces the serial schedule
T1, T2. A serial schedule does not exhibit non-repeatable reads behaviour.
In the other strategy, as used in multiversion concurrency control
, Transaction 2 is permitted to commit first, which provides for better concurrency. However, Transaction 1, which commenced prior to Transaction 2, must continue to operate on a past version of the database — a snapshot of the moment it was started. When Transaction 1 eventually tries to commit, the DBMS checks if the result of committing Transaction 1 would be equivalent to the schedule T1, T2. If it is, then Transaction 1 can proceed. If it cannot be seen to be equivalent, however, Transaction 1 must roll back with a serialization failure.
Using a lock-based concurrency control method, at the REPEATABLE READ isolation mode, the row with ID = 1 would be locked, thus blocking Query 2 until the first transaction was committed or rolled back. In READ COMMITTED mode, the second time Query 1 was executed, the age would have changed.
Under multiversion concurrency control, at the SERIALIZABLE isolation level, both SELECT queries see a snapshot of the database taken at the start of Transaction 1. Therefore, they return the same data. However, if Transaction 1 then attempted to UPDATE that row as well, a serialization failure would occur and Transaction 1 would be forced to roll back.
At the READ COMMITTED isolation level, each query sees a snapshot of the database taken at the start of each query. Therefore, they each see different data for the updated row. No serialization failure is possible in this mode (because no promise of serializability is made), and Transaction 1 will not have to be retried.
This can occur when range locks are not acquired on performing a SELECT
... WHERE operation.
The phantom reads anomaly is a special case of Non-repeatable reads when Transaction 1 repeats a ranged SELECT
... WHERE query and, in the middle of both operations, Transaction 2 creates (i.e. INSERT
) new rows (in the target table) fulfilling that WHERE clause.
Note that Transaction 1 executed the same query twice. If the highest level of isolation were maintained, the same set of rows should be returned both times, and indeed that is what is mandated to occur in a database operating at the SQL SERIALIZABLE isolation level. However, at the lesser isolation levels, a different set of rows may be returned the second time.
In the SERIALIZABLE isolation mode, Query 1 would result in all records with age in the range 10 to 30 being locked, thus Query 2 would block until the first transaction was committed. In REPEATABLE READ mode, the range would not be locked, allowing the record to be inserted and the second execution of Query 1 to include the new row in its results.
"may occur" means that the isolation level suffers that phenomenon, while "-" means that it does not suffer it.
"V" indicates that the method locks for that operation, keeping that lock till the end of the transaction containing that operation.
Note: Read (i.e. SELECT
) operations can acquire read (shared) locks in the Read Committed isolation level, but they are released immediately after the read operation is performed.
Database
A database is an organized collection of data for one or more purposes, usually in digital form. The data are typically organized to model relevant aspects of reality , in a way that supports processes requiring this information...
systems, isolation is a property that defines how/when the changes made by one operation become visible to other concurrent operations. Isolation is one of the ACID
ACID
In computer science, ACID is a set of properties that guarantee database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction...
(Atomicity, Consistency, Isolation, Durability) properties.
Isolation levels
Of the four ACIDACID
In computer science, ACID is a set of properties that guarantee database transactions are processed reliably. In the context of databases, a single logical operation on the data is called a transaction...
properties in a DBMS
Database management system
A database management system is a software package with computer programs that control the creation, maintenance, and use of a database. It allows organizations to conveniently develop databases for various applications by database administrators and other specialists. A database is an integrated...
(Database Management System), the isolation property is the one most often relaxed. When attempting to maintain the highest level of isolation, a DBMS usually acquires locks
Lock (database)
A lock is used when multiple users need to access a database concurrently. This prevents data from being corrupted or invalidated when multiple users try to write to the database. Any single user can only modify those database records to which they have applied a lock that gives them exclusive...
on data or implements multiversion concurrency control
Multiversion concurrency control
Multiversion concurrency control , in the database field of computer science, is a concurrency control method commonly used by database management systems to provide concurrent access to the database and in programming languages to implement transactional memory.For instance, a database will...
, which may result in a loss of concurrency
Concurrency (computer science)
In computer science, concurrency is a property of systems in which several computations are executing simultaneously, and potentially interacting with each other...
. This requires adding additional logic for the application to function correctly.
Most DBMS's offer a number of transaction isolation levels, which control the degree of locking that occurs when selecting data. For many database applications, the majority of database transactions can be constructed to avoid requiring high isolation levels (e.g. SERIALIZABLE level), thus reducing the locking overhead for the system. The programmer must carefully analyze database access code to ensure that any relaxation of isolation does not cause software bugs that are difficult to find. Conversely, if higher isolation levels are used, the possibility of deadlock
Deadlock
A deadlock is a situation where in two or more competing actions are each waiting for the other to finish, and thus neither ever does. It is often seen in a paradox like the "chicken or the egg"...
is increased, which also requires careful analysis and programming techniques to avoid.
The isolation levels defined by the ANSI
American National Standards Institute
The American National Standards Institute is a private non-profit organization that oversees the development of voluntary consensus standards for products, services, processes, systems, and personnel in the United States. The organization also coordinates U.S. standards with international...
/ISO
International Organization for Standardization
The International Organization for Standardization , widely known as ISO, is an international standard-setting body composed of representatives from various national standards organizations. Founded on February 23, 1947, the organization promulgates worldwide proprietary, industrial and commercial...
SQL
SQL
SQL is a programming language designed for managing data in relational database management systems ....
standard are listed as follows.
SERIALIZABLE
This is the highest isolation level. It specifies that all transactions occur in a completely isolated fashion, or, in other words, as if all transactions in the system had executed serially, one after the other. The DBMS may execute two or more transactions at the same time only if the illusion of serial execution can be maintained.With a lock-based concurrency control
Concurrency control
In information technology and computer science, especially in the fields of computer programming , operating systems , multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.Computer...
DBMS implementation, serializability
Serializability
In concurrency control of databases, transaction processing , and various transactional applications , both centralized and distributed, a transaction schedule is serializable, has the serializability property, if its outcome In concurrency control of databases, transaction processing (transaction...
requires read and write locks (acquired on selected data) to be released at the end of the transaction. Also range-locks must be acquired when a SELECT
Select
Select or SELECT may refer to:* Select , an album by Kim Wilde* Select , a British music magazine* Select , a keyword in SQL* select , a system call for polling multiple file descriptors...
query uses a ranged WHERE clause, especially to avoid the phantom reads phenomenon (see below).
When using non-lock based concurrency control, no locks are acquired; however, if the system detects a write collision among several concurrent transactions, only one of them is allowed to commit. See snapshot isolation
Snapshot isolation
In databases, and transaction processing , snapshot isolation is a guarantee that all reads made in a transaction will see a consistent snapshot of the database , and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates...
for more details on this topic.
REPEATABLE READS
In this isolation level, a lock-based concurrency controlConcurrency control
In information technology and computer science, especially in the fields of computer programming , operating systems , multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.Computer...
DBMS implementation keeps read and write locks (acquired on selected data) until the end of the transaction. However, range-locks are not managed, so the phantom reads phenomenon can occur (see below).
READ COMMITTED
In this isolation level, a lock-based concurrency controlConcurrency control
In information technology and computer science, especially in the fields of computer programming , operating systems , multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.Computer...
DBMS implementation keeps write locks (acquired on selected data) until the end of the transaction, but read locks are released as soon as the SELECT
Select
Select or SELECT may refer to:* Select , an album by Kim Wilde* Select , a British music magazine* Select , a keyword in SQL* select , a system call for polling multiple file descriptors...
operation is performed (so the non-repeatable reads phenomenon can occur in this isolation level, as discussed below). As in the previous level, range-locks are not managed.
READ UNCOMMITTED
This is the lowest isolation level. In this level, dirty reads are allowed (see below), so one transaction may see not-yet-committed changes made by other transactions.Default isolation level
The default isolation level of different DBMSDatabase management system
A database management system is a software package with computer programs that control the creation, maintenance, and use of a database. It allows organizations to conveniently develop databases for various applications by database administrators and other specialists. A database is an integrated...
's varies quite widely. Most databases that feature transactions allow the user to set any isolation level. Some DBMS's also require additional syntax when performing a SELECT statement to acquire locks (e.g. SELECT ... FOR UPDATE to acquire exclusive write locks on accessed rows).
However, the definitions above have been criticised in the paper A Critique of ANSI SQL Isolation Levels as being ambiguous, and as not accurately reflecting the isolation provided by many databases:
- This paper shows a number of weaknesses in the anomaly approach to defining isolation levels. The three ANSI phenomena are ambiguous. Even their broadest interpretations do not exclude anomalous behavior. This leads to some counter-intuitive results. In particular, lock-based isolation levels have different characteristics than their ANSI equivalents. This is disconcerting because commercial database systems typically use locking. Additionally, the ANSI phenomena do not distinguish among several isolation levels popular in commercial systems.
There are also other criticisms concerning ANSI SQL's isolation definition, in that it encourages implementors to do "bad things":
- ... it relies in subtle ways on an assumption that a locking schema is used for concurrency control, as opposed to an optimistic or multi-version concurrency scheme. This implies that the proposed semantics are ill-defined.
Read phenomena
The ANSI/ISO standard SQL 92 refers to three different read phenomena when Transaction 1 reads data that Transaction 2 might have changed.In the following examples, two transactions take place. In the first, Query 1 is performed. Then, in the second transaction, Query 2 is performed and committed. Finally, in the first transaction, Query 1 is performed again.
The queries use the following data table:
id | name | age |
---|---|---|
1 | Joe | 20 |
2 | Jill | 25 |
Dirty reads
A dirty read occurs when a transaction is allowed to read data from a row that has been modified by another running transaction and not yet committed.Dirty reads work similarly to non-repeatable reads; however, the second transaction would not need to be committed for the first query to return a different result. The only thing that may be prevented in the READ UNCOMMITTED isolation level is updates appearing out of order in the results; that is, earlier updates will always appear in a result set before later updates.
In our example, Transaction 2 changes a row, but does not commit the changes. Transaction 1 then reads the uncommitted data. Now if Transaction 2 rolls back his changes (already read by Transaction 1) or updates different changes to the database, then the view of the data may be wrong in the records of Transaction 1.
Transaction 1 | Transaction 2 |
---|---|
Non-repeatable reads
A non-repeatable read occurs, when during the course of a transaction, a row is retrieved twice and the values within the row differ between reads.Non-repeatable reads phenomenon may occur in a lock-based concurrency control method when read locks are not acquired when performing a SELECT
Select
Select or SELECT may refer to:* Select , an album by Kim Wilde* Select , a British music magazine* Select , a keyword in SQL* select , a system call for polling multiple file descriptors...
, or when the acquired locks on affected rows are released as soon as the SELECT
Select
Select or SELECT may refer to:* Select , an album by Kim Wilde* Select , a British music magazine* Select , a keyword in SQL* select , a system call for polling multiple file descriptors...
operation is performed. Under the multiversion concurrency control
Multiversion concurrency control
Multiversion concurrency control , in the database field of computer science, is a concurrency control method commonly used by database management systems to provide concurrent access to the database and in programming languages to implement transactional memory.For instance, a database will...
method, non-repeatable reads may occur when the requirement that a transaction affected by a commit conflict must roll back is relaxed.
Transaction 1 | Transaction 2 |
---|---|
In this example, Transaction 2 commits successfully, which means that its changes to the row with id 1 should become visible. However, Transaction 1 has already seen a different value for age in that row. At the SERIALIZABLE and REPEATABLE READ isolation levels, the DBMS must return the old value. At READ COMMITTED and READ UNCOMMITTED, the DBMS may return the updated value; this is a non-repeatable read.
There are two basic strategies used to prevent non-repeatable reads. The first is to delay the execution of Transaction 2 until Transaction 1 has committed or rolled back. This method is used when locking is used, and produces the serial schedule
Schedule (computer science)
In the fields of databases and transaction processing , a schedule of a system is an abstract model to describe execution of transactions running in the system. Often it is a list of operations ordered by time, performed by a set of transactions that are executed together in the system...
T1, T2. A serial schedule does not exhibit non-repeatable reads behaviour.
In the other strategy, as used in multiversion concurrency control
Multiversion concurrency control
Multiversion concurrency control , in the database field of computer science, is a concurrency control method commonly used by database management systems to provide concurrent access to the database and in programming languages to implement transactional memory.For instance, a database will...
, Transaction 2 is permitted to commit first, which provides for better concurrency. However, Transaction 1, which commenced prior to Transaction 2, must continue to operate on a past version of the database — a snapshot of the moment it was started. When Transaction 1 eventually tries to commit, the DBMS checks if the result of committing Transaction 1 would be equivalent to the schedule T1, T2. If it is, then Transaction 1 can proceed. If it cannot be seen to be equivalent, however, Transaction 1 must roll back with a serialization failure.
Using a lock-based concurrency control method, at the REPEATABLE READ isolation mode, the row with ID = 1 would be locked, thus blocking Query 2 until the first transaction was committed or rolled back. In READ COMMITTED mode, the second time Query 1 was executed, the age would have changed.
Under multiversion concurrency control, at the SERIALIZABLE isolation level, both SELECT queries see a snapshot of the database taken at the start of Transaction 1. Therefore, they return the same data. However, if Transaction 1 then attempted to UPDATE that row as well, a serialization failure would occur and Transaction 1 would be forced to roll back.
At the READ COMMITTED isolation level, each query sees a snapshot of the database taken at the start of each query. Therefore, they each see different data for the updated row. No serialization failure is possible in this mode (because no promise of serializability is made), and Transaction 1 will not have to be retried.
Phantom reads
A phantom read occurs when, in the course of a transaction, two identical queries are executed, and the collection of rows returned by the second query is different from the first.This can occur when range locks are not acquired on performing a SELECT
Select
Select or SELECT may refer to:* Select , an album by Kim Wilde* Select , a British music magazine* Select , a keyword in SQL* select , a system call for polling multiple file descriptors...
... WHERE operation.
The phantom reads anomaly is a special case of Non-repeatable reads when Transaction 1 repeats a ranged SELECT
Select
Select or SELECT may refer to:* Select , an album by Kim Wilde* Select , a British music magazine* Select , a keyword in SQL* select , a system call for polling multiple file descriptors...
... WHERE query and, in the middle of both operations, Transaction 2 creates (i.e. INSERT
Insert
Insert may refer to:*Insert *Insert *Insert *Insert *Insert key on a computer keyboard, used to switch between insert mode and overstrike mode*Insert *Insert *Fireplace insert...
) new rows (in the target table) fulfilling that WHERE clause.
Transaction 1 | Transaction 2 |
---|---|
Note that Transaction 1 executed the same query twice. If the highest level of isolation were maintained, the same set of rows should be returned both times, and indeed that is what is mandated to occur in a database operating at the SQL SERIALIZABLE isolation level. However, at the lesser isolation levels, a different set of rows may be returned the second time.
In the SERIALIZABLE isolation mode, Query 1 would result in all records with age in the range 10 to 30 being locked, thus Query 2 would block until the first transaction was committed. In REPEATABLE READ mode, the range would not be locked, allowing the record to be inserted and the second execution of Query 1 to include the new row in its results.
Isolation Levels vs Read Phenomena
Isolation level | Dirty reads | Non-repeatable reads | Phantoms |
---|---|---|---|
Read Uncommitted | may occur | may occur | may occur |
Read Committed | - | may occur | may occur |
Repeatable Read | - | - | may occur |
Serializable | - | - | - |
"may occur" means that the isolation level suffers that phenomenon, while "-" means that it does not suffer it.
Isolation Levels vs Locks
Isolation level | Write Lock | Read Lock | Range Lock |
---|---|---|---|
Read Uncommitted | - | - | - |
Read Committed | V | - | - |
Repeatable Read | V | V | - |
Serializable | V | V | V |
"V" indicates that the method locks for that operation, keeping that lock till the end of the transaction containing that operation.
Note: Read (i.e. SELECT
Select
Select or SELECT may refer to:* Select , an album by Kim Wilde* Select , a British music magazine* Select , a keyword in SQL* select , a system call for polling multiple file descriptors...
) operations can acquire read (shared) locks in the Read Committed isolation level, but they are released immediately after the read operation is performed.
See also
- Atomicity
- Consistency
- Durability
- Lock (database)Lock (database)A lock is used when multiple users need to access a database concurrently. This prevents data from being corrupted or invalidated when multiple users try to write to the database. Any single user can only modify those database records to which they have applied a lock that gives them exclusive...
- Optimistic concurrency controlOptimistic concurrency controlIn the field of relational database management systems, optimistic concurrency control is a concurrency control method that assumes that multiple transactions can complete without affecting each other, and that therefore transactions can proceed without locking the data resources that they affect...
- Relational Database Management SystemRelational database management systemA relational database management system is a database management system that is based on the relational model as introduced by E. F. Codd. Most popular databases currently in use are based on the relational database model....
- Snapshot isolationSnapshot isolationIn databases, and transaction processing , snapshot isolation is a guarantee that all reads made in a transaction will see a consistent snapshot of the database , and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates...