Non-blocking synchronization
Encyclopedia
In computer science
, a non-blocking algorithm ensures that threads competing for a shared resource
do not have their execution
indefinitely postponed by mutual exclusion
. A non-blocking algorithm is lock-free if there is guaranteed system-wide progress
; wait-free if there is also guaranteed per-thread progress.
Literature up to the turn of the 21st century used "non-blocking" synonymously with lock-free. However, since 2003, the term has been weakened to only prevent progress-blocking interactions with a preemptive scheduler
. In modern usage, therefore, an algorithm is non-blocking if the suspension of one or more threads will not stop the potential progress of the remaining threads. They are designed to avoid requiring a critical section
. Often, these algorithms allow multiple processes to make progress on a problem without ever blocking each other. For some operations, these algorithms provide an alternative to locking mechanism
s.
to synchronize access to shared resources. Synchronization primitives such as mutexes
, semaphores
, and critical section
s are all mechanisms by which a programmer can ensure that certain sections of code do not execute concurrently if doing so would corrupt shared memory structures. If one thread attempts to acquire a lock that is already held by another thread, the thread will block until the lock is free.
Blocking a thread is undesirable for many reasons. An obvious reason is that while the thread is blocked, it cannot accomplish anything. If the blocked thread is performing a high-priority or real-time
task, it is highly undesirable to halt its progress. Other problems are less obvious. Certain interactions between locks can lead to error conditions such as deadlock
, livelock, and priority inversion
. Using locks also involves a trade-off between coarse-grained locking, which can significantly reduce opportunities for parallelism
, and fine-grained locking, which requires more careful design, increases locking overhead and is more prone to bugs.
Non-blocking algorithms are also safe for use in interrupt handler
s: even though the preempted thread cannot be resumed, progress is still possible without it. In contrast, global data structures protected by mutual exclusion cannot safely be accessed in a handler, as the preempted thread may be the one holding the lock.
Non-blocking algorithms have the potential to prevent priority inversion
, as no thread is forced to wait for a suspended thread to complete. However, as livelock is still possible, threads have to wait when they encounter contention; hence, priority inversion is still possible depending upon the contention management system used. Lock-free algorithms, below, avoid priority inversion.
read-modify-write
primitives that the hardware must provide, the most notable of which is compare and swap (CAS)
. Critical section
s are almost always implemented using standard interfaces over these primitives. Until recently, all non-blocking algorithms had to be written "natively" with the underlying primitives to achieve acceptable performance. However, the emerging field of software transactional memory
promises standard abstractions for writing efficient non-blocking code.
Much research has also been done in providing basic data structure
s such as stacks
, queues, sets
, and hash table
s. These allow programs to easily exchange data between threads asynchronously.
Additionally, some data structures are weak enough to be implemented without special atomic primitives. These exceptions include:
-freedom. An algorithm is wait-free if every operation has a bound on the number of steps the algorithm will take before the operation completes.
It was shown in the 1980s that all algorithms can be implemented wait-free, and many transformations from serial code, called universal constructions, have been demonstrated. However, the resulting performance does not in general match even naïve blocking designs. It has also been shown that the widely available atomic conditional primitives, CAS
and LL/SC
, cannot provide starvation-free implementations of many common data structures without memory costs growing linearly in the number of threads. Wait-free algorithms are therefore rare, both in research and in practice.
progress (for some sensible definition of progress). All wait-free algorithms are lock-free.
In general, a lock-free algorithm can run in four phases: completing one's own operation, assisting an obstructing operation, aborting an obstructing operation, and waiting. Completing one's own operation is complicated by the possibility of concurrent assistance and abortion, but is invariably the fastest path to completion.
The decision about when to assist, abort or wait when an obstruction is met is the responsibility of a contention manager. This may be very simple (assist higher priority operations, abort lower priority ones), or may be more optimized to achieve better throughput, or lower the latency of prioritized operations.
Correct concurrent assistance is typically the most complex part of a lock-free algorithm, and often very costly to execute: not only does the assisting thread slow down, but thanks to the mechanics of shared memory, the thread being assisted will be slowed, too, if it is still running.
Obstruction-freedom demands only that any partially completed operation can be aborted and the changes made rolled back. Dropping concurrent assistance can often result in much simpler algorithms that are easier to validate. Preventing the system from continually live-locking is the task of a contention manager.
Obstruction-freedom is also called optimistic concurrency control
.
Some obstruction-free algorithms use a pair of "consistency markers" in the data structure. Processes reading the data structure first read one consistency marker, then read the relevant data into an internal buffer, then read the other marker, and then compare the markers. The data is consistent if the two markers are identical. Markers may be non-identical when the read is interrupted by another process updating the data structure. In such a case, the process discards the data in the internal buffer and tries again.
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, a non-blocking algorithm ensures that threads competing for a shared resource
Resource (computer science)
A resource, or system resource, is any physical or virtual component of limited availability within a computer system. Every device connected to a computer system is a resource. Every internal system component is a resource...
do not have their execution
Execution (computers)
Execution in computer and software engineering is the process by which a computer or a virtual machine carries out the instructions of a computer program. The instructions in the program trigger sequences of simple actions on the executing machine...
indefinitely postponed by mutual exclusion
Mutual exclusion
Mutual exclusion algorithms are used in concurrent programming to avoid the simultaneous use of a common resource, such as a global variable, by pieces of computer code called critical sections. A critical section is a piece of code in which a process or thread accesses a common resource...
. A non-blocking algorithm is lock-free if there is guaranteed system-wide progress
Resource starvation
In computer science, starvation is a multitasking-related problem, where a process is perpetually denied necessary resources. Without those resources, the program can never finish its task....
; wait-free if there is also guaranteed per-thread progress.
Literature up to the turn of the 21st century used "non-blocking" synonymously with lock-free. However, since 2003, the term has been weakened to only prevent progress-blocking interactions with a preemptive scheduler
Computer multitasking
In computing, multitasking is a method where multiple tasks, also known as processes, share common processing resources such as a CPU. In the case of a computer with a single CPU, only one task is said to be running at any point in time, meaning that the CPU is actively executing instructions for...
. In modern usage, therefore, an algorithm is non-blocking if the suspension of one or more threads will not stop the potential progress of the remaining threads. They are designed to avoid requiring a critical section
Critical section
In concurrent programming a critical section is a piece of code that accesses a shared resource that must not be concurrently accessed by more than one thread of execution. A critical section will usually terminate in fixed time, and a thread, task or process will have to wait a fixed time to...
. Often, these algorithms allow multiple processes to make progress on a problem without ever blocking each other. For some operations, these algorithms provide an alternative to locking mechanism
Lock (computer science)
In computer science, a lock is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution. Locks are one way of enforcing concurrency control policies.-Types:...
s.
Motivation
The traditional approach to multi-threaded programming is to use locksLock (computer science)
In computer science, a lock is a synchronization mechanism for enforcing limits on access to a resource in an environment where there are many threads of execution. Locks are one way of enforcing concurrency control policies.-Types:...
to synchronize access to shared resources. Synchronization primitives such as mutexes
Mutual exclusion
Mutual exclusion algorithms are used in concurrent programming to avoid the simultaneous use of a common resource, such as a global variable, by pieces of computer code called critical sections. A critical section is a piece of code in which a process or thread accesses a common resource...
, semaphores
Semaphore (programming)
In computer science, a semaphore is a variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment....
, and critical section
Critical section
In concurrent programming a critical section is a piece of code that accesses a shared resource that must not be concurrently accessed by more than one thread of execution. A critical section will usually terminate in fixed time, and a thread, task or process will have to wait a fixed time to...
s are all mechanisms by which a programmer can ensure that certain sections of code do not execute concurrently if doing so would corrupt shared memory structures. If one thread attempts to acquire a lock that is already held by another thread, the thread will block until the lock is free.
Blocking a thread is undesirable for many reasons. An obvious reason is that while the thread is blocked, it cannot accomplish anything. If the blocked thread is performing a high-priority or real-time
Real-time computing
In computer science, real-time computing , or reactive computing, is the study of hardware and software systems that are subject to a "real-time constraint"— e.g. operational deadlines from event to system response. Real-time programs must guarantee response within strict time constraints...
task, it is highly undesirable to halt its progress. Other problems are less obvious. Certain interactions between locks can lead to error conditions such as deadlock
Deadlock
A deadlock is a situation where in two or more competing actions are each waiting for the other to finish, and thus neither ever does. It is often seen in a paradox like the "chicken or the egg"...
, livelock, and priority inversion
Priority inversion
In computer science, priority inversion is a problematic scenario in scheduling when a higher priority task is indirectly preempted by a lower priority task effectively "inverting" the relative priorities of the two tasks....
. Using locks also involves a trade-off between coarse-grained locking, which can significantly reduce opportunities for parallelism
Parallel computing
Parallel computing is a form of computation in which many calculations are carried out simultaneously, operating on the principle that large problems can often be divided into smaller ones, which are then solved concurrently . There are several different forms of parallel computing: bit-level,...
, and fine-grained locking, which requires more careful design, increases locking overhead and is more prone to bugs.
Non-blocking algorithms are also safe for use in interrupt handler
Interrupt handler
An interrupt handler, also known as an interrupt service routine , is a callback subroutine in microcontroller firmware, operating system or device driver whose execution is triggered by the reception of an interrupt...
s: even though the preempted thread cannot be resumed, progress is still possible without it. In contrast, global data structures protected by mutual exclusion cannot safely be accessed in a handler, as the preempted thread may be the one holding the lock.
Non-blocking algorithms have the potential to prevent priority inversion
Priority inversion
In computer science, priority inversion is a problematic scenario in scheduling when a higher priority task is indirectly preempted by a lower priority task effectively "inverting" the relative priorities of the two tasks....
, as no thread is forced to wait for a suspended thread to complete. However, as livelock is still possible, threads have to wait when they encounter contention; hence, priority inversion is still possible depending upon the contention management system used. Lock-free algorithms, below, avoid priority inversion.
Implementation
With few exceptions, non-blocking algorithms use atomicLinearizability
In concurrent programming, an operation is atomic, linearizable, indivisible or uninterruptible if it appears to the rest of the system to occur instantaneously. Atomicity is a guarantee of isolation from concurrent processes...
read-modify-write
Read-modify-write
In computer science, read-modify-write is a class of atomic operations such as test-and-set, fetch-and-add, and compare-and-swap which both read a memory location and write a new value into it simultaneously, either with a completely new value or some function of the previous value. These...
primitives that the hardware must provide, the most notable of which is compare and swap (CAS)
Compare-and-swap
In computer science, the compare-and-swap CPU instruction is a special instruction that atomically compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value...
. Critical section
Critical section
In concurrent programming a critical section is a piece of code that accesses a shared resource that must not be concurrently accessed by more than one thread of execution. A critical section will usually terminate in fixed time, and a thread, task or process will have to wait a fixed time to...
s are almost always implemented using standard interfaces over these primitives. Until recently, all non-blocking algorithms had to be written "natively" with the underlying primitives to achieve acceptable performance. However, the emerging field of software transactional memory
Software transactional memory
In computer science, software transactional memory is a concurrency control mechanism analogous to database transactions for controlling access to shared memory in concurrent computing. It is an alternative to lock-based synchronization. A transaction in this context is a piece of code that...
promises standard abstractions for writing efficient non-blocking code.
Much research has also been done in providing basic data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
s such as stacks
Stack (data structure)
In computer science, a stack is a last in, first out abstract data type and linear data structure. A stack can have any abstract data type as an element, but is characterized by only three fundamental operations: push, pop and stack top. The push operation adds a new item to the top of the stack,...
, queues, sets
Set (computer science)
In computer science, a set is an abstract data structure that can store certain values, without any particular order, and no repeated values. It is a computer implementation of the mathematical concept of a finite set...
, and hash table
Hash table
In computer science, a hash table or hash map is a data structure that uses a hash function to map identifying values, known as keys , to their associated values . Thus, a hash table implements an associative array...
s. These allow programs to easily exchange data between threads asynchronously.
Additionally, some data structures are weak enough to be implemented without special atomic primitives. These exceptions include:
- single-reader single-writer ring bufferCircular bufferA circular buffer, cyclic buffer or ring buffer is a data structure that uses a single, fixed-size buffer as if it were connected end-to-end.This structure lends itself easily to buffering data streams.-Uses:...
FIFOFIFOFIFO is an acronym for First In, First Out, an abstraction related to ways of organizing and manipulation of data relative to time and prioritization... - Read-copy-updateRead-copy-updateIn computer operating systems, read-copy-update is a synchronization mechanism implementing a kind of mutual exclusionPlease note that RCU does not implement mutual exclusion in the conventional sense: RCU readers can and do run concurrently with RCU updates...
with a single writer and any number of readers. (The readers are wait-free; the writer is usually lock-free, until it needs to reclaim memory).
Wait-freedom
Wait-freedom is the strongest non-blocking guarantee of progress, combining guaranteed system-wide throughput with starvationResource starvation
In computer science, starvation is a multitasking-related problem, where a process is perpetually denied necessary resources. Without those resources, the program can never finish its task....
-freedom. An algorithm is wait-free if every operation has a bound on the number of steps the algorithm will take before the operation completes.
It was shown in the 1980s that all algorithms can be implemented wait-free, and many transformations from serial code, called universal constructions, have been demonstrated. However, the resulting performance does not in general match even naïve blocking designs. It has also been shown that the widely available atomic conditional primitives, CAS
Compare-and-swap
In computer science, the compare-and-swap CPU instruction is a special instruction that atomically compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value...
and LL/SC
Load-Link/Store-Conditional
In computer science, load-link and store-conditional are a pair of instructions that together implement a lock-free atomic read-modify-write operation....
, cannot provide starvation-free implementations of many common data structures without memory costs growing linearly in the number of threads. Wait-free algorithms are therefore rare, both in research and in practice.
Lock-freedom
Lock-freedom allows individual threads to starve but guarantees system-wide throughput. An algorithm is lock-free if it satisfies that when the program threads are run sufficiently long at least one of the threads makesprogress (for some sensible definition of progress). All wait-free algorithms are lock-free.
In general, a lock-free algorithm can run in four phases: completing one's own operation, assisting an obstructing operation, aborting an obstructing operation, and waiting. Completing one's own operation is complicated by the possibility of concurrent assistance and abortion, but is invariably the fastest path to completion.
The decision about when to assist, abort or wait when an obstruction is met is the responsibility of a contention manager. This may be very simple (assist higher priority operations, abort lower priority ones), or may be more optimized to achieve better throughput, or lower the latency of prioritized operations.
Correct concurrent assistance is typically the most complex part of a lock-free algorithm, and often very costly to execute: not only does the assisting thread slow down, but thanks to the mechanics of shared memory, the thread being assisted will be slowed, too, if it is still running.
Obstruction-freedom
Obstruction-freedom is possibly the weakest natural non-blocking progress guarantee. An algorithm is obstruction-free if at any point, a single thread executed in isolation (i.e., with all obstructing threads suspended) for a bounded number of steps will complete its operation. All lock-free algorithms are obstruction-free.Obstruction-freedom demands only that any partially completed operation can be aborted and the changes made rolled back. Dropping concurrent assistance can often result in much simpler algorithms that are easier to validate. Preventing the system from continually live-locking is the task of a contention manager.
Obstruction-freedom is also called optimistic concurrency control
Optimistic concurrency control
In the field of relational database management systems, optimistic concurrency control is a concurrency control method that assumes that multiple transactions can complete without affecting each other, and that therefore transactions can proceed without locking the data resources that they affect...
.
Some obstruction-free algorithms use a pair of "consistency markers" in the data structure. Processes reading the data structure first read one consistency marker, then read the relevant data into an internal buffer, then read the other marker, and then compare the markers. The data is consistent if the two markers are identical. Markers may be non-identical when the read is interrupted by another process updating the data structure. In such a case, the process discards the data in the internal buffer and tries again.
See also
- Concurrent data structureConcurrent data structureIn computer science, a concurrent data structure is aparticular way of storing and organizing data for access bymultiple computing threads on a computer.Historically, such data structures were used on uniprocessor...
- ABA problemABA problemIn multithreaded computing, the ABA problem occurs during synchronization, when a location is read twice, has the same value for both reads, and "value is the same" is used to indicate "nothing has changed"...
- Compare-and-swapCompare-and-swapIn computer science, the compare-and-swap CPU instruction is a special instruction that atomically compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value...
- Concurrency controlConcurrency controlIn information technology and computer science, especially in the fields of computer programming , operating systems , multiprocessors, and databases, concurrency control ensures that correct results for concurrent operations are generated, while getting those results as quickly as possible.Computer...
- DeadlockDeadlockA deadlock is a situation where in two or more competing actions are each waiting for the other to finish, and thus neither ever does. It is often seen in a paradox like the "chicken or the egg"...
- LinearizabilityLinearizabilityIn concurrent programming, an operation is atomic, linearizable, indivisible or uninterruptible if it appears to the rest of the system to occur instantaneously. Atomicity is a guarantee of isolation from concurrent processes...
- Load-Link/Store-ConditionalLoad-Link/Store-ConditionalIn computer science, load-link and store-conditional are a pair of instructions that together implement a lock-free atomic read-modify-write operation....
- Lock (software engineering)
- Memory barrierMemory barrierMemory barrier, also known as membar or memory fence or fence instruction, is a type of barrier and a class of instruction which causes a central processing unit or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction.CPUs employ...
- Mutual exclusionMutual exclusionMutual exclusion algorithms are used in concurrent programming to avoid the simultaneous use of a common resource, such as a global variable, by pieces of computer code called critical sections. A critical section is a piece of code in which a process or thread accesses a common resource...
- Pre-emptive multitasking
- Priority inversionPriority inversionIn computer science, priority inversion is a problematic scenario in scheduling when a higher priority task is indirectly preempted by a lower priority task effectively "inverting" the relative priorities of the two tasks....
- Read-copy-updateRead-copy-updateIn computer operating systems, read-copy-update is a synchronization mechanism implementing a kind of mutual exclusionPlease note that RCU does not implement mutual exclusion in the conventional sense: RCU readers can and do run concurrently with RCU updates...
- Resource starvationResource starvationIn computer science, starvation is a multitasking-related problem, where a process is perpetually denied necessary resources. Without those resources, the program can never finish its task....
- Room synchronizationRoom synchronizationThe room synchronization technique is a form of concurrency control in computer science.The room synchronization problem involves supporting a set of m mutually exclusive "rooms" where any number of users can execute code simultaneously in a shared room , but no two users can simultaneously execute...
- Software transactional memorySoftware transactional memoryIn computer science, software transactional memory is a concurrency control mechanism analogous to database transactions for controlling access to shared memory in concurrent computing. It is an alternative to lock-based synchronization. A transaction in this context is a piece of code that...
- Partitioned global address spacePartitioned global address spaceIn computer science, a partitioned global address space is a parallel programming model. It assumes a global memory address space that is logically partitioned and a portion of it is local to each processor. The novelty of PGAS is that the portions of the shared memory space may have an affinity...
External links
- Article "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms" by Maged M. Michael and Michael L. ScottMichael L. ScottMichael Lee Scott is a professor of computer science at the University of Rochester in Rochester, New York.-Education and teaching:...
- Discussion "Communication between Threads, without blocking"
- Survey "Some Notes on Lock-Free and Wait-Free Algorithms" by Ross Bencina – supports lock-free and thread-safe programming on single variables
- System.Threading.Interlocked - Provides atomic operations for variables that are shared by multiple threads (.NET Framework)
- The Jail-Ust Container Library
- Practical lock-free data structures
- Thesis "Efficient and Practical Non-Blocking Data Structures" (1414 KB) by Per Håkan SundellPer Håkan SundellPer Håkan Sundell is a programmer and computer scientist with roots in the scene and early computer enthusiasts of the eighties, when he was known as PHS of CCS .- Biography :Håkan currently holds a Ph.D...
- WARPing - Wait-free techniques for Real-time Processing
- Non-blocking Synchronization: Algorithms and Performance Evaluation. (1926 KB) by Yi Zhang
- "Design and verification of lock-free parallel algorithms" by Hui Gao
- "Asynchronous Data Sharing in Multiprocessor Real-Time Systems Using Process Consensus" by Jing Chen and Alan BurnsAlan BurnsProfessor Alan Burns FREng FIET FBCS SMIEEE CEng is a professor in the Computer Science Department at the University of York, England. He has been at the University of York since 1990, and held the post of Head of Department from 1999 until 30 June 2006, when he was succeeded by John McDermid.He is...
- Discussion "lock-free versus lock-based algorithms"
- Atomic Ptr Plus Project - collection of various lock-free synchronization primitives
- AppCore: A Portable High-Performance Thread Synchronization Library - An Effective Marriage between Lock-Free and Lock-Based Algorithms
- WaitFreeSynchronization and LockFreeSynchronization at the Portland Pattern Repository
- Multiplatform library with atomic operations
- A simple C++ lock-free LIFO implementation
- 1024cores - a site devoted to lock-free, wait-free, obstruction-free and just scalable non-blocking synchronization algorithms and related topics