Test-and-set
Encyclopedia
In computer science
, the test-and-set instruction is an instruction used to write to a memory location and return its old value as a single atomic (i.e., non-interruptible) operation. If multiple processes may access the same memory, and if a process is currently performing a test-and-set, no other process may begin another test-and-set until the first process is done. CPU
s may use test-and-set instructions offered by other electronic components, such as dual-port RAM; CPUs may also offer a test-and-set instruction themselves.
A lock can be built using an atomic test-and-set instruction as follows:
function Lock(boolean *lock) {
while (test_and_set (lock) 1)
;
}
The calling process obtains the lock if the old value was 0. It spins writing 1 to the variable until this occurs.
Maurice Herlihy
(1991) proved that test-and-set has a finite consensus number, in contrast to the compare-and-swap
operation. The test-and-set operation can solve the wait-free consensus problem
for no more than two concurrent processes. However, more than two decades before Herlihy's proof, IBM
had replaced test-and-set by compare-and-swap, which is a more general solution to this problem. Ultimately, IBM would release a processor family with 12 processors, whereas Amdahl
would release a processor family with the architectural maximum of 16 processors.
Hardware implementation of test-and-set
DPRAM test-and-set instructions can work in many ways. Here are two variations, both of which describe a DPRAM which provides exactly 2 ports, allowing 2 separate electronic components (such as 2 CPUs) access to every memory location on the DPRAM.
or spinlock
using the interrupt mechanism. Since this all happens at hardware speeds, CPU 2's wait to get out of the spin-lock is very short.
Whether or not CPU 2 was trying access the memory location, the DPRAM performs the test given by CPU 1. If the test succeeds, the DPRAM sets the memory location to the value given by CPU 1. Then the DPRAM wipes out its "internal note" that CPU 1 was writing there. At this point, CPU 2 could issue a test-and-set, which would succeed.
Whether or not CPU 2 was trying access the memory location, the DPRAM now performs CPU 1's test. If the test succeeds, the DPRAM sets memory location A to the value specified by CPU 1. If the test fails, the DPRAM copies the value back from the special register to memory location A. Either operation wipes out the special flag value. If CPU 2 now issues a test-and-set, it will succeed.
Software implementation of test-and-set
Many processors have an atomic test-and-set machine language instruction.
Those that do not can still implement an atomic test-and-set using an atomic swap (as shown below) or some other atomic read-modify-write
instruction.
The test and set instruction when used with boolean values behaves like the following function. Crucially the entire function is executed atomically: no process can interrupt the function mid-execution and hence see a state that only exists during the execution of the function. This code only serves to help explain the behaviour of test-and-set; atomicity requires explicit hardware support and hence can't be implemented as a simple function. NOTE: In this example, 'lock' is assumed
to be passed by reference (or by name) but the assignment to 'initial' creates a new value (not just copying a reference).
function TestAndSet(boolean lock) {
boolean initial = lock
lock = true
return initial
}
The above code segment is not atomic in the sense of the test-and-set instruction. It also differs from the descriptions of DPRAM hardware test-and-set above in that here the "set" value and the test are fixed and invariant, and the "set" part of the operation is done regardless of the outcome of the test, whereas in the description of DPRAM test-and-set, the memory is set only upon passage of the test, and the value to set and the test condition are specified by the CPU. Here, the value to set can only be 1, but if 0 and 1 are considered the only valid values for the memory location, and "value is nonzero" is the only allowed test, then this equates to the case described for DPRAM hardware (or, more specifically, the DPRAM case reduces to this under these constraints). From that viewpoint, this can correctly be called "test-and-set" in the full conventional sense of the term. The essential point to note, which this software function does embody, is the general intent and principle of test-and-set: that a value both is tested and is set in one atomic operation such that no other program thread might cause the target memory location to change after it is tested but before it is set, which would violate the logic requiring that the location will only be set when it has a certain value. (That is, critically, as opposed to merely when it very recently had that value.)
In the C programming language
, the implementation would be like:
where SwapAtomic atomically first reads the current value pointed to by lockPtr and then writes 1 to the location. Being atomic, SwapAtomic never uses cached values and always commits to the shared memory store (RAM).
The code also shows that TestAndSet is really two operations: an atomical swap and a test. Only the swap needs to be atomic. (This is true because delaying the value comparison itself by any amount of time will not change the result of the test, once the value to test has been obtained. Once the swap acquires the initial value, the result of the test has been determined, even if it has not been computed yet—e.g. in the C language example, by the operator.)
Implementing mutual exclusion with test-and-set
One way to implement mutual exclusion
uses test-and-set in the following way:
boolean lock = false
function Critical{
while TestAndSet(lock)
skip //spin until lock is acquired
critical section //only one process can be in this section at a time
lock = false //release lock when finished with the critical section
}
In pseudo C it would be like:
volatile int lock = 0;
void Critical {
while (TestAndSet(&lock) 1);
critical section //only one process can be in this section at a time
lock = 0 //release lock when finished with the critical section
}
Note the volatile
keyword. In absence of volatile, the compiler and/or the CPU(s) will quite certainly optimize access to lock and/or use cached values, thus rendering the above code erroneous.
Conversely, and unfortunately, the presence of volatile does not guarantee that reads and writes are committed to memory. Some compilers issue memory barrier
s to ensure that operations are committed to memory, but since the semantics of volatile in C/C++ is quite vague, not all compilers will do that. Consult your compiler's documentation to determine if it does.
This function can be called by multiple processes, but it is guaranteed that only one process will be in the critical section at a time.
Another way to implement Mutual exclusion
, known as Test and Test-and-set
, is more efficient than the above technique on multiprocessor machines. The "Test and Test-and-set" technique uses the same "test-and-set" instruction as the above technique, but has better cache coherence
.
Both the above technique and "Test and Test-and-set" are examples of a spinlock
: the while-loop spins waiting to acquire the lock.
We see that TSL tests a shared memory location (flag): It copies it to register reg, then if it's zero, it sets it to 1, otherwise does nothing. TSL is atomic – it can't be interrupted. That is, all of its actions can be regarded as taking place at once. By using it with a spin lock (jnz enter_region), we repeatedly test if the resource is busy; when it is released (flag becomes zero) TSL will have reset it non-zero for us, and we will have claimed the resource because the spin lock testing reg will release, since reg has the previous value of flag, which was zero.
s. In uniprocessor systems this technique isn't needed (unless using multiple processes to access the same data); to use semaphores, it is sufficient to disable interrupt
s before accessing a semaphore. However, in multiprocessor systems, it is undesirable, if not impossible, to disable interrupts on all processors at the same time. Even with interrupts disabled, two or more processors could be attempting to access the same semaphore's memory at the same time. In this case, the test-and-set instruction may be used.
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, the test-and-set instruction is an instruction used to write to a memory location and return its old value as a single atomic (i.e., non-interruptible) operation. If multiple processes may access the same memory, and if a process is currently performing a test-and-set, no other process may begin another test-and-set until the first process is done. CPU
Central processing unit
The central processing unit is the portion of a computer system that carries out the instructions of a computer program, to perform the basic arithmetical, logical, and input/output operations of the system. The CPU plays a role somewhat analogous to the brain in the computer. The term has been in...
s may use test-and-set instructions offered by other electronic components, such as dual-port RAM; CPUs may also offer a test-and-set instruction themselves.
A lock can be built using an atomic test-and-set instruction as follows:
function Lock(boolean *lock) {
while (test_and_set (lock) 1)
;
}
The calling process obtains the lock if the old value was 0. It spins writing 1 to the variable until this occurs.
Maurice Herlihy
Maurice Herlihy
Maurice Herlihy is a computer scientist active in the field of multiprocessor synchronization. Herlihy has contributed to the design of concurrent algorithms, and in particular to the exposition and quantification of the properties and uses of hardware synchronization operations...
(1991) proved that test-and-set has a finite consensus number, in contrast to the compare-and-swap
Compare-and-swap
In computer science, the compare-and-swap CPU instruction is a special instruction that atomically compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value...
operation. The test-and-set operation can solve the wait-free consensus problem
Consensus (computer science)
Consensus is a problem in distributed computing that encapsulates the task of group agreement in the presence of faults.In particular, any process in the group may fail at any time. Consensus is fundamental to core techniques in fault tolerance, such as state machine replication.- Problem...
for no more than two concurrent processes. However, more than two decades before Herlihy's proof, IBM
IBM
International Business Machines Corporation or IBM is an American multinational technology and consulting corporation headquartered in Armonk, New York, United States. IBM manufactures and sells computer hardware and software, and it offers infrastructure, hosting and consulting services in areas...
had replaced test-and-set by compare-and-swap, which is a more general solution to this problem. Ultimately, IBM would release a processor family with 12 processors, whereas Amdahl
Amdahl Corporation
Amdahl Corporation is an information technology company which specializes in IBM mainframe-compatible computer products. Founded in 1970 by Dr. Gene Amdahl, a former IBM employee, it has been a wholly owned subsidiary of Fujitsu since 1997...
would release a processor family with the architectural maximum of 16 processors.
Hardware implementation of test-and-set
DPRAM test-and-set instructions can work in many ways. Here are two variations, both of which describe a DPRAM which provides exactly 2 ports, allowing 2 separate electronic components (such as 2 CPUs) access to every memory location on the DPRAM.
Variation 1
When CPU 1 issues a test-and-set instruction, the DPRAM first makes an "internal note" of this by storing the address of the memory location in a special place. If at this point, CPU 2 happens to issue a test-and-set instruction for the same memory location, the DPRAM first checks its "internal note", recognizes the situation, and issues a BUSY interrupt, which tells CPU 2 that it must wait and retry. This is an implementation of a busy waitingBusy waiting
In software engineering, busy-waiting or spinning is a technique in which a process repeatedly checks to see if a condition is true, such as whether keyboard input is available, or if a lock is available. Spinning can also be used to generate an arbitrary time delay, a technique that was necessary...
or spinlock
Spinlock
In software engineering, a spinlock is a lock where the thread simply waits in a loop repeatedly checking until the lock becomes available. Since the thread remains active but isn't performing a useful task, the use of such a lock is a kind of busy waiting...
using the interrupt mechanism. Since this all happens at hardware speeds, CPU 2's wait to get out of the spin-lock is very short.
Whether or not CPU 2 was trying access the memory location, the DPRAM performs the test given by CPU 1. If the test succeeds, the DPRAM sets the memory location to the value given by CPU 1. Then the DPRAM wipes out its "internal note" that CPU 1 was writing there. At this point, CPU 2 could issue a test-and-set, which would succeed.
Variation 2
CPU 1 issues a test-and-set instruction to write to "memory location A". The DPRAM does not immediately store the value in memory location A, but instead simultaneously moves the current value to a special register, while setting the contents of memory location A to a special "flag value". If at this point, CPU 2 issues a test-and-set to memory location A, the DPRAM detects the special flag value, and as in Variation 1, issues a BUSY interrupt.Whether or not CPU 2 was trying access the memory location, the DPRAM now performs CPU 1's test. If the test succeeds, the DPRAM sets memory location A to the value specified by CPU 1. If the test fails, the DPRAM copies the value back from the special register to memory location A. Either operation wipes out the special flag value. If CPU 2 now issues a test-and-set, it will succeed.
Software implementation of test-and-set
Many processors have an atomic test-and-set machine language instruction.
Those that do not can still implement an atomic test-and-set using an atomic swap (as shown below) or some other atomic read-modify-write
Read-modify-write
In computer science, read-modify-write is a class of atomic operations such as test-and-set, fetch-and-add, and compare-and-swap which both read a memory location and write a new value into it simultaneously, either with a completely new value or some function of the previous value. These...
instruction.
The test and set instruction when used with boolean values behaves like the following function. Crucially the entire function is executed atomically: no process can interrupt the function mid-execution and hence see a state that only exists during the execution of the function. This code only serves to help explain the behaviour of test-and-set; atomicity requires explicit hardware support and hence can't be implemented as a simple function. NOTE: In this example, 'lock' is assumed
to be passed by reference (or by name) but the assignment to 'initial' creates a new value (not just copying a reference).
function TestAndSet(boolean lock) {
boolean initial = lock
lock = true
return initial
}
The above code segment is not atomic in the sense of the test-and-set instruction. It also differs from the descriptions of DPRAM hardware test-and-set above in that here the "set" value and the test are fixed and invariant, and the "set" part of the operation is done regardless of the outcome of the test, whereas in the description of DPRAM test-and-set, the memory is set only upon passage of the test, and the value to set and the test condition are specified by the CPU. Here, the value to set can only be 1, but if 0 and 1 are considered the only valid values for the memory location, and "value is nonzero" is the only allowed test, then this equates to the case described for DPRAM hardware (or, more specifically, the DPRAM case reduces to this under these constraints). From that viewpoint, this can correctly be called "test-and-set" in the full conventional sense of the term. The essential point to note, which this software function does embody, is the general intent and principle of test-and-set: that a value both is tested and is set in one atomic operation such that no other program thread might cause the target memory location to change after it is tested but before it is set, which would violate the logic requiring that the location will only be set when it has a certain value. (That is, critically, as opposed to merely when it very recently had that value.)
In the C programming language
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
, the implementation would be like:
where SwapAtomic atomically first reads the current value pointed to by lockPtr and then writes 1 to the location. Being atomic, SwapAtomic never uses cached values and always commits to the shared memory store (RAM).
The code also shows that TestAndSet is really two operations: an atomical swap and a test. Only the swap needs to be atomic. (This is true because delaying the value comparison itself by any amount of time will not change the result of the test, once the value to test has been obtained. Once the swap acquires the initial value, the result of the test has been determined, even if it has not been computed yet—e.g. in the C language example, by the
operator.)
Implementing mutual exclusion with test-and-set One way to implement mutual exclusion
Mutual exclusion
Mutual exclusion algorithms are used in concurrent programming to avoid the simultaneous use of a common resource, such as a global variable, by pieces of computer code called critical sections. A critical section is a piece of code in which a process or thread accesses a common resource...
uses test-and-set in the following way:
boolean lock = false
function Critical{
while TestAndSet(lock)
skip //spin until lock is acquired
critical section //only one process can be in this section at a time
lock = false //release lock when finished with the critical section
}
In pseudo C it would be like:
volatile int lock = 0;
void Critical {
while (TestAndSet(&lock) 1);
critical section //only one process can be in this section at a time
lock = 0 //release lock when finished with the critical section
}
Note the volatile
Volatile variable
In computer programming, particularly in the C, C++, C#, and Java programming languages, a variable or object declared with the volatile keyword usually has special properties related to optimization and/or threading...
keyword. In absence of volatile, the compiler and/or the CPU(s) will quite certainly optimize access to lock and/or use cached values, thus rendering the above code erroneous.
Conversely, and unfortunately, the presence of volatile does not guarantee that reads and writes are committed to memory. Some compilers issue memory barrier
Memory barrier
Memory barrier, also known as membar or memory fence or fence instruction, is a type of barrier and a class of instruction which causes a central processing unit or compiler to enforce an ordering constraint on memory operations issued before and after the barrier instruction.CPUs employ...
s to ensure that operations are committed to memory, but since the semantics of volatile in C/C++ is quite vague, not all compilers will do that. Consult your compiler's documentation to determine if it does.
This function can be called by multiple processes, but it is guaranteed that only one process will be in the critical section at a time.
Another way to implement Mutual exclusion
Mutual exclusion
Mutual exclusion algorithms are used in concurrent programming to avoid the simultaneous use of a common resource, such as a global variable, by pieces of computer code called critical sections. A critical section is a piece of code in which a process or thread accesses a common resource...
, known as Test and Test-and-set
Test and Test-and-set
In computer science, the test-and-set CPU instruction is used to implementmutual exclusion in multiprocessor environments. Although a correct lock can be implemented with test-and-set, it can lead to resource contention in busy lock .To lower the overhead a more elaborate locking...
, is more efficient than the above technique on multiprocessor machines. The "Test and Test-and-set" technique uses the same "test-and-set" instruction as the above technique, but has better cache coherence
Cache coherency
In computing, cache coherence refers to the consistency of data stored in local caches of a shared resource.When clients in a system maintain caches of a common memory resource, problems may arise with inconsistent data. This is particularly true of CPUs in a multiprocessing system...
.
Both the above technique and "Test and Test-and-set" are examples of a spinlock
Spinlock
In software engineering, a spinlock is a lock where the thread simply waits in a loop repeatedly checking until the lock becomes available. Since the thread remains active but isn't performing a useful task, the use of such a lock is a kind of busy waiting...
: the while-loop spins waiting to acquire the lock.
Hardware implementation of test-and-set
We see that TSL tests a shared memory location (flag): It copies it to register reg, then if it's zero, it sets it to 1, otherwise does nothing. TSL is atomic – it can't be interrupted. That is, all of its actions can be regarded as taking place at once. By using it with a spin lock (jnz enter_region), we repeatedly test if the resource is busy; when it is released (flag becomes zero) TSL will have reset it non-zero for us, and we will have claimed the resource because the spin lock testing reg will release, since reg has the previous value of flag, which was zero.
Implementing semaphores using test-and-set
It's possible to use the test-and-set instruction to implement semaphoreSemaphore (programming)
In computer science, a semaphore is a variable or abstract data type that provides a simple but useful abstraction for controlling access by multiple processes to a common resource in a parallel programming environment....
s. In uniprocessor systems this technique isn't needed (unless using multiple processes to access the same data); to use semaphores, it is sufficient to disable interrupt
Interrupt
In computing, an interrupt is an asynchronous signal indicating the need for attention or a synchronous event in software indicating the need for a change in execution....
s before accessing a semaphore. However, in multiprocessor systems, it is undesirable, if not impossible, to disable interrupts on all processors at the same time. Even with interrupts disabled, two or more processors could be attempting to access the same semaphore's memory at the same time. In this case, the test-and-set instruction may be used.
See also
- Fetch-and-addFetch-and-addIn computer science, the fetch-and-add CPU instruction is a special instruction that atomically modifies the contents of a memory location. It is used to implement mutual exclusion and concurrent algorithms in multiprocessor systems, a generalization of semaphores.In uniprocessor systems, it is...
- Test and Test-and-setTest and Test-and-setIn computer science, the test-and-set CPU instruction is used to implementmutual exclusion in multiprocessor environments. Although a correct lock can be implemented with test-and-set, it can lead to resource contention in busy lock .To lower the overhead a more elaborate locking...
- Load-Link/Store-ConditionalLoad-Link/Store-ConditionalIn computer science, load-link and store-conditional are a pair of instructions that together implement a lock-free atomic read-modify-write operation....
- Compare-and-swapCompare-and-swapIn computer science, the compare-and-swap CPU instruction is a special instruction that atomically compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value...
External links
- Description from Encyclopaedia of Delay-Insensitive Systems
- "Wait-free Test-and-Set" by Yehuda Afek
- int testandset(int *lock) - C-callable routine written in Sun Sparc assembly language