XOR swap algorithm
In computer programming, the XOR swap is an algorithm that uses the XOR bitwise operation to swap values of distinct variables having the same data type without using a temporary variable. "Distinct" means that the variables are stored at different memory addresses; the actual values of the variables do not have to be different.
The algorithm
Conventional swapping requires the use of a temporary storage variable. Using the XOR swap algorithm, however, no temporary storage is needed. The algorithm is as follows:
X := X XOR Y
Y := X XOR Y
X := X XOR Y
The algorithm typically corresponds to three machine code instructions. Since XOR is a commutative operation, X XOR Y can be replaced with Y XOR X in any of the lines. When coded in assembly language, this commutativity is often exercised in the second line. For example, in IBM System/370 assembly code:
XR R1,R2
XR R2,R1
XR R1,R2
where R1 and R2 are distinct registers and each XR operation leaves its result in the register named in the first argument. Using x86 assembly (AT&T syntax), assuming that values X and Y are in registers eax and ebx respectively, the XOR swap algorithm is expressed as follows:
xorl %ebx, %eax
xorl %eax, %ebx
xorl %ebx, %eax
where xorl places the result of the operation in the second register.
However, the algorithm fails if X and Y use the same storage location: the first XOR instruction zeroes the value stored there, and it then remains zero rather than being "swapped with itself". This is not the same as X and Y holding equal values; the trouble arises only when they share a storage location. Equal values held in distinct locations are swapped correctly, although the swap then has no visible effect and can simply be skipped.
Proof of correctness
The binary operation XOR over bit strings of length $N$ exhibits the following properties (where $\oplus$ denotes XOR):
- L1. Commutativity: $A \oplus B = B \oplus A$
- L2. Associativity: $(A \oplus B) \oplus C = A \oplus (B \oplus C)$
- L3. Identity exists: there is a bit string, 0, (of length $N$) such that $A \oplus 0 = A$ for any $A$
- L4. Each element is its own inverse: for each $A$, $A \oplus A = 0$.
Suppose that we have two distinct registers R1 and R2 as in the table below, with initial values A and B respectively. We perform the operations below in sequence, and reduce our results using the properties listed above.

Step | Operation | Register 1 | Register 2 | Reduction |
---|---|---|---|---|
0 | Initial value | $A$ | $B$ | (none) |
1 | R1 := R1 XOR R2 | $A \oplus B$ | $B$ | (none) |
2 | R2 := R1 XOR R2 | $A \oplus B$ | $(A \oplus B) \oplus B = A \oplus (B \oplus B) = A \oplus 0 = A$ | L2 L4 L3 |
3 | R1 := R1 XOR R2 | $(A \oplus B) \oplus A = (B \oplus A) \oplus A = B \oplus (A \oplus A) = B \oplus 0 = B$ | $A$ | L1 L2 L4 L3 |
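The reduction can also be checked on concrete values. The sketch below is illustrative only: the function name `xor_swap_trace` and the choice of 8-bit operands are assumptions, not part of the proof. It performs the three steps on local copies and asserts the intermediate values that the properties L1 through L4 predict.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: runs the three XOR steps on local "registers"
   r1 and r2 and checks each intermediate value.
   Returns 1 if the final values are exchanged. */
int xor_swap_trace(uint8_t a, uint8_t b)
{
    uint8_t r1 = a, r2 = b;            /* step 0: R1 = A, R2 = B */

    r1 = (uint8_t)(r1 ^ r2);           /* step 1: R1 = A xor B */
    assert(r1 == (uint8_t)(a ^ b));

    r2 = (uint8_t)(r1 ^ r2);           /* step 2: R2 = (A xor B) xor B = A */
    assert(r2 == a);

    r1 = (uint8_t)(r1 ^ r2);           /* step 3: R1 = (A xor B) xor A = B */
    assert(r1 == b);

    return r1 == b && r2 == a;
}
```

A call such as `xor_swap_trace(0xA5, 0x3C)` passes every assertion, and so does a call with equal values such as `xor_swap_trace(7, 7)`, since only shared storage, not equal values, breaks the algorithm.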
Code example
The XOR swap algorithm can be implemented as a C function that takes pointers to the two integers to be swapped.
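A minimal sketch of such a function follows. The name `xor_swap` and the `int` operand type are illustrative assumptions; the structure (an address comparison guarding three XOR assignments) follows the description in this section.

```c
/* Illustrative name and signature: swaps the two ints that x and y
   point to, without a temporary variable. */
void xor_swap(int *x, int *y)
{
    if (x != y) {      /* same address: XOR would zero the value, so skip */
        *x ^= *y;      /* *x now holds X xor Y */
        *y ^= *x;      /* *y now holds the original X */
        *x ^= *y;      /* *x now holds the original Y */
    }
}
```

With `int a = 5, b = 9;`, calling `xor_swap(&a, &b)` leaves `a == 9` and `b == 5`, while `xor_swap(&a, &a)` leaves `a` unchanged because of the guard.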
Note that the code does not swap the integers passed immediately, but first checks whether their addresses are distinct. This is because, if the addresses are equal, the algorithm folds to a triple *x ^= *x, which results in zero. If the caller can guarantee that the two pointers never refer to the same location, the comparison can be removed, making the code shorter and slightly faster.
The body of this function is sometimes seen incorrectly shortened to
if (x != y) *x^=*y^=*x^=*y;
This code has undefined behavior, since it modifies the lvalue *x twice without an intervening sequence point.
Reasons for use in practice
In most practical scenarios, the trivial swap algorithm using a temporary register is more efficient. Limited situations in which XOR swapping may be practical include:
- On a processor where the instruction set encoding permits the XOR swap to be encoded in a smaller number of bytes;
- In a region with high register pressure, where it may allow the register allocator to avoid spilling a register;
- In microcontrollers where available RAM is very limited.
Because these situations are rare, most optimizing compilers do not generate XOR swap code.
Reasons for avoidance in practice
Most modern compilers can optimize away the temporary variable in the naive swap, in which case the naive swap uses the same amount of memory and the same number of registers as the XOR swap and is at least as fast, and often faster. The XOR swap is also much less readable, and can be completely opaque to anyone who is not already familiar with the technique.

On modern CPU architectures, the XOR technique can be considerably slower than using a temporary variable to do swapping. One reason is that modern CPUs strive to execute instructions in parallel via instruction pipelines. In the XOR technique, the inputs to each operation depend on the results of the previous operation, so the three operations must be executed in strictly sequential order. If efficiency is a serious concern, both the XOR technique and temporary-variable swapping should be timed on the target architecture.
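For comparison, the conventional swap that this section favors can be sketched as follows. The name `temp_swap` is an illustrative assumption. The load of `*y` does not depend on the load of `*x`, so the first two statements carry none of the step-to-step dependency of the XOR sequence, and no address check is needed.

```c
/* Conventional swap through a temporary (illustrative name).
   Correct even when x and y point to the same location. */
void temp_swap(int *x, int *y)
{
    int tmp = *x;   /* an optimizing compiler can usually keep this in a register */
    *x = *y;
    *y = tmp;
}
```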
The XCHG instruction
Modern optimizing compilers work by translating the code they are given into an internal flow-based representation which they transform in many ways before producing their machine-code output. These compilers are more likely to recognize and optimize a conventional (temporary-based) swap than to recognize the high-level language statements that correspond to an XOR swap. Many times, what is written as a swap in high-level code is translated by the compiler into a simple internal note that two variables have swapped memory addresses, rather than any amount of machine code. Other times, when the target architecture supports it, the compiler can use a single XCHG (exchange) instruction which performs the swap in a single operation.

An XCHG operation was available as long ago as 1964, on the PDP-6 (where it was called EXCH) and in 1970 on the Datacraft 6024 series (where it was called XCHG). The Intel 8086, released in 1978, also included an instruction named XCHG. All three of these instructions swapped registers with registers, or registers with memory, but were unable to swap the contents of two memory locations. The Motorola 68000's EXG operation can only swap registers with registers. The PDP-10 inherited the PDP-6's EXCH instruction, but the PDP-11 (the machine on which the C programming language was developed) did not.
On the common x86 architecture, the XCHG instruction with a memory operand has an implicit LOCK prefix, so that the operation is atomic. This makes it expensive, because the processor must synchronize with every other device that can access main memory. By comparison, on many processors a single MOV instruction can be issued at the same time as other common instructions in a single clock cycle. Between two registers, XCHG may still be slower than three MOVs (although, with register renaming, both sequences most probably issue the same micro-ops), but it is smaller, so it may be used where code size matters. x86's XCHG is primarily useful for its locking form, for writing locking primitives used in threaded or multiprocessing applications.
Aliasing
The XOR swap is also complicated in practice by aliasing. As noted above, if an attempt is made to XOR-swap the contents of some location with itself, the result is that the location is zeroed out and its value lost. Therefore, XOR swapping must not be used blindly in a high-level language if aliasing is possible.
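The hazard is easy to reach through array indexing. The following sketch is illustrative only: `unguarded_xor_swap` and `swap_elements` are hypothetical names, and the unguarded variant is shown deliberately to expose the failure, not as a recommended implementation.

```c
/* Hypothetical unguarded variant: no address comparison. */
void unguarded_xor_swap(int *x, int *y)
{
    *x ^= *y;   /* if x == y, this stores *x ^ *x, which is 0 */
    *y ^= *x;   /* 0 ^ 0 is 0 */
    *x ^= *y;   /* still 0: the original value is lost */
}

/* Swaps elements i and j of v. When i == j, both pointers alias
   the same element and the unguarded swap destroys it. */
void swap_elements(int *v, int i, int j)
{
    unguarded_xor_swap(&v[i], &v[j]);
}
```

Given `int v[3] = {4, 5, 6};`, the call `swap_elements(v, 0, 2)` exchanges the outer elements as intended, but `swap_elements(v, 1, 1)` leaves `v[1] == 0` instead of 5.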
Variations
The underlying principle of the XOR swap algorithm can be applied to any operation meeting criteria L1 through L4 above. Replacing XOR by addition and subtraction gives a slightly different, but largely equivalent, formulation:
X := X + Y
Y := X - Y
X := X - Y
Unlike the XOR swap, this variation requires that the underlying processor or programming language uses a method such as modular arithmetic or bignums to guarantee that the computation of X + Y cannot cause an error due to integer overflow. Therefore, it is seen even more rarely in practice than the XOR swap.
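In C, arithmetic on unsigned integer types is defined to wrap modulo 2^N, which meets the modular-arithmetic requirement. A minimal sketch of the addition and subtraction variant under that assumption follows; the name `add_swap` is hypothetical, and signed types are avoided because signed overflow is undefined behavior in C.

```c
/* Illustrative name: swaps two unsigned ints using addition and
   subtraction. Unsigned arithmetic wraps modulo UINT_MAX + 1, so the
   intermediate sum is well defined even when it does not fit. */
void add_swap(unsigned int *x, unsigned int *y)
{
    if (x != y) {        /* aliasing would leave 0, as with XOR */
        *x = *x + *y;    /* sum, possibly wrapped */
        *y = *x - *y;    /* recovers the original *x */
        *x = *x - *y;    /* recovers the original *y */
    }
}
```

The wrapped intermediate sum is harmless because the two subtractions wrap by the same modulus, so the original values are recovered exactly.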
See also
- Symmetric difference
- XOR linked list
- Feistel cipher (the XOR swap algorithm is a degenerate form of a Feistel cipher)