Linear probing
Linear probing is a scheme in computer programming for resolving hash collisions of values of hash functions by sequentially searching the hash table for a free location. This is accomplished using two values: one as a starting value and one as an interval between successive values in modular arithmetic. The second value, which is the same for all keys and known as the stepsize, is repeatedly added to the starting value until a free space is found or the entire table has been traversed.
newLocation = (startingValue + stepSize) % arraySize
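As a minimal sketch of how the formula above generates a probe sequence, the following Python generator applies the update repeatedly. The name probe_sequence and its parameter names are illustrative choices, not part of any library.

```python
def probe_sequence(starting_value, step_size, array_size):
    """Yield array_size successive probe locations, wrapping modulo array_size."""
    location = starting_value % array_size
    for _ in range(array_size):
        yield location
        # Same update as the formula above: add the step, wrap with modulo.
        location = (location + step_size) % array_size


print(list(probe_sequence(5, 1, 8)))  # [5, 6, 7, 0, 1, 2, 3, 4]
print(list(probe_sequence(5, 2, 8)))  # [5, 7, 1, 3, 5, 7, 1, 3]
```

With a stepsize of one, every location is visited exactly once. In the second call the stepsize shares a factor with the table size, so the sequence revisits the same four locations and never reaches the rest of the table.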
This algorithm, which is used in open-addressed hash tables, provides good memory caching (if the stepsize is equal to one) through good locality of reference, but it also results in clustering: an unfortunately high probability that where there has been one collision there will be more. The performance of linear probing is also more sensitive to input distribution than that of double hashing.
Given an ordinary hash function H(x), a linear probing function H(x, i) would be:
H(x, i) = (H(x) + i) mod n
Here H(x) is the starting value, n is the size of the hash table, and i is the probe number (i = 0, 1, 2, ...), so the stepsize is one in this case.
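The following Python sketch shows one way insertion and lookup could work, assuming the probing function takes the form (H(x) + i) mod n with a stepsize of one. The class name LinearProbingTable, the use of Python's built-in hash as H, and the fixed table size are all assumptions made for illustration. Removal and resizing are deliberately left out.

```python
class LinearProbingTable:
    """Open-addressed hash table using linear probing with a stepsize of one."""

    _EMPTY = object()  # sentinel marking a free slot

    def __init__(self, size=8):
        self.size = size
        self.keys = [self._EMPTY] * size
        self.values = [None] * size

    def _probe(self, key):
        """Yield (H(x) + i) mod n for i = 0, 1, ..., n - 1."""
        start = hash(key) % self.size
        for i in range(self.size):
            yield (start + i) % self.size

    def insert(self, key, value):
        """Store the pair in the first free or matching slot; return its index."""
        for slot in self._probe(key):
            if self.keys[slot] is self._EMPTY or self.keys[slot] == key:
                self.keys[slot] = key
                self.values[slot] = value
                return slot
        raise RuntimeError("table is full")

    def find(self, key):
        """Return the stored value, or None if the key is absent."""
        for slot in self._probe(key):
            if self.keys[slot] is self._EMPTY:
                break  # a free slot ends the search: the key is absent
            if self.keys[slot] == key:
                return self.values[slot]
        return None


table = LinearProbingTable(size=8)
print(table.insert(3, "a"))   # 3
print(table.insert(11, "b"))  # 4, since 11 % 8 == 3 is taken
print(table.insert(19, "c"))  # 5, extending the same cluster
print(table.find(19))         # c
```

The three keys all start at location 3, so each insertion lengthens the same run of occupied slots, which is the clustering effect described above. The printed slot numbers rely on CPython hashing small non-negative integers to themselves.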
Dictionary operations in constant time
Using linear probing, dictionary operations can be implemented in constant expected time. In other words, insert, remove, and find operations can be implemented in O(1) expected time, as long as the load factor of the hash table (the ratio of the number of records to the number of addresses) is a constant strictly less than one. This analysis makes the (unrealistic) assumption that the hash function is completely random, but it can also be extended to 5-independent hash functions. Weaker properties, such as universal hashing, are not strong enough to ensure the constant-time operation of linear probing, but one practical method of hash function generation, tabulation hashing, again leads to guaranteed constant expected time performance despite not being 5-independent.
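Tabulation hashing combines table lookups with exclusive-or operations. The Python sketch below is one simple variant, under the assumption that keys are non-negative integers that fit in key_bytes bytes. The function name make_tabulation_hash and its parameters are hypothetical and chosen only for this example.

```python
import random


def make_tabulation_hash(key_bytes=4, out_bits=32, seed=None):
    """Return a hash function built from one random lookup table per key byte."""
    rng = random.Random(seed)
    # One table of 256 random out_bits-wide words for each byte position.
    tables = [[rng.getrandbits(out_bits) for _ in range(256)]
              for _ in range(key_bytes)]

    def tabulation_hash(key):
        h = 0
        for position in range(key_bytes):
            byte = (key >> (8 * position)) & 0xFF
            h ^= tables[position][byte]  # combine the lookups with exclusive or
        return h

    return tabulation_hash


h = make_tabulation_hash(seed=1)
start = h(123456) % 16  # starting value for probing a 16-slot table
print(0 <= start < 16)  # True
```

The returned function can supply the starting value for linear probing by reducing its output modulo the table size. Fixing the seed makes the tables reproducible, and a fresh random seed gives a different member of the hash family.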
See also
- Double hashing
- Hash collision
- Hash function
- Quadratic probing
- Hash table (collision resolution)
External links
- How Caching Affects Hashing by Gregory L. Heileman and Wenbin Luo 2005.