Lookup table
In computer science, a lookup table is a data structure, usually an array or associative array, often used to replace a runtime computation with a simpler array indexing operation. The savings in processing time can be significant, since retrieving a value from memory is often faster than performing an "expensive" computation or input/output operation. The tables may be precalculated and stored in static program storage, or calculated (or "pre-fetched") as part of a program's initialization phase (memoization). Lookup tables are also used extensively to validate input values by matching against a list of valid (or invalid) items in an array and, in some programming languages, may include function pointers (or offsets to labels) to process the matching input.
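As an illustration of validating input against a table whose entries carry function pointers, the following C sketch maps single-character commands to handlers. The command characters, the handler names and the `dispatch` function are hypothetical, chosen only for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical handlers: each one transforms an accumulator value. */
static int handle_increment(int value) { return value + 1; }
static int handle_decrement(int value) { return value - 1; }
static int handle_reset(int value)     { (void)value; return 0; }

/* One table entry: a valid input and the function that processes it. */
struct command_entry {
    char key;
    int (*handler)(int);
};

static const struct command_entry command_table[] = {
    { '+', handle_increment },
    { '-', handle_decrement },
    { '0', handle_reset },
};

/* Returns 1 and updates *value if key is a valid command, 0 otherwise. */
int dispatch(char key, int *value)
{
    size_t count = sizeof command_table / sizeof command_table[0];
    assert(value != NULL);
    for (size_t i = 0; i < count; i++) {
        if (command_table[i].key == key) {
            *value = command_table[i].handler(*value);
            return 1;
        }
    }
    return 0; /* not in the table: input rejected */
}
```

A caller would pass each input character to `dispatch` and treat a return value of 0 as invalid input.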
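History
Before the advent of computers, lookup tables of values were used by people to speed up hand calculations of complex functions, such as in trigonometry, logarithms, and statistical density functions.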
In ancient India, Aryabhata created one of the first sine tables, which he encoded in a Sanskrit-letter-based number system. In 493 A.D., Victorius of Aquitaine wrote a 98-column multiplication table which gave (in Roman numerals) the product of every number from 2 to 50 times each row value; the rows were "a list of numbers starting with one thousand, descending by hundreds to one hundred, then descending by tens to ten, then by ones to one, and then the fractions down to 1/144". Modern school children are often taught to memorize "times tables" to avoid calculating the most commonly used products (up to 9 x 9 or 12 x 12).
Early in the history of computers, input/output operations were particularly slow, even in comparison to processor speeds of the time. It made sense to reduce expensive read operations by a form of manual caching, creating either static lookup tables (embedded in the program) or dynamic prefetched arrays to contain only the most commonly occurring data items. Despite the introduction of systemwide caching that now automates this process, application-level lookup tables can still improve performance for data items that rarely, if ever, change.
Simple lookup in an array, an associative array or a linked list (unsorted list)
This is known as a linear search or brute-force search, each element being checked for equality in turn and the associated value, if any, used as a result of the search. This is often the slowest search method unless frequently occurring values occur early in the list. For a one-dimensional array or linked list, the lookup is usually to determine whether or not there is a match with an "input" data value.
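A linear search over an unsorted array of key/value pairs can be sketched in C as follows. The `kv_pair` structure and the `linear_lookup` name are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

struct kv_pair { int key; int value; };

/* Linear search: compare each entry in turn; return NULL if the key is absent. */
const struct kv_pair *linear_lookup(const struct kv_pair *table,
                                    size_t count, int key)
{
    assert(table != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (table[i].key == key)
            return &table[i];
    }
    return NULL;
}
```

Placing frequently requested keys near the front of the array shortens the average search, as the paragraph above notes.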
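Linked lists vs. arrays
Linked lists have some advantages over arrays:
- Insertion or deletion of an element at a specific point of a list is a constant-time operation. (While one can "delete" an element from an array in constant time by marking its slot as "vacant", an algorithm that iterates over the elements may have to skip a large number of vacant slots.)
- Arbitrarily many elements may be inserted into a linked list, limited only by the total memory available, while an array will eventually fill up and then have to be resized, an expensive operation that may not even be possible if memory is fragmented. Similarly, an array from which many elements are removed may have to be resized in order to avoid wasting too much space.
On the other hand:
- Arrays allow random access, while linked lists allow only sequential access to elements. Singly linked lists, in fact, can only be traversed in one direction. This makes linked lists unsuitable for applications where it is useful to quickly look up an element by its index, such as heapsort. See also trivial hash function below.
- Sequential access on arrays is also faster than on linked lists on many machines, because arrays have greater locality of reference and thus benefit more from processor caching.
- Linked lists require extra storage for references, which often makes them impractical for lists of small data items such as characters or boolean values. It can also be slow, and with a naïve allocator wasteful, to allocate memory separately for each new element, a problem generally solved using memory pools.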
Some hybrid solutions try to combine the advantages of the two representations. Unrolled linked lists store several elements in each list node, increasing cache performance while decreasing memory overhead for references. CDR coding does both of these as well, by replacing references with the actual data referenced, which extends off the end of the referencing record.
Binary search in an array or an associative array (sorted list)
An example of a "divide and conquer algorithm", binary search finds an element by determining which half of the table a match may be found in and repeating until either success or failure. This is only possible if the list is sorted, but it gives good performance even if the list is lengthy.
Trivial hash function
For a trivial hash function lookup, the unsigned raw data value is used directly as an index to a one-dimensional table to extract a result. For small ranges, this can be amongst the fastest lookups, even exceeding binary search speed, with zero branches and execution in constant time.
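A minimal C sketch of such a direct-index lookup, using a month number as the raw value. The table contents and the `days_in` function are illustrative choices, not taken from the article.

```c
#include <assert.h>

/* Index 0 is unused so that the month number 1..12 indexes the table directly. */
static const unsigned char days_in_month[13] = {
    0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31
};

/* Days in the given month of a non-leap year: one indexed read, no search. */
unsigned days_in(unsigned month)
{
    assert(month >= 1 && month <= 12);
    return days_in_month[month];
}
```

The input value itself is the "hash", so the only work at lookup time is the array index computation.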
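Counting '1' bits in a series of bytes
One discrete problem that is expensive to solve on many computers is that of counting the number of bits which are set to 1 in a (binary) number, sometimes called the population function. For example, the decimal number "37" is "00100101" in binary, so it contains three bits that are set to binary "1".
A simple example of C code, designed to count the 1 bits in an int, might look like this: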
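A minimal sketch of such a loop, assuming a 32-bit unsigned input (the function name `count_ones_loop` is chosen here for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Counts set bits by clearing the lowest set bit until none remain.
   The loop runs once per set bit, with a branch on every iteration. */
int count_ones_loop(uint32_t x)
{
    int result = 0;
    while (x != 0) {
        x &= x - 1;   /* clears the lowest set bit */
        result++;
    }
    return result;
}
```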
This apparently simple algorithm can potentially take hundreds of cycles even on a modern architecture, because it makes many branches in the loop, and branching is slow. This can be ameliorated using loop unrolling and some other compiler optimizations. There is, however, a simple and much faster algorithmic solution: a trivial hash function table lookup.
Simply construct a static table, bits_set, with 256 entries giving the number of one bits set in each possible byte value (e.g. 0x00 = 0, 0x01 = 1, 0x02 = 1, and so on). Then use this table to find the number of ones in each byte of the integer using a trivial hash function lookup on each byte in turn, and sum them. This requires no branches, and just four indexed memory accesses, considerably faster than the earlier code.
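The table-driven approach can be sketched as follows. The table name `bits_set` comes from the text; the macros that generate its 256 entries and the function name `count_ones_table` are assumptions of this example, as is the 32-bit input width.

```c
#include <assert.h>
#include <stdint.h>

/* Generate the 256 per-byte counts at compile time: each macro level
   extends the pattern by two more bits. */
#define B2(n) n, n + 1, n + 1, n + 2
#define B4(n) B2(n), B2(n + 1), B2(n + 1), B2(n + 2)
#define B6(n) B4(n), B4(n + 1), B4(n + 1), B4(n + 2)

static const unsigned char bits_set[256] = { B6(0), B6(1), B6(1), B6(2) };

/* Sums the table entries for the four bytes of x: no loop, no branch. */
int count_ones_table(uint32_t x)
{
    return bits_set[x & 255]         + bits_set[(x >> 8) & 255]
         + bits_set[(x >> 16) & 255] + bits_set[(x >> 24) & 255];
}
```

Each of the four terms masks out one byte and uses it directly as the table index.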
The above source can easily be improved (avoiding the AND'ing and shifting) by "recasting" x as a 4-byte unsigned char array and, preferably, coding it in-line as a single statement instead of as a function.
Note that even this simple algorithm can be too slow now: the original code might run faster from the cache of modern processors, whereas (large) lookup tables do not fit well in caches and can cause slower accesses to memory. In addition, the above example requires computing addresses within the table to perform the four lookups needed.
LUTs in image processing
In data analysis applications, such as image processing, a lookup table (LUT) is used to transform the input data into a more desirable output format. For example, a grayscale picture of the planet Saturn will be transformed into a color image to emphasize the differences in its rings.
A classic example of reducing run-time computations using lookup tables is to obtain the result of a trigonometry calculation, such as the sine of a value. Calculating trigonometric functions can substantially slow a computing application. The same application can finish much sooner when it first precalculates the sine of a number of values, for example for each whole number of degrees. (The table can be defined as static variables at compile time, reducing repeated run-time costs.)
When the program requires the sine of a value, it can use the lookup table to retrieve the closest sine value from a memory address, and may also take the step of interpolating to the sine of the desired value, instead of calculating by mathematical formula. Lookup tables are thus used by mathematics co-processors in computer systems. An error in a lookup table was responsible for Intel's infamous floating-point divide bug.
Functions of a single variable (such as sine and cosine) may be implemented by a simple array. Functions involving two or more variables require multidimensional array indexing techniques. The latter case may thus employ a two-dimensional array power[x][y] to replace a function that calculates x^y for a limited range of x and y values. Functions that have more than one result may be implemented with lookup tables that are arrays of structures.
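A two-variable table of this kind might be sketched in C as below. The range limits (bases and exponents from 0 to 9) and all identifiers are assumptions chosen so that every result fits in an unsigned long.

```c
#include <assert.h>

#define POWER_MAX_BASE 9
#define POWER_MAX_EXP  9

static unsigned long power_table[POWER_MAX_BASE + 1][POWER_MAX_EXP + 1];

/* Fills power_table[x][y] with x raised to y by repeated multiplication. */
void init_power_table(void)
{
    for (int x = 0; x <= POWER_MAX_BASE; x++) {
        power_table[x][0] = 1;               /* x^0 = 1 (0^0 is taken as 1 here) */
        for (int y = 1; y <= POWER_MAX_EXP; y++)
            power_table[x][y] = power_table[x][y - 1] * (unsigned long)x;
    }
}

/* Looks up x^y; valid only inside the precomputed range. */
unsigned long lookup_power(int x, int y)
{
    assert(x >= 0 && x <= POWER_MAX_BASE);
    assert(y >= 0 && y <= POWER_MAX_EXP);
    return power_table[x][y];
}
```

`init_power_table` must be called once before the first lookup; the largest entry, 9^9 = 387420489, fits in 32 bits.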
As mentioned, there are intermediate solutions that use tables in combination with a small amount of computation, often using interpolation. Pre-calculation combined with interpolation can produce higher accuracy for values that fall between two precomputed values. This technique requires slightly more time but can greatly enhance accuracy in applications that require it. Depending on the values being precomputed, pre-computation with interpolation can also be used to shrink the lookup table size while maintaining accuracy.
In image processing, lookup tables are often called LUTs and give an output value for each of a range of index values. One common LUT, called the colormap or palette, is used to determine the colors and intensity values with which a particular image will be displayed. In computed tomography, "windowing" refers to a related concept for determining how to display the intensity of measured radiation.
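Applying a palette to 8-bit grayscale pixels could look like the following C sketch. The `rgb` structure, the blue-to-red palette and both function names are hypothetical and serve only to show the per-pixel table read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct rgb { uint8_t r, g, b; };

/* Hypothetical palette: dark pixels map to blue, bright pixels to red. */
void build_palette(struct rgb palette[256])
{
    for (int i = 0; i < 256; i++) {
        palette[i].r = (uint8_t)i;
        palette[i].g = 0;
        palette[i].b = (uint8_t)(255 - i);
    }
}

/* Maps each 8-bit grayscale pixel to a color with one table read. */
void apply_palette(const uint8_t *gray, struct rgb *color, size_t count,
                   const struct rgb palette[256])
{
    assert(gray != NULL && color != NULL);
    for (size_t i = 0; i < count; i++)
        color[i] = palette[gray[i]];
}
```

Changing how the image is displayed then only requires rebuilding the 256-entry palette, not reprocessing the pixel data.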
While often effective, employing a lookup table may nevertheless result in a severe penalty if the computation that the LUT replaces is relatively simple. Memory retrieval time and the complexity of memory requirements can increase application operation time and system complexity relative to what would be required by straight formula computation. The possibility of polluting the cache may also become a problem. Table accesses for large tables will almost certainly cause a cache miss. This phenomenon is increasingly becoming an issue as processors outpace memory. A similar issue appears in rematerialization, a compiler optimization. In some environments, such as the Java programming language, table lookups can be even more expensive due to mandatory bounds-checking involving an additional comparison and branch for each lookup.
There are two fundamental limitations on when it is possible to construct a lookup table for a required operation. One is the amount of memory that is available: one cannot construct a lookup table larger than the space available for the table, although it is possible to construct disk-based lookup tables at the expense of lookup time. The other is the time required to compute the table values in the first instance; although this usually needs to be done only once, if it takes a prohibitively long time, it may make the use of a lookup table an inappropriate solution. As previously stated however, tables can be statically defined in many cases.
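Most computers, which perform only basic arithmetic operations, cannot directly calculate the sine of a given value. Instead, they use the CORDIC algorithm or a complex formula such as the following Taylor series to compute the value of sine to a high degree of precision:
$$\sin(x) \approx x - \frac{x^3}{6} + \frac{x^5}{120} - \frac{x^7}{5040}$$ (for x close to 0)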
However, this can be expensive to compute, especially on slow processors, and there are many applications, particularly in traditional computer graphics, that need to compute many thousands of sine values every second. A common solution is to compute in advance the sine of many evenly distributed values; to find the sine of x, we then choose the sine of the value closest to x. This will be close to the correct value because sine is a continuous function with a bounded rate of change. For example:
real array sine_table[-1000..1000]
for x from -1000 to 1000
    sine_table[x] := sine(pi * x / 1000)
function lookup_sine(x)
    return sine_table[round(1000 * x / pi)]
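The pseudocode above could be rendered in C roughly as follows. This is a sketch under stated assumptions: C arrays start at index 0, so sample k is stored at slot k + 1000, and the table is filled by a hypothetical helper `series_sine` (a truncated Taylor series) only to keep the example free of library dependencies; a real program would more likely call `sin()` from `<math.h>`.

```c
#include <assert.h>

#define SINE_STEPS 1000                 /* samples per half period, as in the pseudocode */
#define SINE_PI    3.14159265358979323846

static double sine_table[2 * SINE_STEPS + 1];   /* slot k + SINE_STEPS holds sample k */

/* Stand-in for the expensive computation: Taylor series, adequate for |x| <= pi. */
static double series_sine(double x)
{
    double term = x, sum = x;
    for (int n = 1; n <= 12; n++) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

/* Precomputes the sine of pi * k / SINE_STEPS for k = -SINE_STEPS..SINE_STEPS. */
void init_sine_table(void)
{
    for (int k = -SINE_STEPS; k <= SINE_STEPS; k++)
        sine_table[k + SINE_STEPS] = series_sine(SINE_PI * k / SINE_STEPS);
}

/* Returns the stored sine of the sample nearest to x, for -pi <= x <= pi. */
double lookup_sine_nearest(double x)
{
    assert(x >= -SINE_PI && x <= SINE_PI);
    double t = SINE_STEPS * x / SINE_PI;
    long k = (long)(t >= 0.0 ? t + 0.5 : t - 0.5);   /* round to nearest sample */
    return sine_table[k + SINE_STEPS];
}
```

Because the nearest sample is at most pi/2000 away from x and the slope of sine never exceeds 1, the result differs from the true sine by at most about 0.0016.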
Unfortunately, the table requires quite a bit of space: if IEEE double-precision floating-point numbers are used, over 16,000 bytes would be required (2,001 entries of 8 bytes each). We can use fewer samples, but then our precision will significantly worsen. One good solution is linear interpolation, which draws a line between the two points in the table on either side of the value and locates the answer on that line. This is still quick to compute, and much more accurate for smooth functions such as the sine function. Here is our example using linear interpolation:
function lookup_sine(x)
    // valid for -pi <= x < pi, so that x1+1 stays inside the table
    x1 := floor(x*1000/pi)
    y1 := sine_table[x1]
    y2 := sine_table[x1+1]
    return y1 + (y2-y1)*(x*1000/pi-x1)
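A C sketch of the interpolating lookup follows, under the same assumptions as before: the table is shifted so that indices start at 0, and a hypothetical `series_sine` helper (truncated Taylor series) stands in for the library sine. The clamp at the upper edge is an addition of this sketch so that x = pi does not read past the table.

```c
#include <assert.h>

#define INTERP_STEPS 1000
#define INTERP_PI    3.14159265358979323846

static double interp_table[2 * INTERP_STEPS + 1];   /* slot i holds angle (i - STEPS) * pi / STEPS */

/* Stand-in for the expensive computation: Taylor series, adequate for |x| <= pi. */
static double series_sine(double x)
{
    double term = x, sum = x;
    for (int n = 1; n <= 12; n++) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

void init_interp_table(void)
{
    for (int i = 0; i <= 2 * INTERP_STEPS; i++)
        interp_table[i] = series_sine(INTERP_PI * (i - INTERP_STEPS) / INTERP_STEPS);
}

/* Linear interpolation between the two samples that bracket x, for -pi <= x <= pi. */
double lookup_sine_interp(double x)
{
    assert(x >= -INTERP_PI && x <= INTERP_PI);
    double pos = (x + INTERP_PI) * INTERP_STEPS / INTERP_PI;  /* 0 .. 2 * STEPS */
    int i = (int)pos;                      /* pos >= 0, so the cast rounds down */
    if (i >= 2 * INTERP_STEPS)
        i = 2 * INTERP_STEPS - 1;          /* keep i + 1 inside the table at x = pi */
    double y1 = interp_table[i];
    double y2 = interp_table[i + 1];
    return y1 + (y2 - y1) * (pos - i);
}
```

With the same 2,001 samples, the interpolation error for sine is bounded by roughly (pi/1000)^2 / 8, about 1.2e-6, compared with about 1.6e-3 for the nearest-sample lookup.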
Another solution, which uses a quarter of the space but takes a bit longer to compute, is to exploit the symmetry of the sine function and its relationship to cosine. In this case, the lookup table holds the sine only for the first quadrant (i.e. sin(0..pi/2)). When a value is needed, the angle is first wrapped to one full turn (not needed if values are always between 0 and 2*pi), and a second variable is assigned the angle's offset within its quadrant. The correct value is then returned according to the quadrant: the first quadrant is a straight table read, the second quadrant is read at pi/2 minus the offset, and the third and fourth are the negatives of the first and second respectively. For cosine, we only have to return the sine of the angle shifted by pi/2 (i.e. x+pi/2). For tangent, we divide the sine by the cosine (divide-by-zero handling may be needed depending on the implementation). The example below works in whole degrees:
function init_sine
    for x from 0 to 360/4
        sine_table[x] := sine(2*pi * x / 360)
function lookup_sine(x)
    x := wrap x from 0 to 360
    y := mod(x, 90)
    if (x < 90) return sine_table[y]
    if (x < 180) return sine_table[90-y]
    if (x < 270) return -sine_table[y]
    return -sine_table[90-y]
function lookup_cosine(x)
    return lookup_sine(x + 90)
function lookup_tan(x)
    return lookup_sine(x) / lookup_cosine(x)
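The quarter-wave scheme might look like this in C, for whole degrees. As in the earlier sketches, `series_sine` is a hypothetical stand-in for the library sine, and the function names are chosen for illustration. The tangent here simply asserts that the cosine is non-zero; other handling may suit a real application better.

```c
#include <assert.h>

#define QW_PI 3.14159265358979323846

static double quarter_table[91];            /* sine of 0..90 whole degrees */

/* Stand-in for the expensive computation: Taylor series, adequate for |x| <= pi. */
static double series_sine(double x)
{
    double term = x, sum = x;
    for (int n = 1; n <= 12; n++) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

void init_quarter_table(void)
{
    for (int d = 0; d <= 90; d++)
        quarter_table[d] = series_sine(QW_PI * d / 180.0);
}

/* Sine of a whole number of degrees, using symmetry to cover all four quadrants. */
double lookup_sine_deg(int degrees)
{
    int x = degrees % 360;
    if (x < 0)
        x += 360;                           /* wrap into 0..359 */
    int y = x % 90;                         /* offset within the quadrant */
    if (x < 90)  return  quarter_table[y];
    if (x < 180) return  quarter_table[90 - y];
    if (x < 270) return -quarter_table[y];
    return -quarter_table[90 - y];
}

double lookup_cosine_deg(int degrees)
{
    return lookup_sine_deg(degrees + 90);
}

double lookup_tan_deg(int degrees)
{
    double c = lookup_cosine_deg(degrees);
    assert(c != 0.0);                       /* tangent is undefined at 90 and 270 degrees */
    return lookup_sine_deg(degrees) / c;
}
```

The table holds 91 entries instead of 361, at the cost of a wrap, a remainder and up to three comparisons per lookup.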
When using interpolation, the size of the lookup table can be reduced by using non-uniform sampling, which means that where the function is close to straight, we use few sample points, while where it changes value quickly we use more sample points to keep the approximation close to the real curve. For more information, see interpolation.
Storage caches (including disk caches for files and processor caches for code or data) also work like a lookup table. A single (fast) lookup is performed to read the tag in the lookup table at the index specified by the lowest bits of the desired external storage address, and to determine whether the memory address is hit by the cache. When a hit is found, no access to external memory is needed (except for write operations, where the cached value may need to be updated asynchronously to the slower memory after some time, or if the position in the cache must be replaced to cache another address).
In digital logic, a lookup table can be implemented with a multiplexer whose select lines are the inputs of the LUT and whose inputs are constants. An n-bit LUT can encode any n-input Boolean function by modeling such functions as truth tables. This is an efficient way of encoding Boolean logic functions, and LUTs with 4-6 bits of input are in fact the key component of modern field-programmable gate arrays (FPGAs).
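A 4-input LUT can be modeled in C as a 16-bit constant holding one output bit per input combination. The type name `lut4`, the function `lut4_eval` and the two example masks are assumptions made for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Bit i of the mask is the function's output when the four inputs,
   read as a 4-bit number, equal i: the mask is the truth table. */
typedef uint16_t lut4;

/* The inputs select one bit of the constant mask, much as a multiplexer's
   select lines pick one of its constant inputs. */
int lut4_eval(lut4 mask, unsigned inputs)
{
    assert(inputs < 16u);
    return (mask >> inputs) & 1;
}

/* Truth tables for two example functions of four inputs. */
static const lut4 LUT_AND4 = 0x8000;  /* 1 only when all four inputs are 1 */
static const lut4 LUT_XOR4 = 0x6996;  /* 1 when an odd number of inputs are 1 */
```

Any of the 65,536 possible 4-input Boolean functions is obtained by changing the mask alone; the evaluation logic stays the same.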
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, a lookup table is a data structure
Data structure
In computer science, a data structure is a particular way of storing and organizing data in a computer so that it can be used efficiently.Different kinds of data structures are suited to different kinds of applications, and some are highly specialized to specific tasks...
, usually an array or associative array
Associative array
In computer science, an associative array is an abstract data type composed of a collection of pairs, such that each possible key appears at most once in the collection....
, often used to replace a runtime computation with a simpler array indexing operation. The savings in terms of processing time can be significant, since retrieving a value from memory is often faster than undergoing an 'expensive' computation or input/output
Input/output
In computing, input/output, or I/O, refers to the communication between an information processing system , and the outside world, possibly a human, or another information processing system. Inputs are the signals or data received by the system, and outputs are the signals or data sent from it...
operation. The tables may be precalculated and stored in static
Static memory allocation
Static memory allocation refers to the process of allocating memory at compile-time before the associated program is executed, unlike dynamic memory allocation or automatic memory allocation where memory is allocated as required at run-time....
program storage or calculated (or "pre-fetched"
Prefetcher
The Prefetcher is a component of versions of Microsoft Windows starting with Windows XP. It is a component of the Memory Manager that speeds up the Windows boot process, and shortens the amount of time it takes to start up programs...
) as part of a programs initialization phase (memoization
Memoization
In computing, memoization is an optimization technique used primarily to speed up computer programs by having function calls avoid repeating the calculation of results for previously processed inputs...
). Lookup tables are also used extensively to validate input values by matching against a list of valid (or invalid) items in an array and, in some programming languages, may include pointer functions (or offsets to labels) to process the matching input.
History
Before the advent of computers, lookup tables of values were used by people to speed up hand calculations of complex functions, such as in trigonometryTrigonometry
Trigonometry is a branch of mathematics that studies triangles and the relationships between their sides and the angles between these sides. Trigonometry defines the trigonometric functions, which describe those relationships and have applicability to cyclical phenomena, such as waves...
, logarithms
Common logarithm
The common logarithm is the logarithm with base 10. It is also known as the decadic logarithm, named after its base. It is indicated by log10, or sometimes Log with a capital L...
, and statistical density functions
In ancient India, Aryabhata
Aryabhata
Aryabhata was the first in the line of great mathematician-astronomers from the classical age of Indian mathematics and Indian astronomy...
created one of the first sine tables
Āryabhaṭa's sine table
Āryabhaṭa's sine table is a set of twenty-four of numbers given in the astronomical treatise Āryabhaṭiya composed by the fifth century Indian mathematician and astronomer Āryabhaṭa , for the computation of the half-chords of certain set of arcs of a circle...
, which he encoded in a Sanskrit-letter-based number system. In 493 A.D., Victorius of Aquitaine
Victorius of Aquitaine
Victorius of Aquitaine, a countryman of Prosper of Aquitaine and also working in Rome, produced in 457 an Easter Cycle, which was based on the consular list provided by Prosper's Chronicle. This dependency caused scholars to think that Prosper had been working on his own Easter Annals for quite...
wrote a 98-column multiplication table which gave (in Roman numerals
Roman numerals
The numeral system of ancient Rome, or Roman numerals, uses combinations of letters from the Latin alphabet to signify values. The numbers 1 to 10 can be expressed in Roman numerals as:...
) the product of every number from 2 to 50 times and the rows were "a list of numbers starting with one thousand, descending by hundreds to one hundred, then descending by tens to ten, then by ones to one, and then the fractions down to 1/144" Modern school children are often taught to memorize "times tables" to avoid calculations of the most commonly used numbers (up to 9 x 9 or 12 x 12).
Early in the history of computers, input/output
Input/output
In computing, input/output, or I/O, refers to the communication between an information processing system , and the outside world, possibly a human, or another information processing system. Inputs are the signals or data received by the system, and outputs are the signals or data sent from it...
operations were particularly slow - even in comparison to processor speeds of the time. It made sense to reduce expensive read operations by a form of manual caching
Cache
In computer engineering, a cache is a component that transparently stores data so that future requests for that data can be served faster. The data that is stored within a cache might be values that have been computed earlier or duplicates of original values that are stored elsewhere...
by creating either static lookup tables (embedded in the program) or dynamic prefetched arrays to contain only the most commonly occurring data items. Despite the introduction of systemwide caching that now automates this process, application level lookup tables can still improve performance for data items that rarely, if ever, change.
Simple lookup in an array, an associative array or a linked list (unsorted list)
This is known as a linear searchLinear search
In computer science, linear search or sequential search is a method for finding a particular value in a list, that consists of checking every one of its elements, one at a time and in sequence, until the desired one is found....
or brute-force search
Brute-force search
In computer science, brute-force search or exhaustive search, also known as generate and test, is a trivial but very general problem-solving technique that consists of systematically enumerating all possible candidates for the solution and checking whether each candidate satisfies the problem's...
, each element being checked for equality in turn and the associated value, if any, used as a result of the search. This is often the slowest search method unless frequently occurring values occur early in the list. For a one dimensional array or linked list
Linked list
In computer science, a linked list is a data structure consisting of a group of nodes which together represent a sequence. Under the simplest form, each node is composed of a datum and a reference to the next node in the sequence; more complex variants add additional links...
, the lookup is usually to determine whether or not there is a match with an 'input' data value.
Linked lists vs. arrays
Linked lists have some advantages over arraysArray data type
In computer science, an array type is a data type that is meant to describe a collection of elements , each selected by one or more indices that can be computed at run time by the program. Such a collection is usually called an array variable, array value, or simply array...
:
- Insertion or deletion of an element at a specific point of a list is a constant time operation. (While one can "delete" an element from an array in constant time by somehow marking its slot as "vacant", an algorithm that iterates over the elements may have to skip a large number of vacant slots).
- arbitrarily many elements may be inserted into a linked list, limited only by the total memory available; while an array will eventually fill up, and then have to be resized — an expensive operation, that may not even be possible if memory is fragmented. Similarly, an array from which many elements are removed, may have to be resized in order to avoid wasting too much space.
On the other hand:
- arrays allow random accessRandom accessIn computer science, random access is the ability to access an element at an arbitrary position in a sequence in equal time, independent of sequence size. The position is arbitrary in the sense that it is unpredictable, thus the use of the term "random" in "random access"...
, while linked lists allow only sequential accessSequential accessIn computer science, sequential access means that a group of elements is accessed in a predetermined, ordered sequence. Sequential access is sometimes the only way of accessing the data, for example if it is on a tape...
to elements. Singly linked lists, in fact, can only be traversed in one direction. This makes linked lists unsuitable for applications where it's useful to quickly look up an element by its index, such as heapsortHeapsortHeapsort is a comparison-based sorting algorithm to create a sorted array , and is part of the selection sort family. Although somewhat slower in practice on most machines than a well implemented quicksort, it has the advantage of a more favorable worst-case O runtime...
. See also trivial hash function below. - Sequential access on arrays is also faster than on linked lists on many machines, because they have greater locality of referenceLocality of referenceIn computer science, locality of reference, also known as the principle of locality, is the phenomenon of the same value or related storage locations being frequently accessed. There are two basic types of reference locality. Temporal locality refers to the reuse of specific data and/or resources...
and thus benefit more from processor caching.
- linked lists require extra storage needed for references, that often makes them impractical for lists of small data items such as charactersCharacter (computing)In computer and machine-based telecommunications terminology, a character is a unit of information that roughly corresponds to a grapheme, grapheme-like unit, or symbol, such as in an alphabet or syllabary in the written form of a natural language....
or boolean values. It can also be slow, and with a naïve allocator, wasteful, to allocate memory separately for each new element, a problem generally solved using memory poolMemory poolMemory pools, also called fixed-size-blocks allocation, allow dynamic memory allocation comparable to malloc or C++'s operator new. As those implementations suffer from fragmentation because of variable block sizes, it can be impossible to use them in a real time system due to performance...
s.
Some hybrid solutions try to combine the advantages of the two representations. Unrolled linked list
Unrolled linked list
In computer programming, an unrolled linked list is a variation on the linked list which stores multiple elements in each node. It can drastically increase cache performance, while decreasing the memory overhead associated with storing list metadata such as references...
s store several elements in each list node, increasing cache performance while decreasing memory overhead for references. CDR coding
CDR coding
In computer science CDR coding is a compressed data representation for Lisp linked lists. It was developed and patented by the MIT Artificial Intelligence Laboratory, and implemented in computer hardware in a number of Lisp machines derived from the MIT CADR....
does both these as well, by replacing references with the actual data referenced, which extends off the end of the referencing record.
Binary search in an array or an associative array (sorted list)
An example of a "divide and conquer algorithmDivide and conquer algorithm
In computer science, divide and conquer is an important algorithm design paradigm based on multi-branched recursion. A divide and conquer algorithm works by recursively breaking down a problem into two or more sub-problems of the same type, until these become simple enough to be solved directly...
", binary search involves each element being found by determining which half of the table a match may be found in and repeating until either success or failure. Only possible if the list is sorted but gives good performance even if the list is lengthy.
Trivial hash function
For a trivial hash function lookup, the unsigned raw dataRaw data
'\putang inaIn computing, it may have the following attributes: possibly containing errors, not validated; in sfferent formats; uncoded or unformatted; and suspect, requiring confirmation or citation. For example, a data input sheet might contain dates as raw data in many forms: "31st January...
value is used directly as an index to a one dimensional table to extract a result. For small ranges, this can be amongst the fastest lookup, even exceeding binary search speed with zero branches and executing in constant time.
Counting '1' bits in a series of bytes
One discrete problem that is expensive to solve on many computers, is that of counting the number of bits which are set to 1 in a (binary) number, sometimes called the population functionHamming weight
The Hamming weight of a string is the number of symbols that are different from the zero-symbol of the alphabet used. It is thus equivalent to the Hamming distance from the all-zero string of the same length. For the most typical case, a string of bits, this is the number of 1's in the string...
. For example, the decimal number "37" is "00100101" in binary, so it contains three bits that are set to binary "1".
A simple example of C
C (programming language)
C is a general-purpose computer programming language developed between 1969 and 1973 by Dennis Ritchie at the Bell Telephone Laboratories for use with the Unix operating system....
code, designed to count the 1 bits in a int, might look like this:
This apparently simple algorithm can take potentially hundreds of cycles even on a modern architecture, because it makes many branches in the loop - and branching is slow. This can be ameliorated using loop unrolling and some other compiler optimizations. There is however a simple and much faster algorithmic solution - using a trivial hash function table lookup.
Simply construct a static table, bits_set, with 256 entries giving the number of one bits set in each possible byte value (e.g. 0x00 = 0, 0x01 = 1, 0x02 = 1, and so on). Then use this table to find the number of ones in each byte of the integer using a trivial hash function lookup on each byte in turn, and sum them. This requires no branches, and just four indexed memory accesses, considerably faster than the earlier code.
The above source can be improved easily, (avoiding AND'ing, and shifting) by 'recasting' 'x' as a 4 byte unsigned char array and, preferably, coded in-line as a single statement instead of being a function.
Note that even this simple algorithm can be too slow now, because the original code might run faster from the cache of modern processors, and (large) lookup tables do not fit well in caches and can cause a slower access to memory (in addition, in the above example, it requires computing addresses within a table, to perform the four lookups needed).
LUT's in Image processing
In data analysis applications, such as image processingImage processing
In electrical engineering and computer science, image processing is any form of signal processing for which the input is an image, such as a photograph or video frame; the output of image processing may be either an image or, a set of characteristics or parameters related to the image...
, a lookup table (LUT) is used to transform the input data into a more desirable output format. For example, a grayscale picture of the planet Saturn will be transformed into a color image to emphasize the differences in its rings.
A classic example of reducing run-time computations using lookup tables is to obtain the result of a trigonometry
Trigonometry
Trigonometry is a branch of mathematics that studies triangles and the relationships between their sides and the angles between these sides. Trigonometry defines the trigonometric functions, which describe those relationships and have applicability to cyclical phenomena, such as waves...
calculation, such as the sine
Sine
In mathematics, the sine function is a function of an angle. In a right triangle, sine gives the ratio of the length of the side opposite to an angle to the length of the hypotenuse.Sine is usually listed first amongst the trigonometric functions....
of a value. Calculating trigonometric functions can substantially slow a computing application. The same application can finish much sooner when it first precalculates the sine of a number of values, for example for each whole number of degrees (The table can be defined as static variables at compile time, reducing repeated run time costs).
When the program requires the sine of a value, it can use the lookup table to retrieve the closest sine value from a memory address, and may also take the step of interpolating to the sine of the desired value, instead of calculating by mathematical formula. Lookup tables are thus used by mathematics co-processors in computer systems. An error in a lookup table was responsible for Intel's infamous floating-point divide bug
Pentium FDIV bug
The Pentium FDIV bug was a bug in the Intel P5 Pentium floating point unit . Certain floating point division operations performed with these processors would produce incorrect results...
.
Functions of a single variable (such as sine and cosine) may be implemented by a simple array. Functions involving two or more variables require multidimensional array indexing techniques. The latter case may thus employ a two-dimensional array of power[x][y] to replace a function to calculate xy for a limited range of x and y values. Functions that have more than one result may be implemented with lookup tables that are arrays of structures.
As mentioned, there are intermediate solutions that use tables in combination with a small amount of computation, often using interpolation
Interpolation
In the mathematical field of numerical analysis, interpolation is a method of constructing new data points within the range of a discrete set of known data points....
. Pre-calculation combined with interpolation can produce higher accuracy for values that fall between two precomputed values. This technique requires slightly more time to be performed but can greatly enhance accuracy in applications that require the higher accuracy. Depending on the values being precomputed, pre-computation with interpolation can also be used to shrink the lookup table size while maintaining accuracy.
In image processing, lookup tables are often called LUTs and give an output value for each of a range of index values. One common LUT, called the colormap or palette, is used to determine the colors and intensity values with which a particular image will be displayed. In computed tomography, "windowing" refers to a related concept for determining how to display the intensity of measured radiation.
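Both ideas can be sketched in a few lines of Python. The palette values, the function names, and the simple linear ramp used for windowing are assumptions made for illustration; real imaging software may define its window mapping differently.

```python
# A 4-entry palette (colormap): index -> (red, green, blue).
# The colours are made up for this example.
PALETTE = [
    (0, 0, 0),        # 0: black
    (255, 0, 0),      # 1: red
    (0, 255, 0),      # 2: green
    (255, 255, 255),  # 3: white
]


def apply_palette(indexed_image):
    """Replace every palette index in a 2-D image by its RGB triple."""
    return [[PALETTE[i] for i in row] for row in indexed_image]


def build_window_lut(center, width, size=4096):
    """Build a LUT mapping raw intensities 0..size-1 to grey levels 0..255.

    Values inside the window [center - width/2, center + width/2] are
    spread linearly over the grey scale; values outside are clamped.
    """
    low = center - width / 2
    lut = []
    for raw in range(size):
        t = (raw - low) / width
        t = min(1.0, max(0.0, t))
        lut.append(round(255 * t))
    return lut
```

Once the window LUT is built, displaying a pixel costs one index operation, and changing the window only requires rebuilding the table, not reprocessing the stored measurements.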
While often effective, employing a lookup table may nevertheless result in a severe penalty if the computation that the LUT replaces is relatively simple. Memory retrieval time and the complexity of memory requirements can increase application operation time and system complexity relative to what would be required by straight formula computation. The possibility of polluting the cache may also become a problem. Table accesses for large tables will almost certainly cause a cache miss. This phenomenon is increasingly becoming an issue as processors outpace memory. A similar issue appears in rematerialization, a compiler optimization. In some environments, such as the Java programming language, table lookups can be even more expensive due to mandatory bounds-checking involving an additional comparison and branch for each lookup.
There are two fundamental limitations on when it is possible to construct a lookup table for a required operation. One is the amount of memory that is available: one cannot construct a lookup table larger than the space available for the table, although it is possible to construct disk-based lookup tables at the expense of lookup time. The other is the time required to compute the table values in the first instance; although this usually needs to be done only once, if it takes a prohibitively long time, it may make the use of a lookup table an inappropriate solution. As previously stated however, tables can be statically defined in many cases.
Computing sines
Most computers, which only perform basic arithmetic operations, cannot directly calculate the sine of a given value. Instead, they use the CORDIC algorithm or a complex formula such as the following Taylor series to compute the value of sine to a high degree of precision:
$$\sin(x) \approx x - \frac{x^3}{6} + \frac{x^5}{120} - \frac{x^7}{5040}$$
(for x close to 0)
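A truncated Taylor series of this kind can be evaluated term by term, as in the following Python sketch. The function name taylor_sine and the default of four terms are assumptions for illustration; the point is that each call costs several multiplications and divisions, and that accuracy degrades as x moves away from 0.

```python
import math


def taylor_sine(x: float, terms: int = 4) -> float:
    """Sum the first `terms` terms of x - x^3/3! + x^5/5! - x^7/7! + ..."""
    total = 0.0
    term = x  # first term of the series
    for n in range(terms):
        total += term
        # Each term is the previous one times -x^2 / ((2n+2)(2n+3)).
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total


# Compare against the library sine near 0 and farther away.
near_error = abs(taylor_sine(0.1) - math.sin(0.1))
far_error = abs(taylor_sine(3.0) - math.sin(3.0))
```

With four terms, near_error is far below anything visible in typical graphics work, while far_error is large enough to matter, which is why practical implementations first reduce the argument to a small range.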
However, this can be expensive to compute, especially on slow processors, and there are many applications, particularly in traditional computer graphics, that need to compute many thousands of sine values every second. A common solution is to initially compute the sine of many evenly distributed values, and then to find the sine of x we choose the sine of the value closest to x. This will be close to the correct value because sine is a continuous function with a bounded rate of change. For example:
real array sine_table[-1000..1000]
for x from -1000 to 1000
    sine_table[x] := sine(pi * x / 1000)

function lookup_sine(x)
    return sine_table[round(1000 * x / pi)]
Unfortunately, the table requires quite a bit of space: if IEEE double-precision floating-point numbers are used, over 16,000 bytes would be required. We can use fewer samples, but then our precision will significantly worsen. One good solution is linear interpolation, which draws a line between the two points in the table on either side of the value and locates the answer on that line. This is still quick to compute, and much more accurate for smooth functions such as the sine function. Here is our example using linear interpolation:
function lookup_sine(x)
    x1 := floor(x*1000/pi)
    y1 := sine_table[x1]
    y2 := sine_table[x1+1]
    return y1 + (y2-y1)*(x*1000/pi-x1)
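The two pseudocode lookups above can be sketched in runnable Python as follows. The names lookup_sine_nearest and lookup_sine_lerp are assumptions for illustration, and the index clamping is an addition not present in the pseudocode: it keeps the second table read in range when x is at the very edge of the table. Both functions assume -pi <= x <= pi.

```python
import math

N = 1000  # samples per pi radians, as in the pseudocode above

# Python lists start at index 0, so sine_table[i + N] holds
# sin(pi * i / N) for i = -N..N.
sine_table = [math.sin(math.pi * i / N) for i in range(-N, N + 1)]


def lookup_sine_nearest(x: float) -> float:
    """Nearest-sample lookup; assumes -pi <= x <= pi."""
    i = max(-N, min(round(N * x / math.pi), N))
    return sine_table[i + N]


def lookup_sine_lerp(x: float) -> float:
    """Linearly interpolated lookup; assumes -pi <= x <= pi."""
    pos = N * x / math.pi
    # Clamp so that both i and i + 1 are valid table positions.
    i = max(-N, min(math.floor(pos), N - 1))
    y1 = sine_table[i + N]
    y2 = sine_table[i + 1 + N]
    return y1 + (y2 - y1) * (pos - i)
```

With this table size, the nearest-sample version is off by at most roughly half a table step (about 0.0016), while interpolation reduces the worst-case error to roughly a millionth, at the cost of one extra table read, a subtraction, and a multiplication.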
Another solution that uses a quarter of the space but takes a bit longer to compute is to exploit the symmetry of sine and its relationship to cosine. In this case, the lookup table stores the sine function for the first quadrant only (i.e. sin(0..pi/2)). When we need a value, we first wrap the angle to a full period (not needed if values are always between 0 and 2*pi) and reduce it to an offset y within its quadrant. The quadrant then determines the result: in the first quadrant the table entry for y is returned directly, in the second quadrant the table is read at pi/2-y, and the third and fourth quadrants are the negatives of the first and second respectively. For cosine, we return the sine of the angle shifted by pi/2 (i.e. x+pi/2). For tangent, we divide the sine by the cosine (divide-by-zero handling may be needed depending on implementation). The following example works in whole degrees:
function init_sine
    for x from 0 to 90
        sine_table[x] := sine(2*pi * x / 360)

function lookup_sine(x)
    x := wrap x from 0 to 360
    y := mod(x, 90)
    if (x < 90) return sine_table[y]
    if (x < 180) return sine_table[90-y]
    if (x < 270) return -sine_table[y]
    return -sine_table[90-y]

function lookup_cosine(x)
    return lookup_sine(x + 90)

function lookup_tan(x)
    return (lookup_sine(x) / lookup_cosine(x))
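A runnable Python rendering of the quarter-wave pseudocode might look like the sketch below. It assumes whole-degree angles, and the explicit divide-by-zero check in lookup_tan is one possible handling of the case the text mentions, not the only one.

```python
import math

# Quarter-wave table: sine of 0..90 degrees inclusive (91 entries).
QUARTER = [math.sin(math.radians(d)) for d in range(91)]


def lookup_sine(deg: int) -> float:
    """Sine of a whole-degree angle using only the first-quadrant table."""
    deg %= 360    # wrap into 0..359 (the result is never negative in Python)
    y = deg % 90  # offset within the quadrant
    if deg < 90:
        return QUARTER[y]
    if deg < 180:
        return QUARTER[90 - y]
    if deg < 270:
        return -QUARTER[y]
    return -QUARTER[90 - y]


def lookup_cosine(deg: int) -> float:
    """cos(x) = sin(x + 90 degrees)."""
    return lookup_sine(deg + 90)


def lookup_tan(deg: int) -> float:
    """Tangent as sine over cosine; refuses angles where cosine is zero."""
    c = lookup_cosine(deg)
    if c == 0.0:
        raise ZeroDivisionError("tangent is undefined at this angle")
    return lookup_sine(deg) / c
```

Compared with a full-period table at the same resolution, this stores 91 entries instead of 360, and pays for it with a modulo and up to three comparisons per lookup.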
When using interpolation, the size of the lookup table can be reduced by using nonuniform sampling, which means that where the function is close to straight, we use few sample points, while where its slope changes quickly we use more sample points to keep the approximation close to the real curve. For more information, see interpolation.
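A small sketch of nonuniform sampling, assuming hand-picked breakpoints for sine on the interval from 0 to pi/2: the points are sparse near 0, where sine is nearly a straight line, and denser toward pi/2, where the curve bends most. The breakpoint values and the function name are illustrative assumptions, not an optimized choice.

```python
import bisect
import math

# Hypothetical breakpoints: wide gaps near 0, narrow gaps near pi/2.
XS = [0.0, 0.4, 0.7, 0.95, 1.15, 1.3, 1.42, 1.5, math.pi / 2]
YS = [math.sin(x) for x in XS]


def lookup_sine_nonuniform(x: float) -> float:
    """Interpolate sin(x) from the nonuniform table; assumes 0 <= x <= pi/2."""
    # bisect_right returns the position of the first breakpoint greater
    # than x; stepping back one gives the left neighbour. The clamp keeps
    # both neighbours inside the table.
    i = min(max(bisect.bisect_right(XS, x) - 1, 0), len(XS) - 2)
    x1, x2 = XS[i], XS[i + 1]
    t = (x - x1) / (x2 - x1)
    return YS[i] + (YS[i + 1] - YS[i]) * t
```

The trade-off is that the table position can no longer be computed by a single multiplication; a search (here a binary search via bisect) is needed to find the surrounding breakpoints, so the space saved is paid for in lookup time.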
Caches
Storage caches (including disk caches for files, or processor caches for either code or data) also work like a lookup table. The table is built with very fast memory instead of being stored on slower external memory, and is indexed by a subrange of the bits composing an external memory (or disk) address (notably the lowest bits of any possible external address). Each entry maintains two pieces of data:
- one piece (the tag) contains the value of the remaining bits of the address; if these bits match those of the memory address to read or write, then the other piece contains the cached value for this address.
- the other piece maintains the data associated with that address.
A single (fast) lookup is performed to read the tag in the lookup table at the index specified by the lowest bits of the desired external storage address, and to determine if the memory address is hit by the cache. When a hit is found, no access to external memory is needed (except for write operations, where the cached value may need to be updated asynchronously to the slower memory after some time, or if the position in the cache must be replaced to cache another address).
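The tag-and-index scheme described above corresponds to what is usually called a direct-mapped cache. The following Python sketch models reads only; the class name, the 16-slot size, and the backing_read callback standing in for slow external memory are all assumptions for illustration, and write handling is deliberately left out.

```python
INDEX_BITS = 4                 # 16 slots; an assumed size for illustration
NUM_SLOTS = 1 << INDEX_BITS
INDEX_MASK = NUM_SLOTS - 1


class DirectMappedCache:
    """Read-only model of a cache indexed by the low address bits."""

    def __init__(self, backing_read):
        self._read = backing_read          # slow lookup: address -> value
        self._tags = [None] * NUM_SLOTS    # None marks an empty slot
        self._data = [None] * NUM_SLOTS
        self.hits = 0
        self.misses = 0

    def read(self, address: int):
        index = address & INDEX_MASK   # lowest bits select the slot
        tag = address >> INDEX_BITS    # remaining bits identify the address
        if self._tags[index] == tag:
            self.hits += 1
            return self._data[index]
        # Miss: fetch from slow storage and replace whatever was in the slot.
        self.misses += 1
        value = self._read(address)
        self._tags[index] = tag
        self._data[index] = value
        return value
```

In this model, addresses 5 and 21 share slot 5 (their low four bits are equal) but have different tags, so reading one evicts the other.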
Hardware LUTs
In digital logic, an n-bit lookup table can be implemented with a multiplexer whose select lines are the inputs of the LUT and whose inputs are constants. An n-bit LUT can encode any n-input Boolean function by modeling such functions as truth tables. This is an efficient way of encoding Boolean logic functions, and LUTs with 4-6 bits of input are in fact the key component of modern field-programmable gate arrays (FPGAs).
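The behaviour of such a LUT can be modelled in software as a truth table packed into an integer, with the inputs selecting one stored bit. This is a sketch of the idea only; the function names make_lut and eval_lut and the bit ordering (input 0 is the least significant select line) are assumptions, not a description of any particular FPGA.

```python
def make_lut(n_inputs: int, func) -> int:
    """Precompute the truth table of an n-input Boolean function.

    Bit number `address` of the result is the function's output when the
    inputs, read as a binary number, equal `address`.
    """
    config = 0
    for address in range(1 << n_inputs):
        bits = [(address >> i) & 1 for i in range(n_inputs)]
        if func(*bits):
            config |= 1 << address
    return config


def eval_lut(config: int, *inputs: int) -> int:
    """Use the inputs as multiplexer select lines to pick one stored bit."""
    address = 0
    for i, bit in enumerate(inputs):
        address |= (bit & 1) << i
    return (config >> address) & 1


# Example: a 3-input XOR (parity) stored as an 8-bit configuration.
xor3 = make_lut(3, lambda a, b, c: a ^ b ^ c)
```

Here xor3 is 0b10010110: the set bits are at addresses 1, 2, 4 and 7, exactly the input combinations with an odd number of ones. Changing the stored constant changes the function without changing the circuit, which is what makes the structure reprogrammable.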
See also
- Branch table
- Memoization
- Memory bound function
- Shift register lookup table
- Palette and Colour Look-Up Table – for the usage in computer graphics
- 3D LUT – usage in film
External links
- Fast table lookup using input character as index for branch table
- Art of Assembly: Calculation via Table Lookups
- Color Presentation of Astronomical Images
- "Bit Twiddling Hacks" (includes lookup tables) by Sean Eron Anderson of Stanford University
- Memoization in C++ by Paul McNamee, Johns Hopkins University, showing savings
- "The Quest for an Accelerated Population Count" by Henry S. Warren, Jr.