Loop interchange
In compiler theory, loop interchange is the process of exchanging the order of two iteration variables in a loop nest: the variable that drives the inner loop moves to the outer loop, and vice versa.
For example, in the code fragment:
    for i from 0 to 10
        for j from 0 to 20
            a[i,j] = i + j
loop interchange would result in:
    for j from 0 to 20
        for i from 0 to 10
            a[i,j] = i + j
On occasion, such a transformation creates opportunities for further optimization, such as vectorization of the array assignments.
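As a minimal sketch, the two loop orders above can be written in C as follows. The constants ROWS and COLS and the function names are illustrative assumptions, and the bounds assume that "0 to 10" and "0 to 20" in the pseudocode are inclusive. Because each element is written exactly once from values that do not depend on other elements, both orders fill the array identically.

```c
#include <string.h>

#define ROWS 11 /* assumption: "0 to 10" is inclusive */
#define COLS 21 /* assumption: "0 to 20" is inclusive */

/* Original order: i is the outer variable, j the inner one. */
void fill_ij(int a[ROWS][COLS]) {
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            a[i][j] = i + j;
}

/* Interchanged order: j is the outer variable, i the inner one. */
void fill_ji(int a[ROWS][COLS]) {
    for (int j = 0; j < COLS; j++)
        for (int i = 0; i < ROWS; i++)
            a[i][j] = i + j;
}

/* Returns 1 if both loop orders produce the same array contents. */
int orders_agree(void) {
    int x[ROWS][COLS], y[ROWS][COLS];
    fill_ij(x);
    fill_ji(y);
    return memcmp(x, y, sizeof x) == 0;
}
```

Only the order in which the elements are visited differs between fill_ij and fill_ji; the set of assignments performed is the same.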
The utility of loop interchange
One major purpose of loop interchange is to improve cache performance when accessing array elements. Cache misses occur when the array elements accessed in consecutive iterations of the loop come from different cache lines. Loop interchange can help prevent this. Its effectiveness depends on, and must be considered in light of, the cache model used by the underlying hardware and the array model used by the compiler.
In the C programming language, array elements from the same row are stored consecutively (e.g. a[1,1], a[1,2], a[1,3], ...), an arrangement known as row-major order. FORTRAN programs, on the other hand, store array elements from the same column together (e.g. a[1,1], a[2,1], a[3,1], ...), which is called column-major order. Thus the loop order in the first example is suitable for a C program, while the interchanged order in the second example is better for FORTRAN. Optimizing compilers can detect an unfavorable ordering chosen by the programmer and interchange the loops to achieve better cache performance.
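The difference between the two layouts can be made concrete by computing where element (i, j) lands in linear memory. The sketch below is illustrative only: the function names are hypothetical, and it uses 0-based indices rather than the 1-based indices of the examples in the text.

```c
#include <stddef.h>

/* Row-major (C style): elements of one row are adjacent,
   so increasing j by 1 moves one element forward. */
size_t row_major_offset(size_t i, size_t j, size_t cols) {
    return i * cols + j;
}

/* Column-major (FORTRAN style): elements of one column are adjacent,
   so increasing i by 1 moves one element forward. */
size_t col_major_offset(size_t i, size_t j, size_t rows) {
    return j * rows + i;
}
```

For an 11 by 21 array, stepping j by one advances the row-major offset by 1 element but the column-major offset by 11 elements. This is why the inner loop should run over j in a row-major language and over i in a column-major one.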
Caveat
Like any other compiler optimization, loop interchange may lead to worse performance because cache performance is only part of the story. Take the following example:
    do i = 1, 10000
        do j = 1, 1000
            a(i) = a(i) + b(i,j) * c(i)
        end do
    end do
Because FORTRAN arrays are column-major, the inner loop over j walks b(i,j) with a large stride. Loop interchange on this example can improve the cache performance of accessing b(i,j), but it will ruin the reuse of a(i) and c(i) in the inner loop, as it introduces two extra loads (for a(i) and for c(i)) and one extra store (for a(i)) during each iteration. As a result, the overall performance may be degraded after loop interchange.
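The trade-off can be sketched in C, where b[j][i] with j in the inner loop is the strided access. The sizes N and M and the function names are hypothetical, chosen small for illustration. The register reuse is written out explicitly with local variables, which is what a compiler may do on its own for the original order.

```c
#define N 4 /* hypothetical outer trip count */
#define M 3 /* hypothetical inner trip count */

/* Original order: a[i] and c[i] do not change in the inner loop,
   so they can stay in registers; b[j][i] is accessed with stride N. */
void update_ij(double a[N], double b[M][N], const double c[N]) {
    for (int i = 0; i < N; i++) {
        double acc = a[i]; /* one load of a[i] per outer iteration */
        double ci = c[i];  /* one load of c[i] per outer iteration */
        for (int j = 0; j < M; j++)
            acc += b[j][i] * ci;
        a[i] = acc;        /* one store of a[i] per outer iteration */
    }
}

/* Interchanged order: b[j][i] is now accessed contiguously, but a[i]
   and c[i] are loaded, and a[i] stored, on every inner iteration. */
void update_ji(double a[N], double b[M][N], const double c[N]) {
    for (int j = 0; j < M; j++)
        for (int i = 0; i < N; i++)
            a[i] = a[i] + b[j][i] * c[i];
}
```

Both functions compute the same values. Which one runs faster depends on whether the savings from contiguous access to b outweigh the extra memory traffic for a and c.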
Safety
It is not always safe to exchange the iteration variables, because dependencies between statements constrain the order in which they must execute. To determine whether a compiler can safely interchange loops, dependence analysis is required.
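A small illustration of an unsafe interchange, written in C under assumed names and sizes: each element a[i][j] is computed from a[i-1][j+1], which the original order has already written in the previous outer iteration. After interchange, some of those reads happen before the corresponding writes, so the result changes.

```c
#include <string.h>

#define N 4 /* hypothetical array size */

/* Original order: a[i-1][j+1] was written during outer iteration i-1. */
void sweep_ij(int a[N][N]) {
    for (int i = 1; i < N; i++)
        for (int j = 0; j < N - 1; j++)
            a[i][j] = a[i - 1][j + 1] + 1;
}

/* Interchanged order: a[i-1][j+1] belongs to column j+1, which has
   not been processed yet, so a stale value is read for i > 1. */
void sweep_ji(int a[N][N]) {
    for (int j = 0; j < N - 1; j++)
        for (int i = 1; i < N; i++)
            a[i][j] = a[i - 1][j + 1] + 1;
}

/* Returns 1 if the two orders give different results on a zeroed array. */
int interchange_changes_result(void) {
    int x[N][N] = {{0}}, y[N][N] = {{0}};
    sweep_ij(x);
    sweep_ji(y);
    return memcmp(x, y, sizeof x) != 0;
}
```

In dependence-analysis terms, the value written in iteration (i, j) is read in iteration (i+1, j-1), a distance of (1, -1). Interchanging the loops would turn this into (-1, 1), meaning the read would precede the write, which is why a compiler must reject the transformation here.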