Automatic parallelization
Automatic parallelization, also called auto parallelization, autoparallelization, or simply parallelization when automation is implied by context, refers to converting sequential code into multi-threaded or vectorized (or both) code in order to utilize multiple processors simultaneously in a shared-memory multiprocessor (SMP) machine. The goal of automatic parallelization is to relieve programmers from the tedious and error-prone manual parallelization process. Though the quality of automatic parallelization has improved over the past several decades, fully automatic parallelization of sequential programs by compilers remains a grand challenge, because it requires complex program analysis and because some factors (such as the range of the input data) are unknown at compile time.
The programming control structures on which autoparallelization places the most focus are loops, because, in general, most of a program's execution time takes place inside some form of loop. A parallelizing compiler tries to split up a loop so that its iterations can be executed on separate processors concurrently.
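A hedged sketch of the idea, using an invented loop whose iterations happen to be independent: the compiler can divide the iteration range into chunks and hand each chunk to a different processor.

      ! original sequential loop
      do i = 1, n
         z(i) = x(i) + y(i)
      end do

      ! the same work divided into two chunks that two processors could execute concurrently
      do i = 1, n/2
         z(i) = x(i) + y(i)
      end do
      do i = n/2 + 1, n
         z(i) = x(i) + y(i)
      end do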
Compiler parallelization analysis
The compiler usually conducts two passes of analysis before actual parallelization in order to determine the following:
- Is it safe to parallelize the loop? Answering this question requires accurate dependence analysis and alias analysis.
- Is it worthwhile to parallelize it? This answer requires a reliable estimation (modeling) of the program workload and the capacity of the parallel system.
The first pass of the compiler performs a data dependence analysis of the loop to determine whether each iteration of the loop can be executed independently of the others. Data dependence can sometimes be dealt with, but it may incur additional overhead in the form of message passing, synchronization of shared memory, or some other method of processor communication.
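A hedged illustration of a dependence that can be handled at the price of extra coordination (the variables here are invented): a running sum makes every iteration depend on the previous one, but the loop can still be split if each processor accumulates a private partial sum and the partial sums are combined afterwards; that combining step is the kind of synchronization overhead the text refers to.

      ! sequential form: the update of total is a loop-carried dependence
      total = 0.0
      do i = 1, n
         total = total + x(i)
      end do

      ! conceptually, two processors could each compute half of the sum privately
      sum1 = 0.0
      do i = 1, n/2
         sum1 = sum1 + x(i)
      end do
      sum2 = 0.0
      do i = n/2 + 1, n
         sum2 = sum2 + x(i)
      end do
      total = sum1 + sum2   ! combining the partial sums is where synchronization is needed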
The second pass attempts to justify the parallelization effort by comparing the theoretical execution time of the code after parallelization to the code's sequential execution time. Somewhat counterintuitively, code does not always benefit from parallel execution. The extra overhead that can be associated with using multiple processors can eat into the potential speedup of parallelized code.
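As a hedged, invented illustration of such an estimate: if a loop takes 80 microseconds to run sequentially and starting and synchronizing threads costs about 50 microseconds, then on four processors the parallel version takes roughly 80/4 + 50 = 70 microseconds and is barely worthwhile, whereas a loop that takes only 20 microseconds sequentially would slow down to about 20/4 + 50 = 55 microseconds and should be left sequential.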
Example
The Fortran code below can be auto-parallelized by a compiler because each iteration is independent of the others, and the final result of array z will be correct regardless of the execution order of the other iterations.
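The code listing itself did not survive in this copy of the article; the following is a minimal sketch of the kind of loop the text describes, where the input arrays x and y and the bound n are purely illustrative:

      do i = 1, n
         z(i) = x(i) + y(i)   ! iteration i touches only element i of each array
      end do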
On the other hand, the following code cannot be auto-parallelized, because the value of z(i) depends on the result of the previous iteration, z(i-1).
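Again the original listing is missing; a sketch consistent with the description, in which doubling the previous element is only an assumed placeholder for the real computation:

      do i = 2, n
         z(i) = z(i-1) * 2   ! reads the value written by the previous iteration
      end do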
This does not mean that the code cannot be parallelized. Indeed, it is equivalent to a form in which every element is computed directly from z(1), so that no iteration reads another iteration's result.
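Under the assumed doubling recurrence above, that equivalent form would look like this:

      do i = 2, n
         z(i) = z(1) * 2**(i-1)   ! depends only on z(1), not on the previous iteration
      end do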
However, current parallelizing compilers are not usually capable of bringing out these parallelisms automatically, and it is questionable whether this code would benefit from parallelization in the first place.
Difficulties
Automatic parallelization by compilers or tools is very difficult, for the following reasons:
- dependence analysis is hard for code that uses indirect addressing, pointers, recursion, or indirect function calls (the sketch after this list illustrates the indirect-addressing case);
- loops have an unknown number of iterations;
- accesses to global resources are difficult to coordinate in terms of memory allocation, I/O, and shared variables.
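As a hedged sketch of the first point (the index array idx is invented for this example): because idx may contain repeated values, two different iterations could read and write the same element of z, and the compiler generally cannot prove otherwise at compile time.

      do i = 1, n
         z(idx(i)) = z(idx(i)) + x(i)   ! if idx(j) = idx(k) for j /= k, those iterations conflict
      end do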
Workaround
Due to the inherent difficulties of fully automatic parallelization, several easier approaches exist for producing a parallel program of higher quality. They are:
- Allow programmers to add "hints" to their programs to guide compiler parallelization, such as HPF (High Performance Fortran) for distributed-memory systems and OpenMP or OpenHMPP for shared-memory systems (see the sketch after this list).
- Build an interactive system between programmers and parallelizing tools/compilers. Notable examples are Vector Fabrics' vfAnalyst, SUIF Explorer (the Stanford University Intermediate Format compiler), the Polaris compiler, and ParaWise (formerly CAPTools).
- Hardware-supported speculative multithreading.
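As a sketch of the hint-based approach, a standard OpenMP directive marks a Fortran loop as parallelizable; the loop body here is the same illustrative independent loop used earlier.

      !$omp parallel do
      do i = 1, n
         z(i) = x(i) + y(i)   ! each iteration is independent, so the hint is safe
      end do
      !$omp end parallel do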
Historical parallelizing compilers
Most research compilers for automatic parallelization consider Fortran programs, because Fortran makes stronger guarantees about aliasing than languages such as C. Typical examples are:
- Rice Fortran D compiler
- Vienna Fortran compiler
- Paradigm compiler
- Polaris compiler
- SUIF compiler