Instruction scheduling
Encyclopedia
In computer science
, instruction scheduling is a compiler optimization
used to improve instruction-level parallelism, which improves performance on machines with instruction pipeline
s. Put more simply, without changing the meaning of the code, it tries to
The pipeline stalls can be caused by structural hazards (processor resource limit), data hazards (output of one instruction needed by another instruction) and control hazards (branching).
. In order to determine whether rearranging the block's instructions in a certain way preserves the behavior of that block, we need the concept of a data dependency. There are three types of dependencies, which also happen to be the three data hazards:
Technically, there is a fourth type, Read after Read (RAR or "Input"): Both instructions read the same location. Input dependence does not constrain the execution order of two statements, but it is useful in scalar replacement of array elements.
To make sure we respect the three types of dependencies, we construct a dependency graph, which is a directed graph
where each vertex is an instruction and there is an edge from I1 to I2 if I1 must come before I2 due to a dependency. If loop-carried dependencies are left out, the dependency graph is a directed acyclic graph
. Then, any topological sort of this graph is a valid instruction schedule. The edges of the graph are usually labelled with the latency of the dependence. This is the number of clock cycles that needs to elapse before the pipeline can proceed with the target instruction without stalling.
To arrive at a good schedule, stalls should be prevented. This is determined by the choice of the next instruction to be scheduled. A number of heuristics are in common use:
or both before and after it. The advantage of doing it before register allocation is that this results in maximum parallelism. The disadvantage of doing it before register allocation is that this can result in the register allocator needing to use a number of registers exceeding those available. This will cause spill/fill code to be introduced which will reduce the performance of the section of code in question.
If the architecture being scheduled has instruction sequences that have potentially illegal combinations (due to a lack of instruction interlocks) the instructions must be scheduled after register allocation. This second scheduling pass will also improve the placement of the spill/fill code.
If scheduling is only done after register allocation then there will be false dependencies introduced by the register allocation that will limit the amount of instruction motion possible by the scheduler.
Computer science
Computer science or computing science is the study of the theoretical foundations of information and computation and of practical techniques for their implementation and application in computer systems...
, instruction scheduling is a compiler optimization
Compiler optimization
Compiler optimization is the process of tuning the output of a compiler to minimize or maximize some attributes of an executable computer program. The most common requirement is to minimize the time taken to execute a program; a less common one is to minimize the amount of memory occupied...
used to improve instruction-level parallelism, which improves performance on machines with instruction pipeline
Instruction pipeline
An instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase their instruction throughput ....
s. Put more simply, without changing the meaning of the code, it tries to
- Avoid pipeline stalls by rearranging the order of instructions.
- Avoid illegal or semantically ambiguous operations (typically involving subtle instruction pipeline timing issues or non-interlocked resources.)
The pipeline stalls can be caused by structural hazards (processor resource limit), data hazards (output of one instruction needed by another instruction) and control hazards (branching).
Data hazards
Instruction scheduling is typically done on a single basic blockBasic block
In computing, a basic block is a portion of the code within a program with certain desirable properties that make it highly amenable to analysis. Compilers usually decompose programs into their basic blocks as a first step in the analysis process...
. In order to determine whether rearranging the block's instructions in a certain way preserves the behavior of that block, we need the concept of a data dependency. There are three types of dependencies, which also happen to be the three data hazards:
- Read after Write (RAW or "True"): Instruction 1 writes a value used later by Instruction 2. Instruction 1 must come first, or Instruction 2 will read the old value instead of the new.
- Write after Read (WAR or "Anti"): Instruction 1 reads a location that is later overwritten by Instruction 2. Instruction 1 must come first, or it will read the new value instead of the old.
- Write after Write (WAW or "Output"): Two instructions both write the same location. They must occur in their original order.
Technically, there is a fourth type, Read after Read (RAR or "Input"): Both instructions read the same location. Input dependence does not constrain the execution order of two statements, but it is useful in scalar replacement of array elements.
To make sure we respect the three types of dependencies, we construct a dependency graph, which is a directed graph
Directed graph
A directed graph or digraph is a pair G= of:* a set V, whose elements are called vertices or nodes,...
where each vertex is an instruction and there is an edge from I1 to I2 if I1 must come before I2 due to a dependency. If loop-carried dependencies are left out, the dependency graph is a directed acyclic graph
Directed acyclic graph
In mathematics and computer science, a directed acyclic graph , is a directed graph with no directed cycles. That is, it is formed by a collection of vertices and directed edges, each edge connecting one vertex to another, such that there is no way to start at some vertex v and follow a sequence of...
. Then, any topological sort of this graph is a valid instruction schedule. The edges of the graph are usually labelled with the latency of the dependence. This is the number of clock cycles that needs to elapse before the pipeline can proceed with the target instruction without stalling.
Algorithms
The simplest algorithm to find a topological sort is frequently used and is known as list scheduling. Conceptually, it repeatedly selects a source of the dependency graph, appends it to the current instruction schedule and removes it from the graph. This may cause other vertices to be sources, which will then also be considered for scheduling. The algorithm terminates if the graph is empty.To arrive at a good schedule, stalls should be prevented. This is determined by the choice of the next instruction to be scheduled. A number of heuristics are in common use:
- The processor resources used by the already scheduled instructions are recorded. If a candidate uses a resource that is occupied, its priority will drop.
- If a candidate is scheduled closer to its predecessors than the associated latency its priority will drop.
- If a candidate lies on the critical path of the graph, its priority will rise. This heuristic provides some form of look-ahead in an otherwise local decision process.
- If choosing a candidate will create many new sources, its priority will rise. This heuristic tends to generate more freedom for the scheduler.
The phase order of Instruction Scheduling
Instruction scheduling may be done either before or after register allocationRegister allocation
In compiler optimization, register allocation is the process of assigning a large number of target program variables onto a small number of CPU registers...
or both before and after it. The advantage of doing it before register allocation is that this results in maximum parallelism. The disadvantage of doing it before register allocation is that this can result in the register allocator needing to use a number of registers exceeding those available. This will cause spill/fill code to be introduced which will reduce the performance of the section of code in question.
If the architecture being scheduled has instruction sequences that have potentially illegal combinations (due to a lack of instruction interlocks) the instructions must be scheduled after register allocation. This second scheduling pass will also improve the placement of the spill/fill code.
If scheduling is only done after register allocation then there will be false dependencies introduced by the register allocation that will limit the amount of instruction motion possible by the scheduler.
Types of Instruction Scheduling
There are several types of instruction scheduling:- Local (Basic Block) Scheduling: instructions can't move across basic block boundaries.
- Global scheduling: instructions can move across basic block boundaries.
- Modulo Scheduling: another name for software pipeliningSoftware pipeliningIn computer science, software pipelining is a technique used to optimize loops, in a manner that parallels hardware pipelining. Software pipelining is a type of out-of-order execution, except that the reordering is done by a compiler instead of the processor...
, which is a form of instruction scheduling that interleaves different iterations of a loop. - Trace schedulingTrace schedulingTrace scheduling is an optimization technique used in compilers for computer programs.A compiler often can, by rearranging its generated machine instructions for faster execution, improve program performance...
: the first practical approach for global scheduling, trace scheduling tries to optimize the control flow path that is executed most often. - Superblock scheduling: a simplified form of trace scheduling which does not attempt to merge control flow paths at trace "side entrances". Instead, code can be implemented by more than one schedule, vastly simplifying the code generator.
See also
- Code generation
- Register allocationRegister allocationIn compiler optimization, register allocation is the process of assigning a large number of target program variables onto a small number of CPU registers...
- Instruction selectionInstruction selectionIn computer science, instruction selection is the stage of a compiler backend that transforms its tree-based middle-level intermediate representation into a low-level IR very close to its final target language...
- Instruction unitInstruction unitThe instruction unit in a central processing unit is responsible for organising for program instructions to be fetched from memory, and executed, in an appropriate order...
- Branch predicationBranch predicationBranch predication is a strategy in computer architecture design for mitigating the costs usually associated with conditional branches, particularly branches to short sections of code...