### Lecture 7

### **Instruction Scheduling**

- I Basic Block Scheduling
- II Global Scheduling (for Non-Numeric Code)

Reading: Chapter 10.3 - 10.4

Advanced Compilers

M. Lam

## I. Scheduling Constraints

### • Data dependences

• The operations must generate the same results as the corresponding ones in the original program.

### • Control dependences

• All the operations executed in the original program must be executed in the optimized program

#### • Resource constraints

• No over-subscription of resources.

- Must maintain order of accesses to potentially same locations
  - True dependence: write -> read (RAW hazard)

a = ... = a

• Output dependence: write -> write (WAW hazard)

a = ... a = ...

Anti-dependence: read -> write (WAR hazard)

= a a =

- Data Dependence Graph
  - · Nodes: operations
  - Edges: n<sub>1</sub> -> n<sub>2</sub> if n<sub>2</sub> is data dependent on n<sub>1</sub> labeled by the execution length of n<sub>1</sub>

Advanced Compilers

L7: Instruction Scheduling

### **Analysis on Memory Variables**

3

• Undecidable in general

read x
read y
A[x] = ...
... = A[y]

Two memory accesses can potentially be the same unless proven otherwise

#### • Classes of analysis

- simple: base+offset1 = base+offset2?
- "data dependence analysis": Array accesses whose indices are affine expressions of loop indices

A[2i] = A[2i+1]?

- interprocedural analysis: global = parameter?
- pointer analysis: pointer1 = pointer2?

### · Data dependence analysis is useful for many other purposes

Advanced Compilers

• Each instruction type has a resource reservation table



## **Example of a Machine Model**

- · Each instruction can execute 2 operations
- 1 ALU operation or branch operation Op dst, src1, src2 executes in 1 clock
- 1 load or store operation LD dst, addr ST src, addr result is available in 2 clocks pipelined: can issue LD next clock executes in 1 clock cycle





### With Resource Constraints

- NP-complete in general => Heuristics time!
- List Scheduling

READY = nodes with 0 predecessors

Loop until READY is empty {

Let n be the node in READY with highest priority

Schedule **n** in the earliest slot

that satisfies precedence + resource constraints

8

Update predecessor count of **n**'s successor nodes Update READY

}

### • Scope: DAGs

- · Schedules operations in topological order
- Never backtracks

### • Variations

- Priority function for node **n** 
  - delay: max delay slots from **n** to any node
  - critical path: max clocks from n to any node
  - resource requirements
  - source order

Advanced Compilers

L7: Instruction Scheduling

# II. Introduction to Global Scheduling

9





Advanced Compilers

L7: Instruction Scheduling

# Terminology

11

Control equivalence

 Two operations o<sub>1</sub> and o<sub>2</sub> are control equivalent if o<sub>1</sub> is executed if and only if o<sub>2</sub> is executed.

Control dependence

 An op o<sub>2</sub> is control dependent on op o<sub>1</sub> if the execution of o<sub>2</sub> depends on the outcome of o<sub>1</sub>.

### Speculation

- An operation o<sub>1</sub> is speculatively executed if it is executed before all the operations it control-dependent upon have been executed.
- · No exception, satisfy data dependences



## **Code Motions**



Goal: Shorten execution time probabilistically

Moving instructions up

- Move instruction to a cut set (from entry)
- Speculation: even when not anticipated.



### Moving instructions down

- Move instruction to a cut set (from exit)
- May execute extra instruction
- Can duplicate code

Advanced Compilers

13

L7: Instruction Scheduling

# A Note on Data Dependences



- Lots of data dependences
- Key performance factor: memory latencies
- Move memory fetches up
  - Speculative memory fetches can be expensive
- Control-intensive: get execution profile
  - Static estimation
    - Innermost loops are frequently executed: back edges are likely to be taken
    - Edges that branch to exit and exception routines are not likely to be taken
  - Dynamic profiling
    - · Instrument code and measure using representative data

Advanced Compilers

L7: Instruction Scheduling

# A Basic Global Scheduling Algorithm

15

- Schedule innermost loops first
- Only upward code motion
- No creation of copies
- Only one level of speculation

### • A region in a control flow graph is

- a set of basic blocks and all the edges connecting these blocks,
- such that control from outside the region must enter through a single entry block.

### • A function is represented as a hierarchy of regions

- The whole control flow graph is a region
- Each natural loop in the flow graph is a region
- Natural loops are hierarchically nested

### · Schedule regions from inner to outer

- treat inner loop as a black box unit, can schedule around it but not into it
- ignore all the loop back edges --> get an acyclic graph

Advanced Compilers

17

L7: Instruction Scheduling

## Algorithm

Compute data dependences; For each region from inner to outer { For each basic block B in prioritized topological order { CandBlocks = ControlEquiv{B}  $\cup$ Dominated-Successors{ControlEquiv{B}}; CandInsts = ready operations in CandBlocks; For (t = 0, 1, ... until all operations from B are scheduled { For (n in CandInst in priority order) { if (n has no resource conflicts at time t) { S(n) = < B, t > Update resource commitments Update data dependences } Update CandInsts; }}}

• Priority functions

• Non-speculative before speculative

- Prepass before scheduling: loop unrolling
- Especially important to move operation up loop back edges



Advanced Compilers

L7: Instruction Scheduling

## Summary

19

- List scheduling
- Global scheduling
  - Legal code motions
  - Heuristics