• Topics: bypassing, deeper pipelines, control hazards

## **RISC/CISC** Loads/Stores

- Registers and memory
- Complex and reduced instrs
- Format of a load/store

|                            | RR                        | ALU   | DM       | RW    |
|----------------------------|---------------------------|-------|----------|-------|
| ADD R3 ← R1, R2            | Rd R1,R2                  | R1+R2 |          | Wr R3 |
| BEZ R1, [R5]               | Rd R1, R5<br>Compare, Set PC |    |          |       |
| LD R6 ← 8[R3]              | Rd R3                     | R3+8  | Get data | Wr R6 |
| ST R6 → 8[R3]              | Rd R3,R6                  | R3+8  | Wr data  |       |

- For the following code sequence, show how the instrs flow through the pipeline:
  - ADD R3 ← R1, R2
  - LD R7 ← 8[R6]
  - ST R9 → 4[R8]
  - BEZ R4, [R5]
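The four instructions above share no registers, so a minimal sketch of their flow can ignore stalls. The block below is an illustration, not the slide's own solution: it assumes the five stages IF, RR, ALU, DM, RW, separate instruction and data memories, and no delay for the branch itself. The helper name `pipeline_diagram` is hypothetical.

```python
STAGES = ["IF", "RR", "ALU", "DM", "RW"]

def pipeline_diagram(program):
    """Return one row per instruction; row[c] is the stage occupied in cycle c + 1."""
    n_cycles = len(program) + len(STAGES) - 1
    rows = []
    for i in range(len(program)):
        row = [""] * n_cycles
        row[i:i + len(STAGES)] = STAGES   # instruction i is fetched in cycle i + 1
        rows.append(row)
    return rows

program = ["ADD R3 <- R1, R2", "LD R7 <- 8[R6]", "ST R9 -> 4[R8]", "BEZ R4, [R5]"]
rows = pipeline_diagram(program)

# Header row with cycle numbers, then one row per instruction.
print(" " * 18 + " ".join(f"{c:>3}" for c in range(1, len(rows[0]) + 1)))
for instr, row in zip(program, rows):
    print(f"{instr:<18}" + " ".join(f"{stage:>3}" for stage in row))
```

Each instruction starts one cycle after its predecessor, so four instructions finish in 4 + 5 - 1 = 8 cycles under these assumptions.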






#### Hazards

- Structural hazards: instructions in different stages (or the same stage) compete for the same resource
- Data hazards: an instruction cannot continue because it needs a value that has not yet been generated by an earlier instruction
- Control hazards: fetch cannot continue because it does not know the outcome of an earlier branch. This is a special case of a data hazard, kept as a separate category because it is handled differently

- Example of a structural hazard: with a unified instruction and data cache, stage 4 (MEM) and stage 1 (IF) can never coincide
- The later instruction and all its successors are delayed until a cycle is found when the resource is free → these are pipeline bubbles
- Structural hazards are easy to eliminate: increase the number of resources (for example, implement a separate instruction and data cache)
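The unified-cache example can be sketched as a tiny fetch scheduler. This is an illustration under stated assumptions: a single-ported memory, only loads and stores use the port in stage 4, stage 4 falls three cycles after fetch, and there are no other stalls. The function name `fetch_cycles` and its parameters are hypothetical.

```python
def fetch_cycles(uses_memory, dm_offset=3):
    """uses_memory[i] is True if instruction i is a load/store.

    Returns the (1-based) cycle in which each instruction is fetched when
    IF and the memory stage share a single-ported cache.
    """
    busy = set()      # cycles in which an earlier load/store owns the port
    fetch = []
    cycle = 1
    for is_mem in uses_memory:
        while cycle in busy:      # port taken: insert a pipeline bubble
            cycle += 1
        fetch.append(cycle)
        if is_mem:
            busy.add(cycle + dm_offset)   # this instruction's memory stage
        cycle += 1
    return fetch

# One load followed by four ALU instructions.
print(fetch_cycles([True, False, False, False, False]))
```

In this run the load is fetched in cycle 1 and accesses memory in cycle 4, so the fourth instruction cannot be fetched until cycle 5 and everything behind it slips by one cycle. With a separate instruction cache the `busy` check would disappear.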

Show the instruction occupying each stage in each cycle (no bypassing) if I1 is R1+R2→R3, I2 is R3+R4→R5, and I3 is R7+R8→R9.
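One way to work this exercise is a small in-order scheduler. The sketch below is not the slide's own solution; it assumes the stages IF, RR, ALU, DM, RW, that the register file is written in the first half of RW and read in the second half of RR, and that a dependent instruction waits in RR until its producer reaches RW. The names `schedule_no_bypass` and `occupancy` are hypothetical.

```python
STAGES = ["IF", "RR", "ALU", "DM", "RW"]

def schedule_no_bypass(instrs):
    """instrs: list of (name, dest, sources). Returns {name: {stage: (first_cycle, last_cycle)}}."""
    rw_cycle = {}                       # register -> cycle in which its producer is in RW
    prev_rr_start = prev_rr_end = 0
    sched = {}
    for name, dest, sources in instrs:
        if_start = max(1, prev_rr_start)            # IF frees up when the previous instr enters RR
        rr_start = max(if_start, prev_rr_end) + 1   # RR frees up when the previous instr leaves it
        ready = max((rw_cycle.get(r, 0) for r in sources), default=0)
        rr_end = max(rr_start, ready)               # stall in RR until every producer reaches RW
        alu, dm, rw = rr_end + 1, rr_end + 2, rr_end + 3
        sched[name] = {"IF": (if_start, rr_start - 1), "RR": (rr_start, rr_end),
                       "ALU": (alu, alu), "DM": (dm, dm), "RW": (rw, rw)}
        rw_cycle[dest] = rw
        prev_rr_start, prev_rr_end = rr_start, rr_end
    return sched

def occupancy(sched):
    """Return a list indexed by cycle - 1; each entry maps stage -> instruction name ('' if empty)."""
    last = max(stages["RW"][1] for stages in sched.values())
    table = []
    for cycle in range(1, last + 1):
        row = {stage: "" for stage in STAGES}
        for name, stages in sched.items():
            for stage, (first, end) in stages.items():
                if first <= cycle <= end:
                    row[stage] = name
        table.append(row)
    return table

program = [("I1", "R3", ["R1", "R2"]), ("I2", "R5", ["R3", "R4"]), ("I3", "R9", ["R7", "R8"])]
for cycle, row in enumerate(occupancy(schedule_no_bypass(program)), start=1):
    print(cycle, row)
```

Under these assumptions I2 sits in RR for cycles 3 to 5 (two stall cycles) and I3, although independent, is held in IF behind it.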






## Bypassing: 5-Stage Pipeline



Source: H&P textbook

Show the instruction occupying each stage in each cycle (with bypassing) if I1 is R1+R2→R3, I2 is R3+R4→R5, and I3 is R3+R8→R9.
Identify the input latch for each input operand.
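With full bypassing these three ALU instructions do not stall, so I1, I2, and I3 enter the ALU stage in cycles 3, 4, and 5. What remains is picking the latch each operand comes from. The sketch below assumes every instruction is a single-cycle ALU op and names latches by the stages they sit between (`RR/ALU`, `ALU/DM`, `DM/RW`); both the naming and the helper `operand_latches` are illustrative choices, not taken from the slide's figure.

```python
def operand_latches(instrs):
    """instrs: list of (name, dest, sources), all single-cycle ALU ops.

    Returns {name: {register: latch feeding the ALU input}}.
    """
    last_writer = {}   # register -> index of the most recent instruction writing it
    result = {}
    for i, (name, dest, sources) in enumerate(instrs):
        picks = {}
        for reg in sources:
            distance = i - last_writer[reg] if reg in last_writer else None
            if distance == 1:
                picks[reg] = "ALU/DM"   # producer left the ALU in the previous cycle
            elif distance == 2:
                picks[reg] = "DM/RW"    # producer is now in RW, value not yet readable in RR
            else:
                picks[reg] = "RR/ALU"   # value was read from the register file in RR
        result[name] = picks
        last_writer[dest] = i
    return result

program = [("I1", "R3", ["R1", "R2"]), ("I2", "R5", ["R3", "R4"]), ("I3", "R9", ["R3", "R8"])]
for name, picks in operand_latches(program).items():
    print(name, picks)
```

A producer three or more instructions ahead needs no bypass here, because it writes the register file in the first half of RW and the consumer reads in the second half of RR.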






#### **Pipeline Implementation**

- Signals for the muxes have to be generated; some of this can happen during ID
- Look-up tables are needed in the decode stage to identify situations that merit bypassing/stalling
  - the number of inputs to the muxes goes up




• For the 5-stage pipeline (RR and RW take half a cycle)



- For the following pairs of instructions, how many stalls will the 2<sup>nd</sup> instruction experience (with and without bypassing)?
  - ADD R3 ← R1+R2
     ADD R5 ← R3+R4
  - LD R2 ← [R1]
     ADD R4 ← R2+R3
  - LD R2 ← [R1]
     SD R3 → [R2]
  - LD R2 ← [R1]
     SD R2 → [R3]

- Stalls experienced by the 2<sup>nd</sup> instruction of each pair (5-stage pipeline, RR and RW take half a cycle):
  - ADD R3 ← R1+R2, ADD R5 ← R3+R4: without bypassing 2, with bypassing 0
  - LD R2 ← [R1], ADD R4 ← R2+R3: without bypassing 2, with bypassing 1
  - LD R2 ← [R1], SD R3 → [R2]: without bypassing 2, with bypassing 1 (R2 is the store address, needed in ALU)
  - LD R2 ← [R1], SD R2 → [R3]: without bypassing 2, with bypassing 0 (R2 is the store data, needed only in DM)
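The stall counts above follow from comparing when a value becomes available with when the dependent instruction wants it. The sketch below is a simplified model, assuming the producer is fetched in cycle 1 and the consumer in cycle 2, that a bypassed value can be used in the cycle after the stage that produces it, and that without bypassing the consumer's RR must coincide with the producer's RW. The function `stalls` and the `STAGE_CYCLE` table are hypothetical names.

```python
# Cycle in which the producer (fetched in cycle 1) occupies each stage.
STAGE_CYCLE = {"IF": 1, "RR": 2, "ALU": 3, "DM": 4, "RW": 5}

def stalls(produced_in, needed_in, bypassing):
    """Stall cycles for a consumer fetched one cycle after its producer.

    produced_in: stage at whose end the producer's value exists ("ALU" or "DM").
    needed_in:   stage at whose start the consumer needs it ("ALU" or "DM").
    """
    if not bypassing:
        # The value is only visible through the register file, so the
        # consumer's RR (nominally cycle 3) must slip to the producer's RW.
        return max(0, STAGE_CYCLE["RW"] - (STAGE_CYCLE["RR"] + 1))
    available = STAGE_CYCLE[produced_in] + 1   # first cycle the bypassed value is usable
    wanted = STAGE_CYCLE[needed_in] + 1        # consumer reaches that stage one cycle later
    return max(0, available - wanted)

pairs = {
    "ADD -> ADD": ("ALU", "ALU"),
    "LD -> ADD": ("DM", "ALU"),
    "LD -> SD (R2 as address)": ("DM", "ALU"),
    "LD -> SD (R2 as data)": ("DM", "DM"),
}
for label, (produced, needed) in pairs.items():
    print(label, "without:", stalls(produced, needed, False), "with:", stalls(produced, needed, True))
```

The two load/store pairs differ only in where the store uses R2: an address is consumed by the ALU stage, while store data is not needed until DM.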



- For the 5-stage pipeline, bypassing can eliminate delays between the following example pairs of instructions:
  - add/sub R1, R2, R3
     add/sub/lw/sw R4, R1, R5
  - lw R1, 8(R2)
     sw R1, 4(R3)

- The following pairs of instructions will have intermediate stalls:
  - lw R1, 8(R2)
     add/sub/lw R3, R1, R4 or sw R3, 8(R1)
  - fmul F1, F2, F3
     fadd F5, F1, F4