IS CS-2022S-02

题目来源：Problem 4 日期：2024-07-04 题目主题：CS-计算机体系结构

具体题目

When the instruction $j$ may use the result generated by the instruction $i$ , we say there is a data dependency from the instruction $j$ to the instruction $i$ , and write $j \to i$ . Give all data dependencies in the program below. The behavior of each instruction is described as a comment (the description after #) in the program. The processor which executes the program has 32 registers from $x 0$ to $x 31$ , and $x 0$ is the zero register that always keeps the value 0. In the comment, we represent a memory access to the address addr on the memory as memory[addr].

instruction 0) addi x3, x0, 64  # x3 <- x0 + 64
instruction 1) addi x4, x0, 0   # x4 <- x0 + 0
instruction 2) addi x5, x0, 0   # x5 <- x0 + 0
instruction 3) Loop: lw x6, 0(x4)  # x6 <- memory[x4 + 0]
instruction 4) add x5, x5, x6   # x5 <- x5 + x6
instruction 5) addi x4, x4, 4   # x4 <- x4 + 4
instruction 6) blt x4, x3, Loop  # if x4 < x3, goto Loop
instruction 7) sw x5, 4096(x0)  # memory[x0 + 4096] <- x5

Consider a 5-stage pipeline processor that issues up to one instruction per clock cycle. The processor consists of 5 stages: instruction fetch (IF) stage, instruction decode and register fetch (ID) stage, execution (EX) stage, memory access (MA) stage, and register write back (WB) stage. The bit width of each register is 32. The processor has the instruction and data memories that can be accessed in one clock cycle, and load-word lw and store-word sw instructions do not stall on the MA stage. If there is a load-use data hazard for lw instruction, the IF, ID, and EX stages are stalled for one clock cycle. The branch instruction blt (branch if less than) stalls the IF and ID stages until the branch result is determined in the EX stage. Thus, the processor does not fetch subsequent instructions for two clock cycles after a branch instruction is fetched. Execution results in the EX stage and load results in the MA stage are properly forwarded to the EX stage.

Explain what the load-use data hazard is. Explain also how load-use data hazards occur when the program in question (1) is executed on the processor.
Calculate the number of clock cycles required for the execution of the program in question (1) on the processor in question (2). Calculate also the average IPC (instructions per cycle) up to two places of decimals.
Using the program in question (1) and the processor in question (2) as an example, explain the mechanism and role of dynamic branch prediction.

正确解答

1. Data Dependencies

Data dependencies in the program:

Instruction 3 uses x4 from Instruction 1: 3 → 1
Instruction 4 uses x5 from Instruction 2: 4 → 2
Instruction 4 uses x6 from Instruction 3: 4 → 3
Instruction 6 uses x4 from Instruction 5: 6 → 5
Instruction 6 uses x3 from Instruction 0: 6 → 0
Instruction 7 uses x5 from Instruction 4: 7 → 4
Instruction 3 in iteration $n$ uses x4 from Instruction 5 in iteration $n - 1$ : 3 → 5 (loop dependency)

2. Load-use Data Hazard

A load-use data hazard occurs when a load instruction (lw) is followed immediately by an instruction that uses the loaded value, causing a stall because the subsequent instruction needs the value that is still being fetched.

In the program, this hazard occurs between:

Instruction 3: lw x6, 0(x4)
Instruction 4: add x5, x5, x6

The processor stalls for one cycle to ensure the value in x6 is available for Instruction 4.

3. Clock Cycles Calculation

We consider the following:

The loop executes $(x 3 - x 4) /4$ times, i.e., $(64 - 0) /4 = 16$ iterations.
Each iteration includes Instructions 3 to 6.

Instructions 0, 1, 2:

Each executed once.
IF: 1 cycle
ID: 1 cycle
EX: 1 cycle
MA: 1 cycle
WB: 1 cycle

Instructions 3 to 6 per iteration:

IF: 1 cycle
ID: 1 cycle
EX: 1 cycle
MA: 1 cycle
WB: 1 cycle

Stalls per iteration:

Load-use hazard after Instruction 3: 1 cycle
Branch hazard after Instruction 6: 2 cycles

Execution Cycles:

Instructions 0, 1, 2: 3 cycles
Each iteration: 7 cycles (5 cycles + 1 cycle load-use hazard + 2 cycles branch hazard)
16 iterations: $16 \times 7 = 112$ cycles
Final instruction 7: 5 cycles

Total clock cycles: $3 + 112 + 5 = 120$ cycles

Average IPC Calculation:

Total instructions: $3 + 16 \times 4 + 1 = 68$
IPC: $\frac{68}{120} \approx 0.57$

Pipeline Execution Example

Below is the Gantt chart for the first few cycles of the program execution:

Cycle	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
Inst0	IF	ID	EX	MA	WB
Inst1		IF	ID	EX	MA	WB
Inst2			IF	ID	EX	MA	WB
Inst3				IF	ID	EX	MA	WB	ST	ST	IF (After `blt` EX)	ID	EX	MA	WB
Inst4					ST (L-U Haz.)	IF	ID	EX	MA	WB
Inst5							IF	ID	EX	MA	WB
Inst6								IF	ID	EX	MA	WB

Key:

IF: Instruction Fetch
ID: Instruction Decode and Register Fetch
EX: Execution
MA: Memory Access
WB: Write Back
ST: Stall

4. Dynamic Branch Prediction

Dynamic Branch Prediction is a technique used in pipelined processors to predict the outcome of branch instructions to minimize control hazards and reduce pipeline stalls. It helps maintain a steady flow of instructions by making educated guesses about whether a branch will be taken.

Mechanism of Dynamic Branch Prediction

Branch History Table (BHT):
- Stores the outcomes of recent branch instructions (taken or not taken).
Speculative Execution:
- The processor speculatively executes instructions based on the prediction.
- If the prediction is correct, execution continues without interruption.
- If incorrect, the speculative instructions are discarded, causing a penalty.

Role of Dynamic Branch Prediction in the Given Program

In the given program, we have a branch instruction:

Instruction 6: blt x4, x3, Loop
- This instruction checks if x4 is less than x3 and branches back to the loop start if true.

Example with Branch Prediction:

Initial Iteration:
- The processor encounters blt x4, x3, Loop.
- It uses the BHT to predict whether the branch will be taken.
Prediction:
- Assume the prediction is that the branch will be taken (likely due to the loop).
Speculative Execution:
- Instructions from Loop: are speculatively executed.
- If the prediction is correct, there is no penalty, and the pipeline stays full.
- If the prediction is incorrect, the speculatively executed instructions are discarded, causing a 2-cycle penalty.

Specific Numbers in the Context of the Program With Branch Prediction:

Assume a 90% accuracy in branch prediction.
Mispredictions = 16 iterations * 10% = 1.6 ≈ 2 mispredictions.
Total penalty due to mispredictions = 2 mispredictions * 2 cycles = 4 cycles.
Total clock cycles with prediction: $120 - 32 + 4 = 92$ cycles (adjusting for reduced penalties).

Impact on IPC With Prediction:

Total cycles = 92.
Total instructions = 68.
IPC = $\frac{68}{92} \approx 0.74$ .

Conclusion:

Dynamic branch prediction significantly improves the performance of the program by reducing the number of stall cycles caused by branch instructions. With a 90% accurate branch predictor, the total execution time is reduced to 92 cycles, resulting in a higher IPC and more efficient pipeline utilization.

知识点

数据冒险动态分支预测流水线

数据冒险 Data Hazards

分支预测 Branch Prediction

解题技巧和信息

Data Dependency: Identify dependencies by analyzing which instructions use the results of previous instructions.
Load-use Hazard: Recognize hazards caused by consecutive load and use instructions.

重点词汇

Data dependency 数据依赖
Load-use data hazard 加载使用数据冒险
Pipeline 流水线
Clock cycle 时钟周期
Instructions per cycle (IPC) 每周期指令数
Dynamic branch prediction 动态分支预测

参考资料

Bryant, R. E., & O’Hallaron, D. R. (2015). Computer Systems: A Programmer’s Perspective (3rd ed.). Pearson.
Hennessy, J. L., & Patterson, D. A. (2017). Computer Architecture: A Quantitative Approach (6th ed.). Morgan Kaufmann.

Zephyr's Notes on ISCS & CBMS, UTokyo

Explorer