IS CS-2022S-02
题目来源:Problem 4 日期:2024-07-04 题目主题:CS-计算机体系结构
具体题目
- When the instruction may use the result generated by the instruction , we say there is a data dependency from the instruction to the instruction , and write . Give all data dependencies in the program below. The behavior of each instruction is described as a comment (the description after
#
) in the program. The processor which executes the program has 32 registers from to , and is the zero register that always keeps the value 0. In the comment, we represent a memory access to the addressaddr
on the memory asmemory[addr]
.
instruction 0) addi x3, x0, 64 # x3 <- x0 + 64
instruction 1) addi x4, x0, 0 # x4 <- x0 + 0
instruction 2) addi x5, x0, 0 # x5 <- x0 + 0
instruction 3) Loop: lw x6, 0(x4) # x6 <- memory[x4 + 0]
instruction 4) add x5, x5, x6 # x5 <- x5 + x6
instruction 5) addi x4, x4, 4 # x4 <- x4 + 4
instruction 6) blt x4, x3, Loop # if x4 < x3, goto Loop
instruction 7) sw x5, 4096(x0) # memory[x0 + 4096] <- x5
-
Consider a 5-stage pipeline processor that issues up to one instruction per clock cycle. The processor consists of 5 stages: instruction fetch (IF) stage, instruction decode and register fetch (ID) stage, execution (EX) stage, memory access (MA) stage, and register write back (WB) stage. The bit width of each register is 32. The processor has the instruction and data memories that can be accessed in one clock cycle, and load-word
lw
and store-wordsw
instructions do not stall on the MA stage. If there is a load-use data hazard forlw
instruction, the IF, ID, and EX stages are stalled for one clock cycle. The branch instructionblt
(branch if less than) stalls the IF and ID stages until the branch result is determined in the EX stage. Thus, the processor does not fetch subsequent instructions for two clock cycles after a branch instruction is fetched. Execution results in the EX stage and load results in the MA stage are properly forwarded to the EX stage.Explain what the load-use data hazard is. Explain also how load-use data hazards occur when the program in question (1) is executed on the processor.
-
Calculate the number of clock cycles required for the execution of the program in question (1) on the processor in question (2). Calculate also the average IPC (instructions per cycle) up to two places of decimals.
-
Using the program in question (1) and the processor in question (2) as an example, explain the mechanism and role of dynamic branch prediction.
正确解答
1. Data Dependencies
Data dependencies in the program:
- Instruction 3 uses
x4
from Instruction 1:3 → 1
- Instruction 4 uses
x5
from Instruction 2:4 → 2
- Instruction 4 uses
x6
from Instruction 3:4 → 3
- Instruction 6 uses
x4
from Instruction 5:6 → 5
- Instruction 6 uses
x3
from Instruction 0:6 → 0
- Instruction 7 uses
x5
from Instruction 4:7 → 4
- Instruction 3 in iteration uses
x4
from Instruction 5 in iteration :3 → 5
(loop dependency)
2. Load-use Data Hazard
A load-use data hazard occurs when a load instruction (lw
) is followed immediately by an instruction that uses the loaded value, causing a stall because the subsequent instruction needs the value that is still being fetched.
In the program, this hazard occurs between:
- Instruction 3:
lw x6, 0(x4)
- Instruction 4:
add x5, x5, x6
The processor stalls for one cycle to ensure the value in x6
is available for Instruction 4
.
3. Clock Cycles Calculation
We consider the following:
- The loop executes times, i.e., iterations.
- Each iteration includes Instructions 3 to 6.
Instructions 0, 1, 2:
- Each executed once.
- IF: 1 cycle
- ID: 1 cycle
- EX: 1 cycle
- MA: 1 cycle
- WB: 1 cycle
Instructions 3 to 6 per iteration:
- IF: 1 cycle
- ID: 1 cycle
- EX: 1 cycle
- MA: 1 cycle
- WB: 1 cycle
Stalls per iteration:
- Load-use hazard after Instruction 3: 1 cycle
- Branch hazard after Instruction 6: 2 cycles
Execution Cycles:
- Instructions 0, 1, 2: 3 cycles
- Each iteration: 7 cycles (5 cycles + 1 cycle load-use hazard + 2 cycles branch hazard)
- 16 iterations: cycles
- Final instruction 7: 5 cycles
Total clock cycles: cycles
Average IPC Calculation:
- Total instructions:
- IPC:
Pipeline Execution Example
Below is the Gantt chart for the first few cycles of the program execution:
Cycle | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Inst0 | IF | ID | EX | MA | WB | ||||||||||
Inst1 | IF | ID | EX | MA | WB | ||||||||||
Inst2 | IF | ID | EX | MA | WB | ||||||||||
Inst3 | IF | ID | EX | MA | WB | ST | ST | IF (After blt EX) | ID | EX | MA | WB | |||
Inst4 | ST (L-U Haz.) | IF | ID | EX | MA | WB | |||||||||
Inst5 | IF | ID | EX | MA | WB | ||||||||||
Inst6 | IF | ID | EX | MA | WB |
Key:
- IF: Instruction Fetch
- ID: Instruction Decode and Register Fetch
- EX: Execution
- MA: Memory Access
- WB: Write Back
- ST: Stall
4. Dynamic Branch Prediction
Dynamic Branch Prediction is a technique used in pipelined processors to predict the outcome of branch instructions to minimize control hazards and reduce pipeline stalls. It helps maintain a steady flow of instructions by making educated guesses about whether a branch will be taken.
Mechanism of Dynamic Branch Prediction
-
Branch History Table (BHT):
- Stores the outcomes of recent branch instructions (taken or not taken).
-
Speculative Execution:
- The processor speculatively executes instructions based on the prediction.
- If the prediction is correct, execution continues without interruption.
- If incorrect, the speculative instructions are discarded, causing a penalty.
Role of Dynamic Branch Prediction in the Given Program
In the given program, we have a branch instruction:
- Instruction 6:
blt x4, x3, Loop
- This instruction checks if
x4
is less thanx3
and branches back to the loop start iftrue
.
- This instruction checks if
Example with Branch Prediction:
-
Initial Iteration:
- The processor encounters
blt x4, x3, Loop
. - It uses the BHT to predict whether the branch will be taken.
- The processor encounters
-
Prediction:
- Assume the prediction is that the branch will be taken (likely due to the loop).
-
Speculative Execution:
- Instructions from
Loop:
are speculatively executed. - If the prediction is correct, there is no penalty, and the pipeline stays full.
- If the prediction is incorrect, the speculatively executed instructions are discarded, causing a 2-cycle penalty.
- Instructions from
Specific Numbers in the Context of the Program With Branch Prediction:
- Assume a 90% accuracy in branch prediction.
- Mispredictions = 16 iterations * 10% = 1.6 ≈ 2 mispredictions.
- Total penalty due to mispredictions = 2 mispredictions * 2 cycles = 4 cycles.
- Total clock cycles with prediction: cycles (adjusting for reduced penalties).
Impact on IPC With Prediction:
- Total cycles = 92.
- Total instructions = 68.
- IPC = .
Conclusion:
Dynamic branch prediction significantly improves the performance of the program by reducing the number of stall cycles caused by branch instructions. With a 90% accurate branch predictor, the total execution time is reduced to 92 cycles, resulting in a higher IPC and more efficient pipeline utilization.
知识点
解题技巧和信息
- Data Dependency: Identify dependencies by analyzing which instructions use the results of previous instructions.
- Load-use Hazard: Recognize hazards caused by consecutive load and use instructions.
重点词汇
- Data dependency 数据依赖
- Load-use data hazard 加载使用数据冒险
- Pipeline 流水线
- Clock cycle 时钟周期
- Instructions per cycle (IPC) 每周期指令数
- Dynamic branch prediction 动态分支预测
参考资料
- Bryant, R. E., & O’Hallaron, D. R. (2015). Computer Systems: A Programmer’s Perspective (3rd ed.). Pearson.
- Hennessy, J. L., & Patterson, D. A. (2017). Computer Architecture: A Quantitative Approach (6th ed.). Morgan Kaufmann.