IS CS-2024S-02

题目来源：Problem 2 日期：2024-08-16 题目主题：CS-Computer Architecture-Cache Memory

解题思路

这道题目考察处理器和缓存的工作原理，特别是直接映射缓存的命中率计算，以及指令执行的 IPC 计算。问题还要求提出硬件和软件层面的优化方案。我们将通过对程序执行过程中缓存的详细分析，计算缓存命中率和 IPC，并给出合理的优化建议。

Solution

Question 1: Cache Hit Rate

Cache Parameters:

Cache size: 256 bytes
Block size: 16 bytes
Number of cache lines: $256/16 = 16$ lines

Memory Address Breakdown:

Address size: 32 bits
Offset bits (block size of 16 bytes): 4 bits (i.e., $a_{3} a_{2} a_{1} a_{0}$ )
Index bits (16 lines): 4 bits (i.e., $a_{7} a_{6} a_{5} a_{4}$ )
Tag bits: Remaining bits (i.e., $a_{31} a_{30} \dots a_{8}$ )

Cache Access Analysis

Arrays A and B:

Array $A$ starts at address 0x1000.
Array $B$ starts at address 0x2000.
Both $A$ and $B$ map to the same index in the cache due to the identical lower address bits.

Detailed Analysis of the First Four Loops

Loop 1 (Iteration 0):
- Access $A [0]$ (address 0x1000), Cache Line 0: Miss
- Access $A [1]$ (address 0x1004), Cache Line 0: Hit
- Access $A [2]$ (address 0x1008), Cache Line 0: Hit
- Write $B [0]$ (address 0x2000), Cache Line 0: Miss
Loop 2 (Iteration 1):
- Access $A [1]$ (address 0x1004), Cache Line 0: Miss (replaced by B $0$ )
- Access $A [2]$ (address 0x1008), Cache Line 0: Hit
- Access $A [3]$ (address 0x100C), Cache Line 0: Hit
- Write $B [1]$ (address 0x2004), Cache Line 0: Miss
Loop 3 (Iteration 2):
- Access $A [2]$ (address 0x1008), Cache Line 0: Miss (replaced by B $1$ )
- Access $A [3]$ (address 0x100C), Cache Line 0: Hit
- Access $A [4]$ (address 0x1010), Cache Line 1: Miss (crosses to a new cache line)
- Write $B [2]$ (address 0x2008), Cache Line 0: Miss
Loop 4 (Iteration 3):
- Access $A [3]$ (address 0x100C), Cache Line 0: Miss (replaced by B[2])
- Access $A [4]$ (address 0x1010), Cache Line 1: Hit
- Access $A [5]$ (address 0x1014), Cache Line 1: Hit
- Write $B [3]$ (address 0x200C), Cache Line 0: Miss

Summary of the First Four Loops

Total Accesses: 16 accesses (4 per loop)
Total Cache Misses: 9 misses
Total Cache Hits: 7 hits
Hit Rate for the First Four Loops:

Hit Rate = \frac{7}{16} = 0.4375

Analysis of Subsequent Loops

Assuming the same pattern continues, the subsequent 96 loops (24 groups of 4 loops) will follow the same pattern:

Loop 1 in each group: 3 hits, 1 miss
Loop 2 in each group: 2 hits, 2 misses
Loop 3 in each group: 1 hit, 3 misses
Loop 4 in each group: 2 hits, 2 misses

Total Cache Misses and Hits for All Loops

Subsequent 96 Loops:

Loop 1: $24 \times 1 = 24$ misses
Loop 2: $24 \times 2 = 48$ misses
Loop 3: $24 \times 3 = 72$ misses
Loop 4: $24 \times 2 = 48$ misses

Total Cache Misses for 96 Loops: $24 + 48 + 72 + 48 = 192$ misses

Total Cache Hits for 96 Loops: $384 - 192 = 192$ hits

Final Cache Hit Rate Calculation

Total Accesses: $16 + 384 = 400$ accesses
Total Cache Misses: $9 + 192 = 201$ misses
Total Cache Hits: $400 - 201 = 199$ hits
Final Hit Rate:

Hit Rate = \frac{199}{400} = 0.4975

Conclusion

The final cache hit rate for the entire program is approximately 49.75%.

Question 2: IPC (Instructions Per Cycle)

Given the cache hit rate of approximately 49.75%:

Instruction Breakdown:
- addi: 1 cycle
- lw/sw: 1 cycle (hit) or 4 cycles (miss)
- div: 4 cycles
- blt: 1 cycle
Cycle Calculation per Iteration:
- Total Cycles for lw instructions: $3 \times 0.4975 \times 1 + 3 \times 0.5025 \times 4 \approx 7.5075$ cycles
- Total Cycles for sw instruction: $1 \times 0.4975 \times 1 + 1 \times 0.5025 \times 4 = 2.5075$ cycles
- Other instructions: 7 cycles (for addi, add, div, blt)
Total Cycles per Iteration:

7.5075 + 2.5075 + 7 = 17.015 cycles per iteration

Total Cycles for 400 Iterations:

400 \times 17.015 = 6806 cycles

Total Instructions:
- 11 instructions per iteration × 400 iterations = 4400 instructions
IPC:

IPC = \frac{4400}{6806} \approx 0.647

Question 3: Hardware Modification to Improve Cache Hit Rate

Modification: Use a 2-way set associative cache instead of a direct-mapped cache.

Reason: A 2-way set associative cache allows each index to map to two cache lines, which would enable both arrays $A$ and $B$ to coexist in the cache without conflict. This reduces the conflict misses, significantly improving the hit rate.

Question 4: Software-Level Optimization

Optimization: Array padding.

Example: Introduce padding in array $B$ to offset its starting address such that it does not map to the same cache lines as array $A$ .

Explanation: By adding a small padding (e.g., 16 bytes) to the start of array $B$ , the memory addresses of $B$ ‘s elements will map to different cache lines than the corresponding elements in $A$ . This reduces the conflict between $A$ and $B$ , leading to fewer cache misses and a higher hit rate.

知识点

CacheMemory DirectMappedCache SetAssociativeCache InstructionLevelParallelism ArrayPadding

难点思路

本题的难点在于正确分析缓存访问的冲突问题，以及提出有效的硬件和软件优化方案来减少这些冲突。直接映射缓存的冲突失效是此题的核心问题。

解题技巧和信息

Direct-Mapped Cache: 在处理大量连续数据时容易产生冲突失效，集合关联缓存可以有效缓解这一问题。
IPC Calculation: 注意由于缓存失效引起的指令执行延迟对 IPC 的影响。
Array Padding: 是解决数组访问冲突问题的有效软件优化方法。

重点词汇

Cache Line 缓存行
Direct-Mapped Cache 直接映射缓存
Hit Rate 命中率
Instruction Per Cycle (IPC) 每周期指令数
Array Padding 数组填充

参考资料

Hennessy, J. L., & Patterson, D. A. (2017). Computer Architecture: A Quantitative Approach. Chapter 2.
Patterson, D. A., & Hennessy, J. L. (2013). Computer Organization and Design: The Hardware/Software Interface. Chapter 5.

知识点

CacheMemory DirectMappedCache SetAssociativeCache InstructionLevelParallelism LoopUnrolling

难点思路

本题的难点在于对缓存操作的细致分析，特别是如何判断缓存命中与缺失，以及考虑硬件和软件的优化策略。

解题技巧和信息

Direct-Mapped Cache: 一般容易产生冲突失效，可以考虑增加集合关联度来减小失效率。
IPC Calculation: 注意每种指令在不同缓存命中状态下的执行周期。
Loop Unrolling: 是一种常见的优化手段，通过减少循环次数和增加每次循环的工作量来提高性能。

重点词汇

Cache Line 缓存行
Direct-Mapped Cache 直接映射缓存
Hit Rate 命中率
Instruction Per Cycle (IPC) 每周期指令数
Loop Unrolling 循环展开

参考资料

Hennessy, J. L., & Patterson, D. A. (2017). Computer Architecture: A Quantitative Approach. Chapter 2.
Patterson, D. A., & Hennessy, J. L. (2013). Computer Organization and Design: The Hardware/Software Interface. Chapter 5.

Zephyr's Notes on ISCS & CBMS, UTokyo

Explorer

Explorer

IS_CS-2024S-02

IS CS-2024S-02

解题思路

Solution

Question 1: Cache Hit Rate

Cache Access Analysis

Detailed Analysis of the First Four Loops

Summary of the First Four Loops

Analysis of Subsequent Loops

Total Cache Misses and Hits for All Loops

Final Cache Hit Rate Calculation

Conclusion

Question 2: IPC (Instructions Per Cycle)

Question 3: Hardware Modification to Improve Cache Hit Rate

Question 4: Software-Level Optimization

知识点

难点思路

解题技巧和信息

重点词汇

参考资料

知识点

难点思路

解题技巧和信息

重点词汇

参考资料

Graph View

Table of Contents

Backlinks