IS CS-2024S-02
题目来源:Problem 2 日期:2024-08-16 题目主题:CS-Computer Architecture-Cache Memory
解题思路
这道题目考察处理器和缓存的工作原理,特别是直接映射缓存的命中率计算,以及指令执行的 IPC 计算。问题还要求提出硬件和软件层面的优化方案。我们将通过对程序执行过程中缓存的详细分析,计算缓存命中率和 IPC,并给出合理的优化建议。
Solution
Question 1: Cache Hit Rate
Cache Parameters:
- Cache size: 256 bytes
- Block size: 16 bytes
- Number of cache lines: lines
Memory Address Breakdown:
- Address size: 32 bits
- Offset bits (block size of 16 bytes): 4 bits (i.e., )
- Index bits (16 lines): 4 bits (i.e., )
- Tag bits: Remaining bits (i.e., )
Cache Access Analysis
Arrays A and B:
- Array starts at address 0x1000.
- Array starts at address 0x2000.
- Both and map to the same index in the cache due to the identical lower address bits.
Detailed Analysis of the First Four Loops
-
Loop 1 (Iteration 0):
- Access (address 0x1000), Cache Line 0: Miss
- Access (address 0x1004), Cache Line 0: Hit
- Access (address 0x1008), Cache Line 0: Hit
- Write (address 0x2000), Cache Line 0: Miss
-
Loop 2 (Iteration 1):
- Access (address 0x1004), Cache Line 0: Miss (replaced by B)
- Access (address 0x1008), Cache Line 0: Hit
- Access (address 0x100C), Cache Line 0: Hit
- Write (address 0x2004), Cache Line 0: Miss
-
Loop 3 (Iteration 2):
- Access (address 0x1008), Cache Line 0: Miss (replaced by B)
- Access (address 0x100C), Cache Line 0: Hit
- Access (address 0x1010), Cache Line 1: Miss (crosses to a new cache line)
- Write (address 0x2008), Cache Line 0: Miss
-
Loop 4 (Iteration 3):
- Access (address 0x100C), Cache Line 0: Miss (replaced by B[2])
- Access (address 0x1010), Cache Line 1: Hit
- Access (address 0x1014), Cache Line 1: Hit
- Write (address 0x200C), Cache Line 0: Miss
Summary of the First Four Loops
- Total Accesses: 16 accesses (4 per loop)
- Total Cache Misses: 9 misses
- Total Cache Hits: 7 hits
- Hit Rate for the First Four Loops:
Analysis of Subsequent Loops
Assuming the same pattern continues, the subsequent 96 loops (24 groups of 4 loops) will follow the same pattern:
- Loop 1 in each group: 3 hits, 1 miss
- Loop 2 in each group: 2 hits, 2 misses
- Loop 3 in each group: 1 hit, 3 misses
- Loop 4 in each group: 2 hits, 2 misses
Total Cache Misses and Hits for All Loops
Subsequent 96 Loops:
- Loop 1: misses
- Loop 2: misses
- Loop 3: misses
- Loop 4: misses
Total Cache Misses for 96 Loops: misses
Total Cache Hits for 96 Loops: hits
Final Cache Hit Rate Calculation
- Total Accesses: accesses
- Total Cache Misses: misses
- Total Cache Hits: hits
- Final Hit Rate:
Conclusion
- The final cache hit rate for the entire program is approximately 49.75%.
Question 2: IPC (Instructions Per Cycle)
Given the cache hit rate of approximately 49.75%:
-
Instruction Breakdown:
addi
: 1 cyclelw/sw
: 1 cycle (hit) or 4 cycles (miss)div
: 4 cyclesblt
: 1 cycle
-
Cycle Calculation per Iteration:
- Total Cycles for
lw
instructions: cycles - Total Cycles for
sw
instruction: cycles - Other instructions: 7 cycles (for addi, add, div, blt)
- Total Cycles for
-
Total Cycles per Iteration:
- Total Cycles for 400 Iterations:
-
Total Instructions:
- 11 instructions per iteration × 400 iterations = 4400 instructions
-
IPC:
Question 3: Hardware Modification to Improve Cache Hit Rate
Modification: Use a 2-way set associative cache instead of a direct-mapped cache.
Reason: A 2-way set associative cache allows each index to map to two cache lines, which would enable both arrays and to coexist in the cache without conflict. This reduces the conflict misses, significantly improving the hit rate.
Question 4: Software-Level Optimization
Optimization: Array padding.
Example: Introduce padding in array to offset its starting address such that it does not map to the same cache lines as array .
Explanation: By adding a small padding (e.g., 16 bytes) to the start of array , the memory addresses of ‘s elements will map to different cache lines than the corresponding elements in . This reduces the conflict between and , leading to fewer cache misses and a higher hit rate.
知识点
CacheMemoryDirectMappedCacheSetAssociativeCacheInstructionLevelParallelismArrayPadding
难点思路
本题的难点在于正确分析缓存访问的冲突问题,以及提出有效的硬件和软件优化方案来减少这些冲突。直接映射缓存的冲突失效是此题的核心问题。
解题技巧和信息
- Direct-Mapped Cache: 在处理大量连续数据时容易产生冲突失效,集合关联缓存可以有效缓解这一问题。
- IPC Calculation: 注意由于缓存失效引起的指令执行延迟对 IPC 的影响。
- Array Padding: 是解决数组访问冲突问题的有效软件优化方法。
重点词汇
- Cache Line 缓存行
- Direct-Mapped Cache 直接映射缓存
- Hit Rate 命中率
- Instruction Per Cycle (IPC) 每周期指令数
- Array Padding 数组填充
参考资料
- Hennessy, J. L., & Patterson, D. A. (2017). Computer Architecture: A Quantitative Approach. Chapter 2.
- Patterson, D. A., & Hennessy, J. L. (2013). Computer Organization and Design: The Hardware/Software Interface. Chapter 5.
知识点
CacheMemoryDirectMappedCacheSetAssociativeCacheInstructionLevelParallelismLoopUnrolling
难点思路
本题的难点在于对缓存操作的细致分析,特别是如何判断缓存命中与缺失,以及考虑硬件和软件的优化策略。
解题技巧和信息
- Direct-Mapped Cache: 一般容易产生冲突失效,可以考虑增加集合关联度来减小失效率。
- IPC Calculation: 注意每种指令在不同缓存命中状态下的执行周期。
- Loop Unrolling: 是一种常见的优化手段,通过减少循环次数和增加每次循环的工作量来提高性能。
重点词汇
- Cache Line 缓存行
- Direct-Mapped Cache 直接映射缓存
- Hit Rate 命中率
- Instruction Per Cycle (IPC) 每周期指令数
- Loop Unrolling 循环展开
参考资料
- Hennessy, J. L., & Patterson, D. A. (2017). Computer Architecture: A Quantitative Approach. Chapter 2.
- Patterson, D. A., & Hennessy, J. L. (2013). Computer Organization and Design: The Hardware/Software Interface. Chapter 5.