Cache | 缓存

Overview | 概述

Cache is a high-speed storage layer that stores a subset of data, typically transient in nature, so that future requests for that data can be served faster than is possible by accessing the data’s primary storage location. Caches are used to reduce access time, lower latency, and reduce the need for repeated computation.

缓存是一种高速存储层,用于存储数据的子集,通常是临时的,以便未来对这些数据的请求可以比从数据的主存储位置访问更快。缓存用于减少访问时间、降低延迟和减少重复计算的需求。

Cache Structure | 缓存结构

1. Cache Line | 缓存行

A cache line is the basic unit of data transfer between cache and main memory. Each cache line typically contains:

缓存行是缓存和主存之间数据传输的基本单位。每个缓存行通常包含:

  • Valid Bit | 有效位: Indicates whether the data in the cache line is valid. 表示该缓存行中的数据是否有效。

  • Tag | 标记: Used to identify the memory block stored in the cache line. 用于识别存储在缓存行中的内存块。

  • Data | 数据: The actual content of the cached memory block. 实际存储的内存块内容。

  • Dirty Bit | 脏位 (only for write-back policy): Indicates whether the cache line has been modified. (仅用于写回策略)表示缓存行的数据是否被修改过。
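
For illustration, these fields can be modelled as a small struct. This is a minimal software sketch, not any particular hardware layout; the 64-byte block size and 32-bit tag field are assumptions chosen only for the example.

```c
#include <stdint.h>
#include <stdbool.h>

#define BLOCK_SIZE 64  /* bytes of data per cache line (assumed for the example) */

/* A minimal software model of one cache line. */
typedef struct {
    bool     valid;             /* valid bit: does the line hold real data?       */
    bool     dirty;             /* dirty bit: modified since loaded? (write-back) */
    uint32_t tag;               /* identifies which memory block is stored here   */
    uint8_t  data[BLOCK_SIZE];  /* the cached copy of the memory block            */
} cache_line_t;
```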

2. Tag | 标记

The tag is a crucial part of the cache structure:

标记是缓存结构中的关键部分:

  • Unique Identification | 唯一标识: Tags uniquely identify the memory block stored in a cache line. 标记唯一标识存储在缓存行中的内存块。

  • Address Matching | 地址匹配: When the CPU requests data, the cache controller compares the tag portion of the request address with the tag in the cache line to determine if there’s a cache hit. 当 CPU 请求数据时,缓存控制器比较请求地址的标记部分与缓存行中的标记,以确定是否发生缓存命中。

  • Space Saving | 空间节省: Tags allow the cache to store data from different memory locations without storing the full memory address. 标记允许缓存存储来自不同内存位置的数据,而无需存储完整的内存地址。

Cache Types | 缓存类型

1. Direct-Mapped Cache | 直接映射缓存

In a direct-mapped cache, each block of main memory maps to exactly one block in the cache. This is the simplest form of cache organization.

在直接映射缓存中,主存的每个块映射到缓存中的一个特定块。这是最简单的缓存组织形式。

Calculation | 计算:

  • Cache Line Index | 缓存行索引: Index = (Address / Block Size) mod (Number of Cache Lines)

  • Tag | 标记: Tag = Address / (Block Size × Number of Cache Lines)

Advantages | 优点:

  • Simple and fast lookup 简单快速的查找
  • Low hardware complexity 硬件复杂度低

Disadvantages | 缺点:

  • Higher conflict miss rate 较高的冲突缺失率
  • Poor utilization of cache space 缓存空间利用率低

Example | 示例:

For a 64KB direct-mapped cache with 64-byte blocks:

对于一个 64KB 的直接映射缓存,块大小为 64 字节:

  • Number of cache lines 缓存行数: 64 KB / 64 B = 1024
  • Index bits 索引位数: log2(1024) = 10 bits
  • Offset bits 偏移位数: log2(64) = 6 bits
  • Tag bits 标记位数: 32 − 10 − 6 = 16 bits (assuming 32-bit addresses, 假设 32 位地址)
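
The sketch below shows how a 32-bit address would be split under this configuration (6 offset bits, 10 index bits, 16 tag bits); the function and the sample address are purely illustrative.

```c
#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 6    /* 64-byte blocks -> log2(64)   = 6  */
#define INDEX_BITS  10   /* 1024 lines     -> log2(1024) = 10 */

/* Split a 32-bit address into tag / index / offset for a
 * 64 KB direct-mapped cache with 64-byte blocks. */
static void split_address(uint32_t addr,
                          uint32_t *tag, uint32_t *index, uint32_t *offset) {
    *offset = addr & ((1u << OFFSET_BITS) - 1);
    *index  = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    *tag    = addr >> (OFFSET_BITS + INDEX_BITS);
}

int main(void) {
    uint32_t tag, index, offset;
    split_address(0x12345678, &tag, &index, &offset);
    /* 0x12345678 -> tag = 0x1234, index = 0x159, offset = 0x38 */
    printf("tag=0x%x index=0x%x offset=0x%x\n",
           (unsigned)tag, (unsigned)index, (unsigned)offset);
    return 0;
}
```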

2. Fully Associative Cache | 全相联缓存

In a fully associative cache, any block of main memory can be placed in any block of the cache. This provides the most flexible cache organization.

在全相联缓存中,主存的任何块都可以放置在缓存的任何块中。这提供了最灵活的缓存组织。

Calculation | 计算:

  • Tag | 标记: Tag = Address / Block Size (the entire block address serves as the tag | 整个块地址即为标记)

Advantages | 优点:

  • Lowest miss rate (no conflict misses) 最低的缺失率(无冲突缺失)
  • Best utilization of cache space 最佳的缓存空间利用率

Disadvantages | 缺点:

  • Complex and expensive hardware 复杂且昂贵的硬件
  • Slower lookup time 较慢的查找时间

Example | 示例:

For a 64KB fully associative cache with 64-byte blocks:

对于一个 64KB 的全相联缓存,块大小为 64 字节:

  • Number of cache lines 缓存行数: 64 KB / 64 B = 1024
  • Tag bits 标记位数: 32 − 6 = 26 bits (assuming 32-bit addresses)
  • Offset bits 偏移位数: log2(64) = 6 bits
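
Because there is no index field, a lookup conceptually compares the requested tag against every valid line. The linear search in this sketch stands in for the parallel tag comparators real hardware would use; the names are illustrative only.

```c
#include <stdint.h>
#include <stdbool.h>

#define NUM_LINES   1024  /* 64 KB / 64 B   */
#define OFFSET_BITS 6     /* 64-byte blocks */

typedef struct {
    bool     valid;
    uint32_t tag;
    uint8_t  data[64];
} cache_line_t;

static cache_line_t cache[NUM_LINES];

/* Returns the matching line on a hit, or NULL on a miss. */
cache_line_t *lookup(uint32_t addr) {
    uint32_t tag = addr >> OFFSET_BITS;  /* tag = the whole block address */
    for (int i = 0; i < NUM_LINES; i++) {
        if (cache[i].valid && cache[i].tag == tag)
            return &cache[i];
    }
    return NULL;
}
```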

3. Multiway Set-Associative Cache | 多路组相联缓存

Set-associative cache is a compromise between direct-mapped and fully associative caches. The cache is divided into sets, and each set contains multiple cache lines.

组相联缓存是直接映射和全相联缓存之间的折衷。缓存分为多个组,每个组包含多个缓存行。

Inside each set, the cache is associative, meaning that a memory block can be placed in any cache line within the set. The number of cache lines per set is called the associativity, which can be 2-way, 4-way, etc.

在每个组内,缓存是相联的,这意味着内存块可以放置在组内的任何缓存行中。每组的缓存行数称为相联度,可以是 2 路、4 路等。

Calculation | 计算:

  • Set Index | 组索引: Set Index = (Address / Block Size) mod (Number of Sets)

  • Tag | 标记: Tag = Address / (Block Size × Number of Sets)

Advantages | 优点:

  • Better balance between lookup speed and miss rate 在查找速度和缺失率之间取得更好的平衡
  • More flexible than direct-mapped cache, since it allows memory blocks to have multiple possible locations inside the set 比直接映射缓存更灵活,因为它允许内存块在组内有多个可能的位置
  • Lookup only needs to search the cache lines within the set selected by the set index, rather than every line as in a fully associative cache 查找时只需在组索引选中的组内搜索缓存行,而不必像全相联缓存那样搜索所有缓存行
  • More practical than fully associative cache for larger cache sizes 对于较大的缓存大小,比全相联缓存更实用

Disadvantages | 缺点:

  • More complex than direct-mapped cache 比直接映射缓存更复杂
  • Slightly slower lookup than direct-mapped cache 查找速度略慢于直接映射缓存

Example | 示例:

For a 64KB 4-way set-associative cache with 64-byte blocks:

对于一个 64KB 的 4 路组相联缓存,块大小为 64 字节:

  • Number of sets 组数: 1024 lines / 4 ways = 256
  • Index bits 索引位数: log2(256) = 8 bits
  • Offset bits 偏移位数: log2(64) = 6 bits
  • Tag bits 标记位数: 32 − 8 − 6 = 18 bits (assuming 32-bit addresses)
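
A sketch of the corresponding lookup: the set index selects one of the 256 sets, and only the 4 ways in that set need their tags compared. Names and layout are assumptions for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

#define WAYS        4
#define NUM_SETS    256  /* 1024 lines / 4 ways */
#define OFFSET_BITS 6    /* 64-byte blocks      */
#define INDEX_BITS  8    /* log2(256)           */

typedef struct {
    bool     valid;
    uint32_t tag;
    uint8_t  data[64];
} cache_line_t;

static cache_line_t cache[NUM_SETS][WAYS];

/* Hit: return the matching way within the set; miss: return NULL. */
cache_line_t *lookup(uint32_t addr) {
    uint32_t set = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag = addr >> (OFFSET_BITS + INDEX_BITS);
    for (int way = 0; way < WAYS; way++) {
        if (cache[set][way].valid && cache[set][way].tag == tag)
            return &cache[set][way];
    }
    return NULL;
}
```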

In practice, most modern processors use set-associative caches, as they offer a good compromise between performance, complexity, and cost. The choice of associativity (e.g., 2-way, 4-way, 8-way) depends on the specific requirements and constraints of the system design.

在实践中,大多数现代处理器使用组相联缓存,因为它们在性能、复杂性和成本之间提供了良好的平衡。相联度的选择(例如 2 路、4 路、8 路)取决于系统设计的具体要求和限制。

Cache and RAM Hierarchy | 缓存和 RAM 层次结构

The interaction between cache and RAM forms a hierarchical storage structure:

缓存和 RAM 之间的交互形成了一个层次化的存储结构:

  1. Data Request | 数据请求: When the CPU needs to read data, it first sends a request to the cache. 当 CPU 需要读取数据时,它首先向缓存发出请求。

  2. Cache Check | 缓存检查:

    • The cache controller checks if the requested address is in the cache.
    • It uses part of the address as an index to locate the possible cache line.
    • Then it compares the tag portion of the address with the tag in the cache line.

    缓存控制器检查请求的地址是否在缓存中。

    它使用地址的一部分作为索引来定位可能的缓存行。

    然后比较地址的标记部分与缓存行中的标记。

  3. Cache Hit | 缓存命中:

    • If the tag matches and the valid bit is 1, a cache hit occurs.
    • Data is returned directly from the cache to the CPU without accessing RAM.

    如果标记匹配且有效位为 1,就发生了缓存命中。

    数据直接从缓存返回给 CPU,无需访问 RAM。

  4. Cache Miss | 缓存缺失:

    • If no matching valid data is found, a cache miss occurs.
    • The cache controller sends a request to RAM to fetch the required data block.
    • Data is loaded from RAM into the cache and simultaneously returned to the CPU.

    如果未找到匹配的有效数据,就发生了缓存缺失。

    缓存控制器向 RAM 发出请求,获取所需的数据块。

    数据从 RAM 加载到缓存中,同时返回给 CPU。

  5. Data Replacement | 数据替换:

    • If the cache is full, a cache line needs to be selected for replacement based on the replacement policy (e.g., LRU, FIFO).
    • If the data being replaced has been modified (dirty bit is 1), it needs to be written back to RAM first.

    如果缓存已满,需要根据替换策略(如 LRU、FIFO 等)选择一个缓存行进行替换。

    如果被替换的数据被修改过(脏位为 1),则需要先将其写回到 RAM。

  6. Write Operations | 写操作:

    • Write-through: Data is written to both cache and RAM simultaneously.
    • Write-back: Data is only written to cache and the dirty bit is set. It’s written to RAM only when the data is evicted from the cache.

    写直达:数据同时写入缓存和 RAM。

    写回:数据只写入缓存,并设置脏位。只有在数据被替换出缓存时才写入 RAM。

This hierarchical storage structure allows the system to leverage data locality principles, significantly improving data access speed.

这种层次化的存储结构允许系统利用数据的局部性原理,大大提高了数据访问的速度。
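
The read path described above can be condensed into a single function. The sketch below assumes a direct-mapped, write-back cache and uses a toy in-memory array in place of real RAM; all names and sizes are illustrative assumptions, not a real memory API.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>

#define LINES       1024
#define BLOCK       64
#define OFFSET_BITS 6
#define INDEX_BITS  10
#define RAM_BYTES   (1u << 20)   /* toy 1 MiB main memory for the sketch */

typedef struct { bool valid, dirty; uint32_t tag; uint8_t data[BLOCK]; } line_t;

static line_t  cache[LINES];
static uint8_t ram[RAM_BYTES];

/* Toy main-memory helpers (stand-ins for a real memory controller). */
static void ram_read_block(uint32_t block_addr, uint8_t *buf) {
    memcpy(buf, &ram[(block_addr * BLOCK) % RAM_BYTES], BLOCK);
}
static void ram_write_block(uint32_t block_addr, const uint8_t *buf) {
    memcpy(&ram[(block_addr * BLOCK) % RAM_BYTES], buf, BLOCK);
}

/* Steps 1-5 of the read flow, for one byte. */
uint8_t cache_read_byte(uint32_t addr) {
    uint32_t offset = addr & (BLOCK - 1);                 /* 2. split the address  */
    uint32_t index  = (addr >> OFFSET_BITS) & (LINES - 1);
    uint32_t tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    line_t  *l      = &cache[index];

    if (l->valid && l->tag == tag)                        /* 3. cache hit          */
        return l->data[offset];

    if (l->valid && l->dirty)                             /* 5. dirty victim must  */
        ram_write_block(((uint32_t)l->tag << INDEX_BITS) | index, l->data); /* be written back */

    ram_read_block(addr >> OFFSET_BITS, l->data);         /* 4. miss: fetch block  */
    l->valid = true;
    l->dirty = false;
    l->tag   = tag;
    return l->data[offset];                               /* return data to the CPU */
}
```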

Replacement Policies | 替换策略

When the cache is full, a replacement policy decides which data to evict to make room for new data. Common policies include:

当缓存已满时,替换策略决定要驱逐哪些数据以腾出空间存放新数据。常见的策略包括:

  1. Least Recently Used (LRU) | 最近最少使用: Evicts the least recently accessed data. 驱逐最近最少访问的数据。

  2. First In, First Out (FIFO) | 先进先出: Evicts the oldest data in the cache. 驱逐缓存中最旧的数据。

  3. Random Replacement | 随机替换: Randomly selects a cache line to evict. 随机选择一个缓存行进行驱逐。
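
As a concrete illustration of LRU, a set could record a logical timestamp per line and evict the line with the oldest one. This is a simple software approximation; real hardware often uses cheaper pseudo-LRU schemes.

```c
#include <stdint.h>
#include <stdbool.h>

#define WAYS 4

typedef struct {
    bool     valid;
    uint32_t tag;
    uint64_t last_used;   /* logical timestamp of the most recent access */
} way_t;

/* Pick the way to evict from one set under LRU:
 * prefer an invalid way, otherwise the least recently used one. */
int choose_victim(const way_t set[WAYS]) {
    int victim = 0;
    for (int w = 0; w < WAYS; w++) {
        if (!set[w].valid)
            return w;                              /* free slot, no eviction needed   */
        if (set[w].last_used < set[victim].last_used)
            victim = w;                            /* older access time -> better victim */
    }
    return victim;
}
```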

Cache Coherence | 缓存一致性

In multi-core processors or multi-processor systems, cache coherence becomes a critical issue. Common cache coherence protocols include:

在多核处理器或多处理器系统中,缓存一致性成为一个关键问题。常见的缓存一致性协议包括:

  • MESI Protocol | MESI 协议: Modified, Exclusive, Shared, Invalid states. 修改、独占、共享、无效四种状态。

  • MOESI Protocol | MOESI 协议: Adds an Owned state to MESI. 在 MESI 基础上增加了拥有状态。

  • MESIF Protocol | MESIF 协议: Adds a Forward state to MESI. 在 MESI 基础上增加了转发状态。

These protocols ensure data consistency among multiple caches, preventing errors caused by data inconsistency.

这些协议确保了多个缓存之间数据的一致性,防止了数据不一致导致的错误。
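
A heavily simplified sketch of the MESI states, showing only what happens to a line's state when the local core writes it. Real implementations involve bus or interconnect messages and many more transitions than shown here.

```c
typedef enum { INVALID, SHARED, EXCLUSIVE, MODIFIED } mesi_state_t;

/* Simplified: the state a line moves to when the local core writes it. */
mesi_state_t on_local_write(mesi_state_t s) {
    switch (s) {
    case MODIFIED:  return MODIFIED;   /* already exclusive and dirty                  */
    case EXCLUSIVE: return MODIFIED;   /* silent upgrade, no bus traffic needed        */
    case SHARED:    return MODIFIED;   /* after invalidating the other caches' copies  */
    case INVALID:   return MODIFIED;   /* after a read-for-ownership fetch of the block */
    }
    return INVALID;                    /* unreachable */
}
```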

Write Policies | 写策略

When the CPU needs to write data, it can handle the write operation in different ways. The two main write policies are Write-through and Write-back. Each has its own advantages and trade-offs.

当 CPU 需要写入数据时,它可以用不同的方式处理写操作。两种主要的写策略是直写和写回。每种策略都有其自身的优势和权衡。

Write-through | 直写

In a write-through policy, every write operation is immediately propagated to both the cache and the main memory.

在直写策略中,每次写操作都会立即传播到缓存和主存。

Characteristics | 特点:

  1. Simplicity | 简单性: The cache and main memory are always consistent. 缓存和主存始终保持一致。

  2. Read misses | 读缺失: A read miss never results in writes to main memory. 读缺失从不导致对主存的写操作。

  3. Write performance | 写性能: Every write operation incurs the cost of accessing main memory, which can be slow. 每次写操作都会产生访问主存的开销,这可能会很慢。

  4. CPU stalls | CPU 停顿: The CPU may have to wait for writes to complete, causing stalls. CPU 可能需要等待写操作完成,导致停顿。

Use cases | 使用场景:

  • Systems where data integrity is critical. 数据完整性至关重要的系统。
  • Systems with infrequent write operations. 写操作不频繁的系统。
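
A minimal write-through sketch under a no-write-allocate assumption: the write always goes to memory, and the cached copy is updated only if the line is already present. The find_line and ram_write_byte helpers are hypothetical stand-ins.

```c
#include <stdint.h>
#include <stdbool.h>

#define BLOCK 64

typedef struct { bool valid; uint32_t tag; uint8_t data[BLOCK]; } line_t;

/* Hypothetical stand-ins for the rest of the cache and for main memory. */
line_t *find_line(uint32_t addr);               /* returns NULL on a cache miss */
void    ram_write_byte(uint32_t addr, uint8_t value);

/* Write-through, no-write-allocate: RAM is updated on every write,
 * so cache and memory never disagree; a miss does not fill the cache. */
void cache_write(uint32_t addr, uint8_t value) {
    line_t *l = find_line(addr);
    if (l != NULL)
        l->data[addr % BLOCK] = value;   /* keep the cached copy in sync      */
    ram_write_byte(addr, value);         /* always pay the memory-write cost  */
}
```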

Write-back | 写回

In a write-back policy, write operations are performed only on the cache. The modified cache block is written to main memory only when it’s evicted from the cache.

在写回策略中,写操作只在缓存上执行。只有当修改过的缓存块被驱逐出缓存时,才会写入主存。

Characteristics | 特点:

  1. Performance | 性能: Generally offers better performance, especially for write-intensive workloads. 通常提供更好的性能,特别是对于写密集型工作负载。

  2. Complexity | 复杂性: Requires additional hardware (dirty bit) and more complex control logic. 需要额外的硬件(脏位)和更复杂的控制逻辑。

  3. Inconsistency risk | 不一致风险: The cache and main memory may be inconsistent for periods of time. 缓存和主存在一段时间内可能不一致。

  4. Burst mode | 突发模式: When a dirty block is evicted, it results in a burst of memory traffic. 当脏块被驱逐时,会导致突发的内存流量。

Use cases | 使用场景:

  • Systems with frequent write operations. 写操作频繁的系统。
  • Systems where write performance is critical. 写性能至关重要的系统。
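
A minimal write-back sketch under a write-allocate assumption: the write touches only the cache and sets the dirty bit, and memory is updated when the line is eventually evicted. The helper functions are hypothetical stand-ins.

```c
#include <stdint.h>
#include <stdbool.h>

#define BLOCK 64

typedef struct { bool valid, dirty; uint32_t tag; uint8_t data[BLOCK]; } line_t;

/* Hypothetical stand-ins for line lookup/allocation and main memory. */
line_t *find_or_allocate_line(uint32_t addr);   /* may evict a victim line */
void    ram_write_block(uint32_t block_addr, const uint8_t *data);

/* Write-back, write-allocate: the write goes only to the cache and sets
 * the dirty bit; RAM is updated later, when the line is evicted. */
void cache_write(uint32_t addr, uint8_t value) {
    line_t *l = find_or_allocate_line(addr);    /* bring the block in on a miss    */
    l->data[addr % BLOCK] = value;
    l->dirty = true;                            /* remember: RAM copy is now stale */
}

/* Called when a line is chosen for eviction. */
void evict(line_t *l, uint32_t block_addr) {
    if (l->valid && l->dirty)
        ram_write_block(block_addr, l->data);   /* flush modified data to RAM */
    l->valid = false;
    l->dirty = false;
}
```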

Comparison | 比较

| Aspect 方面 | Write-through 直写 | Write-back 写回 |
|---|---|---|
| Write Speed 写入速度 | Slower 较慢 | Faster 较快 |
| Memory Traffic 内存流量 | Higher 较高 | Lower 较低 |
| Data Consistency 数据一致性 | Always consistent 始终一致 | May be inconsistent 可能不一致 |
| Complexity 复杂度 | Simpler 较简单 | More complex 较复杂 |
| Power Consumption 功耗 | Higher 较高 | Lower 较低 |

Hybrid Approaches | 混合方法

Some systems use a combination of write-through and write-back policies to balance performance and data integrity:

一些系统使用直写和写回策略的组合来平衡性能和数据完整性:

  1. Selective Write-through | 选择性直写: Critical data uses write-through, while non-critical data uses write-back. 关键数据使用直写,非关键数据使用写回。

  2. Write-back with periodic flush | 定期刷新的写回: Uses write-back but periodically writes all dirty blocks to main memory. 使用写回,但定期将所有脏块写入主存。

  3. Adaptive policies | 自适应策略: Dynamically switches between write-through and write-back based on system load and write patterns. 基于系统负载和写入模式动态切换直写和写回。

Understanding these write policies is crucial for optimizing cache performance and ensuring data integrity in computer systems.

理解这些写策略对于优化缓存性能和确保计算机系统中的数据完整性至关重要。