最小生成树 | Minimum Spanning Tree (MST)

定义 | Definition

最小生成树是给定连通无向图的一个子图,它包括所有的顶点,并且总权重最小。最小生成树在图论和网络设计中有广泛应用。

A Minimum Spanning Tree (MST) is a subgraph of a given connected, undirected graph that includes all the vertices and has the minimum total edge weight. MSTs are widely used in graph theory and network design.

最小生成树的性质 | Properties of Minimum Spanning Tree

  1. 唯一性 | Uniqueness: 如果所有边的权重都不同,则最小生成树是唯一的。

  2. 循环性 | Cycle Property: 对于一个循环,其中的任一边如果其权重大于循环中任何一条边,则该边不属于最小生成树。

  3. 割边性质 | Cut Property: 对于任何割,权重最小的边属于最小生成树。

  4. 最小成本 | Minimum Cost: 最小生成树的权重和是所有可能的生成树中最小的。

  5. Uniqueness: If all edge weights are distinct, the MST is unique.

  6. Cycle Property: For any cycle, if the weight of an edge is larger than the weights of other edges in the cycle, then this edge does not belong to the MST.

  7. Cut Property: For any cut, the minimum weight edge crossing the cut belongs to the MST.

  8. Minimum Cost: The total weight of the MST is the smallest among all possible spanning trees.

常见算法 | Common Algorithms

Kruskal 算法 | Kruskal’s Algorithm

Kruskal 算法是一种贪心算法,通过按权重升序排序边,并逐一将边加入生成树中,确保不会形成环,直到所有顶点都包含在内。该算法使用了并查集数据结构来高效地检测环的形成。

Kruskal’s Algorithm is a greedy algorithm that sorts all the edges in ascending order of their weight and then adds edges one by one to the MST, ensuring no cycles are formed, until all vertices are included. This algorithm uses the Disjoint Set data structure to efficiently detect cycles.

数据结构选择 | Data Structure Selection

并查集(Disjoint Set): 并查集用于跟踪和合并不同的集合,可以有效地检测图中的环。通过路径压缩和按秩合并操作,保证了近乎常数时间的复杂度。

Disjoint Set: The disjoint set is used to keep track of and merge different sets, which allows efficient cycle detection in the graph. With path compression and union by rank, it ensures almost constant time complexity.

算法步骤 | Algorithm Steps

  1. 按权重升序排序所有边。

  2. 初始化最小生成树为空。

  3. 依次检查每条边,如果加入该边不会形成环,则将其加入最小生成树。

  4. 重复步骤 3,直到最小生成树包含所有顶点。

  5. Sort all the edges in ascending order of their weight.

  6. Initialize the MST as empty.

  7. Check each edge in order, and if adding the edge does not form a cycle, add it to the MST.

  8. Repeat step 3 until the MST includes all vertices.

class DisjointSet:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n
 
    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
 
    def union(self, x, y):
        rootX = self.find(x)
        rootY = self.find(y)
        if rootX != rootY:
            if self.rank[rootX] > self.rank[rootY]:
                self.parent[rootY] = rootX
            elif self.rank[rootX] < self.rank[rootY]:
                self.parent[rootX] = rootY
            else:
                self.parent[rootY] = rootX
                self.rank[rootX] += 1
 
def kruskal(graph):
    edges = sorted(graph['edges'], key=lambda edge: edge[2])
    disjoint_set = DisjointSet(graph['vertices'])
    mst = []
    for u, v, weight in edges:
        if disjoint_set.find(u) != disjoint_set.find(v):
            disjoint_set.union(u, v)
            mst.append((u, v, weight))
    return mst
 
graph = {
    'vertices': 4,
    'edges': [
        (0, 1, 10),
        (0, 2, 6),
        (0, 3, 5),
        (1, 3, 15),
        (2, 3, 4)
    ]
}
 
mst = kruskal(graph)
print(mst)

Prim 算法 | Prim’s Algorithm

Prim 算法也是一种贪心算法,通过从任一顶点开始,逐步扩展最小生成树,选择与当前树连接且权重最小的边,直到包含所有顶点。该算法使用了优先队列数据结构来高效地选择最小权重的边。

Prim’s Algorithm is another greedy algorithm that starts with any vertex and gradually expands the MST by selecting the minimum weight edge that connects the current tree to a new vertex, until all vertices are included. This algorithm uses a priority queue data structure to efficiently select the minimum weight edge.

数据结构选择 | Data Structure Selection

优先队列(Priority Queue): 优先队列用于存储和快速访问权重最小的边。通过使用最小堆,可以保证每次选择的边是当前可选择的最小权重边。

Priority Queue: A priority queue is used to store and quickly access the minimum weight edge. By using a min-heap, it ensures that each selected edge is the minimum weight edge available.

算法步骤 | Algorithm Steps

  1. 从任一顶点开始,初始化最小生成树。

  2. 重复以下步骤,直到最小生成树包含所有顶点:

    • 选择与当前树连接且权重最小的边。
    • 将该边连接的新顶点加入最小生成树。
  3. Start with any vertex and initialize the MST.

  4. Repeat the following steps until the MST includes all vertices:

    • Select the minimum weight edge that connects the current tree to a new vertex.
    • Add the newly connected vertex to the MST.
import heapq
 
def prim(graph, start):
    mst = []
    visited = set()
    min_heap = [(0, start, -1)]  # (weight, current_vertex, previous_vertex)
    while min_heap:
        weight, u, prev = heapq.heappop(min_heap)
        if u not in visited:
            visited.add(u)
            if prev != -1:
                mst.append((prev, u, weight))
            for v, weight in graph[u]:
                if v not in visited:
                    heapq.heappush(min_heap, (weight, v, u))
    return mst
 
graph = {
    0: [(1, 10), (2, 6), (3, 5)],
    1: [(0, 10), (3, 15)],
    2: [(0, 6), (3, 4)],
    3: [(0, 5), (1, 15), (2, 4)]
}
 
mst = prim(graph, 0)
print(mst)

进阶知识 | Advanced Knowledge

最小生成树的应用 | Applications of Minimum Spanning Tree

  • 网络设计 | Network Design: 设计计算机网络、电话网络、电力网络等,最小生成树可以帮助找到连接所有节点的最小成本路径。

  • 聚类分析 | Clustering: 在数据聚类中,最小生成树可以用于找到自然的分组。

  • 图像处理 | Image Processing: 在图像分割中,最小生成树可以帮助确定图像中像素的边界。

  • Network Design: Designing computer, telecommunication, or power networks, MST helps in finding the minimum cost path to connect all nodes.

  • Clustering: In data clustering, MST can be used to find natural groupings.

  • Image Processing: In image segmentation, MST helps in determining the boundaries of pixels in images.

多种权重的处理 | Handling Multiple Weights

在处理具有多种权重的图时,可以使用不同的策略来选择合适的边,例如考虑不同权重的优先级。

When dealing with graphs with multiple weights, different strategies can be employed to select the appropriate edges, such as considering the priority of different weights.

常见问题 | Common Issues

  • 图不连通 | Graph Not Connected: Kruskal 和 Prim 算法都要求输入图是连通的,若图不连通,则需处理每个连通分量。

  • 权重相同 | Equal Weights: 若图中有多条边具有相同的权重,应确保算法能正确处理,避免形成环或漏掉某些边。

  • Graph Not Connected: Both Kruskal’s and Prim’s algorithms require the input graph to be connected. If the graph is not

connected, handle each connected component separately.

  • Equal Weights: If the graph has multiple edges with equal weights, ensure the algorithm handles them correctly to avoid forming cycles or missing edges.