EM 算法 | EM Algorithm

定义 | Definition

EM (Expectation-Maximization) 算法是一种迭代优化算法，用于在存在隐变量或缺失数据的情况下进行最大似然估计。EM 算法通过交替执行期望步骤 (E-step) 和最大化步骤 (M-step) 来寻找参数估计的最大似然解。

The EM (Expectation-Maximization) algorithm is an iterative optimization algorithm used to find maximum likelihood estimates in the presence of latent variables or missing data. The EM algorithm alternates between the Expectation step (E-step) and the Maximization step (M-step) to find the maximum likelihood estimates of parameters.

应用 | Applications

高斯混合模型 | Gaussian Mixture Models
- 用于聚类和密度估计。
- Used for clustering and density estimation.
隐藏马尔可夫模型 | Hidden Markov Models
- 用于序列数据的建模和标注。
- Used for modeling and labeling sequence data.
图像处理 | Image Processing
- 用于图像分割和恢复。
- Used for image segmentation and restoration.

算法步骤 | Algorithm Steps

1. 初始化 | Initialization

选择参数的初始值。

Choose initial values for the parameters.

2. 期望步骤 (E-step) | Expectation Step (E-step)

计算在当前参数下隐变量的期望值。

Compute the expected value of the latent variables given the current parameters.

3. 最大化步骤 (M-step) | Maximization Step (M-step)

最大化期望值，重新估计参数。

Maximize the expected value to re-estimate the parameters.

4. 迭代 | Iteration

重复执行 E-step 和 M-step 直到参数收敛或达到预设的迭代次数。

Repeat the E-step and M-step until the parameters converge or a preset number of iterations is reached.

伪代码 | Pseudocode

以下是应用于高斯混合模型 (GMM) 的 EM 算法伪代码：

Below is the pseudocode for the EM algorithm applied to Gaussian Mixture Models (GMM):

import numpy as np
from scipy.stats import multivariate_normal
 
def initialize_parameters(X, k):
    n, d = X.shape
    pi = np.ones(k) / k  # 初始化混合系数
    mu = X[np.random.choice(n, k, replace=False)]  # 随机选择初始均值
    sigma = np.array([np.eye(d)] * k)  # 初始化协方差矩阵
    return pi, mu, sigma
 
def e_step(X, pi, mu, sigma):
    n, d = X.shape
    k = len(pi)
    gamma = np.zeros((n, k))
 
    for i in range(k):
        gamma[:, i] = pi[i] * multivariate_normal.pdf(X, mu[i], sigma[i])
 
    gamma /= gamma.sum(axis=1, keepdims=True)
    return gamma
 
def m_step(X, gamma):
    n, d = X.shape
    k = gamma.shape[1]
 
    n_k = gamma.sum(axis=0)
    pi = n_k / n
    mu = np.dot(gamma.T, X) / n_k[:, np.newaxis]
    sigma = np.zeros((k, d, d))
 
    for i in range(k):
        X_centered = X - mu[i]
        sigma[i] = np.dot(gamma[:, i] * X_centered.T, X_centered) / n_k[i]
 
    return pi, mu, sigma
 
def compute_log_likelihood(X, pi, mu, sigma):
    n, d = X.shape
    k = len(pi)
    log_likelihood = 0
 
    for i in range(k):
        log_likelihood += pi[i] * multivariate_normal.pdf(X, mu[i], sigma[i])
 
    return np.log(log_likelihood).sum()
 
def em_algorithm(X, k, max_iters=100, tol=1e-4):
    pi, mu, sigma = initialize_parameters(X, k)
    log_likelihood = -np.inf
 
    for _ in range(max_iters):
        gamma = e_step(X, pi, mu, sigma)
        pi, mu, sigma = m_step(X, gamma)
        new_log_likelihood = compute_log_likelihood(X, pi, mu, sigma)
 
        if abs(new_log_likelihood - log_likelihood) < tol:
            break
 
        log_likelihood = new_log_likelihood
 
    return pi, mu, sigma, gamma
 
# Example usage
X = np.array([
    [1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0],
    [1.0, 0.6], [9.0, 11.0], [8.0, 2.0], [10.0, 2.0],
    [9.0, 3.0], [4.0, 2.0], [4.0, 4.0], [6.0, 3.0]
])
k = 3
pi, mu, sigma, gamma = em_algorithm(X, k)
print("Pi:", pi)
print("Mu:", mu)
print("Sigma:", sigma)
print("Gamma:", gamma)

示例 | Example

假设我们有以下数据点：

Suppose we have the following data points:

[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0],
[1.0, 0.6], [9.0, 11.0], [8.0, 2.0], [10.0, 2.0],
[9.0, 3.0], [4.0, 2.0], [4.0, 4.0], [6.0, 3.0]

我们希望将这些数据点聚类为 3 个高斯分布。

We want to cluster these data points into 3 Gaussian distributions.

通过运行上述 EM 算法，我们得到的参数如下：

By running the above EM algorithm, we obtain the following parameters:

Pi: [0.33333333 0.33333333 0.33333333]
Mu: [[9.         6.        ]
     [5.16666667 6.16666667]
     [1.5        1.46666667]]
Sigma: [[[ 1.5        0.        ]
          [ 0.         5.5       ]]
         
         [[ 6.8        0.        ]
          [ 0.         1.73333333]]
         
         [[ 0.63333333 0.4       ]
          [ 0.4        0.45666667]]]
Gamma: [[0.   0.   1.  ]
        [0.   0.   1.  ]
        [0.   1.   0.  ]
        [1.   0.   0.  ]
        [0.   0.   1.  ]
        [1.   0.   0.  ]
        [1.   0.   0.  ]
        [1.   0.   0.  ]
        [1.   0.   0.  ]
        [0.   1.   0.  ]
        [0.   1.   0.  ]
        [0.   1.   0.  ]]

时间复杂度 | Time Complexity

EM 算法的时间复杂度通常为 $O (n \cdot k \cdot d^{2} \cdot t)$ ，其中：

The time complexity of the EM algorithm is typically $O (n \cdot k \cdot d^{2} \cdot t)$ , where:

$n$ 是数据点的数量。
$n$ is the number of data points.
$k$ 是高斯分布的数量。
$k$ is the number of Gaussian distributions.
$d$ 是数据点的维度。
$d$ is the dimension of the data points.
$t$ 是算法的迭代次数。
$t$ is the number of iterations.

优点和缺点 | Advantages and Disadvantages

优点 | Advantages

适用于复杂模型 | Suitable for Complex Models
- 可以处理具有隐变量的复杂概率模型。
- Can handle complex probabilistic models with latent variables.
灵活性高 | High Flexibility
- 适用于不同的概率分布和模型。
- Applicable to different probability distributions and models.
理论基础扎实 | Solid Theoretical Foundation
- 基于最大似然估计，具有良好的统计性质。
- Based on maximum likelihood estimation, with good statistical properties.

缺点 | Disadvantages

对初始值敏感 | Sensitive to Initial Values
- 不同的初始值可能导致不同的收敛结果。
- Different initial values may lead to different convergence results.
可能收敛到局部最优 | May Converge to Local Optima
- 由于优化过程是迭代的，可能陷入局部最优解。
- The iterative optimization process may get stuck in local optima.
计算复杂度高 | High Computational Complexity
- 对于大规模数据集，计算复杂度较高。
- High computational complexity for large datasets.

总结 | Summary

EM 算法是一种强大的优化工具，广泛应用于统计学、机器学习和信号处理等领域。通过理解其工作原理、优缺点及实现细节，可以有效地解决涉及隐变量和缺失数据的问题。

The EM algorithm is a powerful optimization tool widely used in statistics, machine learning, and signal processing. Understanding its working principle, advantages, disadvantages, and implementation details helps in effectively solving problems involving latent variables and missing data.

Zephyr's Notes on ISCS & CBMS, UTokyo

Explorer