Projection Matrix

Definition

A projection matrix is a square matrix that maps a vector to its orthogonal projection onto a subspace.

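For a given matrix $A$ with linearly independent columns, the projection matrix $P$ onto the column space of $A$ can be defined as:

$$P = A (A^T A)^{-1} A^T$$

where $A$ is an $m \times n$ matrix, so $P$ is $m \times m$.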

Derivation of the Projection Matrix

To derive the projection matrix, we consider the projection of a vector $b$ onto the subspace $C(A)$, the column space of $A$. We need to find the vector $p$ that is the orthogonal projection of $b$ onto $C(A)$.

1. Define Orthogonal Projection

Let $p$ be the projection of $b$ onto $C(A)$. Since $p$ lies in the column space of $A$, it can be expressed as:

$$p = A\hat{x}$$

where $\hat{x}$ is a coefficient vector.

2. Minimize the Projection Error

We want $\hat{x}$ to minimize the length of the error $e = b - p$. This is equivalent to minimizing

$$\|b - A\hat{x}\|^2$$

To minimize this error, we write the sum of squared errors as a function of $\hat{x}$:

$$E(\hat{x}) = (b - A\hat{x})^T (b - A\hat{x})$$

Expanding this, we get:

$$E(\hat{x}) = b^T b - 2\hat{x}^T A^T b + \hat{x}^T A^T A \hat{x}$$

3. Differentiate with Respect to $\hat{x}$

To find the minimum, we take the derivative with respect to $\hat{x}$ and set it to zero:

$$\frac{\partial}{\partial \hat{x}} \|b - A\hat{x}\|^2 = 0$$

Calculating the derivative, we get:

$$-2A^T b + 2A^T A\hat{x} = 0$$

4. Solve the Normal Equations

Dividing by 2 and rearranging, we get the normal equations:

$$A^T A\hat{x} = A^T b$$

5. Solve for $\hat{x}$

Assuming $A^T A$ is invertible, we get the solution for $\hat{x}$:

$$\hat{x} = (A^T A)^{-1} A^T b$$
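As a minimal sketch of this step, the normal equations can be solved numerically with NumPy. The matrix `A` and vector `b` below are hypothetical example data (not from the text), and the names `x_hat` and `x_lstsq` are chosen only for illustration.

```python
import numpy as np

# Hypothetical example data: the two columns of A span a plane in R^3.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Solve the normal equations (A^T A) x_hat = A^T b.
# np.linalg.solve avoids forming the inverse of A^T A explicitly.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's general least-squares routine.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(x_hat)    # approximately (5, -3)
print(x_lstsq)  # should agree with x_hat up to rounding
```

Solving the linear system is usually preferable to computing the inverse of `A.T @ A`; for ill-conditioned problems `np.linalg.lstsq` tends to be the safer choice.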

6. Calculation of the Projection Matrix

Substituting the solution for $\hat{x}$ into $p = A\hat{x}$, we get:

$$p = A (A^T A)^{-1} A^T b$$

Thus, the projection matrix is defined as:

$$P = A (A^T A)^{-1} A^T$$
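To make the formula concrete, here is a small worked example. The $3 \times 2$ matrix $A$ is an assumed illustration, not taken from the text.

```latex
A = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}, \qquad
A^T A = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}, \qquad
(A^T A)^{-1} = \frac{1}{6} \begin{bmatrix} 5 & -3 \\ -3 & 3 \end{bmatrix}

(A^T A)^{-1} A^T = \frac{1}{6} \begin{bmatrix} 5 & 2 & -1 \\ -3 & 0 & 3 \end{bmatrix}

P = A (A^T A)^{-1} A^T
  = \frac{1}{6} \begin{bmatrix} 5 & 2 & -1 \\ 2 & 2 & 2 \\ -1 & 2 & 5 \end{bmatrix}
```

The result is a $3 \times 3$ matrix whose trace is $2$, matching the dimension of the plane spanned by the two columns of $A$.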

Properties of the Projection Matrix

  1. Symmetry: The projection matrix $P$ is symmetric, i.e., $P^T = P$.

  2. Idempotency: The projection matrix $P$ is idempotent, i.e., $P^2 = P$.
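Both properties follow directly from the formula for the projection matrix. A short verification, assuming the projection matrix is written $P = A (A^T A)^{-1} A^T$ with $A^T A$ invertible:

```latex
P^T = \left( A (A^T A)^{-1} A^T \right)^T
    = A \left( (A^T A)^{-1} \right)^T A^T
    = A \left( (A^T A)^T \right)^{-1} A^T
    = A (A^T A)^{-1} A^T
    = P

P^2 = A (A^T A)^{-1} \underbrace{A^T A \, (A^T A)^{-1}}_{I} A^T
    = A (A^T A)^{-1} A^T
    = P
```

Idempotency matches the geometric picture: projecting a vector that already lies in the subspace leaves it unchanged.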

Calculation of Projection

Given a vector $b$, its projection $p$ onto the subspace $C(A)$ is calculated as:

$$p = Pb$$
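The projection can be sketched in NumPy as follows. The helper `project_onto_column_space` and the example data are hypothetical, introduced only for illustration; the sketch solves the normal equations rather than building the projection matrix explicitly, which gives the same vector when the columns of `A` are linearly independent.

```python
import numpy as np

def project_onto_column_space(A, b):
    """Return the orthogonal projection of b onto the column space of A.

    Assumes the columns of A are linearly independent.
    """
    # Coefficients of the projection in terms of the columns of A.
    x_hat = np.linalg.solve(A.T @ A, A.T @ b)
    return A @ x_hat

# Hypothetical example data.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

p = project_onto_column_space(A, b)  # projection of b
e = b - p                            # error (residual) vector

# The residual should be orthogonal to every column of A.
orth = A.T @ e

print(p)     # approximately (5, 2, -1)
print(orth)  # approximately (0, 0)
```

Checking that `A.T @ e` is numerically zero is a quick way to confirm that `p` is the orthogonal projection and not just some point in the subspace.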

Pseudo-Inverse Matrix

The pseudo-inverse matrix is a type of generalized inverse matrix used to solve certain matrix equations, such as the normal equations in linear regression.

For a matrix $A$ with linearly independent columns, its pseudo-inverse $A^+$ is defined as:

$$A^+ = (A^T A)^{-1} A^T$$

Derivation

To derive the pseudo-inverse matrix, we first consider the least squares solution of the linear system $Ax = b$:

$$\hat{x} = \arg\min_x \|Ax - b\|^2$$

We want the $x$ that minimizes $\|Ax - b\|^2$, which means we need to solve the following normal equations:

$$A^T A x = A^T b$$

Assuming $A^T A$ is invertible, we get the solution for $x$:

$$\hat{x} = (A^T A)^{-1} A^T b$$

Thus, the pseudo-inverse matrix is:

$$A^+ = (A^T A)^{-1} A^T$$
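As a quick numerical check, and assuming a matrix with linearly independent columns, the closed-form expression can be compared with NumPy's `np.linalg.pinv`. The example matrix `A`, vector `b`, and variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical example data with linearly independent columns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Closed-form pseudo-inverse, valid only when A^T A is invertible.
A_pinv_formula = np.linalg.inv(A.T @ A) @ A.T

# NumPy's SVD-based pseudo-inverse for comparison.
A_pinv_numpy = np.linalg.pinv(A)

# Least-squares solution obtained by applying the pseudo-inverse to b.
x_hat = A_pinv_formula @ b

# The pseudo-inverse acts as a left inverse: A^+ A = I (n x n).
left_identity = A_pinv_formula @ A

print(np.allclose(A_pinv_formula, A_pinv_numpy))
print(x_hat)  # approximately (5, -3)
```

Note that the pseudo-inverse here is only a left inverse: `A @ A_pinv_formula` is the projection matrix onto the column space of `A`, not the identity.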

Relationship with Linear Systems

Solutions to $Ax = b$

For the linear system $Ax = b$, if $A$ is a square, full-rank matrix (i.e., $A$ is invertible), then the system has a unique solution:

$$x = A^{-1} b$$

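No Solutions or Multiple Solutions

If $A$ is not a full-rank matrix, the system may have no solutions or infinitely many solutions. In this case, a least squares solution can be found with the pseudo-inverse:

$$\hat{x} = A^+ b$$

In this case, $\hat{x}$ is the vector that minimizes $\|Ax - b\|$. When $A^T A$ is singular, the formula $(A^T A)^{-1} A^T$ no longer applies; $A^+$ then denotes the Moore-Penrose pseudo-inverse, and $\hat{x}$ is the minimizer with the smallest norm.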
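The rank-deficient case can be sketched with NumPy's SVD-based `np.linalg.pinv`, which still returns a least squares solution when the normal-equations matrix is singular. The matrix and right-hand side below are hypothetical example data, and `x_other` is an illustrative name for a second minimizer.

```python
import numpy as np

# Hypothetical rank-deficient matrix: the second column is twice the first.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 2.0, 4.0])

rank = np.linalg.matrix_rank(A)  # 1, so A^T A is singular

# Least squares solution via the Moore-Penrose pseudo-inverse.
x_hat = np.linalg.pinv(A) @ b

# np.linalg.lstsq also returns the minimum-norm least squares solution.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

# Adding a null-space vector of A gives another minimizer of ||Ax - b||.
x_other = x_hat + np.array([2.0, -1.0])

print(rank)
print(np.linalg.norm(A @ x_hat - b), np.linalg.norm(A @ x_other - b))
print(np.linalg.norm(x_hat), np.linalg.norm(x_other))
```

Both `x_hat` and `x_other` give the same residual, but `x_hat` has the smaller norm, which is the property that singles out the pseudo-inverse solution among the infinitely many minimizers.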

Real-World Applications

When the Amount of Data Far Exceeds the Number of Variables

In statistics and machine learning, the linear system $Ax = b$ often involves a dataset where the number of observations $m$ is much larger than the number of variables (features) $n$, i.e., $m \gg n$. In such cases, the matrix $A$ typically has full column rank, making $A^T A$ invertible.
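A small experiment can illustrate why a tall data matrix is usually, but not always, full rank. This is a sketch on synthetic Gaussian data; the sizes `m` and `n`, the seed, and the duplicated-feature scenario are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: many more observations (m) than features (n).
m, n = 1000, 5
A = rng.standard_normal((m, n))

# With independent random features, A has full column rank.
rank_A = np.linalg.matrix_rank(A)

# Counterexample: duplicating a feature makes the columns dependent,
# so the rank stays at n even though the matrix now has n + 1 columns.
A_dup = np.column_stack([A, A[:, 0]])
rank_dup = np.linalg.matrix_rank(A_dup)

print(rank_A, A.shape[1])        # rank equals the number of columns
print(rank_dup, A_dup.shape[1])  # rank is one less than the number of columns
```

Having many observations does not by itself guarantee invertibility: redundant or perfectly collinear features still make the normal-equations matrix singular, in which case the pseudo-inverse approach is needed.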