Projection Matrix
Definition
A projection matrix is a square matrix that projects a vector onto a subspace.
For a given matrix $A$ with linearly independent columns, the projection matrix $P$ onto the column space of $A$ can be defined as:
$$P = A (A^T A)^{-1} A^T$$
where $A$ is an $m \times n$ matrix, so $P$ is $m \times m$.
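As a minimal sketch of the definition above, the following NumPy snippet builds $P$ for a small matrix. The function name `projection_matrix` and the example matrix are illustrative assumptions, not part of the original notes, and the code assumes $A$ has full column rank.

```python
import numpy as np

def projection_matrix(A):
    """Return P = A (A^T A)^{-1} A^T, assuming A has full column rank."""
    A = np.asarray(A, dtype=float)
    # Solve (A^T A) X = A^T rather than forming the inverse explicitly.
    return A @ np.linalg.solve(A.T @ A, A.T)

# Hypothetical example: the columns of A span the xy-plane in R^3.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])
P = projection_matrix(A)
print(P)
```

For this particular $A$, projecting onto the column space simply zeroes the third coordinate, which is an easy case to check by hand.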
Derivation of the Projection Matrix
To derive the projection matrix, we consider the projection of a vector $b$ onto the subspace $C(A)$, the column space of $A$. We need to find a vector $p$ that is the orthogonal projection of $b$ onto $C(A)$.
1. Define Orthogonal Projection
Let $p$ be the projection of $b$ onto $C(A)$. Since $p$ lies in the column space of $A$, it can be expressed as:
$$p = A\hat{x}$$
where $\hat{x}$ is a coefficient vector.
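2. Minimize the Projection Error
We want $\hat{x}$ to minimize the length of the error $e = b - A\hat{x}$, that is, $\|b - A\hat{x}\|$. This is equivalent to minimizing $\|b - A\hat{x}\|^2$.
To minimize this error, we write the sum of squared errors as a function of $\hat{x}$, whose gradient will be set to zero:
$$\|b - A\hat{x}\|^2 = (b - A\hat{x})^T (b - A\hat{x})$$
Expanding this, we get:
$$\|b - A\hat{x}\|^2 = b^T b - 2\hat{x}^T A^T b + \hat{x}^T A^T A \hat{x}$$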
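3. Differentiate with Respect to $\hat{x}$
To find the minimum, we take the derivative with respect to $\hat{x}$ and set it to zero:
$$\frac{\partial}{\partial \hat{x}} \left( b^T b - 2\hat{x}^T A^T b + \hat{x}^T A^T A \hat{x} \right) = 0$$
Calculating the derivatives, we get:
$$-2A^T b + 2A^T A \hat{x} = 0$$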
4. Solve the Normal Equations
Simplifying, we get the normal equations:
$$A^T A \hat{x} = A^T b$$
5. Solve for $\hat{x}$
Assuming $A^T A$ is invertible, we get the solution for $\hat{x}$:
$$\hat{x} = (A^T A)^{-1} A^T b$$
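The normal equations can be checked numerically. The sketch below uses made-up data (fitting a line through three points, so the columns of $A$ are an intercept column and a slope column) and compares the result with NumPy's least-squares solver; the data values are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical data: three observations, two unknowns (intercept, slope).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# Normal equations: (A^T A) x_hat = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's built-in least-squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)

# The error e = b - A x_hat should be orthogonal to every column of A.
e = b - A @ x_hat
print("x_hat:", x_hat)
print("A^T e:", A.T @ e)
```

Up to rounding, `A.T @ e` is the zero vector, which is exactly the condition $A^T (b - A\hat{x}) = 0$ expressed by the normal equations.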
6. Calculation of the Projection Matrix
Substituting the solution for $\hat{x}$ into $p = A\hat{x}$, we get:
$$p = A (A^T A)^{-1} A^T b$$
Thus, the projection matrix $P$ is defined as:
$$P = A (A^T A)^{-1} A^T$$
Properties of the Projection Matrix
- Symmetry: The projection matrix $P$ is symmetric, i.e., $P^T = P$.
- Idempotency: The projection matrix $P$ is idempotent, i.e., $P^2 = P$.
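Both properties are easy to verify numerically. The sketch below assumes a random tall matrix (which has full column rank with probability 1); the sizes and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 3))  # hypothetical 6 x 3 matrix

# P = A (A^T A)^{-1} A^T, computed without an explicit inverse.
P = A @ np.linalg.solve(A.T @ A, A.T)

is_symmetric = np.allclose(P, P.T)     # P^T = P
is_idempotent = np.allclose(P @ P, P)  # P^2 = P
print(is_symmetric, is_idempotent)
```

Idempotency matches the geometric picture: once a vector has been projected onto the subspace, projecting it again leaves it unchanged.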
Calculation of Projection
Given a vector $b$, its projection $p$ onto the subspace $C(A)$ is calculated as:
$$p = Pb$$
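A small worked example may help here. The matrix and vector below are assumptions picked so the arithmetic is easy to follow by hand; the second half shows one possible way to obtain the same projection without ever forming $P$.

```python
import numpy as np

# Hypothetical example data.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Projection via the projection matrix: p = P b
P = A @ np.linalg.solve(A.T @ A, A.T)
p = P @ b

# Same projection without forming P: solve least squares, then p = A x_hat.
x_hat, *_ = np.linalg.lstsq(A, b, rcond=None)
p_alt = A @ x_hat

# The error b - p is orthogonal to the columns of A.
e = b - p
print("p:", p)
print("e:", e)
```

For large $m$, forming the $m \times m$ matrix $P$ is usually unnecessary; computing $A\hat{x}$ directly gives the same vector.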
Pseudo-Inverse Matrix
The pseudo-inverse matrix is a type of generalized inverse matrix used to solve certain matrix equations, such as the normal equations in linear regression.
For a matrix $A$ with linearly independent columns, its pseudo-inverse $A^+$ is defined as:
$$A^+ = (A^T A)^{-1} A^T$$
Derivation
To derive the pseudo-inverse matrix, we first consider the least squares solution $\hat{x}$ of the linear system $Ax = b$.
We want $\hat{x}$ to minimize $\|Ax - b\|^2$, which means we need to solve the following normal equations:
$$A^T A \hat{x} = A^T b$$
Assuming $A^T A$ is invertible, we get the solution for $\hat{x}$:
$$\hat{x} = (A^T A)^{-1} A^T b$$
Thus, the pseudo-inverse matrix $A^+$ is:
$$A^+ = (A^T A)^{-1} A^T$$
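As a sanity check, the formula can be compared with `np.linalg.pinv`, NumPy's SVD-based Moore-Penrose pseudo-inverse. The example matrix is an assumption, and the comparison only holds because it has full column rank.

```python
import numpy as np

# Hypothetical full-column-rank matrix.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])

# Pseudo-inverse from the formula (valid when A^T A is invertible).
A_pinv_formula = np.linalg.inv(A.T @ A) @ A.T

# NumPy's SVD-based Moore-Penrose pseudo-inverse.
A_pinv_svd = np.linalg.pinv(A)

print(np.allclose(A_pinv_formula, A_pinv_svd))
# A^+ acts as a left inverse of A: A^+ A = I (n x n).
print(np.allclose(A_pinv_formula @ A, np.eye(2)))
```

Note that $A A^+$ is not the identity in general; it is the projection matrix $P = A (A^T A)^{-1} A^T$ onto the column space of $A$.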
Relationship with Linear Systems
Solutions to $Ax = b$
For the linear system $Ax = b$, if $A$ is a square, full-rank matrix (i.e., $A$ is invertible), then the system has a unique solution:
$$x = A^{-1} b$$
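No Solutions or Multiple Solutions
If $A$ is not a full-rank matrix, the system may have no solutions or infinitely many solutions. In this case, a least squares solution $\hat{x}$ can be found with the pseudo-inverse:
$$\hat{x} = A^+ b$$
Here $A^+$ is the general (Moore-Penrose) pseudo-inverse, since $(A^T A)^{-1}$ does not exist when the columns of $A$ are linearly dependent.
In this case, $\hat{x}$ is a vector that minimizes $\|Ax - b\|$; among all such minimizers, it is the one with the smallest norm.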
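The rank-deficient case can be illustrated with a matrix whose second column is twice the first. This is a made-up example; it relies on `np.linalg.pinv` and `np.linalg.lstsq`, both of which return the minimum-norm least squares solution when $A$ is rank-deficient.

```python
import numpy as np

# Hypothetical rank-deficient matrix: column 2 = 2 * column 1.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [3.0, 6.0]])
b = np.array([1.0, 2.0, 4.0])

rank = np.linalg.matrix_rank(A)
# A^T A is singular here, so the (A^T A)^{-1} A^T formula does not apply.

x_hat = np.linalg.pinv(A) @ b                     # minimum-norm least squares solution
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # same solution via lstsq

# Adding a null-space vector of A gives another least squares solution
# with the same residual but a larger norm.
x_other = x_hat + np.array([2.0, -1.0])
res_hat = np.linalg.norm(A @ x_hat - b)
res_other = np.linalg.norm(A @ x_other - b)
print(rank, x_hat, res_hat, res_other)
```

The two residuals agree, which shows that the least squares solution is not unique here; the pseudo-inverse picks out the shortest one.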
Real-World Applications
When the Amount of Data Far Exceeds the Number of Variables
In statistics and machine learning, the linear system $Ax = b$ often involves a dataset where the number of observations $m$ is much larger than the number of variables (features) $n$, i.e., $m \gg n$. In such cases, the matrix $A$ typically has full column rank, making $A^T A$ invertible.
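The following sketch simulates this regime with synthetic data. The sizes, noise level, and "true" coefficients are all assumptions for illustration; real datasets with collinear features can still be rank-deficient even when $m \gg n$.

```python
import numpy as np

rng = np.random.default_rng(42)
m, n = 1000, 5                       # hypothetical: many observations, few features
A = rng.standard_normal((m, n))      # synthetic design matrix
x_true = np.arange(1.0, n + 1.0)     # assumed coefficients [1, 2, 3, 4, 5]
b = A @ x_true + 0.01 * rng.standard_normal(m)  # noisy observations

# With random features, A has full column rank, so A^T A is invertible.
rank = np.linalg.matrix_rank(A)

# Least squares estimate from the normal equations.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)
max_err = np.max(np.abs(x_hat - x_true))
print(rank, max_err)
```

In practice, solvers such as `np.linalg.lstsq` are often preferred over forming $A^T A$ explicitly, because squaring the matrix also squares its condition number.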