Matrix Calculus

Introduction

Matrix calculus sits at the intersection of linear algebra and calculus. It covers techniques for differentiating scalar-, vector-, and matrix-valued functions with respect to scalars, vectors, and matrices. It is widely used in machine learning, optimization, and statistics. The rules resemble those of scalar calculus, but the multidimensional structure of the operands adds complexity: the shape of each derivative has to be tracked explicitly.

Derivatives of Scalar-Valued Functions

Derivative of a Scalar with Respect to a Scalar

If a function $f(x)$ depends only on a single scalar variable $x$, its derivative is the one defined in single-variable calculus:

$$\frac{df}{dx} = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}$$

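Derivative of a Scalar with Respect to a Vector

For a scalar function $f(\mathbf{x})$ that depends on a vector $\mathbf{x} \in \mathbb{R}^n$, the derivative is a vector with the same shape as $\mathbf{x}$, known as the gradient:

$$\nabla_{\mathbf{x}} f = \frac{\partial f}{\partial \mathbf{x}} = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix}^\top$$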
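The gradient definition can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper `numerical_gradient` and the example function `f` are hypothetical names introduced here for illustration, and central differences are only one of several ways to approximate a derivative.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps  # perturb one component at a time
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

# Hypothetical example: f(x) = x_1^2 + 3 x_1 x_2,
# whose gradient is [2 x_1 + 3 x_2, 3 x_1].
def f(x):
    return x[0] ** 2 + 3 * x[0] * x[1]

x0 = np.array([1.0, 2.0])
grad_numeric = numerical_gradient(f, x0)
grad_exact = np.array([2 * x0[0] + 3 * x0[1], 3 * x0[0]])
print(np.allclose(grad_numeric, grad_exact, atol=1e-6))
```

Each component of the result is the partial derivative with respect to the matching component of the input, so the gradient has the same shape as the input vector.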

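Derivative of a Scalar with Respect to a Matrix

For a scalar function $f(\mathbf{X})$ that depends on a matrix $\mathbf{X} \in \mathbb{R}^{m \times n}$, the derivative is a matrix with the same shape as $\mathbf{X}$, known as the gradient matrix:

$$\frac{\partial f}{\partial \mathbf{X}} = \begin{bmatrix} \frac{\partial f}{\partial X_{11}} & \cdots & \frac{\partial f}{\partial X_{1n}} \\ \vdots & \ddots & \vdots \\ \frac{\partial f}{\partial X_{m1}} & \cdots & \frac{\partial f}{\partial X_{mn}} \end{bmatrix}$$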

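Derivatives of Vector-Valued Functions

Derivative of a Vector with Respect to a Scalar

For a vector function $\mathbf{y}(x) \in \mathbb{R}^m$ that depends on a scalar $x$, the derivative is a vector:

$$\frac{\partial \mathbf{y}}{\partial x} = \begin{bmatrix} \frac{\partial y_1}{\partial x} & \frac{\partial y_2}{\partial x} & \cdots & \frac{\partial y_m}{\partial x} \end{bmatrix}^\top$$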

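Derivative of a Vector with Respect to a Vector

For a vector function $\mathbf{y}(\mathbf{x}) \in \mathbb{R}^m$ that depends on a vector $\mathbf{x} \in \mathbb{R}^n$, the derivative is the $m \times n$ Jacobian matrix:

$$\mathbf{J} = \frac{\partial \mathbf{y}}{\partial \mathbf{x}}, \qquad J_{ij} = \frac{\partial y_i}{\partial x_j}$$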
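To make the Jacobian layout concrete (row $i$ for output $y_i$, column $j$ for input $x_j$), here is a minimal sketch assuming NumPy. The helper `numerical_jacobian` and the example map `y` are hypothetical and chosen only for illustration.

```python
import numpy as np

def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian: entry (i, j) approximates d func_i / d x_j."""
    x = np.asarray(x, dtype=float)
    m = np.asarray(func(x)).size
    jac = np.zeros((m, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        jac[:, j] = (np.asarray(func(x + step)) - np.asarray(func(x - step))) / (2 * eps)
    return jac

# Hypothetical example map from R^2 to R^3.
def y(x):
    return np.array([x[0] * x[1], np.sin(x[0]), x[1] ** 2])

x0 = np.array([0.5, -1.5])
jac_numeric = numerical_jacobian(y, x0)
jac_exact = np.array([
    [x0[1], x0[0]],
    [np.cos(x0[0]), 0.0],
    [0.0, 2 * x0[1]],
])
print(jac_numeric.shape, np.allclose(jac_numeric, jac_exact, atol=1e-6))
```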

Derivative of a Vector with Respect to a Matrix

For a vector function $\mathbf{y}(\mathbf{X})$ that depends on a matrix $\mathbf{X}$, the derivative is a third-order tensor with entries $\partial y_i / \partial X_{jk}$. It appears less often and is awkward to write in matrix notation, so it is usually represented with block matrices or by other means, such as vectorizing $\mathbf{X}$.

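Derivatives of Matrix-Valued Functions

Derivative of a Matrix with Respect to a Scalar

For a matrix function $\mathbf{Y}(x)$ that depends on a scalar $x$, the derivative is a matrix with the same shape as $\mathbf{Y}$:

$$\left[ \frac{\partial \mathbf{Y}}{\partial x} \right]_{ij} = \frac{\partial Y_{ij}}{\partial x}$$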

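Derivative of a Matrix with Respect to a Vector

For a matrix function $\mathbf{Y}(\mathbf{x})$ that depends on a vector $\mathbf{x}$, the derivative is a third-order tensor:

$$\left[ \frac{\partial \mathbf{Y}}{\partial \mathbf{x}} \right]_{ijk} = \frac{\partial Y_{ij}}{\partial x_k}$$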

Derivative of a Matrix with Respect to a Matrix

For a matrix function $\mathbf{Y}(\mathbf{X})$ that depends on a matrix $\mathbf{X}$, the derivative is a fourth-order tensor with entries $\partial Y_{ij} / \partial X_{kl}$. In practice it is usually flattened into an ordinary matrix, for example by vectorizing $\mathbf{Y}$ and $\mathbf{X}$ and using the Kronecker product, or simplified by other special methods.

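Rules of Matrix Calculus

Chain Rule

The chain rule carries over to matrix calculus for composite functions. For $\mathbf{z} = \mathbf{z}(\mathbf{y})$ and $\mathbf{y} = \mathbf{y}(\mathbf{x})$, with each derivative written as a Jacobian whose $(i, j)$ entry is the derivative of output $i$ with respect to input $j$:

$$\frac{\partial \mathbf{z}}{\partial \mathbf{x}} = \frac{\partial \mathbf{z}}{\partial \mathbf{y}} \, \frac{\partial \mathbf{y}}{\partial \mathbf{x}}$$

The order of the factors matters because matrix multiplication does not commute.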
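As an illustration of the chain rule in Jacobian form, the sketch below multiplies the analytic Jacobians of two maps and compares the product with a finite-difference Jacobian of the composition. It assumes NumPy; the maps `g` and `h`, their Jacobians, and the helper `numerical_jacobian` are hypothetical examples, not part of the text above.

```python
import numpy as np

def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian: entry (i, j) approximates d func_i / d x_j."""
    x = np.asarray(x, dtype=float)
    m = np.asarray(func(x)).size
    jac = np.zeros((m, x.size))
    for j in range(x.size):
        step = np.zeros_like(x)
        step[j] = eps
        jac[:, j] = (np.asarray(func(x + step)) - np.asarray(func(x - step))) / (2 * eps)
    return jac

def g(x):  # inner map, R^2 -> R^3
    return np.array([x[0] * x[1], x[0] ** 2, np.sin(x[1])])

def jac_g(x):  # 3 x 2 Jacobian of g
    return np.array([[x[1], x[0]], [2 * x[0], 0.0], [0.0, np.cos(x[1])]])

def h(y):  # outer map, R^3 -> R^2
    return np.array([y[0] + y[1] * y[2], np.exp(y[0])])

def jac_h(y):  # 2 x 3 Jacobian of h
    return np.array([[1.0, y[2], y[1]], [np.exp(y[0]), 0.0, 0.0]])

x0 = np.array([0.3, 0.7])
jac_chain = jac_h(g(x0)) @ jac_g(x0)  # outer Jacobian times inner Jacobian
jac_numeric = numerical_jacobian(lambda x: h(g(x)), x0)
print(np.allclose(jac_chain, jac_numeric, atol=1e-6))
```

The outer Jacobian is evaluated at `g(x0)`, not at `x0`, and it multiplies from the left; swapping the factors would not even give compatible shapes here.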

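Product Rule

For matrix functions $\mathbf{A}(x)$ and $\mathbf{B}(x)$ of a scalar $x$, the derivative of the matrix product is given by the product rule, with the factors kept in their original order:

$$\frac{\partial (\mathbf{A}\mathbf{B})}{\partial x} = \frac{\partial \mathbf{A}}{\partial x} \mathbf{B} + \mathbf{A} \frac{\partial \mathbf{B}}{\partial x}$$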
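A quick numerical check of the product rule $\partial(\mathbf{A}\mathbf{B})/\partial t = (\partial \mathbf{A}/\partial t)\,\mathbf{B} + \mathbf{A}\,(\partial \mathbf{B}/\partial t)$ for matrices that depend on a scalar $t$. This is a sketch assuming NumPy; the matrix functions `A`, `B` and their derivatives `dA`, `dB` are hypothetical examples.

```python
import numpy as np

# Hypothetical matrix-valued functions of a scalar t and their exact derivatives.
def A(t):
    return np.array([[t, 1.0], [t ** 2, 2.0 * t]])

def dA(t):
    return np.array([[1.0, 0.0], [2.0 * t, 2.0]])

def B(t):
    return np.array([[np.sin(t), t], [1.0, np.cos(t)]])

def dB(t):
    return np.array([[np.cos(t), 1.0], [0.0, -np.sin(t)]])

t0, eps = 0.8, 1e-6
product_rule = dA(t0) @ B(t0) + A(t0) @ dB(t0)
finite_diff = (A(t0 + eps) @ B(t0 + eps) - A(t0 - eps) @ B(t0 - eps)) / (2 * eps)
max_error = np.max(np.abs(product_rule - finite_diff))
print(max_error < 1e-6)
```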

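Transpose Rule

Differentiation commutes with transposition. For a matrix function $\mathbf{A}(x)$ of a scalar $x$, the transpose rule states:

$$\frac{\partial \mathbf{A}^\top}{\partial x} = \left( \frac{\partial \mathbf{A}}{\partial x} \right)^\top$$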

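Inverse Matrix Rule

For an invertible matrix function $\mathbf{A}(x)$ of a scalar $x$, differentiating the identity $\mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$ with the product rule gives the inverse matrix rule:

$$\frac{\partial \mathbf{A}^{-1}}{\partial x} = -\mathbf{A}^{-1} \frac{\partial \mathbf{A}}{\partial x} \mathbf{A}^{-1}$$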
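The inverse rule $\partial \mathbf{A}^{-1}/\partial t = -\mathbf{A}^{-1}(\partial \mathbf{A}/\partial t)\mathbf{A}^{-1}$ can be sanity-checked with a finite difference. The sketch below assumes NumPy and uses a hypothetical affine family $\mathbf{A}(t) = \mathbf{A}_0 + t\,\mathbf{A}_1$, so that $\partial \mathbf{A}/\partial t = \mathbf{A}_1$ exactly.

```python
import numpy as np

# Hypothetical matrices; A(t) = A0 + t * A1 is invertible near t0 = 0.5.
A0 = np.array([[4.0, 1.0], [2.0, 3.0]])
A1 = np.array([[1.0, 0.0], [1.0, 2.0]])

def A(t):
    return A0 + t * A1  # dA/dt = A1

t0, eps = 0.5, 1e-6
A_inv = np.linalg.inv(A(t0))
inverse_rule = -A_inv @ A1 @ A_inv
finite_diff = (np.linalg.inv(A(t0 + eps)) - np.linalg.inv(A(t0 - eps))) / (2 * eps)
max_error = np.max(np.abs(inverse_rule - finite_diff))
print(max_error < 1e-6)
```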

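Derivation of Common Formulas in Matrix Calculus

1. Derivative of the Trace Function

Derivation:

Consider the trace function $\mathrm{tr}(\mathbf{A}\mathbf{X})$, where $\mathbf{A}$ is a constant matrix and $\mathbf{X}$ is the variable matrix. First, using the definition of the trace, $\mathrm{tr}(\mathbf{M}) = \sum_i M_{ii}$, we can expand it as:

$$\mathrm{tr}(\mathbf{A}\mathbf{X}) = \sum_i \sum_j A_{ij} X_{ji}$$

Taking the derivative with respect to an element $X_{kl}$ of $\mathbf{X}$:

$$\frac{\partial\, \mathrm{tr}(\mathbf{A}\mathbf{X})}{\partial X_{kl}} = \sum_i \sum_j A_{ij} \frac{\partial X_{ji}}{\partial X_{kl}}$$

Here, the only non-zero term in the summation occurs when $j = k$ and $i = l$, so:

$$\frac{\partial\, \mathrm{tr}(\mathbf{A}\mathbf{X})}{\partial X_{kl}} = A_{lk}$$

Combining the results, we get:

$$\frac{\partial\, \mathrm{tr}(\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \mathbf{A}^\top$$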
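The result $\partial\,\mathrm{tr}(\mathbf{A}\mathbf{X})/\partial \mathbf{X} = \mathbf{A}^\top$ can be confirmed element by element. This is a minimal sketch assuming NumPy; `numerical_matrix_gradient` is a hypothetical helper and the matrices are random test data.

```python
import numpy as np

def numerical_matrix_gradient(f, X, eps=1e-6):
    """Central-difference gradient of a scalar function f with respect to each entry of X."""
    X = np.asarray(X, dtype=float)
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))  # constant matrix
X = rng.standard_normal((4, 3))  # variable matrix, so A @ X is square

grad_numeric = numerical_matrix_gradient(lambda M: np.trace(A @ M), X)
print(np.allclose(grad_numeric, A.T, atol=1e-6))
```

The gradient has the shape of $\mathbf{X}$ (4 by 3 here), which is why the transpose of $\mathbf{A}$ appears rather than $\mathbf{A}$ itself.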

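2. Derivative of the Determinant

Derivation:

Consider the determinant $\det(\mathbf{X})$ of an invertible matrix $\mathbf{X}$. We use the formula for the derivative of the determinant with respect to its elements, which follows from the cofactor expansion. It is known that:

$$\frac{\partial \det(\mathbf{X})}{\partial X_{ij}} = \det(\mathbf{X}) \, (\mathbf{X}^{-1})_{ji}$$

Expanding this to the entire matrix $\mathbf{X}$, we obtain:

$$\frac{\partial \det(\mathbf{X})}{\partial \mathbf{X}} = \det(\mathbf{X}) \, (\mathbf{X}^{-1})^\top$$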
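A numerical check of $\partial \det(\mathbf{X})/\partial \mathbf{X} = \det(\mathbf{X})\,(\mathbf{X}^{-1})^\top$, sketched with NumPy. The helper `numerical_matrix_gradient` and the test matrix are hypothetical; the matrix was picked by hand so that it is clearly invertible.

```python
import numpy as np

def numerical_matrix_gradient(f, X, eps=1e-6):
    """Central-difference gradient of a scalar function f with respect to each entry of X."""
    X = np.asarray(X, dtype=float)
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

# Hypothetical invertible test matrix (its determinant is 18).
X = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

grad_numeric = numerical_matrix_gradient(np.linalg.det, X)
grad_formula = np.linalg.det(X) * np.linalg.inv(X).T
print(np.allclose(grad_numeric, grad_formula, atol=1e-6))
```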

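3. Derivative of a Quadratic Form

Derivation:

Consider a quadratic form $\mathbf{x}^\top \mathbf{A} \mathbf{x}$, where $\mathbf{x}$ is a vector and $\mathbf{A}$ is a square matrix, taken to be symmetric at the end. We take the derivative with respect to $\mathbf{x}$, starting from the expansion:

$$\mathbf{x}^\top \mathbf{A} \mathbf{x} = \sum_i \sum_j x_i A_{ij} x_j$$

First, we differentiate with respect to a component $x_k$, which appears in the sum both as $x_i$ and as $x_j$:

$$\frac{\partial (\mathbf{x}^\top \mathbf{A} \mathbf{x})}{\partial x_k} = \sum_j A_{kj} x_j + \sum_i A_{ik} x_i$$

Combining all the components together, we obtain:

$$\frac{\partial (\mathbf{x}^\top \mathbf{A} \mathbf{x})}{\partial \mathbf{x}} = \mathbf{A}\mathbf{x} + \mathbf{A}^\top \mathbf{x} = (\mathbf{A} + \mathbf{A}^\top)\mathbf{x}$$

For a symmetric matrix $\mathbf{A}$, where $\mathbf{A} = \mathbf{A}^\top$, this simplifies to:

$$\frac{\partial (\mathbf{x}^\top \mathbf{A} \mathbf{x})}{\partial \mathbf{x}} = 2\mathbf{A}\mathbf{x}$$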
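Both forms of the quadratic-form gradient, $(\mathbf{A} + \mathbf{A}^\top)\mathbf{x}$ in general and $2\mathbf{A}\mathbf{x}$ for symmetric $\mathbf{A}$, can be checked numerically. The sketch below assumes NumPy; `numerical_gradient` is a hypothetical helper and the matrices are random test data.

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of a scalar function f at x."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (f(x + step) - f(x - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))  # general square matrix, not symmetric
x = rng.standard_normal(4)

# General case: gradient of x^T A x is (A + A^T) x.
grad_numeric = numerical_gradient(lambda v: v @ A @ v, x)
grad_general = (A + A.T) @ x

# Symmetric case: with S = S^T the gradient reduces to 2 S x.
S = (A + A.T) / 2
grad_sym_numeric = numerical_gradient(lambda v: v @ S @ v, x)
grad_sym = 2 * S @ x

print(np.allclose(grad_numeric, grad_general, atol=1e-6),
      np.allclose(grad_sym_numeric, grad_sym, atol=1e-6))
```

Using $2\mathbf{A}\mathbf{x}$ with a non-symmetric $\mathbf{A}$ is a common slip; the general form is the safe default.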

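4. Derivative of a Matrix Product

Derivation:

Consider the product $\mathbf{A}\mathbf{X}$ of a constant matrix $\mathbf{A}$ and a variable matrix $\mathbf{X}$, and take the derivative with respect to $\mathbf{X}$. We use the product rule for matrix calculus, in which the term containing the derivative of the constant $\mathbf{A}$ vanishes. Element-wise, and in vectorized form with $\mathrm{vec}(\cdot)$ stacking columns:

$$\frac{\partial (\mathbf{A}\mathbf{X})_{ij}}{\partial X_{kl}} = A_{ik}\,\delta_{jl}, \qquad \frac{\partial\, \mathrm{vec}(\mathbf{A}\mathbf{X})}{\partial\, \mathrm{vec}(\mathbf{X})^\top} = \mathbf{I} \otimes \mathbf{A}$$

Similarly, for the case of right multiplication $\mathbf{X}\mathbf{A}$, the derivative with respect to $\mathbf{X}$ is:

$$\frac{\partial (\mathbf{X}\mathbf{A})_{ij}}{\partial X_{kl}} = \delta_{ik}\,A_{lj}, \qquad \frac{\partial\, \mathrm{vec}(\mathbf{X}\mathbf{A})}{\partial\, \mathrm{vec}(\mathbf{X})^\top} = \mathbf{A}^\top \otimes \mathbf{I}$$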
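One common way to express the derivative of a matrix product with respect to a matrix is through vectorization: with $\mathrm{vec}(\cdot)$ stacking columns, $\mathrm{vec}(\mathbf{A}\mathbf{X}) = (\mathbf{I} \otimes \mathbf{A})\,\mathrm{vec}(\mathbf{X})$ and $\mathrm{vec}(\mathbf{X}\mathbf{A}) = (\mathbf{A}^\top \otimes \mathbf{I})\,\mathrm{vec}(\mathbf{X})$, so the Kronecker factors are the Jacobians. The sketch below assumes NumPy; `vec`, `A_left`, and `A_right` are hypothetical names, and separate constant matrices are used for the left and right cases so that the shapes can differ.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one vector (column-major order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(0)
A_left = rng.standard_normal((2, 3))   # constant factor multiplying from the left
X = rng.standard_normal((3, 4))        # variable matrix
A_right = rng.standard_normal((4, 5))  # constant factor multiplying from the right

# Jacobian of vec(A_left @ X) with respect to vec(X): I (4x4) kron A_left.
jac_left = np.kron(np.eye(4), A_left)
# Jacobian of vec(X @ A_right) with respect to vec(X): A_right^T kron I (3x3).
jac_right = np.kron(A_right.T, np.eye(3))

# Both maps are linear in X, so applying the Jacobian reproduces the product.
left_ok = np.allclose(jac_left @ vec(X), vec(A_left @ X))
right_ok = np.allclose(jac_right @ vec(X), vec(X @ A_right))
print(jac_left.shape, jac_right.shape, left_ok, right_ok)
```

The column-major `order="F"` matters: with NumPy's default row-major flattening the two Kronecker factors would swap places.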

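5. Derivative of the Frobenius Norm

Derivation:

The Frobenius norm is the square root of the sum of the squares of the matrix elements, defined as:

$$\|\mathbf{X}\|_F = \sqrt{\sum_i \sum_j X_{ij}^2} = \sqrt{\mathrm{tr}(\mathbf{X}^\top \mathbf{X})}$$

The derivative of its square is more commonly used than that of the norm itself:

$$\frac{\partial \|\mathbf{X}\|_F^2}{\partial \mathbf{X}} = 2\mathbf{X}$$

The derivation is as follows:

$$\|\mathbf{X}\|_F^2 = \sum_i \sum_j X_{ij}^2$$

Taking the derivative with respect to an element $X_{kl}$ of $\mathbf{X}$:

$$\frac{\partial \|\mathbf{X}\|_F^2}{\partial X_{kl}} = 2 X_{kl}$$

Combining into matrix form, we get:

$$\frac{\partial \|\mathbf{X}\|_F^2}{\partial \mathbf{X}} = 2\mathbf{X}$$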
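The gradient of the squared Frobenius norm, $2\mathbf{X}$, is easy to confirm numerically. This is a minimal sketch assuming NumPy, with a hypothetical helper `numerical_matrix_gradient` and random test data.

```python
import numpy as np

def numerical_matrix_gradient(f, X, eps=1e-6):
    """Central-difference gradient of a scalar function f with respect to each entry of X."""
    X = np.asarray(X, dtype=float)
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 2))

def squared_frobenius(M):
    return np.linalg.norm(M, "fro") ** 2  # same value as np.sum(M ** 2)

grad_numeric = numerical_matrix_gradient(squared_frobenius, X)
print(np.allclose(grad_numeric, 2 * X, atol=1e-6))
```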

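6. Derivative of the Trace of a Quadratic Form

Derivation:

Consider the quadratic form $\mathrm{tr}(\mathbf{X}^\top \mathbf{A} \mathbf{X})$, where $\mathbf{A}$ is a symmetric matrix and $\mathbf{X}$ is the variable matrix. We can derive its derivative with respect to $\mathbf{X}$ by applying the derivative rule for trace functions to each of the two occurrences of $\mathbf{X}$:

$$\frac{\partial\, \mathrm{tr}(\mathbf{X}^\top \mathbf{A} \mathbf{X})}{\partial \mathbf{X}} = \mathbf{A}\mathbf{X} + \mathbf{A}^\top \mathbf{X} = (\mathbf{A} + \mathbf{A}^\top)\mathbf{X}$$

Since $\mathbf{A}$ is symmetric, we obtain:

$$\frac{\partial\, \mathrm{tr}(\mathbf{X}^\top \mathbf{A} \mathbf{X})}{\partial \mathbf{X}} = 2\mathbf{A}\mathbf{X}$$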
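A numerical check of $\partial\,\mathrm{tr}(\mathbf{X}^\top \mathbf{A} \mathbf{X})/\partial \mathbf{X} = (\mathbf{A} + \mathbf{A}^\top)\mathbf{X}$, which reduces to $2\mathbf{A}\mathbf{X}$ when $\mathbf{A}$ is symmetric. The sketch assumes NumPy; the helper and the random matrices are hypothetical test fixtures.

```python
import numpy as np

def numerical_matrix_gradient(f, X, eps=1e-6):
    """Central-difference gradient of a scalar function f with respect to each entry of X."""
    X = np.asarray(X, dtype=float)
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = (M + M.T) / 2                # symmetric constant matrix
X = rng.standard_normal((3, 2))  # variable matrix

grad_numeric = numerical_matrix_gradient(lambda Z: np.trace(Z.T @ A @ Z), X)
grad_formula = 2 * A @ X
print(np.allclose(grad_numeric, grad_formula, atol=1e-6))
```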

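7. Derivative of the Log-Determinant

Derivation:

Consider the derivative of the log-determinant $\log\det(\mathbf{X})$ with respect to the matrix $\mathbf{X}$, assuming $\det(\mathbf{X}) > 0$. It is known that:

$$\frac{\partial \log\det(\mathbf{X})}{\partial \mathbf{X}} = (\mathbf{X}^{-1})^\top$$

This derivation can be understood by the relationship between the differential of the log-determinant and a trace:

$$d \log\det(\mathbf{X}) = \mathrm{tr}(\mathbf{X}^{-1}\, d\mathbf{X})$$

And applying the derivative formula for the trace function, with $\mathbf{A} = \mathbf{X}^{-1}$ held fixed:

$$\frac{\partial\, \mathrm{tr}(\mathbf{A}\mathbf{X})}{\partial \mathbf{X}} = \mathbf{A}^\top \quad \Longrightarrow \quad \frac{\partial \log\det(\mathbf{X})}{\partial \mathbf{X}} = (\mathbf{X}^{-1})^\top$$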
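The log-determinant gradient $(\mathbf{X}^{-1})^\top$ can be verified numerically as well. The sketch below assumes NumPy and uses `np.linalg.slogdet`, which returns the sign and the logarithm of the absolute determinant; the helper `numerical_matrix_gradient`, the wrapper `logdet`, and the positive definite test matrix are hypothetical choices for illustration.

```python
import numpy as np

def numerical_matrix_gradient(f, X, eps=1e-6):
    """Central-difference gradient of a scalar function f with respect to each entry of X."""
    X = np.asarray(X, dtype=float)
    grad = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        step = np.zeros_like(X)
        step[idx] = eps
        grad[idx] = (f(X + step) - f(X - step)) / (2 * eps)
    return grad

def logdet(Z):
    """log|det(Z)|; equal to log det(Z) whenever det(Z) > 0."""
    sign, value = np.linalg.slogdet(Z)
    return value

# Hypothetical positive definite test matrix, so det(X) > 0.
X = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

grad_numeric = numerical_matrix_gradient(logdet, X)
grad_formula = np.linalg.inv(X).T
print(np.allclose(grad_numeric, grad_formula, atol=1e-6))
```

Every entry of $\mathbf{X}$ is perturbed independently here, so the check treats $\mathbf{X}$ as an unconstrained matrix even though the test matrix happens to be symmetric.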

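8. Derivative of a Vector Inner Product

Derivation:

Consider the inner product $\mathbf{a}^\top \mathbf{b} = \sum_i a_i b_i$ of two vectors $\mathbf{a}$ and $\mathbf{b}$. The derivative with respect to $\mathbf{a}$ is:

$$\frac{\partial (\mathbf{a}^\top \mathbf{b})}{\partial \mathbf{a}} = \mathbf{b}$$

Similarly, the derivative with respect to $\mathbf{b}$ is:

$$\frac{\partial (\mathbf{a}^\top \mathbf{b})}{\partial \mathbf{b}} = \mathbf{a}$$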


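These derivations give the common matrix calculus formulas a clear theoretical basis. In applications such as machine learning, optimization, and signal processing, the formulas simplify derivations and reduce computational effort.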