IS CS-2021S-04

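Source: Problem 4
Date: 2024-08-04
Topic: CS - Machine Learning - Linear Regression

Approach

The main task is to use the given dataset to find a linear predictor $f (x) = w^{T} x$ that minimizes the sum of squared prediction errors. Given the assumptions on the data-generating process and the noise, we need to derive the optimal weight vector $\hat{w}$ and analyze the expected value of the loss function in the presence of noise.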

Solution

Question 1: Express $\hat{w}$ using $X, Y, Φ$ , and $n$

To find the optimal weight vector $\hat{w}$ , we minimize the loss function $L (w)$ defined as:

$$L(w) = \frac{1}{2n} \sum_{i=1}^{n} (y_{i} - w^{T} x_{i})^{2} = \frac{1}{2n} ∥ Y - Xw ∥_{2}^{2} .$$

To minimize $L (w)$ , we take the derivative of $L (w)$ with respect to $w$ and set it to zero:

$$\nabla L(w) = - \frac{1}{n} X^{T} (Y - Xw) = 0 .$$

Rearranging gives the normal equations:

$$X^{T} X w = X^{T} Y .$$

With $Φ = \frac{1}{n} X^{T} X$, and assuming $Φ$ is invertible, the optimal weight vector $\hat{w}$ is:

$$\hat{w} = (X^{T} X)^{-1} X^{T} Y = Φ^{-1} \left( \frac{1}{n} X^{T} Y \right) .$$
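
The closed form can be checked numerically. The following is a minimal sketch, assuming NumPy and synthetic data; the sizes `n` and `d`, the weights `w_star`, the noise level, and the seed are all hypothetical choices made for illustration and are not part of the problem statement.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed, an arbitrary choice
n, d = 100, 3                           # hypothetical sample size and dimension
X = rng.normal(size=(n, d))             # design matrix, one sample per row
w_star = np.array([1.0, -2.0, 0.5])     # hypothetical true weights
Y = X @ w_star + 0.1 * rng.normal(size=n)

Phi = X.T @ X / n                       # Phi = (1/n) X^T X
# Solve Phi w = (1/n) X^T Y rather than forming Phi^{-1} explicitly.
w_hat = np.linalg.solve(Phi, X.T @ Y / n)

# Reference solution from NumPy's least-squares routine.
w_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# The gradient -(1/n) X^T (Y - X w) should vanish at w_hat.
grad = -X.T @ (Y - X @ w_hat) / n
print(w_hat, np.abs(grad).max())
```

Solving the linear system is generally preferable to computing an explicit inverse, since it is cheaper and numerically more stable.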

Question 2: Express $E_{e} [L (w)]$ in the form of $\frac{1}{2} ∥ w - w^{*} ∥_{A}^{2} + b$

To express $E_{e} [L (w)]$ , we first express $L (w)$ :

$$L(w) = \frac{1}{2n} (Y - Xw)^{T} (Y - Xw) .$$

Using the data generation model $y_{i} = w^{* T} x_{i} + ϵ_{i}$ , we can write $Y = X w^{*} + e$ . Then:

$$E_{e} [L(w)] = \frac{1}{2n} E_{e} \left[ (X w^{*} + e - Xw)^{T} (X w^{*} + e - Xw) \right] .$$

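Expanding the product for a fixed $w$ (one that does not depend on $e$), the cross term $- 2 (w - w^{*})^{T} X^{T} e$ is linear in $e$ and vanishes in expectation because $E [e] = 0$:

$$E_{e} [L(w)] = \frac{1}{2n} \left[ (w - w^{*})^{T} X^{T} X (w - w^{*}) + E_{e} [e^{T} e] \right] .$$

Since $E [e e^{T}] = σ^{2} I$ , we have $E_{e} [e^{T} e] = \mathrm{tr} (σ^{2} I) = n σ^{2}$ . With $Φ = \frac{1}{n} X^{T} X$ this gives:

$$E_{e} [L(w)] = \frac{1}{2} ∥ w - w^{*} ∥_{Φ}^{2} + \frac{σ^{2}}{2} .$$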

Here, the matrix $A$ is $Φ$ and the scalar $b$ is $\frac{σ ^{2}}{2}$ .
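
A Monte Carlo simulation offers a quick sanity check of the expression $\frac{1}{2} ∥ w - w^{*} ∥_{Φ}^{2} + \frac{σ ^{2}}{2}$ . This is only a sketch under stated assumptions: Gaussian noise with $Φ = \frac{1}{n} X^{T} X$, and hypothetical values for `n`, `d`, `sigma`, `w_star`, the fixed test point `w`, the number of trials, and the seed.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary fixed seed
n, d, sigma = 50, 3, 0.5                # hypothetical sizes and noise level
trials = 20_000                         # number of simulated noise vectors
X = rng.normal(size=(n, d))
w_star = np.array([1.0, -2.0, 0.5])     # hypothetical true weights
w = np.array([0.5, -1.0, 0.0])          # fixed w, chosen independently of the noise

Phi = X.T @ X / n
diff = w - w_star
# Closed form: (1/2) ||w - w*||_Phi^2 + sigma^2 / 2
predicted = 0.5 * diff @ Phi @ diff + sigma**2 / 2

# Each row of E is one draw of the noise vector e.
E = sigma * rng.normal(size=(trials, n))
residuals = (X @ w_star + E) - X @ w    # shape (trials, n), Y - Xw per draw
losses = (residuals**2).sum(axis=1) / (2 * n)
empirical = losses.mean()
print(predicted, empirical)
```

The two printed numbers should agree up to Monte Carlo error, which shrinks as `trials` grows.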

Question 3: Express $E_{e} [L (w)] - E_{e} [L (\hat{w})]$ in the form of $\frac{σ ^{2}}{2 n} tr (B)$

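Because $\hat{w}$ depends on the noise $e$ , the result of Question 2, which assumes a fixed $w$ , cannot be applied to $\hat{w}$ directly. Let $H = X (X^{T} X)^{-1} X^{T}$ be the orthogonal projection onto the column space of $X$ , and let $d$ denote the dimension of $w$ . Since $(I - H) X = 0$ , the residual at $\hat{w}$ is

$$Y - X \hat{w} = (I - H) Y = (I - H) (X w^{*} + e) = (I - H) e .$$

Because $I - H$ is symmetric and idempotent, and $\mathrm{tr} (H) = \mathrm{tr} ((X^{T} X)^{-1} X^{T} X) = d$ :

$$E_{e} [L(\hat{w})] = \frac{1}{2n} E_{e} [e^{T} (I - H) e] = \frac{σ^{2}}{2n} \mathrm{tr} (I - H) = \frac{σ^{2} (n - d)}{2n} .$$

For a fixed $w$ , combining this with Question 2 gives:

$$E_{e} [L(w)] - E_{e} [L(\hat{w})] = \frac{1}{2} (w - w^{*})^{T} Φ (w - w^{*}) + \frac{σ^{2}}{2} - \frac{σ^{2} (n - d)}{2n} = \frac{1}{2} ∥ w - w^{*} ∥_{Φ}^{2} + \frac{σ^{2} d}{2n} .$$

The first term vanishes only at $w = w^{*}$ , and that is the case in which the difference takes the pure trace form:

$$E_{e} [L(w^{*})] - E_{e} [L(\hat{w})] = \frac{σ^{2} d}{2n} = \frac{σ^{2}}{2n} \mathrm{tr} (I_{d}) .$$

Therefore, the matrix $B$ is the $d \times d$ identity matrix $I_{d}$ (equivalently $Φ^{-1} Φ$ ), and $\mathrm{tr} (B) = d$ .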
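
The expected training loss at the least-squares solution can also be checked by simulation. The sketch below assumes Gaussian noise and uses the projection matrix $H = X (X^{T} X)^{-1} X^{T}$ , for which $E_{e} [L (w^{*})] - E_{e} [L (\hat{w})] = \frac{σ ^{2}}{2 n} tr (H)$ with $tr (H)$ equal to the dimension of $w$ . The symbol `H`, the sizes, the weights `w_star`, the number of trials, and the seed are assumptions introduced for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary fixed seed
n, d, sigma = 50, 3, 0.5                # hypothetical sizes and noise level
trials = 20_000
X = rng.normal(size=(n, d))
w_star = np.array([1.0, -2.0, 0.5])     # hypothetical true weights

# One noise vector per column, so Y holds one simulated dataset per column.
E = sigma * rng.normal(size=(n, trials))
Y = (X @ w_star)[:, None] + E                       # shape (n, trials)
W_hat = np.linalg.solve(X.T @ X, X.T @ Y)           # least-squares fit per column

loss_star = (E**2).sum(axis=0) / (2 * n)            # L(w*): the residual is e itself
loss_hat = ((Y - X @ W_hat)**2).sum(axis=0) / (2 * n)

H = X @ np.linalg.solve(X.T @ X, X.T)               # projection (hat) matrix
gap_empirical = (loss_star - loss_hat).mean()
gap_predicted = sigma**2 * np.trace(H) / (2 * n)    # sigma^2 * d / (2n)
print(np.trace(H), gap_empirical, gap_predicted)
```

Averaging the paired difference `loss_star - loss_hat` per draw, rather than the two losses separately, keeps the Monte Carlo error small because the shared noise cancels.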

Question 4: Explain what problem arises when $Φ$ is not a regular matrix and suggest a way to remedy the problem

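When $Φ$ is not a regular matrix, it is singular and cannot be inverted. This happens when the columns of $X$ are linearly dependent, for example when features are collinear (multicollinearity) or when there are fewer samples than features. The normal equations then have infinitely many solutions, so $\hat{w}$ is not uniquely determined, and when $Φ$ is only nearly singular the computed $\hat{w}$ becomes numerically unstable.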

A common remedy is to add a regularization term to the loss function, which is known as Ridge Regression. The modified loss function becomes:

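$$L(w) = \frac{1}{2n} ∥ Y - Xw ∥_{2}^{2} + \frac{λ}{2} ∥ w ∥_{2}^{2} ,$$

where $λ > 0$ is a regularization parameter. Setting the gradient to zero gives $(Φ + λ I) w = \frac{1}{n} X^{T} Y$ . Since $Φ$ is positive semidefinite and $λ > 0$ , the matrix $Φ + λ I$ is positive definite and therefore always invertible, so the solution is:

$$\hat{w} = (Φ + λ I)^{-1} \left( \frac{1}{n} X^{T} Y \right) .$$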
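
To illustrate, the sketch below builds a design matrix with a duplicated feature so that $Φ$ is singular, then solves the ridge system. It assumes $Φ = \frac{1}{n} X^{T} X$ ; the data, the value `lam = 0.1`, and the seed are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary fixed seed
n = 40                                  # hypothetical sample size
base = rng.normal(size=(n, 2))
# The third column duplicates the first, so the columns are linearly dependent.
X = np.column_stack([base, base[:, 0]])
Y = base @ np.array([1.0, -2.0]) + 0.1 * rng.normal(size=n)
d = X.shape[1]

Phi = X.T @ X / n
rank = np.linalg.matrix_rank(Phi)       # d - 1 here: Phi is singular

lam = 0.1                               # hypothetical regularization strength
# Phi + lam * I is positive definite, so this system has a unique solution.
w_ridge = np.linalg.solve(Phi + lam * np.eye(d), X.T @ Y / n)
print(rank, w_ridge)
```

In this example the ridge solution assigns equal weight to the two identical features, which is one way the penalty resolves the ambiguity that plain least squares leaves open.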

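Key Concepts

Machine learning, linear regression, least squares, ridge regression

Problem-Solving Tips and Notes

In regression problems, when the explanatory variables are collinear, ridge regression makes the model more stable and keeps the parameters from growing too large. It is important to understand how the least-squares optimization problem turns into a matrix equation. In addition, adding a regularization term is an effective way to mitigate overfitting.

Key Vocabulary

trace - the sum of the diagonal entries of a square matrix
regular matrix - a nonsingular (invertible) square matrix, i.e., one with full rank and nonzero determinant
regularization - an extra term added to the loss function to constrain model complexity and improve generalization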


Zephyr's Notes on ISCS & CBMS, UTokyo
