1# Variance and Covariance 方差与协方差
1. Definitions 定义
Variance 方差
Variance measures how much a set of numbers is spread out. It is the expectation of the squared deviation of a random variable from its mean.
方差度量一组数据的分散程度。它是随机变量与其均值的平方偏差的期望。
Where:
- is a random variable
- is the mean of
- denotes the expectation
其中:
- 是随机变量
- 是 的均值
- 表示期望
Covariance 协方差
Covariance measures the directional relationship between two random variables. It indicates the extent to which the variables change together.
协方差度量两个随机变量之间的方向关系。它表明变量共同变化的程度。
Where:
- are random variables
- denotes the expectation
其中:
- 是随机变量
- 表示期望
2. Important Properties 重要性质
Variance Properties 方差性质
- Non-Negative 非负性:
- Zero Variance 零方差:
- Variance of a Constant 常数的方差:
- Scaling Property 缩放性质:
- Sum of Independent Variables 独立变量和的方差: If and are independent, then
Covariance Properties 协方差性质
- Symmetry 对称性:
- Linearity 线性性质:
- Covariance of a Variable with Itself 自身的协方差:
- Sum of Covariances 协方差的和:
3. Inferences 推断
Variance Inferences 方差推断
-
Population Variance 总体方差: Population variance is usually unknown and needs to be estimated from sample data.
总体方差通常未知,需要通过样本数据进行估计。
-
Sample Variance 样本方差:
where are sample values and is the sample mean.
其中 是样本值, 是样本均值。
Covariance Inferences 协方差推断
-
Population Covariance 总体协方差: Similar to population variance, population covariance is usually unknown and needs to be estimated.
类似于总体方差,总体协方差通常未知,需要进行估计。
-
Sample Covariance 样本协方差:
where are sample values and are sample means.
其中 是样本值, 是样本均值。
期望 的推导 Derivation of
-
定义样本均值:Definition of the Sample Mean
样本均值 定义为:
The sample mean is defined as:
\mathbb{E}(\bar{X}) = \mu
3. **计算 $\mathbb{E}\left( (\bar{X} - \mu)^2 \right)$:Calculating $\mathbb{E}\left( (\bar{X} - \mu)^2 \right)$** 我们需要计算 $\mathbb{E}\left( (\bar{X} - \mu)^2 \right)$。注意到 $\bar{X}$ 是 $X_1, X_2, \ldots, X_n$ 的线性组合,因此我们可以利用方差的性质: We need to calculate $\mathbb{E}\left( (\bar{X} - \mu)^2 \right)$. Noting that $\bar{X}$ is a linear combination of $X_1, X_2, \ldots, X_n$, we can use the properties of variance:\text{Var}(\bar{X}) = \text{Var}\left( \frac{1}{n} \sum_{i=1}^n X_i \right)
4. **方差的性质:Properties of Variance** 利用方差的性质,对于独立同分布的随机变量 $X_i$,我们有: Using the properties of variance, for i.i.d. random variables $X_i$, we have:\text{Var}\left( \frac{1}{n} \sum_{i=1}^n X_i \right) = \frac{1}{n^2} \sum_{i=1}^n \text{Var}(X_i) = \frac{1}{n^2} \cdot n \sigma^2 = \frac{\sigma^2}{n}
5. **利用方差的定义:Using the Definition of Variance** 方差的定义是 $\text{Var}(X) = \mathbb{E}\left[ (X - \mathbb{E}(X))^2 \right]$,所以: The definition of variance is $\text{Var}(X) = \mathbb{E}\left[ (X - \mathbb{E}(X))^2 \right]$, thus:\text{Var}(\bar{X}) = \mathbb{E}\left[ (\bar{X} - \mathbb{E}(\bar{X}))^2 \right] = \mathbb{E}\left[ (\bar{X} - \mu)^2 \right]
\mathbb{E}\left( (\bar{X} - \mu)^2 \right) = \text{Var}(\bar{X}) = \frac{\sigma^2}{n}
### The Derivation of Unbiased Estimator for Sample Variance 样本方差无偏估计的推导 Assume we have a random variable $X$ with population mean $\mu$ and population variance $\sigma^2$. A sample of size $n$, $X_1, X_2, \ldots, X_n$, is drawn from the population, with sample mean $\bar{X}$ and sample variance $s^2$ defined as: 假设我们有一个随机变量 $X$,其总体均值为 $\mu$,总体方差为 $\sigma^2$。从总体中抽取一个样本 $X_1, X_2, \ldots, X_n$,样本均值为 $\bar{X}$,样本方差 $s^2$ 定义为:s^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})
We need to prove that $\mathbb{E}(s^2) = \sigma^2$, meaning that the sample variance $s^2$ is an unbiased estimator of the population variance $\sigma^2$. 我们需要证明 $\mathbb{E}(s^2) = \sigma^2$,即样本方差 $s^2$ 是总体方差 $\sigma^2$ 的无偏估计。 1. **样本方差的展开:Expansion of Sample Variance**s^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})
首先展开 $\sum_{i=1}^n (X_i - \bar{X})^2$: First, expand $\sum_{i=1}^n (X_i - \bar{X})^2$:\sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n \left[ (X_i - \mu) - (\bar{X} - \mu) \right]
= \sum_{i=1}^n (X_i - \mu)^2 - 2 \sum_{i=1}^n (X_i - \mu)(\bar{X} - \mu) + \sum_{i=1}^n (\bar{X} - \mu)
2. **处理中间项:Handling the Middle Term** 由于 $\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$,我们可以得到: Since $\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$, we can get:\sum_{i=1}^n (X_i - \mu)(\bar{X} - \mu) = (\bar{X} - \mu) \sum_{i=1}^n (X_i - \mu)
而 $\sum_{i=1}^n (X_i - \mu) = n (\bar{X} - \mu)$,所以: And $\sum_{i=1}^n (X_i - \mu) = n (\bar{X} - \mu)$, thus:\sum_{i=1}^n (X_i - \mu)(\bar{X} - \mu) = n (\bar{X} - \mu)(\bar{X} - \mu) = n (\bar{X} - \mu)
\sum_{i=1}^n (X_i - \bar{X})^2 = \sum_{i=1}^n (X_i - \mu)^2 - 2 n (\bar{X} - \mu)^2 + n (\bar{X} - \mu)
= \sum_{i=1}^n (X_i - \mu)^2 - n (\bar{X} - \mu)
4. **求期望:Taking the Expectation** 现在我们计算 $s^2$ 的期望: Now we calculate the expectation of $s^2$:\mathbb{E}(s^2) = \mathbb{E}\left( \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2 \right)
= \frac{1}{n-1} \mathbb{E}\left( \sum_{i=1}^n (X_i - \mu)^2 - n (\bar{X} - \mu)^2 \right)
5. **利用性质:Using Properties** $\mathbb{E}\left( \sum_{i=1}^n (X_i - \mu)^2 \right) = n \sigma^2$,以及 $\mathbb{E}\left( (\bar{X} - \mu)^2 \right) = \frac{\sigma^2}{n}$,所以: $\mathbb{E}\left( \sum_{i=1}^n (X_i - \mu)^2 \right) = n \sigma^2$ and $\mathbb{E}\left( (\bar{X} - \mu)^2 \right) = \frac{\sigma^2}{n}$ [[Covariance and Variance#期望--mathbbe-left--barx---mu-2-right--frac-sigma-2n-的推导-derivation-of--mathbbe-left--barx---mu-2-right--frac-sigma-2n|Here]] , so:\mathbb{E}\left( \sum_{i=1}^n (X_i - \bar{X})^2 \right) = n \sigma^2 - n \cdot \frac{\sigma^2}{n} = n \sigma^2 - \sigma^2 = (n-1) \sigma
\mathbb{E}(s^2) = \frac{1}{n-1} (n-1) \sigma^2 = \sigma
因此,样本方差 $s^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2$ 是总体方差 $\sigma^2$ 的无偏估计。 Therefore, the sample variance $s^2 = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})^2$ is an unbiased estimator of the population variance $\sigma^2$.