2020S2

Problem 1

Let $A$ be a propositional variable, and $L_{i}$ be a literal (i.e., a propositional variable or negation of a propositional variable). In this problem, a propositional formula of the following form is called a clause.

$L_{1} \land \dots \land L_{n} \supset A$

If $n = 1$ , it is of the form $L_{1} \supset A$ , and if $n = 0$ , it is of the form $A$ . Hereinafter, $Π$ is a set of clauses, and $M$ is a set of propositional variables. If all clauses in $Π$ are true under the interpretation in which all propositional variables in $M$ are true and the other variables are false, then $M$ is called a model of $Π$ . The inclusion relation between sets is naturally defined between models.

Answer the following questions.

Let $Π_{0} = {P, P \supset Q, Q \land \neg R \supset S, P \land \neg S \land \neg T \supset T}$ . Enumerate all the subsets of ${P, Q, R, S, T}$ that are models of $Π_{0}$ .

We write $Π_{M}$ for the set of clauses obtained from $Π$ by (i) removing all the clauses that contain negation of a propositional variable in $M$ in the left hand side of $\supset$ , and then (ii) deleting all the negated literals (negation of propositional variables) from the remaining clauses.

For $Π_{0}$ in question (1), if $M_{0} = {P, Q, S}$ , what is $Π_{M_{0}}$ ?
Show that if a model $M^{'}$ of $Π_{M}$ satisfies $M \subseteq M^{'}$ , then $M^{'}$ is a model of $Π$ .
Show that if a model $M^{'}$ of $Π$ satisfies $M^{'} \subseteq M$ , then $M^{'}$ is a model of $Π_{M}$ .
For $Π_{0}$ and $M_{0}$ in question (2), obtain the minimum model of $Π_{M_{0}}$ . Here, a model $M^{'}$ of $Π_{M}$ is called a minimum model of $Π_{M}$ if $M^{'} \subseteq M^{''}$ holds for every model $M^{''}$ of $Π_{M}$ .
Show that if the minimum model of $Π_{M}$ coincides with $M$ , then $M$ is a minimal model of $Π$ . Here, a model $M$ of $Π$ is called a minimal model of $Π$ if there does not exist a model $M^{''}$ of $Π$ such that $M^{''} \subset M^{'}$ .
Is a minimal model $M$ of $Π$ always a minimum model of $Π_{M}$ ? If so, prove the fact. Otherwise, give a counterexample.

设 $A$ 是一个命题变量， $L_{i}$ 是一个文字（即，命题变量或命题变量的否定）。在这个问题中，以下形式的命题公式称为子句。

$L_{1} \land \dots \land L_{n} \supset A$

如果 $n = 1$ ，则形式为 $L_{1} \supset A$ ，如果 $n = 0$ ，则形式为 $A$ 。以下， $Π$ 是一组子句， $M$ 是一组命题变量。如果在解释中 $Π$ 中的所有子句在 $M$ 中所有命题变量为真且其他变量为假的情况下为真，则称 $M$ 为 $Π$ 的模型。模型之间的包含关系自然地定义在集合之间。

回答以下问题。

令 $Π_{0} = {P, P \supset Q, Q \land \neg R \supset S, P \land \neg S \land \neg T \supset T}$ 。枚举 ${P, Q, R, S, T}$ 的所有子集，它们是 $Π_{0}$ 的模型。

我们写 $Π_{M}$ 来表示通过 (i) 删除所有在 $\supset$ 的左边包含 $M$ 中命题变量的否定的子句，并 (ii) 从剩余的子句中删除所有否定的文字（命题变量的否定）后从 $Π$ 得到的子句集合。

对于问题 (1) 中的 $Π_{0}$ ，如果 $M_{0} = {P, Q, S}$ ，那么 $Π_{M_{0}}$ 是什么？
证明如果 $Π_{M}$ 的一个模型 $M^{'}$ 满足 $M \subseteq M^{'}$ ，那么 $M^{'}$ 是 $Π$ 的一个模型。
证明如果 $Π$ 的一个模型 $M^{'}$ 满足 $M^{'} \subseteq M$ ，那么 $M^{'}$ 是 $Π_{M}$ 的一个模型。
对于问题 (2) 中的 $Π_{0}$ 和 $M_{0}$ ，求出 $Π_{M_{0}}$ 的最小模型。这里，如果 $Π_{M}$ 的一个模型 $M^{'}$ 满足对于 $Π_{M}$ 的每一个模型 $M^{''}$ ，有 $M^{'} \subseteq M^{''}$ ，则称 $M^{'}$ 为 $Π_{M}$ 的最小模型。
证明如果 $Π_{M}$ 的最小模型与 $M$ 重合，则 $M$ 是 $Π$ 的最小模型。这里，如果 $Π$ 的一个模型 $M$ 满足不存在 $Π$ 的一个模型 $M^{''}$ 使得 $M^{''} \subset M$ ，则称 $M$ 为 $Π$ 的最小模型。
一个 $Π$ 的最小模型 $M$ 是否总是 $Π_{M}$ 的最小模型？如果是这样，证明这一事实。否则，给出一个反例。

Problem 2

Consider the problem of obtaining a sequence of part-of-speech (POS) tags $t = (t_{1}, \dots, t_{ℓ})$ for a given natural language sentence (word sequence) $w = (w_{1}, \dots, w_{ℓ})$ . For example, for the following four-word sentence

(w_{1}, w_{2}, w_{3}, w_{4}) = (John, wrote, a, book),

our goal is to output the following POS tag sequence.

(t_{1}, t_{2}, t_{3}, t_{4}) = (NOUN, VERB, DET, NOUN)

In this example, NOUN, VERB, and DET are POS tags, denoting noun, verb, and determiner, respectively. $W$ is a finite set of all words, and $T$ is a finite set of all POS tags. Suppose that a POS-tagged corpus $D = {(w^{(k)}, t^{(k)}) ∣ k \in {1, \dots, N}}$ ( $w^{(k)}$ is a $k$ -th sentence in $D$ , $t^{(k)}$ is its POS tag sequence, and $N > 0$ is the number of elements in $D$ ) is given as training data. In the following, for a word sequence $w = (w_{1}, \dots, w_{ℓ})$ , its length $ℓ$ is represented as $∣ w ∣$ .

Answer the following questions.

Consider the probability $p_{u} (t ∣ w)$ for assigning a POS tag $t \in T$ to a word $w \in W$ , and define the probability function $p_{u} (t ∣ w)$ for assigning a POS tag sequence $t$ to a word sequence $w$ as follows.

p_{u} (t ∣ w) \equiv i = 1 \prod ∣ w ∣ p_{u} (t_{i} ∣ w_{i})

Supposing each element of training data $D$ is independently distributed following $p_{u} (t ∣ w)$ , answer a method for computing the maximum likelihood estimate of $p_{u} (t ∣ w)$ .

Assume that $p_{u} (t ∣ w)$ is given for each $w \in W, t \in T$ . Describe an algorithm to obtain a POS tag sequence $t$ that maximizes $p_{u} (t ∣ w)$ for an input sentence $w$ .
Consider the probability $p_{b} (t ∣ v, w)$ for assigning a POS tag $t \in T$ to a word $w \in W$ when a word $v \in W$ immediately precedes $w$ in a sentence. Note that $v$ is considered as a special word <s> when $w$ is the first word of the sentence. The probability function $p_{b} (t ∣ w)$ is defined as follows.

$p_{b} (t ∣ w) \equiv p_{b} (t_{1} ∣ <s>, w_{1}) \prod_{i = 2}^{∣ w ∣} p_{b} (t_{i} ∣ w_{i - 1}, w_{i})$

Supposing each element of training data $D$ is independently distributed following $p_{b} (t ∣ w)$ , answer a method for computing the maximum likelihood estimate of $p_{b} (t ∣ v, w)$ .

Also, assuming that $p_{b} (t ∣ v, w)$ is given for each $v \in W \cup {<s>}, w \in W, t \in T$ , describe an algorithm to obtain a POS tag sequence $t$ that maximizes $p_{b} (t ∣ w)$ for an input sentence $w$ .

Explain why POS tagging using hidden Markov models is expected to attain higher accuracy than the methods described in questions (1) to (3). You must describe the definition of hidden Markov models and the POS tagging algorithm using hidden Markov models, and provide an explanation including an example where the methods described in questions (1) to (3) output a wrong POS tag but the POS tagging using hidden Markov models outputs a correct POS tag.

考虑为一个给定的自然语言句子（单词序列） $w = (w_{1}, \dots, w_{ℓ})$ 获得词性标签（POS 标签）序列 $t = (t_{1}, \dots, t_{ℓ})$ 的问题。例如，对于以下四个单词的句子

(w_{1}, w_{2}, w_{3}, w_{4}) = (John, wrote, a, book),

我们的目标是输出以下词性标签序列。

(t_{1}, t_{2}, t_{3}, t_{4}) = (NOUN, VERB, DET, NOUN)

在这个例子中，NOUN, VERB 和 DET 是词性标签，分别表示名词、动词和限定词。 $W$ 是所有单词的有限集合， $T$ 是所有词性标签的有限集合。假设给定了一个词性标注的语料库 $D = {(w^{(k)}, t^{(k)}) ∣ k \in {1, \dots, N}}$ （ $w^{(k)}$ 是 $D$ 中的第 $k$ 个句子， $t^{(k)}$ 是其词性标签序列， $N > 0$ 是 $D$ 中的元素数目）作为训练数据。在下文中，对于一个单词序列 $w = (w_{1}, \dots, w_{ℓ})$ ，其长度 $ℓ$ 表示为 $∣ w ∣$ 。

回答以下问题。

考虑为单词 $w \in W$ 分配词性标签 $t \in T$ 的概率 $p_{u} (t ∣ w)$ ，并定义为单词序列 $w$ 分配词性标签序列 $t$ 的概率函数 $p_{u} (t ∣ w)$ 如下。

p_{u} (t ∣ w) \equiv i = 1 \prod ∣ w ∣ p_{u} (t_{i} ∣ w_{i})

假设训练数据 $D$ 的每个元素是独立分布的，服从 $p_{u} (t ∣ w)$ ，回答计算 $p_{u} (t ∣ w)$ 的最大似然估计的方法。

假设每个 $w \in W, t \in T$ 给定了 $p_{u} (t ∣ w)$ 。描述一个算法以获得对于输入句子 $w$ 最大化 $p_{u} (t ∣ w)$ 的词性标签序列 $t$ 。
考虑当单词 $v \in W$ 在句子中立即出现在 $w$ 之前时，为单词 $w \in W$ 分配词性标签 $t \in T$ 的概率 $p_{b} (t ∣ v, w)$ 。注意，当 $w$ 是句子的第一个单词时， $v$ 被认为是一个特殊的单词 <s>。概率函数 $p_{b} (t ∣ w)$ 定义如下。

p_{b} (t ∣ w) \equiv p_{b} (t_{1} ∣ <s>, w_{1}) i = 2 \prod ∣ w ∣ p_{b} (t_{i} ∣ w_{i - 1}, w_{i})

假设训练数据 $D$ 的每个元素是独立分布的，服从 $p_{b} (t ∣ w)$ ，回答计算 $p_{b} (t ∣ v, w)$ 的最大似然估计的方法。

同样，假设每个 $v \in W \cup {<s>}, w \in W, t \in T$ 给定了 $p_{b} (t ∣ v, w)$ ，描述一个算法以获得对于输入句子 $w$ 最大化 $p_{b} (t ∣ w)$ 的词性标签序列 $t$ 。

解释为什么使用隐马尔可夫模型的词性标注方法预期会比问题 (1) 到 (3) 中描述的方法获得更高的准确率。你必须描述隐马尔可夫模型的定义和使用隐马尔可夫模型的词性标注算法，并提供一个包括在问题 (1) 到 (3) 中描述的方法输出错误词性标签但使用隐马尔可夫模型的词性标注方法输出正确词性标签的例子的解释。

Problem 3

Let us consider representing an orientation of an object in a three-dimensional space as a rotation from a canonical orientation. In the following, we consider the orientation of the object consisting of four cubes as shown in Figures 1 and 2 (note that each pair of cubes placed side by side shares a surface). We also define the canonical orientation as the one illustrated in Figure 1. Note that the center of the cube at the lower left in Figure 1 is placed at the origin, and the center of each cube lies on the $x, y, z$ axis. Figure 2 shows another orientation of this object, in which the center of the cube at the lower right is placed at the origin, and the center of each cube lies on the $x, y, z$ axis. When you draw another orientation of this object, you need to draw it with the $x, y, z$ axes from the same point of view as Figures 1 and 2. Angles must be represented in radian.

Answer the following questions.

Draw the object with the orientation represented by Euler angles $(π, π, 0)$ .
Answer Euler angles given as the elementwise arithmetic mean of the two triplets of Euler angles, $(0, 0, 0)$ (corresponding to the canonical orientation) and $(π, π, 0)$ . Also, draw the object with the orientation represented by the mean Euler angles.

Let us represent an orientation of the object using a $3 \times 3$ transformation matrix that corresponds to a rotation from the canonical orientation given in Figure 1. Note that a point on the object at a three-dimensional coordinate $v$ moves to another three-dimensional coordinate $R v$ through the rotation by a transformation matrix $R$ .

Answer the transformation matrix that represents the orientation shown in Figure 2.
Answer the transformation matrix given as the elementwise arithmetic mean of the two transformation matrices corresponding to the orientations shown in Figures 1 and 2. Also, draw and describe the shape of the object obtained by applying the mean transformation matrix to the object shown in Figure 1.

Let us represent an orientation of the object using a quaternion that expresses a rotation from the canonical orientation given in Figure 1. A quaternion is a four-dimensional unit vector, and a quaternion that expresses the three-dimensional rotation centered at the origin around a three-dimensional unit vector $v = (v_{x}, v_{y}, v_{z})$ with an angle $θ$ is represented as $(v_{x} sin \frac{θ}{2}, v_{y} sin \frac{θ}{2}, v_{z} sin \frac{θ}{2}, cos \frac{θ}{2})$ .

Note that the angle of a rotation around the three-dimensional unit vector is positive if the rotation is in the clockwise direction when the unit vector is viewed from its start point.

Answer the quaternion that corresponds to the orientation shown in Figure 2. There are two answers for this question; give both of the two answers.
Answer the quaternion given as the spherical linear average of the two quaternions corresponding to the orientations shown in Figures 1 and 2. There are two answers for this question; give both of the two answers. Also, draw the object with the orientation represented by each of the average quaternions. Note that the spherical linear average of two quaternions $q_{1}$ and $q_{2}$ is given as $\frac{s i n 0.5 φ}{s i n φ} q_{1} + \frac{s i n 0.5 φ}{s i n φ} q_{2}$ where $φ = cos^{- 1} (q_{1} \cdot q_{2})$ $(0 \leq φ \leq π)$ ; the spherical linear average is undefined when $sin φ = 0$ .

让我们考虑在三维空间中表示物体的方向为相对于标准方向的旋转。在下文中，我们考虑由图 1 和图 2 所示的四个立方体组成的物体的方向（注意每对并排放置的立方体共享一个表面）。我们还将标准方向定义为图 1 所示的方向。注意，图 1 左下方立方体的中心放置在原点，每个立方体的中心都位于 $x, y, z$ 轴上。图 2 显示了该物体的另一种方向，其中右下角立方体的中心放置在原点，并且每个立方体的中心都位于 $x, y, z$ 轴上。当你绘制该物体的另一个方向时，你需要从与图 1 和图 2 相同的视角绘制 $x, y, z$ 轴。角度必须用弧度表示。

回答以下问题。

画出由欧拉角 $(π, π, 0)$ 表示的方向的物体。
回答由欧拉角 $(0, 0, 0)$ （对应于标准方向）和 $(π, π, 0)$ 的两组三元组的元素平均值给出的欧拉角。还要画出由平均欧拉角表示的方向的物体。

我们使用 $3 \times 3$ 转换矩阵表示物体的方向，该转换矩阵对应于图 1 给出的标准方向的旋转。注意，物体上在三维坐标 $v$ 上的点通过转换矩阵 $R$ 旋转到另一个三维坐标 $R v$ 。

回答表示图 2 所示方向的转换矩阵。
回答由图 1 和图 2 所示方向的两个转换矩阵的元素平均值给出的转换矩阵。还要描述通过对图 1 中显示的物体应用平均转换矩阵得到的物体的形状。

我们使用四元数表示物体的方向，四元数表示从图 1 给出的标准方向的旋转。四元数是一个四维单位向量，表示绕三维单位向量 $v = (v_{x}, v_{y}, v_{z})$ 在原点的三维旋转的四元数表示为 $(v_{x} sin \frac{θ}{2}, v_{y} sin \frac{θ}{2}, v_{z} sin \frac{θ}{2}, cos \frac{θ}{2})$ 。

注意，如果旋转方向是从单位向量的起点看时是顺时针方向，则围绕三维单位向量的旋转角度为正。

回答表示图 2 所示方向的四元数。这个问题有两个答案，请给出两个答案。
回答由图 1 和图 2 所示方向的两个四元数的球面线性平均给出的四元数。这个问题有两个答案，请给出两个答案。还要绘制由每个平均四元数表示的方向的物体。注意，两个四元数 $q_{1}$ 和 $q_{2}$ 的球面线性平均表示为 $\frac{s i n 0.5 φ}{s i n φ} q_{1} + \frac{s i n 0.5 φ}{s i n φ} q_{2}$ ，其中 $φ = cos^{- 1} (q_{1} \cdot q_{2})$ $(0 \leq φ \leq π)$ ；当 $sin φ = 0$ 时，球面线性平均未定义。

Problem 4

Consider a connected undirected graph $G = (V, E)$ with positive edge weights. A subgraph $G^{'} = (V, E^{'})$ of $G$ obtained by removing some of the edges in $G$ is called a spanning tree of $G$ , if $G^{'}$ is a tree. The summation of weights of all the edges in a spanning tree is called the weight of the spanning tree. A minimum spanning tree of $G$ is a spanning tree of $G$ whose weight is minimum. You can assume appropriate data representation for graphs and trees in the questions below.

Answer the following questions.

Let $e$ be the edge (or arbitrary one of the edges if there are multiple such edges) with the maximum weight in some arbitrary cycle $C$ in $G$ . Prove that there is a minimum spanning tree of $G$ that does not contain $e$ .
Consider an arbitrary vertex subset $V^{'}$ of $V$ ( $V^{'} \neq = V, V^{'} \neq = \emptyset$ ) for $G = (V, E)$ . Let $e$ be the edge (or arbitrary one of the edges if there are multiple such edges) with the minimum weight among the edges $(u, v) \in E$ such that $u \in V^{'}$ and $v \in V - V^{'}$ . Prove that there is a minimum spanning tree that contains $e$ . Note that $\emptyset$ denotes an empty set.
Describe an $O (∣ E ∣)$ -time algorithm that finds an arbitrary path between two nodes $u, v \in V$ on graph $G = (V, E)$ .
Assume that we are given a graph $G = (V, E)$ and its minimum spanning tree $T$ . Let $G^{'}$ be the graph obtained by adding to $G$ a new edge $e = (u, v) \neq \in E$ $(u, v \in V)$ with weight $w > 0$ . Describe an $O (∣ V ∣)$ -time algorithm that finds a minimum spanning tree of $G^{'}$ .
Prove the correctness of the algorithm described in question (4).

考虑一个具有正边权的连通无向图 $G = (V, E)$ 。如果 $G$ 的一个子图 $G^{'} = (V, E^{'})$ 是一个树，则称其为 $G$ 的生成树。生成树中所有边的权重之和称为生成树的权重。 $G$ 的最小生成树是 $G$ 的一个生成树，其权重最小。你可以在以下问题中假设图和树的适当数据表示。

回答以下问题。

设 $e$ 是在 $G$ 的某个任意循环 $C$ 中具有最大权重的边（如果有多条这样的边，则任意选取一条）。证明存在一个不包含 $e$ 的 $G$ 的最小生成树。
对于 $G = (V, E)$ 的任意顶点子集 $V^{'}$ ( $V^{'} \neq = V, V^{'} \neq = \emptyset$ )，设 $e$ 是权重最小的边（如果有多条这样的边，则任意选取一条），该边位于 $V^{'}$ 和 $V - V^{'}$ 的顶点之间，即 $(u, v) \in E$ 使得 $u \in V^{'}$ 且 $v \in V - V^{'}$ 。证明存在一个包含 $e$ 的最小生成树。注意， $\emptyset$ 表示空集。
描述一个 $O (∣ E ∣)$ 时间的算法，用于找到图 $G = (V, E)$ 上的两个节点 $u, v \in V$ 之间的任意路径。
假设我们给定一个图 $G = (V, E)$ 及其最小生成树 $T$ 。设 $G^{'}$ 是通过向 $G$ 中添加一条新边 $e = (u, v) \neq \in E$ $(u, v \in V)$ ，且权重 $w > 0$ 得到的图。描述一个 $O (∣ V ∣)$ 时间的算法，以找到 $G^{'}$ 的一个最小生成树。
证明问题 (4) 中描述的算法的正确性。

Problem 5

Suppose that $f (x)$ is a real function defined on a closed interval from $a$ to $b$ $(a < b)$ . Suppose that $n$ is an integer that is no less than 2, and define $h = (b - a) / n$ . Then, for each integer $i = 0, 1, \dots, n$ , define $x_{i} = a + ih$ and $f_{i} = f (x_{i})$ , respectively. Namely, $x_{0}, \dots, x_{n}$ are the points that divide the interval from $a$ to $b$ into $n$ equal parts, and $f_{i}$ is the value of the function $f (x)$ at $x = x_{i}$ .

Next, define $J = \int_{a}^{b} f (x) d x$ , and define $J_{n}$ as the approximate value calculated by the composite trapezoid rule applied on $J$ using the points which divide the interval from $a$ to $b$ into $n$ equal parts.

Answer the following questions.

Assume that $f (x)$ is a four times continuously differentiable function. Let $k$ be an integer such that $0 < k < n$ and define $f_{k}^{''}$ as the second order differential of $f (x)$ at $x_{k}$ . Express an approximate value of $f_{k}^{''}$ whose error is $O (h^{2})$ , as a linear combination of $f_{k - 1}, f_{k}$ , and $f_{k + 1}$ .
The approximation obtained by question (1) seems to become accurate when $h$ approaches zero. Answer, with a reason, whether this is correct or not in the calculation with the IEEE 754 double precision floating point operations.
Express $J_{n}$ using $n, h$ , and $f_{i}$ $(i = 0, \dots, n)$ .
Assume that $f (x)$ can be expressed by a quadratic function in each interval formed by the division into $n$ equal parts. Then, define $J_{2 n}$ similarly using the division into $2 n$ equal parts composed of the division of each original part into two halves. Express $E_{n} = J_{n} - J$ using $J_{2 n}$ and $J_{n}$ .

假设 $f (x)$ 是一个定义在从 $a$ 到 $b$ $(a < b)$ 的闭区间上的实函数。假设 $n$ 是不小于 2 的整数，并定义 $h = (b - a) / n$ 。然后，对于每个整数 $i = 0, 1, \dots, n$ ，分别定义 $x_{i} = a + ih$ 和 $f_{i} = f (x_{i})$ 。即， $x_{0}, \dots, x_{n}$ 是将从 $a$ 到 $b$ 的区间分成 $n$ 等分的点， $f_{i}$ 是函数 $f (x)$ 在 $x = x_{i}$ 处的值。

接下来，定义 $J = \int_{a}^{b} f (x) d x$ ，并定义 $J_{n}$ 为使用复合梯形规则计算的近似值，该规则适用于使用将区间 $a$ 到 $b$ 分成 $n$ 等分的点上的 $J$ 。

回答以下问题。

假设 $f (x)$ 是四次连续可微函数。设 $k$ 是整数，使得 $0 < k < n$ ，并定义 $f_{k}^{''}$ 为 $f (x)$ 在 $x_{k}$ 处的二阶导数。表示 $f_{k}^{''}$ 的一个近似值，其误差为 $O (h^{2})$ ，作为 $f_{k - 1}, f_{k}$ 和 $f_{k + 1}$ 的线性组合。
问题 (1) 中获得的近似值在 $h$ 趋近于零时似乎变得准确。回答，在使用 IEEE 754 双精度浮点运算进行计算时，这是否正确，并给出理由。
使用 $n, h$ 和 $f_{i}$ $(i = 0, \dots, n)$ 表示 $J_{n}$ 。
假设 $f (x)$ 可以在每个由分成 $n$ 等分形成的区间中用二次函数表示。然后，类似地定义 $J_{2 n}$ ，使用将每个原始部分分成两半的 $2 n$ 等分的划分。使用 $J_{2 n}$ 和 $J_{n}$ 表示 $E_{n} = J_{n} - J$ 。

Problem 6

The probability density function of the normal distribution $N (μ, σ^{2})$ with mean $μ \in R$ and variance $σ^{2} > 0$ is given by

f (x) = \frac{1}{2 π σ ^{2}} exp (- \frac{( x - μ ) ^{2}}{2 σ ^{2}}) .

Let $X$ and $Z$ be random variables that independently follow $N (μ, 1)$ and $N (0, 1)$ , respectively, and define $Y = θX + Z$ for some constant $θ \in R$ . For an integer $n > 1$ , let $(X_{1}, Y_{1}), (X_{2}, Y_{2}), \dots, (X_{n}, Y_{n})$ be two-dimensional random variables that independently follow the same distribution as $(X, Y)$ , for which we write $X^{(n)} = (X_{1}, X_{2}, \dots, X_{n})$ and $Y^{(n)} = (Y_{1}, Y_{2}, \dots, Y_{n})$ .

Answer the following questions.

Express the expectation $E [Y]$ and variance $V [Y]$ of $Y$ using $μ$ and $θ$ .
Show that the conditional distribution of $X$ given $Y$ is a normal distribution, and express its expectation $E [X ∣ Y]$ and variance $V [X ∣ Y]$ using $μ$ , $θ$ , and $Y$ .
Let $(x^{(n)}, y^{(n)})$ denote a realization of $(X^{(n)}, Y^{(n)})$ . Express the joint probability density function $p_{μ, θ} (x^{(n)}, y^{(n)})$ of $(X^{(n)}, Y^{(n)})$ using $μ, θ, x^{(n)} = (x_{1}, x_{2}, \dots, x_{n})$ and $y^{(n)} = (y_{1}, y_{2}, \dots, y_{n})$ .
Consider maximum-likelihood estimation of $(μ, θ)$ by the EM algorithm for the case where the observation of $X_{n}$ is missing from $(X^{(n)}, Y^{(n)})$ , that is, the case where $(X^{(n - 1)}, Y^{(n)})$ is observed. Then the update rule of estimators of $(μ, θ)$ by the EM algorithm for some initial value $(μ_{0}, θ_{0}) \in R^{2}$ is given by

(μ_{t + 1}, θ_{t + 1}) = (μ, θ) \in R^{2} ar g max E_{X_{n} \sim N (\overset{μ}{ˉ}, \overset{σ}{ˉ}^{2})} [lo g p_{μ, θ} (X^{(n)}, Y^{(n)})], t = 0, 1, \dots,

where $\overset{μ}{ˉ}$ and $\overset{σ}{ˉ}^{2}$ are the values obtained by the substitution $(μ, θ, Y)$ in the expressions of $E [X ∣ Y]$ and $V [X ∣ Y]$ obtained in question (2), respectively, and $E_{X_{n} \sim N (\overset{μ}{ˉ}, \overset{σ}{ˉ}^{2})}$ denotes the expectation when $X_{n}$ follows $N (\overset{μ}{ˉ}, \overset{σ}{ˉ}^{2})$ and $(X^{(n - 1)}, Y^{(n)})$ is fixed.

(i) Express $E_{X_{n} \sim N (\overset{μ}{ˉ}, \overset{σ}{ˉ}^{2})} [lo g p_{μ, θ} (X^{(n)}, Y^{(n)})]$ using $μ, θ, \overset{μ}{ˉ}, \overset{σ}{ˉ}^{2}, X^{(n - 1)}$ and $Y^{(n)}$ .

(ii) Express $(μ_{t + 1}, θ_{t + 1})$ using $n, \overset{μ}{ˉ}, \overset{σ}{ˉ}^{2}, X^{(n - 1)}$ and $Y^{(n)}$ .

正态分布 $N (μ, σ^{2})$ 的概率密度函数，其均值 $μ \in R$ 和方差 $σ^{2} > 0$ ，表示为

f (x) = \frac{1}{2 π σ ^{2}} exp (- \frac{( x - μ ) ^{2}}{2 σ ^{2}}) .

设 $X$ 和 $Z$ 是独立服从 $N (μ, 1)$ 和 $N (0, 1)$ 的随机变量，定义 $Y = θX + Z$ ，其中 $θ \in R$ 为常数。对于整数 $n > 1$ ，设 $(X_{1}, Y_{1}), (X_{2}, Y_{2}), \dots, (X_{n}, Y_{n})$ 为服从 $(X, Y)$ 相同分布的二维随机变量，我们记 $X^{(n)} = (X_{1}, X_{2}, \dots, X_{n})$ 和 $Y^{(n)} = (Y_{1}, Y_{2}, \dots, Y_{n})$ 。

回答以下问题。

使用 $μ$ 和 $θ$ 表示 $Y$ 的期望 $E [Y]$ 和方差 $V [Y]$ 。
证明 $X$ 在给定 $Y$ 条件下的条件分布是正态分布，并使用 $μ, θ$ 和 $Y$ 表示其期望 $E [X ∣ Y]$ 和方差 $V [X ∣ Y]$ 。
令 $(x^{(n)}, y^{(n)})$ 表示 $(X^{(n)}, Y^{(n)})$ 的一个实现。使用 $μ, θ, x^{(n)} = (x_{1}, x_{2}, \dots, x_{n})$ 和 $y^{(n)} = (y_{1}, y_{2}, \dots, y_{n})$ 表示 $(X^{(n)}, Y^{(n)})$ 的联合概率密度函数 $p_{μ, θ} (x^{(n)}, y^{(n)})$ 。
考虑使用 EM 算法对 $(μ, θ)$ 的极大似然估计的情况，其中 $X_{n}$ 的观测值缺失于 $(X^{(n)}, Y^{(n)})$ ，即 $(X^{(n - 1)}, Y^{(n)})$ 被观测到的情况。然后，使用 EM 算法对某些初始值 $(μ_{0}, θ_{0}) \in R^{2}$ 的 $(μ, θ)$ 的估计器的更新规则为

(μ_{t + 1}, θ_{t + 1}) = (μ, θ) \in R^{2} ar g min E_{X_{n} \sim N (\overset{μ}{ˉ}, \overset{σ}{ˉ}^{2})} [lo g p_{μ, θ} (X^{(n)}, Y^{(n)})], t = 0, 1, \dots,

其中 $\overset{μ}{ˉ}$ 和 $\overset{σ}{ˉ}^{2}$ 是在问题 (2) 中获得的 $E [X ∣ Y]$ 和 $V [X ∣ Y]$ 的表达式中通过代入 $(μ, θ, Y)$ 得到的值， $E_{X_{n} \sim N (\overset{μ}{ˉ}, \overset{σ}{ˉ}^{2})}$ 表示当 $X_{n}$ 服从 $N (\overset{μ}{ˉ}, \overset{σ}{ˉ}^{2})$ 且 $(X^{(n - 1)}, Y^{(n)})$ 固定时的期望。

(i) 使用 $μ, θ, \overset{μ}{ˉ}, \overset{σ}{ˉ}^{2}, X^{(n - 1)}$ 和 $Y^{(n)}$ 表示

$E_{X_{n} \sim N (\overset{μ}{ˉ}, \overset{σ}{ˉ}^{2})} [lo g p_{μ, θ} (X^{(n)}, Y^{(n)})]$ 。

(ii) 使用 $n, \overset{μ}{ˉ}, \overset{σ}{ˉ}^{2}, X^{(n - 1)}$ 和 $Y^{(n)}$ 表示 $(μ_{t + 1}, θ_{t + 1})$ 。

Zephyr's Notes on ISCS & CBMS, UTokyo

Explorer

Explorer

2020S2

2020S2

Problem 1

Problem 2

Problem 3

Problem 4

Problem 5

Problem 6

Graph View

Table of Contents

Backlinks