Question 7

When we analyze the worst time complexity of an algorithm, it is worth observing how the computation time increases as , the size of input data, increases. The Landau’s notation, which is often used for representing an asymptotic upper bound of ignoring the constant factor, is defined as the following. For a positive function , if there exists a positive constant , and

holds, then

is defined to hold.

  1. and are defined to be equal if and are equivalent for any . For each of (A) to (C), find all formulae in the box below that are equal to it. If there is no such formula, just write “none”.

    (A) (B) (C)

  1. For each proposition below, determine with proof whether it holds or not. Note that and are positive functions.

(A) and are equal.

(B) If holds, then .


当我们分析一个算法的最坏时间复杂度时,观察计算时间 随输入数据大小 的增加而增加是很有价值的。Landau 的 表示法,通常用于表示忽略常数因子的 的渐近上界,定义如下。对于一个正函数 ,如果存在一个正常数 ,且

成立,则定义

成立。

  1. 如果 对任何 都等价,则定义 是相等的。对于 (A) 到 (C) 的每一个,找到下面框中等于它的所有公式。如果没有这样的公式,只写“none”。

    (A) (B) (C)

  1. 对于下面的每个命题,确定是否成立,并提供证明。注意 是正函数。
  • (A) 是相等的。

  • (B) 如果 成立,则


Question 8

Let be the set of non-zero complex two-dimensional vectors. Let be a 2 by 2 real symmetric matrix, and be the unit matrix.

  1. Find all the eigenvalues of .

  2. Under the assumption of , answer i) and ii).

    i) Let be the matrix whose first and second columns consist of the eigenvectors and for the eigenvalues and , respectively. Show that is invertible and satisfies .

    ii) Prove that the set and are equal.

  3. For each of the statements A), B), and C), answer the conditions on matrix elements for the statement to hold.

    A) Every can be expressed as with some .

    B) No can be expressed as with some .

    C) At least one can be expressed as with some .


为非零复二维向量的集合。设 为一个 2×2 的实对称矩阵, 为单位矩阵。

  1. 找出 的所有特征值

  2. 在假设 的条件下,回答 i) 和 ii)。

    i) 设 为一个矩阵,其第一列和第二列分别由特征值 的特征向量 组成。证明 是可逆的,并且满足

    ii) 证明集合 是相等的。

  3. 对于每个陈述 A), B), 和 C),回答矩阵元素 的条件使该陈述成立。

    A) 每个 都可以表示为 ,其中

    B) 没有 可以表示为 ,其中

    C) 至少有一个 可以表示为 ,其中


Question 9

An array of real numbers, , is called a heap if for , where is the quotient of integer divided by 2 (e.g., ). A heap can be treated as a binary tree such that is the root node and is the parent node of . Answer the following questions.

  1. Explain whether array is a heap or not.
  2. Show that if is a heap, is also a heap for any .
  3. Suppose that is a heap but is not. Describe a procedure that exchanges elements in to make it a heap in worst-case time.
  4. Suppose that is a heap. Prove that is maximum among all elements in . Suppose that after replacing with , is not a heap. Describe a procedure that exchanges elements in to make it a heap in worst-case time.
  5. Describe an algorithm that exchanges elements in to make it a heap in worst-case time.
  6. Using the procedures defined above, describe an algorithm that sorts array in worst-case time.

一个包含 个实数的数组 ,如果满足 对于 ,其中 是整数 除以 2 的商(例如,),则称其为堆。堆可以被看作是一个二叉树,其中 是根节点, 的父节点。回答以下问题。

  1. 解释数组 是否是一个堆。
  2. 证明如果 是一个堆,那么 也是一个堆,对于任何
  3. 假设 是一个堆,但 不是。描述一种在 最坏情况下,将 中的元素交换使其成为堆的方法。
  4. 假设 是一个堆。证明 中最大的元素。假设在用 替换 后, 不是一个堆。描述一种在 最坏情况下,将 中的元素交换使其成为堆的方法。
  5. 描述一种在 最坏情况下,将 中的元素交换使其成为堆的算法。
  6. 使用上述定义的方法,描述一种在 最坏情况下,对数组 进行排序的算法。

Question 10

A bipartite graph is an undirected graph whose nodes can be divided into two groups such that any two nodes in a group are not connected by an edge. Answer the following questions about bipartite graphs.

  1. Write the incidence matrix of the following graph .
graph TB
	subgraph BLACK_NODES
		v1
		v2
		v3
		v4
	end

	subgraph WHITE_NODES
		v5
		v6
		v7
		v8
	end
	
	v1 --- v6
	v1 --- v8
	v2 --- v5
	v2 --- v6
	v3 --- v7
	v3 --- v8
	v4 --- v5
	v4 --- v8
  1. A matching in a graph is a set of edges without common nodes. A perfect matching is a matching that covers all nodes of the graph. List all perfect matchings of .

  2. Prove that a tree is always a bipartite graph.

  3. Prove that a bipartite graph does not contain a circuit with an odd number of edges.


二分图是一个无向图,其节点可以分为两组,使得组内的任意两个节点不通过边连接。回答以下关于二分图的问题。

  1. 写出以下图 的关联矩阵。
graph TB
	subgraph BLACK_NODES
		v1
		v2
		v3
		v4
	end

	subgraph WHITE_NODES
		v5
		v6
		v7
		v8
	end
	
	v1 --- v6
	v1 --- v8
	v2 --- v5
	v2 --- v6
	v3 --- v7
	v3 --- v8
	v4 --- v5
	v4 --- v8
  1. 图中的一个匹配是一组没有公共节点的边。完美匹配是覆盖图中所有节点的匹配。列出图 的所有完美匹配。

  2. 证明树总是一个二分图。

  3. 证明二分图不包含奇数条边的回路。


Question 11

For an arbitrary random variable that takes values in non-negative integers, we define the probability generating function of as . Here, denotes the expected value of , denotes the probability that assumes value , and represents a real number.

  1. Show the following equalities (a), (b). (a) (b)

In the following, and are mutually independent, identically distributed random variables that take values in non-negative integers.

  1. For a positive integer , we define a random variable . Show .

  2. We define a random variable . Here,

Show .

(Hint: for a positive integer )

  1. For , , calculate and .

对于一个取非负整数值的任意随机变量 ,我们定义 的概率生成函数 。其中, 表示 的期望值, 表示 取值为 的概率, 表示一个实数。

  1. 证明以下等式 (a), (b)。 (a) (b)

在下列情形中, 是相互独立的同分布随机变量,取非负整数值。

  1. 对于一个正整数 ,我们定义一个随机变量 。证明

  2. 我们定义一个随机变量 。 其中,

证明

(提示: 对于一个正整数

  1. 对于 , ,计算

Question 12

Suppose that the maximum score of the global alignment of the two sequences, and , is calculated by a dynamic programming using the following iterations (A).

Here we assume that , and that is the score of alignment of and .

We also assume that the same score of is given to any gap of length .

Answer the following problems.

  1. Show the general form of .

  2. In order to get the maximum score of the alignment as after iterations using (A) for and , answer what initial values should be assigned to the following variables.

    • for
    • for

    You may suppose that

3. For the alignment of biological sequences, usually we set $d > e$. Describe the reason briefly. 4. In order to treat the score of long gaps more accurately, let us consider alignments using the following iterations. Show the general form of the gap score of length $k$, when $d_2 > d_1$, $e_2 > e_1$ holds. Note that classification on the value of $k$ is necessary.

\begin{aligned}

M(i, j) &= \max[M(i-1, j-1), X_1(i-1, j-1), Y_1(i-1, j-1),X_2(i-1, j-1), Y_2(i-1, j-1)] + s(x_i, y_j), \

X_1(i, j) &= \max[M(i-1, j) - d_1, X_1(i-1, j) - e_1], \

Y_1(i, j) &= \max[M(i, j-1) - d_1, Y_1(i, j-1) - e_1], \

X_2(i, j) &= \max[M(i-1, j) - d_2, X_2(i-1, j) - e_2], \

Y_2(i, j) &= \max[M(i, j-1) - d_2, Y_2(i, j-1) - e_2].

\end{aligned}

--- 假设通过使用以下迭代 (A) 计算两个序列 $x = x_1, \ldots, x_m$ 和 $y = y_1, \ldots, y_n$ 的全局比对的最大得分。

\begin{aligned}

M(i, j) &= \max[M(i-1, j-1), X(i-1, j-1), Y(i-1, j-1)] + s(x_i, y_j), \

X(i, j) &= \max[M(i-1, j) - d, X(i-1, j) - e], \

Y(i, j) &= \max[M(i, j-1) - d, Y(i, j-1) - e].

\end{aligned} \tag{A}

这里我们假设 $d > 0$,$e > 0$ 并且 $s(x_i, y_j)$ 是 $x_i$ 和 $y_j$ 的比对得分。 我们还假设对于任意长度为 $k$ 的间隙给出了相同的得分 $g(k)$。 回答以下问题。 1. 展示 $g(k)$ 的一般形式。 2. 为了在使用 (A) 迭代后获得最大比对得分 $\max[M(m, n), X(m, n), Y(m, n)]$,回答应为以下变量分配什么初始值。 - $X(i, 0)$ 对于 $i = 1, \ldots, m$ - $Y(0, j)$ 对于 $j = 1, \ldots, n$ 你可以假设

\begin{aligned}

M(0, 0) &= 0, \

X(0, 0) &= Y(0, 0) = -\infty, \

M(i, 0) &= Y(i, 0) = -\infty \quad (i = 1, \ldots, m), \

M(0, j) &= X(0, j) = -\infty \quad (j = 1, \ldots, n).

\end{aligned}

3. 对于生物序列的比对,通常我们设置 $d > e$。简要描述原因。 4. 为了更准确地处理长间隙的得分,让我们考虑使用以下迭代进行比对。展示当 $d_2 > d_1$,$e_2 > e_1$ 时长度为 $k$ 的间隙得分的一般形式。注意对 $k$ 值的分类是必要的。

\begin{aligned}

M(i, j) &= \max[M(i-1, j-1), X_1(i-1, j-1), Y_1(i-1, j-1), X_2(i-1, j-1), Y_2(i-1, j-1)] + s(x_i, y_j), \

X_1(i, j) &= \max[M(i-1, j) - d_1, X_1(i-1, j) - e_1], \

Y_1(i, j) &= \max[M(i, j-1) - d_1, Y_1(i, j-1) - e_1], \

X_2(i, j) &= \max[M(i-1, j) - d_2, X_2(i-1, j) - e_2], \

Y_2(i, j) &= \max[M(i, j-1) - d_2, Y_2(i, j-1) - e_2].

\end{aligned}