CBMS-2016-12

题目来源：[[文字版题库/CBMS/2016#Problem 12|2016#Problem 12]] 日期：2024-07-31 题目主题：CS-算法-动态规划

解题思路

在解决这个问题时，我们需要利用动态规划的思想和方法来求解两个序列的全局比对问题。该问题的核心是使用一个递推公式来迭代计算最大得分，并且通过记录每一步选择的来源来构建最终的对齐路径。

Solution

1. Show the general form of $g (k)$

The general form of $g (k)$ , which is the score given to a gap of length $k$ , is typically represented as $g (k) = - d k$ where $d$ is a positive penalty for each gap. This linear form assumes a constant penalty for each gap, reflecting the simple gap penalty model.

2. Show the initial values $F (i, 0)$ for $i = 1, \dots, m$ and $F (0, j)$ for $j = 1, \dots, n$ , so that the maximum score of the alignments of the two sequences $x_{1}, \dots, x_{m}$ and $y_{1}, \dots, y_{n}$ is obtained as $F (m, n)$ after updating the iterative formula (A) for $i = 1, \dots, m$ and $j = 1, \dots, n$ . Notice that $F (0, 0) = 0$

The initial values are determined based on the penalty for gaps. Specifically:

F (i, 0) = - d i for i = 1, \dots, m

F (0, j) = - d j for j = 1, \dots, n

This initialization reflects the cumulative penalty for introducing gaps in either sequence up to length $i$ or $j$ .

3. Evaluate the computational time of calculating the maximum score of the alignments of the two sequences $x_{1}, \dots, x_{m}$ and $y_{1}, \dots, y_{n}$ , and describe it by using $m$ and $n$

The computational time of calculating the maximum score of the alignments involves filling in an $m \times n$ matrix $F$ . For each cell $(i, j)$ in this matrix, we compute the value using the iterative formula, which takes constant time $O (1)$ . Since there are $m \times n$ cells, the overall time complexity is:

O (mn)

4. Briefly explain, within five lines, about the role of $π (i, j)$ in the alignment algorithm

The role of $π (i, j)$ is to record the source of the maximum value for $F (i, j)$ , indicating the optimal move that leads to the current cell. This allows us to trace back from $F (m, n)$ to $F (0, 0)$ to reconstruct the optimal alignment path between the two sequences.

知识点

动态规划序列比对全局比对路径回溯复杂度分析

解题技巧和信息

在序列比对中，理解递推公式的三个选择对应的不同操作（匹配/错配、插入间隙）是非常重要的。
初始化边界条件可以帮助你理解和构建完整的动态规划表。
通过路径回溯可以重构出最优的对齐方式，而不仅仅是求得最大得分。

重点词汇

alignment 对齐
sequence 序列
dynamic programming 动态规划
gap penalty 间隙惩罚
traceback 回溯

参考资料

Durbin, R., Eddy, S., Krogh, A., & Mitchison, G. (1998). Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press. Chap. 2

Zephyr's Notes on ISCS & CBMS, UTokyo

Explorer

Explorer

CBMS-2016-12

CBMS-2016-12

解题思路

Solution

1. Show the general form of $g (k)$

3. Evaluate the computational time of calculating the maximum score of the alignments of the two sequences $x_{1}, \dots, x_{m}$ and $y_{1}, \dots, y_{n}$ , and describe it by using $m$ and $n$

4. Briefly explain, within five lines, about the role of $π (i, j)$ in the alignment algorithm

知识点

解题技巧和信息

重点词汇

参考资料

Graph View

Table of Contents

Backlinks

Zephyr's Notes on ISCS & CBMS, UTokyo

Explorer

CBMS-2016-12

CBMS-2016-12

解题思路

Solution

1. Show the general form of g(k)

2. Show the initial values F(i,0) for i=1,…,m and F(0,j) for j=1,…,n, so that the maximum score of the alignments of the two sequences x1​,…,xm​ and y1​,…,yn​ is obtained as F(m,n) after updating the iterative formula (A) for i=1,…,m and j=1,…,n. Notice that F(0,0)=0

3. Evaluate the computational time of calculating the maximum score of the alignments of the two sequences x1​,…,xm​ and y1​,…,yn​, and describe it by using m and n

4. Briefly explain, within five lines, about the role of π(i,j) in the alignment algorithm

知识点

解题技巧和信息

重点词汇

参考资料

Graph View

Table of Contents

Backlinks

1. Show the general form of $g (k)$

3. Evaluate the computational time of calculating the maximum score of the alignments of the two sequences $x_{1}, \dots, x_{m}$ and $y_{1}, \dots, y_{n}$ , and describe it by using $m$ and $n$

4. Briefly explain, within five lines, about the role of $π (i, j)$ in the alignment algorithm