CBMS-2016-12
题目来源:[[文字版题库/CBMS/2016#Problem 12|2016#Problem 12]] 日期:2024-07-31 题目主题:CS-算法-动态规划
解题思路
在解决这个问题时,我们需要利用动态规划的思想和方法来求解两个序列的全局比对问题。该问题的核心是使用一个递推公式来迭代计算最大得分,并且通过记录每一步选择的来源来构建最终的对齐路径。
Solution
1. Show the general form of
The general form of , which is the score given to a gap of length , is typically represented as where is a positive penalty for each gap. This linear form assumes a constant penalty for each gap, reflecting the simple gap penalty model.
2. Show the initial values for and for , so that the maximum score of the alignments of the two sequences and is obtained as after updating the iterative formula (A) for and . Notice that
The initial values are determined based on the penalty for gaps. Specifically:
This initialization reflects the cumulative penalty for introducing gaps in either sequence up to length or .
3. Evaluate the computational time of calculating the maximum score of the alignments of the two sequences and , and describe it by using and
The computational time of calculating the maximum score of the alignments involves filling in an matrix . For each cell in this matrix, we compute the value using the iterative formula, which takes constant time . Since there are cells, the overall time complexity is:
4. Briefly explain, within five lines, about the role of in the alignment algorithm
The role of is to record the source of the maximum value for , indicating the optimal move that leads to the current cell. This allows us to trace back from to to reconstruct the optimal alignment path between the two sequences.
知识点
解题技巧和信息
- 在序列比对中,理解递推公式的三个选择对应的不同操作(匹配/错配、插入间隙)是非常重要的。
- 初始化边界条件可以帮助你理解和构建完整的动态规划表。
- 通过路径回溯可以重构出最优的对齐方式,而不仅仅是求得最大得分。
重点词汇
- alignment 对齐
- sequence 序列
- dynamic programming 动态规划
- gap penalty 间隙惩罚
- traceback 回溯
参考资料
- Durbin, R., Eddy, S., Krogh, A., & Mitchison, G. (1998). Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press. Chap. 2