CBMS-2015-12

题目来源Question 12 日期:2024-08-01 题目主题:CS-算法-动态规划-序列比对

解题思路

这道题目主要涉及序列比对中的动态规划算法,特别是关于全局比对和空位罚分的计算。我们需要:

  1. 分析给定的递推公式,推导出空位罚分函数的一般形式
  2. 根据给定的初始条件,确定边界情况的初始值
  3. 解释生物序列比对中常用的参数设置
  4. 分析更复杂的空位罚分模型,并推导其一般形式

关键是要理解动态规划算法在序列比对中的应用,以及不同空位罚分模型的意义和影响。

Solution

1. General form of the gap score

To find the general form of , we need to analyze the recursive formulas for and .

For a gap of length , the score will be:

Explanation:

  • The first gap opening costs
  • Each subsequent gap extension costs
  • For a gap of length , we have 1 opening and extensions

Therefore, the general form of is:

2. Initial values for and

Given the initial conditions:

We need to determine for and for .

For :

Therefore, for .

For :

Therefore, for .

3. Reason for setting in biological sequence alignment

In biological sequence alignment, we usually set (i.e., the gap opening penalty is larger than the gap extension penalty) for the following reasons:

  1. Biological relevance: In evolutionary processes, it’s more likely for a single mutation event to cause a contiguous gap (insertion or deletion) of multiple residues than for multiple independent single-residue indel events to occur adjacently. Setting reflects this biological reality.

  2. Alignment quality: This setting encourages fewer, longer gaps rather than many short gaps in the alignment. This often leads to more biologically meaningful alignments, as it better represents the evolutionary events that may have occurred.

  3. Computational efficiency: By discouraging the introduction of many small gaps, this setting can reduce the search space for optimal alignments, potentially improving the efficiency of alignment algorithms.

4. General form of gap score for the more complex model

For the more complex model with two sets of gap penalties and , where and , the general form of the gap score will be:

where is the break-even point where the two scoring schemes give the same result.

To find , we solve:

Solving for :

Note that may not be an integer. In practice, we would use the floor or ceiling of this value, depending on the specific requirements of the alignment algorithm.

This more complex model allows for a more nuanced treatment of gap penalties, potentially leading to more accurate alignments, especially for sequences with varying gap length distributions.

知识点

动态规划序列比对全局比对生物信息学

难点思路

  1. 理解动态规划在序列比对中的应用
  2. 分析递推公式以推导空位罚分函数
  3. 理解不同空位罚分模型的生物学意义
  4. 处理更复杂的空位罚分模型,并推导其一般形式

解题技巧和信息

  1. 分析递推公式时,关注 gap opening 和 gap extension 的区别
  2. 初始值的设置对于动态规划算法的正确性至关重要
  3. 在生物序列比对中,参数的设置常常基于生物学知识和经验
  4. 处理分段函数时,寻找分界点(break-even point)是关键

重点词汇

  • Global alignment: 全局比对
  • Dynamic programming: 动态规划
  • Gap penalty: 空位罚分
  • Gap opening penalty: 空位开放罚分
  • Gap extension penalty: 空位延伸罚分
  • Sequence alignment: 序列比对
  • Affine gap penalty: 仿射空位罚分

参考资料

  1. Gusfield, D. (1997). Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press, Chap. 3.