CBMS-2023-12
题目来源:Problem 12 日期:2024-07-11 题目主题:Math-概率论-随机序列中的字符串出现概率
具体题目
We would like to know the probability that the string appears in a random protein sequence. Assume the sequence has random independent letters from the protein alphabet , and letter has probability . Define to be the probability that appears in a random sequence of length , given that the sequence ends with , where is a length-4 string.
-
What is when ?
-
What is when ?
Define to be without its final letter. Define to be with letter prepended to it. These definitions may be useful to answer the following questions.
-
What is , in terms of , where is any length-4 string, when ?
-
What is when ?
Define to be the product of the probabilities of the letters in . Define to be the set of all possible length-4 strings. These definitions may be useful to answer the following question.
- What is the probability that appears in a random sequence of length , in terms of ?
正确解答
Question 1
when :
If , then has not appeared by the time the sequence reaches length 4 with ending . Therefore, the probability .
Question 2
when :
If , then the sequence exactly matches , meaning has appeared. Therefore, the probability .
Question 3
in terms of when :
To find , we need to consider all possible sequences of length that could lead to a sequence ending with when an additional letter is appended. Specifically, depends on for all possible letters .
Since , we have:
Question 4
when :
If the sequence ends in at length , then has appeared, so .
Question 5
The probability that appears in a random sequence of length :
To find the total probability that appears in a sequence of length , we will consider 3 cases:
-
: In this case, cannot appear, so the probability is 0.
-
: The probability that appears in a sequence of length 4 is given by .
-
: The probability that appears in a sequence of length is given by:
where is the product of the probabilities of the letters in .
Therefore, the probability that appears in a random sequence of length can be concluded as follows:
知识点
难点解题思路
对于随机序列中特定字符串的出现概率问题,关键在于递推关系的构建以及边界条件的处理。通过定义恰当的递推公式,可以逐步推导出所需概率。
解题技巧和信息
- 构建递推关系,考虑当前状态如何从前一个状态转移。
- 确定边界条件,并基于这些条件初始化递推公式。
- 注意概率的加总,确保所有可能的转移情况都被考虑在内。
重点词汇
- Probability: 概率
- Sequence: 序列
- Random: 随机的
- Independent: 独立的
- Recurrence relation: 递推关系
参考资料
- Introduction to Probability Models, Chapter 3
- Probability and Statistics for Engineers and Scientists, Chapter 5