IS CS-2020S1-01

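Source: Problem 1 Date: 2024-08-07

Topic: CS / Algorithms / String Matching Algorithms

Approach

This problem is mainly about string matching algorithms, covering brute-force matching and hash-based matching. We need to analyze the time complexity of these algorithms and design an efficient algorithm for the FIND problem. The solution uses string hashing and rolling hashes, and has to account for both correctness and efficiency.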

Solution

1. Algorithm S Time Complexity

The worst-case time complexity of algorithm S can be expressed as:

$O (ℓ (s) \cdot ℓ (p))$

Explanation:

The outer loop runs $ℓ (s) - ℓ (p) + 1$ times, which is $O (ℓ (s))$ .
For each iteration, the eq function is called, which has a time complexity of $O (ℓ (p))$ .
Therefore, the total time complexity is $O (ℓ (s) \cdot ℓ (p))$ .
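
The bound can be checked empirically. The following is a minimal sketch, assuming algorithm S is the plain left-to-right scan described above with the eq call inlined so that character comparisons can be counted; the function name `algorithm_s` and the test strings are illustrative and not taken from the problem statement.

```python
def algorithm_s(s, p):
    """Naive scan: return (first match index or -1, character comparisons)."""
    n, m = len(s), len(p)
    comparisons = 0
    for i in range(n - m + 1):      # l(s) - l(p) + 1 candidate positions
        j = 0
        while j < m:                # inlined eq(s + i, p)
            comparisons += 1
            if s[i + j] != p[j]:
                break
            j += 1
        if j == m:
            return i, comparisons
    return -1, comparisons


# Adversarial input: every window agrees with p until its last character.
s = "a" * 20
p = "a" * 4 + "b"
print(algorithm_s(s, p))
```

With this input all 16 windows need 5 comparisons each, so the count grows like the product of the two lengths.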

2. Computing $h (s + i + 1, m)$ in $O (1)$ time

To compute $h (s + i + 1, m)$ in $O (1)$ time, we can use the rolling hash technique:

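$h (s + i + 1, m) = ((h^{'} - numval (s [i]) \cdot d_{m}) \cdot d + numval (s [i + m])) \bmod q$

where $h^{'} = h (s + i, m)$ is the hash of the current window and $d_{m} = d^{m - 1} \bmod q$ is a constant that is computed once before the scan.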

Explanation:

We remove the contribution of $s [i]$ from $h^{'}$ .
We multiply the result by $d$ to shift all values left.
We add the contribution of the new character $s [i + m]$ .
We take the modulo $q$ to keep the hash value in the correct range.

This computation can be done in $O (1)$ time as all operations (subtraction, multiplication, addition, and modulo) are assumed to take constant time.
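
As a sanity check, the update can be compared against recomputing the hash from scratch. This is a sketch under assumed parameters: the radix `D = 256`, the modulus `Q = 101` and `numval = ord` are placeholders for the values fixed in the problem statement, and `h` is assumed to be the polynomial hash evaluated by Horner's rule.

```python
D = 256   # assumed radix d
Q = 101   # assumed modulus q


def numval(c):
    return ord(c)  # assumed character-to-number mapping


def h(s, start, m):
    """Direct O(m) hash of s[start:start + m] by Horner's rule."""
    value = 0
    for j in range(m):
        value = (value * D + numval(s[start + j])) % Q
    return value


def roll(prev, s, i, m, d_m):
    """O(1) update from h(s + i, m) to h(s + i + 1, m)."""
    return ((prev - numval(s[i]) * d_m) * D + numval(s[i + m])) % Q


s = "abracadabra"
m = 4
d_m = pow(D, m - 1, Q)          # d^(m-1) mod q, computed once
current = h(s, 0, m)
for i in range(len(s) - m):
    current = roll(current, s, i, m, d_m)
    assert current == h(s, i + 1, m)   # rolled value equals direct value
```

Python's `%` always returns a non-negative result for a positive modulus, so the subtraction needs no extra correction here; in languages where the remainder can be negative, adding `q` before reducing would be required.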

3. Algorithm $H_{0}$

Here’s an algorithm $H_{0}$ that finds the least non-negative integer $i$ satisfying $h (p, ℓ (p)) = h (s + i, ℓ (p))$ in $O (ℓ (s) + ℓ (p))$ time:

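def H_0(s, p):
    # d, q and numval are the radix, modulus and character-value
    # function given in the problem statement.
    m = len(p)
    n = len(s)
    if m > n:
        return -1  # p cannot occur in a shorter string
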
    # Compute h(p, m)
    hp = 0
    for i in range(m):
        hp = (hp * d + numval(p[i])) % q
    
    # Compute h(s, m) for the first m characters of s
    hs = 0
    for i in range(m):
        hs = (hs * d + numval(s[i])) % q
    
    # Precompute d^(m-1)
    d_m = pow(d, m-1, q)
    
    # Check the first position
    if hs == hp:
        return 0
    
    # Check the remaining positions using rolling hash
    for i in range(1, n - m + 1):
        hs = ((hs - numval(s[i-1]) * d_m) * d + numval(s[i+m-1])) % q
        if hs == hp:
            return i
    
    return -1

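Algorithm $H_{0}$ runs in $O (ℓ (s) + ℓ (p))$ time: the two initial hashes cost $O (ℓ (p))$, and each of the at most $ℓ (s)$ rolling updates costs $O (1)$. However, $H_{0}$ may output a value that is not the solution of problem FIND when there is a hash collision: a substring of $s$ that differs from $p$ can still have the same hash value as $p$, and if it occurs before the first true match (or there is no match at all), $H_{0}$ reports its position. Such a spurious hit is a “false positive”.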
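
A concrete false positive is easy to produce with a deliberately tiny modulus. The sketch below assumes decimal digit strings with radix `D = 10`, modulus `Q = 13` and `numval = int`; these values and the function name `h0` are chosen for illustration and are not the parameters of the original problem.

```python
D = 10   # assumed radix: decimal digits
Q = 13   # assumed, deliberately small modulus


def numval(c):
    return int(c)


def h0(s, p):
    """Hash-only search: first i whose window hash equals the hash of p."""
    n, m = len(s), len(p)
    if m > n:
        return -1
    hp = hs = 0
    for j in range(m):
        hp = (hp * D + numval(p[j])) % Q
        hs = (hs * D + numval(s[j])) % Q
    d_m = pow(D, m - 1, Q)
    for i in range(n - m + 1):
        if hs == hp:
            return i                      # no verification: may be spurious
        if i + m < n:
            hs = ((hs - numval(s[i]) * d_m) * D + numval(s[i + m])) % Q
    return -1


s, p = "6739931415", "31415"
print(h0(s, p))     # position reported by the hash-only search
print(s.find(p))    # position of the true first occurrence
```

Under these assumed parameters the hash-only search reports index 0, because 67399 and 31415 both leave remainder 7 modulo 13, while the true first occurrence starts at index 5.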

4. Algorithm $H$

Here’s an algorithm $H$ that satisfies all the given conditions:

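def H(s, p):
    # d, q, numval and eq are given in the problem statement.
    m = len(p)
    n = len(s)
    if m > n:
        return -1  # p cannot occur in a shorter string
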
    # Compute h(p, m)
    hp = 0
    for i in range(m):
        hp = (hp * d + numval(p[i])) % q
    
    # Compute h(s, m) for the first m characters of s
    hs = 0
    for i in range(m):
        hs = (hs * d + numval(s[i])) % q
    
    # Precompute d^(m-1)
    d_m = pow(d, m-1, q)
    
    # Check the first position
    if hs == hp and eq(s, p) == 1:
        return 0
    
    # Check the remaining positions using rolling hash
    for i in range(1, n - m + 1):
        hs = ((hs - numval(s[i-1]) * d_m) * d + numval(s[i+m-1])) % q
        # s[i:i + m] is the window that the pointer expression s + i denotes
        if hs == hp and eq(s[i:i + m], p) == 1:
            return i
    
    return -1

This algorithm satisfies all the given conditions:

(a) It always answers the solution of problem FIND because it uses the eq function to verify matches.

(b) It uses the hash $h (s, m)$ and function $eq (r, p)$ .

The running time of $H$ exceeds $O (ℓ (s) + ℓ (p))$ when many spurious hits occur, that is, positions $i$ where $h (s + i, ℓ (p)) = h (p, ℓ (p))$ but the substring differs from $p$. Each spurious hit triggers a call to $eq (s + i, p)$ costing up to $O (ℓ (p))$, and the search continues afterwards.

In the worst case every position is a spurious hit, so $eq$ runs at all $ℓ (s) - ℓ (p) + 1$ positions and the worst-case time complexity of $H$ is $O (ℓ (s) \cdot ℓ (p))$, the same as algorithm $S$.
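
The effect of collisions on the running time can be made visible by counting the character comparisons spent in verification. This is a sketch, not the problem's reference solution: `find_h` is a hypothetical self-contained variant of $H$ with the eq call inlined, `d` and `q` are passed as parameters with assumed defaults, and `q=1` is used only as a degenerate modulus that forces every window to collide.

```python
def numval(c):
    return ord(c)  # assumed character-to-number mapping


def find_h(s, p, d=256, q=101):
    """Hash filter plus verification; returns (index, char comparisons)."""
    n, m = len(s), len(p)
    comparisons = 0
    if m == 0 or m > n:
        return (0 if m == 0 else -1), comparisons
    hp = hs = 0
    for j in range(m):
        hp = (hp * d + numval(p[j])) % q
        hs = (hs * d + numval(s[j])) % q
    d_m = pow(d, m - 1, q)
    for i in range(n - m + 1):
        if hs == hp:
            j = 0
            while j < m:                 # inlined eq(s + i, p)
                comparisons += 1
                if s[i + j] != p[j]:
                    break
                j += 1
            if j == m:
                return i, comparisons
        if i + m < n:
            hs = ((hs - numval(s[i]) * d_m) * d + numval(s[i + m])) % q
    return -1, comparisons


s = "a" * 200
p = "a" * 9 + "b"
print(find_h(s, p, q=101))  # hashes differ, so no verification is needed
print(find_h(s, p, q=1))    # every window collides and is verified in full
```

With `q=101` the pattern hash never equals a window hash on this input, so no comparisons are made; with `q=1` all 191 windows are verified at 10 comparisons each, which is the quadratic behaviour described above.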

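Key Concepts

String matching, hash functions, rolling hash, complexity analysis

Problem-Solving Tips and Notes

When analyzing string matching algorithms, consider the worst-case time complexity.
Rolling hash is an efficient technique that updates a hash value in O(1) time.
When using a hash function, take care to handle hash collisions.
An algorithm's time complexity can vary with the characteristics of the input, so different cases need to be considered.

Key Vocabulary (English and Chinese)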

string matching 字符串匹配
hash function 哈希函数
rolling hash 滚动哈希
time complexity 时间复杂度
worst-case scenario 最坏情况
hash collision 哈希冲突

References

Introduction to Algorithms (CLRS), Chapter 32: String Matching
Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology by Dan Gusfield

Zephyr's Notes on ISCS & CBMS, UTokyo
