ā Back to Cheat Sheets
String Matching Kmp Rolling Hash
Table of Contents
- # 0) Concept
- # 0-1) Types
- # 0-2) Pattern
- # 1) General form
- # 1-1) Basic OP
- # 1-2) Templates
- # 1-3) Double Hashing (Collision Reduction)
- # 2) LC Example
- # 2-1) Find the Index of the First Occurrence in a String (LC 28)
- # 2-2) Repeated Substring Pattern (LC 459)
- # 2-3) Longest Duplicate Substring (LC 1044)
- # 2-4) Shortest Palindrome (LC 214)
- # 2-5) Distinct Echo Substrings (LC 1316)
- # 2-6) Rotate String (LC 796)
- # 2-7) Implement strStr() Variants
- # 2-8) Longest Happy Prefix (LC 1392)
- # 3) Common Applications
- # 3-1) Pattern Matching Problems
- # 3-2) String Period Detection
- # 3-3) Palindrome Problems
- # 3-4) Duplicate Detection
- # 4) Algorithm Comparison
- # 5) Tips & Tricks
- # 5-1) When to Use KMP
- # 5-2) When to Use Rolling Hash
- # 5-3) Common Patterns
- # 6) Summary
- # Key Takeaways
- # Common LeetCode Problems
- # Interview Tips
# String Matching (KMP, Rolling Hash)
# 0) Concept
# 0-1) Types
String Matching refers to finding occurrences of a pattern within a text efficiently. The two most important algorithms are:
- KMP (Knuth-Morris-Pratt): Uses a failure function to avoid redundant comparisons
- Rolling Hash (Rabin-Karp): Uses polynomial hashing to compare substrings in O(1) time
# 0-2) Pattern
Both algorithms optimize naive string matching from O(nm) to O(n+m) or O(n) where:
- n = length of text
- m = length of pattern
Key Characteristics:
- KMP: Deterministic, no hash collisions, O(n+m) time, O(m) space
- Rolling Hash: Probabilistic, may have collisions, O(n) average time, O(1) space
- When to Use KMP: Single pattern search, guaranteed correctness
- When to Use Rolling Hash: Multiple patterns, substring comparisons, duplicate detection
# 1) General form
# 1-1) Basic OP
# KMP Algorithm Components
-
Failure Function (LPS Array)
- LPS = Longest Proper Prefix which is also Suffix
- Helps skip unnecessary comparisons
- Time: O(m), Space: O(m)
-
Pattern Matching
- Uses LPS array to avoid backtracking in text
- Time: O(n), Space: O(1)
# Rolling Hash Components
-
Hash Calculation
- Polynomial hash: hash = (cā Ć base^(m-1) + cā Ć base^(m-2) + ⦠+ cā) % mod
- Base: typically 256 or prime number
- Mod: large prime to reduce collisions
-
Rolling Window
- Remove leftmost character: hash = (hash - cā Ć base^(m-1)) % mod
- Add rightmost character: hash = (hash Ć base + cā) % mod
# 1-2) Templates
# KMP Template
def kmp_search(text, pattern):
"""Find all occurrences of pattern in text using KMP"""
def build_lps(pattern):
"""Build Longest Prefix Suffix array"""
m = len(pattern)
lps = [0] * m
length = 0 # length of previous longest prefix suffix
i = 1
while i < m:
if pattern[i] == pattern[length]:
length += 1
lps[i] = length
i += 1
else:
if length != 0:
# Don't increment i, try with shorter prefix
length = lps[length - 1]
else:
lps[i] = 0
i += 1
return lps
if not pattern:
return [0]
n, m = len(text), len(pattern)
lps = build_lps(pattern)
matches = []
i = j = 0 # i for text, j for pattern
while i < n:
if text[i] == pattern[j]:
i += 1
j += 1
if j == m:
# Found pattern at index i-j
matches.append(i - j)
j = lps[j - 1]
elif i < n and text[i] != pattern[j]:
if j != 0:
j = lps[j - 1]
else:
i += 1
return matches
// Java KMP Implementation
public class KMP {
public List<Integer> search(String text, String pattern) {
List<Integer> matches = new ArrayList<>();
if (pattern.isEmpty()) return matches;
int[] lps = buildLPS(pattern);
int n = text.length(), m = pattern.length();
int i = 0, j = 0;
while (i < n) {
if (text.charAt(i) == pattern.charAt(j)) {
i++;
j++;
}
if (j == m) {
matches.add(i - j);
j = lps[j - 1];
} else if (i < n && text.charAt(i) != pattern.charAt(j)) {
if (j != 0) {
j = lps[j - 1];
} else {
i++;
}
}
}
return matches;
}
private int[] buildLPS(String pattern) {
int m = pattern.length();
int[] lps = new int[m];
int length = 0;
int i = 1;
while (i < m) {
if (pattern.charAt(i) == pattern.charAt(length)) {
length++;
lps[i] = length;
i++;
} else {
if (length != 0) {
length = lps[length - 1];
} else {
lps[i] = 0;
i++;
}
}
}
return lps;
}
}
# Rolling Hash Template
class RollingHash:
"""Rolling hash for efficient substring matching"""
def __init__(self, s, base=256, mod=10**9 + 7):
self.s = s
self.n = len(s)
self.base = base
self.mod = mod
# Precompute hash values for all prefixes
self.hash_values = [0] * (self.n + 1)
self.base_powers = [1] * (self.n + 1)
for i in range(self.n):
self.hash_values[i + 1] = (self.hash_values[i] * base + ord(s[i])) % mod
self.base_powers[i + 1] = (self.base_powers[i] * base) % mod
def get_hash(self, left, right):
"""Get hash of substring s[left:right+1] in O(1)"""
length = right - left + 1
result = (self.hash_values[right + 1] -
self.hash_values[left] * self.base_powers[length]) % self.mod
return result if result >= 0 else result + self.mod
def compare_substrings(self, l1, r1, l2, r2):
"""Compare two substrings in O(1) using hash"""
return (r1 - l1 == r2 - l2 and
self.get_hash(l1, r1) == self.get_hash(l2, r2))
def rabin_karp_search(text, pattern):
"""Find all occurrences of pattern using Rabin-Karp"""
if not pattern or len(pattern) > len(text):
return []
base = 256
mod = 10**9 + 7
n, m = len(text), len(pattern)
# Calculate base^(m-1) % mod
base_power = pow(base, m - 1, mod)
# Calculate initial hashes
pattern_hash = 0
text_hash = 0
for i in range(m):
pattern_hash = (pattern_hash * base + ord(pattern[i])) % mod
text_hash = (text_hash * base + ord(text[i])) % mod
matches = []
# Slide pattern over text
for i in range(n - m + 1):
# If hash matches, verify character by character
if pattern_hash == text_hash:
if text[i:i+m] == pattern:
matches.append(i)
# Calculate hash for next window
if i < n - m:
text_hash = (text_hash - ord(text[i]) * base_power) % mod
text_hash = (text_hash * base + ord(text[i + m])) % mod
if text_hash < 0:
text_hash += mod
return matches
// Java Rolling Hash Implementation
public class RollingHash {
private static final long BASE = 256;
private static final long MOD = 1_000_000_007;
public List<Integer> search(String text, String pattern) {
List<Integer> matches = new ArrayList<>();
int n = text.length(), m = pattern.length();
if (m > n) return matches;
long basePower = 1;
for (int i = 0; i < m - 1; i++) {
basePower = (basePower * BASE) % MOD;
}
long patternHash = 0, textHash = 0;
// Calculate initial hashes
for (int i = 0; i < m; i++) {
patternHash = (patternHash * BASE + pattern.charAt(i)) % MOD;
textHash = (textHash * BASE + text.charAt(i)) % MOD;
}
// Slide pattern over text
for (int i = 0; i <= n - m; i++) {
if (patternHash == textHash) {
// Verify character by character
if (text.substring(i, i + m).equals(pattern)) {
matches.add(i);
}
}
if (i < n - m) {
textHash = (textHash - text.charAt(i) * basePower % MOD + MOD) % MOD;
textHash = (textHash * BASE + text.charAt(i + m)) % MOD;
}
}
return matches;
}
}
# 1-3) Double Hashing (Collision Reduction)
class DoubleHash:
"""Double hashing to minimize collision probability"""
def __init__(self, s):
self.s = s
self.n = len(s)
# Two different bases and moduli
self.base1, self.mod1 = 257, 10**9 + 7
self.base2, self.mod2 = 263, 10**9 + 9
# Precompute for both hash functions
self.hash1 = [0] * (self.n + 1)
self.hash2 = [0] * (self.n + 1)
self.pow1 = [1] * (self.n + 1)
self.pow2 = [1] * (self.n + 1)
for i in range(self.n):
self.hash1[i + 1] = (self.hash1[i] * self.base1 + ord(s[i])) % self.mod1
self.hash2[i + 1] = (self.hash2[i] * self.base2 + ord(s[i])) % self.mod2
self.pow1[i + 1] = (self.pow1[i] * self.base1) % self.mod1
self.pow2[i + 1] = (self.pow2[i] * self.base2) % self.mod2
def get_hash(self, left, right):
"""Get double hash tuple for substring"""
length = right - left + 1
h1 = (self.hash1[right + 1] - self.hash1[left] * self.pow1[length]) % self.mod1
h2 = (self.hash2[right + 1] - self.hash2[left] * self.pow2[length]) % self.mod2
return (h1 if h1 >= 0 else h1 + self.mod1,
h2 if h2 >= 0 else h2 + self.mod2)
# 2) LC Example
# 2-1) Find the Index of the First Occurrence in a String (LC 28)
# KMP Solution
def strStr(haystack, needle):
"""LC 28: Implement strStr() using KMP"""
if not needle:
return 0
def build_lps(pattern):
m = len(pattern)
lps = [0] * m
length = 0
i = 1
while i < m:
if pattern[i] == pattern[length]:
length += 1
lps[i] = length
i += 1
else:
if length != 0:
length = lps[length - 1]
else:
lps[i] = 0
i += 1
return lps
n, m = len(haystack), len(needle)
lps = build_lps(needle)
i = j = 0
while i < n:
if haystack[i] == needle[j]:
i += 1
j += 1
if j == m:
return i - j
elif i < n and haystack[i] != needle[j]:
if j != 0:
j = lps[j - 1]
else:
i += 1
return -1
# Rolling Hash Solution
def strStr_hash(haystack, needle):
"""LC 28: Using rolling hash"""
if not needle:
return 0
if len(needle) > len(haystack):
return -1
base = 256
mod = 10**9 + 7
n, m = len(haystack), len(needle)
base_power = pow(base, m - 1, mod)
needle_hash = 0
hay_hash = 0
for i in range(m):
needle_hash = (needle_hash * base + ord(needle[i])) % mod
hay_hash = (hay_hash * base + ord(haystack[i])) % mod
for i in range(n - m + 1):
if needle_hash == hay_hash:
if haystack[i:i+m] == needle:
return i
if i < n - m:
hay_hash = (hay_hash - ord(haystack[i]) * base_power) % mod
hay_hash = (hay_hash * base + ord(haystack[i + m])) % mod
if hay_hash < 0:
hay_hash += mod
return -1
# 2-2) Repeated Substring Pattern (LC 459)
def repeatedSubstringPattern(s):
"""LC 459: Check if string is made of repeated pattern"""
# Method 1: KMP-based
def kmp_solution(s):
n = len(s)
lps = [0] * n
length = 0
i = 1
while i < n:
if s[i] == s[length]:
length += 1
lps[i] = length
i += 1
else:
if length != 0:
length = lps[length - 1]
else:
lps[i] = 0
i += 1
# If lps[n-1] != 0 and n % (n - lps[n-1]) == 0
# then string has repeating pattern
return lps[n - 1] != 0 and n % (n - lps[n - 1]) == 0
# Method 2: String rotation trick
def rotation_solution(s):
# If s is repeated pattern, it will appear in (s+s)[1:-1]
return s in (s + s)[1:-1]
return kmp_solution(s)
# 2-3) Longest Duplicate Substring (LC 1044)
def longestDupSubstring(s):
"""LC 1044: Find longest duplicate substring using binary search + rolling hash"""
def has_duplicate(length):
"""Check if there's duplicate substring of given length"""
base = 256
mod = 2**63 - 1
# Calculate base^length
base_power = pow(base, length, mod)
# Initial hash
current_hash = 0
for i in range(length):
current_hash = (current_hash * base + ord(s[i])) % mod
seen = {current_hash}
# Rolling hash
for i in range(length, len(s)):
# Remove leftmost character
current_hash = (current_hash - ord(s[i - length]) * base_power) % mod
# Add rightmost character
current_hash = (current_hash * base + ord(s[i])) % mod
if current_hash in seen:
return i - length + 1
seen.add(current_hash)
return -1
# Binary search on length
left, right = 0, len(s) - 1
result_start = -1
while left <= right:
mid = (left + right) // 2
start_pos = has_duplicate(mid)
if start_pos != -1:
result_start = start_pos
left = mid + 1
else:
right = mid - 1
return s[result_start:result_start + right] if result_start != -1 else ""
# 2-4) Shortest Palindrome (LC 214)
def shortestPalindrome(s):
"""LC 214: Find shortest palindrome by adding chars to front"""
# KMP approach: Find longest palindromic prefix
def kmp_solution(s):
# Create pattern: s + "#" + reverse(s)
# Find longest prefix of s that matches suffix of reverse(s)
rev = s[::-1]
new_s = s + "#" + rev
# Build LPS array
n = len(new_s)
lps = [0] * n
length = 0
i = 1
while i < n:
if new_s[i] == new_s[length]:
length += 1
lps[i] = length
i += 1
else:
if length != 0:
length = lps[length - 1]
else:
lps[i] = 0
i += 1
# lps[-1] gives length of longest palindromic prefix
palindrome_len = lps[-1]
# Add reverse of remaining suffix to front
return rev[:len(s) - palindrome_len] + s
return kmp_solution(s)
# 2-5) Distinct Echo Substrings (LC 1316)
def distinctEchoSubstrings(text):
"""LC 1316: Count distinct echo substrings (repeated twice consecutively)"""
rh = RollingHash(text)
seen = set()
count = 0
# Try all possible lengths (must be even)
for length in range(2, len(text) + 1, 2):
half = length // 2
for start in range(len(text) - length + 1):
mid = start + half
end = start + length - 1
# Compare first half with second half
if rh.compare_substrings(start, mid - 1, mid, end):
substring_hash = rh.get_hash(start, end)
if substring_hash not in seen:
seen.add(substring_hash)
count += 1
return count
# 2-6) Rotate String (LC 796)
def rotateString(s, goal):
"""LC 796: Check if goal is rotation of s"""
# If goal is rotation of s, it will be in s+s
if len(s) != len(goal):
return False
# Method 1: Simple check
return goal in (s + s)
# Method 2: KMP approach
def kmp_approach():
if len(s) != len(goal):
return False
# Search for goal in s+s
matches = kmp_search(s + s, goal)
return len(matches) > 0
return kmp_approach()
# 2-7) Implement strStr() Variants
# Count all occurrences
def count_occurrences(text, pattern):
"""Count all occurrences of pattern in text"""
matches = kmp_search(text, pattern)
return len(matches)
# Find all overlapping occurrences
def find_all_overlapping(text, pattern):
"""Find all overlapping occurrences (including overlaps)"""
n, m = len(text), len(pattern)
lps = build_lps(pattern)
matches = []
i = j = 0
while i < n:
if text[i] == pattern[j]:
i += 1
j += 1
if j == m:
matches.append(i - j)
j = lps[j - 1] # Continue to find overlapping matches
elif i < n and text[i] != pattern[j]:
if j != 0:
j = lps[j - 1]
else:
i += 1
return matches
# Multiple pattern matching
def multiple_pattern_search(text, patterns):
"""Search for multiple patterns in text"""
results = {}
for pattern in patterns:
results[pattern] = kmp_search(text, pattern)
return results
# 2-8) Longest Happy Prefix (LC 1392)
def longestPrefix(s):
"""LC 1392: Find longest happy prefix (prefix == suffix)"""
n = len(s)
lps = [0] * n
length = 0
i = 1
while i < n:
if s[i] == s[length]:
length += 1
lps[i] = length
i += 1
else:
if length != 0:
length = lps[length - 1]
else:
lps[i] = 0
i += 1
# lps[-1] gives length of longest prefix which is also suffix
return s[:lps[-1]]
# 3) Common Applications
# 3-1) Pattern Matching Problems
- Find first/all occurrences
- Multiple pattern search
- Substring with wildcard matching
# 3-2) String Period Detection
- Repeated substring pattern
- String rotation
- Cyclic string comparison
# 3-3) Palindrome Problems
- Shortest palindrome
- Palindromic prefix/suffix
# 3-4) Duplicate Detection
- Longest duplicate substring
- Distinct substrings
- Echo substrings
# 4) Algorithm Comparison
| Algorithm | Time | Space | Pros | Cons |
|---|---|---|---|---|
| KMP | O(n+m) | O(m) | Deterministic, no collisions | More complex implementation |
| Rolling Hash | O(n) avg | O(1) | Simple, multiple patterns | Hash collisions possible |
| Naive | O(nm) | O(1) | Very simple | Too slow for large inputs |
# 5) Tips & Tricks
# 5-1) When to Use KMP
- Single pattern matching
- Need guaranteed correctness
- Pattern has repeating structure
- LC 28, 214, 459, 1392
# 5-2) When to Use Rolling Hash
- Multiple pattern search
- Substring comparison
- Duplicate detection
- LC 1044, 1316, 718, 1062
# 5-3) Common Patterns
# Pattern 1: String in (s+s) for rotation
def is_rotation(s1, s2):
return len(s1) == len(s2) and s2 in (s1 + s1)
# Pattern 2: LPS for periodicity
def has_period(s):
lps = build_lps(s)
n = len(s)
return lps[n-1] != 0 and n % (n - lps[n-1]) == 0
# Pattern 3: Binary search + hash for optimization
def find_longest(s, condition):
left, right = 0, len(s)
result = -1
while left <= right:
mid = (left + right) // 2
if check_with_hash(s, mid, condition):
result = mid
left = mid + 1
else:
right = mid - 1
return result
# 6) Summary
# Key Takeaways
- KMP is optimal for single pattern matching with O(n+m) time
- Rolling Hash excels at substring comparison in O(1) amortized time
- Use double hashing to minimize collision probability
- Binary search + hash is powerful for optimization problems
- LPS array reveals string periodicity and structure
# Common LeetCode Problems
| Problem | Number | Algorithm | Difficulty |
|---|---|---|---|
| Implement strStr() | 28 | KMP/Hash | Easy |
| Shortest Palindrome | 214 | KMP | Hard |
| Repeated Substring Pattern | 459 | KMP | Easy |
| Rotate String | 796 | String | Easy |
| Longest Duplicate Substring | 1044 | Hash+BS | Hard |
| Distinct Echo Substrings | 1316 | Hash | Hard |
| Longest Happy Prefix | 1392 | KMP | Hard |
| Maximum Repeating Substring | 1668 | String | Easy |
# Interview Tips
- Always clarify if overlapping matches are allowed
- Consider string length constraints (KMP vs naive)
- For hash, explain collision handling strategy
- Time complexity: O(n+m) for KMP, O(n) for hash
- Space complexity: O(m) for KMP, O(1) for basic hash