Hashing & Counting
Overview
Hashing & Counting techniques use hash tables and frequency maps to solve problems involving counting, grouping, and fast lookups.
Key Properties
- Time Complexity: O(1) average for hash operations, O(n) for full traversal
- Space Complexity: O(n) for hash table storage
- Core Idea: Trade space for time using hash table data structures
- When to Use: Fast lookups, frequency counting, duplicate detection, grouping
- Key Data Structures: HashMap, HashSet, Counter, defaultdict
Core Characteristics
- Fast Lookups: O(1) average case for search/insert/delete
- Frequency Tracking: Count occurrences of elements
- Duplicate Detection: Identify seen elements
- Grouping: Collect items with same properties
- Rolling Hash: Efficient string matching and substring problems
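Below is a minimal sketch of the first four characteristics using Python's built-in hash containers (rolling hash is covered in Template 2 later); the sample list and groupings are purely illustrative.

```python
from collections import Counter, defaultdict

nums = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]  # illustrative data

# Fast lookups / duplicate detection: set membership is O(1) on average
seen = set(nums)
assert 5 in seen and 7 not in seen

# Frequency tracking: count occurrences of each element
freq = Counter(nums)
assert freq[5] == 3

# Grouping: collect items that share a property (parity here)
groups = defaultdict(list)
for n in nums:
    groups[n % 2].append(n)
assert groups[0] == [4, 2, 6]
```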
Problem Categories
Category 1: Frequency Maps
- Description: Count occurrences and group by frequency
- Examples: LC 242 (Valid Anagram), LC 49 (Group Anagrams), LC 169 (Majority Element)
- Pattern: Use HashMap to count frequencies, then analyze counts
Category 2: Prefix Hash / Rolling Hash
- Description: Efficient string matching using hash functions
- Examples: LC 28 (Find Index), LC 187 (Repeated DNA), LC 1044 (Longest Duplicate Substring)
- Pattern: Compute rolling hash for sliding windows
Category 3: HashSet for Seen States
- Description: Track visited elements to detect patterns or cycles
- Examples: LC 202 (Happy Number), LC 141 (Linked List Cycle), LC 128 (Longest Consecutive)
- Pattern: Use HashSet to remember seen states
Templates & Algorithms
Template Comparison Table
| Template Type | Use Case | Time Complexity | When to Use |
|---|---|---|---|
| Frequency Counter | Count elements | O(n) | Anagrams, duplicates |
| Rolling Hash | String matching | O(n+m) | Substring search |
| Seen States | Cycle detection | O(n) | Detect patterns |
| Group by Hash | Categorization | O(n) | Grouping similar items |
Template 1: Frequency Counter
```python
def frequency_counter_template(arr):
    """Basic frequency counting template."""
    from collections import Counter, defaultdict

    # Method 1: Using Counter
    count = Counter(arr)

    # Method 2: Using defaultdict
    freq = defaultdict(int)
    for item in arr:
        freq[item] += 1

    # Method 3: Manual counting with dict.get
    manual_count = {}
    for item in arr:
        manual_count[item] = manual_count.get(item, 0) + 1

    return count, freq, manual_count
```
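As a quick usage example, here is one way the frequency-counter pattern can solve LC 169 (Majority Element); the function name mirrors the LeetCode signature, and the assert is only an illustrative check.

```python
from collections import Counter

def majorityElement(nums):
    """LC 169: the majority element appears more than len(nums) // 2 times."""
    count = Counter(nums)
    # most_common(1) returns a single (element, frequency) pair
    return count.most_common(1)[0][0]

assert majorityElement([2, 2, 1, 1, 1, 2, 2]) == 2
```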
Template 2: Rolling Hash (Rabin-Karp)
```python
def rolling_hash_template(text, pattern):
    """Rolling hash (Rabin-Karp) pattern matching; returns the first match index or -1."""
    if len(pattern) > len(text):
        return -1

    # Hash function parameters
    base = 256
    mod = 10**9 + 7

    def compute_hash(s, length):
        """Compute the hash of the first `length` characters of s."""
        hash_val = 0
        for i in range(length):
            hash_val = (hash_val * base + ord(s[i])) % mod
        return hash_val

    def rolling_hash(old_hash, old_char, new_char, base_power):
        """Update the hash by removing old_char from the front and appending new_char."""
        new_hash = (old_hash - ord(old_char) * base_power) % mod
        new_hash = (new_hash * base + ord(new_char)) % mod
        return new_hash

    pattern_len = len(pattern)
    pattern_hash = compute_hash(pattern, pattern_len)
    text_hash = compute_hash(text, pattern_len)

    # Precompute base^(pattern_len - 1) % mod for removing the leading character
    base_power = pow(base, pattern_len - 1, mod)

    # Check the first window
    if pattern_hash == text_hash and text[:pattern_len] == pattern:
        return 0

    # Roll the hash across the remaining windows
    for i in range(len(text) - pattern_len):
        text_hash = rolling_hash(text_hash, text[i], text[i + pattern_len], base_power)
        if pattern_hash == text_hash and text[i + 1:i + 1 + pattern_len] == pattern:
            return i + 1

    return -1
```
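A quick, illustrative sanity check of the template (the inputs are made up):

```python
assert rolling_hash_template("hello world", "world") == 6  # first match at index 6
assert rolling_hash_template("aaaa", "b") == -1            # no match
```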
Template 3: HashSet for Cycle Detection
```python
def cycle_detection_template(start_value, next_function, is_terminal=lambda x: False):
    """Detect a cycle using a HashSet of seen states.

    Returns True if a cycle is detected, False if a terminal state is reached first.
    """
    seen = set()
    current = start_value
    while current not in seen:
        seen.add(current)
        current = next_function(current)
        # Optional: stop early when a terminal (non-repeating) state is reached
        if is_terminal(current):
            return False
    return True  # Cycle detected


def floyd_cycle_detection(start_value, next_function, is_terminal=lambda x: False):
    """Floyd's cycle detection (tortoise and hare); returns the start of the cycle, or None."""
    slow = fast = start_value
    # Phase 1: detect whether a cycle exists
    while True:
        slow = next_function(slow)
        fast = next_function(next_function(fast))
        if slow == fast:
            break
        if is_terminal(fast):
            return None  # No cycle
    # Phase 2: find the start of the cycle
    slow = start_value
    while slow != fast:
        slow = next_function(slow)
        fast = next_function(fast)
    return slow  # Start of the cycle
```
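For comparison, here is a minimal sketch of the same tortoise-and-hare idea specialized to LC 141 (Linked List Cycle); the ListNode class is the usual LeetCode-style node, included only to keep the example self-contained.

```python
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def hasCycle(head):
    """LC 141: return True if the linked list contains a cycle."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next        # tortoise: one step
        fast = fast.next.next   # hare: two steps
        if slow is fast:
            return True
    return False

# Build 1 -> 2 -> 3 -> (back to 2) to exercise the cycle path
a, b, c = ListNode(1), ListNode(2), ListNode(3)
a.next, b.next, c.next = b, c, b
assert hasCycle(a) is True
```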
Template 4: Group by Hash Key
```python
def group_by_hash_template(items, key_function):
    """Group items by a hash key derived from key_function."""
    from collections import defaultdict

    groups = defaultdict(list)
    for item in items:
        key = key_function(item)
        groups[key].append(item)
    return dict(groups)


def group_anagrams_template(strs):
    """Group anagrams using the sorted string as the key."""
    from collections import defaultdict

    groups = defaultdict(list)
    for s in strs:
        # Anagrams share the same sorted character sequence
        key = ''.join(sorted(s))
        groups[key].append(s)
    return list(groups.values())
```
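A small, illustrative usage of group_by_hash_template, grouping words by length (len is just one possible key_function):

```python
words = ["hi", "to", "cat", "dog", "house"]
by_length = group_by_hash_template(words, key_function=len)
assert by_length == {2: ["hi", "to"], 3: ["cat", "dog"], 5: ["house"]}
```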
Problems by Pattern
Frequency Maps Problems
| Problem | LC # | Key Technique | Difficulty |
|---|---|---|---|
| Valid Anagram | 242 | Character frequency | Easy |
| Group Anagrams | 49 | Sorted string as key | Medium |
| Majority Element | 169 | Count frequency | Easy |
| Top K Frequent Elements | 347 | Frequency + heap | Medium |
| Find All Anagrams | 438 | Sliding window + freq | Medium |
| Longest Substring Without Repeating | 3 | Sliding window + seen | Medium |
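LC 438 (Find All Anagrams in a String) combines a frequency map with a sliding window, a pairing not worked out elsewhere in this sheet, so here is a minimal sketch; the variable names are illustrative.

```python
from collections import Counter

def findAnagrams(s, p):
    """LC 438: start indices of p's anagrams in s (sliding window + frequency map)."""
    if len(p) > len(s):
        return []
    need = Counter(p)
    window = Counter(s[:len(p)])
    result = [0] if window == need else []
    for i in range(len(p), len(s)):
        window[s[i]] += 1               # take in the new right character
        window[s[i - len(p)]] -= 1      # drop the old left character
        if window[s[i - len(p)]] == 0:
            del window[s[i - len(p)]]   # keep zero counts out so == works
        if window == need:
            result.append(i - len(p) + 1)
    return result

assert findAnagrams("cbaebabacd", "abc") == [0, 6]
```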
Rolling Hash Problems
| Problem | LC # | Key Technique | Difficulty |
|---|---|---|---|
| Implement strStr() | 28 | Rabin-Karp | Easy |
| Repeated DNA Sequences | 187 | 10-char rolling hash | Medium |
| Longest Duplicate Substring | 1044 | Binary search + rolling hash | Hard |
| Find All Duplicates in Array | 442 | Index hashing | Medium |
HashSet for Seen States Problems
| Problem | LC # | Key Technique | Difficulty |
|---|---|---|---|
| Happy Number | 202 | Detect cycle in sequence | Easy |
| Linked List Cycle | 141 | Fast/slow or HashSet | Easy |
| Longest Consecutive Sequence | 128 | HashSet lookup | Medium |
| Contains Duplicate | 217 | Simple HashSet | Easy |
| Contains Duplicate II | 219 | HashSet with window | Easy |
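The two Contains Duplicate variants above are short enough to sketch inline; a minimal version of each follows (signatures mirror the LeetCode problems, test values are illustrative).

```python
def containsDuplicate(nums):
    """LC 217: True if any value appears at least twice."""
    return len(set(nums)) < len(nums)

def containsNearbyDuplicate(nums, k):
    """LC 219: True if nums[i] == nums[j] with |i - j| <= k, using a sliding-window set."""
    window = set()
    for i, num in enumerate(nums):
        if num in window:
            return True
        window.add(num)
        if len(window) > k:
            window.remove(nums[i - k])  # keep only the previous k elements
    return False

assert containsDuplicate([1, 2, 3, 1]) is True
assert containsNearbyDuplicate([1, 2, 3, 1], 3) is True
assert containsNearbyDuplicate([1, 2, 3, 1, 2, 3], 2) is False
```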
LC Examples
1. Valid Anagram (LC 242)
```python
def isAnagram(s, t):
    """Check if two strings are anagrams."""
    if len(s) != len(t):
        return False

    # Method 1: Frequency counter
    from collections import Counter
    return Counter(s) == Counter(t)

    # Method 2: Manual counting
    count = {}
    for char in s:
        count[char] = count.get(char, 0) + 1
    for char in t:
        if char not in count:
            return False
        count[char] -= 1
        if count[char] == 0:
            del count[char]
    return len(count) == 0

    # Method 3: Sorting (not using hash)
    return sorted(s) == sorted(t)
```
2. Group Anagrams (LC 49)
```python
def groupAnagrams(strs):
    """Group strings that are anagrams."""
    from collections import defaultdict

    # Method 1: Use sorted string as key
    groups = defaultdict(list)
    for s in strs:
        key = ''.join(sorted(s))
        groups[key].append(s)
    return list(groups.values())

    # Method 2: Use frequency tuple as key
    def get_frequency_key(s):
        freq = [0] * 26
        for char in s:
            freq[ord(char) - ord('a')] += 1
        return tuple(freq)

    groups = defaultdict(list)
    for s in strs:
        key = get_frequency_key(s)
        groups[key].append(s)
    return list(groups.values())
```
3. Happy Number (LC 202)
```python
def isHappy(n):
    """Detect whether repeatedly summing squared digits reaches 1 or falls into a cycle."""
    def get_sum_of_squares(num):
        total = 0
        while num > 0:
            digit = num % 10
            total += digit * digit
            num //= 10
        return total

    # Method 1: HashSet to detect a cycle
    seen = set()
    while n != 1 and n not in seen:
        seen.add(n)
        n = get_sum_of_squares(n)
    return n == 1

    # Method 2: Floyd's cycle detection
    def next_number(num):
        return get_sum_of_squares(num)

    slow = fast = n
    while True:
        slow = next_number(slow)
        fast = next_number(next_number(fast))
        if slow == fast:
            break
    return slow == 1
```
4. Longest Consecutive Sequence (LC 128)
```python
def longestConsecutive(nums):
    """Find the length of the longest run of consecutive integers."""
    if not nums:
        return 0

    num_set = set(nums)
    longest = 0
    for num in num_set:
        # Only start counting if num - 1 is not in the set,
        # which guarantees num is the start of a sequence
        if num - 1 not in num_set:
            current_num = num
            current_length = 1
            # Walk forward through consecutive numbers
            while current_num + 1 in num_set:
                current_num += 1
                current_length += 1
            longest = max(longest, current_length)
    return longest
```
5. Repeated DNA Sequences (LC 187)
```python
def findRepeatedDnaSequences(s):
    """Find all 10-character DNA sequences that occur more than once."""
    if len(s) < 10:
        return []

    # Method 1: Simple approach storing each 10-character substring
    seen = set()
    repeated = set()
    for i in range(len(s) - 9):
        substring = s[i:i + 10]
        if substring in seen:
            repeated.add(substring)
        else:
            seen.add(substring)
    return list(repeated)

    # Method 2: Rolling hash approach
    def char_to_num(c):
        return {'A': 0, 'C': 1, 'G': 2, 'T': 3}[c]

    def rolling_hash_dna(s):
        if len(s) < 10:
            return []
        seen = set()
        repeated = set()
        # Compute the hash for the first window; base 4 with 10 characters
        # keeps the hash below 4**10, so no modulus is needed
        hash_val = 0
        base = 4
        for i in range(10):
            hash_val = hash_val * base + char_to_num(s[i])
        seen.add(hash_val)
        base_power = base ** 9
        # Roll the hash across the remaining windows
        for i in range(10, len(s)):
            # Remove the outgoing character and append the incoming one
            hash_val = hash_val - char_to_num(s[i - 10]) * base_power
            hash_val = hash_val * base + char_to_num(s[i])
            if hash_val in seen:
                repeated.add(s[i - 9:i + 1])
            else:
                seen.add(hash_val)
        return list(repeated)

    return rolling_hash_dna(s)
```
6. Top K Frequent Elements (LC 347)
```python
def topKFrequent(nums, k):
    """Find the k most frequent elements."""
    from collections import Counter
    import heapq

    # Method 1: Counter + heap
    count = Counter(nums)
    return heapq.nlargest(k, count.keys(), key=count.get)

    # Method 2: Counter + sorting
    count = Counter(nums)
    return [item for item, freq in count.most_common(k)]

    # Method 3: Bucket sort approach
    count = Counter(nums)
    buckets = [[] for _ in range(len(nums) + 1)]
    # Place elements in buckets indexed by frequency
    for num, freq in count.items():
        buckets[freq].append(num)
    # Collect the top k elements from the highest frequencies down
    result = []
    for i in range(len(buckets) - 1, 0, -1):
        if buckets[i]:
            result.extend(buckets[i])
            if len(result) >= k:
                return result[:k]
    return result
```
Advanced Techniques
Custom Hash Functions
```python
def custom_hash_techniques():
    """Various custom hashing approaches."""
    # 1. Polynomial rolling hash
    def polynomial_hash(s, base=31, mod=10**9 + 7):
        hash_val = 0
        base_power = 1
        for char in s:
            hash_val = (hash_val + ord(char) * base_power) % mod
            base_power = (base_power * base) % mod
        return hash_val

    # 2. XOR hash for pairs
    def xor_hash(a, b):
        return hash(a) ^ hash(b)

    # 3. Tuple hash for coordinates
    def coordinate_hash(x, y):
        return hash((x, y))

    # 4. String hash that ignores character order
    def unordered_hash(s):
        return sum(hash(c) for c in s)
```
Hash-based Data Structures
```python
class HashBasedStructures:
    """Examples of hash-based data structures."""

    def __init__(self):
        from collections import defaultdict, Counter
        # Frequency counters
        self.freq_counter = Counter()
        self.default_dict = defaultdict(int)
        # Seen states
        self.visited = set()
        # Grouped data
        self.groups = defaultdict(list)

    def add_element(self, element):
        """Add an element and track its frequency."""
        self.freq_counter[element] += 1
        self.visited.add(element)

    def group_by_property(self, items, key_func):
        """Group items by a property."""
        for item in items:
            key = key_func(item)
            self.groups[key].append(item)
        return dict(self.groups)
```
Hash Table Best Practices
```python
def optimization_tips():
    """Performance-oriented tips for hash-based code."""
    # 1. Build hash tables in one pass where possible; Python dicts cannot be
    #    pre-sized, so prefer comprehensions or Counter over key-by-key growth
    built_in_one_pass = {i: i * i for i in range(10)}

    # 2. Use appropriate hash functions for composite keys
    def good_hash_function(obj):
        # Combine multiple attributes into a single hashable tuple
        return hash((obj.attr1, obj.attr2, obj.attr3))

    # 3. Make composite keys hashable: frozenset lets a set be used as a key
    set_as_key = frozenset([1, 2, 3])
    hash(set_as_key)  # Works because frozenset is immutable and hashable

    # 4. Consider memory vs speed tradeoffs
    from collections import defaultdict
    memory_efficient = set()          # Only stores keys
    feature_rich = defaultdict(list)  # Stores key -> list of values
```
Summary & Quick Reference
Common Hash Patterns
| Pattern | Template | Use Case | Example |
|---|---|---|---|
| Frequency Count | Counter(arr) | Count occurrences | Anagrams, duplicates |
| Seen States | visited = set() | Cycle detection | Happy number, linked list cycle |
| Group by Key | groups[key].append(item) | Categorization | Group anagrams |
| Rolling Hash | Update hash incrementally | Substring search | Pattern matching |
Time Complexity Guide
| Operation | Average Case | Worst Case | Notes |
|---|---|---|---|
| Insert | O(1) | O(n) | With good hash function |
| Search | O(1) | O(n) | Depends on collisions |
| Delete | O(1) | O(n) | Same as search |
| Iteration | O(n) | O(n) | Visit all elements |
Space Complexity Considerations
- Hash Table: O(n) where n is number of elements
- Rolling Hash: O(1) additional space
- Frequency Counter: O(k) where k is number of unique elements
Common Mistakes & Tips
🚫 Common Mistakes:
- Using mutable objects as hash keys
- Not handling hash collisions properly
- Excessive hash function computation
- Memory leaks with large hash tables
✅ Best Practices:
- Use immutable types as keys (strings, tuples, frozensets)
- Choose good hash functions to minimize collisions
- Consider using defaultdict for automatic initialization
- Use Counter for frequency counting
- Implement rolling hash for string matching problems
Interview Tips
- Identify hash opportunities: Look for counting, grouping, or fast lookup needs
- Choose the right data structure: set vs dict vs Counter vs defaultdict
- Consider time-space tradeoffs: Hash table vs other approaches
- Handle edge cases: Empty inputs, single elements
- Optimize for the problem: Rolling hash for strings, frequency maps for counting
- Test with examples: Verify hash collisions don't break logic
This comprehensive hashing cheatsheet covers the most important patterns and techniques for solving hash-based problems efficiently.