Hashing & Counting
Overview
Hashing & Counting techniques use hash tables and frequency maps to solve problems involving counting, grouping, and fast lookups.
Key Properties
- Time Complexity: O(1) average for hash operations, O(n) for full traversal
- Space Complexity: O(n) for hash table storage
- Core Idea: Trade space for time using hash table data structures
- When to Use: Fast lookups, frequency counting, duplicate detection, grouping
- Key Data Structures: HashMap, HashSet, Counter, defaultdict
Core Characteristics
- Fast Lookups: O(1) average case for search/insert/delete
- Frequency Tracking: Count occurrences of elements
- Duplicate Detection: Identify seen elements
- Grouping: Collect items with same properties
- Rolling Hash: Efficient string matching and substring problems
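Below is a minimal sketch of the first four characteristics using Python's built-in hash containers (rolling hash is covered in Template 2 later); the sample list and groupings are purely illustrative.

```python
from collections import Counter, defaultdict

nums = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5]  # illustrative data

# Fast lookups / duplicate detection: set membership is O(1) on average
seen = set(nums)
assert 5 in seen and 7 not in seen

# Frequency tracking: count occurrences of each element
freq = Counter(nums)
assert freq[5] == 3

# Grouping: collect items that share a property (parity here)
groups = defaultdict(list)
for n in nums:
    groups[n % 2].append(n)
assert groups[0] == [4, 2, 6]
```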
Problem Categories
Category 1: Frequency Maps
- Description: Count occurrences and group by frequency
- Examples: LC 242 (Valid Anagram), LC 49 (Group Anagrams), LC 169 (Majority Element)
- Pattern: Use HashMap to count frequencies, then analyze counts
Category 2: Prefix Hash / Rolling Hash
- Description: Efficient string matching using hash functions
- Examples: LC 28 (Find Index), LC 187 (Repeated DNA), LC 1044 (Longest Duplicate Substring)
- Pattern: Compute rolling hash for sliding windows
Category 3: HashSet for Seen States
- Description: Track visited elements to detect patterns or cycles
- Examples: LC 202 (Happy Number), LC 141 (Linked List Cycle), LC 128 (Longest Consecutive)
- Pattern: Use HashSet to remember seen states
Templates & Algorithms
Template Comparison Table
| Template Type | Use Case | Time Complexity | When to Use |
|---|---|---|---|
| Frequency Counter | Count elements | O(n) | Anagrams, duplicates |
| Rolling Hash | String matching | O(n+m) | Substring search |
| Seen States | Cycle detection | O(n) | Detect patterns |
| Group by Hash | Categorization | O(n) | Grouping similar items |
Template 1: Frequency Counter
```python
def frequency_counter_template(arr):
    """Basic frequency counting template."""
    from collections import Counter, defaultdict

    # Method 1: Using Counter
    count = Counter(arr)

    # Method 2: Using defaultdict
    freq = defaultdict(int)
    for item in arr:
        freq[item] += 1

    # Method 3: Manual counting with dict.get
    manual_count = {}
    for item in arr:
        manual_count[item] = manual_count.get(item, 0) + 1

    return count, freq, manual_count
```
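As a quick usage example, here is one way the frequency-counter pattern can solve LC 169 (Majority Element); the function name mirrors the LeetCode signature, and the assert is only an illustrative check.

```python
from collections import Counter

def majorityElement(nums):
    """LC 169: the majority element appears more than len(nums) // 2 times."""
    count = Counter(nums)
    # most_common(1) returns a single (element, frequency) pair
    return count.most_common(1)[0][0]

assert majorityElement([2, 2, 1, 1, 1, 2, 2]) == 2
```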
Template 2: Rolling Hash (Rabin-Karp)
```python
def rolling_hash_template(text, pattern):
    """Rolling hash (Rabin-Karp) pattern matching; returns the first match index or -1."""
    if len(pattern) > len(text):
        return -1

    # Hash function parameters
    base = 256
    mod = 10**9 + 7

    def compute_hash(s, length):
        """Compute the hash of the first `length` characters of s."""
        hash_val = 0
        for i in range(length):
            hash_val = (hash_val * base + ord(s[i])) % mod
        return hash_val

    def rolling_hash(old_hash, old_char, new_char, base_power):
        """Update the hash by removing old_char from the front and appending new_char."""
        new_hash = (old_hash - ord(old_char) * base_power) % mod
        new_hash = (new_hash * base + ord(new_char)) % mod
        return new_hash

    pattern_len = len(pattern)
    pattern_hash = compute_hash(pattern, pattern_len)
    text_hash = compute_hash(text, pattern_len)

    # Precompute base^(pattern_len - 1) % mod for removing the leading character
    base_power = pow(base, pattern_len - 1, mod)

    # Check the first window
    if pattern_hash == text_hash and text[:pattern_len] == pattern:
        return 0

    # Roll the hash across the remaining windows
    for i in range(len(text) - pattern_len):
        text_hash = rolling_hash(text_hash, text[i], text[i + pattern_len], base_power)
        if pattern_hash == text_hash and text[i + 1:i + 1 + pattern_len] == pattern:
            return i + 1

    return -1
```
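A quick, illustrative sanity check of the template (the inputs are made up):

```python
assert rolling_hash_template("hello world", "world") == 6  # first match at index 6
assert rolling_hash_template("aaaa", "b") == -1            # no match
```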
Template 3: HashSet for Cycle Detection
```python
def cycle_detection_template(start_value, next_function, is_terminal=lambda x: False):
    """Detect a cycle using a HashSet of seen states.

    Returns True if a cycle is detected, False if a terminal state is reached first.
    """
    seen = set()
    current = start_value
    while current not in seen:
        seen.add(current)
        current = next_function(current)
        # Optional: stop early when a terminal (non-repeating) state is reached
        if is_terminal(current):
            return False
    return True  # Cycle detected


def floyd_cycle_detection(start_value, next_function, is_terminal=lambda x: False):
    """Floyd's cycle detection (tortoise and hare); returns the start of the cycle, or None."""
    slow = fast = start_value
    # Phase 1: detect whether a cycle exists
    while True:
        slow = next_function(slow)
        fast = next_function(next_function(fast))
        if slow == fast:
            break
        if is_terminal(fast):
            return None  # No cycle
    # Phase 2: find the start of the cycle
    slow = start_value
    while slow != fast:
        slow = next_function(slow)
        fast = next_function(fast)
    return slow  # Start of the cycle
```
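For comparison, here is a minimal sketch of the same tortoise-and-hare idea specialized to LC 141 (Linked List Cycle); the ListNode class is the usual LeetCode-style node, included only to keep the example self-contained.

```python
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def hasCycle(head):
    """LC 141: return True if the linked list contains a cycle."""
    slow = fast = head
    while fast and fast.next:
        slow = slow.next        # tortoise: one step
        fast = fast.next.next   # hare: two steps
        if slow is fast:
            return True
    return False

# Build 1 -> 2 -> 3 -> (back to 2) to exercise the cycle path
a, b, c = ListNode(1), ListNode(2), ListNode(3)
a.next, b.next, c.next = b, c, b
assert hasCycle(a) is True
```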
Template 4: Group by Hash Key
```python
def group_by_hash_template(items, key_function):
    """Group items by a hash key derived from key_function."""
    from collections import defaultdict

    groups = defaultdict(list)
    for item in items:
        key = key_function(item)
        groups[key].append(item)
    return dict(groups)


def group_anagrams_template(strs):
    """Group anagrams using the sorted string as the key."""
    from collections import defaultdict

    groups = defaultdict(list)
    for s in strs:
        # Anagrams share the same sorted character sequence
        key = ''.join(sorted(s))
        groups[key].append(s)
    return list(groups.values())
```
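A small, illustrative usage of group_by_hash_template, grouping words by length (len is just one possible key_function):

```python
words = ["hi", "to", "cat", "dog", "house"]
by_length = group_by_hash_template(words, key_function=len)
assert by_length == {2: ["hi", "to"], 3: ["cat", "dog"], 5: ["house"]}
```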
Problems by Pattern
Frequency Maps Problems
| Problem | LC # | Key Technique | Difficulty |
|---|---|---|---|
| Valid Anagram | 242 | Character frequency | Easy |
| Group Anagrams | 49 | Sorted string as key | Medium |
| Majority Element | 169 | Count frequency | Easy |
| Top K Frequent Elements | 347 | Frequency + heap | Medium |
| Find All Anagrams | 438 | Sliding window + freq | Medium |
| Longest Substring Without Repeating | 3 | Sliding window + seen | Medium |
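LC 438 (Find All Anagrams in a String) combines a frequency map with a sliding window, a pairing not worked out elsewhere in this sheet, so here is a minimal sketch; the variable names are illustrative.

```python
from collections import Counter

def findAnagrams(s, p):
    """LC 438: start indices of p's anagrams in s (sliding window + frequency map)."""
    if len(p) > len(s):
        return []
    need = Counter(p)
    window = Counter(s[:len(p)])
    result = [0] if window == need else []
    for i in range(len(p), len(s)):
        window[s[i]] += 1               # take in the new right character
        window[s[i - len(p)]] -= 1      # drop the old left character
        if window[s[i - len(p)]] == 0:
            del window[s[i - len(p)]]   # keep zero counts out so == works
        if window == need:
            result.append(i - len(p) + 1)
    return result

assert findAnagrams("cbaebabacd", "abc") == [0, 6]
```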
Rolling Hash Problems
| Problem | LC # | Key Technique | Difficulty |
|---|---|---|---|
| Implement strStr() | 28 | Rabin-Karp | Easy |
| Repeated DNA Sequences | 187 | 10-char rolling hash | Medium |
| Longest Duplicate Substring | 1044 | Binary search + rolling hash | Hard |
| Find All Duplicates in Array | 442 | Index hashing | Medium |
HashSet for Seen States Problems
| Problem | LC # | Key Technique | Difficulty |
|---|---|---|---|
| Happy Number | 202 | Detect cycle in sequence | Easy |
| Linked List Cycle | 141 | Fast/slow or HashSet | Easy |
| Longest Consecutive Sequence | 128 | HashSet lookup | Medium |
| Contains Duplicate | 217 | Simple HashSet | Easy |
| Contains Duplicate II | 219 | HashSet with window | Easy |
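The two Contains Duplicate variants above are short enough to sketch inline; a minimal version of each follows (signatures mirror the LeetCode problems, test values are illustrative).

```python
def containsDuplicate(nums):
    """LC 217: True if any value appears at least twice."""
    return len(set(nums)) < len(nums)

def containsNearbyDuplicate(nums, k):
    """LC 219: True if nums[i] == nums[j] with |i - j| <= k, using a sliding-window set."""
    window = set()
    for i, num in enumerate(nums):
        if num in window:
            return True
        window.add(num)
        if len(window) > k:
            window.remove(nums[i - k])  # keep only the previous k elements
    return False

assert containsDuplicate([1, 2, 3, 1]) is True
assert containsNearbyDuplicate([1, 2, 3, 1], 3) is True
assert containsNearbyDuplicate([1, 2, 3, 1, 2, 3], 2) is False
```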
LC Examples
1. Valid Anagram (LC 242)
```python
def isAnagram(s, t):
    """Check if two strings are anagrams."""
    if len(s) != len(t):
        return False

    # Method 1: Frequency counter
    from collections import Counter
    return Counter(s) == Counter(t)

    # Method 2: Manual counting
    count = {}
    for char in s:
        count[char] = count.get(char, 0) + 1
    for char in t:
        if char not in count:
            return False
        count[char] -= 1
        if count[char] == 0:
            del count[char]
    return len(count) == 0

    # Method 3: Sorting (not using hash)
    return sorted(s) == sorted(t)
```
2. Group Anagrams (LC 49)
```python
def groupAnagrams(strs):
    """Group strings that are anagrams."""
    from collections import defaultdict

    # Method 1: Use sorted string as key
    groups = defaultdict(list)
    for s in strs:
        key = ''.join(sorted(s))
        groups[key].append(s)
    return list(groups.values())

    # Method 2: Use frequency tuple as key
    def get_frequency_key(s):
        freq = [0] * 26
        for char in s:
            freq[ord(char) - ord('a')] += 1
        return tuple(freq)

    groups = defaultdict(list)
    for s in strs:
        key = get_frequency_key(s)
        groups[key].append(s)
    return list(groups.values())
```
3. Happy Number (LC 202)
```python
def isHappy(n):
    """Detect whether repeatedly summing squared digits reaches 1 or falls into a cycle."""
    def get_sum_of_squares(num):
        total = 0
        while num > 0:
            digit = num % 10
            total += digit * digit
            num //= 10
        return total

    # Method 1: HashSet to detect a cycle
    seen = set()
    while n != 1 and n not in seen:
        seen.add(n)
        n = get_sum_of_squares(n)
    return n == 1

    # Method 2: Floyd's cycle detection
    def next_number(num):
        return get_sum_of_squares(num)

    slow = fast = n
    while True:
        slow = next_number(slow)
        fast = next_number(next_number(fast))
        if slow == fast:
            break
    return slow == 1
```
4. Longest Consecutive Sequence (LC 128)
```python
def longestConsecutive(nums):
    """Find the length of the longest run of consecutive integers."""
    if not nums:
        return 0

    num_set = set(nums)
    longest = 0
    for num in num_set:
        # Only start counting if num - 1 is not in the set,
        # which guarantees num is the start of a sequence
        if num - 1 not in num_set:
            current_num = num
            current_length = 1
            # Walk forward through consecutive numbers
            while current_num + 1 in num_set:
                current_num += 1
                current_length += 1
            longest = max(longest, current_length)
    return longest
```
5. Repeated DNA Sequences (LC 187)
```python
def findRepeatedDnaSequences(s):
    """Find all 10-character DNA sequences that occur more than once."""
    if len(s) < 10:
        return []

    # Method 1: Simple approach storing each 10-character substring
    seen = set()
    repeated = set()
    for i in range(len(s) - 9):
        substring = s[i:i + 10]
        if substring in seen:
            repeated.add(substring)
        else:
            seen.add(substring)
    return list(repeated)

    # Method 2: Rolling hash approach
    def char_to_num(c):
        return {'A': 0, 'C': 1, 'G': 2, 'T': 3}[c]

    def rolling_hash_dna(s):
        if len(s) < 10:
            return []
        seen = set()
        repeated = set()
        # Compute the hash for the first window; base 4 with 10 characters
        # keeps the hash below 4**10, so no modulus is needed
        hash_val = 0
        base = 4
        for i in range(10):
            hash_val = hash_val * base + char_to_num(s[i])
        seen.add(hash_val)
        base_power = base ** 9
        # Roll the hash across the remaining windows
        for i in range(10, len(s)):
            # Remove the outgoing character and append the incoming one
            hash_val = hash_val - char_to_num(s[i - 10]) * base_power
            hash_val = hash_val * base + char_to_num(s[i])
            if hash_val in seen:
                repeated.add(s[i - 9:i + 1])
            else:
                seen.add(hash_val)
        return list(repeated)

    return rolling_hash_dna(s)
```
6. Top K Frequent Elements (LC 347)
```python
def topKFrequent(nums, k):
    """Find the k most frequent elements."""
    from collections import Counter
    import heapq

    # Method 1: Counter + heap
    count = Counter(nums)
    return heapq.nlargest(k, count.keys(), key=count.get)

    # Method 2: Counter + sorting
    count = Counter(nums)
    return [item for item, freq in count.most_common(k)]

    # Method 3: Bucket sort approach
    count = Counter(nums)
    buckets = [[] for _ in range(len(nums) + 1)]
    # Place elements in buckets indexed by frequency
    for num, freq in count.items():
        buckets[freq].append(num)
    # Collect the top k elements from the highest frequencies down
    result = []
    for i in range(len(buckets) - 1, 0, -1):
        if buckets[i]:
            result.extend(buckets[i])
            if len(result) >= k:
                return result[:k]
    return result
```
Advanced Techniques
Custom Hash Functions
```python
def custom_hash_techniques():
    """Various custom hashing approaches."""
    # 1. Polynomial rolling hash
    def polynomial_hash(s, base=31, mod=10**9 + 7):
        hash_val = 0
        base_power = 1
        for char in s:
            hash_val = (hash_val + ord(char) * base_power) % mod
            base_power = (base_power * base) % mod
        return hash_val

    # 2. XOR hash for pairs
    def xor_hash(a, b):
        return hash(a) ^ hash(b)

    # 3. Tuple hash for coordinates
    def coordinate_hash(x, y):
        return hash((x, y))

    # 4. String hash that ignores character order
    def unordered_hash(s):
        return sum(hash(c) for c in s)
```
Hash-based Data Structures
```python
class HashBasedStructures:
    """Examples of hash-based data structures."""

    def __init__(self):
        from collections import defaultdict, Counter
        # Frequency counters
        self.freq_counter = Counter()
        self.default_dict = defaultdict(int)
        # Seen states
        self.visited = set()
        # Grouped data
        self.groups = defaultdict(list)

    def add_element(self, element):
        """Add an element and track its frequency."""
        self.freq_counter[element] += 1
        self.visited.add(element)

    def group_by_property(self, items, key_func):
        """Group items by a property."""
        for item in items:
            key = key_func(item)
            self.groups[key].append(item)
        return dict(self.groups)
```
Hash Table Best Practices
```python
def optimization_tips():
    """Performance-oriented tips for hash-based code."""
    # 1. Build hash tables in one pass where possible; Python dicts cannot be
    #    pre-sized, so prefer comprehensions or Counter over key-by-key growth
    built_in_one_pass = {i: i * i for i in range(10)}

    # 2. Use appropriate hash functions for composite keys
    def good_hash_function(obj):
        # Combine multiple attributes into a single hashable tuple
        return hash((obj.attr1, obj.attr2, obj.attr3))

    # 3. Make composite keys hashable: frozenset lets a set be used as a key
    set_as_key = frozenset([1, 2, 3])
    hash(set_as_key)  # Works because frozenset is immutable and hashable

    # 4. Consider memory vs speed tradeoffs
    from collections import defaultdict
    memory_efficient = set()          # Only stores keys
    feature_rich = defaultdict(list)  # Stores key -> list of values
```
Summary & Quick Reference
Common Hash Patterns
| Pattern | Template | Use Case | Example |
|---|---|---|---|
| Frequency Count | Counter(arr) | Count occurrences | Anagrams, duplicates |
| Seen States | visited = set() | Cycle detection | Happy number, linked list cycle |
| Group by Key | groups[key].append(item) | Categorization | Group anagrams |
| Rolling Hash | Update hash incrementally | Substring search | Pattern matching |
Time Complexity Guide
| Operation | Average Case | Worst Case | Notes |
|---|---|---|---|
| Insert | O(1) | O(n) | With good hash function |
| Search | O(1) | O(n) | Depends on collisions |
| Delete | O(1) | O(n) | Same as search |
| Iteration | O(n) | O(n) | Visit all elements |
Space Complexity Considerations
- Hash Table: O(n) where n is number of elements
- Rolling Hash: O(1) additional space
- Frequency Counter: O(k) where k is number of unique elements
Common Mistakes & Tips
🚫 Common Mistakes:
- Using mutable objects as hash keys
- Not handling hash collisions properly
- Excessive hash function computation
- Memory leaks with large hash tables
✅ Best Practices:
- Use immutable types as keys (strings, tuples, frozensets)
- Choose good hash functions to minimize collisions
- Consider using defaultdict for automatic initialization
- Use Counter for frequency counting
- Implement rolling hash for string matching problems
Interview Tips
- Identify hash opportunities: Look for counting, grouping, or fast lookup needs
- Choose the right data structure: set vs dict vs Counter vs defaultdict
- Consider time-space tradeoffs: Hash table vs other approaches
- Handle edge cases: Empty inputs, single elements
- Optimize for the problem: Rolling hash for strings, frequency maps for counting
- Test with examples: Verify hash collisions don't break logic
This comprehensive hashing cheatsheet covers the most important patterns and techniques for solving hash-based problems efficiently.