Table of Contents
- # Overview
- # Key Properties
- # 0) Concept
- # 0-1) Types
- # Implementation Comparison
- # 0-2) Pattern
- # 1) General form
- # 1-1) Basic OP
- # 2) LC Example
- # 2-1) Lowest Common Ancestor of a Binary Tree III
- # 2-2) Contains Duplicate
- # 2-3) Intersection of Two Arrays
- # 2-4) Happy Number
- # 2-5) Longest Consecutive Sequence
- # 2-6) Single Number
- # 2-7) Valid Sudoku
- # 2-8) Number of Distinct Islands
- # 2-9) Linked List Cycle Detection
- # 2-10) Word Pattern
- # Problem Categories
- # Category 1: Duplicate Detection (10 problems)
- # Category 2: Set Operations (8 problems)
- # Category 3: Path/Ancestry Tracking (6 problems)
- # Category 4: Sequence Problems (7 problems)
- # Category 5: Graph/Island Problems (5 problems)
- # Category 6: String/Pattern Matching (6 problems)
- # Decision Framework
- # When to Use Set vs Other Data Structures
- # Set vs HashMap Choice
- # Python Set vs Java Set
- # Summary & Best Practices
- # Key Takeaways
- # Interview Tips
# Set

# Overview
Set is a collection data structure that stores unique elements with no duplicates. It provides efficient membership testing, insertion, and deletion operations.
# Key Properties
- Time Complexity:
- Add: O(1) average, O(n) worst
- Remove: O(1) average, O(n) worst
- Contains: O(1) average, O(n) worst
- Union/Intersection: O(min(len(s1), len(s2)))
- Space Complexity: O(n)
- Core Features: No duplicates, unordered (HashSet), O(1) lookups
- When to Use: Remove duplicates, membership testing, set operations (union, intersection, difference)
# 0) Concept
# 0-1) Types
# HashSet
- Python:
set()- unordered, fastest operations - Java:
HashSet<T>- backed by HashMap - Time: O(1) average for add/remove/contains
- Use case: When order doesnβt matter, need fast lookups
# LinkedHashSet
- Python: No native support (use OrderedDict keys)
- Java:
LinkedHashSet<T>- maintains insertion order - Time: O(1) for operations, preserves order
- Use case: Need set operations + insertion order
# TreeSet
- Python: No native support (use sorted containers)
- Java:
TreeSet<T>- sorted, uses Red-Black tree - Time: O(log n) for add/remove/contains
- Use case: Need sorted elements, range queries
# Implementation Comparison
| Type | Ordering | Time | Space | Use Case |
|---|---|---|---|---|
| HashSet | None | O(1) | O(n) | Fast lookups, no order needed |
| LinkedHashSet | Insertion | O(1) | O(n) | Preserve insertion order |
| TreeSet | Sorted | O(log n) | O(n) | Sorted data, range queries |
# 0-2) Pattern
# Pattern 1: Set Operations
# Union, Intersection, Difference
s1 = {1, 2, 3}
s2 = {2, 3, 4}
union = s1 | s2 # {1, 2, 3, 4}
intersection = s1 & s2 # {2, 3}
difference = s1 - s2 # {1}
symmetric_diff = s1 ^ s2 # {1, 4}
# Pattern 2: Duplicate Detection
# Check for duplicates in array
def has_duplicate(nums):
return len(nums) != len(set(nums))
# Find duplicates
def find_duplicates(nums):
seen = set()
duplicates = set()
for num in nums:
if num in seen:
duplicates.add(num)
seen.add(num)
return duplicates
# Pattern 3: Two-Set Tracking
# Track visited and current path (for cycle detection)
def has_cycle(graph, start):
visited = set()
current_path = set()
def dfs(node):
if node in current_path:
return True # Cycle detected
if node in visited:
return False
visited.add(node)
current_path.add(node)
for neighbor in graph[node]:
if dfs(neighbor):
return True
current_path.remove(node)
return False
return dfs(start)
# Pattern 4: Set for Path/Ancestry Tracking
# LC 1650 - Find LCA using set to track ancestors
def lowestCommonAncestor(p, q):
# Track all ancestors of p
ancestors = set()
while p:
ancestors.add(p)
p = p.parent
# Find first common ancestor
while q:
if q in ancestors:
return q
q = q.parent
return None
# 1) General form
# 1-1) Basic OP
# 1-1-1) Set Creation and Basic Operations
# Python
# Create empty set
s = set()
s = {} # Wrong! This creates a dict
# Create with elements
s = {1, 2, 3}
s = set([1, 2, 3])
s = set("abc") # {'a', 'b', 'c'}
# Add element
s.add(4)
# Remove element
s.remove(3) # Raises KeyError if not exists
s.discard(3) # No error if not exists
s.pop() # Remove and return arbitrary element
# Check membership
if 2 in s:
print("Found")
# Size
len(s)
# Clear all
s.clear()
// Java
// Create HashSet
Set<Integer> set = new HashSet<>();
// Add element
set.add(1);
set.add(2);
set.add(3);
// Remove element
set.remove(2);
// Check membership
if (set.contains(1)) {
System.out.println("Found");
}
// Size
int size = set.size();
// Clear
set.clear();
// Iterate
for (int num : set) {
System.out.println(num);
}
# 1-1-2) Set Operations
# Python set operations
s1 = {1, 2, 3, 4}
s2 = {3, 4, 5, 6}
# Union (elements in either set)
union1 = s1 | s2
union2 = s1.union(s2) # {1, 2, 3, 4, 5, 6}
# Intersection (elements in both sets)
inter1 = s1 & s2
inter2 = s1.intersection(s2) # {3, 4}
# Difference (elements in s1 but not s2)
diff1 = s1 - s2
diff2 = s1.difference(s2) # {1, 2}
# Symmetric difference (elements in either but not both)
sym1 = s1 ^ s2
sym2 = s1.symmetric_difference(s2) # {1, 2, 5, 6}
# Subset check
is_subset = s1.issubset(s2) # False
is_superset = s1.issuperset(s2) # False
# Disjoint check (no common elements)
is_disjoint = s1.isdisjoint(s2) # False
// Java set operations
Set<Integer> s1 = new HashSet<>(Arrays.asList(1, 2, 3, 4));
Set<Integer> s2 = new HashSet<>(Arrays.asList(3, 4, 5, 6));
// Union
Set<Integer> union = new HashSet<>(s1);
union.addAll(s2); // {1, 2, 3, 4, 5, 6}
// Intersection
Set<Integer> intersection = new HashSet<>(s1);
intersection.retainAll(s2); // {3, 4}
// Difference
Set<Integer> difference = new HashSet<>(s1);
difference.removeAll(s2); // {1, 2}
// Subset check
boolean isSubset = s2.containsAll(s1); // false
# 1-1-3) Converting Between Collections
# Python conversions
arr = [1, 2, 2, 3, 3, 4]
# Array to set (remove duplicates)
s = set(arr) # {1, 2, 3, 4}
# Set to array
arr_unique = list(s)
# Set to sorted array
arr_sorted = sorted(s)
# String to set
char_set = set("hello") # {'h', 'e', 'l', 'o'}
# Set to string
s = {'a', 'b', 'c'}
string = ''.join(sorted(s)) # 'abc'
// Java conversions
Integer[] arr = {1, 2, 2, 3, 3, 4};
// Array to set
Set<Integer> set = new HashSet<>(Arrays.asList(arr));
// Set to array
Integer[] arrUnique = set.toArray(new Integer[0]);
// Set to list
List<Integer> list = new ArrayList<>(set);
// List to set
Set<Integer> set2 = new HashSet<>(list);
# 2) LC Example
# 2-1) Lowest Common Ancestor of a Binary Tree III
# LC 1650. Lowest Common Ancestor of a Binary Tree III
# NOTE : there are also dict, recursive.. approaches
# V0''
# IDEA : set - track ancestry path
# Time: O(h) where h is tree height
# Space: O(h) for storing ancestors
class Solution:
def lowestCommonAncestor(self, p, q):
# Store all ancestors of p
visited = set()
while p:
visited.add(p)
p = p.parent
# Find first common ancestor with q
while q:
if q in visited:
return q
q = q.parent
# 2-2) Contains Duplicate
# LC 217. Contains Duplicate
# V0
# IDEA: Set to detect duplicates
class Solution:
def containsDuplicate(self, nums):
return len(nums) != len(set(nums))
# V0'
# IDEA: Build set while checking
class Solution:
def containsDuplicate(self, nums):
seen = set()
for num in nums:
if num in seen:
return True
seen.add(num)
return False
// Java
// LC 217
public boolean containsDuplicate(int[] nums) {
Set<Integer> seen = new HashSet<>();
for (int num : nums) {
if (seen.contains(num)) {
return true;
}
seen.add(num);
}
return false;
}
# 2-3) Intersection of Two Arrays
# LC 349. Intersection of Two Arrays
# V0
# IDEA: Set intersection
class Solution:
def intersection(self, nums1, nums2):
return list(set(nums1) & set(nums2))
# V0'
# IDEA: Convert to sets and use intersection
class Solution:
def intersection(self, nums1, nums2):
set1 = set(nums1)
set2 = set(nums2)
return list(set1.intersection(set2))
// Java
// LC 349
public int[] intersection(int[] nums1, int[] nums2) {
Set<Integer> set1 = new HashSet<>();
for (int num : nums1) {
set1.add(num);
}
Set<Integer> result = new HashSet<>();
for (int num : nums2) {
if (set1.contains(num)) {
result.add(num);
}
}
return result.stream().mapToInt(i -> i).toArray();
}
# 2-4) Happy Number
# LC 202. Happy Number
# V0
# IDEA: Use set to detect cycles
class Solution:
def isHappy(self, n):
def get_next(num):
total = 0
while num > 0:
digit = num % 10
total += digit * digit
num //= 10
return total
seen = set()
while n != 1 and n not in seen:
seen.add(n)
n = get_next(n)
return n == 1
# 2-5) Longest Consecutive Sequence
# LC 128. Longest Consecutive Sequence
# V0
# IDEA: Set for O(1) lookups
# Time: O(n), Space: O(n)
class Solution:
def longestConsecutive(self, nums):
if not nums:
return 0
num_set = set(nums)
max_length = 0
for num in num_set:
# Only start counting from sequence start
if num - 1 not in num_set:
current = num
length = 1
while current + 1 in num_set:
current += 1
length += 1
max_length = max(max_length, length)
return max_length
// Java
// LC 128
public int longestConsecutive(int[] nums) {
Set<Integer> numSet = new HashSet<>();
for (int num : nums) {
numSet.add(num);
}
int maxLength = 0;
for (int num : numSet) {
// Only start from sequence beginning
if (!numSet.contains(num - 1)) {
int current = num;
int length = 1;
while (numSet.contains(current + 1)) {
current++;
length++;
}
maxLength = Math.max(maxLength, length);
}
}
return maxLength;
}
# 2-6) Single Number
# LC 136. Single Number
# V0
# IDEA: XOR all numbers (duplicates cancel out)
class Solution:
def singleNumber(self, nums):
result = 0
for num in nums:
result ^= num
return result
# V0'
# IDEA: Set addition/removal
class Solution:
def singleNumber(self, nums):
return 2 * sum(set(nums)) - sum(nums)
# 2-7) Valid Sudoku
# LC 36. Valid Sudoku
# V0
# IDEA: Use sets to track seen values
class Solution:
def isValidSudoku(self, board):
# Track seen elements in rows, cols, boxes
seen = set()
for i in range(9):
for j in range(9):
if board[i][j] != '.':
val = board[i][j]
box_idx = (i // 3) * 3 + j // 3
# Create unique keys for row, col, box
row_key = f"row_{i}_{val}"
col_key = f"col_{j}_{val}"
box_key = f"box_{box_idx}_{val}"
if row_key in seen or col_key in seen or box_key in seen:
return False
seen.add(row_key)
seen.add(col_key)
seen.add(box_key)
return True
# 2-8) Number of Distinct Islands
# LC 694. Number of Distinct Islands
# V0
# IDEA: Use set to store unique island shapes
class Solution:
def numDistinctIslands(self, grid):
if not grid:
return 0
def dfs(i, j, i0, j0):
# Record relative position from starting point
if (0 <= i < len(grid) and 0 <= j < len(grid[0]) and
grid[i][j] == 1):
grid[i][j] = 0
path.append((i - i0, j - j0))
dfs(i+1, j, i0, j0)
dfs(i-1, j, i0, j0)
dfs(i, j+1, i0, j0)
dfs(i, j-1, i0, j0)
shapes = set()
for i in range(len(grid)):
for j in range(len(grid[0])):
if grid[i][j] == 1:
path = []
dfs(i, j, i, j)
# Convert list to tuple for hashing
shapes.add(tuple(path))
return len(shapes)
# 2-9) Linked List Cycle Detection
# LC 141. Linked List Cycle
# V0
# IDEA: Use set to track visited nodes
class Solution:
def hasCycle(self, head):
visited = set()
current = head
while current:
if current in visited:
return True
visited.add(current)
current = current.next
return False
# V0'
# IDEA: Two pointers (Floyd's algorithm) - O(1) space
class Solution:
def hasCycle(self, head):
if not head:
return False
slow = head
fast = head.next
while slow != fast:
if not fast or not fast.next:
return False
slow = slow.next
fast = fast.next.next
return True
# 2-10) Word Pattern
# LC 290. Word Pattern
# V0
# IDEA: Use 2 sets to track bijection
class Solution:
def wordPattern(self, pattern, s):
words = s.split()
if len(pattern) != len(words):
return False
char_to_word = {}
word_to_char = {}
for c, word in zip(pattern, words):
if c in char_to_word:
if char_to_word[c] != word:
return False
else:
char_to_word[c] = word
if word in word_to_char:
if word_to_char[word] != c:
return False
else:
word_to_char[word] = c
return True
# Problem Categories
# Category 1: Duplicate Detection (10 problems)
| Problem | LC # | Difficulty | Pattern | Key Insight |
|---|---|---|---|---|
| Contains Duplicate | 217 | Easy | Set size | len(nums) != len(set(nums)) |
| Contains Duplicate II | 219 | Easy | Sliding window set | Keep window of k elements |
| Contains Duplicate III | 220 | Medium | TreeSet/SortedList | Maintain sorted window |
| Find Duplicate | 287 | Medium | Cycle detection | Floydβs algorithm or set |
| Find All Duplicates | 442 | Medium | Index marking | Use array as hashmap |
| Single Number | 136 | Easy | XOR/Set | XOR cancels duplicates |
| Single Number II | 137 | Medium | Bit manipulation | Count bits mod 3 |
| Single Number III | 260 | Medium | XOR + grouping | Group by differing bit |
| Missing Number | 268 | Easy | Set/XOR | Expected vs actual |
| First Missing Positive | 41 | Hard | In-place set | Use array indices |
# Category 2: Set Operations (8 problems)
| Problem | LC # | Difficulty | Pattern | Key Insight |
|---|---|---|---|---|
| Intersection of Two Arrays | 349 | Easy | Set intersection | set1 & set2 |
| Intersection of Two Arrays II | 350 | Easy | Counter | Track frequencies |
| Union of Two Arrays | - | Easy | Set union | set1 |
| Distribute Candies | 575 | Easy | Set size | min(len(set), n/2) |
| Uncommon Words | 884 | Easy | Set difference | Count once in either |
| Set Mismatch | 645 | Easy | Set difference | Find duplicate & missing |
| Fair Candy Swap | 888 | Easy | Set membership | Target difference |
| Buddy Strings | 859 | Easy | Set of pairs | Check swap possible |
# Category 3: Path/Ancestry Tracking (6 problems)
| Problem | LC # | Difficulty | Pattern | Key Insight |
|---|---|---|---|---|
| Lowest Common Ancestor III | 1650 | Medium | Ancestor set | Track parent path |
| Linked List Cycle | 141 | Easy | Visited set | Two pointers better |
| Linked List Cycle II | 142 | Medium | Visited set | Floydβs algorithm |
| Course Schedule | 207 | Medium | DFS + set | Detect cycle |
| Course Schedule II | 210 | Medium | Topological sort | Track visited/path |
| Find Eventual Safe Nodes | 802 | Medium | DFS + states | Terminal vs unsafe |
# Category 4: Sequence Problems (7 problems)
| Problem | LC # | Difficulty | Pattern | Key Insight |
|---|---|---|---|---|
| Longest Consecutive Sequence | 128 | Medium | Set lookups | Start from sequence begin |
| Longest Substring Without Repeat | 3 | Medium | Sliding window set | Track seen chars |
| Longest Palindrome | 409 | Easy | Char frequency | Pairs + one odd |
| Maximum Length of Repeated Subarray | 718 | Medium | Set of tuples | Rolling hash |
| Arithmetic Slices | 413 | Medium | Set of differences | Track valid sequences |
| Happy Number | 202 | Easy | Cycle detection | Track seen sums |
| Valid Sudoku | 36 | Medium | Multiple sets | Row/col/box tracking |
# Category 5: Graph/Island Problems (5 problems)
| Problem | LC # | Difficulty | Pattern | Key Insight |
|---|---|---|---|---|
| Number of Islands | 200 | Medium | DFS/BFS visited | Track processed cells |
| Number of Distinct Islands | 694 | Medium | Shape hashing | Normalize positions |
| Max Area of Island | 695 | Medium | DFS + visited | Track seen cells |
| Island Perimeter | 463 | Easy | Border counting | Count land-water edges |
| Surrounded Regions | 130 | Medium | Border DFS | Mark connected to border |
# Category 6: String/Pattern Matching (6 problems)
| Problem | LC # | Difficulty | Pattern | Key Insight |
|---|---|---|---|---|
| Isomorphic Strings | 205 | Easy | Bijection | Two maps or sets |
| Word Pattern | 290 | Easy | Bijection | Char β word mapping |
| Group Anagrams | 49 | Medium | Sorted key | Use sorted string |
| Find Anagrams | 438 | Medium | Window + counter | Sliding character counts |
| Jewels and Stones | 771 | Easy | Set membership | Set of jewels |
| Unique Email Addresses | 929 | Easy | Normalize + set | Clean emails |
# Decision Framework
# When to Use Set vs Other Data Structures
Problem Analysis:
1. Need to track unique elements?
βββ YES β Consider Set
β βββ Need ordering?
β β βββ YES β TreeSet (Java) / sorted list (Python)
β β βββ NO β HashSet
β βββ Need count?
β β βββ NO β Use Counter/HashMap instead
β βββ Need fast lookups?
β βββ YES β HashSet (O(1) average)
βββ NO β Consider other structures
2. Performing set operations (union, intersection)?
βββ YES β Use Set
β βββ Multiple operations β Build set once
βββ NO β Continue analysis
3. Detecting duplicates/cycles?
βββ YES β Use Set for visited tracking
β βββ Space constrained?
β β βββ YES β Consider Floyd's algorithm
β βββ NO β Set is ideal
βββ NO β Continue analysis
4. Checking membership repeatedly?
βββ YES β Convert to Set first
β βββ O(n) conversion + O(1) lookups
βββ NO β Linear search may be fine
# Set vs HashMap Choice
| Use Set When | Use HashMap When |
|---|---|
| Only need existence check | Need key-value mapping |
| Removing duplicates | Counting frequencies |
| Set operations (βͺ, β©, -) | Need associated data |
| Memory efficient (no values) | Need to track counts/indices |
# Python Set vs Java Set
| Feature | Python set |
Java HashSet |
|---|---|---|
| Creation | s = {1,2,3} or set() |
Set<T> s = new HashSet<>() |
| Add | s.add(x) |
s.add(x) |
| Remove | s.remove(x) / s.discard(x) |
s.remove(x) |
| Contains | x in s |
s.contains(x) |
| Union | `s1 | s2ors1.union(s2)` |
| Intersection | s1 & s2 or s1.intersection(s2) |
s1.retainAll(s2) |
| Difference | s1 - s2 or s1.difference(s2) |
s1.removeAll(s2) |
| Size | len(s) |
s.size() |
| Empty check | not s or len(s) == 0 |
s.isEmpty() |
# Summary & Best Practices
# Key Takeaways
-
When to Use Set:
- Remove duplicates from collection
- Fast membership testing (O(1) average)
- Performing set operations (union, intersection, difference)
- Tracking visited nodes in graphs/trees
- Detecting cycles
-
Performance Characteristics:
- HashSet: O(1) average, O(n) worst (hash collisions)
- TreeSet: O(log n) for all operations
- LinkedHashSet: O(1) operations + insertion order
-
Common Patterns:
- Convert array to set to remove duplicates
- Use set for O(1) lookups instead of O(n) linear search
- Track visited nodes with set
- Detect cycles by checking if element already in set
-
Space-Time Tradeoffs:
- Set uses O(n) extra space for O(1) operations
- Consider two-pointer techniques if space is constrained
- For small inputs, linear search may be faster
# Interview Tips
Common Mistakes to Avoid:
- Using
{}to create empty set in Python (creates dict instead) - Forgetting that sets are unordered (donβt assume order)
- Not considering TreeSet when you need sorted elements
- Using set when you need to count occurrences (use Counter/HashMap)
Optimization Tips:
- Convert lists to sets before repeated membership checks
- Use set operations instead of manual loops
- Consider frozenset for immutable/hashable sets
- Use set comprehensions for cleaner code
Follow-up Questions:
- βCan you solve it with O(1) space?β β Consider Floydβs algorithm
- βWhat if we need to preserve order?β β LinkedHashSet or OrderedDict
- βWhat if we need sorted elements?β β TreeSet or sorted list
- βWhat about duplicates with different data?β β Use HashMap instead