705. Design HashSet - Explanation

Problem Link

Description

Design a HashSet without using any built-in hash table libraries.

Implement MyHashSet class:

  • void add(key) Inserts the value key into the HashSet.
  • bool contains(key) Returns whether the value key exists in the HashSet or not.
  • void remove(key) Removes the value key in the HashSet. If key does not exist in the HashSet, do nothing.

Example 1:

Input: ["MyHashSet", "add", "add", "contains", "contains", "add", "contains", "remove", "contains"]
[[], [1], [2], [1], [3], [2], [2], [2], [2]]

Output: [null, null, null, true, false, null, true, null, false]

Explanation:
MyHashSet myHashSet = new MyHashSet();
myHashSet.add(1); // set = [1]
myHashSet.add(2); // set = [1, 2]
myHashSet.contains(1); // return True
myHashSet.contains(3); // return False, (not found)
myHashSet.add(2); // set = [1, 2]
myHashSet.contains(2); // return True
myHashSet.remove(2); // set = [1]
myHashSet.contains(2); // return False, (already removed)

Constraints:

  • 0 <= key <= 1,000,000
  • At most 10,000 calls will be made to add, remove, and contains.


Topics

Company Tags

Please upgrade to NeetCode Pro to view company tags.



Prerequisites

Before attempting this problem, you should be comfortable with:

  • Hash Functions - Understanding how keys map to bucket indices via modulo operations
  • Linked Lists - Implementing node-based structures for collision handling (separate chaining)
  • Binary Search Trees - Understanding BST operations for an optimized collision resolution approach
  • Bit Manipulation - Using bitwise operations (OR, AND, XOR) for space-efficient storage

1. Brute Force

Intuition

The simplest implementation uses a dynamic array to store all keys. For each operation, we search through the array linearly. This works correctly but is inefficient since every operation requires scanning potentially all stored elements.

Algorithm

  1. Initialize an empty array data.
  2. For add(key): If the key is not already in the array, append it.
  3. For remove(key): If the key exists in the array, remove it.
  4. For contains(key): Return true if the key exists in the array.
class MyHashSet:

    def __init__(self):
        self.data = []

    def add(self, key: int) -> None:
        if key not in self.data:
            self.data.append(key)

    def remove(self, key: int) -> None:
        if key in self.data:
            self.data.remove(key)

    def contains(self, key: int) -> bool:
        return key in self.data

Time & Space Complexity

  • Time complexity: O(n)O(n) for each function call.
  • Space complexity: O(n)O(n)

2. Boolean Array

Intuition

Since keys are constrained to [0, 1000000], we can use direct addressing with a boolean array. The index represents the key, and the boolean value indicates presence. This provides O(1) operations but uses fixed memory regardless of how many keys are stored.

Algorithm

  1. Initialize a boolean array of size 1000001, all set to false.
  2. For add(key): Set data[key] = true.
  3. For remove(key): Set data[key] = false.
  4. For contains(key): Return data[key].
class MyHashSet:

    def __init__(self):
        self.data = [False] * 1000001

    def add(self, key: int) -> None:
        self.data[key] = True

    def remove(self, key: int) -> None:
        self.data[key] = False

    def contains(self, key: int) -> bool:
        return self.data[key]

Time & Space Complexity

  • Time complexity: O(1)O(1) for each function call.
  • Space complexity: O(1000000)O(1000000) since the key is in the range [0,1000000][0, 1000000].

3. Linked List

Intuition

To reduce memory while handling collisions, we use separate chaining. An array of buckets stores linked lists, and keys are assigned to buckets using a hash function. Each operation traverses only the linked list in the relevant bucket, making average-case operations faster than the brute force approach.

Algorithm

  1. Initialize an array of 10000 buckets, each with a dummy head node.
  2. Define hash(key) as key % 10000.
  3. For add(key): Traverse the list at hash(key). If the key already exists, return. Otherwise, append a new node with the key.
  4. For remove(key): Traverse the list at hash(key). If a node with the matching key is found, remove it by updating the previous node's next pointer.
  5. For contains(key): Traverse the list at hash(key). Return true if the key is found, false otherwise.
class ListNode:
    def __init__(self, key: int):
        self.key = key
        self.next = None

class MyHashSet:

    def __init__(self):
        self.set = [ListNode(0) for _ in range(10**4)]

    def add(self, key: int) -> None:
        cur = self.set[key % len(self.set)]
        while cur.next:
            if cur.next.key == key:
                return
            cur = cur.next
        cur.next = ListNode(key)

    def remove(self, key: int) -> None:
        cur = self.set[key % len(self.set)]
        while cur.next:
            if cur.next.key == key:
                cur.next = cur.next.next
                return
            cur = cur.next

    def contains(self, key: int) -> bool:
        cur = self.set[key % len(self.set)]
        while cur.next:
            if cur.next.key == key:
                return True
            cur = cur.next
        return False

Time & Space Complexity

  • Time complexity: O(nk)O(\frac{n}{k}) for each function call.
  • Space complexity: O(k+m)O(k + m)

Where nn is the number of keys, kk is the size of the set (1000010000) and mm is the number of unique keys.


4. Binary Search Tree

Intuition

Instead of linked lists for collision handling, we can use binary search trees (BSTs) in each bucket. This improves the worst-case time complexity from O(n/k) to O(log(n/k)) for each bucket, since BST operations are logarithmic in the number of nodes. The tradeoff is slightly more complex implementation.

Algorithm

  1. Initialize an array of 10000 buckets, each containing an empty BST.
  2. Define hash(key) as key % 10000.
  3. For add(key): If the key is not already in the BST at hash(key), insert it using standard BST insertion.
  4. For remove(key): Delete the key from the BST at hash(key) using standard BST deletion (finding in-order successor when needed).
  5. For contains(key): Search the BST at hash(key) and return true if the key is found.
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

class BST:
    def __init__(self):
        self.root = None

    def insert(self, root, key):
        if not root:
            return TreeNode(key)
        if key < root.key:
            root.left = self.insert(root.left, key)
        elif key > root.key:
            root.right = self.insert(root.right, key)
        return root

    def delete(self, root, key):
        if not root:
            return None
        if key < root.key:
            root.left = self.delete(root.left, key)
        elif key > root.key:
            root.right = self.delete(root.right, key)
        else:
            if not root.left:
                return root.right
            if not root.right:
                return root.left
            temp = self.minValueNode(root.right)
            root.key = temp.key
            root.right = self.delete(root.right, temp.key)
        return root

    def minValueNode(self, root):
        while root.left:
            root = root.left
        return root

    def search(self, root, key):
        if not root:
            return False
        if key == root.key:
            return True
        elif key < root.key:
            return self.search(root.left, key)
        else:
            return self.search(root.right, key)

    def add(self, key):
        self.root = self.insert(self.root, key)

    def remove(self, key):
        self.root = self.delete(self.root, key)

    def contains(self, key):
        return self.search(self.root, key)

class MyHashSet:
    def __init__(self):
        self.size = 10000
        self.buckets = [BST() for _ in range(self.size)]

    def _hash(self, key):
        return key % self.size

    def add(self, key: int) -> None:
        idx = self._hash(key)
        if not self.contains(key):
            self.buckets[idx].add(key)

    def remove(self, key: int) -> None:
        idx = self._hash(key)
        self.buckets[idx].remove(key)

    def contains(self, key: int) -> bool:
        idx = self._hash(key)
        return self.buckets[idx].contains(key)

Time & Space Complexity

  • Time complexity: O(log(nk))O(\log (\frac{n}{k})) in average case, O(nk)O(\frac{n}{k}) in worst case for each function call.
  • Space complexity: O(k+m)O(k + m)

Where nn is the number of keys, kk is the size of the set (1000010000) and mm is the number of unique keys.


5. Bit Manipulation

Intuition

We can compress the boolean array approach by using individual bits instead of booleans. Each integer stores 32 bits, so we need only about 31251 integers to cover 1000000+ keys. We use bit operations to set, clear, and check individual bits. This reduces memory usage by a factor of 32 compared to a boolean array.

Algorithm

  1. Initialize an integer array of size 31251 (since 31251 * 32 = 1000032 covers all keys).
  2. Define getMask(key) as 1 << (key % 32) to create a bitmask for the key's position within its integer.
  3. For add(key): Set the bit using set[key / 32] |= getMask(key).
  4. For remove(key): If the key exists, toggle the bit using set[key / 32] ^= getMask(key).
  5. For contains(key): Return true if set[key / 32] & getMask(key) is non-zero.
class MyHashSet:

    def __init__(self):
        # key is in the range [1, 1000000]
        # 31251 * 32 = 1000032
        self.set = [0] * 31251

    def add(self, key: int) -> None:
        self.set[key // 32] |= self.getMask(key)

    def remove(self, key: int) -> None:
        if self.contains(key):
            self.set[key // 32] ^= self.getMask(key)

    def contains(self, key: int) -> bool:
        return self.set[key // 32] & self.getMask(key) != 0

    def getMask(self, key: int) -> int:
        return 1 << (key % 32)

Time & Space Complexity

  • Time complexity: O(1)O(1) for each function call.
  • Space complexity: O(k)O(k)

Where kk is the size of the set (31251)(31251).


Common Pitfalls

Adding Duplicate Keys

The add() operation should be idempotent - adding an existing key should have no effect. A common mistake is not checking for duplicates before insertion.

# Wrong - allows duplicates
def add(self, key: int) -> None:
    cur = self.set[key % len(self.set)]
    while cur.next:
        cur = cur.next
    cur.next = ListNode(key)  # Always adds, even if exists!

# Correct - check for existence first
def add(self, key: int) -> None:
    cur = self.set[key % len(self.set)]
    while cur.next:
        if cur.next.key == key:
            return  # Already exists
        cur = cur.next
    cur.next = ListNode(key)

Using XOR for Remove Without Checking Existence

In the bit manipulation approach, using XOR to remove a key that doesn't exist will incorrectly add it instead.

# Wrong - XOR toggles the bit regardless
def remove(self, key: int) -> None:
    self.set[key // 32] ^= self.getMask(key)  # Adds if not present!

# Correct - only toggle if present
def remove(self, key: int) -> None:
    if self.contains(key):
        self.set[key // 32] ^= self.getMask(key)