Design a HashSet without using any built-in hash table libraries.
Implement the MyHashSet class:
void add(key) Inserts the value key into the HashSet.
bool contains(key) Returns whether the value key exists in the HashSet or not.
void remove(key) Removes the value key from the HashSet. If key does not exist in the HashSet, do nothing.
Example 1:
Input: ["MyHashSet", "add", "add", "contains", "contains", "add", "contains", "remove", "contains"]
[[], [1], [2], [1], [3], [2], [2], [2], [2]]
Output: [null, null, null, true, false, null, true, null, false]
Explanation:
MyHashSet myHashSet = new MyHashSet();
myHashSet.add(1); // set = [1]
myHashSet.add(2); // set = [1, 2]
myHashSet.contains(1); // return True
myHashSet.contains(3); // return False, (not found)
myHashSet.add(2); // set = [1, 2]
myHashSet.contains(2); // return True
myHashSet.remove(2); // set = [1]
myHashSet.contains(2); // return False, (already removed)
Constraints:
0 <= key <= 1,000,000
At most 10,000 calls will be made to add, remove, and contains.

The simplest implementation uses a dynamic array to store all keys. For each operation, we search through the array linearly. This works correctly but is inefficient, since every operation may have to scan all stored elements.

Initialize an empty array data.
add(key): If the key is not already in the array, append it.
remove(key): If the key exists in the array, remove it.
contains(key): Return true if the key exists in the array.

class MyHashSet:
    def __init__(self):
        self.data = []

    def add(self, key: int) -> None:
        if key not in self.data:
            self.data.append(key)

    def remove(self, key: int) -> None:
        if key in self.data:
            self.data.remove(key)

    def contains(self, key: int) -> bool:
        return key in self.data

Since keys are constrained to [0, 1000000], we can use direct addressing with a boolean array. The index represents the key, and the boolean value indicates presence. This provides O(1) operations but uses fixed memory regardless of how many keys are stored.
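
A minimal sketch of this direct-addressing idea is shown here. The constant MAX_KEY is an assumption introduced for illustration, taken from the stated key range; the array name data is likewise only illustrative.

```python
class MyHashSet:
    # Assumption: keys fall in [0, MAX_KEY], as stated in the constraints.
    MAX_KEY = 1_000_000

    def __init__(self):
        # One flag per possible key; the index is the key itself.
        self.data = [False] * (self.MAX_KEY + 1)

    def add(self, key: int) -> None:
        self.data[key] = True

    def remove(self, key: int) -> None:
        # Clearing a flag that is already False is harmless.
        self.data[key] = False

    def contains(self, key: int) -> bool:
        return self.data[key]
```

No duplicate check is needed in this version, because setting a flag twice leaves it unchanged.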
Initialize a boolean array data of size 1000001, all set to false.
add(key): Set data[key] = true.
remove(key): Set data[key] = false.
contains(key): Return data[key].

To reduce memory, we can let several keys share a bucket and resolve the resulting collisions with separate chaining. An array of buckets stores linked lists, and keys are assigned to buckets using a hash function. Each operation traverses only the linked list in the relevant bucket, making average-case operations faster than the brute force approach.

Initialize an array of 10000 buckets, each with a dummy head node.
Define hash(key) as key % 10000.
add(key): Traverse the list at hash(key). If the key already exists, return. Otherwise, append a new node with the key.
remove(key): Traverse the list at hash(key). If a node with the matching key is found, remove it by updating the previous node's next pointer.
contains(key): Traverse the list at hash(key). Return true if the key is found, false otherwise.

class ListNode:
    def __init__(self, key: int):
        self.key = key
        self.next = None

class MyHashSet:
    def __init__(self):
        # 10**4 buckets, each starting with a dummy head node
        self.set = [ListNode(0) for _ in range(10**4)]

    def add(self, key: int) -> None:
        cur = self.set[key % len(self.set)]
        while cur.next:
            if cur.next.key == key:
                return  # already present
            cur = cur.next
        cur.next = ListNode(key)

    def remove(self, key: int) -> None:
        cur = self.set[key % len(self.set)]
        while cur.next:
            if cur.next.key == key:
                cur.next = cur.next.next  # unlink the matching node
                return
            cur = cur.next

    def contains(self, key: int) -> bool:
        cur = self.set[key % len(self.set)]
        while cur.next:
            if cur.next.key == key:
                return True
            cur = cur.next
        return False

Where n is the number of keys, k is the size of the set (10000), and m is the number of unique keys.
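
To make the length of a chain concrete, the sketch below counts how many keys land in each bucket under key % 10000. It is only an illustration: the helper name chain_lengths and the two sample key sets are assumptions made here, not part of the solution above, and the keys passed in are assumed to be distinct.

```python
def chain_lengths(keys, num_buckets=10_000):
    """Return a list whose entry i is the number of keys hashed to bucket i.

    Assumes the keys are distinct, as they would be inside the set.
    """
    counts = [0] * num_buckets
    for key in keys:
        counts[key % num_buckets] += 1
    return counts

# Evenly spread keys: 100000 keys over 10000 buckets.
spread = chain_lengths(range(100_000))
print(max(spread))      # 10

# Clustered keys: every multiple of 10000 collides in bucket 0.
clustered = chain_lengths(range(0, 1_000_001, 10_000))
print(max(clustered))   # 101
```

With evenly spread keys each chain holds the same small number of nodes. In the clustered case a single chain holds every key, so each operation on that bucket walks the whole chain.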
Instead of linked lists for collision handling, we can use binary search trees (BSTs) in each bucket. For a bucket holding about n/k keys, this improves the average cost of an operation from O(n/k) to O(log(n/k)), since operations on a reasonably balanced BST are logarithmic in the number of nodes. The plain BST used here is not self-balancing, so keys inserted in sorted order can still degrade a bucket to O(n/k). The tradeoff is a somewhat more complex implementation.

Initialize an array of 10000 buckets, each containing an empty BST.
Define hash(key) as key % 10000.
add(key): If the key is not already in the BST at hash(key), insert it using standard BST insertion.
remove(key): Delete the key from the BST at hash(key) using standard BST deletion (finding the in-order successor when needed).
contains(key): Search the BST at hash(key) and return true if the key is found.

class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

class BST:
    def __init__(self):
        self.root = None

    def insert(self, root, key):
        if not root:
            return TreeNode(key)
        if key < root.key:
            root.left = self.insert(root.left, key)
        elif key > root.key:
            root.right = self.insert(root.right, key)
        # equal keys fall through: duplicates are ignored
        return root

    def delete(self, root, key):
        if not root:
            return None
        if key < root.key:
            root.left = self.delete(root.left, key)
        elif key > root.key:
            root.right = self.delete(root.right, key)
        else:
            # zero or one child: splice the node out
            if not root.left:
                return root.right
            if not root.right:
                return root.left
            # two children: copy the in-order successor, then delete it below
            temp = self.minValueNode(root.right)
            root.key = temp.key
            root.right = self.delete(root.right, temp.key)
        return root

    def minValueNode(self, root):
        while root.left:
            root = root.left
        return root

    def search(self, root, key):
        if not root:
            return False
        if key == root.key:
            return True
        elif key < root.key:
            return self.search(root.left, key)
        else:
            return self.search(root.right, key)

    def add(self, key):
        self.root = self.insert(self.root, key)

    def remove(self, key):
        self.root = self.delete(self.root, key)

    def contains(self, key):
        return self.search(self.root, key)

class MyHashSet:
    def __init__(self):
        self.size = 10000
        self.buckets = [BST() for _ in range(self.size)]

    def _hash(self, key):
        return key % self.size

    def add(self, key: int) -> None:
        idx = self._hash(key)
        if not self.contains(key):
            self.buckets[idx].add(key)

    def remove(self, key: int) -> None:
        idx = self._hash(key)
        self.buckets[idx].remove(key)

    def contains(self, key: int) -> bool:
        idx = self._hash(key)
        return self.buckets[idx].contains(key)

Where n is the number of keys, k is the size of the set (10000), and m is the number of unique keys.
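
The logarithmic cost of a BST bucket holds only while the tree stays reasonably balanced. As a hedged illustration, the sketch below inserts the same colliding keys in two different orders and compares the resulting tree heights. The helpers insert, height, and balanced_order are hypothetical, written iteratively for this example rather than taken from the code above.

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    """Unbalanced BST insert (iterative); returns the root of the tree."""
    if root is None:
        return TreeNode(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = TreeNode(key)
                break
            cur = cur.left
        elif key > cur.key:
            if cur.right is None:
                cur.right = TreeNode(key)
                break
            cur = cur.right
        else:
            break  # key already present
    return root

def height(root):
    """Count the levels of the tree, walking it level by level (no recursion)."""
    levels = 0
    frontier = [root] if root else []
    while frontier:
        levels += 1
        frontier = [child for node in frontier
                    for child in (node.left, node.right) if child]
    return levels

def balanced_order(keys):
    """Yield sorted keys midpoint-first, so that plain insertion stays balanced."""
    if not keys:
        return
    mid = len(keys) // 2
    yield keys[mid]
    yield from balanced_order(keys[:mid])
    yield from balanced_order(keys[mid + 1:])

# Keys 0, 10000, 20000, ... all map to bucket 0 under key % 10000.
colliding = list(range(0, 1_000_001, 10_000))   # 101 keys

chain = None
for k in colliding:                  # ascending order
    chain = insert(chain, k)
print(height(chain))                 # 101: every node has only a right child

bushy = None
for k in balanced_order(colliding):  # midpoint-first order
    bushy = insert(bushy, k)
print(height(bushy))                 # 7
```

Ascending insertion turns the bucket into a right-leaning chain, so lookups there are linear again. A self-balancing tree would be needed to guarantee the logarithmic bound for every insertion order.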
We can compress the boolean array approach by using individual bits instead of booleans. Treating each array element as a 32-bit integer, we need only 31251 integers to cover keys 0 through 1000000. We use bit operations to set, clear, and check individual bits. This needs 32 times fewer array elements than the boolean array; the actual memory saving depends on how many bytes the language spends on each boolean.

Initialize an integer array set of size 31251 (since 31251 * 32 = 1000032 covers all keys).
Define getMask(key) as 1 << (key % 32) to create a bitmask for the key's position within its integer.
add(key): Set the bit using set[key / 32] |= getMask(key), where / is integer division.
remove(key): If the key exists, toggle the bit using set[key / 32] ^= getMask(key).
contains(key): Return true if set[key / 32] & getMask(key) is non-zero.

class MyHashSet:
    def __init__(self):
        # key is in the range [0, 1000000]
        # 31251 * 32 = 1000032
        self.set = [0] * 31251

    def add(self, key: int) -> None:
        self.set[key // 32] |= self.getMask(key)

    def remove(self, key: int) -> None:
        # XOR toggles the bit, so only apply it when the bit is set
        if self.contains(key):
            self.set[key // 32] ^= self.getMask(key)

    def contains(self, key: int) -> bool:
        return (self.set[key // 32] & self.getMask(key)) != 0

    def getMask(self, key: int) -> int:
        return 1 << (key % 32)

Where k is the size of the set (31251).
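
To see the index and mask arithmetic on concrete numbers, here is a small sketch. The helper locate is hypothetical and exists only for this illustration; it splits a key into the array index (key // 32) and the bit position (key % 32) used by the bit manipulation approach.

```python
def locate(key: int):
    """Return (array index, bit position, mask) for a key, at 32 bits per slot."""
    index = key // 32
    bit = key % 32
    return index, bit, 1 << bit

print(locate(0))          # (0, 0, 1)
print(locate(70))         # (2, 6, 64)
print(locate(1_000_000))  # (31250, 0, 1)
```

The largest key, 1000000, lands at index 31250, which is why the array needs 31251 slots rather than 31250.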
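The add() operation should be idempotent: adding an existing key should have no effect. A common mistake is not checking for duplicates before insertion. With duplicates in a chain, remove() unlinks only the first matching node, so contains() can still return true after the key was removed.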
# Wrong - allows duplicates
def add(self, key: int) -> None:
    cur = self.set[key % len(self.set)]
    while cur.next:
        cur = cur.next
    cur.next = ListNode(key)  # Always adds, even if exists!

# Correct - check for existence first
def add(self, key: int) -> None:
    cur = self.set[key % len(self.set)]
    while cur.next:
        if cur.next.key == key:
            return  # Already exists
        cur = cur.next
    cur.next = ListNode(key)

In the bit manipulation approach, using XOR to remove a key that doesn't exist will incorrectly add it instead.
# Wrong - XOR toggles the bit regardless
def remove(self, key: int) -> None:
    self.set[key // 32] ^= self.getMask(key)  # Adds if not present!

# Correct - only toggle if present
def remove(self, key: int) -> None:
    if self.contains(key):
        self.set[key // 32] ^= self.getMask(key)
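
An alternative that avoids the membership check, offered here as a sketch rather than as part of the solution above, is to clear the bit with AND NOT (set[i] &= ~mask). Clearing a bit that is already 0 leaves it at 0, so removing a missing key is harmless. The class name BitSet is hypothetical; the array size and getMask follow the bit manipulation approach.

```python
class BitSet:
    def __init__(self):
        # 31251 slots of 32 bits each cover keys 0..1000000
        self.set = [0] * 31251

    def getMask(self, key: int) -> int:
        return 1 << (key % 32)

    def add(self, key: int) -> None:
        self.set[key // 32] |= self.getMask(key)

    def remove(self, key: int) -> None:
        # AND with the inverted mask clears only this key's bit;
        # if the bit is already 0, the slot is left unchanged.
        self.set[key // 32] &= ~self.getMask(key)

    def contains(self, key: int) -> bool:
        return (self.set[key // 32] & self.getMask(key)) != 0


s = BitSet()
s.remove(5)            # key was never added
print(s.contains(5))   # False
s.add(5)
s.add(6)               # shares slot 0 with key 5
s.remove(5)
print(s.contains(5))   # False
print(s.contains(6))   # True
```

In Python, ~mask is a negative integer, but the AND with a non-negative slot value still yields a non-negative result, so the stored integers stay within 32 bits.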