Prerequisites

Before attempting this problem, you should be comfortable with:

  • Trie (Prefix Tree) - Building and traversing a trie for efficient prefix-based lookups
  • Hash Maps - Storing and retrieving key-value pairs for sentence frequencies
  • Sorting - Custom sorting by multiple criteria (frequency descending, then alphabetically)
  • Heap / Priority Queue - Using heaps to efficiently find top-k elements

1. Trie

Intuition

Autocomplete works by finding sentences that share a common prefix with what the user has typed so far. A trie is ideal for prefix-based lookups. We build a trie where each node stores all sentences that pass through it along with their frequencies. As the user types, we walk down the trie character by character. At any node, we have direct access to all matching sentences and can sort them by frequency (highest first) and then alphabetically to return the top 3 results.

Algorithm

  1. Initialization: Build a trie from the given sentences. For each character in a sentence, create a trie node if needed and store the full sentence with its count at every node along the path. Track the current input, the current trie node, and a dead node for invalid prefixes.
  2. input(c): If c is #, add the completed sentence to the trie, reset the current sentence and node, and return an empty list.
  3. Otherwise, append c to the current sentence. If c is not a child of the current node, move to the dead node and return an empty list.
  4. Move to the child node for c. Retrieve all sentences stored at this node, sort them by frequency descending and then alphabetically, and return the top 3.
  5. addToTrie(sentence, count): Walk through each character in the sentence, creating nodes as needed. At each node, increment the sentence's count.
from collections import defaultdict
from typing import List

class TrieNode:
    def __init__(self):
        self.children = {}
        self.sentences = defaultdict(int)

class AutocompleteSystem:
    def __init__(self, sentences: List[str], times: List[int]):
        self.root = TrieNode()
        for sentence, count in zip(sentences, times):
            self.add_to_trie(sentence, count)
            
        self.curr_sentence = []
        self.curr_node = self.root
        self.dead = TrieNode()
        
    def input(self, c: str) -> List[str]:
        if c == "#":
            curr_sentence = "".join(self.curr_sentence)
            self.add_to_trie(curr_sentence, 1)
            self.curr_sentence = []
            self.curr_node = self.root
            return []
        
        self.curr_sentence.append(c)
        if c not in self.curr_node.children:
            self.curr_node = self.dead
            return []
        
        self.curr_node = self.curr_node.children[c]
        sentences = self.curr_node.sentences
        sorted_sentences = sorted(sentences.items(), key=lambda x: (-x[1], x[0]))
        
        ans = []
        for i in range(min(3, len(sorted_sentences))):
            ans.append(sorted_sentences[i][0])
        
        return ans

    def add_to_trie(self, sentence, count):
        node = self.root
        for c in sentence:
            if c not in node.children:
                node.children[c] = TrieNode()
            node = node.children[c]
            node.sentences[sentence] += count
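
As a quick sanity check that every node along the path sees the sentence, here is a minimal standalone sketch of the `add_to_trie` idea (the sentences and counts are made up for illustration):

```python
from collections import defaultdict

class TrieNode:
    def __init__(self):
        self.children = {}
        self.sentences = defaultdict(int)

def add_to_trie(root, sentence, count):
    node = root
    for c in sentence:
        node = node.children.setdefault(c, TrieNode())
        node.sentences[sentence] += count  # recorded at every node on the path

root = TrieNode()
add_to_trie(root, "i love you", 5)
add_to_trie(root, "island", 3)

# The node for the shared prefix "i" already knows about both sentences.
node = root.children["i"]
print(sorted(node.sentences.items()))  # [('i love you', 5), ('island', 3)]
```

Because each node carries its own sentence-to-count map, a prefix query never has to walk the subtree below the current node.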

Time & Space Complexity

  • Time complexity: O(n * k + m * (n + m / k) * log(n + m / k))

    constructor:

    • We initialize the trie, which costs O(n * k) as we iterate over each character in each sentence.

    input:

    • We add a character to currSentence and step down the trie, both of which cost O(1). Next, we fetch and sort the sentences in the current node. Initially, a node could hold O(n) sentences. After m calls to input, we could add up to m / k new sentences. Overall, a node could hold up to O(n + m / k) sentences, so a sort costs O((n + m / k) * log(n + m / k)).
    • The work done in the other cases (like adding a new sentence to the trie) is dominated by this sort.
    • input is called m times, giving a total of O(m * (n + m / k) * log(n + m / k)).
  • Space complexity: O(k * (n * k + m))

Where n is the number of given sentences, k is the average length of a sentence, and m is the number of times input is called.


2. Optimize with Heap

Intuition

Sorting all matching sentences on every keystroke is expensive when a prefix has many matches. Since we only need the top 3 results, we can use a heap to find them more efficiently: process each sentence once and keep only the best candidates. In languages with a linear-time way to extract a fixed number of smallest elements (like Python's heapq.nsmallest), this beats a full sort.

Algorithm

  1. Initialization: Same as before: build the trie with sentence counts at each node. Store counts as negative values so that a min-heap naturally gives us the highest frequencies.
  2. input(c): If c is #, add the completed sentence to the trie, reset state, and return empty.
  3. Otherwise, append c and navigate the trie. If the character does not exist, go to the dead node and return empty.
  4. At the current node, use a heap to find the top 3 sentences. With negative counts, the smallest values represent the highest frequencies. Extract up to 3 items and return them.
  5. addToTrie(sentence, count): Walk through each character, creating nodes as needed. Subtract (rather than add) the count so that smaller numeric values indicate higher popularity.
import heapq
from collections import defaultdict
from typing import List

class TrieNode:
    def __init__(self):
        self.children = {}
        self.sentences = defaultdict(int)

class AutocompleteSystem:
    def __init__(self, sentences: List[str], times: List[int]):
        self.root = TrieNode()
        for sentence, count in zip(sentences, times):
            self.add_to_trie(sentence, count)
            
        self.curr_sentence = []
        self.curr_node = self.root
        self.dead = TrieNode()
        
    def input(self, c: str) -> List[str]:
        if c == "#":
            curr_sentence = "".join(self.curr_sentence)
            self.add_to_trie(curr_sentence, 1)
            self.curr_sentence = []
            self.curr_node = self.root
            return []
        
        self.curr_sentence.append(c)
        if c not in self.curr_node.children:
            self.curr_node = self.dead
            return []
        
        self.curr_node = self.curr_node.children[c]
        items = [(val, key) for key, val in self.curr_node.sentences.items()]
        ans = heapq.nsmallest(3, items)
        return [item[1] for item in ans]

    def add_to_trie(self, sentence, count):
        node = self.root
        for c in sentence:
            if c not in node.children:
                node.children[c] = TrieNode()
            node = node.children[c]
            node.sentences[sentence] -= count
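
To convince yourself that the negated counts interact correctly with heapq.nsmallest (ties on equal counts fall back to alphabetical order on the sentence itself), here is a small standalone check with made-up data:

```python
import heapq

# Counts are stored negated, as add_to_trie does above, so smaller = more frequent.
node_sentences = {"i love you": -5, "island": -3, "ironman": -2, "i love leetcode": -2}

# Tuples compare element by element: count first, then sentence alphabetically.
items = [(count, sentence) for sentence, count in node_sentences.items()]
top3 = [sentence for _, sentence in heapq.nsmallest(3, items)]
print(top3)  # ['i love you', 'island', 'i love leetcode']
```

Note how the tie between "ironman" and "i love leetcode" (both count 2) is broken alphabetically by the second tuple element, exactly the ordering the problem asks for.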

Time & Space Complexity

This analysis will assume that you have access to a linear time heapify method, like in the Python implementation.

  • Time complexity: O(n * k + m * (n + m / k))

    constructor:

    • We initialize the trie, which costs O(n * k) as we iterate over each character in each sentence.

    input:

    • We add a character to currSentence and step down the trie, both of which cost O(1). Next, we fetch the sentences in the current node. Initially, a node could hold O(n) sentences. After m calls to input, we could add up to m / k new sentences. Overall, a node could hold up to O(n + m / k) sentences. We heapify these sentences and find the best 3 in linear time, which costs O(n + m / k).
    • The work done in the other cases (like adding a new sentence to the trie) is dominated by this.
    • input is called m times, giving a total of O(m * (n + m / k)).
  • Space complexity: O(k * (n * k + m))

Where n is the number of given sentences, k is the average length of a sentence, and m is the number of times input is called.

Common Pitfalls

Forgetting to Reset State on '#'

When the user presses #, you must add the completed sentence to the trie, reset the current sentence, and return the trie pointer to root. Missing any of these steps causes incorrect behavior on subsequent inputs.

# Wrong: forgets to reset current node
if c == "#":
    self.add_to_trie(curr_sentence, 1)
    self.curr_sentence = []
    return []  # Missing: self.curr_node = self.root

# Correct
if c == "#":
    self.add_to_trie("".join(self.curr_sentence), 1)
    self.curr_sentence = []
    self.curr_node = self.root
    return []

Not Handling Non-Existent Prefix Path

When a character doesn't exist in the current node's children, the system should return empty results for all subsequent characters until # is pressed. Using a "dead" node pattern handles this cleanly.

# Wrong: crashes on missing child
self.curr_node = self.curr_node.children[c]  # KeyError

# Correct: use dead node for invalid paths
if c not in self.curr_node.children:
    self.curr_node = self.dead  # dead is empty TrieNode
    return []
self.curr_node = self.curr_node.children[c]
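
A runnable miniature of the dead-node pattern (the one-sentence trie here is purely illustrative): once the prefix breaks, every later lookup lands on the same empty node, so the system stays quiet until # resets it.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.sentences = {}

# A trie containing only the sentence "hi".
root = TrieNode()
h = root.children["h"] = TrieNode()
h.sentences = {"hi": 1}
i = h.children["i"] = TrieNode()
i.sentences = {"hi": 1}

dead = TrieNode()  # permanently empty: every lookup on it misses
node = root
outputs = []
for c in "hxi":  # 'x' breaks the prefix; the trailing 'i' must also yield nothing
    node = node.children.get(c, dead)
    outputs.append(sorted(node.sentences))
print(outputs)  # [['hi'], [], []]
```

Because the dead node has no children, `children.get(c, dead)` keeps returning the dead node for every subsequent character, with no extra flag or special-casing needed.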

Wrong Sorting Order for Results

Results must be sorted by frequency (highest first), then alphabetically for ties. Reversing this order or sorting in the wrong direction gives incorrect top-3 results.

# Wrong: sorts alphabetically first, then by frequency
sorted_sentences = sorted(sentences.items(), key=lambda x: (x[0], -x[1]))

# Correct: frequency descending, then alphabetically
sorted_sentences = sorted(sentences.items(), key=lambda x: (-x[1], x[0]))
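
With a small made-up frequency map, the two keys produce visibly different rankings:

```python
sentences = {"banana": 5, "apple": 2, "cherry": 5}

# Wrong: alphabetical first, so frequency is only a tie-breaker.
wrong = [s for s, _ in sorted(sentences.items(), key=lambda x: (x[0], -x[1]))]
# Correct: frequency descending, alphabetical only for ties.
right = [s for s, _ in sorted(sentences.items(), key=lambda x: (-x[1], x[0]))]

print(wrong)  # ['apple', 'banana', 'cherry']
print(right)  # ['banana', 'cherry', 'apple']
```

In the correct ordering, "banana" and "cherry" tie at frequency 5 and fall back to alphabetical order, while the less frequent "apple" drops to last despite coming first alphabetically.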

Not Storing Sentence at Every Node Along Path

Each trie node must store all sentences that pass through it, not just at the leaf. Otherwise, prefix queries won't find matching sentences.

# Wrong: only stores at final node
def add_to_trie(self, sentence, count):
    node = self.root
    for c in sentence:
        if c not in node.children:
            node.children[c] = TrieNode()
        node = node.children[c]
    node.sentences[sentence] += count  # Only at end

# Correct: store at every node along path
def add_to_trie(self, sentence, count):
    node = self.root
    for c in sentence:
        if c not in node.children:
            node.children[c] = TrieNode()
        node = node.children[c]
        node.sentences[sentence] += count  # At every node

Returning More Than 3 Results

The problem requires returning at most 3 results. Forgetting to limit the output or using the wrong slice gives incorrect answers.

# Wrong: returns all matching sentences
return [s[0] for s in sorted_sentences]

# Correct: return at most 3
return [s[0] for s in sorted_sentences[:3]]