2707. Extra Characters in a String - Explanation

Problem Link

Description

You are given a string s and a dictionary of words dictionary. You have to break s into one or more non-overlapping substrings such that each substring is present in dictionary. There may be some extra characters in s which are not present in any of the substrings.

Return the minimum number of extra characters left over if you break up s optimally.

Note that the same word in the dictionary may be reused multiple times.

Example 1:

Input: s = "neetcodes", dictionary = ["neet","code","neetcode"]

Output: 1

Explanation: The optimal way is to break s into two substrings: "neet" from index 0 to 3 and "code" from index 4 to 7. There is one character which is at index 8.

Example 2:

Input: s = "neetcodde", dictionary = ["neet","code","neetcode"]

Output: 5

Explanation: The optimal way is to break s into one substring: "neet" from index 0 to 3. The characters at indices from 4 to 8 are extra.

Constraints:

  • 1 <= s.length <= 50
  • 1 <= dictionary.length <= 50
  • 1 <= dictionary[i].length <= 50
  • s and dictionary[i] consist of only lowercase English letters
  • dictionary contains distinct words


Topics

Company Tags

Please upgrade to NeetCode Pro to view company tags.



Prerequisites

Before attempting this problem, you should be comfortable with:

  • Dynamic Programming - Both top-down (memoization) and bottom-up approaches for optimization problems
  • Hash Sets - Efficient O(1) lookup for dictionary word matching
  • String Manipulation - Substring extraction and comparison
  • Trie Data Structure - Prefix tree for efficient word matching (used in optimized solutions)
  • Recursion with Memoization - Avoiding redundant computation in recursive solutions

1. Recursion

Intuition

At each position in the string, we have two choices: either skip the current character (counting it as extra), or try to match a dictionary word starting at this position. If a word matches, we can jump past it without counting those characters as extra.

This recursive approach explores all possibilities. For each index, we take the minimum of skipping one character versus matching any dictionary word that starts at that index.

Algorithm

  1. Convert the dictionary to a set for O(1) lookups.
  2. Define a recursive function dfs(i) that returns the minimum extra characters from index i to the end:
    • If i == len(s), return 0 (no characters left).
    • Start with res = 1 + dfs(i + 1) (skip current character).
    • For each ending index j from i to len(s) - 1:
      • If s[i:j+1] is in the dictionary, update res = min(res, dfs(j + 1)).
    • Return res.
  3. Return dfs(0).
class Solution:
    def minExtraChar(self, s: str, dictionary: List[str]) -> int:
        words = set(dictionary)

        def dfs(i):
            if i == len(s):
                return 0

            res = 1 + dfs(i + 1)
            for j in range(i, len(s)):
                if s[i:j + 1] in words:
                    res = min(res, dfs(j + 1))

            return res

        return dfs(0)

Time & Space Complexity

  • Time complexity: O(n∗2n+m∗k)O(n * 2 ^ n + m * k)
  • Space complexity: O(n+m∗k)O(n + m * k)

Where nn is the length of the string ss, mm is the number of words in the dictionary, and kk is the average length of a word in the dictionary.


2. Dynamic Programming (Top-Down) Using Hash Set

Intuition

The recursive solution has overlapping subproblems since we may compute the answer for the same index multiple times. Memoization stores results so each subproblem is solved only once.

Using a hash set for the dictionary allows efficient substring lookups. The memoization table dp[i] stores the minimum extra characters from index i onward.

Algorithm

  1. Convert the dictionary to a set for O(1) lookups.
  2. Initialize dp with dp[len(s)] = 0 as the base case.
  3. Define dfs(i):
    • If i is already in dp, return dp[i].
    • Start with res = 1 + dfs(i + 1).
    • For each j from i to len(s) - 1, if s[i:j+1] is in the set, update res = min(res, dfs(j + 1)).
    • Store and return dp[i] = res.
  4. Return dfs(0).
class Solution:
    def minExtraChar(self, s: str, dictionary: List[str]) -> int:
        words = set(dictionary)
        dp = {len(s): 0}

        def dfs(i):
            if i in dp:
                return dp[i]
            res = 1 + dfs(i + 1)
            for j in range(i, len(s)):
                if s[i:j + 1] in words:
                    res = min(res, dfs(j + 1))
            dp[i] = res
            return res

        return dfs(0)

Time & Space Complexity

  • Time complexity: O(n3+m∗k)O(n ^ 3 + m * k)
  • Space complexity: O(n+m∗k)O(n + m * k)

Where nn is the length of the string ss, mm is the number of words in the dictionary, and kk is the average length of a word in the dictionary.


3. Dynamic Programming (Bottom-Up) Using Hash Set

Intuition

Instead of recursion with memoization, we can fill the DP table iteratively from right to left. dp[i] represents the minimum extra characters when starting from index i. Since dp[i] depends on dp[j+1] for j >= i, processing from right to left ensures all dependencies are resolved.

Algorithm

  1. Convert the dictionary to a set.
  2. Create an array dp of size n + 1, initialized to 0.
  3. For i from n - 1 down to 0:
    • Set dp[i] = 1 + dp[i + 1] (skip current character).
    • For each j from i to n - 1, if s[i:j+1] is in the set, update dp[i] = min(dp[i], dp[j + 1]).
  4. Return dp[0].
class Solution:
    def minExtraChar(self, s: str, dictionary: List[str]) -> int:
        words = set(dictionary)
        n = len(s)
        dp = [0] * (n + 1)

        for i in range(n - 1, -1, -1):
            dp[i] = 1 + dp[i + 1]
            for j in range(i, n):
                if s[i:j + 1] in words:
                    dp[i] = min(dp[i], dp[j + 1])
        return dp[0]

Time & Space Complexity

  • Time complexity: O(n3+m∗k)O(n ^ 3 + m * k)
  • Space complexity: O(n+m∗k)O(n + m * k)

Where nn is the length of the string ss, mm is the number of words in the dictionary, and kk is the average length of a word in the dictionary.


4. Dynamic Programming (Top-Down)

Intuition

Instead of checking all substrings against a hash set, we can iterate through dictionary words directly. For each position, we check if any dictionary word matches starting at that position. This can be faster when the dictionary is small relative to the string length.

Algorithm

  1. Initialize dp with dp[len(s)] = 0.
  2. Define dfs(i):
    • If i is in dp, return dp[i].
    • Start with res = 1 + dfs(i + 1).
    • For each word in the dictionary:
      • If i + len(word) <= len(s) and s[i:i+len(word)] == word, update res = min(res, dfs(i + len(word))).
    • Store and return dp[i] = res.
  3. Return dfs(0).
class Solution:
    def minExtraChar(self, s: str, dictionary: List[str]) -> int:
        dp = {len(s) : 0}

        def dfs(i):
            if i in dp:
                return dp[i]

            res = 1 + dfs(i + 1)
            for word in dictionary:
                if i + len(word) > len(s):
                    continue

                flag = True
                for j in range(len(word)):
                    if s[i + j] != word[j]:
                        flag = False
                        break
                if flag:
                    res = min(res, dfs(i + len(word)))

            dp[i] = res
            return res

        return dfs(0)

Time & Space Complexity

  • Time complexity: O(n∗m∗k)O(n * m * k)
  • Space complexity: O(n)O(n)

Where nn is the length of the string ss, mm is the number of words in the dictionary, and kk is the average length of a word in the dictionary.


5. Dynamic Programming (Bottom-Up)

Intuition

This is the iterative version of the previous approach, iterating through dictionary words at each position rather than checking all possible substrings. It avoids recursion overhead while maintaining the same logic.

Algorithm

  1. Create an array dp of size n + 1, initialized to 0.
  2. For i from n - 1 down to 0:
    • Set dp[i] = 1 + dp[i + 1].
    • For each word in the dictionary:
      • If i + len(word) <= n and s[i:i+len(word)] == word, update dp[i] = min(dp[i], dp[i + len(word)]).
  3. Return dp[0].
class Solution:
    def minExtraChar(self, s: str, dictionary: List[str]) -> int:
        n = len(s)
        dp = [0] * (n + 1)

        for i in range(n - 1, -1, -1):
            dp[i] = 1 + dp[i + 1]
            for word in dictionary:
                if i + len(word) <= n and s[i:i + len(word)] == word:
                    dp[i] = min(dp[i], dp[i + len(word)])

        return dp[0]

Time & Space Complexity

  • Time complexity: O(n∗m∗k)O(n * m * k)
  • Space complexity: O(n)O(n)

Where nn is the length of the string ss, mm is the number of words in the dictionary, and kk is the average length of a word in the dictionary.


6. Dynamic Programming (Top-Down) Using Trie

Intuition

A Trie (prefix tree) allows efficient prefix matching. Instead of checking each dictionary word independently, we traverse the Trie character by character. If we hit a dead end (no child for the current character), we stop early. This is faster than checking each word when there are many dictionary words sharing common prefixes.

Algorithm

  1. Build a Trie from all dictionary words.
  2. Initialize dp with dp[len(s)] = 0.
  3. Define dfs(i):
    • If i is in dp, return dp[i].
    • Start with res = 1 + dfs(i + 1).
    • Traverse the Trie starting from the root:
      • For j from i to len(s) - 1:
        • If s[j] is not a child of the current node, break.
        • Move to the child node.
        • If this node marks the end of a word, update res = min(res, dfs(j + 1)).
    • Store and return dp[i] = res.
  4. Return dfs(0).
class TrieNode:
    def __init__(self):
        self.children = {}
        self.isWord = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def addWord(self, word):
        curr = self.root
        for c in word:
            if c not in curr.children:
                curr.children[c] = TrieNode()
            curr = curr.children[c]
        curr.isWord = True

class Solution:
    def minExtraChar(self, s: str, dictionary: List[str]) -> int:
        trie = Trie()
        for w in dictionary:
            trie.addWord(w)

        dp = {len(s): 0}

        def dfs(i):
            if i in dp:
                return dp[i]
            res = 1 + dfs(i + 1)
            curr = trie.root
            for j in range(i, len(s)):
                if s[j] not in curr.children:
                    break
                curr = curr.children[s[j]]
                if curr.isWord:
                    res = min(res, dfs(j + 1))

            dp[i] = res
            return res

        return dfs(0)

Time & Space Complexity

  • Time complexity: O(n2+m∗k)O(n ^ 2 + m * k)
  • Space complexity: O(n+m∗k)O(n + m * k)

Where nn is the length of the string ss, mm is the number of words in the dictionary, and kk is the average length of a word in the dictionary.


7. Dynamic Programming (Bottom-Up) Using Trie

Intuition

This combines the bottom-up DP approach with Trie-based matching. We iterate from right to left, and at each position, we traverse the Trie to find all dictionary words that start at that position. The Trie allows early termination when no dictionary word can match the current prefix.

Algorithm

  1. Build a Trie from all dictionary words.
  2. Create an array dp of size n + 1, initialized to 0.
  3. For i from n - 1 down to 0:
    • Set dp[i] = 1 + dp[i + 1].
    • Start at the Trie root and traverse:
      • For j from i to n - 1:
        • If s[j] is not a child, break.
        • Move to the child node.
        • If this node marks a word end, update dp[i] = min(dp[i], dp[j + 1]).
  4. Return dp[0].
class TrieNode:
    def __init__(self):
        self.children = {}
        self.isWord = False

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def addWord(self, word):
        curr = self.root
        for c in word:
            if c not in curr.children:
                curr.children[c] = TrieNode()
            curr = curr.children[c]
        curr.isWord = True

class Solution:
    def minExtraChar(self, s: str, dictionary: List[str]) -> int:
        trie = Trie()
        for w in dictionary:
            trie.addWord(w)

        n = len(s)
        dp = [0] * (n + 1)

        for i in range(n - 1, -1, -1):
            dp[i] = 1 + dp[i + 1]
            curr = trie.root
            for j in range(i, n):
                if s[j] not in curr.children:
                    break
                curr = curr.children[s[j]]
                if curr.isWord:
                    dp[i] = min(dp[i], dp[j + 1])

        return dp[0]

Time & Space Complexity

  • Time complexity: O(n2+m∗k)O(n ^ 2 + m * k)
  • Space complexity: O(n+m∗k)O(n + m * k)

Where nn is the length of the string ss, mm is the number of words in the dictionary, and kk is the average length of a word in the dictionary.


Common Pitfalls

Off-By-One Errors in Substring Boundaries

When checking if s[i:j+1] matches a dictionary word, getting the indices wrong is a common mistake. In Python, s[i:j] excludes index j, so you need s[i:j+1] to include the character at position j. Similarly, when jumping to the next position after a match, you should move to j+1, not j.

Not Considering the Skip-Character Option

At each position, you must always consider the option of skipping the current character (counting it as extra) with cost 1 + dp[i+1]. If you only check dictionary matches without this fallback, positions where no dictionary word starts will cause incorrect results or infinite loops.

Inefficient Substring Matching Without Memoization or Trie

The naive approach of checking all possible substrings against the dictionary for every position leads to exponential time complexity. Without memoization, the same subproblems are solved repeatedly. For large inputs, either use dynamic programming with memoization or build a Trie to efficiently find all dictionary words starting at each position.