472. Concatenated Words - Explanation

Problem Link

Description

You are given an array of strings words (without duplicates), return all the concatenated words in the given list of words in any order.

A concatenated word is defined as a string that is comprised entirely of at least two shorter words (not necessarily distinct) in the given array.

Example 1:

Input: words = ["cat","cats","catsdogcats","dog","dogcatsdog","hippopotamuses","rat","ratcatdogcat"]

Output: ["catsdogcats","dogcatsdog","ratcatdogcat"]

Explanation: "catsdogcats" can be concatenated by "cats", "dog" and "cats";
"dogcatsdog" can be concatenated by "dog", "cats" and "dog";
"ratcatdogcat" can be concatenated by "rat", "cat", "dog" and "cat".

Example 2:

Input: words = ["cat","dog","catdog"]

Output: ["catdog"]

Constraints:

  • 1 <= words.length <= 10,000
  • 1 <= words[i].length <= 30
  • words[i] consists of only lowercase English letters.
  • All the strings of words are unique.
  • 1 <= sum(words[i].length) <= 100,000


Topics

Company Tags

Please upgrade to NeetCode Pro to view company tags.



Prerequisites

Before attempting this problem, you should be comfortable with:

  • Hash Sets - Using sets for O(1) lookups to check if a word exists in the dictionary
  • Recursion - Breaking down strings into prefix/suffix and recursively checking validity
  • Dynamic Programming - Using memoization to avoid recomputing results for the same substrings
  • String Manipulation - Splitting strings into substrings and concatenating words

1. Brute Force (Backtracking)

Intuition

We can generate all possible concatenations of words and check if any concatenation exists in our word set. By building concatenations through backtracking and checking membership, we find words that are formed by joining two or more shorter words.

Algorithm

  1. Build a hash set from all words and find the maximum word length.
  2. Use backtracking to generate all possible concatenations of words.
  3. When a concatenation has more than one word and exists in the word set, add it to results and remove it from the set to avoid duplicates.
  4. Prune branches where the current length exceeds the maximum word length.
  5. Return all found concatenated words.
class Solution:
    def findAllConcatenatedWordsInADict(self, words: List[str]) -> List[str]:
        n = len(words)
        res = []
        wordSet = set(words)
        maxLen = 0
        for w in words:
            maxLen = max(maxLen, len(w))

        def dfs(concatWord, totLen):
            if len(concatWord) > 1:
                word = "".join(concatWord)
                if word in wordSet:
                    res.append(word)
                    wordSet.remove(word)

            for i in range(len(words)):
                if totLen + len(words[i]) > maxLen:
                    continue
                concatWord.append(words[i])
                dfs(concatWord, totLen + len(words[i]))
                concatWord.pop()

        dfs([], 0)
        return res

Time & Space Complexity

  • Time complexity: O(mnn)O(m * n ^ n)
  • Space complexity: O(mn)O(m * n)

Where nn is the size of the string array wordswords and mm is the length of the longest word in the array.


2. Recursion

Intuition

For each word, we check if it can be split into parts where each part exists in the word set. We try every possible prefix; if a prefix is in the set, we recursively check if the remaining suffix can also be decomposed (or is itself in the set).

Algorithm

  1. Build a hash set from all words for O(1) lookup.
  2. For each word, define a recursive function that tries to split it.
  3. At each position, check all possible prefixes. If a prefix exists in the set, check if the suffix also exists or can be recursively split.
  4. If the entire word can be decomposed into at least two parts from the set, add it to results.
  5. Return all concatenated words found.
class Solution:
    def findAllConcatenatedWordsInADict(self, words: List[str]) -> List[str]:
        wordSet = set(words)

        def dfs(word):
            for i in range(1, len(word)):
                prefix, suffix = word[:i], word[i:]
                if ((prefix in wordSet and suffix in wordSet) or
                    (prefix in wordSet and dfs(suffix))
                ):
                    return True
            return False

        res = []
        for w in words:
            if dfs(w):
                res.append(w)
        return res

Time & Space Complexity

  • Time complexity: O(nm4)O(n * m ^ 4)
  • Space complexity: O(nm)O(n * m)

Where nn is the size of the string array wordswords and mm is the length of the longest word in the array.


3. Dynamic Programming (Top-Down)

Intuition

The recursive approach has overlapping subproblems since the same suffix may be checked multiple times. By memoizing results for each suffix, we avoid recomputing whether a substring can be decomposed.

Algorithm

  1. Build a hash set from all words and create a memoization dictionary.
  2. For each word, use a recursive function with memoization to check if it can be split.
  3. Before computing, check if the result for the current word exists in memo.
  4. Try all prefixes; if a prefix is in the word set and the suffix can be decomposed, cache and return true.
  5. Cache false results as well to avoid recomputation. Add words that return true to the result.
class Solution:
    def findAllConcatenatedWordsInADict(self, words: List[str]) -> List[str]:
        wordSet = set(words)
        dp = {}

        def dfs(word):
            if word in dp:
                return dp[word]

            for i in range(1, len(word)):
                prefix, suffix = word[:i], word[i:]
                if ((prefix in wordSet and suffix in wordSet) or
                    (prefix in wordSet and dfs(suffix))
                ):
                    dp[word] = True
                    return True
            dp[word] = False
            return False

        res = []
        for w in words:
            if dfs(w):
                res.append(w)
        return res

Time & Space Complexity

  • Time complexity: O(nm3)O(n * m ^ 3)
  • Space complexity: O(nm)O(n * m)

Where nn is the size of the string array wordswords and mm is the length of the longest word in the array.


4. Dynamic Programming (Bottom-Up)

Intuition

For each word, we use a DP array where dp[i] indicates whether the substring from index 0 to i can be formed by concatenating words from the set. We build this array iteratively by checking all possible split points.

Algorithm

  1. Build a hash set from all words.
  2. For each word, create a boolean DP array of size m+1 with dp[0] = true.
  3. For each position i from 1 to m, check all previous positions j. If dp[j] is true and the substring from j to i exists in the set, set dp[i] = true.
  4. Skip the case where j = 0 and i = m to ensure the word is not just itself.
  5. If dp[m] is true, the word is concatenated; add it to results.
class Solution:
    def findAllConcatenatedWordsInADict(self, words: List[str]) -> List[str]:
        wordSet = set(words)
        res = []

        for word in words:
            m = len(word)

            dp = [False] * (m + 1)
            dp[0] = True

            for i in range(1, m + 1):
                for j in range(i):
                    if j == 0 and i == m:
                        continue
                    if dp[j] and word[j:i] in wordSet:
                        dp[i] = True
                        break

            if dp[m]:
                res.append(word)

        return res

Time & Space Complexity

  • Time complexity: O(nm3)O(n * m ^ 3)
  • Space complexity: O(nm)O(n * m)

Where nn is the size of the string array wordswords and mm is the length of the longest word in the array.


Common Pitfalls

Counting a Word as Its Own Concatenation

A concatenated word must be formed by at least two other words. If you check whether the entire word exists in the set without excluding it, every word would trivially match itself.

# Wrong: allows word to match itself
if word[j:i] in wordSet:
    dp[i] = True

# Correct: skip the case where the entire word is checked
if j == 0 and i == m:
    continue

Not Requiring Multiple Words

The problem requires concatenation of two or more words. Checking only that the word can be split into valid parts without ensuring at least two parts will incorrectly include single words that exist in the dictionary.

Missing Memoization Leading to TLE

Without memoization, the same suffix may be checked exponentially many times. For long words with many valid prefixes, this causes timeout.

# Wrong: no caching
def dfs(word):
    for i in range(1, len(word)):
        if prefix in wordSet and dfs(suffix):
            return True

# Correct: cache results
if word in dp:
    return dp[word]

Incorrect Substring Indices

Off-by-one errors when splitting the word into prefix and suffix are common. The prefix should be word[0:i] and suffix should be word[i:], where i ranges from 1 to len(word)-1.

Forgetting Empty String Edge Case

If the word list contains an empty string, it could match infinitely. Most solutions implicitly handle this since the loop starts at i=1, but explicitly filtering empty strings from the word set is safer.