1408. String Matching in an Array - Explanation

Problem Link

Description

You are given an array of string words, return all strings in words that are a substring of another word. You can return the answer in any order.

Note: A substring is a contiguous non-empty sequence of characters within a string.

Example 1:

Input: words = ["mass","as","hero","superhero"]

Output: ["as","hero"]

Explanation: "as" is substring of "mass" and "hero" is substring of "superhero". ["hero","as"] is also a valid answer.

Example 2:

Input: words = ["neetcode","neeet","neet","code"]

Output: ["neet","code"]

Explanation: "neet" and "code" are substrings of "neetcode".

Example 3:

Input: words = ["blue","green","bu"]

Output: []

Explanation: No string of words is substring of another string.

Constraints:

  • 1 <= words.length <= 100
  • 1 <= words[i].length <= 30
  • words[i] contains only lowercase English letters.
  • All the strings of words are unique.


Topics

Company Tags

Please upgrade to NeetCode Pro to view company tags.



Prerequisites

Before attempting this problem, you should be comfortable with:

  • Substring Search - Checking if one string appears within another using built-in methods or algorithms
  • Sorting - Ordering elements by a custom criterion (e.g., string length) to optimize comparisons
  • String Matching Algorithms - KMP, Rabin-Karp, or Z-Algorithm for efficient pattern matching (for advanced solutions)
  • Trie Data Structure - Building a suffix trie to efficiently find substring relationships (for optimal solution)

1. Brute Force

Intuition

For each word, we check if it appears as a substring in any other word. If it does, we add it to our result. Since we need to check every word against every other word, we use two nested loops. Once we find that a word is a substring of another, we can stop checking and move to the next word.

Algorithm

  1. Initialize an empty result list.
  2. For each word at index i, iterate through all words at index j where j != i.
  3. Check if words[i] is a substring of words[j] using the built-in substring check.
  4. If found, add words[i] to the result and break the inner loop.
  5. Return the result list.
class Solution:
    def stringMatching(self, words: List[str]) -> List[str]:
        res = []

        for i in range(len(words)):
            for j in range(len(words)):
                if i == j:
                    continue

                if words[i] in words[j]:
                    res.append(words[i])
                    break

        return res

Time & Space Complexity

  • Time complexity: O(n2m2)O(n ^ 2 * m ^ 2)
  • Space complexity:
    • O(1)O(1) extra space.
    • O(nm)O(n * m) space for the output list.

Where nn is the number of words, and mm is the length of the longest word.


2. Sorting

Intuition

A shorter word can only be a substring of a longer word, never the other way around. By sorting words by length, we only need to check each word against longer words that come after it. This gives a minor optimization by reducing unnecessary comparisons, though the worst-case complexity remains the same.

Algorithm

  1. Sort the words array by length in ascending order.
  2. For each word at index i, check if it's a substring of any word at index j where j > i.
  3. If found, add the word to the result and break the inner loop.
  4. Return the result list.
class Solution:
    def stringMatching(self, words: List[str]) -> List[str]:
        res = []
        words.sort(key=len)

        for i in range(len(words)):
            for j in range(i + 1, len(words)):
                if words[i] in words[j]:
                    res.append(words[i])
                    break

        return res

Time & Space Complexity

  • Time complexity: O(n2m2)O(n ^ 2 * m ^ 2)
  • Space complexity:
    • O(1)O(1) or O(n)O(n) depending on the sorting algorithm.
    • O(nm)O(n * m) space for the output list.

Where nn is the number of words, and mm is the length of the longest word.


3. Knuth-Morris-Pratt (KMP) Algorithm

Intuition

Instead of using the built-in substring search, we can use the KMP algorithm which preprocesses the pattern to enable efficient matching. KMP builds a "longest proper prefix which is also suffix" (LPS) array that allows us to skip characters during mismatches rather than starting over. This improves substring matching to linear time in the combined length of the strings.

Algorithm

  1. Implement KMP: build the LPS array for the pattern, then scan the text using the LPS to avoid redundant comparisons.
  2. Sort words by length.
  3. For each word at index i, use KMP to check if it's a substring of any word at index j > i.
  4. If KMP returns a valid index (not -1), add the word to the result and break.
  5. Return the result list.
class Solution:
    def stringMatching(self, words: List[str]) -> List[str]:
        def kmp(word1: str, word2: str) -> int:
            lps = [0] * len(word2)
            prevLPS, i = 0, 1

            while i < len(word2):
                if word2[i] == word2[prevLPS]:
                    lps[i] = prevLPS + 1
                    prevLPS += 1
                    i += 1
                elif prevLPS == 0:
                    lps[i] = 0
                    i += 1
                else:
                    prevLPS = lps[prevLPS - 1]

            i = j = 0
            while i < len(word1):
                if word1[i] == word2[j]:
                    i += 1
                    j += 1
                else:
                    if j == 0:
                        i += 1
                    else:
                        j = lps[j - 1]

                if j == len(word2):
                    return i - len(word2)

            return -1

        res = []
        words.sort(key=len)

        for i in range(len(words)):
            for j in range(i + 1, len(words)):
                if kmp(words[j], words[i]) != -1:
                    res.append(words[i])
                    break

        return res

Time & Space Complexity

  • Time complexity: O(n2m)O(n ^ 2 * m)
  • Space complexity:
    • O(m)O(m) extra space.
    • O(1)O(1) or O(n)O(n) space depending on the sorting algorithm.
    • O(nm)O(n * m) space for the output list.

Where nn is the number of words, and mm is the length of the longest word.


4. Rabin-Karp Algorithm (Rolling Hash)

Intuition

Rabin-Karp uses hashing to speed up substring matching. We compute a hash of the pattern and then slide a window over the text, computing the hash of each window using a rolling hash technique. If the hashes match, we have a potential match (with possible false positives). Using double hashing with two different bases and moduli reduces false positive probability to near zero.

Algorithm

  1. Implement Rabin-Karp: compute the hash of the pattern, precompute the power values for the window size, then slide over the text updating the hash in O(1) per step.
  2. Use two independent hash functions to minimize false positives.
  3. Sort words by length.
  4. For each word at index i, use Rabin-Karp to check if it's a substring of any word at index j > i.
  5. If a match is found, add the word to the result and break.
  6. Return the result list.
class Solution:
    def stringMatching(self, words: List[str]) -> List[str]:
        def rabinKarp(word1: str, word2: str) -> int:
            base1, mod1 = 31, 768258391
            base2, mod2 = 37, 685683731
            n, m = len(word1), len(word2)

            power1, power2 = 1, 1
            for _ in range(m):
                power1 = (power1 * base1) % mod1
                power2 = (power2 * base2) % mod2

            word1_hash1 = word1_hash2 = 0
            word2_hash1 = word2_hash2 = 0

            for i in range(m):
                word1_hash1 = (word1_hash1 * base1 + ord(word2[i])) % mod1
                word1_hash2 = (word1_hash2 * base2 + ord(word2[i])) % mod2
                word2_hash1 = (word2_hash1 * base1 + ord(word1[i])) % mod1
                word2_hash2 = (word2_hash2 * base2 + ord(word1[i])) % mod2

            for i in range(n - m + 1):
                if word2_hash1 == word1_hash1 and word2_hash2 == word1_hash2:
                    return i

                if i + m < n:
                    word2_hash1 = (word2_hash1 * base1 - ord(word1[i]) * power1 + ord(word1[i + m])) % mod1
                    word2_hash2 = (word2_hash2 * base2 - ord(word1[i]) * power2 + ord(word1[i + m])) % mod2

                    word2_hash1 = (word2_hash1 + mod1) % mod1
                    word2_hash2 = (word2_hash2 + mod2) % mod2

            return -1

        res = []
        words.sort(key=len)

        for i in range(len(words)):
            for j in range(i + 1, len(words)):
                if rabinKarp(words[j], words[i]) != -1:
                    res.append(words[i])
                    break

        return res

Time & Space Complexity

  • Time complexity: O(n2m)O(n ^ 2 * m)
  • Space complexity:
    • O(1)O(1) or O(n)O(n) space depending on the sorting algorithm.
    • O(nm)O(n * m) space for the output list.

Where nn is the number of words, and mm is the length of the longest word.


5. Z-Algorithm

Intuition

The Z-algorithm builds a Z-array where Z[i] represents the length of the longest substring starting at position i that matches a prefix of the string. By concatenating the pattern, a separator, and the text, we can find all occurrences of the pattern by looking for positions where Z[i] equals the pattern length.

Algorithm

  1. Implement the Z-algorithm: concatenate pattern + "$" + text and compute the Z-array.
  2. Any position after the separator where Z[i] == len(pattern) indicates a match.
  3. Sort words by length.
  4. For each word at index i, use the Z-algorithm to check if it's a substring of any word at index j > i.
  5. If a match is found, add the word to the result and break.
  6. Return the result list.
class Solution:
    def stringMatching(self, words: List[str]) -> List[str]:
        def zAlgorithm(word1: str, word2: str) -> int:
            s = word2 + "$" + word1
            n = len(s)
            z = [0] * n
            l, r = 0, 0

            for i in range(1, n):
                if i <= r:
                    z[i] = min(r - i + 1, z[i - l])
                while i + z[i] < n and s[z[i]] == s[i + z[i]]:
                    z[i] += 1
                if i + z[i] - 1 > r:
                    l, r = i, i + z[i] - 1

            for i in range(len(word2) + 1, n):
                if z[i] == len(word2):
                    return i - len(word2) - 1

            return -1

        res = []
        words.sort(key=len)

        for i in range(len(words)):
            for j in range(i + 1, len(words)):
                if zAlgorithm(words[j], words[i]) != -1:
                    res.append(words[i])
                    break

        return res

Time & Space Complexity

  • Time complexity: O(n2m)O(n ^ 2 * m)
  • Space complexity:
    • O(m)O(m) extra space.
    • O(1)O(1) or O(n)O(n) space depending on the sorting algorithm.
    • O(nm)O(n * m) space for the output list.

Where nn is the number of words, and mm is the length of the longest word.


6. Trie

Intuition

We can use a suffix trie to solve this problem. For each word, we insert all its suffixes into the trie. Each node tracks how many times it has been visited. When searching for a word, if the terminal node has been visited more than once, the word appears as a substring in another word (since we inserted all suffixes, any substring of any word is a prefix of some suffix).

Algorithm

  1. Build a Trie with a count at each node.
  2. For each word, insert all its suffixes. For each character traversed, increment the count.
  3. For each word, search the trie by following the characters.
  4. If the count at the final node is greater than 1, the word is a substring of another word (or appears multiple times). Add it to the result.
  5. Return the result list.
class TrieNode:
    def __init__(self):
        self.children = [None] * 26
        self.cnt = 0

class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert_suffixes(self, word: str) -> None:
        for i in range(len(word)):
            node = self.root
            for j in range(i, len(word)):
                idx = ord(word[j]) - ord('a')
                if not node.children[idx]:
                    node.children[idx] = TrieNode()

                node = node.children[idx]
                node.cnt += 1

    def search(self, word: str) -> bool:
        node = self.root
        for c in word:
            idx = ord(c) - ord('a')
            node = node.children[idx]
        return node.cnt > 1

class Solution:
    def stringMatching(self, words: List[str]) -> List[str]:
        res = []
        trie = Trie()

        for word in words:
            trie.insert_suffixes(word)

        for word in words:
            if trie.search(word):
                res.append(word)

        return res

Time & Space Complexity

  • Time complexity: O(nm2)O(n * m ^ 2)
  • Space complexity:
    • O(nm2)O(n * m ^ 2) extra space.
    • O(nm)O(n * m) space for the output list.

Where nn is the number of words, and mm is the length of the longest word.


Common Pitfalls

Comparing a Word Against Itself

When checking if a word is a substring of another, you must skip the case where i == j (the same word). Without this check, every word would be found as a substring of itself and incorrectly added to the result.

Adding Duplicate Words to the Result

If a word appears as a substring in multiple other words, you should add it to the result only once. Using a break statement after finding the first match, or using a set to track added words, prevents duplicates in the output.