You are given an array of string words, return all strings in words that are a substring of another word. You can return the answer in any order.
Note: A substring is a contiguous non-empty sequence of characters within a string.
Example 1:
Input: words = ["mass","as","hero","superhero"]
Output: ["as","hero"]Explanation: "as" is substring of "mass" and "hero" is substring of "superhero". ["hero","as"] is also a valid answer.
Example 2:
Input: words = ["neetcode","neeet","neet","code"]
Output: ["neet","code"]Explanation: "neet" and "code" are substrings of "neetcode".
Example 3:
Input: words = ["blue","green","bu"]
Output: []Explanation: No string of words is substring of another string.
Constraints:
1 <= words.length <= 1001 <= words[i].length <= 30words[i] contains only lowercase English letters.words are unique.For each word, we check if it appears as a substring in any other word. If it does, we add it to our result. Since we need to check every word against every other word, we use two nested loops. Once we find that a word is a substring of another, we can stop checking and move to the next word.
i, iterate through all words at index j where j != i.words[i] is a substring of words[j] using the built-in substring check.words[i] to the result and break the inner loop.Where is the number of words, and is the length of the longest word.
A shorter word can only be a substring of a longer word, never the other way around. By sorting words by length, we only need to check each word against longer words that come after it. This gives a minor optimization by reducing unnecessary comparisons, though the worst-case complexity remains the same.
i, check if it's a substring of any word at index j where j > i.break the inner loop.Where is the number of words, and is the length of the longest word.
Instead of using the built-in substring search, we can use the KMP algorithm which preprocesses the pattern to enable efficient matching. KMP builds a "longest proper prefix which is also suffix" (LPS) array that allows us to skip characters during mismatches rather than starting over. This improves substring matching to linear time in the combined length of the strings.
i, use KMP to check if it's a substring of any word at index j > i.-1), add the word to the result and break.class Solution:
def stringMatching(self, words: List[str]) -> List[str]:
def kmp(word1: str, word2: str) -> int:
lps = [0] * len(word2)
prevLPS, i = 0, 1
while i < len(word2):
if word2[i] == word2[prevLPS]:
lps[i] = prevLPS + 1
prevLPS += 1
i += 1
elif prevLPS == 0:
lps[i] = 0
i += 1
else:
prevLPS = lps[prevLPS - 1]
i = j = 0
while i < len(word1):
if word1[i] == word2[j]:
i += 1
j += 1
else:
if j == 0:
i += 1
else:
j = lps[j - 1]
if j == len(word2):
return i - len(word2)
return -1
res = []
words.sort(key=len)
for i in range(len(words)):
for j in range(i + 1, len(words)):
if kmp(words[j], words[i]) != -1:
res.append(words[i])
break
return resWhere is the number of words, and is the length of the longest word.
Rabin-Karp uses hashing to speed up substring matching. We compute a hash of the pattern and then slide a window over the text, computing the hash of each window using a rolling hash technique. If the hashes match, we have a potential match (with possible false positives). Using double hashing with two different bases and moduli reduces false positive probability to near zero.
O(1) per step.i, use Rabin-Karp to check if it's a substring of any word at index j > i.break.class Solution:
def stringMatching(self, words: List[str]) -> List[str]:
def rabinKarp(word1: str, word2: str) -> int:
base1, mod1 = 31, 768258391
base2, mod2 = 37, 685683731
n, m = len(word1), len(word2)
power1, power2 = 1, 1
for _ in range(m):
power1 = (power1 * base1) % mod1
power2 = (power2 * base2) % mod2
word1_hash1 = word1_hash2 = 0
word2_hash1 = word2_hash2 = 0
for i in range(m):
word1_hash1 = (word1_hash1 * base1 + ord(word2[i])) % mod1
word1_hash2 = (word1_hash2 * base2 + ord(word2[i])) % mod2
word2_hash1 = (word2_hash1 * base1 + ord(word1[i])) % mod1
word2_hash2 = (word2_hash2 * base2 + ord(word1[i])) % mod2
for i in range(n - m + 1):
if word2_hash1 == word1_hash1 and word2_hash2 == word1_hash2:
return i
if i + m < n:
word2_hash1 = (word2_hash1 * base1 - ord(word1[i]) * power1 + ord(word1[i + m])) % mod1
word2_hash2 = (word2_hash2 * base2 - ord(word1[i]) * power2 + ord(word1[i + m])) % mod2
word2_hash1 = (word2_hash1 + mod1) % mod1
word2_hash2 = (word2_hash2 + mod2) % mod2
return -1
res = []
words.sort(key=len)
for i in range(len(words)):
for j in range(i + 1, len(words)):
if rabinKarp(words[j], words[i]) != -1:
res.append(words[i])
break
return resWhere is the number of words, and is the length of the longest word.
The Z-algorithm builds a Z-array where Z[i] represents the length of the longest substring starting at position i that matches a prefix of the string. By concatenating the pattern, a separator, and the text, we can find all occurrences of the pattern by looking for positions where Z[i] equals the pattern length.
pattern + "$" + text and compute the Z-array.Z[i] == len(pattern) indicates a match.i, use the Z-algorithm to check if it's a substring of any word at index j > i.break.class Solution:
def stringMatching(self, words: List[str]) -> List[str]:
def zAlgorithm(word1: str, word2: str) -> int:
s = word2 + "$" + word1
n = len(s)
z = [0] * n
l, r = 0, 0
for i in range(1, n):
if i <= r:
z[i] = min(r - i + 1, z[i - l])
while i + z[i] < n and s[z[i]] == s[i + z[i]]:
z[i] += 1
if i + z[i] - 1 > r:
l, r = i, i + z[i] - 1
for i in range(len(word2) + 1, n):
if z[i] == len(word2):
return i - len(word2) - 1
return -1
res = []
words.sort(key=len)
for i in range(len(words)):
for j in range(i + 1, len(words)):
if zAlgorithm(words[j], words[i]) != -1:
res.append(words[i])
break
return resWhere is the number of words, and is the length of the longest word.
We can use a suffix trie to solve this problem. For each word, we insert all its suffixes into the trie. Each node tracks how many times it has been visited. When searching for a word, if the terminal node has been visited more than once, the word appears as a substring in another word (since we inserted all suffixes, any substring of any word is a prefix of some suffix).
1, the word is a substring of another word (or appears multiple times). Add it to the result.class TrieNode:
def __init__(self):
self.children = [None] * 26
self.cnt = 0
class Trie:
def __init__(self):
self.root = TrieNode()
def insert_suffixes(self, word: str) -> None:
for i in range(len(word)):
node = self.root
for j in range(i, len(word)):
idx = ord(word[j]) - ord('a')
if not node.children[idx]:
node.children[idx] = TrieNode()
node = node.children[idx]
node.cnt += 1
def search(self, word: str) -> bool:
node = self.root
for c in word:
idx = ord(c) - ord('a')
node = node.children[idx]
return node.cnt > 1
class Solution:
def stringMatching(self, words: List[str]) -> List[str]:
res = []
trie = Trie()
for word in words:
trie.insert_suffixes(word)
for word in words:
if trie.search(word):
res.append(word)
return resWhere is the number of words, and is the length of the longest word.
When checking if a word is a substring of another, you must skip the case where i == j (the same word). Without this check, every word would be found as a substring of itself and incorrectly added to the result.
If a word appears as a substring in multiple other words, you should add it to the result only once. Using a break statement after finding the first match, or using a set to track added words, prevents duplicates in the output.