A supersequence must contain both strings as subsequences. The shortest one minimizes redundancy by sharing as many characters as possible between the two strings. This is directly related to the Longest Common Subsequence (LCS) problem.
Using recursion with memoization, at each position we have a choice: if the characters match, we include it once and move both pointers. If they differ, we try including either character and pick the shorter result. The base case handles when one string is exhausted, requiring us to append the remainder of the other.
dfs(i, j) that returns the shortest supersequence starting from indices i and j.i reaches the end of str1, return the remaining suffix of str2.j reaches the end of str2, return the remaining suffix of str1.str1[i] == str2[j], include this character once and recurse on (i+1, j+1).str1[i] and recurse on (i+1, j), or include str2[j] and recurse on (i, j+1). Pick the shorter result.dfs(0, 0).class Solution:
def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
cache = [[None] * (len(str2) + 1) for _ in range(len(str1) + 1)]
str1 = list(str1)
str2 = list(str2)
def dfs(i: int, j: int) -> list:
if cache[i][j] is not None:
return cache[i][j]
if i == len(str1):
cache[i][j] = str2[j:][::-1]
return cache[i][j]
if j == len(str2):
cache[i][j] = str1[i:][::-1]
return cache[i][j]
if str1[i] == str2[j]:
res = dfs(i + 1, j + 1) + [str1[i]]
else:
s1 = dfs(i + 1, j)
s2 = dfs(i, j + 1)
if len(s1) < len(s2):
res = s1 + [str1[i]]
else:
res = s2 + [str2[j]]
cache[i][j] = res
return res
return ''.join(reversed(dfs(0, 0)))Where and are the lengths of the strings and respectively.
Building the actual string during recursion can be expensive due to string concatenation overhead. A more efficient approach is to first compute only the lengths using DP, then reconstruct the string by tracing back through the DP table.
The DP table stores the length of the shortest supersequence from each state (i, j). After filling the table, we trace from (0, 0) to the end, at each step deciding which character to include based on the DP values.
dfs(i, j) that returns the length of the shortest supersequence starting from indices i and j.m - j or n - i when one string is exhausted.1 and recurse on (i+1, j+1). Otherwise, take the minimum of recursing on (i+1, j) or (i, j+1), plus 1.(0, 0).class Solution:
def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
n, m = len(str1), len(str2)
dp = [[-1] * (m + 1) for _ in range(n + 1)]
def dfs(i, j):
if dp[i][j] != -1:
return dp[i][j]
if i == n:
dp[i][j] = m - j
return dp[i][j]
if j == m:
dp[i][j] = n - i
return dp[i][j]
if str1[i] == str2[j]:
dp[i][j] = 1 + dfs(i + 1, j + 1)
else:
dp[i][j] = 1 + min(dfs(i + 1, j), dfs(i, j + 1))
return dp[i][j]
dfs(0, 0)
def build_scs(i, j):
res = []
while i < n or j < m:
if i == n:
res.extend(str2[j:])
break
if j == m:
res.extend(str1[i:])
break
if str1[i] == str2[j]:
res.append(str1[i])
i += 1
j += 1
elif dp[i + 1][j] < dp[i][j + 1]:
res.append(str1[i])
i += 1
else:
res.append(str2[j])
j += 1
return res
return ''.join(build_scs(0, 0))Where and are the lengths of the strings and respectively.
Instead of recursion, we can fill the DP table iteratively from smaller subproblems to larger ones. The entry dp[i][j] stores the actual shortest supersequence for the prefixes str1[0..i-1] and str2[0..j-1].
The base cases initialize the first row and column with prefixes of each string. For each cell, we either extend by a common character or choose the shorter option when characters differ.
dp of size (n+1) x (m+1) to store strings.dp[0][j] = str2[0..j-1] for all j.dp[i][0] = str1[0..i-1] for all i.str1[i-1] == str2[j-1], set dp[i][j] = dp[i-1][j-1] + str1[i-1].dp[i-1][j] + str1[i-1] or dp[i][j-1] + str2[j-1].dp[n][m].class Solution:
def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
n, m = len(str1), len(str2)
dp = [[""] * (m + 1) for _ in range(n + 1)]
for i in range(n + 1):
for j in range(m + 1):
if i == 0:
dp[i][j] = str2[:j]
elif j == 0:
dp[i][j] = str1[:i]
elif str1[i - 1] == str2[j - 1]:
dp[i][j] = dp[i - 1][j - 1] + str1[i - 1]
else:
if len(dp[i - 1][j]) < len(dp[i][j - 1]):
dp[i][j] = dp[i - 1][j] + str1[i - 1]
else:
dp[i][j] = dp[i][j - 1] + str2[j - 1]
return dp[n][m]Where and are the lengths of the strings and respectively.
Storing full strings in the DP table is memory-intensive. A more space-efficient approach stores only the lengths, then reconstructs the string by tracing backward through the table.
This is the standard approach for the problem: compute the DP table of lengths bottom-up, then trace from (n, m) back to (0, 0), building the result string in reverse.
dp of size (n+1) x (m+1) to store lengths.dp[0][j] = j and dp[i][0] = i.str1[i-1] == str2[j-1], set dp[i][j] = 1 + dp[i-1][j-1].dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1]).(n, m):class Solution:
def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
n, m = len(str1), len(str2)
dp = [[0] * (m + 1) for _ in range(n + 1)]
for i in range(n + 1):
for j in range(m + 1):
if i == 0:
dp[i][j] = j
elif j == 0:
dp[i][j] = i
elif str1[i - 1] == str2[j - 1]:
dp[i][j] = 1 + dp[i - 1][j - 1]
else:
dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1])
res = []
i, j = n, m
while i > 0 and j > 0:
if str1[i - 1] == str2[j - 1]:
res.append(str1[i - 1])
i -= 1
j -= 1
elif dp[i - 1][j] < dp[i][j - 1]:
res.append(str1[i - 1])
i -= 1
else:
res.append(str2[j - 1])
j -= 1
while i > 0:
res.append(str1[i - 1])
i -= 1
while j > 0:
res.append(str2[j - 1])
j -= 1
return ''.join(reversed(res))Where and are the lengths of the strings and respectively.
The Shortest Common Supersequence is related to but different from the Longest Common Subsequence. The SCS length equals len(str1) + len(str2) - LCS_length. Some solutions incorrectly try to directly apply LCS logic without understanding that SCS requires including all characters from both strings, not just the common ones.
When reconstructing the SCS from the DP table, you must trace backward from (n, m) to (0, 0) and then reverse the result. A common mistake is tracing forward, which produces a different (incorrect) supersequence. The direction of traceback must match the direction used to fill the DP table.
After the main traceback loop terminates (when either i or j reaches 0), there may still be remaining characters in one of the strings. Forgetting to append these remaining characters results in an incomplete supersequence that does not contain one of the input strings as a subsequence.
Building the result string by repeated concatenation in languages like Java or Python can lead to O(n * m * (n + m)) time complexity due to string immutability. Use a StringBuilder, list, or similar mutable structure and reverse at the end for optimal performance.
The DP table has dimensions (n+1) x (m+1) where indices 0 represent empty prefixes. When accessing characters, dp[i][j] corresponds to str1[i-1] and str2[j-1]. Mixing up 0-indexed string access with 1-indexed DP access causes incorrect character comparisons and wrong results.