1092. Shortest Common Supersequence - Explanation

Problem Link



Prerequisites

Before attempting this problem, you should be comfortable with:

  • Dynamic Programming - This problem requires 2D DP with either memoization (top-down) or tabulation (bottom-up) to build optimal substructure
  • Longest Common Subsequence (LCS) - The SCS length equals len(str1) + len(str2) - LCS_length, so understanding LCS helps grasp the relationship
  • String Reconstruction from DP Tables - After computing DP values, you need to trace back through the table to construct the actual result string
  • Recursion with Memoization - The top-down approach uses recursive calls with caching to avoid redundant computations

1. Dynamic Programming (Top-Down)

Intuition

A supersequence must contain both strings as subsequences. The shortest one minimizes redundancy by sharing as many characters as possible between the two strings. This is directly related to the Longest Common Subsequence (LCS) problem.

Using recursion with memoization, at each position we have a choice: if the characters match, we include it once and move both pointers. If they differ, we try including either character and pick the shorter result. The base case handles when one string is exhausted, requiring us to append the remainder of the other.

Algorithm

  1. Define a recursive function dfs(i, j) that returns the shortest supersequence starting from indices i and j.
  2. Base cases:
    • If i reaches the end of str1, return the remaining suffix of str2.
    • If j reaches the end of str2, return the remaining suffix of str1.
  3. Recursive case:
    • If str1[i] == str2[j], include this character once and recurse on (i+1, j+1).
    • Otherwise, try both options: include str1[i] and recurse on (i+1, j), or include str2[j] and recurse on (i, j+1). Pick the shorter result.
  4. Memoize results to avoid recomputation.
  5. Return the result of dfs(0, 0).
class Solution:
    def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
        cache = [[None] * (len(str2) + 1) for _ in range(len(str1) + 1)]
        str1 = list(str1)
        str2 = list(str2)

        def dfs(i: int, j: int) -> list:
            if cache[i][j] is not None:
                return cache[i][j]
            if i == len(str1):
                cache[i][j] = str2[j:][::-1]
                return cache[i][j]
            if j == len(str2):
                cache[i][j] = str1[i:][::-1]
                return cache[i][j]

            if str1[i] == str2[j]:
                res = dfs(i + 1, j + 1) + [str1[i]]
            else:
                s1 = dfs(i + 1, j)
                s2 = dfs(i, j + 1)
                if len(s1) < len(s2):
                    res = s1 + [str1[i]]
                else:
                    res = s2 + [str2[j]]

            cache[i][j] = res
            return res

        return ''.join(reversed(dfs(0, 0)))

Time & Space Complexity

  • Time complexity: O(nmmin(n,m))O(n * m * min(n, m))
  • Space complexity: O(nmmin(n,m))O(n * m * min(n, m))

Where nn and mm are the lengths of the strings str1str1 and str2str2 respectively.


2. Dynamic Programming (Top-Down) + Tracing

Intuition

Building the actual string during recursion can be expensive due to string concatenation overhead. A more efficient approach is to first compute only the lengths using DP, then reconstruct the string by tracing back through the DP table.

The DP table stores the length of the shortest supersequence from each state (i, j). After filling the table, we trace from (0, 0) to the end, at each step deciding which character to include based on the DP values.

Algorithm

  1. Define a recursive function dfs(i, j) that returns the length of the shortest supersequence starting from indices i and j.
  2. Base cases: return m - j or n - i when one string is exhausted.
  3. Recursive case: if characters match, add 1 and recurse on (i+1, j+1). Otherwise, take the minimum of recursing on (i+1, j) or (i, j+1), plus 1.
  4. After computing the DP table, trace back:
    • Start at (0, 0).
    • If characters match, include it and move both pointers.
    • Otherwise, follow the direction with the smaller DP value.
  5. Append any remaining characters from either string.
  6. Return the constructed string.
class Solution:
    def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
        n, m = len(str1), len(str2)
        dp = [[-1] * (m + 1) for _ in range(n + 1)]

        def dfs(i, j):
            if dp[i][j] != -1:
                return dp[i][j]
            if i == n:
                dp[i][j] = m - j
                return dp[i][j]
            if j == m:
                dp[i][j] = n - i
                return dp[i][j]
            if str1[i] == str2[j]:
                dp[i][j] = 1 + dfs(i + 1, j + 1)
            else:
                dp[i][j] = 1 + min(dfs(i + 1, j), dfs(i, j + 1))
            return dp[i][j]

        dfs(0, 0)

        def build_scs(i, j):
            res = []
            while i < n or j < m:
                if i == n:
                    res.extend(str2[j:])
                    break
                if j == m:
                    res.extend(str1[i:])
                    break
                if str1[i] == str2[j]:
                    res.append(str1[i])
                    i += 1
                    j += 1
                elif dp[i + 1][j] < dp[i][j + 1]:
                    res.append(str1[i])
                    i += 1
                else:
                    res.append(str2[j])
                    j += 1
            return res

        return ''.join(build_scs(0, 0))

Time & Space Complexity

  • Time complexity: O(nm)O(n * m)
  • Space complexity: O(nm)O(n * m)

Where nn and mm are the lengths of the strings str1str1 and str2str2 respectively.


3. Dynamic Programming (Bottom-Up)

Intuition

Instead of recursion, we can fill the DP table iteratively from smaller subproblems to larger ones. The entry dp[i][j] stores the actual shortest supersequence for the prefixes str1[0..i-1] and str2[0..j-1].

The base cases initialize the first row and column with prefixes of each string. For each cell, we either extend by a common character or choose the shorter option when characters differ.

Algorithm

  1. Create a 2D table dp of size (n+1) x (m+1) to store strings.
  2. Fill base cases:
    • dp[0][j] = str2[0..j-1] for all j.
    • dp[i][0] = str1[0..i-1] for all i.
  3. Fill the table row by row:
    • If str1[i-1] == str2[j-1], set dp[i][j] = dp[i-1][j-1] + str1[i-1].
    • Otherwise, pick the shorter of dp[i-1][j] + str1[i-1] or dp[i][j-1] + str2[j-1].
  4. Return dp[n][m].
class Solution:
    def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
        n, m = len(str1), len(str2)
        dp = [[""] * (m + 1) for _ in range(n + 1)]

        for i in range(n + 1):
            for j in range(m + 1):
                if i == 0:
                    dp[i][j] = str2[:j]
                elif j == 0:
                    dp[i][j] = str1[:i]
                elif str1[i - 1] == str2[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + str1[i - 1]
                else:
                    if len(dp[i - 1][j]) < len(dp[i][j - 1]):
                        dp[i][j] = dp[i - 1][j] + str1[i - 1]
                    else:
                        dp[i][j] = dp[i][j - 1] + str2[j - 1]

        return dp[n][m]

Time & Space Complexity

  • Time complexity: O(nmmin(n,m))O(n * m * min(n, m))
  • Space complexity: O(nmmin(n,m))O(n * m * min(n, m))

Where nn and mm are the lengths of the strings str1str1 and str2str2 respectively.


4. Dynamic Programming (Bottom-Up) + Tracing

Intuition

Storing full strings in the DP table is memory-intensive. A more space-efficient approach stores only the lengths, then reconstructs the string by tracing backward through the table.

This is the standard approach for the problem: compute the DP table of lengths bottom-up, then trace from (n, m) back to (0, 0), building the result string in reverse.

Algorithm

  1. Create a 2D table dp of size (n+1) x (m+1) to store lengths.
  2. Fill base cases: dp[0][j] = j and dp[i][0] = i.
  3. Fill the table:
    • If str1[i-1] == str2[j-1], set dp[i][j] = 1 + dp[i-1][j-1].
    • Otherwise, set dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1]).
  4. Trace back from (n, m):
    • If characters match, include it and move diagonally.
    • Otherwise, follow the smaller value (up or left) and include that character.
  5. Append remaining characters from either string.
  6. Reverse the result and return.
class Solution:
    def shortestCommonSupersequence(self, str1: str, str2: str) -> str:
        n, m = len(str1), len(str2)
        dp = [[0] * (m + 1) for _ in range(n + 1)]

        for i in range(n + 1):
            for j in range(m + 1):
                if i == 0:
                    dp[i][j] = j
                elif j == 0:
                    dp[i][j] = i
                elif str1[i - 1] == str2[j - 1]:
                    dp[i][j] = 1 + dp[i - 1][j - 1]
                else:
                    dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1])

        res = []
        i, j = n, m
        while i > 0 and j > 0:
            if str1[i - 1] == str2[j - 1]:
                res.append(str1[i - 1])
                i -= 1
                j -= 1
            elif dp[i - 1][j] < dp[i][j - 1]:
                res.append(str1[i - 1])
                i -= 1
            else:
                res.append(str2[j - 1])
                j -= 1

        while i > 0:
            res.append(str1[i - 1])
            i -= 1

        while j > 0:
            res.append(str2[j - 1])
            j -= 1

        return ''.join(reversed(res))

Time & Space Complexity

  • Time complexity: O(nm)O(n * m)
  • Space complexity: O(nm)O(n * m)

Where nn and mm are the lengths of the strings str1str1 and str2str2 respectively.


Common Pitfalls

Confusing SCS with LCS

The Shortest Common Supersequence is related to but different from the Longest Common Subsequence. The SCS length equals len(str1) + len(str2) - LCS_length. Some solutions incorrectly try to directly apply LCS logic without understanding that SCS requires including all characters from both strings, not just the common ones.

Incorrect Traceback Direction

When reconstructing the SCS from the DP table, you must trace backward from (n, m) to (0, 0) and then reverse the result. A common mistake is tracing forward, which produces a different (incorrect) supersequence. The direction of traceback must match the direction used to fill the DP table.

Missing Characters After Traceback Loop

After the main traceback loop terminates (when either i or j reaches 0), there may still be remaining characters in one of the strings. Forgetting to append these remaining characters results in an incomplete supersequence that does not contain one of the input strings as a subsequence.

Inefficient String Concatenation

Building the result string by repeated concatenation in languages like Java or Python can lead to O(n * m * (n + m)) time complexity due to string immutability. Use a StringBuilder, list, or similar mutable structure and reverse at the end for optimal performance.

Off-by-One Errors in DP Indices

The DP table has dimensions (n+1) x (m+1) where indices 0 represent empty prefixes. When accessing characters, dp[i][j] corresponds to str1[i-1] and str2[j-1]. Mixing up 0-indexed string access with 1-indexed DP access causes incorrect character comparisons and wrong results.