1531. String Compression II - Explanation

Problem Link



1. Dynamic Programming (Top-Down)

Intuition

We want to delete at most k characters to minimize the run-length encoded length. The key observation is that the encoded length only increases at certain thresholds: going from 1 to 2 characters adds a digit, going from 9 to 10 adds another digit, and going from 99 to 100 adds yet another. We use recursion with memoization, tracking the current position, remaining deletions, the previous character, and its count. At each step, we either extend a run (if the current character matches the previous) or start a new run (keeping or deleting the current character).

Algorithm

  1. Define count(i, k, prev, prev_cnt) where i is current index, k is remaining deletions, prev is the previous character, and prev_cnt is how many times it has appeared consecutively.
  2. Base cases: if k < 0, return infinity (invalid). If i == n, return 0.
  3. If s[i] == prev, extend the current run. Add 1 to the result only if prev_cnt is 1, 9, or 99 (thresholds where encoded length increases).
  4. If s[i] != prev, choose the minimum of: deleting s[i] (using one deletion), or keeping s[i] (starting a new run with length contribution of 1).
  5. Memoize with a 4D cache.
  6. Return count(0, k, "", 0).
class Solution:
    def getLengthOfOptimalCompression(self, s: str, k: int) -> int:
        cache = {}

        def count(i, k, prev, prev_cnt):
            if (i, k, prev, prev_cnt) in cache:
                return cache[(i, k, prev, prev_cnt)]
            if k < 0:
                return float("inf")
            if i == len(s):
                return 0

            if s[i] == prev:
                incr = 1 if prev_cnt in [1, 9, 99] else 0
                res = incr + count(i + 1, k, prev, prev_cnt + 1)
            else:
                res = min(
                    count(i + 1, k - 1, prev, prev_cnt),  # delete s[i]
                    1 + count(i + 1, k, s[i], 1)  # don't delete
                )

            cache[(i, k, prev, prev_cnt)] = res
            return res

        return count(0, k, "", 0)

Time & Space Complexity

  • Time complexity: O(kn2)O(k * n ^ 2)
  • Space complexity: O(kn2)O(k * n ^ 2)

Where nn is the length of the string ss and kk is the maximum number of characters that can be deleted from the string.


2. Dynamic Programming (Top-Down Optimized)

Intuition

Instead of tracking the previous character and its count explicitly, we can think of the problem differently. At each position, we decide to either delete the current character or make it the start of a new run. If we start a new run, we scan forward and try to extend it by keeping matching characters and deleting non-matching ones (within our deletion budget). This reduces the state space to just position and remaining deletions.

Algorithm

  1. Define dfs(i, k) where i is the current index and k is the remaining deletion budget.
  2. Base case: if n - i <= k, we can delete all remaining characters, so return 0.
  3. Option 1: delete s[i] if k > 0, giving dfs(i + 1, k - 1).
  4. Option 2: start a run with s[i]. Scan forward, counting matching characters and deleting non-matching ones. Track the compressed length (which increases at counts 1, 9, 99). For each endpoint, compute comp_len + dfs(j + 1, k - delCnt).
  5. Take the minimum across all options.
  6. Memoize with a 2D cache dp[n][k+1].
  7. Return dfs(0, k).
class Solution:
    def getLengthOfOptimalCompression(self, s: str, k: int) -> int:
        n = len(s)
        dp = {}

        def dfs(i, k):
            if n - i <= k:
                return 0
            if (i, k) in dp:
                return dp[(i, k)]

            res = 150
            if k > 0:
                res = dfs(i + 1, k - 1)

            freq = delCnt = 0
            comp_len = 1
            for j in range(i, n):
                if s[i] == s[j]:
                    if freq in [1, 9, 99]:
                        comp_len += 1
                    freq += 1
                else:
                    delCnt += 1
                    if delCnt > k:
                        break
                res = min(res, comp_len + dfs(j + 1, k - delCnt))
            dp[(i, k)] = res
            return res

        return dfs(0, k)

Time & Space Complexity

  • Time complexity: O(n2k)O(n ^ 2 * k)
  • Space complexity: O(nk)O(n * k)

Where nn is the length of the string ss and kk is the maximum number of characters that can be deleted from the string.


3. Dynamic Programming (Bottom-Up)

Intuition

We convert the optimized top-down solution to bottom-up form. We process positions from right to left, computing for each position and deletion budget the minimum encoded length. This iterative approach fills the DP table systematically and avoids recursion overhead.

Algorithm

  1. Create a 2D DP array dp[n+1][k+1] initialized to a large value (e.g., 150), with dp[n][*] = 0 as base cases.
  2. Iterate i from n-1 down to 0, and for each rem_k from 0 to k.
  3. Option 1: if rem_k > 0, set dp[i][rem_k] = dp[i+1][rem_k-1] (delete current character).
  4. Option 2: scan forward from i, counting frequency of s[i] and deletions of other characters. Track compressed length (starts at 1, increases at thresholds 1, 9, 99). Update dp[i][rem_k] = min(dp[i][rem_k], comp_len + dp[j+1][rem_k - delCnt]).
  5. Stop scanning when delCnt > rem_k.
  6. Return dp[0][k].
class Solution:
    def getLengthOfOptimalCompression(self, s: str, k: int) -> int:
        n = len(s)
        dp = [[150] * (k + 1) for _ in range(n)]
        dp.append([0] * (k + 1))

        for i in range(n - 1, -1, -1):
            for rem_k in range(k + 1):
                if rem_k > 0:
                    dp[i][rem_k] = dp[i + 1][rem_k - 1]

                freq = delCnt = 0
                comp_len = 1
                for j in range(i, n):
                    if s[i] == s[j]:
                        if freq in [1, 9, 99]:
                            comp_len += 1
                        freq += 1
                    else:
                        delCnt += 1
                        if delCnt > rem_k:
                            break
                    dp[i][rem_k] = min(dp[i][rem_k], comp_len + dp[j + 1][rem_k - delCnt])

        return dp[0][k]

Time & Space Complexity

  • Time complexity: O(n2k)O(n ^ 2 * k)
  • Space complexity: O(nk)O(n * k)

Where nn is the length of the string ss and kk is the maximum number of characters that can be deleted from the string.