1531. String Compression II - Explanation

Problem Link



Prerequisites

Before attempting this problem, you should be comfortable with:

  • Dynamic Programming (Memoization) - Caching results for states defined by position, remaining deletions, and run information
  • Run-Length Encoding - Understanding how consecutive character runs are compressed (e.g., "aaa" becomes "a3")
  • Multi-dimensional State Management - Tracking position, deletion budget, previous character, and count simultaneously
  • Optimization Thresholds - Recognizing that encoded length only increases at specific counts (1, 9, 99)

1. Dynamic Programming (Top-Down)

Intuition

We want to delete at most k characters to minimize the run-length encoded length. The key observation is that the encoded length only increases at certain thresholds: going from 1 to 2 characters adds a digit, going from 9 to 10 adds another digit, and going from 99 to 100 adds yet another. We use recursion with memoization, tracking the current position, remaining deletions, the previous character, and its count. At each step, we either extend a run (if the current character matches the previous) or start a new run (keeping or deleting the current character).

Algorithm

  1. Define count(i, k, prev, prev_cnt) where i is current index, k is remaining deletions, prev is the previous character, and prev_cnt is how many times it has appeared consecutively.
  2. Base cases: if k < 0, return infinity (invalid). If i == n, return 0.
  3. If s[i] == prev, extend the current run. Add 1 to the result only if prev_cnt is 1, 9, or 99 (thresholds where encoded length increases).
  4. If s[i] != prev, choose the minimum of: deleting s[i] (using one deletion), or keeping s[i] (starting a new run with length contribution of 1).
  5. Memoize with a 4D cache.
  6. Return count(0, k, "", 0).
class Solution:
    def getLengthOfOptimalCompression(self, s: str, k: int) -> int:
        cache = {}

        def count(i, k, prev, prev_cnt):
            if (i, k, prev, prev_cnt) in cache:
                return cache[(i, k, prev, prev_cnt)]
            if k < 0:
                return float("inf")
            if i == len(s):
                return 0

            if s[i] == prev:
                incr = 1 if prev_cnt in [1, 9, 99] else 0
                res = incr + count(i + 1, k, prev, prev_cnt + 1)
            else:
                res = min(
                    count(i + 1, k - 1, prev, prev_cnt),  # delete s[i]
                    1 + count(i + 1, k, s[i], 1)  # don't delete
                )

            cache[(i, k, prev, prev_cnt)] = res
            return res

        return count(0, k, "", 0)

Time & Space Complexity

  • Time complexity: O(kn2)O(k * n ^ 2)
  • Space complexity: O(kn2)O(k * n ^ 2)

Where nn is the length of the string ss and kk is the maximum number of characters that can be deleted from the string.


2. Dynamic Programming (Top-Down Optimized)

Intuition

Instead of tracking the previous character and its count explicitly, we can think of the problem differently. At each position, we decide to either delete the current character or make it the start of a new run. If we start a new run, we scan forward and try to extend it by keeping matching characters and deleting non-matching ones (within our deletion budget). This reduces the state space to just position and remaining deletions.

Algorithm

  1. Define dfs(i, k) where i is the current index and k is the remaining deletion budget.
  2. Base case: if n - i <= k, we can delete all remaining characters, so return 0.
  3. Option 1: delete s[i] if k > 0, giving dfs(i + 1, k - 1).
  4. Option 2: start a run with s[i]. Scan forward, counting matching characters and deleting non-matching ones. Track the compressed length (which increases at counts 1, 9, 99). For each endpoint, compute comp_len + dfs(j + 1, k - delCnt).
  5. Take the minimum across all options.
  6. Memoize with a 2D cache dp[n][k+1].
  7. Return dfs(0, k).
class Solution:
    def getLengthOfOptimalCompression(self, s: str, k: int) -> int:
        n = len(s)
        dp = {}

        def dfs(i, k):
            if n - i <= k:
                return 0
            if (i, k) in dp:
                return dp[(i, k)]

            res = 150
            if k > 0:
                res = dfs(i + 1, k - 1)

            freq = delCnt = 0
            comp_len = 1
            for j in range(i, n):
                if s[i] == s[j]:
                    if freq in [1, 9, 99]:
                        comp_len += 1
                    freq += 1
                else:
                    delCnt += 1
                    if delCnt > k:
                        break
                res = min(res, comp_len + dfs(j + 1, k - delCnt))
            dp[(i, k)] = res
            return res

        return dfs(0, k)

Time & Space Complexity

  • Time complexity: O(n2k)O(n ^ 2 * k)
  • Space complexity: O(nk)O(n * k)

Where nn is the length of the string ss and kk is the maximum number of characters that can be deleted from the string.


3. Dynamic Programming (Bottom-Up)

Intuition

We convert the optimized top-down solution to bottom-up form. We process positions from right to left, computing for each position and deletion budget the minimum encoded length. This iterative approach fills the DP table systematically and avoids recursion overhead.

Algorithm

  1. Create a 2D DP array dp[n+1][k+1] initialized to a large value (e.g., 150), with dp[n][*] = 0 as base cases.
  2. Iterate i from n-1 down to 0, and for each rem_k from 0 to k.
  3. Option 1: if rem_k > 0, set dp[i][rem_k] = dp[i+1][rem_k-1] (delete current character).
  4. Option 2: scan forward from i, counting frequency of s[i] and deletions of other characters. Track compressed length (starts at 1, increases at thresholds 1, 9, 99). Update dp[i][rem_k] = min(dp[i][rem_k], comp_len + dp[j+1][rem_k - delCnt]).
  5. Stop scanning when delCnt > rem_k.
  6. Return dp[0][k].
class Solution:
    def getLengthOfOptimalCompression(self, s: str, k: int) -> int:
        n = len(s)
        dp = [[150] * (k + 1) for _ in range(n)]
        dp.append([0] * (k + 1))

        for i in range(n - 1, -1, -1):
            for rem_k in range(k + 1):
                if rem_k > 0:
                    dp[i][rem_k] = dp[i + 1][rem_k - 1]

                freq = delCnt = 0
                comp_len = 1
                for j in range(i, n):
                    if s[i] == s[j]:
                        if freq in [1, 9, 99]:
                            comp_len += 1
                        freq += 1
                    else:
                        delCnt += 1
                        if delCnt > rem_k:
                            break
                    dp[i][rem_k] = min(dp[i][rem_k], comp_len + dp[j + 1][rem_k - delCnt])

        return dp[0][k]

Time & Space Complexity

  • Time complexity: O(n2k)O(n ^ 2 * k)
  • Space complexity: O(nk)O(n * k)

Where nn is the length of the string ss and kk is the maximum number of characters that can be deleted from the string.


Common Pitfalls

Miscounting Run-Length Encoding Length

The encoded length does not increase linearly with count. It is 1 for a single character (e.g., "a"), 2 for counts 2-9 (e.g., "a5"), 3 for counts 10-99 (e.g., "a12"), and 4 for count 100 (e.g., "a100"). The length only increases at thresholds 1, 9, and 99. Missing these thresholds leads to incorrect length calculations.

Not Considering All Deletion Strategies

A greedy approach of only deleting characters that break runs fails. Sometimes deleting characters within a run or deleting characters to merge two separate runs of the same character produces a shorter encoding. The DP must explore both keeping and deleting each character.

Incorrect State Representation

Tracking only position and remaining deletions is insufficient for the naive approach. You also need to track the previous character and its count to properly compute the contribution to encoded length. The optimized approach avoids this by scanning forward to form complete runs.

Integer Overflow with Infinity Values

Using INT_MAX or Integer.MAX_VALUE as infinity and then adding to it causes overflow. Either use a smaller sentinel value (like 150, which exceeds the maximum possible answer) or check for infinity before performing arithmetic operations.

Off-by-One Errors in Loop Termination

When scanning forward to extend a run, ensure you correctly handle the boundary when j reaches n. The base case when all remaining characters can be deleted (n - i <= k) should return 0. Failing to check bounds before accessing s[j] causes runtime errors.