Before attempting this problem, you should be comfortable with:
We want to delete at most k characters to minimize the run-length encoded length. The key observation is that the encoded length only increases at certain thresholds: going from 1 to 2 characters adds a digit, going from 9 to 10 adds another digit, and going from 99 to 100 adds yet another. We use recursion with memoization, tracking the current position, remaining deletions, the previous character, and its count. At each step, we either extend a run (if the current character matches the previous) or start a new run (keeping or deleting the current character).
count(i, k, prev, prev_cnt) where i is current index, k is remaining deletions, prev is the previous character, and prev_cnt is how many times it has appeared consecutively.k < 0, return infinity (invalid). If i == n, return 0.s[i] == prev, extend the current run. Add 1 to the result only if prev_cnt is 1, 9, or 99 (thresholds where encoded length increases).s[i] != prev, choose the minimum of: deleting s[i] (using one deletion), or keeping s[i] (starting a new run with length contribution of 1).count(0, k, "", 0).class Solution:
def getLengthOfOptimalCompression(self, s: str, k: int) -> int:
cache = {}
def count(i, k, prev, prev_cnt):
if (i, k, prev, prev_cnt) in cache:
return cache[(i, k, prev, prev_cnt)]
if k < 0:
return float("inf")
if i == len(s):
return 0
if s[i] == prev:
incr = 1 if prev_cnt in [1, 9, 99] else 0
res = incr + count(i + 1, k, prev, prev_cnt + 1)
else:
res = min(
count(i + 1, k - 1, prev, prev_cnt), # delete s[i]
1 + count(i + 1, k, s[i], 1) # don't delete
)
cache[(i, k, prev, prev_cnt)] = res
return res
return count(0, k, "", 0)Where is the length of the string and is the maximum number of characters that can be deleted from the string.
Instead of tracking the previous character and its count explicitly, we can think of the problem differently. At each position, we decide to either delete the current character or make it the start of a new run. If we start a new run, we scan forward and try to extend it by keeping matching characters and deleting non-matching ones (within our deletion budget). This reduces the state space to just position and remaining deletions.
dfs(i, k) where i is the current index and k is the remaining deletion budget.n - i <= k, we can delete all remaining characters, so return 0.s[i] if k > 0, giving dfs(i + 1, k - 1).s[i]. Scan forward, counting matching characters and deleting non-matching ones. Track the compressed length (which increases at counts 1, 9, 99). For each endpoint, compute comp_len + dfs(j + 1, k - delCnt).dp[n][k+1].dfs(0, k).class Solution:
def getLengthOfOptimalCompression(self, s: str, k: int) -> int:
n = len(s)
dp = {}
def dfs(i, k):
if n - i <= k:
return 0
if (i, k) in dp:
return dp[(i, k)]
res = 150
if k > 0:
res = dfs(i + 1, k - 1)
freq = delCnt = 0
comp_len = 1
for j in range(i, n):
if s[i] == s[j]:
if freq in [1, 9, 99]:
comp_len += 1
freq += 1
else:
delCnt += 1
if delCnt > k:
break
res = min(res, comp_len + dfs(j + 1, k - delCnt))
dp[(i, k)] = res
return res
return dfs(0, k)Where is the length of the string and is the maximum number of characters that can be deleted from the string.
We convert the optimized top-down solution to bottom-up form. We process positions from right to left, computing for each position and deletion budget the minimum encoded length. This iterative approach fills the DP table systematically and avoids recursion overhead.
dp[n+1][k+1] initialized to a large value (e.g., 150), with dp[n][*] = 0 as base cases.i from n-1 down to 0, and for each rem_k from 0 to k.rem_k > 0, set dp[i][rem_k] = dp[i+1][rem_k-1] (delete current character).i, counting frequency of s[i] and deletions of other characters. Track compressed length (starts at 1, increases at thresholds 1, 9, 99). Update dp[i][rem_k] = min(dp[i][rem_k], comp_len + dp[j+1][rem_k - delCnt]).delCnt > rem_k.dp[0][k].class Solution:
def getLengthOfOptimalCompression(self, s: str, k: int) -> int:
n = len(s)
dp = [[150] * (k + 1) for _ in range(n)]
dp.append([0] * (k + 1))
for i in range(n - 1, -1, -1):
for rem_k in range(k + 1):
if rem_k > 0:
dp[i][rem_k] = dp[i + 1][rem_k - 1]
freq = delCnt = 0
comp_len = 1
for j in range(i, n):
if s[i] == s[j]:
if freq in [1, 9, 99]:
comp_len += 1
freq += 1
else:
delCnt += 1
if delCnt > rem_k:
break
dp[i][rem_k] = min(dp[i][rem_k], comp_len + dp[j + 1][rem_k - delCnt])
return dp[0][k]Where is the length of the string and is the maximum number of characters that can be deleted from the string.
The encoded length does not increase linearly with count. It is 1 for a single character (e.g., "a"), 2 for counts 2-9 (e.g., "a5"), 3 for counts 10-99 (e.g., "a12"), and 4 for count 100 (e.g., "a100"). The length only increases at thresholds 1, 9, and 99. Missing these thresholds leads to incorrect length calculations.
A greedy approach of only deleting characters that break runs fails. Sometimes deleting characters within a run or deleting characters to merge two separate runs of the same character produces a shorter encoding. The DP must explore both keeping and deleting each character.
Tracking only position and remaining deletions is insufficient for the naive approach. You also need to track the previous character and its count to properly compute the contribution to encoded length. The optimized approach avoids this by scanning forward to form complete runs.
Using INT_MAX or Integer.MAX_VALUE as infinity and then adding to it causes overflow. Either use a smaller sentinel value (like 150, which exceeds the maximum possible answer) or check for infinity before performing arithmetic operations.
When scanning forward to extend a run, ensure you correctly handle the boundary when j reaches n. The base case when all remaining characters can be deleted (n - i <= k) should return 0. Failing to check bounds before accessing s[j] causes runtime errors.