271. Encode and Decode Strings - Explanation

Problem Link

Description

Design an algorithm to encode a list of strings to a string. The encoded string is then sent over the network and is decoded back to the original list of strings.

Machine 1 (sender) has the function:

string encode(vector<string> strs) {
    // ... your code
    return encoded_string;
}

Machine 2 (receiver) has the function:

vector<string> decode(string s) {
    //... your code
    return strs;
}

So Machine 1 does:

string encoded_string = encode(strs);

and Machine 2 does:

vector<string> strs2 = decode(encoded_string);

strs2 in Machine 2 should be the same as strs in Machine 1.

Implement the encode and decode methods.

Example 1:

Input: dummy_input = ["Hello","World"]

Output: ["Hello","World"]

Explanation:
Machine 1:
Codec encoder = new Codec();
String msg = encoder.encode(strs);
Machine 1 ---msg---> Machine 2

Machine 2:
Codec decoder = new Codec();
String[] strs = decoder.decode(msg);

Example 2:

Input: dummy_input = [""]

Output: [""]

Constraints:

  • 0 <= strs.length < 100
  • 0 <= strs[i].length < 200
  • strs[i] contains any possible characters out of 256 valid ASCII characters.

Follow up: Could you write a generalized algorithm to work on any possible set of characters?



Recommended Time & Space Complexity

You should aim for a solution with O(m) time for each encode() and decode() call and O(m+n) space, where m is the sum of lengths of all the strings and n is the number of strings.


Hint 1

A naive solution would be to use a non-ascii character as a delimiter. Can you think of a better way?


Hint 2

Try to encode and decode the strings using a smart approach based on the lengths of each string. How can you differentiate between the lengths and any numbers that might be present in the strings?


Hint 3

We can use an encoding approach where we start with a number representing the length of the string, followed by a separator character (let's use # for simplicity), and then the string itself. To decode, we read the number until we reach a #, then use that number to read the specified number of characters as the string.


Company Tags

Please upgrade to NeetCode Pro to view company tags.



1. Encoding & Decoding

Intuition

To encode a list of strings into a single string, we need a way to store each string so that we can later separate them correctly during decoding.
A simple and reliable strategy is to record the length of each string first, followed by a special separator, and then append all the strings together.
During decoding, we can read the recorded lengths to know exactly how many characters to extract for each original string.
This avoids any issues with special characters, commas, or symbols inside the strings because the lengths tell us precisely where each string starts and ends.

Algorithm

Encoding

  1. If the input list is empty, return an empty string.
  2. Create an empty list to store the sizes of each string.
  3. For each string, append its length to the sizes list.
  4. Build a single string by:
    • Writing all sizes separated by commas.
    • Adding a '#' to mark the end of the size section.
    • Appending all the actual strings in order.
  5. Return the final encoded string.

Decoding

  1. If the encoded string is empty, return an empty list.
  2. Read characters from the start until reaching '#' to extract all recorded sizes:
    • Parse each size by reading until a comma.
  3. After the '#', extract substrings according to the sizes list:
    • For each size, read that many characters and append the substring to the result.
  4. Return the list of decoded strings.
class Solution:
    def encode(self, strs: List[str]) -> str:
        if not strs:
            return ""
        sizes, res = [], ""
        for s in strs:
            sizes.append(len(s))
        for sz in sizes:
            res += str(sz)
            res += ','
        res += '#'
        for s in strs:
            res += s
        return res

    def decode(self, s: str) -> List[str]:
        if not s:
            return []
        sizes, res, i = [], [], 0
        while s[i] != '#':
            cur = ""
            while s[i] != ',':
                cur += s[i]
                i += 1
            sizes.append(int(cur))
            i += 1
        i += 1
        for sz in sizes:
            res.append(s[i:i + sz])
            i += sz
        return res

Time & Space Complexity

  • Time complexity: O(m)O(m) for each encode()encode() and decode()decode() function calls.
  • Space complexity: O(m+n)O(m + n) for each encode()encode() and decode()decode() function calls.

Where mm is the sum of lengths of all the strings and nn is the number of strings.


2. Encoding & Decoding (Optimal)

Intuition

Instead of storing all string lengths first and then appending the strings, we can directly attach each string to its length.
For every string, we write length#string.
The # character acts as a clear boundary between the length and the actual content, and using the length ensures we know exactly how many characters to read—no matter what characters appear in the string itself.
During decoding, we simply read characters until we reach # to find the length, then extract exactly that many characters as the string.
This approach is both simpler and more efficient because it avoids building separate sections for lengths and content.

Algorithm

Encoding

  1. Initialize an empty result string.
  2. For each string in the list:
    • Compute its length.
    • Append "length#string" to the result.
  3. Return the final encoded string.

Decoding

  1. Initialize an empty list for the decoded strings and a pointer i = 0.
  2. While i is within the bounds of the encoded string:
    • Move a pointer j forward until it finds '#' — this segment represents the length.
    • Convert the substring s[i:j] into an integer length.
    • Move i to the character right after '#'.
    • Extract the next length characters — this is the original string.
    • Append the extracted string to the result list.
    • Move i forward by length to continue decoding the next segment.
  3. Return the list of decoded strings.
class Solution:

    def encode(self, strs: List[str]) -> str:
        res = ""
        for s in strs:
            res += str(len(s)) + "#" + s
        return res

    def decode(self, s: str) -> List[str]:
        res = []
        i = 0

        while i < len(s):
            j = i
            while s[j] != '#':
                j += 1
            length = int(s[i:j])
            i = j + 1
            j = i + length
            res.append(s[i:j])
            i = j

        return res

Time & Space Complexity

  • Time complexity: O(m)O(m) for each encode()encode() and decode()decode() function calls.
  • Space complexity: O(m+n)O(m + n) for each encode()encode() and decode()decode() function calls.

Where mm is the sum of lengths of all the strings and nn is the number of strings.


Common Pitfalls

Using a Delimiter That Can Appear in the Strings

Choosing a simple delimiter like a comma or space will break decoding if that character appears inside the original strings. The length-prefixing approach avoids this by using the length to know exactly how many characters to read, making the content irrelevant.

Not Handling Empty Strings or Empty Lists

Edge cases like an empty input list or strings that are themselves empty ("") require careful handling. Ensure your encoding distinguishes between an empty list and a list containing one empty string, and that decoding correctly reconstructs zero-length strings.

Parsing Length Incorrectly for Multi-Digit Numbers

When the length of a string is 10 or more, the length prefix becomes multi-digit. Ensure you read all digits before the # separator rather than just assuming a single digit. Using a loop to collect characters until reaching # handles this correctly.