Design an algorithm to encode a list of strings to a string. The encoded string is then sent over the network and is decoded back to the original list of strings.
Machine 1 (sender) has the function:
string encode(vector<string> strs) {
// ... your code
return encoded_string;
}Machine 2 (receiver) has the function:
vector<string> decode(string s) {
//... your code
return strs;
}So Machine 1 does:
string encoded_string = encode(strs);and Machine 2 does:
vector<string> strs2 = decode(encoded_string);strs2 in Machine 2 should be the same as strs in Machine 1.
Implement the encode and decode methods.
Example 1:
Input: dummy_input = ["Hello","World"]
Output: ["Hello","World"]
Explanation:
Machine 1:
Codec encoder = new Codec();
String msg = encoder.encode(strs);
Machine 1 ---msg---> Machine 2
Machine 2:
Codec decoder = new Codec();
String[] strs = decoder.decode(msg);Example 2:
Input: dummy_input = [""]
Output: [""]Constraints:
0 <= strs.length < 1000 <= strs[i].length < 200strs[i] contains any possible characters out of 256 valid ASCII characters.Follow up: Could you write a generalized algorithm to work on any possible set of characters?
You should aim for a solution with O(m) time for each encode() and decode() call and O(m+n) space, where m is the sum of lengths of all the strings and n is the number of strings.
A naive solution would be to use a non-ascii character as a delimiter. Can you think of a better way?
Try to encode and decode the strings using a smart approach based on the lengths of each string. How can you differentiate between the lengths and any numbers that might be present in the strings?
We can use an encoding approach where we start with a number representing the length of the string, followed by a separator character (let's use # for simplicity), and then the string itself. To decode, we read the number until we reach a #, then use that number to read the specified number of characters as the string.
To encode a list of strings into a single string, we need a way to store each string so that we can later separate them correctly during decoding.
A simple and reliable strategy is to record the length of each string first, followed by a special separator, and then append all the strings together.
During decoding, we can read the recorded lengths to know exactly how many characters to extract for each original string.
This avoids any issues with special characters, commas, or symbols inside the strings because the lengths tell us precisely where each string starts and ends.
'#' to mark the end of the size section.'#' to extract all recorded sizes:'#', extract substrings according to the sizes list:class Solution:
def encode(self, strs: List[str]) -> str:
if not strs:
return ""
sizes, res = [], ""
for s in strs:
sizes.append(len(s))
for sz in sizes:
res += str(sz)
res += ','
res += '#'
for s in strs:
res += s
return res
def decode(self, s: str) -> List[str]:
if not s:
return []
sizes, res, i = [], [], 0
while s[i] != '#':
cur = ""
while s[i] != ',':
cur += s[i]
i += 1
sizes.append(int(cur))
i += 1
i += 1
for sz in sizes:
res.append(s[i:i + sz])
i += sz
return resWhere is the sum of lengths of all the strings and is the number of strings.
Instead of storing all string lengths first and then appending the strings, we can directly attach each string to its length.
For every string, we write length#string.
The # character acts as a clear boundary between the length and the actual content, and using the length ensures we know exactly how many characters to read—no matter what characters appear in the string itself.
During decoding, we simply read characters until we reach # to find the length, then extract exactly that many characters as the string.
This approach is both simpler and more efficient because it avoids building separate sections for lengths and content.
"length#string" to the result.i = 0.i is within the bounds of the encoded string:j forward until it finds '#' — this segment represents the length.s[i:j] into an integer length.i to the character right after '#'.length characters — this is the original string.i forward by length to continue decoding the next segment.class Solution:
def encode(self, strs: List[str]) -> str:
res = ""
for s in strs:
res += str(len(s)) + "#" + s
return res
def decode(self, s: str) -> List[str]:
res = []
i = 0
while i < len(s):
j = i
while s[j] != '#':
j += 1
length = int(s[i:j])
i = j + 1
j = i + length
res.append(s[i:j])
i = j
return resWhere is the sum of lengths of all the strings and is the number of strings.
Choosing a simple delimiter like a comma or space will break decoding if that character appears inside the original strings. The length-prefixing approach avoids this by using the length to know exactly how many characters to read, making the content irrelevant.
Edge cases like an empty input list or strings that are themselves empty ("") require careful handling. Ensure your encoding distinguishes between an empty list and a list containing one empty string, and that decoding correctly reconstructs zero-length strings.
When the length of a string is 10 or more, the length prefix becomes multi-digit. Ensure you read all digits before the # separator rather than just assuming a single digit. Using a loop to collect characters until reaching # handles this correctly.