1233. Remove Sub-Folders from the Filesystem - Explanation

Problem Link

Description

You are given a list of folders folder, return the folders after removing all sub-folders in those folders. You may return the answer in any order.

If a folder[i] is located within another folder[j], it is called a sub-folder of it. A sub-folder of folder[j] must start with folder[j], followed by a "/". For example, "/a/b" is a sub-folder of "/a", but "/b" is not a sub-folder of "/a/b".

The format of a path is one or more concatenated strings of the form: '/' followed by one or more lowercase English letters.

For example, "/neetcode" and "/neetcode/practice" are valid paths while an empty string "" and "/" are not.

Example 1:

Input: folder = ["/a","/a/b","/c/d","/c/d/e","/c/f"]

Output: ["/a","/c/d","/c/f"]

Explanation: Folders "/a/b" is a subfolder of "/a" and "/c/d/e" is inside of folder "/c/d" in our filesystem.

Example 2:

Input: folder = ["/a","/a/b/c","/a/b/d"]

Output: ["/a"]

Explanation: Folders "/a/b/c" and "/a/b/d" will be removed because they are subfolders of "/a".

Example 3:

Input: folder = ["/a/b/c","/a/b/ca","/a/b/d"]

Output: ["/a/b/c","/a/b/ca","/a/b/d"]

Constraints:

  • 1 <= folder.length <= 40,000
  • 2 <= folder[i].length <= 100
  • folder[i] contains only lowercase letters and '/'.
  • folder[i] always starts with the character '/'.
  • Each folder name is unique.


Topics

Company Tags

Please upgrade to NeetCode Pro to view company tags.



Prerequisites

Before attempting this problem, you should be comfortable with:

  • Hash Sets - Using sets for O(1) lookups to check membership efficiently
  • String Manipulation - Parsing and splitting strings, working with substrings and prefixes
  • Sorting - Understanding how lexicographic sorting groups related strings together
  • Trie Data Structure - Building and traversing prefix trees for hierarchical string matching

1. Hash Set

Intuition

A folder is a subfolder if any of its ancestor paths exist in the input. We can check this efficiently by storing all folder paths in a hash set. For each folder, we examine every prefix ending at a / character. If any such prefix exists in the set, the current folder is a subfolder and should be excluded. In code, we use i to iterate through the path and check if f[:i] exists in the folder set.

Algorithm

  1. Store all folder paths in a hash set for O(1) lookups.
  2. For each folder path:
    • Add it to the res list.
    • Scan through the path and check every prefix that ends just before a /.
    • If any prefix exists in the set, remove the folder from res and move on.
  3. Return the res list.
class Solution:
    def removeSubfolders(self, folder: List[str]) -> List[str]:
        res = []
        folder_set = set(folder)

        for f in folder:
            res.append(f)
            for i in range(len(f)):
                if f[i] == "/" and f[:i] in folder_set:
                    res.pop()
                    break

        return res

Time & Space Complexity

  • Time complexity: O(nm2)O(n * m ^ 2)
  • Space complexity: O(nm)O(n * m)

Where nn is the size of the string array folderfolder and mm is the length of each string.


2. Sorting

Intuition

When folder paths are sorted lexicographically, parent folders always appear before their subfolders. This means if we iterate through the sorted list, a folder is a subfolder only if it starts with the most recently added result folder followed by /. We only need to compare against the last folder in res, not all of them.

Algorithm

  1. Sort the folder array lexicographically.
  2. Add the first folder to res.
  3. For each subsequent folder:
    • Check if it starts with the last res folder plus /.
    • If not, add it to res.
  4. Return the res list.
class Solution:
    def removeSubfolders(self, folder: List[str]) -> List[str]:
        folder.sort()
        res = [folder[0]]

        for i in range(1, len(folder)):
            if not folder[i].startswith(res[-1] + "/"):
                res.append(folder[i])

        return res

Time & Space Complexity

  • Time complexity: O(nm2)O(n * m ^ 2)
  • Space complexity: O(nm)O(n * m)

Where nn is the size of the string array folderfolder and mm is the length of each string.


3. Trie

Intuition

A trie naturally represents hierarchical folder structures. Each node corresponds to a folder name segment, and we mark nodes that represent complete folder paths. When checking if a folder is a subfolder, we traverse the trie along its path. If we encounter a marked node before reaching the end, a parent folder exists, meaning this is a subfolder.

Algorithm

  1. Build a trie by inserting all folder paths:
    • Split each path by / and traverse or create nodes for each segment.
    • Mark the final node as end_of_folder.
  2. For each folder, search the trie:
    • Traverse the path segment by segment using an index i from 0 to len(folders) - 1.
    • If any node before the last is marked as end_of_folder, skip this folder.
  3. Add non-subfolder paths to res.
  4. Return the res list.
class Trie:
    def __init__(self):
        self.children = {}  # string -> Trie
        self.end_of_folder = False

    def add(self, path: str) -> None:
        cur = self
        for f in path.split("/"):
            if f not in cur.children:
                cur.children[f] = Trie()
            cur = cur.children[f]
        cur.end_of_folder = True

    def prefix_search(self, path: str) -> bool:
        cur = self
        folders = path.split("/")
        for i in range(len(folders) - 1):
            cur = cur.children[folders[i]]
            if cur.end_of_folder:
                return True
        return False


class Solution:
    def removeSubfolders(self, folder: List[str]) -> List[str]:
        trie = Trie()
        for f in folder:
            trie.add(f)

        res = []
        for f in folder:
            if not trie.prefix_search(f):
                res.append(f)
        return res

Time & Space Complexity

  • Time complexity: O(nm)O(n * m)
  • Space complexity: O(nm)O(n * m)

Where nn is the size of the string array folderfolder and mm is the length of each string.


Common Pitfalls

Matching Partial Folder Names as Subfolders

A folder /a/bc is not a subfolder of /a/b, even though /a/b is a prefix of /a/bc. The subfolder relationship requires matching at a / boundary. Always append / when checking prefixes or compare path segments individually to avoid false positives.

Forgetting to Handle the Root Slash

Every path starts with /, which creates an empty string when splitting by /. Failing to skip or handle this empty segment causes index errors or incorrect trie traversal. Ensure your parsing logic accounts for the leading slash.

Incorrect Sorting Assumptions

When using the sorting approach, you must compare against the last added result folder, not any arbitrary previous folder. Since sorted order ensures parents come before children, only the most recent result can be the parent of the current folder. Comparing against all previous results is both unnecessary and error-prone.