652. Find Duplicate Subtrees - Explanation

Problem Link



Prerequisites

Before attempting this problem, you should be comfortable with:

  • Binary Trees - Understanding tree structure, nodes, and recursive tree definitions
  • Depth-First Search (DFS) - Traversing trees using pre-order, in-order, or post-order traversal
  • Tree Serialization - Converting tree structures into string representations for comparison
  • Hash Maps - Using dictionaries to store and lookup values efficiently for duplicate detection

1. Brute Force (DFS)

Intuition

The most straightforward way to find duplicate subtrees is to compare every subtree with every other subtree. We first collect all nodes using DFS, then for each pair of nodes, we check if their subtrees are structurally identical with matching values. When we find a match, we add one representative to the result and mark both as seen to avoid duplicates.

Algorithm

  1. Traverse the tree with DFS to collect all nodes into a list.
  2. For each pair of nodes (i, j) where j > i:
    • Skip if either node has already been identified as a duplicate.
    • Recursively check if the subtrees rooted at these nodes are identical (same structure and values).
    • If they match, add one node to the result and mark both as seen.
  3. Return the list of duplicate subtree roots.
# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, val=0, left=None, right=None):
#         self.val = val
#         self.left = left
#         self.right = right
class Solution:
    def findDuplicateSubtrees(self, root: Optional[TreeNode]) -> List[Optional[TreeNode]]:
        def same(node1, node2):
            if not node1 and not node2:
                return True
            if not node1 or not node2:
                return False
            return (node1.val == node2.val and
                    same(node1.left, node2.left) and
                    same(node1.right, node2.right))

        subTree = []
        def dfs(root):
            if not root:
                return
            subTree.append(root)
            dfs(root.left)
            dfs(root.right)

        dfs(root)
        res = []
        seen = set()

        for i in range(len(subTree)):
            if subTree[i] in seen:
                continue
            for j in range(i + 1, len(subTree)):
                if subTree[j] in seen:
                    continue

                if same(subTree[i], subTree[j]):
                    if subTree[i] not in seen:
                        res.append(subTree[i])
                        seen.add(subTree[i])
                    seen.add(subTree[j])
        return res

Time & Space Complexity

  • Time complexity: O(n3)O(n ^ 3)
  • Space complexity: O(n)O(n)

2. DFS + Serialization

Intuition

Instead of comparing subtrees pairwise, we can serialize each subtree into a unique string representation. Two subtrees are identical if and only if they produce the same serialization. By storing these strings in a hash map, we can detect duplicates in a single pass through the tree. When we see the same serialization for the second time, we know we have found a duplicate.

Algorithm

  1. Perform a post-order DFS traversal.
  2. For each node, create a serialized string: "value,left_serialization,right_serialization". Use "null" for empty nodes.
  3. Store each serialization in a hash map with a list of nodes that produced it.
  4. When a serialization appears for the second time (list size becomes 2), add the current node to the result.
  5. Return the serialization string so parent nodes can build upon it.
# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, val=0, left=None, right=None):
#         self.val = val
#         self.left = left
#         self.right = right
class Solution:
    def findDuplicateSubtrees(self, root: Optional[TreeNode]) -> List[Optional[TreeNode]]:
        subtrees = defaultdict(list)
        res = []

        def dfs(node):
            if not node:
                return "null"
            s = ",".join([str(node.val), dfs(node.left), dfs(node.right)])
            if len(subtrees[s]) == 1:
                res.append(node)
            subtrees[s].append(node)
            return s

        dfs(root)
        return res

Time & Space Complexity

  • Time complexity: O(n2)O(n ^ 2)
  • Space complexity: O(n2)O(n ^ 2)

3. Depth First Search (Optimal)

Intuition

The serialization approach has quadratic space complexity because string concatenation creates long strings for large trees. We can optimize by assigning a unique integer ID to each distinct subtree structure. Instead of storing full serialization strings, we represent each subtree as a tuple of (left_id, value, right_id) and map this tuple to a unique ID. This keeps the key size constant regardless of subtree depth.

Algorithm

  1. Perform a post-order DFS traversal, returning -1 for null nodes.
  2. For each node, create a tuple: (left_child_id, node_value, right_child_id).
  3. If this tuple is new, assign it a fresh unique ID.
  4. Track how many times each ID has been seen.
  5. When an ID is seen for the second time, add the current node to the result.
  6. Return the current subtree's ID for use by parent nodes.
# Definition for a binary tree node.
# class TreeNode:
#     def __init__(self, val=0, left=None, right=None):
#         self.val = val
#         self.left = left
#         self.right = right
class Solution:
    def findDuplicateSubtrees(self, root: Optional[TreeNode]) -> List[Optional[TreeNode]]:
        id_map = {}
        count = defaultdict(int)
        res = []

        def dfs(node):
            if not node:
                return -1
            cur = (dfs(node.left), node.val, dfs(node.right))
            if cur not in id_map:
                id_map[cur] = len(id_map) + 1

            curId = id_map[cur]
            if count[curId] == 1:
                res.append(node)
            count[curId] += 1
            return curId

        dfs(root)
        return res

Time & Space Complexity

  • Time complexity: O(n)O(n)
  • Space complexity: O(n)O(n)

Common Pitfalls

Not Distinguishing Left and Right Children in Serialization

A serialization like "1,2,null" is ambiguous without structure markers. The value 2 could be a left child or right child. Always use a consistent format that distinguishes left from right, such as "value,left_serialization,right_serialization" with explicit null markers for missing children.

Adding Duplicates Multiple Times to the Result

When the same subtree structure appears three or more times, you should only add one representative to the result list. Track how many times each serialization has been seen and only add to the result on the second occurrence, not subsequent ones.

Quadratic Space from String Concatenation

Naive string serialization creates strings proportional to subtree size, leading to O(n^2) total space for all serializations. The optimized approach assigns unique integer IDs to each subtree structure, keeping the representation size constant regardless of subtree depth.