295. Find Median From Data Stream - Explanation

Problem Link

Description

The median is the middle value in a sorted list of integers. For lists of even length, there is no middle value, so the median is the mean of the two middle values.

For example:

  • For arr = [1,2,3], the median is 2.
  • For arr = [1,2], the median is (1 + 2) / 2 = 1.5

Implement the MedianFinder class:

  • MedianFinder() initializes the MedianFinder object.
  • void addNum(int num) adds the integer num from the data stream to the data structure.
  • double findMedian() returns the median of all elements so far.

Example 1:

Input:
["MedianFinder", "addNum", "1", "findMedian", "addNum", "3" "findMedian", "addNum", "2", "findMedian"]

Output:
[null, null, 1.0, null, 2.0, null, 2.0]

Explanation:
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1);    // arr = [1]
medianFinder.findMedian(); // return 1.0
medianFinder.addNum(3);    // arr = [1, 3]
medianFinder.findMedian(); // return 2.0
medianFinder.addNum(2);    // arr[1, 2, 3]
medianFinder.findMedian(); // return 2.0

Constraints:

  • -100,000 <= num <= 100,000
  • findMedian will only be called after adding at least one integer to the data structure.


Topics

Recommended Time & Space Complexity

You should aim for a solution with O(logn) time for addNum(), O(1) time for findMedian(), and O(n) space, where n is the current number of elements.


Hint 1

A naive solution would be to store the data stream in an array and sort it each time to find the median, resulting in O(nlogn) time for each findMedian() call. Can you think of a better way? Perhaps using a data structure that allows efficient insertion and retrieval of the median can make the solution more efficient.


Hint 2

If we divide the array into two parts, we can find the median in O(1) if the left half can efficiently return the maximum and the right half can efficiently return the minimum. These values determine the median. However, the process changes slightly if the total number of elements is odd — in that case, the median is the element from the half with the larger size. Can you think of a data structure which is suitable to implement this?


Hint 3

We can use a Heap (Max-Heap for the left half and Min-Heap for the right half). Instead of dividing the array, we store the elements in these heaps as they arrive in the data stream. But how can you maintain equal halves of elements in these two heaps? How do you implement this?


Hint 4

We initialize a Max-Heap and a Min-Heap. When adding an element, if the element is greater than the minimum element of the Min-Heap, we push it into the Min-Heap; otherwise, we push it into the Max-Heap. If the size difference between the two heaps becomes greater than one, we rebalance them by popping an element from the larger heap and pushing it into the smaller heap. This process ensures that the elements are evenly distributed between the two heaps, allowing us to retrieve the middle element or elements in O(1) time.


Company Tags

Please upgrade to NeetCode Pro to view company tags.



Prerequisites

Before attempting this problem, you should be comfortable with:

  • Heap / Priority Queue - The optimal solution uses two heaps (max-heap and min-heap) to efficiently track the median as elements stream in
  • Sorting - Understanding how sorting enables finding the median helps grasp the brute force approach and why heaps improve upon it
  • Data Stream Design - This problem requires designing a class that handles continuous data input while maintaining efficient query operations

1. Sorting

Intuition

The simplest way to find the median is to keep all numbers in a list and
sort them whenever we need the median. After sorting, the numbers are in
increasing order, making it easy to pick the "middle" value(s).

  • If we have an odd number of elements, the median is the middle element.
  • If we have an even number of elements, the median is the average
    of the two middle elements.

This approach is slow because sorting happens every time we query the median,
but it is the easiest to understand and implement.

Algorithm

  1. Initialize

    • Create an empty list data.
  2. addNum(x)

    • Append x to the list.
  3. findMedian()

    • Sort the list.
    • Let n = length of data.
    • If n is odd:
      • Return data[n // 2].
    • Else:
      • Return (data[n // 2] + data[n // 2 - 1]) / 2.
class MedianFinder:

    def __init__(self):
        self.data = []

    def addNum(self, num: int) -> None:
        self.data.append(num)

    def findMedian(self) -> float:
        self.data.sort()
        n = len(self.data)
        return (self.data[n // 2] if (n & 1) else
                (self.data[n // 2] + self.data[n // 2 - 1]) / 2)

Time & Space Complexity

  • Time complexity: O(m)O(m) for addNum()addNum(), O(mnlogn)O(m * n \log n) for findMedian()findMedian().
  • Space complexity: O(n)O(n)

Where mm is the number of function calls and nn is the length of the array.


2. Heap

Intuition

To efficiently find the median while numbers keep coming, we split the
stream into two halves:

  • A max-heap (small) that stores the smaller half of the numbers.
    • The largest number of this half is on top.
  • A min-heap (large) that stores the larger half of the numbers.
    • The smallest number of this half is on top.

The goal:

  • Ensure both heaps are balanced in size (difference at most 1).
  • Ensure all numbers in small are ≤ all numbers in large.

This setup allows:

  • Median = top of the bigger heap (if odd count)
  • Median = average of both tops (if even count)

This gives O(log n) insert and O(1) median lookup.

Algorithm

  1. Initialize

    • Create two heaps:
      • small -> max-heap for lower half
      • large -> min-heap for upper half
  2. addNum(x)

    • If large is not empty and x is greater than the smallest element in large,
      insert into large.
    • Otherwise insert into small.
    • Rebalance:
      • If one heap becomes larger than the other by more than 1,
        move the top element to the other heap.
  3. findMedian()

    • If one heap has more elements:
      • Median = top of that heap.
    • If both heaps have equal elements:
      • Median = average of both heap tops.
class MedianFinder:
    def __init__(self):
        # two heaps, large, small, minheap, maxheap
        # heaps should be equal size
        self.small, self.large = [], []

    def addNum(self, num: int) -> None:
        if self.large and num > self.large[0]:
            heapq.heappush(self.large, num)
        else:
            heapq.heappush(self.small, -1 * num)

        if len(self.small) > len(self.large) + 1:
            val = -1 * heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        if len(self.large) > len(self.small) + 1:
            val = heapq.heappop(self.large)
            heapq.heappush(self.small, -1 * val)

    def findMedian(self) -> float:
        if len(self.small) > len(self.large):
            return -1 * self.small[0]
        elif len(self.large) > len(self.small):
            return self.large[0]
        return (-1 * self.small[0] + self.large[0]) / 2.0

Time & Space Complexity

  • Time complexity: O(mlogn)O(m * \log n) for addNum()addNum(), O(m)O(m) for findMedian()findMedian().
  • Space complexity: O(n)O(n)

Where mm is the number of function calls and nn is the length of the array.


Common Pitfalls

Using Wrong Heap Types

The two-heap solution requires a max-heap for the smaller half and a min-heap for the larger half. Swapping these or using two min-heaps produces incorrect medians. In languages like Python where heapq only provides min-heaps, you must negate values to simulate a max-heap for the smaller half.

Failing to Maintain Heap Balance

The heaps must differ in size by at most one element. Forgetting to rebalance after insertions leads to incorrect median calculations. After every addNum call, check if one heap has more than one extra element and transfer the top element to the other heap.

Incorrect Median Calculation for Even Count

When both heaps have equal size, the median is the average of both tops, not just one of them. Returning only the top of one heap or using integer division instead of floating-point division produces wrong results. Always check heap sizes and compute (smallTop + largeTop) / 2.0 for the even case.

Integer Overflow in Sum Calculation

When computing the average of two heap tops, adding two large integers can cause overflow in languages like Java or C++. Cast to a larger type before adding, or compute as a + (b - a) / 2.0 to avoid overflow while still getting the correct floating-point result.

Inserting Into Wrong Heap Initially

The first element must go into one of the heaps, but subsequent elements must be compared against the appropriate heap top to determine placement. Inserting all elements into one heap first and then rebalancing works, but directly inserting into the wrong heap without proper comparison breaks the invariant that all elements in the small heap are less than or equal to all elements in the large heap.