The median is the middle value in a sorted list of integers. For lists of even length, there is no middle value, so the median is the mean of the two middle values.
For example:
* arr = [1,2,3], the median is 2.
* arr = [1,2], the median is (1 + 2) / 2 = 1.5
Implement the MedianFinder class:
* MedianFinder() initializes the MedianFinder object.
* void addNum(int num) adds the integer num from the data stream to the data structure.
* double findMedian() returns the median of all elements so far.
Example 1:
Input:
["MedianFinder", "addNum", "1", "findMedian", "addNum", "3", "findMedian", "addNum", "2", "findMedian"]
Output:
[null, null, 1.0, null, 2.0, null, 2.0]
Explanation:
MedianFinder medianFinder = new MedianFinder();
medianFinder.addNum(1); // arr = [1]
medianFinder.findMedian(); // return 1.0
medianFinder.addNum(3); // arr = [1, 3]
medianFinder.findMedian(); // return 2.0
medianFinder.addNum(2); // arr = [1, 2, 3]
medianFinder.findMedian(); // return 2.0
Constraints:
* -100,000 <= num <= 100,000
* findMedian will only be called after adding at least one integer to the data structure.
You should aim for a solution with O(log n) time for addNum(), O(1) time for findMedian(), and O(n) space, where n is the current number of elements.
A naive solution would be to store the data stream in an array and sort it each time to find the median, resulting in O(n log n) time for each findMedian() call. Can you think of a better way? Perhaps using a data structure that allows efficient insertion and retrieval of the median can make the solution more efficient.
If we divide the array into two parts, we can find the median in O(1) if the left half can efficiently return the maximum and the right half can efficiently return the minimum. These values determine the median. However, the process changes slightly if the total number of elements is odd — in that case, the median is the element from the half with the larger size. Can you think of a data structure which is suitable to implement this?
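As a rough illustration of the idea above, the sketch below reads the median from two already-separated halves. The function name median_from_halves and the use of plain lists are assumptions made for this example only. Here max() and min() scan each half in O(n), which is exactly the cost a better data structure should remove.

```python
def median_from_halves(left, right):
    """Median of a dataset split into a lower half `left` and an upper half `right`.

    Assumes every value in `left` is <= every value in `right`, the sizes
    differ by at most one, and at least one half is non-empty.
    """
    if len(left) > len(right):
        # Odd count, extra element on the left: the median is the left maximum.
        return float(max(left))
    if len(right) > len(left):
        # Odd count, extra element on the right: the median is the right minimum.
        return float(min(right))
    # Even count: average the two middle values.
    return (max(left) + min(right)) / 2


print(median_from_halves([1, 2], [3, 4]))  # 2.5
print(median_from_halves([1, 2], [3]))     # 2.0
```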
We can use a Heap (Max-Heap for the left half and Min-Heap for the right half). Instead of dividing the array, we store the elements in these heaps as they arrive in the data stream. But how can you maintain equal halves of elements in these two heaps? How do you implement this?
We initialize a Max-Heap and a Min-Heap. When adding an element, if the element is greater than the minimum element of the Min-Heap, we push it into the Min-Heap; otherwise, we push it into the Max-Heap. If the size difference between the two heaps becomes greater than one, we rebalance them by popping an element from the larger heap and pushing it into the smaller heap. This process ensures that the elements are evenly distributed between the two heaps, allowing us to retrieve the middle element or elements in O(1) time.
Before attempting this problem, you should be comfortable with:
The simplest way to find the median is to keep all numbers in a list and
sort them whenever we need the median. After sorting, the numbers are in
increasing order, making it easy to pick the "middle" value(s).
This approach is slow because sorting happens every time we query the median,
but it is the easiest to understand and implement.
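1. Initialize an empty list data.
2. addNum(x): append x to the list.
3. findMedian():
   * Sort data.
   * Let n = length of data.
   * If n is odd, return data[n // 2].
   * Otherwise, return (data[n // 2] + data[n // 2 - 1]) / 2.

class MedianFinder:
    def __init__(self):
        # Every number seen so far; sorted only when the median is queried.
        self.data = []

    def addNum(self, num: int) -> None:
        self.data.append(num)

    def findMedian(self) -> float:
        # Sort on every query, then read the middle value(s).
        self.data.sort()
        n = len(self.data)
        if n % 2 == 1:
            return self.data[n // 2]
        return (self.data[n // 2] + self.data[n // 2 - 1]) / 2

Time complexity: addNum() takes O(1) amortized time and findMedian() takes O(n log n) time per call, so m function calls take O(m · n log n) time in the worst case. Space complexity: O(n). Here m is the number of function calls and n is the length of the array.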
To efficiently find the median while numbers keep coming, we split the
stream into two halves:
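* A max-heap (small) that stores the smaller half of the numbers.
* A min-heap (large) that stores the larger half of the numbers.
The goal:
* The two heaps differ in size by at most 1.
* All numbers in small are ≤ all numbers in large.
This setup allows:
* Reading the median from the top of the bigger heap when the sizes differ.
* Reading the median as the average of both tops when the sizes are equal.
This gives O(log n) insert and O(1) median lookup.
1. Initialize:
   * small -> max-heap for lower half
   * large -> min-heap for upper half
2. addNum(x):
   * If large is not empty and x is greater than the smallest element in large, push x into large.
   * Otherwise, push x into small.
   * If the heap sizes differ by more than 1, move the top of the bigger heap into the other heap.
3. findMedian():
   * If one heap is bigger, return its top.
   * Otherwise, return the average of the two tops.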
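import heapq


class MedianFinder:
    def __init__(self):
        # small is a max-heap (values stored negated), large is a min-heap.
        # Their sizes never differ by more than one.
        self.small, self.large = [], []

    def addNum(self, num: int) -> None:
        # Values above the smallest element of large belong to the upper half.
        if self.large and num > self.large[0]:
            heapq.heappush(self.large, num)
        else:
            heapq.heappush(self.small, -1 * num)

        # Rebalance if either heap is ahead by more than one element.
        if len(self.small) > len(self.large) + 1:
            val = -1 * heapq.heappop(self.small)
            heapq.heappush(self.large, val)
        if len(self.large) > len(self.small) + 1:
            val = heapq.heappop(self.large)
            heapq.heappush(self.small, -1 * val)

    def findMedian(self) -> float:
        # With an odd count, the bigger heap holds the median at its top.
        if len(self.small) > len(self.large):
            return -1 * self.small[0]
        elif len(self.large) > len(self.small):
            return self.large[0]
        return (-1 * self.small[0] + self.large[0]) / 2.0

Time complexity: O(log n) per addNum() call and O(1) per findMedian() call, so m function calls take O(m log n) time in total. Space complexity: O(n). Here m is the number of function calls and n is the length of the array.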
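One way to gain some confidence in a MedianFinder implementation is to compare it against Python's statistics.median on random input. The helper below is a hypothetical sketch, not part of the original solution: the name agrees_with_statistics and its parameters are assumptions. It only relies on the class exposing addNum and findMedian as described in the problem statement.

```python
import random
import statistics


def agrees_with_statistics(finder_factory, trials=100, max_len=50, seed=0):
    """Return True if the finder matches statistics.median after every insertion.

    `finder_factory` is any zero-argument callable that returns an object
    with addNum(num) and findMedian() methods.
    """
    rng = random.Random(seed)  # fixed seed so a failure can be reproduced
    for _ in range(trials):
        finder = finder_factory()
        seen = []
        for _ in range(rng.randint(1, max_len)):
            # Stay inside the problem's stated range for num.
            num = rng.randint(-100_000, 100_000)
            finder.addNum(num)
            seen.append(num)
            if finder.findMedian() != statistics.median(seen):
                return False
    return True
```

Passing the class itself as the factory, for example agrees_with_statistics(MedianFinder), runs the check against whichever implementation is currently defined.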
The two-heap solution requires a max-heap for the smaller half and a min-heap for the larger half. Swapping these or using two min-heaps produces incorrect medians. In languages like Python where heapq only provides min-heaps, you must negate values to simulate a max-heap for the smaller half.
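The negation trick mentioned above can be sketched in a few lines. The variable names are illustrative only; heapq.heappush and heapq.heappop are the standard library calls.

```python
import heapq

# Store negated values: the smallest stored value is the largest original one.
max_heap = []
for value in [5, 1, 8, 3]:
    heapq.heappush(max_heap, -value)

largest = -max_heap[0]             # peek at the maximum without removing it
popped = -heapq.heappop(max_heap)  # remove the maximum
next_largest = -max_heap[0]        # the new maximum after the pop

print(largest, popped, next_largest)  # 8 8 5
```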
The heaps must differ in size by at most one element. Forgetting to rebalance after insertions leads to incorrect median calculations. After every addNum call, check if one heap has more than one extra element and transfer the top element to the other heap.
When both heaps have equal size, the median is the average of both tops, not just one of them. Returning only the top of one heap or using integer division instead of floating-point division produces wrong results. Always check heap sizes and compute (smallTop + largeTop) / 2.0 for the even case.
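A small illustration of the division pitfall, using the stream [1, 2] from the problem statement. The variable names are chosen for this example only.

```python
small_top, large_top = 1, 2  # tops of the two heaps for the stream [1, 2]

wrong = (small_top + large_top) // 2   # floor division drops the fraction
right = (small_top + large_top) / 2.0  # true division keeps it

print(wrong, right)  # 1 1.5
```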
When computing the average of two heap tops, adding two large integers can cause overflow in languages like Java or C++. Cast to a larger type before adding, or halve each operand separately as a / 2.0 + b / 2.0. The form a + (b - a) / 2.0 also works, but only as long as the difference b - a itself fits in the integer type.
The first element must go into one of the heaps, but subsequent elements must be compared against the appropriate heap top to determine placement. Inserting all elements into one heap first and then rebalancing works, but directly inserting into the wrong heap without proper comparison breaks the invariant that all elements in the small heap are less than or equal to all elements in the large heap.
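The "insert into one heap first, then rebalance" strategy mentioned above might look like the sketch below. This is a variant of the two-heap solution, not the code given earlier, and the class name MedianFinderAlt is hypothetical. Every number passes through small, whose maximum is then handed to large, so the ordering between the heaps holds without an explicit comparison.

```python
import heapq


class MedianFinderAlt:  # hypothetical name, used only for this sketch
    def __init__(self):
        self.small = []  # max-heap of the lower half, stored as negated values
        self.large = []  # min-heap of the upper half

    def addNum(self, num: int) -> None:
        # Always push into small first, then hand its maximum to large.
        # This keeps every value in small <= every value in large.
        heapq.heappush(self.small, -num)
        heapq.heappush(self.large, -heapq.heappop(self.small))
        # Restore the size rule: small holds the extra element on odd counts.
        if len(self.large) > len(self.small):
            heapq.heappush(self.small, -heapq.heappop(self.large))

    def findMedian(self) -> float:
        if len(self.small) > len(self.large):
            return float(-self.small[0])
        return (-self.small[0] + self.large[0]) / 2


finder = MedianFinderAlt()
results = []
for num in [1, 3, 2]:
    finder.addNum(num)
    results.append(finder.findMedian())
print(results)  # [1.0, 2.0, 2.0]
```

The trade-off is that each insertion performs two or three heap operations instead of one or two, in exchange for simpler placement logic.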