Find Median from Data Stream
Master this topic with zero to advance depth.
Find Median from Data Stream
The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.
Implement the MedianFinder class:
MedianFinder()initializes theMedianFinderobject.void addNum(int num)adds the integernumfrom the data stream to the data structure.double findMedian()returns the median of all elements so far. Answers within10^-5of the actual answer will be accepted.
Visual Representation
Stream: [2, 3, 4]
addNum(2) -> [2] (Median = 2.0)
addNum(3) -> [2, 3] (Median = (2+3)/2 = 2.5)
addNum(4) -> [2, 3, 4] (Median = 3.0)
Two Heaps State:
Lower (Max-Heap) | Upper (Min-Heap)
[2] | []
[2] | [3]
[2, 3](pop->[2]) | [3, 4](push 3)
[2, 3] | [4]Examples
Level I: Array and Insertion Sort
Intuition
Maintain a dynamically growing array. Every time a new number is added, keep the array sorted by placing the new number in its correct position. When calculating the median, simply access the middle element(s).
Thought Process
- Store numbers in a standard list or array.
- On
addNum(n):- Find the correct position to insert
nso the collection remains sorted. - Insert
n.
- Find the correct position to insert
- On
findMedian():- If the size is odd, return
list[size / 2]. - If the size is even, return
(list[size / 2 - 1] + list[size / 2]) / 2.0.
- If the size is odd, return
Detailed Dry Run
Calls: add(1), add(2), find(), add(3), find()
- add(1): Array = [1]
- add(2): Insert 2. Array = [1, 2]
- find(): Even size 2. Return (1+2)/2.0 = 1.5
- add(3): Insert 3. Array = [1, 2, 3]
- find(): Odd size 3. Middle is index 1. Return 2.0
Level II: TreeMap of Counts
Intuition
Use a sorted map (TreeMap in Java, SortedDict in Python) to store the frequency of each number. To find the median, we iterate through the map to find the middle element(s) based on the total count.
Detailed Dry Run
Calls: add(1), add(2), find(), add(3), find()
- add(1): map={1:1}, total=1
- add(2): map={1:1, 2:1}, total=2
- find(): total=2. Median is at pos 1, 2. Summing counts: at 1, count=1(pos 0-0), at 2, count=1(pos 1-1). Avg of 1 and 2 is 1.5.
- add(3): map={1:1, 2:1, 3:1}, total=3
- find(): total=3. Median is at pos 1. Summing: pos 1 is val 2. Return 2.0
Level III: Two Heaps (Optimal)
Intuition
To find the median instantly, we only need access to the middle one or two values. We can split the stream into a lower half (stored in a Max-Heap) and an upper half (stored in a Min-Heap). The Max-Heap holds the maximum of the lower values, while the Min-Heap holds the minimum of the upper values.
Thought Process
low: A Max-Heap to store the smaller half of the numbers.high: A Min-Heap to store the larger half of the numbers.- When adding
num:- Always push to
low, then pop the largest fromlowand push it tohigh(to ensure every element inlowis smaller than elements inhigh). - Balance the heaps: If
highhas more elements thanlow, pophighand push tolow.
- Always push to
- When finding median:
- If sizes are unequal, the extra element is in
low, so return the top oflow. - If equal, return
(low.top() + high.top()) / 2.0.
- If sizes are unequal, the extra element is in
Detailed Dry Run
add(1): low pulls 1, pushes to high; high size > low, wait, step-by-step:
- Add 1: Push to low -> low=[1]. Pop low (1) -> Push to high. high=[1]. high size(1) > low size(0) -> Pop high(1) -> push to low. low=[1], high=[].
- Add 2: Push to low -> low=[2,1]. Pop low(2) -> push to high. high=[2], low=[1]. Sizes equal.
- Median: (1 + 2)/2.0 = 1.5
- Add 3: Push to low -> low=[3,1]. Pop low(3) -> push to high. high=[2,3], low=[1]. high size(2) > low size(1) -> Pop high(2) -> push to low. low=[2,1], high=[3].
- Median: low has extra. Return 2.0.
Found an issue or have a suggestion?
Help us improve! Report bugs or suggest new features on our Telegram group.