Home/dsa/Heap / Priority Queue/Find Median from Data Stream

Find Median from Data Stream

Master this topic with zero to advance depth.

Find Median from Data Stream

The median is the middle value in an ordered integer list. If the size of the list is even, there is no middle value, and the median is the mean of the two middle values.

Implement the MedianFinder class:

  • MedianFinder() initializes the MedianFinder object.
  • void addNum(int num) adds the integer num from the data stream to the data structure.
  • double findMedian() returns the median of all elements so far. Answers within 10^-5 of the actual answer will be accepted.

Visual Representation

Stream: [2, 3, 4] addNum(2) -> [2] (Median = 2.0) addNum(3) -> [2, 3] (Median = (2+3)/2 = 2.5) addNum(4) -> [2, 3, 4] (Median = 3.0) Two Heaps State: Lower (Max-Heap) | Upper (Min-Heap) [2] | [] [2] | [3] [2, 3](pop->[2]) | [3, 4](push 3) [2, 3] | [4]
Hard

Examples

Input: ["MedianFinder", "addNum", "addNum", "findMedian", "addNum", "findMedian"] [[], [1], [2], [], [3], []]
Output: [null, null, null, 1.5, null, 2.0]
Approach 1

Level I: Array and Insertion Sort

Intuition

Maintain a dynamically growing array. Every time a new number is added, keep the array sorted by placing the new number in its correct position. When calculating the median, simply access the middle element(s).

Thought Process

  1. Store numbers in a standard list or array.
  2. On addNum(n):
    • Find the correct position to insert n so the collection remains sorted.
    • Insert n.
  3. On findMedian():
    • If the size is odd, return list[size / 2].
    • If the size is even, return (list[size / 2 - 1] + list[size / 2]) / 2.0.
O(N) for `addNum` (to shift elements during insertion), O(1) for `findMedian`💾 O(N) to store the stream

Detailed Dry Run

Calls: add(1), add(2), find(), add(3), find()

  1. add(1): Array = [1]
  2. add(2): Insert 2. Array = [1, 2]
  3. find(): Even size 2. Return (1+2)/2.0 = 1.5
  4. add(3): Insert 3. Array = [1, 2, 3]
  5. find(): Odd size 3. Middle is index 1. Return 2.0
java
import java.util.*;

class MedianFinder {
    List<Integer> list;

    public MedianFinder() {
        list = new ArrayList<>();
    }
    
    public void addNum(int num) {
        // Uses Binary Search to find insertion index, then shifts elements O(N)
        int pos = Collections.binarySearch(list, num);
        if (pos < 0) pos = -(pos + 1);
        list.add(pos, num);
    }
    
    public double findMedian() {
        int n = list.size();
        if (n % 2 == 1) return list.get(n / 2);
        return (list.get(n / 2 - 1) + list.get(n / 2)) / 2.0;
    }
}

public class Main {
    public static void main(String[] args) {
        MedianFinder mf = new MedianFinder();
        mf.addNum(1);
        mf.addNum(2);
        System.out.println(mf.findMedian()); // 1.5
        mf.addNum(3);
        System.out.println(mf.findMedian()); // 2.0
    }
}
Approach 2

Level II: TreeMap of Counts

Intuition

Use a sorted map (TreeMap in Java, SortedDict in Python) to store the frequency of each number. To find the median, we iterate through the map to find the middle element(s) based on the total count.

O(log N) for `addNum`, O(N) for `findMedian` (worst case if we iterate the whole map)💾 O(K) where K is the number of unique elements

Detailed Dry Run

Calls: add(1), add(2), find(), add(3), find()

  1. add(1): map={1:1}, total=1
  2. add(2): map={1:1, 2:1}, total=2
  3. find(): total=2. Median is at pos 1, 2. Summing counts: at 1, count=1(pos 0-0), at 2, count=1(pos 1-1). Avg of 1 and 2 is 1.5.
  4. add(3): map={1:1, 2:1, 3:1}, total=3
  5. find(): total=3. Median is at pos 1. Summing: pos 1 is val 2. Return 2.0
java
import java.util.*;

class MedianFinder {
    TreeMap<Integer, Integer> counts;
    int totalCount;

    public MedianFinder() {
        counts = new TreeMap<>();
        totalCount = 0;
    }
    
    public void addNum(int num) {
        counts.put(num, counts.getOrDefault(num, 0) + 1);
        totalCount++;
    }
    
    public double findMedian() {
        int midLeft = (totalCount + 1) / 2;
        int midRight = totalCount / 2 + 1;
        int currentPos = 0;
        Integer leftVal = null, rightVal = null;

        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            currentPos += entry.getValue();
            if (leftVal == null && currentPos >= midLeft) leftVal = entry.getKey();
            if (rightVal == null && currentPos >= midRight) rightVal = entry.getKey();
            if (leftVal != null && rightVal != null) break;
        }
        return (leftVal + rightVal) / 2.0;
    }
}
Approach 3

Level III: Two Heaps (Optimal)

Intuition

To find the median instantly, we only need access to the middle one or two values. We can split the stream into a lower half (stored in a Max-Heap) and an upper half (stored in a Min-Heap). The Max-Heap holds the maximum of the lower values, while the Min-Heap holds the minimum of the upper values.

Thought Process

  1. low: A Max-Heap to store the smaller half of the numbers.
  2. high: A Min-Heap to store the larger half of the numbers.
  3. When adding num:
    • Always push to low, then pop the largest from low and push it to high (to ensure every element in low is smaller than elements in high).
    • Balance the heaps: If high has more elements than low, pop high and push to low.
  4. When finding median:
    • If sizes are unequal, the extra element is in low, so return the top of low.
    • If equal, return (low.top() + high.top()) / 2.0.
O(log N) for `addNum`, O(1) for `findMedian`💾 O(N) to store the elements in two heaps

Detailed Dry Run

add(1): low pulls 1, pushes to high; high size > low, wait, step-by-step:

  • Add 1: Push to low -> low=[1]. Pop low (1) -> Push to high. high=[1]. high size(1) > low size(0) -> Pop high(1) -> push to low. low=[1], high=[].
  • Add 2: Push to low -> low=[2,1]. Pop low(2) -> push to high. high=[2], low=[1]. Sizes equal.
  • Median: (1 + 2)/2.0 = 1.5
  • Add 3: Push to low -> low=[3,1]. Pop low(3) -> push to high. high=[2,3], low=[1]. high size(2) > low size(1) -> Pop high(2) -> push to low. low=[2,1], high=[3].
  • Median: low has extra. Return 2.0.
java
import java.util.*;

class MedianFinder {
    private PriorityQueue<Integer> low;
    private PriorityQueue<Integer> high;

    public MedianFinder() {
        low = new PriorityQueue<>(Collections.reverseOrder()); // Max-Heap
        high = new PriorityQueue<>(); // Min-Heap
    }
    
    public void addNum(int num) {
        low.offer(num);
        high.offer(low.poll());
        
        if (low.size() < high.size()) {
            low.offer(high.poll());
        }
    }
    
    public double findMedian() {
        if (low.size() > high.size()) return low.peek();
        return (low.peek() + high.peek()) / 2.0;
    }
}

public class Main {
    public static void main(String[] args) {
        MedianFinder mf = new MedianFinder();
        mf.addNum(1);
        mf.addNum(2);
        System.out.println(mf.findMedian()); // 1.5
        mf.addNum(3);
        System.out.println(mf.findMedian()); // 2.0
    }
}