Edit Distance
Master this topic with zero to advance depth.
Edit Distance
Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.
You have the following three operations permitted on a word:
- Insert a character
- Delete a character
- Replace a character
Visual Representation
word1 = "horse", word2 = "ros"
Steps:
1. horse -> rorse (replace 'h' with 'r')
2. rorse -> rose (remove 'r')
3. rose -> ros (remove 'e')
Result: 3
DP Strategy:
dp[i][j] = min distance between word1[0...i] and word2[0...j]Examples
horse -> rorse (replace 'h' with 'r') -> rose (remove 'r') -> ros (remove 'e')
Level I: Dynamic Programming (2D)
Intuition
We build a 2D table where dp[i][j] is the minimum operations to convert word1[0...i] to word2[0...j].
Thought Process
- If
word1[i-1] == word2[j-1], no operation is needed for the current character:dp[i][j] = dp[i-1][j-1]. - If they are different, we take the minimum of three operations:
- Insert:
1 + dp[i][j-1] - Delete:
1 + dp[i-1][j] - Replace:
1 + dp[i-1][j-1]
- Insert:
- Base cases: Converting string to empty string requires
i(orj) deletions.
Detailed Dry Run
word1 = "abc", word2 = "abd"
| i\j | "" | 'a' | 'b' | 'd' |
|---|---|---|---|---|
| "" | 0 | 1 | 2 | 3 |
| 'a' | 1 | 0 (match) | 1 | 2 |
| 'b' | 2 | 1 | 0 (match) | 1 |
| 'c' | 3 | 2 | 1 | 1 (replace 'c' with 'd') |
Level II: Memoization (Top-Down 2D)
Intuition
The recursive brute force for edit distance has overlapping subproblems (i, j). Cache results to avoid recomputation.
Visual
word1='horse', word2='ros'
solve(0,0): 'h'!='r' -> min(solve(1,1), solve(1,0), solve(0,1))
solve(1,1): cached after first call!Detailed Dry Run
word1="ab", word2="a". solve(0,0): 'a'=='a' -> solve(1,1). solve(1,1): 'b'!='', return 1+solve(2,1)=1.
Level III: Space Optimized (1D)
Intuition
To compute dp[i][j], we only need the current row and the previous row. Specifically, dp[i-1][j-1] (diagonal), dp[i-1][j] (up), and dp[i][j-1] (left).
Logic Steps
- Initialize
currarray of sizen+1with base case values0...n. - Iterate through
word1. For eachi:- Cache
dp[i-1][j-1]in aprevvariable. - Update the
curr[0]for the base case (conversions from word1[0...i] to empty string). - Inner loop through
word2to updatecurr[j].
- Cache
- Return
curr[n].
Detailed Dry Run
word1 = "abc", word2 = "abd" Focus on space reduction where we store only 1 row. Diag (prev) cache is critical to keep the value from the 'upper-left' cell before it was overwritten.
Found an issue or have a suggestion?
Help us improve! Report bugs or suggest new features on our Telegram group.