
Spell Checking Problem

Given a string "exponen" that is not in the dictionary, how should a spell checker suggest a nearby string?

What does nearness mean?

Question: Given two strings $x_1 x_2 \ldots x_n$ and $y_1 y_2 \ldots y_m$, what is a distance between them?

Edit Distance: minimum number of edits to transform x into y .


Edit Distance

Edit operations:
delete a letter
add a letter
substitute a letter with another letter
Why is substitute not delete plus add?

In general, different edits can have different costs, and treating substitution as an edit of its own lets it count as a single operation as far as distance is concerned.
Edit Distance Problem

Input: Two strings $x = x_1 x_2 \ldots x_n$ and $y = y_1 y_2 \ldots y_m$ over some fixed alphabet
Goal: Find the edit distance between $x$ and $y$: the minimum number of edits to transform $x$ into $y$

Note: EditDist(x,y) = EditDist(y,x)


Recursive Solution

Letters of x are mapped to letters of y (but some are not).


Case 1: $x_i$ is mapped to $y_j$. Then $x_1 x_2 \ldots x_{i-1}$ is mapped to $y_1 y_2 \ldots y_{j-1}$.
Case 2a: $x_i$ is deleted, and $y_j$ is to the left of the deleted letter. Then $x_1 x_2 \ldots x_{i-1}$ is mapped to $y_1 y_2 \ldots y_j$.
Case 2b: $y_j$ is inserted, and $x_i$ is to the left of the inserted letter. Then $x_1 x_2 \ldots x_i$ is mapped to $y_1 y_2 \ldots y_{j-1}$.

Subproblems involve the edit distance between prefixes of the two strings.
Find the edit distance between prefix $x[1..i]$ of $x$ and prefix $y[1..j]$ of $y$.
EditDist(x,y) is the distance between $x[1..n]$ and $y[1..m]$.
Recursive Solution

$E[i, j]$: edit distance between $x[1..i]$ and $y[1..j]$

Case 1: $x_i$ is mapped to $y_j$.

$$E[i, j] = \mathrm{diff}(x_i, y_j) + E[i-1, j-1]$$

where $\mathrm{diff}(x_i, y_j) = 0$ if $x_i = y_j$, otherwise $\mathrm{diff}(x_i, y_j) = 1$

Case 2a: $x_i$ is deleted, and $y_j$ is to the left of the deleted letter.

$$E[i, j] = 1 + E[i-1, j]$$

Case 2b: $y_j$ is inserted, and $x_i$ is to the left of the inserted letter.

$$E[i, j] = 1 + E[i, j-1]$$
Recursive Solution

$$E[i, j] = \min\{\mathrm{diff}(x_i, y_j) + E[i-1, j-1],\ 1 + E[i-1, j],\ 1 + E[i, j-1]\}$$

Base cases:
$E[i, 0] = i$ for $i \ge 0$
$E[0, j] = j$ for $j \ge 0$
How many subproblems? $O(mn)$
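
The recurrence and base cases above can be sketched directly as a memoized recursion. This is a minimal illustration, not a tuned implementation: the function name `edit_distance_memo` is made up for this example, strings are 0-indexed in Python (so $x_i$ is `x[i - 1]`), and the recursion depth grows with `len(x) + len(y)`, so it is only suitable for short strings.

```python
from functools import lru_cache


def edit_distance_memo(x: str, y: str) -> int:
    """Edit distance between x and y via the recurrence for E[i, j]."""

    @lru_cache(maxsize=None)
    def E(i: int, j: int) -> int:
        # Base cases: one of the two prefixes is empty.
        if i == 0:
            return j
        if j == 0:
            return i
        # diff(x_i, y_j): 0 if the letters match, 1 otherwise.
        diff = 0 if x[i - 1] == y[j - 1] else 1
        return min(
            diff + E(i - 1, j - 1),  # Case 1: x_i mapped to y_j
            1 + E(i - 1, j),         # Case 2a: x_i deleted
            1 + E(i, j - 1),         # Case 2b: y_j inserted
        )

    return E(len(x), len(y))
```

Memoization ensures each of the $O(mn)$ subproblems is solved only once; without it the same prefixes would be recomputed exponentially often.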
Iterative Solution

What is the table? $E$ is a two-dimensional array of size $(n + 1) \times (m + 1)$

How do we order the computation?
To compute $E[i, j]$ we need to have computed $E[i-1, j-1]$, $E[i-1, j]$, $E[i, j-1]$.
for i = 0 to n do
    E[i, 0] = i
for j = 0 to m do
    E[0, j] = j
for i = 1 to n do
    for j = 1 to m do
        E[i, j] = min{diff(x_i, y_j) + E[i-1, j-1], 1 + E[i-1, j], 1 + E[i, j-1]}

Running time: $O(nm)$

Space: $O(nm)$
Can reduce space to $O(n + m)$ if only the distance is needed (but it's not obvious how to actually compute the edits).
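
As a rough sketch of the iterative algorithm and of the space reduction mentioned above, the two functions below fill the table row by row. The names `edit_distance` and `edit_distance_two_rows` are illustrative only. The second version keeps just the previous and current rows, which is enough for the distance but, as noted, does not by itself recover the edits.

```python
def edit_distance(x: str, y: str) -> int:
    """Fill the full (n + 1) x (m + 1) table E and return E[n][m]."""
    n, m = len(x), len(y)
    E = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        E[i][0] = i  # delete all i letters of x[1..i]
    for j in range(m + 1):
        E[0][j] = j  # insert all j letters of y[1..j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diff = 0 if x[i - 1] == y[j - 1] else 1
            E[i][j] = min(diff + E[i - 1][j - 1],
                          1 + E[i - 1][j],
                          1 + E[i][j - 1])
    return E[n][m]


def edit_distance_two_rows(x: str, y: str) -> int:
    """Same recurrence, keeping only rows i - 1 and i of the table."""
    m = len(y)
    prev = list(range(m + 1))  # row 0: E[0][j] = j
    for i in range(1, len(x) + 1):
        curr = [i] + [0] * m   # E[i][0] = i
        for j in range(1, m + 1):
            diff = 0 if x[i - 1] == y[j - 1] else 1
            curr[j] = min(diff + prev[j - 1], 1 + prev[j], 1 + curr[j - 1])
        prev = curr
    return prev[m]
```

Row `i` depends only on row `i - 1` and on earlier entries of row `i`, which is why discarding older rows is safe.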
Where is the DAG?

One node for each $(i, j)$, $0 \le i \le n$, $0 \le j \le m$.

Edges into node $(i, j)$: from $(i-1, j-1)$ of cost $\mathrm{diff}(x_i, y_j)$, from $(i-1, j)$ of cost 1, from $(i, j-1)$ of cost 1.
Find the shortest path from $(0, 0)$ to $(n, m)$.
Binary Search Trees

Given $n$ totally ordered keys $a_1 < a_2 < \ldots < a_n$.

Data structure to store the keys so that one can answer dictionary queries: is $a$ one of the keys?

Binary Search Tree:
a full binary tree $T$
keys stored at the leaves of the tree
leaves in left-to-right order give the sorted order $a_1, a_2, \ldots, a_n$
internal nodes store relevant information to guide the search

Given a key $a$, we can walk down the tree to check whether $a$ is in the tree or not.
Balanced Binary Search Trees

General setting: keys are dynamic with insertions, deletions, etc.

Dynamic search trees: keep the tree balanced so that the height of the tree is $O(\log n)$. Search/insertion/deletion take $O(\log n)$ time.
Static Setting with Statistical Information

Static setting:
keys a1 , a2 , . . . , an known in advance
no insertions or deletions, only search queries
also know the frequencies of search queries: $p_i$ is the probability of querying $a_i$

Problem: design a binary search tree $T$ so as to minimize the average search time

$$\sum_{i=1}^{n} p_i \, s_T(a_i)$$

where $s_T(a_i)$ is the search time for $a_i$ in $T$.

What is $s_T(a_i)$? The depth of $a_i$ in $T$, denoted by $d_T(a_i)$.


Real Problem

Can search for any key $a$

Statistical information: $q_0, p_1, q_1, p_2, q_2, \ldots, p_n, q_n$
$p_i$: probability that $a_i$ is searched for
$q_i$: probability that a number $a$ in the range $(a_i, a_{i+1})$ is searched for

The ideas for the simpler problem can be extended to the real problem above.


Optimal Binary Search Trees: Recursive Solution?

Can we solve the problem recursively?

$S(i, j)$: optimum cost of a binary search tree for $a_i, a_{i+1}, \ldots, a_j$ with probabilities $p_i, p_{i+1}, \ldots, p_j$
Want $S(1, n)$

Recurrence for $S(i, j)$:

$$S(i, j) = \min_{i \le k < j} \left( S(i, k) + S(k + 1, j) + \sum_{\ell=i}^{j} p_\ell \right)$$

Base case: $S(i, i) = p_i$ for $1 \le i \le n$


Iterative Algorithm

$$S(i, j) = \min_{i \le k < j} \left( S(i, k) + S(k + 1, j) + \sum_{\ell=i}^{j} p_\ell \right)$$

Base case: $S(i, i) = p_i$ for $1 \le i \le n$

How many subproblems? $O(n^2)$

Precomputation: $P(i, j) = \sum_{k=i}^{j} p_k$ in $O(n^2)$ time.
Iterative Algorithm

$$S(i, j) = \min_{i \le k < j} \bigl( S(i, k) + S(k + 1, j) + P(i, j) \bigr)$$

Base case: $S(i, i) = p_i$ for $1 \le i \le n$

for i = 1 to n do
    S[i, i] = P[i, i]

for d = 1 to n - 1 do
    for i = 1 to n - d do
        j = i + d
        S[i, j] = min over i <= k < j of (S[i, k] + S[k + 1, j] + P[i, j])

Running time: $O(n^3)$

Space: $O(n^2)$
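
The iterative algorithm above can be sketched as follows. This is an illustrative version under a few assumptions: the function name `optimal_bst_cost` is made up, the weights are passed as a 0-indexed Python list `p` (so $p_i$ is `p[i - 1]`), and $P(i, j)$ is computed from prefix sums rather than stored as a full table. It returns only the optimum cost $S(1, n)$, not the tree.

```python
def optimal_bst_cost(p):
    """Return S(1, n) for weights p[0..n-1], with keys stored at the leaves."""
    n = len(p)
    # prefix[i] = p_1 + ... + p_i, so P(i, j) = prefix[j] - prefix[i - 1].
    prefix = [0] * (n + 1)
    for i in range(1, n + 1):
        prefix[i] = prefix[i - 1] + p[i - 1]

    def P(i, j):
        return prefix[j] - prefix[i - 1]

    # 1-indexed table; only entries with i <= j are used.
    S = [[0] * (n + 2) for _ in range(n + 2)]
    for i in range(1, n + 1):
        S[i][i] = P(i, i)

    # d = j - i: solve shorter intervals before longer ones.
    for d in range(1, n):
        for i in range(1, n - d + 1):
            j = i + d
            S[i][j] = min(S[i][k] + S[k + 1][j] for k in range(i, j)) + P(i, j)
    return S[1][n]
```

Since $P(i, j)$ does not depend on $k$, it is added once outside the `min`, which gives the same value as the recurrence.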
Computing the Table: Alternative 1

for i = 1 to n do
    S[i, i] = P[i, i]

for i = n downto 1 do
    for j = i + 1 to n do
        S[i, j] = min over i <= k < j of (S[i, k] + S[k + 1, j] + P[i, j])
Computing the Table: Alternative 2

for i = 1 to n do
    S[i, i] = P[i, i]

for j = 1 to n do
    for i = j - 1 downto 1 do
        S[i, j] = min over i <= k < j of (S[i, k] + S[k + 1, j] + P[i, j])
Knapsack Problem

Input: $n$ items. Each item $i$ has a positive integer size $s_i$ and a positive integer profit $p_i$. A knapsack of integer capacity $B$.
Goal: Pack a maximum profit subset of items into the knapsack.
Towards a Recursive Solution

Observation
Consider an optimal solution O
Case item $n \in O$: Then $O - \{n\}$ is an optimum solution for items 1 to $n - 1$ in a knapsack of capacity $B - s_n$
Case item $n \notin O$: $O$ is an optimal solution for items 1 to $n - 1$ in a knapsack of capacity $B$

Subproblems depend also on remaining capacity.

OPT (i, C ): optimum profit for items 1 to i in knapsack of size C

Goal: compute OPT (n, B)


Recursive Solution

OPT (i, C ): optimum profit for items 1 to i in knapsack of size C


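$$OPT(i, C) = \max\left\{ OPT(i-1, C),\ \begin{cases} p_i + OPT(i-1, C - s_i) & \text{if } s_i \le C \\ 0 & \text{if } s_i > C \end{cases} \right\}$$

Base cases: $OPT(i, 0) = 0$ for $i = 0$ to $n$, and $OPT(0, C) = 0$ for $C = 0$ to $B$.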

How many subproblems? O(nB)


Iterative Algorithm

for i = 0 to n do
    OPT[i, 0] = 0
for C = 0 to B do
    OPT[0, C] = 0

for i = 1 to n do
    for C = 1 to B do
        if s_i <= C then
            OPT[i, C] = max(OPT[i-1, C], p_i + OPT[i-1, C - s_i])
        else
            OPT[i, C] = OPT[i-1, C]

Output OPT[n, B]

Running time: O(nB)


Space: O(nB)
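
A minimal sketch of the table-filling algorithm for Knapsack follows. The function name `knapsack` and its argument names are illustrative, and item $i$ is stored at index `i - 1` of the 0-indexed lists. Row 0 of the table (no items) is all zeros, which is the base case the loop for $i = 1$ relies on.

```python
def knapsack(sizes, profits, B):
    """Return OPT(n, B): the maximum profit of a subset with total size <= B."""
    n = len(sizes)
    # OPT[i][C]: best profit using items 1..i with capacity C.
    # Row 0 and column 0 stay 0 (no items, or no capacity).
    OPT = [[0] * (B + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s_i, p_i = sizes[i - 1], profits[i - 1]
        for C in range(1, B + 1):
            OPT[i][C] = OPT[i - 1][C]              # item i left out
            if s_i <= C:                           # item i fits
                OPT[i][C] = max(OPT[i][C], p_i + OPT[i - 1][C - s_i])
    return OPT[n][B]
```

For example, with sizes `[1, 3, 4, 5]`, profits `[1, 4, 5, 7]` and capacity 7, the best choice is the items of size 3 and 4, for a profit of 9.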
Knapsack Algorithm and Polynomial time

Input size for Knapsack: $O(n) + \log B + \sum_{i=1}^{n} (\log s_i + \log p_i)$

Running time of dynamic programming algorithm: O(nB)

Not a polynomial time algorithm!

Example: $B = 2^n$ and $s_i, p_i \in [1..2^n]$.

Each number takes at most about $n$ bits, so the input size is $O(n^2)$, while the running time is $\Theta(n 2^n)$.

The algorithm is called a pseudo-polynomial time algorithm because its running time is polynomial if the numbers in the input are of size polynomial in the combinatorial size of the problem.

Knapsack is NP-hard if numbers are not polynomial in n!


Traveling Salesman Problem

Input: A graph $G = (V, E)$ with non-negative edge costs/lengths, $c(e)$ for edge $e$
Goal: Find a tour of minimum cost that visits each node.

No polynomial time algorithm known. Problem is NP-Hard.


An Exponential Time Algorithm

How many different tours are there? $n!$

Stirling's formula: $n! \simeq \sqrt{2\pi n}\,(n/e)^n$, which is $2^{\Theta(n \log n)}$

Can we do better? Can we get a $2^{O(n)}$ time algorithm?


A More General Path Problem

Given $G$ and nodes $v_i, v_j$, find a minimum cost path from $v_i$ to $v_j$ that visits every node exactly once.

Can solve TSP using above. Do you see how?

Let $f(i, j, V)$ be the cost of a minimum cost path from $v_i$ to $v_j$ that visits all nodes of $V$ exactly once.
Can we express this as a recursive solution?

What is the next node in the optimum path from $i$ to $j$? Suppose it is $v_k$. Then what is $f(i, j, V)$?

$$f(i, j, V) = c(v_i, v_k) + f(k, j, V - \{v_i\})$$


A Recursive Solution

$$f(i, j, V) = \min_{k \ne i, j} \bigl( c(v_i, v_k) + f(k, j, V - \{v_i\}) \bigr)$$

Why is $f(k, j, V - \{v_i\})$ a subproblem?

What are the subproblems?
$f(a, b, S)$ for $a = 1, 2, \ldots, n$, $b = 1, 2, \ldots, n$, $S \subseteq V$.

How many subproblems? $O(n^2 2^n)$

Exercise: Show that one can compute TSP using the above dynamic program in $O(n^3 2^n)$ time and $O(n^2 2^n)$ space.
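
One possible way to code the recurrence for $f(a, b, S)$ is sketched below. Several choices here are assumptions made for illustration rather than part of the slides: the function name `tsp_cost`, a complete graph given as an `n x n` cost matrix `c` (a missing edge could be given cost `math.inf`), nodes numbered from 0, the set $S$ encoded as a bitmask, and the base case $f(i, j, \{v_i, v_j\}) = c(v_i, v_j)$. The tour is closed by fixing node 0 as the start and adding the edge back from the last node.

```python
from functools import lru_cache
from math import inf


def tsp_cost(c):
    """Minimum tour cost for cost matrix c, via memoized f(i, j, S)."""
    n = len(c)
    if n <= 1:
        return 0

    @lru_cache(maxsize=None)
    def f(i, j, S):
        # S: bitmask of the nodes the path from i to j must visit
        # (always includes i and j).
        if S == (1 << i) | (1 << j):
            return c[i][j]              # assumed base case: only i and j left
        rest = S & ~(1 << i)            # S - {v_i}
        best = inf
        for k in range(n):
            # Next node k must be in S - {v_i} and must not be the endpoint j.
            if k != j and (rest >> k) & 1:
                best = min(best, c[i][k] + f(k, j, rest))
        return best

    full = (1 << n) - 1
    # A tour is a path from node 0 to some j through all nodes, plus edge j -> 0.
    return min(f(0, j, full) + c[j][0] for j in range(1, n))
```

The memo table is where the memory cost shows up: it can hold one entry per triple $(a, b, S)$, so this sketch is practical only for small $n$.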

Disadvantage of dynamic programming solution: memory!
