For each index i, 1 ≤ i ≤ n, of the string y_1 y_2 ... y_n, we define OPT(i) to be an
optimal segmentation of the prefix y_1 y_2 ... y_i, and we let Q(i) denote the total
quality of OPT(i). For the empty prefix we set Q(0) = 0.
Before we can create our algorithm, we need a recurrence for Q(i).
In other words, we are trying to show that
Q(i) = max_{0 ≤ j < i} { quality(y_{j+1} ... y_i) + Q(j) },
where each term is the total quality obtained by using the segmentation OPT(j)
followed by the final block y_{j+1} y_{j+2} ... y_i.
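To see this, let the final block of OPT(i) be y_{k+1} ... y_i for some k with 0 ≤ k < i.
The blocks before it must have total quality Q(k), since otherwise replacing them
with OPT(k) would give a better segmentation of y_1 ... y_i.
From this, we know that OPT(i) can be taken to consist of OPT(k) and the block
y_{k+1} ... y_i.
Thus Q(i) = Q(k) + quality(y_{k+1} ... y_i).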
More generally, for any j with 0 ≤ j < i, the value quality(y_{j+1} ... y_i) + Q(j) is
obtained by using the segmentation OPT(j) together with the last block of letters
y_{j+1} ... y_i. We know that this value equals Q(i) when j = k. For every other value
of j it is the total quality of some valid segmentation of y_1 ... y_i, so it can be
at most Q(i). Thus, we've shown that
Q(i) = max_{0 ≤ j < i} { quality(y_{j+1} ... y_i) + Q(j) }.
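To make the recurrence concrete, here is a minimal Python sketch that evaluates Q(i) directly by memoized recursion. The word list WORDS and the scoring rule inside quality are hypothetical stand-ins chosen only for illustration; the text treats quality as a given black box. Python slices are 0-indexed, so y[j:i] plays the role of the block y_{j+1} ... y_i.

```python
from functools import lru_cache

# Hypothetical toy quality function (an assumption, not from the text):
# a known word scores its length squared, anything else scores -length.
WORDS = {"meet", "at", "eight", "me", "eat"}

def quality(block: str) -> int:
    return len(block) ** 2 if block in WORDS else -len(block)

def best_quality(y: str) -> int:
    """Return Q(n) for the string y using the recurrence for Q(i)."""

    @lru_cache(maxsize=None)
    def Q(i: int) -> int:
        if i == 0:
            return 0  # the empty prefix has total quality 0
        # Try every start j of the final block y[j:i] and keep the best total.
        return max(Q(j) + quality(y[j:i]) for j in range(i))

    return Q(len(y))
```

With this toy scoring, best_quality("meetateight") evaluates to 45, coming from the blocks "meet", "at" and "eight" (16 + 4 + 25).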
The algorithm that produces the quality of the optimal segmentation, as well as the
optimal segmentation itself, is:
When the algorithm finishes, P[n] is the total quality of the optimal segmentation,
and L records the indices where the string is to be segmented, each indicated by a 1.
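A minimal Python sketch of one procedure consistent with this description of P and L follows; it is not necessarily the exact listing intended here. It assumes that L[i] = 1 means a block ends right after letter y_i. The array back, the helper blocks, and the toy quality function with its word list are hypothetical names introduced only for illustration.

```python
# Hypothetical toy quality function (an assumption, not from the text).
WORDS = {"meet", "at", "eight", "me", "eat"}

def quality(block: str) -> int:
    return len(block) ** 2 if block in WORDS else -len(block)

def segment(y: str):
    """Return (P[n], L) for the string y."""
    n = len(y)
    P = [0] * (n + 1)     # P[i] is meant to hold Q(i); P[0] = 0
    back = [0] * (n + 1)  # back[i]: the j that achieved the maximum for i
    for i in range(1, n + 1):
        best, best_j = P[0] + quality(y[0:i]), 0
        for j in range(1, i):
            candidate = P[j] + quality(y[j:i])
            if candidate > best:
                best, best_j = candidate, j
        P[i], back[i] = best, best_j
    # Walk the back pointers from n to 0, marking where each block ends.
    L = [0] * (n + 1)
    i = n
    while i > 0:
        L[i] = 1
        i = back[i]
    return P[n], L

def blocks(y: str, L: list) -> list:
    """Turn the 0/1 markers in L back into the list of blocks."""
    cuts = [i for i, flag in enumerate(L) if flag == 1]
    return [y[a:b] for a, b in zip([0] + cuts, cuts)]
```

For example, segment("meetateight") returns a quality of 45 under this toy scoring, and passing the resulting L to blocks gives ["meet", "at", "eight"]. Storing one back pointer per index keeps the reconstruction linear in n.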
This algorithm runs in O(n^2) time, since it uses only two nested for loops, each of
which runs at most n iterations. Inside the two for loops, each assignment statement
takes constant time, assuming that one evaluation of quality counts as a single step.
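The iteration count can be checked with a small Python sketch, assuming the inner loop ranges over 0 ≤ j < i as in the recurrence. The function name count_inner_iterations is hypothetical.

```python
def count_inner_iterations(n: int) -> int:
    """Count how many (i, j) pairs the two nested loops visit."""
    count = 0
    for i in range(1, n + 1):
        for j in range(i):  # 0 <= j < i
            count += 1
    return count
```

The count equals 1 + 2 + ... + n = n(n + 1)/2, which is at most n^2, in line with the O(n^2) bound.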
We prove correctness of this algorithm by induction on i. We claim that P[i] = Q(i)
for all i. This suffices, since the algorithm returns P[n], which then equals Q(n),
and Q(n) is by definition the total quality of an optimal segmentation of the whole string.
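As a sanity check on the claim P[i] = Q(i), and not as a substitute for the induction, the following Python sketch compares the table P against a brute-force search over every segmentation of each prefix. The function names and the toy quality function are hypothetical. The brute force tries all 2^(i-1) segmentations of a prefix of length i, so it is only practical for short strings.

```python
from itertools import product

# Hypothetical toy quality function (an assumption, not from the text).
WORDS = {"meet", "at", "eight", "me", "eat"}

def quality(block: str) -> int:
    return len(block) ** 2 if block in WORDS else -len(block)

def dp_table(y: str) -> list:
    """Fill P[0..n] using the recurrence for Q(i)."""
    n = len(y)
    P = [0] * (n + 1)
    for i in range(1, n + 1):
        P[i] = max(P[j] + quality(y[j:i]) for j in range(i))
    return P

def brute_force_Q(prefix: str) -> int:
    """Best total quality over all segmentations of prefix."""
    n = len(prefix)
    if n == 0:
        return 0
    best = None
    # Each of the n - 1 gaps between letters is either a cut (1) or not (0).
    for cuts in product([0, 1], repeat=n - 1):
        bounds = [0] + [k + 1 for k, c in enumerate(cuts) if c] + [n]
        total = sum(quality(prefix[a:b]) for a, b in zip(bounds, bounds[1:]))
        if best is None or total > best:
            best = total
    return best

def table_matches_brute_force(y: str) -> bool:
    P = dp_table(y)
    return all(P[i] == brute_force_Q(y[:i]) for i in range(len(y) + 1))
```

Calling table_matches_brute_force on a few short strings gives a quick empirical confirmation that the table agrees with the definition of Q(i) for every prefix.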