MATCHING
Brute Force, Rabin-Karp, Knuth-Morris-Pratt
What's up?
tetththeheehthtehtheththehehtht
the
1) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
AAAAH 5 comparisons made
2) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
AAAAH 5 comparisons made
3) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
AAAAH 5 comparisons made
4) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
AAAAH 5 comparisons made
5) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
AAAAH 5 comparisons made
....
N) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
AAAAH 5 comparisons made
1) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
AAAAA 5 comparisons made
1) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
OOOOH 1 comparison made
2) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
OOOOH 1 comparison made
3) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
OOOOH 1 comparison made
4) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
OOOOH 1 comparison made
5) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
OOOOH 1 comparison made
...
N) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
OOOOH 1 comparison made
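The comparison counts in the examples above can be reproduced with a small sketch. The function name `brute_force_match` and the exact text length (27 A's followed by H) are assumptions made for illustration; the slides do not give an implementation.

```python
def brute_force_match(text, pattern):
    """Return (index of first match or -1, number of character comparisons)."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for start in range(n - m + 1):
        j = 0
        # Compare left to right and stop at the first mismatch.
        while j < m:
            comparisons += 1
            if text[start + j] != pattern[j]:
                break
            j += 1
        if j == m:
            return start, comparisons
    return -1, comparisons


# Assumed text: a run of A's ending in H, as in the slides.
text = "A" * 27 + "H"
for pattern in ("AAAAH", "AAAAA", "OOOOH"):
    print(pattern, brute_force_match(text, pattern))
```

With AAAAH every alignment costs 5 comparisons, with AAAAA the search ends after the first 5, and with OOOOH every alignment fails after 1 comparison.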
Heavenly
Homemade
Hashish
1) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
AAAAH
37 ≠ 100, 1 comparison made
2) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
AAAAH
37 ≠ 100, 1 comparison made
3) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
AAAAH
37 ≠ 100, 1 comparison made
4) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
AAAAH
37 ≠ 100, 1 comparison made
...
N) AAAAAAAAAAAAAAAAAAAAAAAAAAAH
AAAAH
100 = 100, 6 comparisons made
do
    if (hash_p == hash_t)
        brute force comparison of pattern
        and selected section of text
    hash_t = hash value of next section of
        text, one character over
while (not end of text and
    brute force comparison == false)
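The loop above can be sketched in Python as follows. The polynomial rolling hash and the parameters `base` and `mod` are assumptions chosen for illustration; the slides do not say which hash function is used.

```python
def rabin_karp(text, pattern, base=256, mod=1_000_003):
    """Return the index of the first match of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Weight of the leading character in an m-character window.
    high = pow(base, m - 1, mod)
    hash_p = hash_t = 0
    for k in range(m):
        hash_p = (hash_p * base + ord(pattern[k])) % mod
        hash_t = (hash_t * base + ord(text[k])) % mod
    for start in range(n - m + 1):
        # Only fall back to a character-by-character check when hashes agree.
        if hash_p == hash_t and text[start:start + m] == pattern:
            return start
        if start + m < n:
            # Slide the window one character over: drop text[start],
            # append text[start + m].
            hash_t = (hash_t - ord(text[start]) * high) % mod
            hash_t = (hash_t * base + ord(text[start + m])) % mod
    return -1


print(rabin_karp("tetththeheehthtehtheththehehtht", "the"))
```

Updating the hash in constant time per shift is what keeps the common case at one comparison per alignment.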
Algorithm KMPMatch(T,P)
Input: Strings T (text) with n characters and P
(pattern) with m characters.
Output: Starting index of the first substring of T
matching P, or an indication that P is not a
substring of T.
Algorithm KMPFailureFunction(P):
Input: String P (pattern) with m characters
Output: The failure function f for P, which maps j to
the length of the longest prefix of P that is a suffix
of P[1..j]
i ← 1
j ← 0
f(0) ← 0
while i ≤ m-1 do
    if P[j] = P[i] then
        {we have matched j + 1 characters}
        f(i) ← j + 1
        i ← i + 1
        j ← j + 1
    else if j > 0 then
        {j indexes just after a prefix of P that matches}
        j ← f(j-1)
    else
        {there is no match}
        f(i) ← 0
        i ← i + 1
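A minimal Python sketch of both routines, using 0-based indices. `kmp_failure` follows the pseudocode above; the slides give only the header of KMPMatch, so the body of `kmp_match` is an assumption based on the usual textbook form of the algorithm.

```python
def kmp_failure(pattern):
    """f[i] = length of the longest proper prefix of pattern
    that is also a suffix of pattern[:i + 1]."""
    m = len(pattern)
    f = [0] * m
    i, j = 1, 0
    while i < m:
        if pattern[i] == pattern[j]:
            f[i] = j + 1          # we have matched j + 1 characters
            i += 1
            j += 1
        elif j > 0:
            j = f[j - 1]          # fall back to a shorter matching prefix
        else:
            f[i] = 0              # no match
            i += 1
    return f


def kmp_match(text, pattern):
    """Return the index of the first match of pattern in text, or -1."""
    n, m = len(text), len(pattern)
    if m == 0:
        return 0
    f = kmp_failure(pattern)
    i = j = 0
    while i < n:
        if text[i] == pattern[j]:
            if j == m - 1:
                return i - m + 1
            i += 1
            j += 1
        elif j > 0:
            j = f[j - 1]          # reuse what is already known to match
        else:
            i += 1
    return -1


print(kmp_failure("abacab"))
print(kmp_match("abacaabaccabacabaa", "abacab"))
```

The text index `i` never moves backwards, which is where the linear running time comes from.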
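Text:    a b a c a a b a c c a b a c a b a a
Pattern: a b a c a b
Alignment 1 (pattern starts at text index 0): comparisons 1 to 6, mismatch at comparison 6
Alignment 2 (index 4): comparison 7 only; no comparison needed for the leading a
Alignment 3 (index 5): comparisons 8 to 12
Alignment 4 (index 9): comparison 13
Alignment 5 (index 10): comparisons 14 to 19, match found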
[Figure: diagram with nodes numbered 0 to 6 and edges labeled a, b, and "a,b"]
[Figures: trie with edges labeled a and b and leaves numbered 1 to 5; a compressed version with edge labels such as aaa and bab; a search annotated "search stops here"; the trie before and after insert(bbaabb)]
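Trie (root is node 0; each entry is edge character, then node number):
Depth 1: h (1), o (2), w (3), _ (4), n (5), r (8), i (12), t (14)
Depth 2: w (6), b (7), c (10), _ (13)
Depth 3: n (9), _ (11)
Depth 4: . (15)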
Encoding trie (left edge 0, right edge 1); codewords read from root to leaf:
A = 010
B = 11
C = 00
D = 10
R = 011
encoded text:
01011011010000101001011011010
text:
A B R A C A D A B R A
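The encoding above can be checked with a short sketch. The helper names `encode` and `is_prefix_free` are illustrative and not from the slides; the code table is the one shown for A, B, C, D, and R.

```python
CODE = {"A": "010", "B": "11", "C": "00", "D": "10", "R": "011"}


def encode(text, code):
    """Concatenate the codeword of each character."""
    return "".join(code[ch] for ch in text)


def is_prefix_free(code):
    """True if no codeword is a prefix of another.

    After sorting, a codeword that is a prefix of any other codeword
    is also a prefix of its immediate successor, so checking
    neighbours is enough.
    """
    words = sorted(code.values())
    return not any(b.startswith(a) for a, b in zip(words, words[1:]))


bits = encode("ABRACADABRA", CODE)
print(bits, len(bits), is_prefix_free(CODE))
```

The prefix-free property is what lets the encoded text be written without separators between codewords.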
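Decoding trie (left edge 0, right edge 1); codewords read from root to leaf:
O = 00, R = 10
S = 0100, W = 0101, T = 0110, B = 0111
E = 1100, C = 1101, K = 1110, N = 1111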
1000011111001001100011101111000101010011010100
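Decoding walks the trie from the root, one bit per edge, and emits a character each time a leaf is reached. The sketch below assumes one particular reading of the trie figure (O = 00, R = 10, then S, W, T, B under 01 and E, C, K, N under 11); that layout is reconstructed from the figure and should be treated as an assumption. The names `build_trie` and `decode` are illustrative.

```python
# Assumed reading of the trie figure (left edge 0, right edge 1).
CODE = {
    "O": "00", "R": "10",
    "S": "0100", "W": "0101", "T": "0110", "B": "0111",
    "E": "1100", "C": "1101", "K": "1110", "N": "1111",
}


def build_trie(code):
    """Nested dicts keyed by '0'/'1'; leaves are the decoded characters."""
    root = {}
    for symbol, word in code.items():
        node = root
        for bit in word[:-1]:
            node = node.setdefault(bit, {})
        node[word[-1]] = symbol
    return root


def decode(bits, code):
    root = build_trie(code)
    out, node = [], root
    for bit in bits:
        node = node[bit]
        if isinstance(node, str):   # reached a leaf: emit and restart
            out.append(node)
            node = root
    if node is not root:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(out)


print(decode("1000011111001001100011101111000101010011010100", CODE))
```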
Trie: C = 00, A = 010, R = 011, D = 10, B = 11
ABRACADABRA
01011011010000101001011011010
29 bits
Trie: A = 00, C = 010, D = 011, B = 10, R = 11
ABRACADABRA
001011000100001100101100
24 bits
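Step 1: merge C (1) and D (1) into a node of weight 2. Roots: A (5), B (2), R (2), CD (2)
Step 2: merge B (2) and R (2) into a node of weight 4. Roots: A (5), BR (4), CD (2)
Step 3: merge BR (4) and CD (2) into a node of weight 6. Roots: A (5), BRCD (6)
Step 4: merge A (5) and BRCD (6) into the root of weight 11
Step 5: label each left edge 0 and each right edge 1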
A B R A C A D A B R A
0 100 101 0 110 0 111 0 100 101 0
23 bits
ABRACADABRA
character A B R C D
frequency 5 2 2 1 1
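Step 1: merge C (1) and D (1) into a node of weight 2. Roots: A (5), B (2), R (2), CD (2)
Step 2: merge R (2) and CD (2) into a node of weight 4. Roots: A (5), B (2), RCD (4)
Step 3: merge B (2) and RCD (4) into a node of weight 6. Roots: A (5), BRCD (6)
Step 4: merge A (5) and BRCD (6) into the root of weight 11
Step 5: label each left edge 0 and each right edge 1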
A B R A C A D A B R A
0 10 110 0 1110 0 1111 0 10 110 0
23 bits
Algorithm Huffman(X):
Input: String X of length n
Output: Encoding trie for X
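The slides give only the header of the algorithm. A minimal sketch of the usual construction, repeatedly merging the two lowest-weight trees using Python's `heapq`, is shown below. The function name `huffman_code`, the tuple-based tree representation, and the tie-breaking counter are assumptions for illustration; a different tie-breaking order can produce a different trie with the same total encoded length.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text):
    """Return a dict mapping each character of text to its codeword."""
    if not text:
        return {}
    freq = Counter(text)
    tie = count()  # tie-breaker so the heap never compares trees directly
    heap = [(w, next(tie), ch) for ch, w in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {heap[0][2]: "0"}
    while len(heap) > 1:
        # Merge the two trees with the smallest weights.
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tie), (t1, t2)))
    code = {}

    def walk(tree, prefix):
        if isinstance(tree, str):
            code[tree] = prefix
        else:
            walk(tree[0], prefix + "0")  # left edge
            walk(tree[1], prefix + "1")  # right edge

    walk(heap[0][2], "")
    return code


code = huffman_code("ABRACADABRA")
print(code, sum(len(code[ch]) for ch in "ABRACADABRA"))
```

For ABRACADABRA the total comes to 23 bits, matching both tries built above even though their shapes differ.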
[Figure: binary trie with edges labeled 0 and 1 and leaves labeled 000, 111, 010, 101, 011, 110, 001, 100]