
Procedural Programming

Algorithms for Exact Search in Character Strings
Grigore ALBEANU
http://www.ad-astra.ro/galbeanu/

Grigore Albeanu, Procedural Programming, CY (2010 version)
Contents

• Preliminaries
• Naive search (Brute Force)
• Hashing (the Rabin-Karp algorithm)
• The Knuth-Morris-Pratt algorithm
• Other techniques
• http://www.cs.princeton.edu/courses/archive/spr03/cs226/lectures/string.pdf

Preliminaries
• A string is a finite sequence of characters (c[0], c[1], ..., c[n-1]), where c[i] is a character and n is the length of the sequence.
• When n = 0, the sequence is the empty string.
• The task is to find the position of the first occurrence of the pattern (substring) p in the string s. In direct search, the pattern is slid along the string one position at a time, until it is found or until the number of untested positions in the string is smaller than the length of the pattern.
• Practical applications of character strings: databases (string-typed fields); word processors (Word, PageMaker, etc.); compilers and interpreters (the source text of a program is treated as a string of characters).
Naive Search

BF Method: Example (Step by Step)

BF Method: Example (continued)

BF Method: Example (continued)

http://www.itl.nist.gov/div897/sqg/dads/HTML/bruteForceStringSearch.html

Brute Force Method (C Implementation)

http://www-igm.univ-mlv.fr/~lecroq/string/node3.html
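
A minimal C sketch of direct (brute-force) search as described in the preliminaries: the pattern is tried at each position of the string in turn, stopping when fewer untested characters remain than the pattern needs. It assumes NUL-terminated strings; the function name bf_search and the convention of returning -1 when the pattern is absent are illustrative choices, not taken from the slides.

```c
#include <stddef.h>
#include <string.h>

/* Brute-force search (illustrative sketch).
 * Returns the index of the first occurrence of p in s, or -1 if there is none.
 * An empty pattern is reported at position 0. */
long bf_search(const char *s, const char *p)
{
    size_t n = strlen(s);   /* length of the string  */
    size_t m = strlen(p);   /* length of the pattern */

    /* Stop as soon as fewer than m characters remain untested. */
    for (size_t i = 0; i + m <= n; i++) {
        size_t j = 0;
        /* Compare the pattern with the window s[i..i+m-1], left to right. */
        while (j < m && s[i + j] == p[j])
            j++;
        if (j == m)
            return (long)i;  /* every pattern character matched */
    }
    return -1;
}
```

For example, bf_search("abracadabra", "cad") returns 4. Each of the at most n - m + 1 alignments costs up to m comparisons, which gives the O(mn) worst case.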
Hashing: the Karp-Rabin Algorithm
• The Karp-Rabin algorithm is a string-searching algorithm, created by Michael Rabin and Richard Karp, that uses hashing to find an occurrence of a pattern in the text being searched.
• For a text of length n and a pattern of length m, the best-case and average-case running time is O(n + m), but the worst case is O(mn), which is why the algorithm is not widely used for single-pattern search. Its advantage is that its average-case complexity stays the same regardless of the number of patterns searched for. One practical use is plagiarism detection: with the Karp-Rabin algorithm, many sentences from the source document can be searched for at the same time in the suspect document. Because the number of strings being searched for is large, algorithms tuned for searching a single string are impractical here.
• KARP R.M., RABIN M.O., 1987, Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2):249-260.
http://www-igm.univ-mlv.fr/~lecroq/string/node5.html#SECTION0050

KR: Description

http://users.wpi.edu/~n_abhi/mahim/KarpRabin/analysis_of_karp_rabin_algorithm.htm
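
The hashing idea can be summarized with one common choice of hash function, a polynomial hash. The base $d$ (for instance the alphabet size) and the prime modulus $q$ are assumptions made for illustration; the slides do not fix them. The hash of the length-$m$ window of $s$ starting at position $i$, and the constant-time update that moves the window one position to the right, are:

```latex
h_i = \Bigl( \sum_{k=0}^{m-1} s[i+k] \, d^{\,m-1-k} \Bigr) \bmod q ,
\qquad
h_{i+1} = \bigl( d \, ( h_i - s[i] \, d^{\,m-1} ) + s[i+m] \bigr) \bmod q .
```

The pattern hash is computed once in the same way; a window is compared character by character with the pattern only when the two hash values agree, since different strings may share a hash value.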

KR: Worked Example (1)

KR: Worked Example (2)

KR: Worked Example (3)

KR: C Code

http://www.cse.iitk.ac.in/users/dsrkg/cs210/html/strings.html
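
A minimal C sketch of Karp-Rabin search with a rolling polynomial hash. The base RK_BASE, the prime modulus RK_MOD, the function name rk_search, and the -1 return convention are assumptions made for illustration; this is not the listing from the referenced page.

```c
#include <stddef.h>
#include <string.h>

#define RK_BASE 256u      /* assumed base: number of possible byte values */
#define RK_MOD  1000003u  /* assumed prime modulus, small enough to avoid overflow */

/* Karp-Rabin search (illustrative sketch).
 * Returns the index of the first occurrence of p in s, or -1 if there is none. */
long rk_search(const char *s, const char *p)
{
    size_t n = strlen(s), m = strlen(p);
    if (m == 0) return 0;
    if (m > n)  return -1;

    /* high = RK_BASE^(m-1) mod RK_MOD, the weight of the window's first character. */
    unsigned long high = 1;
    for (size_t k = 1; k < m; k++)
        high = (high * RK_BASE) % RK_MOD;

    /* Hash of the pattern and of the first window of the text. */
    unsigned long hp = 0, hs = 0;
    for (size_t k = 0; k < m; k++) {
        hp = (hp * RK_BASE + (unsigned char)p[k]) % RK_MOD;
        hs = (hs * RK_BASE + (unsigned char)s[k]) % RK_MOD;
    }

    for (size_t i = 0; ; i++) {
        /* Equal hashes may be a collision, so confirm with a direct comparison. */
        if (hp == hs && memcmp(s + i, p, m) == 0)
            return (long)i;
        if (i + m >= n)
            break;  /* no further window fits in the text */
        /* Roll the hash: drop s[i], then append s[i + m]. */
        hs = (hs + RK_MOD - ((unsigned char)s[i] * high) % RK_MOD) % RK_MOD;
        hs = (hs * RK_BASE + (unsigned char)s[i + m]) % RK_MOD;
    }
    return -1;
}
```

For example, rk_search("hello world", "world") returns 6. The direct comparison after a hash match is what produces the O(mn) worst case: if every window collides with the pattern hash, every window is compared in full.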

KR: Supplementary Bibliography
• AHO, A.V., 1990, Algorithms for finding patterns in strings, in Handbook of Theoretical Computer Science, Volume A, Algorithms and complexity, J. van Leeuwen ed., Chapter 5, pp 255-300, Elsevier, Amsterdam.
• CORMEN, T.H., LEISERSON, C.E., RIVEST, R.L., 1990, Introduction to Algorithms, Chapter 34, pp 853-885, MIT Press.
• CROCHEMORE, M., HANCART, C., 1999, Pattern Matching in Strings, in Algorithms and Theory of Computation Handbook, M.J. Atallah ed., Chapter 11, pp 11-1--11-28, CRC Press Inc., Boca Raton, FL.
• GONNET, G.H., BAEZA-YATES, R.A., 1991, Handbook of Algorithms and Data Structures in Pascal and C, 2nd Edition, Chapter 7, pp 251-288, Addison-Wesley Publishing Company.
• HANCART, C., 1993, Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Ph. D. Thesis, University Paris 7, France.
• CROCHEMORE, M., LECROQ, T., 1996, Pattern matching and text compression algorithms, in CRC Computer Science and Engineering Handbook, A. Tucker ed., Chapter 8, pp 162-202, CRC Press Inc., Boca Raton, FL.
• SEDGEWICK, R., 1988, Algorithms, Chapter 19, pp 277-292, Addison-Wesley Publishing Company.
• SEDGEWICK, R., 1988, Algorithms in C, Chapter 19, Addison-Wesley Publishing Company.
• STEPHEN, G.A., 1994, String Searching Algorithms, World Scientific.

The Knuth-Morris-Pratt Algorithm

http://campion.edu.ro/arhiva/www/arhiva_2009/papers/paper11.pdf
KMP: Preliminaries

KMP: Preliminaries (2)

http://turing.cs.pub.ro/sptr_08/SPTR_Lect_1.ppt
KMP: Preliminaries (3)

KMP: Preliminaries (4)

KMP: Preliminaries (5)

KMP: Implementation (Search)

KMP: Preprocessing

http://www-igm.univ-mlv.fr/~lecroq/string/node8.html#SECTION0080
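
The preprocessing step and the search loop can be sketched together in C. This sketch uses the plain border table (often called the prefix function or failure function), where fail[j] is the length of the longest proper prefix of p[0..j] that is also a suffix of it; it is not the optimized kmpNext table of the referenced page. The names kmp_table and kmp_search and the -1 return convention are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Preprocessing: fill fail[0..m-1] for the pattern p of length m >= 1.
 * fail[j] = length of the longest proper prefix of p[0..j] that is also its suffix. */
void kmp_table(const char *p, size_t m, size_t *fail)
{
    size_t k = 0;  /* length of the border currently being extended */
    fail[0] = 0;
    for (size_t j = 1; j < m; j++) {
        while (k > 0 && p[j] != p[k])
            k = fail[k - 1];  /* fall back to the next shorter border */
        if (p[j] == p[k])
            k++;
        fail[j] = k;
    }
}

/* Search (illustrative sketch).
 * Returns the index of the first occurrence of p in s, or -1 if there is none
 * (or if memory for the table cannot be allocated). */
long kmp_search(const char *s, const char *p)
{
    size_t n = strlen(s), m = strlen(p);
    if (m == 0) return 0;
    if (m > n)  return -1;

    size_t *fail = malloc(m * sizeof *fail);
    if (fail == NULL) return -1;
    kmp_table(p, m, fail);

    long pos = -1;
    size_t k = 0;  /* number of pattern characters matched so far */
    for (size_t i = 0; i < n; i++) {
        /* On a mismatch, reuse the longest border instead of re-reading the text. */
        while (k > 0 && s[i] != p[k])
            k = fail[k - 1];
        if (s[i] == p[k])
            k++;
        if (k == m) {  /* full match ending at text position i */
            pos = (long)(i + 1 - m);
            break;
        }
    }
    free(fail);
    return pos;
}
```

For the pattern "ababaca" the table is 0 0 1 2 3 0 1, and kmp_search("aaaab", "aab") returns 2. The text index i never moves backwards, which is why the search phase runs in O(n) after O(m) preprocessing.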
KMP: Bibliography
• Adina Magda Florea, Sisteme de programe pentru timp real, Universitatea Politehnica din Bucuresti, 2007-2008.
• AHO, A.V., 1990, Algorithms for finding patterns in strings, in Handbook of Theoretical Computer Science, Volume A, Algorithms and complexity, J. van Leeuwen ed., Chapter 5, pp 255-300, Elsevier, Amsterdam.
• AOE, J.-I., 1994, Computer algorithms: string pattern matching strategies, IEEE Computer Society Press.
• BAASE, S., VAN GELDER, A., 1999, Computer Algorithms: Introduction to Design and Analysis, 3rd Edition, Chapter 11, pp. ??-??, Addison-Wesley Publishing Company.
• BAEZA-YATES, R., NAVARRO, G., RIBEIRO-NETO, B., 1999, Indexing and Searching, in Modern Information Retrieval, Chapter 8, pp 191-228, Addison-Wesley.
• BEAUQUIER, D., BERSTEL, J., CHRÉTIENNE, P., 1992, Éléments d'algorithmique, Chapter 10, pp 337-377, Masson, Paris.
• CORMEN, T.H., LEISERSON, C.E., RIVEST, R.L., 1990, Introduction to Algorithms, Chapter 34, pp 853-885, MIT Press.
• CROCHEMORE, M., 1997, Off-line serial exact string searching, in Pattern Matching Algorithms, ed. A. Apostolico and Z. Galil, Chapter 1, pp 1-53, Oxford University Press.
• CROCHEMORE, M., HANCART, C., 1999, Pattern Matching in Strings, in Algorithms and Theory of Computation Handbook, M.J. Atallah ed., Chapter 11, pp 11-1--11-28, CRC Press Inc., Boca Raton, FL.
• CROCHEMORE, M., LECROQ, T., 1996, Pattern matching and text compression algorithms, in CRC Computer Science and Engineering Handbook, A. Tucker ed., Chapter 8, pp 162-202, CRC Press Inc., Boca Raton, FL.
• CROCHEMORE, M., RYTTER, W., 1994, Text Algorithms, Oxford University Press.
• GONNET, G.H., BAEZA-YATES, R.A., 1991, Handbook of Algorithms and Data Structures in Pascal and C, 2nd Edition, Chapter 7, pp 251-288, Addison-Wesley Publishing Company.
• GOODRICH, M.T., TAMASSIA, R., 1998, Data Structures and Algorithms in JAVA, Chapter 11, pp 441-467, John Wiley & Sons.
• GUSFIELD, D., 1997, Algorithms on strings, trees, and sequences: Computer Science and Computational Biology, Cambridge University Press.
• HANCART, C., 1992, Une analyse en moyenne de l'algorithme de Morris et Pratt et de ses raffinements, in Théorie des Automates et Applications, Actes des 2e Journées Franco-Belges, D. Krob ed., Rouen, France, 1991, PUR 176, Rouen, France, 99-110.
• HANCART, C., 1993, Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Ph. D. Thesis, University Paris 7, France.
• KNUTH, D.E., MORRIS (Jr), J.H., PRATT, V.R., 1977, Fast pattern matching in strings, SIAM Journal on Computing 6(2):323-350.
• SEDGEWICK, R., 1988, Algorithms, Chapter 19, pp 277-292, Addison-Wesley Publishing Company.
• SEDGEWICK, R., 1988, Algorithms in C, Chapter 19, Addison-Wesley Publishing Company.
• SEDGEWICK, R., FLAJOLET, P., 1996, An Introduction to the Analysis of Algorithms, Chapter 7, Addison-Wesley Publishing Company.
• STEPHEN, G.A., 1994, String Searching Algorithms, World Scientific.
• WATSON, B.W., 1995, Taxonomies and Toolkits of Regular Language Algorithms, Ph. D. Thesis, Eindhoven University of Technology, The Netherlands.
• WIRTH, N., 1986, Algorithms & Data Structures, Chapter 1, pp 17-72, Prentice-Hall.

Other Techniques

http://www-igm.univ-mlv.fr/~lecroq/string/node1.html
