Documente Academic
Documente Profesional
Documente Cultură
5.2 T RIES
‣ R-way tries
‣ ternary search tries
‣ character-based operations
Algorithms F O U R T H E D I T I O N
http://algs4.cs.princeton.edu
Summary of the performance of symbol-table implementations
typical case
ordered operations
implementation
operations on keys
search insert delete
equals()
hash table 1 † 1 † 1 † no
hashCode()
Q. Can we do better?
A. Yes, if we can avoid examining the entire key, as with string sorting.
2
String symbol table basic API
void put(String key, Value val) put key-value pair into the symbol table
hashing
L L L 4N to 16N 0.76 40.6
(linear probing)
http://algs4.cs.princeton.edu
Tries
b s t
y 4 e h h
key value
by 4 a 6 l e 0 o e 5
sea 6
value for she in node
sells 1 l l r corresponding to last
she 0 key character
shells 3 s 1 l e 7
shore 7
the 5 s 3
6
Search in a trie
b s t
y 4 e h h
a 6 l e 0 o e 5
l l r
s 1 l e 7
b s t
y 4 e h h
a 6 l e 0 o e 5
s 1 l e 7
s 3
8
Search in a trie
b s t
y 4 e h h
a 6 l e 0 o e 5
l l r
s 1 l e 7
no value associated
with last key character
s 3
(return null)
9
Search in a trie
b s t
y 4 e h h
a 6 l e 0 o e 5
l l r
s 1 l e 7
no link to t
s 3
(return null)
10
Insertion into a trie
b s t
y 4 e h h
a 6 l e 0 o e 5
l l r
s 1 l e 7
s 3
11
Trie construction demo
trie
Trie construction demo
trie
b s t
y 4 e h h
a 6 l e 0 o e 5
l l r
s 1 l e 7
s 3
Trie representation: Java implementation
l
l each node has
an array of links
s and a value
s 1 1
Trie representation
14
R-way trie: Java implementation
⋮
15
R-way trie: Java implementation (continued)
⋮
public boolean contains(String key)
{ return get(key) != null; }
}
16
Trie performance
Search miss.
・Could have mismatch on first character.
・Typical case: examine only a few characters (sublinear).
Space. R null links at each leaf.
(but sublinear space possible if many short strings share common prefixes)
l
l each node has
an array of links
s and a value
s 1 1
Trie representation
Bottom line. Fast search hit and even faster search miss, but wastes space.
17
Deletion in an R-way trie
b s t
y 4 e h h
a 6 l e 0 o e 5
l l r
s 1 l e 7
b s t
y 4 e h h
a 6 l e 0 o e 5
l l r
s 1 l e 7
hashing
L L L 4N to 16N 0.76 40.6
(linear probing)
out of
R-way trie L log R N L (R+1) N 1.12
memory
R-way trie.
・Method of choice for small R.
・Too much memory for large R.
Challenge. Use less memory, e.g., 65,536-way trie for Unicode!
20
5.2 T RIES
‣ R-way tries
‣ ternary search tries
‣ character-based operations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
Ternary search tries
link to TST for all keys link to TST for all keys
that start with that start with s
a letter before s
s
b t
a b s t a
r y 4 e h u h r y 4 h h
e u
e 12 a 14 l e 10 o r 0 e 8 e 12 14 l e 10 r 0 e 8
a o
s 15 y 13 s 15 y 13
23
Search hit in a TST
get("sea")
b t
y 4 h h
l e 0 e 5
6 a o
l l r
return value associated
with last key character
s 1 l e 7
s 3
24
Search miss in a TST
get("shelter")
b t
y 4 h h
l e 0 e 5
6 a o
l l r
s 1 l e 7
s 3 no link to t
(return null)
25
Ternary search trie construction demo
26
Ternary search trie construction demo
b t
y 4 h h
l e 0 e 5
6 a o
l l r
s 1 l e 7
s 3
27
Search in a TST
r y 4 h h
e u
e 12 l e 10 r e 8
14 a o
l l r e
return value s 11 l e 7 l
associated with
last key character
s 15 y 13
30
TST: Java implementation
31
TST: Java implementation (continued)
⋮
public boolean contains(String key)
{ return get(key) != null; }
32
String symbol table implementation cost summary
hashing
L L L 4 N to 16 N 0.76 40.6
(linear probing)
out of
R-way trie L log R N L (R + 1) N 1.12
memory
Bottom line. TST is as fast as hashing (for string keys), space efficient.
33
TST with R2 branching at root
aa ab ac zy zz
hashing
L L L 4 N to 16 N 0.76 40.6
(linear probing)
out of
R-way trie L log R N L (R + 1) N 1.12
memory
Hashing.
・Need to examine entire key.
・Search hits and misses cost about the same.
・Performance relies on hash function.
・Does not support ordered symbol table operations.
TSTs.
・Works only for strings (or digital keys).
・Only examines just enough key characters.
・Search miss may involve only a few characters.
・Supports ordered symbol table operations (plus others!).
Bottom line. TSTs are:
・Faster than hashing (especially for search misses).
・More flexible than red-black BSTs. [stay tuned]
36
5.2 T RIES
‣ R-way tries
‣ ternary search tries
‣ character-based operations
Algorithms
R OBERT S EDGEWICK | K EVIN W AYNE
http://algs4.cs.princeton.edu
String symbol table API
by 4
sea 6
sells 1
she 0
shells 3
shore 7
the 5
Prefix match. Keys with prefix sh: she, shells, and shore.
void put(String key, Value val) put key-value pair into the symbol table
⋮
Iterable<String> keys() all keys
Remark. Can also add other ordered ST methods, e.g., floor() and rank().
39
Warmup: ordered iteration
41
Prefix matches
42
Prefix matches in an R-way trie
keysWithPrefix("sh");
keysWithPrefix("sh");
key qkey q
sh sh
b b s s t t b b s s t t she she she
shel shel
y 4 ye 4 eh h h h y 4 ye 4 eh h h h shell shell
shellsshells
she shells
she she
a 6 l a 6 el0 o e 0e o5 e 5 a 6 la 6 el0 o e 0e o5 e 5 sho sho
find subtrie
find for
subtrie
all for all shor shor
keys beginning with "sh"
keys beginning l
with "sh" ll r l r l ll rl r shore shore
she shells
she she
sh
s 1 ls 1 e l7 e 7 s 1 ls 1 e l7 e 7collect keys
collect keys
s 3 s 3 s 3 s 3
in that subtrie
in that subtrie
Prefix match
Prefix in
match
a triein a trie
keysWithPrefix("sh");
public Iterable<String> keysWithPrefix(String prefix) key
key queue
q
{ sh
b s t b s t she she
Queue<String> queue = new Queue<String>();
shel
Node x =y get(root,
4 e h 0); y 4
h prefix, e h h shell
shells she shells
collect(x, aprefix, 6 l 0 o e 5
e queue); a 6 l e 0 o e 5 sho
find subtrie for
return all
queue; root of subtrie for all strings shor
keys beginning with "sh" l l r l with
beginning l given
r prefix shore she shells shore
}
s 1 l e 7 s 1 l e 7 collect keys
s 3 s 3
in that subtrie 43
Longest prefix
represented as 32-bit
"128"
binary number for IPv4
"128.112" (instead of string)
"128.112.055"
"128.112.055.15"
"128.112.136" longestPrefixOf("128.112.136.11") = "128.112.136"
longestPrefixOf("128.112.100.16") = "128.112"
"128.112.155.11"
longestPrefixOf("128.166.123.45") = "128"
"128.112.155.13"
"128.222"
"128.222.136"
44
Longest prefix in an R-way trie
"she"
"she""she" "shell"
"shell"
"shell" "shellsort"
"shellsort"
"shellsort" "s
s s s s s s
s s s
e e e h h h e e e h h h
e e e h h h
a a e e
a l2 l
2 2l 0 0e 0 a a e e
a l2 l
2 2l search
0 0e 0search ends
search
ends at at
atends
endend of string
of end
string
of string a a e e
a l2 l
value
value is is null
value
null is null 2 2l 0 0e 0
l l ll l l l l ll l l return returnshe sheshe
return
search
search ends
search
ends at at
atends (last
(last keykey
(last on
onkey path)
path)on path) l l ll l l
s s
1 1s
l 1l l endend of string
of end
string
of string s s
1 1s
l 1l l
value
value is not
isvalue
not nullnullnull
is not s s
1 1s
l 1l l
s s3 3s 3 return
return sheshe
return she s s
3 3s 3
search
search ends
search
ends at at
atends
s s
3 3s 3 null null linklink
link
null
return
return shells
shells
return shells
(last
(last keykey
(last on
onkey path)
path)
on path)
Possibilities
Possibilities forfor
for longestPrefixOf()
Possibilities
Possibilities for
longestPrefixOf()
longestPrefixOf()
longestPrefixOf()
45
Longest prefix in an R-way trie: Java implementation
46
T9 texting
Multi-tap input. Enter a letter by repeatedly pressing a key until the desired
letter appears.
"a much faster and more fun way to enter text"
T9 text input.
・Find all words that correspond to given sequence of numbers.
・Press 0 to see all completion options.
Ex. hello
・Multi-tap: 4 4 3 3 5 5 5 5 5 5 6 6 6
・T9: 4 3 5 5 6
standard no one-way
trie branching
shell
Applications. s s 1 fish 2
・Database search. h
i external
one-way
branching
s
h 2
Suffix tree.
・Patricia trie of suffixes of a string.
・Linear-time construction: beyond this course.
suffix tree for BANANAS
BANANAS A NA S
NA S S NAS
NAS S
Applications.
・Linear-time: longest repeated substring, longest common substring,
longest palindromic substring, substring search, tandem repeats, ….
・Computational biology databases (BLAST, FASTA).
49
String symbol tables summary
Red-black BST.
・Performance guarantee: log N key compares.
・Supports ordered symbol table API.
Hash tables.
・Performance guarantee: constant number of probes.
・Requires good hash function for key type.
Tries. R-way, TST.
・Performance guarantee: log N characters accessed.
・Supports character-based operations.
Bottom line. You can get at anything by examining 50-100 bits (!!!)
50