
CSCE 3110

Data Structures
& Algorithm Analysis

Rada Mihalcea
http://www.cs.unt.edu/~rada/CSCE3110
Trees Applications
Trees: A Review (again? )
General trees
one parent, N children
Binary tree
ISA General tree
+ max 2 children
Binary search tree
ISA Binary tree
+ left subtree < parent < right subtree
AVL tree
ISA Binary search tree
+ | height left subtree – height right subtree | ≤ 1
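The BST and AVL properties above can be checked mechanically. A minimal Python sketch (the `Node` class and function names are illustrative helpers, not part of the course code):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def height(n):
    """Height of a subtree; an empty tree has height -1."""
    return -1 if n is None else 1 + max(height(n.left), height(n.right))

def is_avl(n, lo=float("-inf"), hi=float("inf")):
    """True iff the tree rooted at n is a BST (keys within (lo, hi))
    and every node satisfies the AVL balance condition."""
    if n is None:
        return True
    return (lo < n.key < hi                                 # BST order
            and abs(height(n.left) - height(n.right)) <= 1  # AVL balance
            and is_avl(n.left, lo, n.key)
            and is_avl(n.right, n.key, hi))

balanced = Node(2, Node(1), Node(3))
skewed = Node(1, None, Node(2, None, Node(3)))
print(is_avl(balanced), is_avl(skewed))   # True False
```

Note that the BST check passes (lo, hi) bounds down the tree rather than comparing only parent and child, since every key in the left subtree must be smaller than the parent, not just the left child.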
Trees: A Review (cont’d)
Multi-way search tree
ISA General tree
+ Each node has K keys and K+1 children
+ All keys in child K < key K < all keys in child K+1
2-4 Tree
ISA Multi-way search tree
+ All nodes have at most 3 keys / 4 children
+ All leaves are at the same level
B-Tree
ISA Multi-way search tree
+ All nodes have at least T keys, at most 2T(+1) keys
+ All leaves are at the same level
Tree Applications
Data Compression
Huffman tree

Automatic Learning
Decision trees
Huffman code
Very often used for text compression
Do you know how gzip or winzip works?
Compression methods
ASCII uses codes of equal length for all letters: how many codes?
Today’s alternative to ASCII?
Idea behind Huffman code: use shorter-length codes for letters that are more frequent
Huffman Code
Build a list of letters and frequencies
“have a great day today”
Build a Huffman tree bottom up, by grouping letters with smaller occurrence frequencies
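The bottom-up grouping can be sketched with a min-heap: repeatedly merge the two least-frequent subtrees until one tree remains, then read each letter’s code off its root-to-leaf path (left = 0, right = 1). A Python sketch under those assumptions (function and variable names are illustrative):

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Build a Huffman tree bottom up and return a letter -> code map."""
    freq = Counter(text)
    # Heap entries: (frequency, unique tie-breaker, tree); a tree is
    # either a single letter or a (left, right) pair of subtrees.
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    if count == 1:                       # degenerate one-letter input
        return {heap[0][2]: "0"}
    while len(heap) > 1:                 # merge the two smallest
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (t1, t2)))
        count += 1
    codes = {}
    def walk(tree, prefix):              # read codes off the paths
        if isinstance(tree, str):        # leaf: a single letter
            codes[tree] = prefix
        else:
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
    walk(heap[0][2], "")
    return codes

codes = huffman_codes("have a great day today")
# The most frequent letter ('a', 5 occurrences) gets a shortest code.
```

Ties in frequency can be merged in any order; the resulting trees differ, but all of them give the same total encoded length.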
Huffman Codes
Write the Huffman codes for the strings
“abracadabra”
“Veni Vidi Vici”
Huffman Code
Running time?
Suppose N letters in the input string, with L unique letters
What is the most important factor for obtaining the highest compression?
Compare: [assume a text with a total of 1000 characters]
I. Three different characters, each occurring the same number of times
II. 20 different characters, 19 of them occurring only once, and the 20th occurring the rest of the time
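On running time: counting frequencies is O(N), and building the tree with a heap (L inserts plus L−1 merges) is O(L log L). As for compression, the two scenarios can be compared numerically with the same merge-based length computation (names are illustrative); the check suggests the skewed distribution II compresses better, because the dominant character gets a very short code:

```python
import heapq

def encoded_bits(counts):
    """Huffman-encoded length for the given character counts."""
    heap = list(counts)
    heapq.heapify(heap)
    total = 0
    while len(heap) > 1:
        a, b = heapq.heappop(heap), heapq.heappop(heap)
        total += a + b
        heapq.heappush(heap, a + b)
    return total

case1 = [333, 333, 334]      # I: three characters, ~equal counts
case2 = [1] * 19 + [981]     # II: 19 rare characters, one dominant
print(encoded_bits(case1))   # 1666 bits -> ~1.67 bits/character
print(encoded_bits(case2))   # 1082 bits -> ~1.08 bits/character
```

In case II the dominant character is encoded in a single bit, so 981 of the 1000 characters cost one bit each; in case I every character needs between one and two bits no matter how the tree is drawn.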
One More Application
Heuristic Search
Decision Trees

Given a set of examples, with an associated decision (e.g. good/bad, +/-, pass/fail, caseI/caseII/caseIII, etc.)
Attempt to make a decision (automatically) when a new example is presented
Predict the behavior in new cases!
Data Records
Name            A B C D E F G   Class
1. Jeffrey B.   1 0 1 0 1 0 1     -
2. Paul S.      0 1 1 0 0 0 1     -
3. Daniel C.    0 0 1 0 0 0 0     -
4. Gregory P.   1 0 1 0 1 0 0     -
5. Michael N.   0 0 1 1 0 0 0     -
6. Corinne N.   1 1 1 0 1 0 1     +
7. Mariyam M.   0 1 0 1 0 0 1     +
8. Stephany D.  1 1 1 1 1 1 1     +
9. Mary D.      1 1 1 1 1 1 1     +
10. Jamie F.    1 1 1 0 0 1 1     +
Fields in the Record
A: First name ends in a vowel?
B: Neat handwriting?
C: Middle name listed?
D: Senior?
E: Got extra-extra credit?
F: Google brings up home page?
G: Google brings up reference?
Build a Classification Tree
Internal nodes: features
Leaves: classification

[Figure: a small classification tree whose root tests feature F, with further tests on A and D below; the leaves group records 2,3,7 / 1,4,5,6 / 10 / 8,9. Training-set error: 30%]
Different Search Problem
Given a set of data records with their classifications, pick a decision tree: a search problem!
Challenges:
Scoring function?
Large space of trees
What’s a good tree?
Low error on the given set of records
Small
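One hypothetical way to combine the two criteria (the slides do not fix a formula; the function and the weight `alpha` are assumptions for illustration) is a penalized score:

```python
def tree_score(error_rate, num_nodes, alpha=0.01):
    """Lower is better: trade training error against tree size.
    alpha is an illustrative regularization weight, not from the slides."""
    return error_rate + alpha * num_nodes

small_tree = tree_score(0.30, 5)   # 30% error, 5 nodes
perfect_tree = tree_score(0.00, 9) # 0% error, 9 nodes
# With this weighting, the larger zero-error tree scores better.
```

A larger `alpha` favors smaller trees even at the cost of some training error, which matters because a tree that is perfect on the training records may not generalize to new ones.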
“Perfect” Decision Tree

[Figure: a decision tree that first tests C (middle name listed?), then tests such as E (extra-extra credit?), F (Google home page?), and B (neat handwriting?) on the branches below]
Training set error: 0%
(can we always do this?)
Search For a Classification
Classify new records

New1. Mike M.   1 0 1 1 0 0 1   ?
New2. Jerry K.  0 1 0 1 0 0 0   ?
The very last tree for this class
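Classifying a new record is a root-to-leaf walk: at each internal node, follow the branch selected by that feature’s value. A hedged sketch (the tree literal below is illustrative only, not the exact tree from the slides):

```python
def classify(tree, record):
    """Walk from the root to a leaf; a tree node is either a class
    label string or a (feature, subtree_if_0, subtree_if_1) triple."""
    while not isinstance(tree, str):
        feature, if0, if1 = tree
        tree = if1 if record[feature] else if0
    return tree

# Illustrative tree: test F first; on the F=0 branch, test B.
tree = ("F", ("B", "-", "+"), "+")

new1 = dict(zip("ABCDEFG", [1, 0, 1, 1, 0, 0, 1]))  # Mike M.
new2 = dict(zip("ABCDEFG", [0, 1, 0, 1, 0, 0, 0]))  # Jerry K.
print(classify(tree, new1), classify(tree, new2))   # prints: - +
```

The same walker works for any tree built over the A–G features; only the nested tuple describing the learned tree changes.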
