
Department of Computer Science, The University of Hong Kong

2009–2010 Second semester


CSIS7103 Data Mining
Assignment 2: Model evaluation

Name: KAN Chun Kin


Email: ckkan@cs.hku.hk
University Number: 94093344

1 Leave-one-out cross-validation

a play = yes

b resubstitution error = 1/5 x 100% = 20%

c Let {.} denote a set of data rows, identified by their row numbers (1 to 5) in the assignment sheet:

Model No.  Training set  Testing set  0R model    error_testing
1          {2,3,4,5}     {1}          play = yes  0/1 x 100% = 0%
2          {1,3,4,5}     {2}          play = yes  0/1 x 100% = 0%
3          {1,2,4,5}     {3}          play = yes  1/1 x 100% = 100%
4          {1,2,3,5}     {4}          play = yes  0/1 x 100% = 0%
5          {1,2,3,4}     {5}          play = yes  0/1 x 100% = 0%

error rate of the model built by the 0R algorithm using leave-one-out cross-validation
= (0% + 0% + 100% + 0% + 0%) / 5 = 20%
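
The 0R figures in parts b and c can be checked with a short script. This is only a sketch: the assignment sheet is not reproduced here, so the class labels below are an assumption inferred from the table (row 3 is the only row misclassified by "play = yes"), and the function names are illustrative.

```python
from collections import Counter

# Class labels for rows 1..5 (assumption, inferred from the table above):
# only row 3 is misclassified by "play = yes", so it is taken to be "no".
labels = ["yes", "yes", "no", "yes", "yes"]

def zero_r(train_labels):
    """0R: always predict the most frequent class in the training labels."""
    return Counter(train_labels).most_common(1)[0][0]

def resubstitution_error(labels):
    """Train and test on the same data."""
    prediction = zero_r(labels)
    return sum(y != prediction for y in labels) / len(labels)

def loocv_error(labels):
    """Hold out each row once, train on the rest, and average the errors."""
    errors = 0
    for i, held_out in enumerate(labels):
        train = labels[:i] + labels[i + 1:]
        errors += zero_r(train) != held_out
    return errors / len(labels)

print(f"resubstitution: {resubstitution_error(labels):.0%}")
print(f"leave-one-out:  {loocv_error(labels):.0%}")
```

Under these assumed labels both estimates come out at 20%, matching parts b and c.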

d Three possible models:

(temp               ;1/5 error
  ((=mild) yes)     ;1/4 error
  ((=cool) yes)     ;0/1 error
)

(humidity           ;1/5 error
  ((=high) yes)     ;0/3 error
  ((=normal) yes)   ;1/2 error
)

(humidity           ;1/5 error
  ((=high) yes)     ;0/3 error
  ((=normal) no)    ;1/2 error
)

e resubstitution error = 1/5 x 100% = 20%

f Model 1: Training set {2,3,4,5} Testing set {1}
Model 2: Training set {1,3,4,5} Testing set {2}
Model 4: Training set {1,2,3,5} Testing set {4}

Three possible models for each of these training sets:

(temp               ;1/4 error
  ((=mild) yes)     ;1/3 error
  ((=cool) yes)     ;0/1 error
)
error_testing = 0%

(humidity           ;1/4 error
  ((=high) yes)     ;0/2 error
  ((=normal) yes)   ;1/2 error
)
error_testing = 0%

(humidity           ;1/4 error
  ((=high) yes)     ;0/2 error
  ((=normal) no)    ;1/2 error
)
error_testing = 0%

Model 3: Training set {1,2,4,5} Testing set {3}

Two possible models:

(temp               ;0/4 error
  ((=mild) yes)     ;0/3 error
  ((=cool) yes)     ;0/1 error
)
error_testing = 100%

(humidity           ;0/4 error
  ((=high) yes)     ;0/3 error
  ((=normal) yes)   ;0/1 error
)
error_testing = 100%

Model 5: Training set {1,2,3,4} Testing set {5}

One possible model (the humidity model is selected because it has the lower training error):

(temp               ;1/4 error
  ((=mild) yes)     ;1/4 error
)
error_testing = 100% (due to missing data in the training set for temp = cool)

(humidity           ;0/4 error   (selected)
  ((=high) yes)     ;0/3 error
  ((=normal) no)    ;0/1 error
)
error_testing = 100%

error rate of models built by 1R algorithm using leave-one-out cross-validation
= (0% + 0% + 100% + 0% + 100%) / 5 = 40%
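
The 1R results in parts e and f can be reproduced with a small sketch. The five rows below are a hypothetical reconstruction, not the assignment sheet itself: they are inferred from the per-value error counts in parts d and f. Ties between equally good rules are broken by first occurrence, which is one arbitrary choice among the tied models listed above; for this data the tie-break does not change either error rate.

```python
from collections import Counter, defaultdict

# Hypothetical reconstruction of the five rows (assumption): inferred from the
# error counts in parts d and f. Columns: temp, humidity, play.
rows = [
    ("mild", "high", "yes"),    # row 1
    ("mild", "high", "yes"),    # row 2
    ("mild", "normal", "no"),   # row 3
    ("mild", "high", "yes"),    # row 4
    ("cool", "normal", "yes"),  # row 5
]
ATTRIBUTES = {"temp": 0, "humidity": 1}

def one_r(train):
    """Return (attribute, rule, default) for the attribute with fewest training errors."""
    default = Counter(r[-1] for r in train).most_common(1)[0][0]
    best = None
    for name, col in ATTRIBUTES.items():
        by_value = defaultdict(Counter)
        for r in train:
            by_value[r[col]][r[-1]] += 1
        # Majority class per attribute value; ties go to the class seen first.
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(r[-1] != rule[r[col]] for r in train)
        if best is None or errors < best[0]:
            best = (errors, name, rule)
    return best[1], best[2], default

def predict(model, row):
    name, rule, default = model
    # Fall back to the majority class for attribute values unseen in training.
    return rule.get(row[ATTRIBUTES[name]], default)

def resubstitution_error(rows):
    model = one_r(rows)
    return sum(predict(model, r) != r[-1] for r in rows) / len(rows)

def loocv_error(rows):
    errors = 0
    for i, held_out in enumerate(rows):
        model = one_r(rows[:i] + rows[i + 1:])
        errors += predict(model, held_out) != held_out[-1]
    return errors / len(rows)

print(f"resubstitution: {resubstitution_error(rows):.0%}")
print(f"leave-one-out:  {loocv_error(rows):.0%}")
```

With the assumed rows, the script reports 20% for resubstitution and 40% for leave-one-out, in line with parts e and f.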

g Since the resubstitution error rates of 0R and 1R are the same (20%) but the leave-one-out cross-validation
error rate of 0R (20%) is lower than that of 1R (40%), the 0R model (play = yes) is chosen as the final model.

2 Bootstrapping

a Testing set 1: {1,3}
Testing set 2: {1,2}
Testing set 3: {5}
b error_combined = e^-1 x error_training + (1 - e^-1) x error_testing; the selected model is marked (selected).

Training set  0R model                              error_training     error_testing     error_combined
1             play = yes ; error = 0/5 (selected)   0/5 x 100% = 0%    1/2 x 100% = 50%  31.6%
              play = no  ; error = 5/5
2             play = yes ; error = 2/5 (selected)   2/5 x 100% = 40%   0/2 x 100% = 0%   14.7%
              play = no  ; error = 3/5
3             play = yes ; error = 1/5 (selected)   1/5 x 100% = 20%   0/1 x 100% = 0%   7.4%
              play = no  ; error = 4/5

error rate of the model built by the 0R algorithm using bootstrap
= (31.6% + 14.7% + 7.4%) / 3 = 17.9%
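
The combined-error arithmetic can be verified numerically. The sketch below takes the training and testing error rates straight from the 0R table above; the function name `combined_error` is illustrative.

```python
import math

def combined_error(err_train, err_test):
    """Weighted estimate: e^-1 * training error + (1 - e^-1) * testing error."""
    w = math.exp(-1)  # about 0.368
    return w * err_train + (1 - w) * err_test

# (training error, testing error) for bootstrap samples 1 to 3, from the 0R table.
samples = [(0.0, 0.5), (0.4, 0.0), (0.2, 0.0)]
per_sample = [combined_error(tr, te) for tr, te in samples]
average = sum(per_sample) / len(per_sample)

for k, e in enumerate(per_sample, start=1):
    print(f"sample {k}: {e:.1%}")
print(f"average:  {average:.1%}")
```

Passing the 1R pairs (0.0, 0.5), (0.2, 1.0) and (0.0, 1.0) to the same function gives the corresponding 1R figures.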
c error_combined = e^-1 x error_training + (1 - e^-1) x error_testing; the selected model is marked (selected).

Training set 1 (two possible models):

(temp               ;0/5 error
  ((=mild) yes)     ;0/4 error
  ((=cool) yes)     ;0/1 error
)
error_training = 0/5 x 100% = 0%, error_testing = 1/2 x 100% = 50%, error_combined = 31.6%

(humidity           ;0/5 error
  ((=high) yes)     ;0/4 error
  ((=normal) yes)   ;0/1 error
)
error_training = 0/5 x 100% = 0%, error_testing = 1/2 x 100% = 50%, error_combined = 31.6%

Training set 2 (one model):

(temp               ;1/5 error   (selected)
  ((=mild) no)      ;1/3 error
  ((=cool) yes)     ;0/2 error
)
error_training = 1/5 x 100% = 20%, error_testing = 2/2 x 100% = 100%, error_combined = 70.6%

(humidity           ;2/5 error
  ((=high) yes)     ;0/1 error
  ((=normal) yes)   ;2/4 error
)

(humidity           ;2/5 error
  ((=high) yes)     ;0/1 error
  ((=normal) no)    ;2/4 error
)

Training set 3 (one model):

(temp               ;1/5 error
  ((=mild) yes)     ;1/5 error
)

(humidity           ;0/5 error   (selected)
  ((=high) yes)     ;0/4 error
  ((=normal) no)    ;0/1 error
)
error_training = 0/5 x 100% = 0%, error_testing = 1/1 x 100% = 100%, error_combined = 63.2%

error rate of the model built by the 1R algorithm using bootstrap
= (31.6% + 70.6% + 63.2%) / 3 = 55.1%

d Assumptions: The true error rate is 50% for any prediction rule.
A scheme exists that memorizes the training set, giving a resubstitution accuracy of 100% (a resubstitution error of 0%).
Calculation: (1 - e^-1) x 50% + e^-1 x 0% = 31.6%
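
As a rough numerical illustration of this calculation: the weight e^-1 is the limiting probability that a given instance is never drawn when n instances are sampled with replacement n times, i.e. (1 - 1/n)^n. The sketch below shows that limit for a few arbitrarily chosen values of n and then evaluates the part d scenario (0% resubstitution error, 50% true error).

```python
import math

# Probability that a given instance is never picked in n draws with replacement.
for n in (5, 100, 10_000):
    print(n, round((1 - 1 / n) ** n, 4))

w = math.exp(-1)  # limiting value of (1 - 1/n)^n, about 0.368

# Part d scenario: memorizing scheme (training error 0%) on data whose
# true error rate is 50% for any prediction rule.
estimate = w * 0.0 + (1 - w) * 0.5
print(f"combined estimate: {estimate:.1%}")
```

The combined estimate of 31.6% is well below the assumed true error rate of 50%, which is the point of the example.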
