Carolina Ruiz
ruiz@cs.wpi.edu http://www.cs.wpi.edu/~ruiz
Problem Definition:
Given: a dataset of instances and a target concept.
Find: a model (e.g., a set of association rules, a decision tree, or a neural network) that helps predict the classification of unseen instances.
The model should be stable (i.e., it shouldn't depend too much on the input data used to construct it).
The model should be a good predictor (difficult to achieve when the input dataset is small).
Difficulties:
Two Approaches
Boosting
Bagging
Model Creation:
Create bootstrap replicates of the dataset and fit a model to each one.
Prediction:
Average/vote the predictions of each model.
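The two bagging steps can be sketched in Python as follows. This is a minimal illustration, not the slides' own code. The function names `bootstrap_replicate`, `vote` and `average` are hypothetical. A replicate is assumed to have the same size as the original dataset, and ties in the vote are assumed to go to whichever label appears first.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_replicate(dataset, rng):
    """Model creation: draw len(dataset) instances uniformly *with replacement*."""
    n = len(dataset)
    return [dataset[rng.randrange(n)] for _ in range(n)]


def vote(predictions):
    """Prediction (classification): majority vote over the models' labels.

    Counter.most_common orders equal counts by first appearance,
    so ties go to the label that appears first in `predictions`.
    """
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Prediction (regression): mean of the models' numeric outputs."""
    return mean(predictions)


if __name__ == "__main__":
    rng = random.Random(0)
    data = ["a", "b", "c", "d", "e"]
    # Same size as `data`; typically some instances repeat and others are left out.
    print(bootstrap_replicate(data, rng))
    print(vote(["yes", "no", "yes"]))   # yes
    print(average([2.0, 3.0, 4.0]))     # 3.0
```

Voting suits class labels and averaging suits numeric predictions. Either way, each model has seen a slightly different replicate, and the combined prediction depends less on any single sample of the data.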
Advantages:
Bagging Algorithm
1. Create k bootstrap replicates of the dataset.
2. Fit a model to each of the replicates.
3. Average/vote the predictions of the k models.
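The three steps can be sketched end to end under a toy setting the slides do not specify. The sketch assumes a single numeric feature and class labels 0/1, and uses a hypothetical decision-stump learner `fit_stump` as a stand-in for any model (a decision tree or neural network would slot in the same way). The names `bagging` and `bagged_predict` are also illustrative.

```python
import random
from collections import Counter


def fit_stump(data):
    """Hypothetical base learner: a one-feature decision stump.

    `data` is a list of (x, label) pairs with numeric x and labels 0/1.
    Tries each observed x as a threshold and keeps the rule
    "predict `hi` if x >= threshold else `lo`" with the fewest errors.
    """
    best = None
    for threshold in sorted({x for x, _ in data}):
        for lo, hi in ((0, 1), (1, 0)):
            errors = sum((hi if x >= threshold else lo) != y for x, y in data)
            if best is None or errors < best[0]:
                best = (errors, threshold, lo, hi)
    _, threshold, lo, hi = best
    return lambda x: hi if x >= threshold else lo


def bagging(data, k, fit, seed=0):
    """Steps 1-2: build k bootstrap replicates and fit one model to each."""
    rng = random.Random(seed)
    n = len(data)
    models = []
    for _ in range(k):
        replicate = [data[rng.randrange(n)] for _ in range(n)]  # step 1
        models.append(fit(replicate))                           # step 2
    return models


def bagged_predict(models, x):
    """Step 3: majority vote over the k models' predictions."""
    return Counter(m(x) for m in models).most_common(1)[0][0]
```

For example, `models = bagging([(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1)], k=11, fit=fit_stump)` followed by `bagged_predict(models, 5)` combines eleven stumps by majority vote. Each stump sees a different replicate, so the learned thresholds can vary between models. The vote smooths out that variation.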
Boosting
Construct a sequence of datasets and models such that each dataset in the sequence weights an instance heavily when the previous model misclassified it.
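One way to make a dataset "weight an instance heavily" is resampling. The next dataset is drawn with replacement, and each instance's probability of being drawn is proportional to its weight. The sketch below assumes this resampling variant; a learner that accepts instance weights directly could use the weights instead. The function `next_dataset` and the weight values are hypothetical.

```python
import random
from collections import Counter


def next_dataset(data, weights, rng):
    """Draw len(data) instances with replacement, probability proportional to weight."""
    return rng.choices(data, weights=weights, k=len(data))


if __name__ == "__main__":
    rng = random.Random(42)
    data = ["a", "b", "c", "d"]
    # Suppose the previous model misclassified only "d": give it a larger weight.
    weights = [0.1, 0.1, 0.1, 0.7]
    counts = Counter()
    for _ in range(1000):
        counts.update(next_dataset(data, weights, rng))
    # In expectation 70% of the drawn instances are "d".
    print(counts)
```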
Prediction:
Advantages:
2.1. Fit a model to the current dataset.
2.2. Upweight poorly predicted instances.
2.3. Downweight well-predicted instances.
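The loop in steps 2.1 to 2.3 can be sketched as below. The slides do not give update formulas, so this sketch borrows AdaBoost's (Freund and Schapire). A model with weighted error ε gets vote weight α = ½ ln((1 − ε)/ε). Misclassified instances are multiplied by e^α, correctly classified ones by e^(−α), and the weights are then renormalised. Several details are assumptions rather than anything the slides state: uniform initial weights, ±1 labels, the 1-D stump learner, and the function names `fit_weighted_stump`, `adaboost` and `boosted_predict`.

```python
import math


def fit_weighted_stump(xs, ys, w):
    """Hypothetical base learner: 1-D threshold stump minimising *weighted* error.

    Labels are +1/-1. Returns (predict_fn, weighted_error).
    """
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(xs, ys, w)
                      if (sign if xi >= t else -sign) != yi)
            if best is None or err < best[0]:
                best = (err, t, sign)
    err, t, sign = best
    return (lambda x: sign if x >= t else -sign), err


def adaboost(xs, ys, rounds):
    n = len(xs)
    w = [1.0 / n] * n                 # assumed start: uniform instance weights
    ensemble = []                     # list of (alpha, model)
    for _ in range(rounds):
        h, err = fit_weighted_stump(xs, ys, w)        # 2.1: fit to current weights
        if err == 0:                                  # perfect on training data
            return [(1.0, h)]
        if err >= 0.5:                                # no better than chance: stop
            break
        alpha = 0.5 * math.log((1 - err) / err)       # this model's vote weight
        # 2.2 / 2.3: times e^{+alpha} if misclassified (y*h = -1),
        # times e^{-alpha} if correctly classified (y*h = +1).
        w = [wi * math.exp(-alpha * yi * h(xi)) for xi, yi, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]                  # renormalise to sum to 1
        ensemble.append((alpha, h))
    return ensemble


def boosted_predict(ensemble, x):
    """Prediction: sign of the alpha-weighted vote."""
    score = sum(alpha * h(x) for alpha, h in ensemble)
    return 1 if score >= 0 else -1
```

Consider the toy data `xs = [1, 2, 3, 4, 5, 6]`, `ys = [1, 1, -1, -1, 1, 1]`. No single stump gets below a weighted error of 1/3 under uniform weights. With `adaboost(xs, ys, rounds=3)`, however, the three resulting stumps' weighted vote classifies all six training instances correctly. The gain comes from later stumps concentrating on the instances earlier ones got wrong.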
Boosted naïve Bayes tied for first place in the KDD Cup 1997.
Reference: