
AdaBoost Theory and Application

Ying Qin

AdaBoost, short for Adaptive Boosting, is a machine learning algorithm formulated by Freund and Schapire. AdaBoost is adaptive in the sense that subsequent classifiers are built to focus on the instances misclassified by previous classifiers. In this report, I summarize the basic theory of boosting and AdaBoost and introduce applications of AdaBoost in music information retrieval (MIR).

1 Boosting
There is an old saying that there is strength in numbers, meaning that a group can achieve more than the simple sum of its parts. This is, to some extent, also true for machine learning. Boosting is a supervised machine learning approach that builds a strong classifier from weak ones. Each weak classifier receives an input and returns a positive or negative vote, and the final strong classifier outputs a weighted vote, where the weights depend on the quality of the weak classifiers. In this way, every added weak classifier contributes to and improves the outcome.

The development of boosting algorithms dates back to 1988, when Kearns and Valiant first explored the potential of boosting a weak classifier (one only slightly better than chance) into a strong classifier. Later, in 1990, Schapire showed that a learner, even if rough and moderately inaccurate, could always improve its performance by training two additional classifiers on filtered versions of the input data stream. The first provable polynomial-time boosting algorithm was presented in Schapire's work "The Strength of Weak Learnability". Inspired by Schapire, Freund proposed in 1995 a far more efficient algorithm that combines a large number of hypotheses. However, this algorithm has practical drawbacks, since it assumes that each hypothesis has a fixed error rate. Finally, the AdaBoost algorithm was introduced in 1997 by Freund and Schapire, and it solved many of the practical difficulties of the previous boosting algorithms (de Haan 2010).

2 AdaBoost
AdaBoost needs no prior knowledge of the accuracies of the weak classifiers. Rather, it iteratively applies a learning algorithm to the same training data and adds the resulting weak hypotheses to the final classifier. At each iteration it generates a confidence parameter that changes according to the error of the weak hypothesis. This is the basis of its name: "Ada" is short for adaptive.

2.1 Understanding AdaBoost


Given a binary classification case, the training set will have both typical and rare samples, and we usually have no idea about the importance of individual samples. Thus, we initialize training by giving all samples equal weights. After each weak classifier is trained, its classification error is used to re-weight the data for the next classifier; the aim of the re-weighting is to bring both the correctly and the incorrectly classified samples to a total weight of 50%. Since we assume that the error rate of each weak classifier is always smaller than $1/2$, the re-weighting reduces the weights of correctly classified samples and increases the weights of misclassified ones. In other words, the weak classifier at the first iteration is good on the average training samples, and the classifier at the second iteration is good on the errors of the first classifier. After a number of iterations, the sample weights focus the attention of the weak learner on the hard examples near the boundary between the two classes. During training, once the weak classifier $h_t$ has been obtained, AdaBoost assigns a confidence parameter $\alpha_t$ to it, which is directly related to its error $\epsilon_t$. In this way, classifiers with lower error receive more weight, and this choice decreases the overall error. The strong classifier results as a weighted linear combination of the weak classifiers, whose weights are determined by their individual errors. The algorithm must terminate if $\alpha_t \le 0$, which is equivalent to $\epsilon_t \ge 1/2$.

A step-by-step formulation of the AdaBoost algorithm is:

1. Build the initial distribution $D_1(i) = 1/N$, assuming all $N$ samples are equally important.
2. For $t = 1, \dots, T$ (rounds of boosting):
   - Select the weak classifier $h_t$ with the lowest weighted error $\epsilon_t$ from a group of candidates.
   - Check whether the error is larger than $1/2$ (YES: terminate; NO: go on).
   - Calculate the confidence parameter (the weight of the sub-classifier),
     $$\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t} > 0.$$
   - Re-weight the data samples to give poorly classified samples an increased weight,
     $$D_{t+1}(i) = \frac{D_t(i)\,\exp\!\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t},$$
     where $Z_t$ is the normalization factor.
3. At the end (the $T$-th round), the final strong classifier is
   $$H(x) = \operatorname{sign}\!\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right).$$
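A minimal sketch of this procedure, assuming binary labels in $\{-1, +1\}$ and decision stumps as the weak learners (both choices, and all function names, are illustrative rather than taken from the report):

```python
import numpy as np

def train_stump(X, y, w):
    """Pick the single-feature threshold with the lowest weighted error."""
    n, d = X.shape
    best = (None, None, 1, np.inf)           # (feature, threshold, polarity, error)
    for j in range(d):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = np.sum(w[pred != y])
                if err < best[3]:
                    best = (j, thr, pol, err)
    return best

def adaboost(X, y, T=50):
    n = len(y)
    D = np.full(n, 1.0 / n)                  # step 1: uniform distribution D_1
    stumps, alphas = [], []
    for t in range(T):                       # step 2: rounds of boosting
        j, thr, pol, eps = train_stump(X, y, D)
        if eps >= 0.5:                       # weak learner no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - eps) / (eps + 1e-12))
        pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
        D *= np.exp(-alpha * y * pred)       # increase weight of misclassified samples
        D /= D.sum()                         # divide by Z_t: keep D a distribution
        stumps.append((j, thr, pol))
        alphas.append(alpha)
    return stumps, alphas

def predict(X, stumps, alphas):
    # step 3: sign of the weighted vote of all weak classifiers
    score = np.zeros(len(X))
    for (j, thr, pol), alpha in zip(stumps, alphas):
        score += alpha * np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
    return np.sign(score)
```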

When using AdaBoost, the training data must represent reality, and the total number of samples must be relatively large compared to the number of features. Since AdaBoost is particularly well suited to working with many features, it prefers large databases. It is also important to remember that the weak classifier must be weak enough; otherwise the resulting strong learner can overfit easily. In fact, boosting seems to be especially susceptible to noise in such cases. The most popular choices for the weak classifier are decision trees or decision stumps (decision trees with two leaves) when no a priori knowledge of the problem domain is available.
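As a practical illustration (not part of the original report), an off-the-shelf implementation such as scikit-learn's AdaBoostClassifier already follows this convention and uses a depth-one decision tree, i.e. a decision stump, as its default weak learner:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Synthetic data standing in for a feature-rich dataset (e.g. audio features).
X, y = make_classification(n_samples=2000, n_features=40, n_informative=10, random_state=0)

# Default base estimator is a decision tree of depth 1, i.e. a decision stump.
clf = AdaBoostClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())
```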

2.2 AdaBoost extensions


It is possible to extend the basic AdaBoost algorithm to obtain better performance. Two major extensions are abstention and regularization. As we have seen, in standard AdaBoost the binary weak learner has the form $h_t : X \to \{-1, +1\}$ and is therefore forced to output a decision for every example $x$. This is not always desirable, as the weak learner might not be suited to classify every $x$. The solution to this problem is the abstaining base classifier, which knows when to abstain and has the form $h_t : X \to \{-1, 0, +1\}$. AdaBoost might also overfit in some cases if it is run long enough. The usual way to address this is to validate the number of iterations on a validation set. Regularization instead introduces an edge offset parameter $\theta \ge 0$ into the confidence formula,
$$\alpha_t = \frac{1}{2} \ln \frac{1 - \epsilon_t}{\epsilon_t} - \frac{1}{2} \ln \frac{1 + \theta}{1 - \theta}.$$
This formula shows that the confidence is decreased by a constant term at every iteration, suggesting a mechanism similar to weight decay for reducing the effect of overfitting.
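A small sketch of these two extensions, with all names and parameter choices invented here for illustration (integrating an abstaining learner into the boosting loop would additionally require the Schapire–Singer treatment of the abstained weight):

```python
import numpy as np

def confidence(eps, theta=0.0):
    """Confidence alpha_t; theta in [0, 1) subtracts a constant edge-offset term each round."""
    return 0.5 * np.log((1 - eps) / eps) - 0.5 * np.log((1 + theta) / (1 - theta))

def abstaining_stump(x, feature, low, high):
    """Weak learner over {-1, 0, +1}: abstains (returns 0) in the band where it is unsure."""
    v = x[feature]
    if v < low:
        return -1
    if v > high:
        return +1
    return 0
```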

2.3 Multi-class AdaBoost


The binary AdaBoost is a simple and well understood scenario, but we still need extensions to deal with multi-class problems. AdaBoost.M1, proposed by Freund and Schapire, is the simplest and most straightforward approach. Here the weak learner is a full multi-class algorithm itself, and the AdaBoost algorithm does not need to be modified in any way. However, this method fails if the weak learner cannot achieve at least 50% accuracy on all classes when run on hard problems. In AdaBoost.MH, proposed by Schapire and Singer, the weak learner receives a distribution of weights $w_{i,\ell}$ defined over both the data points $x_i$ and the class labels $\ell$. In general, this weight expresses how hard it is to classify $x_i$ into its correct class (the class $\ell$ for which $y_{i,\ell} = 1$). Schapire and Singer also proposed AdaBoost.MO, which partitions the multi-class problem into a set of binary problems. This method can be implemented by means of an error-correcting output codes (ECOC) decomposition (Casagrande 2005).
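A minimal sketch of the ECOC-style decomposition behind AdaBoost.MO, using scikit-learn's generic wrappers purely as an illustration (the original papers define their own formulations; the dataset and parameters below are arbitrary):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.multiclass import OutputCodeClassifier

X, y = load_iris(return_X_y=True)

# Each column of the random code matrix defines one binary problem,
# and each binary problem is solved by a boosted ensemble of stumps.
ecoc = OutputCodeClassifier(AdaBoostClassifier(n_estimators=100, random_state=0),
                            code_size=2.0, random_state=0)
ecoc.fit(X, y)
print(ecoc.predict(X[:5]))
```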

3 AdaBoost in MIR
AdaBoost has been used in a number of MIR problems in recent years. Dixon et al. presented a method of genre classification based on automatically extracted rhythmic patterns using AdaBoost (Dixon et al. 2004). Casagrande described an approach using multi-class AdaBoost to classify audio files based on extracted features (Casagrande 2005). Bergstra et al. presented an algorithm that predicts musical genre and artist from an audio waveform, using AdaBoost to select from a set of audio features (Bergstra et al. 2006). Eck et al. proposed a method for predicting social tags for music recommendation directly from MP3 files using AdaBoost (Eck et al. 2007). Bertin-Mahieux et al. extended the work of Eck et al. by replacing the AdaBoost batch learning algorithm with FilterBoost, an online version of AdaBoost (Bertin-Mahieux et al. 2008). Overall, AdaBoost has proved to be an effective machine learning algorithm for music classification.

References
1. De Haan, Gerard. Digital Video Post Processing. Eindhoven, 2010.
2. Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer-Verlag New York, Inc., 2006.
3. Bergstra, J., N. Casagrande, D. Erhan, D. Eck, and B. Kégl. "Aggregate Features and AdaBoost for Music Classification." Machine Learning 65, no. 2 (2006): 473-484.
4. Bertin-Mahieux, T., D. Eck, F. Maillet, and P. Lamere. "Autotagger: A Model for Predicting Social Tags from Acoustic Features on Large Music Databases." Journal of New Music Research 37, no. 2 (2008): 115-135.
5. Casagrande, Norman. "Automatic Music Classification Using Boosting Algorithms and Auditory Features." PhD thesis, Department of Computer Science and Operations Research, University of Montreal, 2005.
6. Dixon, S., F. Gouyon, and G. Widmer. "Towards Characterization of Music via Rhythmic Patterns." In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR), 509-516, 2004.
7. Eck, D., P. Lamere, T. Bertin-Mahieux, and S. Green. "Automatic Generation of Social Tags for Music Recommendation." Advances in Neural Information Processing Systems 20 (2007): 1-8.
