Ying Qin

AdaBoost, short for Adaptive Boosting, is a machine learning algorithm formulated by Freund and Schapire. AdaBoost is adaptive in the sense that subsequent classifiers are built to focus on the instances misclassified by previous classifiers. In this report, I summarize the basic theory of boosting and AdaBoost and introduce applications of AdaBoost in music information retrieval (MIR).
1 Boosting
There is an old saying that there is strength in numbers: the result of a group can be greater than the simple sum of its parts. This is also, to some extent, true for machine learning. Boosting is a supervised machine learning approach that builds a strong classifier from weak ones. Each weak classifier receives an input and returns a positive or negative vote, and the final strong classifier outputs a weighted vote in which the weights depend on the quality of the weak classifiers. In this way, every added weak classifier contributes to, or improves, the outcome. The development of boosting algorithms dates back to 1988, when Kearns and Valiant first explored the potential of boosting a weak classifier (one only slightly better than chance) into a strong classifier. Later, in 1990, Schapire showed that a learner, even if rough and moderately inaccurate, could always improve its performance by training two additional classifiers on filtered versions of the input data stream. The first provable polynomial-time boosting algorithm was discussed in Schapire's work "The Strength of Weak Learnability". Inspired by Schapire, Freund proposed in 1995 a far more efficient algorithm that combines a large number of hypotheses. However, this algorithm has practical drawbacks, for it assumes that each hypothesis has a fixed error rate. Finally, the AdaBoost algorithm was introduced in 1997 by Freund and Schapire, and it solved many of the practical difficulties of the previous boosting algorithms (de Haan 2010).
2 AdaBoost
AdaBoost needs no prior knowledge of the accuracies of the weak classifiers. Rather, it iteratively applies a learning algorithm to the same training data and adds the resulting weak classifiers to the final strong classifier. At each iteration, it generates a confidence parameter that changes according to the error of the weak hypothesis. This is the basis of its name: Ada is short for adaptive.
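The behaviour of the confidence parameter can be illustrated with a few lines of Python. The function name `confidence` is illustrative; the formula, α = ½ ln((1 − ε)/ε), is the standard AdaBoost confidence assignment.

```python
import math

def confidence(err):
    """Standard AdaBoost confidence: alpha = 0.5 * ln((1 - err) / err)."""
    return 0.5 * math.log((1.0 - err) / err)

# A classifier barely better than chance earns almost no weight...
print(round(confidence(0.49), 3))   # 0.02
# ...while a very accurate one dominates the vote.
print(round(confidence(0.10), 3))   # 1.099
# At exactly 50% error the weight is zero: the vote is ignored.
print(confidence(0.5))              # 0.0
```

A classifier with error above 1/2 would receive a negative weight, which is why AdaBoost stops (or flips the classifier) in that case.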
When training begins, we usually have no idea about the importance of individual samples, so we simply give them equal weights. After each round, the classification error is calculated and used to re-weight the data for the next classifier; the aim of re-weighting is to bring both the correctly and incorrectly classified portions to a rate of 50%. Since we assume the error rate ε_t is always smaller than 1/2, re-weighting reduces the weights of correctly classified samples and increases the weights of misclassified ones. In other words, the weak classifier at the first iteration is good on the average training samples, and the classifier at the second iteration is good on the errors of the first. After a number of iterations, the sample weights focus the attention of the weak learner on the hard examples near the boundary between the two classes. During training, once a weak classifier h_t has been received, AdaBoost assigns a confidence parameter α_t to it, which is directly related to its error. In this way, we give more weight to classifiers with lower error, and this choice decreases the overall error. The strong classifier results as a weighted linear combination of the weak classifiers, whose weights are determined by their own errors. The algorithm must terminate if α_t ≤ 0, which is equivalent to ε_t ≥ 1/2.

A step-by-step formulation of the AdaBoost algorithm:

1. Build the initial distribution D_1, assuming all N samples are equally important: D_1(i) = 1/N.
2. For t = 1, ..., T (rounds of boosting):
   - Select the weak classifier h_t with the lowest weighted error ε_t from a group of candidates.
   - Check whether ε_t ≥ 1/2 (YES: terminate; NO: go on).
   - Calculate the confidence parameter, the weight of the sub-classifier: α_t = (1/2) ln((1 − ε_t)/ε_t) > 0.
   - Re-weight the data samples to give poorly classified samples an increased weight: D_{t+1}(i) = D_t(i) exp(−α_t y_i h_t(x_i)) / Z_t, where Z_t is the normalization factor.
3. At the end (after the T-th round), the final strong classifier results: H(x) = sign(Σ_t α_t h_t(x)).
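The steps above can be sketched in a minimal from-scratch implementation, using one-feature threshold "stumps" as weak classifiers. Function and variable names here are illustrative, not from the report.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak classifier: vote +1 or -1 based on a single threshold."""
    return polarity if x >= threshold else -polarity

def train_adaboost(xs, ys, rounds):
    """xs: 1-D feature values, ys: labels in {-1, +1}."""
    n = len(xs)
    weights = [1.0 / n] * n          # step 1: uniform distribution D_1
    ensemble = []                    # list of (alpha, threshold, polarity)
    candidates = sorted(set(xs))
    for _ in range(rounds):          # step 2: T rounds of boosting
        # select the stump with the lowest weighted error
        best = None
        for thr in candidates:
            for pol in (+1, -1):
                err = sum(w for x, y, w in zip(xs, ys, weights)
                          if stump_predict(x, thr, pol) != y)
                if best is None or err < best[0]:
                    best = (err, thr, pol)
        err, thr, pol = best
        if err >= 0.5:               # weak learner no better than chance
            break
        err = max(err, 1e-10)        # avoid log(0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)   # confidence parameter
        ensemble.append((alpha, thr, pol))
        # re-weight: increase weight on mistakes, decrease on hits
        weights = [w * math.exp(-alpha * y * stump_predict(x, thr, pol))
                   for x, y, w in zip(xs, ys, weights)]
        z = sum(weights)             # normalization factor Z_t
        weights = [w / z for w in weights]
    return ensemble

def strong_predict(ensemble, x):
    """Step 3: sign of the weighted vote of all weak classifiers."""
    vote = sum(a * stump_predict(x, thr, pol) for a, thr, pol in ensemble)
    return 1 if vote >= 0 else -1
```

On a 1-D dataset whose negative class sits in the middle of the range (which no single stump can separate), three rounds of boosting combine three stumps into a correct interval classifier.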
When using AdaBoost, the training data must represent reality, and the total number of samples must be relatively large compared to the number of features. Since AdaBoost is particularly suitable for working with many features, it benefits from large databases. Besides, it is important to remember that the weak classifier must be weak enough, otherwise the resulting strong learner might overfit easily; boosting seems to be especially susceptible to noise in such cases. The most popular choices for the weak classifier are decision trees or decision stumps (decision trees with two leaves) when no a-priori knowledge is available on the domain of the learning problem.
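For multi-feature data, a decision stump simply searches every (feature, threshold, polarity) combination for the split with the lowest weighted error. A minimal sketch, with illustrative names:

```python
def fit_stump(samples, labels, weights):
    """Pick the (feature, threshold, polarity) with the lowest weighted error.

    samples: list of feature tuples, labels: {-1, +1}, weights: distribution D_t.
    Returns (error, feature_index, threshold, polarity).
    """
    best = None
    n_features = len(samples[0])
    for f in range(n_features):
        # candidate thresholds: the observed values of this feature
        for thr in sorted({s[f] for s in samples}):
            for pol in (+1, -1):
                err = sum(w for s, y, w in zip(samples, labels, weights)
                          if (pol if s[f] >= thr else -pol) != y)
                if best is None or err < best[0]:
                    best = (err, f, thr, pol)
    return best
```

Because a stump makes only one split, it stays weak regardless of how many features the data has, which is exactly the property the paragraph above calls for.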
3 AdaBoost in MIR
AdaBoost has been used in a number of MIR problems in recent years. Dixon et al. presented a method of genre classification based on automatically extracted rhythmic patterns using AdaBoost (Dixon et al. 2004). Casagrande described an approach using multi-class AdaBoost to classify audio files based on extracted features (Casagrande 2005). Bergstra et al. presented an algorithm that predicts musical genre and artist from an audio waveform, using AdaBoost to select from a set of audio features (Bergstra et al. 2006). Eck et al. proposed a method for predicting social tags for music recommendation directly from MP3 files using AdaBoost (Eck et al. 2007). Bertin-Mahieux et al. extended the work of Eck et al. by replacing the AdaBoost batch learning algorithm with FilterBoost, an online version of AdaBoost (Bertin-Mahieux et al. 2008). Overall, AdaBoost has proven to be an effective tool for a range of MIR tasks.
References
1. De Haan, Gerard. Digital Video Post Processing. Eindhoven, 2010.
2. Bishop, Christopher M. Pattern Recognition and Machine Learning. Springer-Verlag New York, Inc., 2006.
3. Bergstra, J., N. Casagrande, D. Erhan, D. Eck, and B. Kégl. "Aggregate Features and AdaBoost for Music Classification." Machine Learning 65, no. 2 (2006): 473-84.
4. Bertin-Mahieux, T., D. Eck, F. Maillet, and P. Lamere. "Autotagger: A Model for Predicting Social Tags from Acoustic Features on Large Music Databases." Journal of New Music Research 37, no. 2 (2008): 115-35.
5. Casagrande, Norman. "Automatic Music Classification Using Boosting Algorithms and Auditory Features." PhD thesis, Department of Computer Science and Operations Research, University of Montreal, 2005.
6. Dixon, S., F. Gouyon, and G. Widmer. "Towards Characterization of Music via Rhythmic Patterns." In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR), 509-16. 2004.
7. Eck, D., P. Lamere, T. Bertin-Mahieux, and S. Green. "Automatic Generation of Social Tags for Music Recommendation." Advances in Neural Information Processing Systems 20 (2007): 1-8.