
IEEE-International Conference on Recent Trends in Information Technology, ICRTIT 2011 MIT, Anna University, Chennai.

June 3-5, 2011

Classifier Ensembles using Boosting with Mixed Learner Models (BMLM)


P. Shunmugapriya (Research Scholar, Department of Computer Science), S. Kanmani (Professor, Department of Information Technology), Shyam Prasath B., Bathala Vikas, Siva Prasad Naidu N., Yuvaraj K. (Final Year IT, Department of Information Technology), Pondicherry Engineering College, Puducherry, India

pshunmugapriya@gmail.com

Abstract - Bagging and Boosting are among the most well-known classifier ensemble methods and have been used in a number of pattern classification applications. In this paper, an alternative approach to classifier ensembles using boosting is proposed. Usually, boosting is used to enhance the performance of a single base classifier. Here, boosting (BMLM) is used to enhance the performance of base classifiers trained separately on the numerical and the categorical features of the same dataset. BMLM is applied to the classification of three UCI (University of California, Irvine) datasets. Diversity between the base learners has also been calculated and holds good for increasing the recognition rates of the classifiers. It is seen that the results of BMLM show up to 3% higher classification accuracy than the base classifier and the single boosted classifier.
Keywords - Classifier, Learner, Classifier Ensemble, Multiple Classifier System, Diversity, Feature Selection, Boosting

I. INTRODUCTION
Classifier Ensemble (CE) is an important and dynamically growing research area of Machine Learning [8]. So far, numerous classifier ensembles have been proposed, experimented with, and shown to be efficient and successful. It is inferred from [1], [3], [4], [5], [11] and [15] that work on classifier ensembles has been carried out in the following directions:
i. Newly proposed ensembles give very good classification accuracies compared to existing ensemble methods.
ii. Application domains that have not been explored using classifier ensembles are explored with the newly proposed ensemble methods.
iii. New classifier ensemble methods are proposed by making modifications to existing methods.
iv. Classifier combinations are proposed considering diversity and feature selection.

An ensemble of three classifiers has been proposed for the prediction of three different tasks in the KDD Cup 2009 [2]. Three different ensemble methods were experimented with in that work: a maximum entropy model that converts three binary classification tasks into a joint multi-class classification, heterogeneous AdaBoost that handles numerical and categorical features separately, and a selective naïve Bayesian model that automatically groups categorical features and discretizes numerical features [2]. All these ensemble models performed well and assured higher accuracy for the prediction tasks.
In this paper, the concept of the heterogeneous model [2] is considered and, based on it, an alternative ensemble method named Boosting with Mixed Learner Models (BMLM) is proposed and implemented. The feature set is split into two subsets: numerical and categorical. Classifiers are trained separately with the feature subsets. Boosting is then used to boost the two trained classifier models instead of boosting a single learner, and the results of the two boosted models are combined. Diversity between the base classifiers is also measured.
This paper is organized as follows. Section II describes the background and related work on classifier ensembles; classifier ensembles, the decision tree classifier, boosting, diversity and the heterogeneous model of the KDD Cup method are explained briefly in this section. Section III explains the proposed BMLM method. Section IV introduces the datasets used for the study. The experimental details of BMLM on the three datasets are given in Section V. Section VI concludes the study.
II. BACKGROUND AND RELATED WORKS
A. Classifier Ensemble
In pattern classification, when individual classifiers do not hold good for a particular application, the results of the classifiers are combined to form a Classifier Ensemble (CE) or Multiple Classifier System (MCS) for better classification. A classifier ensemble is a learning paradigm in which several classifiers are combined to increase the generalization ability of a single classifier [3]. Ensembles are usually formed by either a classifier selection method or a classifier fusion method.



In classifier selection, the final prediction for any given instance is the decision given by the classifier that is closest to the given instance in feature space. In classifier fusion, the final prediction for any given instance is always the combined prediction of all of the classifiers. A number of combination methods have been proposed so far to form ensembles [6]: Majority Voting, Naïve Bayes combination, Dempster-Shafer, Decision Templates, Behavioral Knowledge Space and Wernecke's method. Apart from these combination rules, techniques such as bagging, boosting, stacking, Rotation Forest and Random Forest, which enhance the performance of a classification algorithm, are also available, and new CE methods are proposed by making modifications to the existing methods.
B. Decision Tree Classifier
Decision tree induction is one of the most popular classification methods [10]. Decision trees try to find an optimal partitioning of the space of possible observations, mainly by means of successive recursive splits. This induction process is mostly done in a top-down manner. The classifier returns a crisp label that represents the predicted class of the given instance.
C. Boosting
Boosting is a technique for creating an ensemble by boosting the performance of a single classifier. A randomly chosen subset of the training set is given to the base learner or classifier, and the trained classifier is named C1. To form the second classifier C2, the base classifier is trained with a training set such that half of the set consists of instances correctly classified by C1 and the other half of instances incorrectly classified by C1. When the base classifier is trained with a training set consisting of instances on which C1 and C2 disagree, C3 results. The ensemble is formed by majority voting of the three classifiers C1, C2 and C3 [3]. The main reason for the success of bagging and boosting is that they build a diverse set of classifiers: in both bagging and boosting, different learning models are formed from the base learner by considering different subsets of the training data [4], [14]. To measure the diversity between classifiers, there are two kinds of measures: pairwise measures and non-pairwise measures. Pairwise measures are computed by considering two classifiers at a time, while non-pairwise measures are computed for any number of classifiers at the same time [5]. In this work, boosting has been used for creating the diverse classifiers and pairwise measures have been used for measuring the diversity between them.
D. Diversity of Classifiers
Diversity has been recognized as a very important characteristic in classifier combination [4]. The diversity among a combination of classifiers is defined as follows: if one classifier makes some errors, then for combination we look for classifiers which make errors on different objects [1]. This means that classifiers with the same performance are not preferred as components of an ensemble, because when classifiers of the same type are combined there will be no significant improvement in the predictions. When diverse classifiers are combined, if one classifier makes an error on a particular instance, it will be taken care of by another, diverse classifier in the ensemble. Diverse classifiers can be obtained in a number of ways, such as bagging, boosting and resampling the original training set.
E. Heterogeneous Model of Ensemble
The heterogeneous model of the KDD Cup is a combined model of a heterogeneous base learner and AdaBoost [2]. The heterogeneous base learner is a combination of numerical and categorical base learners, with the decision tree as the base learner. The AdaBoost model uses the base learner to form a more accurate predictive model by calling the learner repeatedly on different distributions over the training instances. In each iteration, the model searches over all features and selects only one feature (numerical or categorical) to grow a (numerical or categorical) tree. During this, the numerical learner tries to identify a threshold on the numerical features, while the nodes in the categorical tree are branched according to whether a certain category is matched or not. A missing value can also be treated as a category. The numerical or categorical feature selection is done using the wrapper method [2].
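To make the pairwise diversity measures mentioned in Sections II.C and II.D concrete, the following is a minimal Java sketch, illustrative only and not from the paper's implementation, of the disagreement measure defined in [4]: given the 0/1 correctness of two classifiers on the same test instances, it is the fraction of instances on which exactly one of the two classifiers is correct, so higher values indicate more diverse classifiers. The class and method names are hypothetical.

public final class DisagreementMeasure {

    // Pairwise disagreement: (N01 + N10) / N, where N01 and N10 count the
    // instances on which exactly one of the two classifiers is correct [4].
    public static double disagreement(boolean[] correct1, boolean[] correct2) {
        if (correct1.length != correct2.length) {
            throw new IllegalArgumentException("Prediction vectors must have equal length");
        }
        int n01 = 0, n10 = 0;
        for (int i = 0; i < correct1.length; i++) {
            if (correct1[i] && !correct2[i]) n10++;
            if (!correct1[i] && correct2[i]) n01++;
        }
        return (n01 + n10) / (double) correct1.length;
    }

    public static void main(String[] args) {
        // Toy example: correctness of two classifiers on five test instances.
        boolean[] c1 = {true, true, false, true, false};
        boolean[] c2 = {true, false, true, true, false};
        System.out.println("Disagreement = " + disagreement(c1, c2)); // prints 0.4
    }
}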
The work Boosting with Mixed Learner Models carried out in this paper is based on the ideas of the heterogeneous model of the KDD Cup. The concept of separate numerical and categorical learners is taken from the heterogeneous ensemble model. The idea of treating a missing feature as a separate category is also taken from the heterogeneous ensemble model [2] and has been implemented in BMLM.
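Before turning to the proposed method, which boosts two such learners, the single-classifier boosting scheme summarised in Section II.C (train C1 on a random subset, C2 on a half-correct/half-incorrect set with respect to C1, C3 on the instances where C1 and C2 disagree, then take their majority vote) can be sketched as follows. The Learner and Sample types are hypothetical placeholders for illustration; they are not part of the paper's WEKA/Java implementation, and a real implementation would also guard against empty subsets.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

interface Learner {
    Learner train(List<Sample> data);   // returns a trained copy of the base learner
    int predict(Sample s);              // predicted class label
}

class Sample {
    double[] features;
    int label;
}

class BoostingByFiltering {

    static Learner[] boost(Learner base, List<Sample> train) {
        List<Sample> shuffled = new ArrayList<>(train);
        Collections.shuffle(shuffled);

        // C1: base learner trained on a random subset of the training data.
        Learner c1 = base.train(shuffled.subList(0, shuffled.size() / 2));

        // C2: trained on a set that C1 classifies half correctly and half incorrectly.
        List<Sample> correct = new ArrayList<>(), wrong = new ArrayList<>();
        for (Sample s : train) {
            if (c1.predict(s) == s.label) correct.add(s); else wrong.add(s);
        }
        int half = Math.min(correct.size(), wrong.size());
        List<Sample> mixed = new ArrayList<>(correct.subList(0, half));
        mixed.addAll(wrong.subList(0, half));
        Learner c2 = base.train(mixed);

        // C3: trained on the instances where C1 and C2 disagree.
        List<Sample> disagreement = new ArrayList<>();
        for (Sample s : train) {
            if (c1.predict(s) != c2.predict(s)) disagreement.add(s);
        }
        Learner c3 = base.train(disagreement);
        return new Learner[] { c1, c2, c3 };
    }

    // Majority vote of the three members; ties fall back to C1's prediction.
    static int vote(Learner[] members, Sample s) {
        int p2 = members[1].predict(s), p3 = members[2].predict(s);
        if (p2 == p3) return p2;
        return members[0].predict(s);
    }
}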

III. BOOSTING WITH MIXED LEARNER MODELS (BMLM)
On the basis of the heterogeneous model, Boosting with Mixed Learner Models is proposed in this paper. The base classifier used is the decision tree. The features of any given dataset are split into two subsets, numerical features and categorical features, as shown in Fig. 1. When the decision tree is applied to the two feature subsets, two learner models called the numerical tree and the categorical tree are formed. Boosting is then applied to both classifier models. As a result, three classifier models for the numerical tree and three classifier models for the categorical tree are obtained. Any instance that is to be classified is given to both boosted models of the numerical and categorical decision trees. The results from these models are ensembled to give a final classification label to the test instance. The methods used for ensembling are Maximum (MAX), Minimum (MIN), Product (PRO), Averaging (AVG) or Naïve Bayes combination.
The difference between the heterogeneous model of the KDD Cup and the proposed method is the following: in the heterogeneous model, when the learner is iteratively called to train on different distributions, it selects a numerical or categorical feature to grow the numerical or categorical tree, respectively. In BMLM, boosting of both the numerical and the categorical learner is done, so boosted models are available for both the numerical and the categorical features. The final classification label for any given test instance is obtained by the combination of both of these models.
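For concreteness, the fixed combination rules MAX, MIN, PRO and AVG named above can be sketched as below, applied to the class-probability distributions produced by the two boosted models; the final label is the class with the largest combined support. This is an illustrative sketch with hypothetical method names, not the paper's code.

public final class CombinationRules {

    // Combines two class-probability vectors with the chosen fixed rule.
    static double[] combine(double[] p1, double[] p2, String rule) {
        double[] out = new double[p1.length];
        for (int c = 0; c < p1.length; c++) {
            switch (rule) {
                case "MAX": out[c] = Math.max(p1[c], p2[c]); break;
                case "MIN": out[c] = Math.min(p1[c], p2[c]); break;
                case "PRO": out[c] = p1[c] * p2[c];          break;
                case "AVG": out[c] = (p1[c] + p2[c]) / 2.0;  break;
                default: throw new IllegalArgumentException("Unknown rule: " + rule);
            }
        }
        return out;
    }

    // Index of the class with the largest combined support.
    static int argMax(double[] support) {
        int best = 0;
        for (int c = 1; c < support.length; c++) {
            if (support[c] > support[best]) best = c;
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy distributions from the numerical and categorical boosted models.
        double[] numerical   = {0.6, 0.3, 0.1};
        double[] categorical = {0.2, 0.5, 0.3};
        for (String rule : new String[] {"MAX", "MIN", "PRO", "AVG"}) {
            System.out.println(rule + " -> class " + argMax(combine(numerical, categorical, rule)));
        }
    }
}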




The result of each boosted model (numerical and categorical) is obtained by majority voting among its member classifiers. The final class label for any particular instance is then obtained by combining (BMLM) the results from the two boosted models. The methods used for combination are the statistical rules MAX, MIN, PRO and AVG, and the Naïve Bayesian ensemble. Diversity between the boosted models is also measured using pairwise diversity measures, and the diversity is found to be good enough for combining the two boosted models.
IV. DATASETS USED IN THIS STUDY
The datasets considered for this work are Wine, Heart-C and Haberman Survival; all have been taken from the UCI Machine Learning Repository [9]. The features of the datasets are given in Table I. The Heart-C dataset has 303 instances and 14 features including the class label; it is a 2-class dataset consisting of 6 numerical and 7 categorical features. The Haberman Survival dataset has 307 instances and 4 features including the class label; this is also a 2-class dataset consisting entirely of numerical features, and it is used to predict the survival of a patient after major surgery. The Wine dataset has 178 instances and 14 attributes including the class label; it comes from the chemical analysis of 13 constituents in wines obtained from three cultivars, and it is a multi-class dataset with 3 classes for identifying the quality of wine. None of these datasets have missing features.
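As an illustration of how such a numerical/categorical split can be inspected, a short sketch assuming the WEKA Java API (the tool reported later in Section V) is given below. The ARFF file names are placeholders, and the attribute counts in the WEKA-distributed ARFF versions of these datasets may differ slightly from Table I.

import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class DatasetSummary {
    public static void main(String[] args) throws Exception {
        // Placeholder file names for the three UCI datasets used in this study.
        for (String file : new String[] {"heart-c.arff", "haberman.arff", "wine.arff"}) {
            Instances data = DataSource.read(file);
            data.setClassIndex(data.numAttributes() - 1);
            int numeric = 0, nominal = 0;
            for (int i = 0; i < data.numAttributes(); i++) {
                if (i == data.classIndex()) continue;
                if (data.attribute(i).isNumeric()) numeric++; else nominal++;
            }
            System.out.println(file + ": " + data.numInstances() + " instances, "
                    + numeric + " numerical and " + nominal + " categorical features");
        }
    }
}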
TABLE I DETAILS OF UCI DATASETS USED FOR ENSEMBLE OF CLASSIFIERS

Dataset              No. of Classes   Training Set   Test Set   Features   Missing Features
Cleveland Heart      2                203            100        14         Nil
Haberman Survival    2                146            32         4          Nil
Wine                 3                147            33         13         Nil

Fig. 1 Boosting with Mixed Learner Models (BMLM): the feature set is split into numerical and categorical features; each subset is given to the base classifier (decision tree), forming a numerical base learner and a categorical base learner; each base learner is boosted, and the ensemble of learners produces the final decision.

V. EXPERIMENTAL RESULTS AND DISCUSSION
The modules of the newly proposed BMLM method are shown in Fig. 1. First, the tree classifier is trained with the numerical features and the categorical features separately to form two different learners. Then, boosting is used to enhance the performance of the two classifiers. The reason for selecting boosting is that it actively promotes diversity, and it has been crowned as the best off-the-shelf classifier by Leo Breiman, the creator of bagging [10]. After boosting has been applied, three trained models result from each of the numerical and categorical trees.

The BMLM method is applied to the three UCI datasets Heart-C, Haberman and Wine. The results for the individual learners as well as the boosted models are shown in Table II. The three datasets are selected such that they contain numerical and categorical features and do not have missing features at all. It can be seen from Table II that the classification accuracies obtained through the BMLM method are higher than those of the individual tree classifiers and the boosted models. All the classifications are implemented using the WEKA tool [12], and the classifier ensembles and BMLM are implemented in Java. Fig. 2 shows the flow of execution of the BMLM method: initially, the datasets to be classified are presented; when a dataset is selected, the features are split into numerical and categorical and are presented to the classifier (decision tree).
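As an illustration only, and not the authors' code, the following sketch assumes the WEKA 3.x Java API mentioned above and shows one possible way to train boosted decision trees on the numerical and categorical attribute subsets and to combine their class distributions by averaging (the AVG rule). The file name, the number of boosting iterations and the helper names are assumptions.

import java.util.ArrayList;
import java.util.List;
import weka.classifiers.meta.AdaBoostM1;
import weka.classifiers.meta.FilteredClassifier;
import weka.classifiers.trees.J48;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.Remove;

public class BmlmSketch {

    // Builds a boosted decision tree restricted to the given attribute indices
    // (the class index must be included in the kept indices).
    static FilteredClassifier boostedTreeOn(int[] keep, Instances train) throws Exception {
        Remove remove = new Remove();
        remove.setAttributeIndicesArray(keep);   // 0-based indices of the attributes to act on
        remove.setInvertSelection(true);         // invert: keep these attributes, drop the rest
        AdaBoostM1 boost = new AdaBoostM1();
        boost.setClassifier(new J48());          // decision tree as the base learner
        boost.setNumIterations(10);              // number of boosting rounds (assumed value)
        FilteredClassifier model = new FilteredClassifier();
        model.setFilter(remove);
        model.setClassifier(boost);
        model.buildClassifier(train);
        return model;
    }

    static int[] toArray(List<Integer> xs) {
        int[] a = new int[xs.size()];
        for (int i = 0; i < xs.size(); i++) a[i] = xs.get(i);
        return a;
    }

    public static void main(String[] args) throws Exception {
        // "heart-c.arff" is a placeholder file name, not taken from the paper.
        Instances data = DataSource.read("heart-c.arff");
        data.setClassIndex(data.numAttributes() - 1);

        // Split attribute indices into numerical and categorical, keeping the class in both.
        List<Integer> num = new ArrayList<>(), cat = new ArrayList<>();
        for (int i = 0; i < data.numAttributes(); i++) {
            if (i == data.classIndex()) continue;
            (data.attribute(i).isNumeric() ? num : cat).add(i);
        }
        num.add(data.classIndex());
        cat.add(data.classIndex());

        FilteredClassifier numericalModel = boostedTreeOn(toArray(num), data);
        FilteredClassifier categoricalModel = boostedTreeOn(toArray(cat), data);

        // Combine the two boosted models on one instance by averaging (AVG) their
        // class probability distributions; MAX, MIN and PRO would be analogous.
        Instance test = data.instance(0);
        double[] p1 = numericalModel.distributionForInstance(test);
        double[] p2 = categoricalModel.distributionForInstance(test);
        int best = 0;
        for (int c = 1; c < p1.length; c++) {
            if (p1[c] + p2[c] > p1[best] + p2[best]) best = c;
        }
        System.out.println("Predicted class: " + data.classAttribute().value(best));
    }
}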

TABLE II CLASSIFICATION RESULTS FOR THE THREE DATASETS USING BMLM METHOD

Classifier / Ensemble (accuracy %)   Heart-Cleveland   Haberman Survival   Wine
Numerical Learner                    78.55             76                  89
Categorical Learner                  82                75.5                91.74
Boosted Numerical Learner            81.34             76.33               92.35
Boosted Categorical Learner          81.21             76.12               92.11
BMLM (MAX)                           82.8              76.22               95.9
BMLM (MIN)                           82.8              76.22               95.9
BMLM (PRO)                           82.87             76.1                95.42
BMLM (AVG)                           82.81             76.71               95.42
BMLM (NB)                            81.67             75.9                93.13


When boosting is selected, the performances of both the numerical and categorical trees are boosted. The classification result of BMLM is then obtained by selecting one of the ensemble methods among MAX, MIN, AVG, PRO and NB. Fig. 3 shows the recognition rates (classification accuracies) for the three datasets Heart-C, Haberman and Wine given by the numerical learners, the categorical learners and the two boosted models. From the graph, it can be seen that boosting improves the recognition rates for the Wine and Haberman datasets. For the Heart dataset, the performance is also improved by boosting, but the categorical learner has a higher recognition rate than the boosted models.

Fig. 3 Graph Showing the Recognition Rates for the Three UCI Datasets by Individual Classifiers

When the BMLM method is applied to the datasets, the recognition rates increase significantly for all three UCI datasets irrespective of the ensemble method chosen, as shown in Fig. 4. As expected from theory [4], MAX and MIN give the same recognition rates for all three datasets.

Fig. 2 Snapshot Showing the Execution of Classifier Ensemble

Fig. 4 Graph Showing the Recognition Rates for the Three UCI Datasets by the BMLM Method




From Fig. 3 and Fig. 4, it can be seen that the recognition rates for all three datasets are higher with the BMLM method than with the individual boosted learners. The classification accuracies given by the individual learners, the boosted models and BMLM (with all the ensemble methods MAX, MIN, AVG, PRO and NB) are represented together in Fig. 5, from which the predictions and performance of the learners and BMLM can easily be compared.

Fig. 5 Graph Showing the Comparison of Predictions for the Three UCI Datasets by Individual Learners, Boosted Learners and BMLM

From Fig. 5, it can be observed that:
i. For the Haberman dataset, BMLM with the AVG method gives the highest recognition rate compared to the individual learners.
ii. For the Heart-C and Wine datasets, BMLM with MAX and MIN gives the highest classification accuracies compared to the individual learners and to BMLM with the other ensemble methods.
iii. For all three datasets, BMLM (MAX, MIN, PRO, AVG and NB) gives higher recognition rates than the individual learners.

From Table II, Fig. 3, Fig. 4 and Fig. 5 it can be seen that the proposed BMLM method gives higher recognition rates than the individual classifiers. From the classifications carried out, it is inferred that:
i. Good classification results are obtained when the feature set is split into numerical and categorical features.
ii. When boosting is applied to the two different classifiers and an ensemble is then formed, the accuracy improves significantly.

VI. CONCLUSION
In this paper, a new classifier ensemble method, BMLM, has been proposed and implemented. BMLM applies boosting to two diverse learners instead of a single classifier. The effectiveness of this method can be seen from the results obtained. Diversity plays an important role in combining the two boosted models after boosting has been applied, and pairwise measures have been used for measuring this diversity. The proposed BMLM method has given classification accuracies up to 3% higher than those of the individual classifiers and the individual boosted models.

REFERENCES
[1] L. I. Kuncheva, Combining Pattern Classifiers: Methods and Algorithms, Wiley-Interscience, 2005.
[2] Hung-Yi Lo, Kai-Wei Chang, Shang-Tse Chen, Tsung-Hsien Chiang, Chun-Sung Ferng, Cho-Jui Hsieh, Yi-Kuang Ko, Tsung-Ting Kuo, Hung-Che Lai, Ken-Yi Lin, Chia-Hsuan Wang, Hsiang-Fu Yu, Chih-Jen Lin, Hsuan-Tien Lin and Shou-De Lin, "An Ensemble of Three Classifiers for KDD Cup 2009: Expanded Linear Model, Heterogeneous Boosting, and Selective Naive Bayes," JMLR: Workshop and Conference Proceedings of KDD Cup, 57-64, 2009.
[3] R. Polikar, "Ensemble based systems in decision making," IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21-45, 2006.
[4] L. I. Kuncheva and C. J. Whitaker, "Measures of Diversity in Classifier Ensembles and Their Relationship with the Ensemble Accuracy," Machine Learning, 51, 181-207, 2003.
[5] C. A. Shipp and L. I. Kuncheva, "Relationships between combination methods and measures of diversity in combining classifiers," International Journal of Information Fusion, vol. 3, no. 2, 135-148, 2002.
[6] Y. Freund and R. E. Schapire, "Experiments with a new boosting algorithm," Proceedings of the Thirteenth International Conference on Machine Learning, 148-156, 1996.
[7] R. Polikar, J. DePasquale, H. Syed Mohammed, G. Brown and L. I. Kuncheva, "Learn++.MF: A random subspace approach for the missing feature problem," Pattern Recognition, 2010.
[8] R. E. Schapire, "The strength of weak learnability," Machine Learning, 5, 197-227, 1990.
[9] A. Frank and A. Asuncion, UCI Machine Learning Repository [Online]. Available: http://archive.ics.uci.edu/ml. Irvine, CA: University of California, School of Information and Computer Science, 2010.
[10] L. I. Kuncheva, "Diversity in multiple classifier systems," Information Fusion, vol. 6, pp. 3-4, 2005.
[11] G. Brown, "Some thoughts at the interface of Ensemble Methods and Feature Selection," International Workshop on Multiple Classifier Systems, 2010.
[12] WEKA: A Java Machine Learning Package. [Online]. Available: http://www.cs.waikato.ac.nz/ml/weka/.
[13] R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, John Wiley & Sons, 2nd edition, 2001.
[14] Hongbo Shi and Xiaoyong Lv, "The Naive Bayesian Classifier Learning Algorithm based on Adaboost and Parameter Expectations," Third International Joint Conference on Computational Science and Optimization, 377-381, 2010.
[15] Jin-Mao Wei, Shu-Qin Wang and Xiao-Jie Yuan, "Ensemble Rough Hypercuboid Approach for Classifying Cancers," IEEE Transactions on Knowledge and Data Engineering, vol. 22, 381-391, 2010.

