pshunmugapriya@gmail.com
Abstract— Bagging and boosting are among the most popular classifier ensemble methods and have been used in a number of pattern classification applications. In this paper, an alternative approach to classifier ensembles based on boosting is proposed. Usually, boosting is applied to improve the performance of a single base classifier. Here, Boosting with Mixed Learner Models (BMLM) is used to enhance the performance of base classifiers trained on the split numerical and categorical features of the same dataset. BMLM is applied to the classification of three UCI (University of California, Irvine) datasets. The diversity between the base learners, which contributes to increasing the recognition rates of the classifiers, is also calculated. The results show that BMLM gives up to 3% higher classification accuracy than the base classifier and a single boosted classifier.

Keywords— Classifier, Learner, Classifier Ensemble, Multiple Classifier System, Diversity, Feature Selection, Boosting
I. INTRODUCTION

Classifier Ensemble (CE) is an important and dynamically growing research area of machine learning [8]. So far, numerous classifier ensembles have been proposed, experimented with, and shown to be efficient and successful. It can be inferred from [1], [3], [4], [5], [11] and [15] that work on classifier ensembles has been carried out in the following directions:
i. Newly proposed ensembles give very good classification accuracies compared to existing ensemble methods.
ii. Application domains that have not previously been explored with classifier ensembles are explored with the newly proposed ensemble methods.
iii. New classifier ensemble methods are proposed by modifying existing methods.
iv. Classifier combinations are proposed considering diversity and feature selection.
An ensemble of three classifiers has been proposed for the
prediction of three different tasks in the KDD Cup 2009 [2]. Three different ensemble methods were experimented with in that work: a maximum entropy model that converts the three binary classification tasks into a joint multi-class classification, heterogeneous AdaBoost that handles numerical and categorical features separately, and a selective naïve Bayesian model that automatically groups categorical features and discretizes numerical features [2]. All these ensemble models performed well and assured higher accuracy for the prediction tasks.

In this paper, the concept of the heterogeneous model [2] is considered, and based on it an alternative ensemble method named Boosting with Mixed Learner Models (BMLM) is proposed and implemented. The feature set is split into two subsets: numerical and categorical. Classifiers are trained separately with the two feature subsets. Boosting is then used to boost the two trained classifier models instead of a single learner, and the results of the two boosted models are combined. The diversity between the base classifiers is also measured.

This paper is organized as follows. Section II describes the background and related work on classifier ensembles; classifier ensembles, the decision tree classifier, boosting, diversity and the heterogeneous model of the KDD Cup method are explained briefly in this section. Section III introduces the datasets used for the proposed study. The proposed method, BMLM, is explained in Section IV. The experimental details of BMLM on the three datasets are given in Section V. Section VI concludes the study.

II. BACKGROUND AND RELATED WORKS

A. Classifier Ensemble

In pattern classification, when individual classifiers do not perform well for a particular application, the results of the classifiers are combined to form a Classifier Ensemble (CE) or Multiple Classifier System (MCS) for better classification.
A Classifier Ensemble is a learning paradigm where several classifiers are combined to increase the generalization ability of a single classifier [3]. Ensembles are usually formed by either classifier selection method or classifier fusion method.
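As a minimal illustration of classifier fusion (the dataset, base learners and train/test split below are hypothetical, not from the paper), the following sketch combines three base classifiers by majority voting:

```python
# Illustrative sketch: fusing three base classifiers by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data (stand-in for a real dataset).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

learners = [DecisionTreeClassifier(max_depth=3, random_state=0),
            GaussianNB(),
            KNeighborsClassifier(n_neighbors=5)]
preds = np.stack([clf.fit(X_tr, y_tr).predict(X_te) for clf in learners])

# Classifier fusion: the majority vote across the three classifiers.
majority = (preds.sum(axis=0) >= 2).astype(int)
accuracy = (majority == y_te).mean()
```

The fused prediction is usually at least as reliable as the typical individual learner when the members make errors on different instances.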
IEEE-ICRTIT 2011
In classifier selection, the final prediction for any given instance is the decision of the classifier that is closest to the given instance in feature space. In classifier fusion, the final prediction for any given instance is always the combined prediction of all the classifiers. A number of combination methods have been proposed so far to form ensembles [6]: majority voting, the naïve Bayes combination, Dempster-Shafer, decision templates, behavioral knowledge space and Wernecke's method. Apart from these combination rules, techniques such as bagging, boosting, stacking, Rotation Forest and Random Forest, which enhance the performance of a classification algorithm, are also available, and new CE methods are proposed by modifying existing methods.

B. Decision Tree Classifier

Decision tree induction is one of the most popular classification methods [10]. Decision trees try to find an optimal partitioning of the space of possible observations, mainly by means of subsequent recursive splits. The induction process is mostly done in a top-down manner. The classifier returns a crisp label that represents the predicted class of the given instance.

C. Boosting

Boosting is a technique for creating an ensemble by boosting the performance of a single classifier. A randomly chosen subset of the training set is given to the base learner, and the trained classifier is named C1. To form the second classifier C2, the base classifier is trained with a training set in which half of the instances are classified correctly by C1 and the other half are misclassified by C1. When the base classifier is trained on the instances on which C1 and C2 disagree, C3 results. The ensemble is formed by majority voting over the three classifiers C1, C2 and C3 [3]. The main reason for the success of bagging and boosting is that they build a diverse set of classifiers.
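The three-classifier boosting scheme (C1, C2, C3 with a final majority vote) can be sketched as follows. This is an illustrative reconstruction: the synthetic dataset, scikit-learn decision trees and subset sizes are assumptions, not the paper's setup.

```python
# Sketch of boosting by filtering: C1 on a random subset, C2 on a
# half-correct/half-incorrect set w.r.t. C1, C3 on C1/C2 disagreements.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

def tree():
    return DecisionTreeClassifier(max_depth=2, random_state=0)

# C1: trained on a random subset of the training data.
idx1 = rng.choice(len(X), size=200, replace=False)
c1 = tree().fit(X[idx1], y[idx1])

# C2: half instances C1 classifies correctly, half it misclassifies.
p1 = c1.predict(X)
right, wrong = np.flatnonzero(p1 == y), np.flatnonzero(p1 != y)
n = min(len(right), len(wrong), 100)
idx2 = np.concatenate([rng.choice(right, n, replace=False),
                       rng.choice(wrong, n, replace=False)])
c2 = tree().fit(X[idx2], y[idx2])

# C3: instances on which C1 and C2 disagree (with a fallback for this
# sketch if the disagreement set is degenerate).
p2 = c2.predict(X)
idx3 = np.flatnonzero(p1 != p2)
if len(np.unique(y[idx3])) < 2:
    idx3 = rng.choice(len(X), size=200, replace=False)
c3 = tree().fit(X[idx3], y[idx3])

# Final ensemble: majority vote of C1, C2 and C3.
votes = np.stack([c.predict(X) for c in (c1, c2, c3)])
ensemble_pred = (votes.sum(axis=0) >= 2).astype(int)
```

Because C2 and C3 are deliberately trained on instances that are hard for the earlier members, the three classifiers err on different objects, which is exactly the diversity the next subsection discusses.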
D. Diversity of Classifiers

Diversity has been recognized as a very important characteristic in classifier combination [4]. The diversity among a combination of classifiers is defined as follows: if one classifier makes some errors, then for the combination we look for classifiers which make errors on different objects [1]. This means that classifiers with identical behaviour are not preferred as components of an ensemble, because when classifiers of the same type are combined, there is no significant improvement in the predictions. When diverse classifiers are combined, an error made by one classifier on a particular instance can be compensated by another, diverse classifier in the ensemble.

Diverse classifiers can be obtained in a number of ways, such as bagging, boosting and resampling the original training set. In both bagging and boosting, different learning models are formed from the base learner by considering different subsets of the training data [4], [14]. To measure the diversity between classifiers, there are two kinds of measures: pairwise and non-pairwise. Pairwise measures consider two classifiers at a time, whereas non-pairwise measures consider any number of classifiers at the same time [5]. In this work, boosting has been used to create the diverse classifiers, and pairwise measures have been used to quantify the diversity between them.

E. Heterogeneous Model of Ensemble

The heterogeneous model of the KDD Cup is a combination of a heterogeneous base learner and AdaBoost [2]. The heterogeneous base learner is a combination of numerical and categorical base learners, with the decision tree as the base learner. The AdaBoost model uses the base learner to build a more accurate predictive model by calling the learner repeatedly on different distributions over the training instances. In each iteration, the model searches over all features and selects exactly one feature (numerical or categorical) to grow a (numerical or categorical) tree. During this process, the numerical learner tries to identify a threshold on the numerical features, while the nodes of the categorical tree are branched according to whether a certain category is matched or not; a missing value can also be treated as a category. The numerical or categorical feature selection is done using the wrapper method [2].

The work Boosting with Mixed Learner Models (BMLM) carried out in this paper is based on the ideas of the heterogeneous model of the KDD Cup. The concept of separate numerical and categorical learners, as well as the idea of treating a missing feature value as a separate category, is taken from the heterogeneous ensemble model [2] and has been implemented in BMLM.

IV. BOOSTING WITH MIXED LEARNER MODELS (BMLM)

On the basis of the heterogeneous model, Boosting with Mixed Learner Models has been proposed in this paper. The base classifier used is the decision tree. The features of any given dataset are split into two subsets, numerical features and categorical features, as shown in Fig. 1. When the decision tree is applied to the two feature subsets, two learner models, called the numerical tree and the categorical tree, are formed. Boosting is then applied to both classifier models; as a result, three classifier models are obtained for the numerical tree and three for the categorical tree. Any instance to be classified is given to both boosted models. The results from these models are ensembled to give a final classification label to the test instance, using Maximum (MAX), Minimum (MIN), Product (PRO), Averaging (AVG) or the Naïve Bayes combination.

The difference between the heterogeneous model of the KDD Cup and the proposed method is that, in the heterogeneous model, when the learner is iteratively called to train on different distributions, it selects a numerical or categorical feature in each iteration to grow a single tree, whereas in BMLM the feature set is split beforehand and separate numerical and categorical learners are boosted.
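The pairwise diversity measures used in this work can be illustrated with two standard ones, the disagreement measure and the Q-statistic, computed from the "oracle" (correct/incorrect) outputs of two classifiers. The oracle vectors below are made up for illustration; they are not the paper's data.

```python
# Sketch of two common pairwise diversity measures between classifiers
# i and j, computed from boolean "correct/incorrect" oracle outputs.
import numpy as np

def pairwise_diversity(correct_i, correct_j):
    a = np.sum(correct_i & correct_j)      # both classifiers correct
    b = np.sum(correct_i & ~correct_j)     # only i correct
    c = np.sum(~correct_i & correct_j)     # only j correct
    d = np.sum(~correct_i & ~correct_j)    # both wrong
    n = a + b + c + d
    disagreement = (b + c) / n             # fraction of instances they differ on
    q = (a * d - b * c) / (a * d + b * c)  # Q-statistic, in [-1, 1]
    return disagreement, q

ci = np.array([True, True, False, True, False, True])
cj = np.array([True, False, True, True, False, False])
dis, q = pairwise_diversity(ci, cj)
# dis = 0.5, q = 0.0 for these oracle vectors
```

Lower Q (closer to 0 or negative) and higher disagreement both indicate more diverse pairs, which is the property sought when combining the boosted models.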
The result of each boosted model is obtained by majority voting among its three constituent classifiers. The final class label for any particular instance is then obtained by combining (BMLM) the results from the numerical and categorical boosted models. The methods used for combination are the statistical rules MAX, MIN, PRO and AVG, and the Naïve Bayesian ensemble. The diversity between the boosted models is also measured using pairwise diversity measures, and it is high enough to justify combining the two boosted models.
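The MAX, MIN, PRO and AVG rules can be sketched on the class supports produced by the two boosted models; the probability values below are illustrative, not taken from the experiments.

```python
# Sketch of the MAX / MIN / PRO / AVG combination rules applied to the
# class-probability outputs of two boosted models for one test instance.
import numpy as np

# Columns: support for class 0 and class 1 (values are made up).
p_numerical   = np.array([0.6, 0.4])   # boosted numerical-tree model
p_categorical = np.array([0.3, 0.7])   # boosted categorical-tree model
support = np.stack([p_numerical, p_categorical])

combined = {
    "MAX": support.max(axis=0),
    "MIN": support.min(axis=0),
    "PRO": support.prod(axis=0),
    "AVG": support.mean(axis=0),
}
# Final label per rule: the class with the largest combined support.
labels = {rule: int(np.argmax(s)) for rule, s in combined.items()}
```

Note that with two classifiers and two classes, the MAX and MIN rules always yield the same decision, which is consistent with the paper's observation that MAX and MIN give identical recognition rates.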
[Fig. 1 Architecture of BMLM: the feature set is split into numerical features (and categorical features); each subset trains the base classifier; boosting is applied to each trained learner; the ensemble of boosted learners gives the final decision.]

TABLE I
DESCRIPTION OF THE THREE UCI DATASETS

Dataset     No. of Classes    No. of Features
Heart-C     2                 14
Haberman    2                 4
Wine        3                 13
V. EXPERIMENTAL RESULTS AND DISCUSSION

The modules of the newly proposed method BMLM are shown in Fig. 1. First, the tree classifier is trained with the numerical features and the categorical features separately to form two different learners. Then boosting is used to enhance the performance of the two classifiers. The reason for selecting boosting is that it actively promotes diversity, and it has been described as the best off-the-shelf classifier by Leo Breiman, the creator of bagging [10]. After boosting has been applied, three trained models result from each of the numerical and categorical trees.
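A hedged sketch of this pipeline, using scikit-learn's AdaBoostClassifier in place of the paper's WEKA/Java implementation (the synthetic dataset and the numerical/categorical column split are assumptions for illustration), might look like:

```python
# BMLM-style sketch: split the features into numerical and categorical
# subsets, boost a decision tree on each, then combine with the AVG rule.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=1)
# Pretend the first 6 columns are numerical and the rest categorical.
num_cols, cat_cols = list(range(6)), list(range(6, 10))
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

def boosted_tree(cols):
    clf = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                             n_estimators=50, random_state=1)
    return clf.fit(X_tr[:, cols], y_tr)

numerical_model = boosted_tree(num_cols)
categorical_model = boosted_tree(cat_cols)

# AVG combination of the two boosted models (one of the BMLM rules).
avg = (numerical_model.predict_proba(X_te[:, num_cols]) +
       categorical_model.predict_proba(X_te[:, cat_cols])) / 2
bmlm_pred = avg.argmax(axis=1)
accuracy = (bmlm_pred == y_te).mean()
```

Swapping `avg` for an element-wise maximum, minimum or product gives the MAX, MIN and PRO variants of the combiner.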
The method BMLM is applied to three UCI datasets: Heart-C, Haberman and Wine. The results for the individual learners as well as the boosted models are shown in Table II. The three datasets are selected such that they contain both numerical and categorical features and have no missing values. It can be seen from Table II that the classification accuracies obtained through the BMLM method have increased compared to those of the individual tree classifiers and the boosted models. All classifications are implemented using the WEKA tool [12], and the classifier ensembles and BMLM are implemented in Java. Fig. 2 shows the flow of execution of the BMLM method. Initially, the datasets to be classified are presented. When a dataset is selected, its features are split into numerical and categorical subsets and are presented to the
TABLE II
CLASSIFICATION RESULTS FOR THE THREE DATASETS USING THE BMLM METHOD

[Table II is only partially recoverable from the source; its columns include the Numerical Learner and Categorical Learner accuracies, with surviving values 82, 75.5 and 91.74.]
classifier (decision tree). When boosting is selected, the performance of both the numerical and categorical trees is boosted. The classification result of BMLM is then obtained by selecting one of the ensemble methods MAX, MIN, AVG, PRO or NB. Fig. 3 shows the recognition rates (classification accuracies) for the three datasets Heart-C, Haberman and Wine obtained by the numerical learners, the categorical learners and the two boosted models. From the graph, it can be seen that boosting improves the recognition rates for the Wine and Haberman datasets. For the Heart-C dataset, the performance is also boosted, but the categorical learner attains a higher recognition rate than the boosted models.
Fig. 3 Graph Showing the Recognition Rates for the Three UCI Datasets by Individual Classifiers
When the BMLM method is applied to the datasets, the recognition rates increase significantly for all three UCI datasets irrespective of the ensemble method chosen, as shown in Fig. 4. As predicted by theory [4], MAX and MIN give the same recognition rates for all three datasets.
Fig. 4 Graph Showing the Recognition Rates for Three UCI Datasets by the BMLM Method
Fig. 5 Graph Showing the Comparison of Predictions for Three UCI Datasets by Individual Learners, Boosted Learners and BMLM
From Fig. 5, it can be seen that:
i. For the Haberman dataset, BMLM with the AVG method gives the highest recognition rate compared to the individual learners.
ii. For the Heart-C and Wine datasets, BMLM with MAX and MIN gives the highest classification accuracies compared to the individual learners and to BMLM with the other ensemble methods.
iii. For all three datasets, BMLM (with MAX, MIN, PRO, AVG or NB) gives higher recognition rates than the individual learners.
From Table II and Figs. 3, 4 and 5 it can be seen that the proposed BMLM method gives higher recognition rates than the individual classifiers. From the classifications carried out, it is inferred that:
i. Good classification results are obtained when the feature set is split into numerical and categorical features.