
International Journal of Computational Intelligence and Information Security, June 2013, Vol. 4, No. 6, ISSN: 1837-7823

Significance of One-Class Classification in Outlier Detection


Anandkumar Prakasam¹, Nickolas Savarimuthu²

¹ ROOT Research Consultancy, Tiruchirappalli, Tamilnadu, India. root.anand@gmail.com
² Associate Professor, National Institute of Technology, Tiruchirappalli, Tamilnadu, India. nickolas@nitt.edu
Abstract

Outlier or novelty detection is one of the important aspects of the current learning scenario: it helps to discover unknown knowledge from the available data. Novelty detection is also called outlier detection, since in some areas these data are considered out of the ordinary and need to be eliminated. This paper discusses various methods for detecting novelties and compares them in order to identify the best method for novelty detection. The paper considers the Generalized Extreme Studentized Deviate (GESD) test, an outlier detection method; SVM, a binary classifier; the Naïve Bayes classifier, a multi-class classifier; and a one-class classification method. The results reveal that the one-class classification method provides the best results in most scenarios, where the training data available for the outliers is minimal and sometimes not available at all.

Keywords: One-Class Classification; GESD; SVM; Naïve Bayes; Outlier Detection; Novelty Detection; Classification

1. Introduction

Multi-class classification is one of the most widely used techniques in data mining. However, it is sometimes not necessary to classify the data into multiple classes. If there is only one class that we are interested in, it is sufficient that this specific class is separated from the rest of the data. This kind of data mining is called one-class classification. Usually, in one-class classification, the data instances that do not belong to the normal data, or to the majority of the data, are separated. For example, credit card fraud [1] or intrusion detection [2] can be regarded as anomaly detection, while detecting previously unobserved patterns in data can be regarded as novelty detection [3][4]. Such novelty detection techniques can be used, for example, for detecting a new discussion topic in news groups. The difference between anomaly and novelty detection is that the novelty detection method often includes the discovered novelty patterns in the model [5]. One-class classification can be considered, for example, when the number of anomalies is much smaller than the number of normal data instances [21][22]; a minimal illustration of this setting is sketched below.
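As an illustration of this setting, the following sketch (using scikit-learn, with purely synthetic, hypothetical data) fits a one-class model on normal instances only and then flags unseen instances as normal or outlying; no outlier examples are required at training time.

import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal_train = rng.normal(loc=0.0, scale=1.0, size=(500, 2))   # target class only
test_points = np.vstack([rng.normal(0.0, 1.0, size=(10, 2)),   # normal-looking points
                         rng.normal(6.0, 1.0, size=(5, 2))])   # anomalous points

occ = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05)  # nu bounds the expected outlier fraction
occ.fit(normal_train)                                    # no outlier labels are needed
print(occ.predict(test_points))                          # +1 = normal, -1 = outlier

The choice of OneClassSVM and the parameter values are illustrative assumptions; any one-class model trained only on the target class would demonstrate the same point.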

1.1 Anomaly Detection Techniques

Anomaly detection techniques can be divided into four categories: classification-based, distance-based, statistical and other techniques. Classification-based anomaly detection techniques use a model to predict whether a test instance is normal or anomalous. In general, a set of training data is provided and the system is then presented with actual data for classification. If there are multiple normal classes, the technique is considered multi-class [6]; if there is only one normal class, it is considered a one-class anomaly detection technique [5]. Both of these classifiers are trained with only the normal data instances, so they belong to the semi-supervised learning methods. Typical classification-based anomaly detection techniques are, for example, Bayesian networks [7], neural networks [8], rule-based techniques [7] and support vector machines [9]. Distance-based techniques use the distance between points as the basic measure for detecting anomalies. Statistical techniques fit statistical models to a given data set, assign probabilities to test instances and declare them anomalous or normal.

The support vector machine (SVM) is a method used in pattern recognition and classification. It is a classifier that assigns patterns to one of two categories, for example fraudulent or non-fraudulent, and is therefore well suited to binary classification. Like any machine learning tool, it has to be trained to obtain a learned model. SVM has been applied to many classification and pattern recognition problems such as text categorization and face detection, and it draws on non-parametric applied statistics, neural networks and machine learning [10][11][12][13]. The weighted SVM implements cost-sensitive learning: like the standard SVM, it maximizes the margin of separation and minimizes the classification error, with the margin boundary separating the classes. In the cost-sensitive SVM (CS-SVM), different weights are assigned to the classes, and an effective decision boundary is learned by adjusting these weights, which improves the prediction accuracy (a minimal class-weighted sketch is given at the end of this section).

All anomaly detection techniques have their strengths and weaknesses, and there is no single technique suitable for every situation; the suitability of each technique for a particular application must be analyzed before choosing one. In the current paper, we provide a comparative study of GESD (Generalized Extreme Studentized Deviate), Naïve Bayes and SVM classifiers against one-class classifiers. The rest of this paper is organized as follows. Section 2 provides an overview of the one-class classification technique used for the comparison study, Section 3 describes the datasets used, Section 4 presents the experimental results and Section 5 concludes the study.
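The class-weighted idea mentioned above can be illustrated with a short sketch. This is not the configuration used in our experiments, only an assumed example in which the minority (outlier) class receives a larger misclassification cost than the majority class.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_majority = rng.normal(0.0, 1.0, size=(950, 2))        # normal instances
X_minority = rng.normal(4.0, 1.0, size=(50, 2))         # rare outlier instances
X = np.vstack([X_majority, X_minority])
y = np.array([0] * 950 + [1] * 50)

# Plain SVM versus a cost-sensitive (class-weighted) SVM: the only difference
# is the per-class weight given to misclassification errors.
plain_svm = SVC(kernel="rbf", gamma="scale").fit(X, y)
cs_svm = SVC(kernel="rbf", gamma="scale",
             class_weight={0: 1.0, 1: 19.0}).fit(X, y)   # weight roughly the inverse class frequency

query = np.array([[2.5, 2.5]])
print(plain_svm.predict(query), cs_svm.predict(query))

The weight of 19 for the minority class is an illustrative choice (inverse of the class ratio); in practice it would be tuned for the dataset at hand.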

2. One-Class Classification: An Overview

The traditional methods of classification have always been those that use data from all classes to build models. Such models are discriminative in nature, since they learn to discriminate between classes. However, many real-world situations are such that it is only possible to have data from one class, the target class; data from the other classes, the outlier classes, is either very difficult or impossible to obtain. Examples of such problems include fraud detection, medicine, machine fault detection, wireless sensor networks, intrusion detection and object recognition tasks such as face detection. These are the situations in which one-class classification plays a major role in the detection of anomalies.

Several methods have been proposed to solve the one-class classification problem. Three main approaches can be distinguished: density estimation [16], boundary methods and reconstruction methods. For each of the three approaches, different concrete models can be constructed, and each of these models differs in its ability to cope with or to exploit different characteristics of the data.

In all one-class classification methods two distinct elements can be identified. The first element is a measure of the distance d(z), or of the resemblance (probability) p(z), of an object z to the target class, which is provided by the training set x_tr. The second element is a threshold on this distance or resemblance. New objects are considered normal when the distance to the target class is smaller than a threshold θ_d,

f(z) = I(d(z) < θ_d),

or when the resemblance or probability is larger than a threshold θ_p,

f(z) = I(p(z) > θ_p),

where I is the indicator function, defined as

I(A) = 1 if A is true, and 0 otherwise.

The one-class classification methods differ in their definition of p(z) (or d(z)), in their optimization of p(z) (or d(z)) and of the thresholds with respect to the training set x_tr. In most one-class classification methods the focus is on the optimization of the resemblance model p or the distance d.
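A minimal sketch of this formulation is given below. It is only an illustration, not the specific one-class method of [15]: the distance d(z) is taken, by assumption, as the Euclidean distance to the mean of the training set x_tr, and the threshold θ_d is set to an empirical quantile of the training distances.

import numpy as np

class SimpleOneClassClassifier:
    """Accept z as normal when d(z) < theta, where d(z) is the distance to the
    mean of the target training set and theta is an empirical quantile of the
    training distances (an illustrative choice, not the method of [15])."""

    def __init__(self, quantile=0.95):
        self.quantile = quantile

    def fit(self, x_tr):
        self.center_ = x_tr.mean(axis=0)
        d_tr = np.linalg.norm(x_tr - self.center_, axis=1)
        self.theta_ = np.quantile(d_tr, self.quantile)    # threshold theta_d on d(z)
        return self

    def predict(self, z):
        d = np.linalg.norm(z - self.center_, axis=1)
        return (d < self.theta_).astype(int)               # f(z) = I(d(z) < theta_d)

rng = np.random.default_rng(2)
x_tr = rng.normal(0.0, 1.0, size=(300, 2))                 # target data only
clf = SimpleOneClassClassifier().fit(x_tr)
print(clf.predict(np.array([[0.2, -0.1], [5.0, 5.0]])))    # expected: [1 0]

More sophisticated choices of d(z), such as kernel distances, nearest-neighbour distances or density estimates, fit into the same fit/predict structure.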

2.1 Characteristics of One-Class Approaches

According to D.M.J. Tax [14], the following are considered to be the characteristics of one-class approaches.

Robustness to outliers: The training set is assumed to be a subset or a partition of the actual target data, hence the training set itself is not free from outliers. The method must be robust enough to handle the training data effectively even in the presence of these outliers. When a method optimizes only the resemblance or distance, it can be assumed that objects near the threshold are the candidate outlier objects. For methods where the resemblance is optimized for a given threshold, a more advanced treatment of outliers in the training set should be applied.

Incorporation of known outliers: When some outlier objects are available, they might be used to further tighten the description. To incorporate this information, the model of the data and the training procedure should be flexible enough, for example by allowing additional parameters in the one-class classifier.

Magic parameters and ease of configuration: One of the most important aspects for easy operation of a method by the user is the number of free parameters that have to be chosen beforehand, as well as their initial values. When a large number of free parameters is involved, finding a good working set might be very hard; when they are set correctly, good performance can be achieved. These parameters are often called magic parameters, because they often have a large influence on the final performance and no clear rules are given on how to set them. Such values cannot be given intuitively beforehand, and a reasonable setting is often found only by trial and error.

Computation and storage requirements: A final consideration is the computational requirement of the methods. Although computers offer more power and storage every year, a method that requires several minutes for the evaluation of a single test object might be unusable in practice. Since training is often done off-line, training costs are often not very important; however, when the method has to adapt to a changing environment, these training costs might become very important.

The most straightforward way to obtain a one-class classifier is to estimate the density of the training data and to set a threshold on this density. In our approach we use GESD for outlier detection, SVM and Naïve Bayes classifiers for multi-class classification, and the one-class classification method described in [15] for the analysis of outliers.
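For reference, a compact sketch of the GESD procedure is given below. The parameter names and the synthetic data are our own illustrative choices; the test statistic R_i and the critical value λ_i follow the standard GESD definition (Rosner, 1983).

import numpy as np
from scipy import stats

def gesd_outliers(x, max_outliers=10, alpha=0.05):
    """Generalized ESD test sketch: return indices of points declared outliers."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    remaining = list(range(n))
    removed = []               # indices removed, in order of extremeness
    num_outliers = 0
    for i in range(1, max_outliers + 1):
        vals = x[remaining]
        mean, std = vals.mean(), vals.std(ddof=1)
        if std == 0:
            break
        # test statistic R_i: most extreme studentized deviation in the remaining data
        dev = np.abs(vals - mean) / std
        j = int(np.argmax(dev))
        R_i = dev[j]
        # critical value lambda_i from the t distribution
        p = 1 - alpha / (2 * (n - i + 1))
        t = stats.t.ppf(p, df=n - i - 1)
        lam = (n - i) * t / np.sqrt((n - i - 1 + t ** 2) * (n - i + 1))
        removed.append(remaining.pop(j))
        if R_i > lam:
            num_outliers = i   # largest i with R_i > lambda_i
    return removed[:num_outliers]

data = np.concatenate([np.random.default_rng(3).normal(0, 1, 100), [8.0, 9.5]])
print(gesd_outliers(data, max_outliers=5))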

3. Dataset Description

3.1 KEEL Dataset

The KEEL datasets [23] available under the Classification category were used for the analysis. Each dataset includes target instances as well as outlier instances. All the datasets are binary problems and they do not contain any missing attributes. No categorical attributes are considered; all the attributes available in the datasets are numerical.

Table 1: Dataset Description

Name            No. of Instances
Banana          5300
Phoneme         5404
Appendicitis    106
Titanic         2201
Mammographic    830
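A small sketch of how such a dataset could be prepared for the experiments is shown below. It assumes, and this is our assumption rather than part of the KEEL format itself, that the dataset has been exported to a CSV file with the class label in the last column and that the minority class is treated as the outlier class.

import numpy as np
import pandas as pd

def load_binary_dataset(csv_path):
    df = pd.read_csv(csv_path)
    X = df.iloc[:, :-1].to_numpy(dtype=float)   # all attributes are numerical
    y = df.iloc[:, -1].to_numpy()
    # treat the less frequent label as the outlier class
    labels, counts = np.unique(y, return_counts=True)
    outlier_label = labels[np.argmin(counts)]
    return X, (y == outlier_label).astype(int)  # 1 = outlier, 0 = target

# X, y = load_binary_dataset("banana.csv")      # hypothetical file name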

4. Experimental Results and Discussion

4.1 Results on KEEL Dataset

Analysis of the results on the various datasets shows that the one-class classifier detects more outliers than the other classification methods. While SVM shows almost similar performance, it remains slightly below the one-class classifier. Figures 1-5 show the performance of GESD, Naïve Bayes, SVM and the One-Class Classifier (OCC) on the various datasets.


Figure 1: Banana Dataset (number of outliers detected by GESD, Naïve Bayes, SVM and OCC)

Figure 2: Phoneme Dataset (number of outliers detected by GESD, Naïve Bayes, SVM and OCC)


Figure 3: Appendicitis Dataset (number of outliers detected by GESD, Naïve Bayes, SVM and OCC)

Figure 4: Titanic Dataset (number of outliers detected by GESD, Naïve Bayes, SVM and OCC)


Figure 5: Mammographic Dataset (number of outliers detected by GESD, Naïve Bayes, SVM and OCC)

Further experiments were carried out by varying the number of outliers available during the training phase and observing the performance of each method. Since GESD is a statistics-based outlier detection method, its performance deteriorated only slightly. Naïve Bayes and SVM, on the other hand, rely completely on the training data when classifying; hence, as the number of outliers in the training data is reduced, their detection rates drop drastically. The one-class classifier relies only on the normal data and not on the anomalies, so its drop in detection rate is very small compared to the other methods.
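The following sketch outlines how such an experiment could be organized for the classifier-based methods (GESD would be applied analogously on the raw attribute values). The helper names and parameter values are illustrative assumptions, not the exact code used to produce Figures 6-10.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC, OneClassSVM

def run_imbalance_experiment(X_tr, y_tr, X_te, y_te, pct_outliers):
    """Keep only pct_outliers percent of the outlier instances in the training
    split and count how many test outliers each trained model still detects."""
    rng = np.random.default_rng(0)
    out_idx = np.flatnonzero(y_tr == 1)
    keep = rng.choice(out_idx, size=max(1, int(len(out_idx) * pct_outliers / 100)),
                      replace=False)
    idx = np.concatenate([np.flatnonzero(y_tr == 0), keep])
    Xs, ys = X_tr[idx], y_tr[idx]

    detected = {}
    nb = GaussianNB().fit(Xs, ys)
    detected["NaiveBayes"] = int(np.sum(nb.predict(X_te)[y_te == 1] == 1))
    svm = SVC(kernel="rbf", gamma="scale").fit(Xs, ys)
    detected["SVM"] = int(np.sum(svm.predict(X_te)[y_te == 1] == 1))
    occ = OneClassSVM(nu=0.05).fit(Xs[ys == 0])             # trained on normal data only
    detected["OCC"] = int(np.sum(occ.predict(X_te[y_te == 1]) == -1))
    return detected

# for pct in range(100, 0, -10):
#     print(pct, run_imbalance_experiment(X_tr, y_tr, X_te, y_te, pct))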

Figure 6: Banana Dataset (X-Axis: percentage of outliers in training dataset; Y-Axis: number of outliers detected)

Figures 6-10 present the detection rates of the various algorithms as the imbalance in the training dataset is increased. Analysis of the datasets shows consistent performance of the one-class classifier, while deviations are observed in the other detection methods.

Figure 7: Phoneme Dataset (X-Axis: percentage of outliers in training dataset; Y-Axis: number of outliers detected)

Figure 8: Appendicitis Dataset (X-Axis: percentage of outliers in training dataset; Y-Axis: number of outliers detected)


Figure 9: Titanic Dataset (X-Axis: percentage of outliers in training dataset; Y-Axis: number of outliers detected)

Figure 10: Mammographic Dataset (X-Axis: percentage of outliers in training dataset; Y-Axis: number of outliers detected)


4.2 Discussions

From Figures 6-10, where the detection rates of the various methods are analysed and compared, the slope of each curve shows that the one-class classification approach does not change much even when the imbalance in the training dataset is increased to a great extent. SVM, a popular binary classifier, showed considerable performance initially but deteriorated when higher levels of imbalance were introduced. The slopes of GESD, which is based on a purely statistical approach, and of the one-class classifier, which does not rely on outlier samples, remained nearly constant, showing that they are not strongly affected by increasing imbalance in the training datasets. This indicates that the one-class classifier can be used as a reliable technique when class imbalance in the training data is very high.

5. Conclusion

One-class classification becomes significant when, in a conventional classification problem, the classes in the (training) data are highly imbalanced, i.e. one of the classes is severely under-represented because of the measurement costs for that class or its low frequency of occurrence. It may also be completely unclear what a representative distribution of that class would be. Multi-class classification methods in general require a training set containing samples of both the legitimate data and the anomalies; since most real-time data do not contain instances of the anomalies, these methods fail in most cases, and as the number of anomalies decreases their accuracy deteriorates. Outlier detection methods, in turn, have high false positive rates and are not reliable in real-time scenarios. Hence one-class classification proves to be the best approach for real-time scenarios where the occurrence of anomalies is limited and data about the anomalies cannot be obtained. Feature selection [17][18][20] can be incorporated into one-class classification scenarios to provide better and optimal results; it also helps to discard unimportant attributes and increases the accuracy rate. Selective sampling [19] can also be used to reduce cost, by labelling only the most important instances.

References

[1] R. Brause, T. Langsdorf and M. Hepp, "Neural Data Mining for Credit Card Fraud Detection", in Proceedings of the 11th IEEE International Conference on Tools with Artificial Intelligence, pages 103-106, Washington, DC, USA, 1999.
[2] F. A. González and D. Dasgupta, "Anomaly Detection Using Real-Valued Negative Selection", Genetic Programming and Evolvable Machines, Volume 4, Issue 4, pages 383-403, Kluwer Academic Publishers, Hingham, MA, USA, December 2003.
[3] M. Markou and S. Singh, "Novelty Detection: A Review - Part 1: Statistical Approaches", Signal Processing, Volume 83, Issue 12, pages 2481-2497, December 2003.
[4] M. Markou and S. Singh, "Novelty Detection: A Review - Part 2: Neural Network Based Approaches", Signal Processing, Volume 83, Issue 12, pages 2499-2521, December 2003.
[5] V. Chandola, A. Banerjee and V. Kumar, "Anomaly Detection: A Survey", ACM Computing Surveys, Volume 41, Number 3, Article 15, ACM, New York, USA, July 2009.
[6] D. Barbará, N. Wu and S. Jajodia, "Detecting novel network intrusions using Bayes estimators", in Proceedings of the First SIAM Conference on Data Mining, Chicago, April 2001.
[7] Ethem Alpaydin, Introduction to Machine Learning, 2nd edition, pages 109-112 and 489-493, The MIT Press, Cambridge, Massachusetts, London, England, 2010.
[8] Christopher M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1996.
[9] Corinna Cortes and Vladimir Vapnik, "Support-vector networks", Machine Learning, Volume 20, pages 273-297, 1995.
[10] Piyaphol Phoungphol, Yanqing Zhang, Yichuan Zha and Bismita Srichandan, "Multiclass SVM with Ramp Loss for Imbalanced Data Classification", IEEE International Conference on Granular Computing, 2012.
[11] Yuchun Tang, Bo Jin, Yi Sun and Yan-Qing Zhang, "Granular Support Vector Machines for Medical Binary Classification Problems", IEEE, 2004.
[12] Yuchun Tang, Yan-Qing Zhang, Nitesh V. Chawla and Sven Krasser, "SVMs Modeling for Highly Imbalanced Classification", IEEE, 2009.
[13] Chih-Wei Hsu, Chih-Chung Chang and Chih-Jen Lin, "A Practical Guide to Support Vector Classification", 2010.
[14] David Martinus Johannes Tax, "One-Class Classification: Concept-Learning in the Absence of Counter-Examples", ISBN: 90-75691-05-x, 2001.
[15] Zineb Noumir, Paul Honeine and Cedric Richard, "On Simple One-Class Classification Methods", 2012 IEEE International Symposium on Information Theory Proceedings, IEEE, 2012.
[16] Defeng Wang and Daniel S. Yeung, "Structured One-Class Classification", IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, Vol. 36, No. 6, December 2006.
[17] Young-Seon Jeong, In-Ho Kang, Myong-Kee Jeong and Dongjoon Kong, "A New Feature Selection Method for One-Class Classification Problems", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, Vol. 42, No. 6, November 2012.
[18] George Gomes Cabral and Adriano Lorena Inacio de Oliveira, "A Novel One-Class Classification Method Based on Feature Analysis and Prototype Reduction", IEEE, 2011.
[19] Piotr Juszczak and Robert P. W. Duin, "Selective Sampling Methods in One-Class Classification Problems", in O. Kaynak et al. (Eds.): ICANN/ICONIP 2003, LNCS 2714, pp. 140-148, 2003.
[20] David M. J. Tax and Klaus-R. Müller, "Feature Extraction for One-Class Classification", in O. Kaynak et al. (Eds.): ICANN/ICONIP 2003, LNCS 2714, pp. 342-349, 2003.
[21] Kathryn Hempstalk and Eibe Frank, "Discriminating Against New Classes: One-class versus Multi-class Classification", in W. Wobcke and M. Zhang (Eds.): AI 2008, LNAI 5360, pp. 325-336, 2008.
[22] Kenneth Kennedy, Brian Mac Namee and Sarah Jane Delany, "Learning without Default: A Study of One-Class Classification and the Low-Default Portfolio Problem", in L. Coyle and J. Freyne (Eds.): AICS 2009, LNAI 6206, pp. 174-187, 2010.
[23] Knowledge Extraction based on Evolutionary Learning (KEEL) Datasets: http://sci2s.ugr.es/keel/datasets.php.
