
SEMINAR ON

A SURVEY OF CLASSIFICATION TECHNIQUES ON BIG DATA

Agenda

Introduction
Classification techniques
Comparative study
Conclusion
References

Introduction

Classification is a technique that maps data into predefined classes or groups

Predicts group membership for data instances

Knowledge discovery

Future plan

Predicts categorical class labels

Constructs a model from the training set and the values (class labels) of a classifying attribute, then uses that model to classify new data
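The train-then-classify workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the one-attribute rule model, the function names `train` and `classify`, and the toy weather data are all hypothetical and do not come from the surveyed papers.

```python
from collections import Counter, defaultdict

def train(values, labels):
    """Build a model: map each attribute value to its majority class label."""
    votes = defaultdict(Counter)
    for value, label in zip(values, labels):
        votes[value][label] += 1
    # Fall back to the overall majority class for attribute values never seen in training.
    default = Counter(labels).most_common(1)[0][0]
    rules = {value: counts.most_common(1)[0][0] for value, counts in votes.items()}
    return rules, default

def classify(model, value):
    """Use the trained model to predict the class label of a new instance."""
    rules, default = model
    return rules.get(value, default)

# Hypothetical training set: classifying attribute "outlook", class label "play".
outlook = ["sunny", "sunny", "rainy", "overcast", "rainy", "overcast"]
play = ["no", "no", "yes", "yes", "yes", "yes"]

model = train(outlook, play)
print(classify(model, "sunny"))  # value seen in training
print(classify(model, "foggy"))  # unseen value, falls back to the majority class
```

The model here is just a lookup table plus a default class; the techniques on the following slides replace it with richer models but keep the same two phases.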

Introduction (Cont.)

Classification
Supervised: human-guided classification; predictive or directed; example: decision tree
Unsupervised: calculated by software; descriptive or undirected; example: K-means

Classification Techniques

Supervised Classification Techniques
Decision Tree
Support Vector Machine
Nearest Neighbor
Naïve Bayes

Decision Tree

It is a flowchart-like tree structure that classifies instances by sorting them on their attribute values

It generates the rules for classifying the dataset

Algorithms: ID3 (Iterative Dichotomiser 3), C4.5, CART (Classification and Regression Tree)
Measure: entropy information gain (ID3, C4.5); Gini diversity index (CART)
Procedure: top-down (ID3, C4.5); constructs a binary decision tree (CART)
Pruning: through a single-pass algorithm (ID3, C4.5); post-pruning based on cost (CART)
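The split measures named above have short standard definitions, sketched here in Python. The function names and the toy labels are assumptions made for illustration; this is not the code of any particular ID3, C4.5, or CART implementation.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gini(labels):
    """Gini diversity index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy reduction obtained by splitting the labels on an attribute's values."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

labels = ["a", "a", "b", "b"]
print(entropy(labels))                                   # 1.0 for a 50/50 split
print(gini(labels))                                      # 0.5 for a 50/50 split
print(information_gain(["x", "x", "y", "y"], labels))    # attribute separates the classes
print(information_gain(["x", "y", "x", "y"], labels))    # attribute carries no information
```

A top-down tree builder would evaluate one of these measures for every candidate attribute at a node and split on the best one.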

Bayesian Network

It is a graphical model of the probabilistic relationships among a set of variables (features)

Shows high accuracy and speed

Probabilistic learning
Probabilistic prediction
Incremental
Standards
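Probabilistic learning, probabilistic prediction, and incremental updating can be illustrated with naïve Bayes, the simplest Bayesian classifier, which assumes that features are conditionally independent given the class. The sketch below is an assumption-laden illustration: the class name `NaiveBayes`, the add-one (Laplace) smoothing, and the toy records are hypothetical and are not taken from the surveyed papers.

```python
from collections import Counter, defaultdict
from math import log

class NaiveBayes:
    """Categorical naive Bayes with add-one smoothing, trained one record at a time."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
        self.values = defaultdict(set)              # feature index -> distinct values seen

    def update(self, record, label):
        """Incremental learning: each new training record only adjusts the counts."""
        self.class_counts[label] += 1
        for i, value in enumerate(record):
            self.feature_counts[(label, i)][value] += 1
            self.values[i].add(value)

    def log_scores(self, record):
        """Unnormalized log posterior of each class for the given record."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            score = log(count / total)  # log prior
            for i, value in enumerate(record):
                numerator = self.feature_counts[(label, i)][value] + 1
                denominator = count + len(self.values[i])
                score += log(numerator / denominator)  # smoothed log likelihood
            scores[label] = score
        return scores

    def predict(self, record):
        scores = self.log_scores(record)
        return max(scores, key=scores.get)

# Hypothetical training data: (outlook, temperature) -> play
nb = NaiveBayes()
for record, label in [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("overcast", "hot"), "yes"),
    (("rainy", "cool"), "yes"),
]:
    nb.update(record, label)

print(nb.predict(("sunny", "hot")))
print(nb.predict(("rainy", "cool")))
```

Because the model is only a set of counts, new training records can be absorbed without retraining from scratch, which is the sense of "incremental" in this sketch.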

Nearest Neighbor

Heuristic techniques are used to select a good value of ‘k’

It has some strong consistency results

Instance-based classifiers work by storing training records and using them to predict the class label of unseen cases
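The instance-based idea above can be sketched as a k-nearest-neighbor classifier that stores the training records and votes among the k closest ones. The function names, the Euclidean distance, and the leave-one-out procedure for picking k are assumptions chosen for illustration; leave-one-out is only one of several possible heuristics for selecting k.

```python
from collections import Counter
from math import dist  # Euclidean distance, available in Python 3.8+

def knn_predict(points, labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest stored records."""
    nearest = sorted(zip(points, labels), key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def choose_k(points, labels, candidates=(1, 3, 5)):
    """One possible heuristic: keep the candidate k with the best leave-one-out accuracy."""
    def loo_accuracy(k):
        hits = 0
        for i in range(len(points)):
            rest_points = points[:i] + points[i + 1:]
            rest_labels = labels[:i] + labels[i + 1:]
            hits += knn_predict(rest_points, rest_labels, points[i], k) == labels[i]
        return hits / len(points)
    return max(candidates, key=loo_accuracy)

# Hypothetical stored training records: two well-separated groups in the plane.
points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (1.5, 1.5)))
print(knn_predict(points, labels, (6.5, 6.5)))
print(choose_k(points, labels))
```

Note that every prediction scans all stored records, which is where the computational and storage costs listed in the comparative study come from.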

Support Vector Machine (SVM)

It trains a classifier to predict the class of a new sample

Key Implementation
Mathematical programming
Kernel function

Support Vector Machine (SVM) (Cont.)

Algorithm: use
Linear: data is linearly separable
Non-linear: not suitable for C class hypothesis
Non-separable: noisy data is available
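The role of the kernel function in the linear and non-linear cases can be sketched with the usual SVM decision rule, f(x) = Σ αᵢ yᵢ K(xᵢ, x) + b. Everything specific in the block is an assumption for illustration: the function names, the `gamma` default, and the two-point training set, whose coefficients (αᵢ = 0.25, b = 0) were worked out by hand for the hard-margin linear case rather than produced by a solver.

```python
from math import exp

def linear_kernel(u, v):
    """Plain dot product: appropriate when the data is linearly separable."""
    return sum(a * b for a, b in zip(u, v))

def rbf_kernel(u, v, gamma=0.5):
    """Gaussian (RBF) kernel: a common choice for non-linear decision boundaries."""
    return exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def decision_value(x, support_vectors, labels, alphas, bias, kernel):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b"""
    return sum(
        alpha * y * kernel(sv, x)
        for sv, y, alpha in zip(support_vectors, labels, alphas)
    ) + bias

def predict(x, support_vectors, labels, alphas, bias, kernel):
    """Class of a new sample is the sign of the decision value."""
    return 1 if decision_value(x, support_vectors, labels, alphas, bias, kernel) >= 0 else -1

# Hypothetical two-point problem: (1, 1) is class +1 and (-1, -1) is class -1.
# For the linear kernel the maximum-margin solution is w = (0.5, 0.5), b = 0,
# which corresponds to alpha = 0.25 for both support vectors.
support_vectors = [(1, 1), (-1, -1)]
labels = [1, -1]
alphas = [0.25, 0.25]
bias = 0.0

print(decision_value((1, 1), support_vectors, labels, alphas, bias, linear_kernel))  # on the margin
print(predict((3, 0.5), support_vectors, labels, alphas, bias, linear_kernel))
print(predict((-2, 0), support_vectors, labels, alphas, bias, linear_kernel))
print(rbf_kernel((0, 0), (1, 1)))  # similarity decays with squared distance
```

Swapping `linear_kernel` for `rbf_kernel` changes the shape of the decision boundary without changing the decision rule, although the coefficients would then have to be recomputed by the training procedure.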

Comparative Study

Decision tree
Advantages:
• Simple to understand and interpret
• Requires little data preparation
Disadvantages:
• Locally optimal decisions are made at each node
• Does not generalize well from the training data

Bayesian network
Advantages:
• Able to handle noisy data
• Well suited for continuous features
Disadvantages:
• Training time will be large
• Poor interoperability
• Requires parameters

K-nearest neighbor
Advantages:
• Easy to understand
• Easy to implement as a classification technique
Disadvantages:
• Computational costs are expensive
• Very sensitive to the local structure of the data, and requires large storage

Support vector machine
Advantages:
• Finds the best classification function for the training data
• Prevents overfitting better than other methods
Disadvantages:
• Computationally expensive
• Requires large time and storage
• Poor interpretability of results

Comparative Study (Cont.)

Algorithm         Predictive  Fitting  Prediction  Memory  Easy to    Handles Categorical
                  Accuracy    Speed    Speed       Usage   Interpret  Predictors
Trees             Low         Fast     Fast        Low     Yes        Yes
SVM               High        Medium   *           *       *          No
Naïve Bayes       Low         **       **          **      Yes        Yes
Nearest Neighbor  ***         Fast***  Medium      High    No         Yes***

* SVM prediction speed and memory usage are good
** Naïve Bayes speed and memory usage are good
*** Nearest neighbor usually has good prediction in low dimensions

Conclusion

Here I have discussed various classification techniques such as Decision tree, Bayesian network, Nearest neighbor and Support vector machine

Decision trees and SVMs have complementary operational profiles: where one is accurate, the other often is not, and vice versa

References

1. Seema Sharma, Jitendra Agrawal, Shikha Agarwal, Sanjeev Sharma, “Machine Learning Techniques for Data Mining: A Survey”, 979-1-4799-1597-2/13, IEEE, 2013
2. Krisztian Balog, Heri Ramampiaro, “Cumulative Citation Recommendation: Classification vs Ranking”, ACM 978-1-4503-2034-4/13/07, 2013
3. Mohd Fauzi bin Othman, Thomas Moh Shan Yau, “Comparison of Different Classification Techniques Using WEKA for Breast Cancer”, pp. 520-523, Springer-Verlag Berlin Heidelberg, 2007
4. Francesco Ricci, Lior Rokach, Bracha Shapira, Paul B. Kantor, “Recommender Systems Handbook”,