
SEMINAR ON

A SURVEY OF CLASSIFICATION
TECHNIQUES ON BIG DATA
Agenda
Introduction
Classification techniques
Comparative study
Conclusion
References
15-Oct-14 2
Introduction
Classification is a technique that maps data into predefined classes or groups
Predicts group membership for data instances
Supports knowledge discovery
Supports future planning
Predicts categorical class labels
Constructs a model from the training set and the values (class labels) of a classifying attribute, then uses that model to classify new data
Introduction (Cont.)
Classification
Supervised: human-guided classification; predictive or directed; example: decision tree
Unsupervised: calculated by software; descriptive or undirected; example: K-means
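K-means, named above as the unsupervised example, never sees class labels; it only groups points around k centroids. The following is a minimal sketch, assuming numeric points, Euclidean distance and the first k points as starting centroids (practical implementations usually initialise randomly or with k-means++). The function name is illustrative.

```python
import math

def kmeans(points, k, iterations=100):
    """Cluster points (tuples of numbers) into k groups; returns (centroids, assignments)."""
    # Assumption: the first k points serve as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, new_assignments) if a == c]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_assignments == assignments and new_centroids == centroids:
            break  # converged: nothing changed in this pass
        assignments, centroids = new_assignments, new_centroids
    return centroids, assignments

centroids, groups = kmeans([(1, 1), (1.5, 2), (8, 8), (9, 8.5)], 2)
print(groups)  # [0, 0, 1, 1]
```

No label is supplied anywhere: the groups emerge from distances alone, which is the "calculated by software" side of the split above.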
Classification Techniques
Supervised classification techniques: decision tree, support vector machine, Naive Bayes and nearest neighbor
Decision Tree
A decision tree is a flowchart-like tree structure that classifies instances by sorting them on their attribute values
It generates rules for classifying the dataset
Algorithms:
ID3 (Iterative Dichotomiser 3) and C4.5: entropy and information gain as the split measure; top-down construction procedure; pruning through a single-pass algorithm
CART (Classification and Regression Tree): Gini diversity index as the split measure; constructs a binary decision tree; post-pruning based on cost
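The split measures listed above can be computed directly. The sketch below is a hedged illustration, assuming categorical attributes stored as dicts and string class labels: it computes entropy and information gain (the ID3/C4.5-style measure) and the Gini index (the CART-style measure), then picks the attribute with the highest information gain as ID3 does at each node. The toy dataset and all function names are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy in bits: sum of p * log2(1/p) over the classes."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def gini(labels):
    """Gini diversity index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(labels)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def best_attribute(rows, labels, attributes):
    """ID3-style choice: split on the attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))

# Hypothetical toy data: "outlook" fully determines the class, "windy" does not.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

print(entropy(labels), gini(labels))                       # 1.0 0.5
print(best_attribute(rows, labels, ["outlook", "windy"]))  # outlook
```

Splitting on "outlook" leaves two pure children (entropy 0), so its gain equals the parent's full 1 bit, while "windy" leaves both children as mixed as the parent and gains nothing.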
Bayesian Network
A Bayesian network is a graphical model of the probabilistic dependencies among a set of variables (features)
Shows high accuracy and speed
Key properties: probabilistic learning, incremental learning, probabilistic prediction, and serving as a standard against which other methods can be measured
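Naive Bayes, listed earlier among the supervised techniques, is the simplest Bayesian network: the class is the single parent of every feature, and the features are assumed independent given the class. The sketch below assumes categorical features and Laplace (add-one) smoothing; it learns incrementally by updating counts one example at a time and returns class probabilities rather than a bare label, mirroring the properties above. The class and method names are hypothetical, and a general Bayesian network with arbitrary dependencies is not shown.

```python
from collections import defaultdict

class NaiveBayes:
    """Categorical Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_counts = defaultdict(int)    # N(class)
        self.feature_counts = defaultdict(int)  # N(feature = value, class)
        self.feature_values = defaultdict(set)  # distinct values seen per feature

    def update(self, features, label):
        """Incremental learning: fold one labelled example into the counts."""
        self.class_counts[label] += 1
        for name, value in features.items():
            self.feature_counts[(name, value, label)] += 1
            self.feature_values[name].add(value)

    def predict_proba(self, features):
        """Posterior P(class | features), normalised over the known classes."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            score = count / total  # prior P(class)
            for name, value in features.items():
                k = len(self.feature_values[name])
                # Smoothed likelihood P(value | class)
                score *= (self.feature_counts[(name, value, label)] + 1) / (count + k)
            scores[label] = score
        norm = sum(scores.values())
        return {label: s / norm for label, s in scores.items()}

    def predict(self, features):
        probs = self.predict_proba(features)
        return max(probs, key=probs.get)

nb = NaiveBayes()
nb.update({"outlook": "sunny"}, "play")
nb.update({"outlook": "sunny"}, "play")
nb.update({"outlook": "rain"}, "stay")
print(nb.predict({"outlook": "sunny"}))  # play
```

Because training only increments counters, new examples can be added at any time without retraining from scratch.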
Nearest Neighbor
Heuristic techniques are used to select a good value of k
It has some strong consistency results
Instance-based classifiers work by storing training records and using them to predict the class
label of unseen cases
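The instance-based idea above fits in a few lines: the "model" is just the stored training records, and an unseen case takes the majority label of its k closest records. This sketch assumes numeric feature vectors, Euclidean distance and majority voting; leave-one-out accuracy is shown as one possible heuristic for choosing k, not the only one. Names are illustrative.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Sort all stored records by Euclidean distance to the query point.
    neighbours = sorted(
        zip(train_points, train_labels), key=lambda pair: math.dist(pair[0], query)
    )[:k]
    votes = Counter(label for _, label in neighbours)
    # Ties go to the label seen first, i.e. the one held by the nearer neighbour.
    return votes.most_common(1)[0][0]

def choose_k(points, labels, candidates=(1, 3, 5)):
    """Heuristic: pick the k with the best leave-one-out accuracy."""
    def loo_accuracy(k):
        hits = 0
        for i in range(len(points)):
            rest_points = points[:i] + points[i + 1:]
            rest_labels = labels[:i] + labels[i + 1:]
            hits += knn_predict(rest_points, rest_labels, points[i], k) == labels[i]
        return hits / len(points)
    return max(candidates, key=loo_accuracy)

points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
labels = ["A", "A", "A", "B", "B", "B"]
print(knn_predict(points, labels, (2, 2), k=3))  # A
print(choose_k(points, labels))                  # 1
```

Every prediction scans all stored records, which is why computational cost and storage appear as disadvantages in the comparison that follows.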
Support Vector Machine (SVM)
It trains a classifier to predict the class of a new sample
Key implementation elements: mathematical programming and kernel functions
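A kernel function computes an inner product in a transformed feature space without building that space explicitly, which is what lets an SVM draw non-linear boundaries. The sketch shows three common kernels and the decision function of an already-trained SVM, sign(sum_i alpha_i y_i K(x_i, x) + b). The support vectors, coefficients alpha_i and bias b would normally come from the mathematical-programming (quadratic optimisation) step; the values below are hand-picked assumptions, not a trained model, and the function names are hypothetical.

```python
import math

def linear_kernel(x, z):
    """Plain dot product <x, z>."""
    return sum(a * b for a, b in zip(x, z))

def polynomial_kernel(x, z, degree=2, c=1.0):
    """(<x, z> + c) ** degree."""
    return (linear_kernel(x, z) + c) ** degree

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian RBF: exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def phi(x):
    """Explicit degree-2 feature map for 2-D input; phi(x).phi(z) equals polynomial_kernel(x, z)."""
    x1, x2 = x
    r = math.sqrt(2.0)
    return (x1 * x1, x2 * x2, r * x1 * x2, r * x1, r * x2, 1.0)

def svm_decision(x, support_vectors, alphas, sv_labels, bias, kernel):
    """Return +1 or -1 from sum_i alpha_i * y_i * K(sv_i, x) + b (labels in {-1, +1})."""
    score = sum(
        a * y * kernel(sv, x)
        for sv, a, y in zip(support_vectors, alphas, sv_labels)
    ) + bias
    return 1 if score >= 0 else -1

# Kernel trick: the same value with and without building the 6-D feature space.
x, z = (1.0, 2.0), (3.0, -1.0)
print(polynomial_kernel(x, z), linear_kernel(phi(x), phi(z)))  # both approximately 4.0

# Hand-picked (not trained) support vectors, coefficients and bias.
svs, alphas, ys = [(0.0, 0.0), (3.0, 3.0)], [1.0, 1.0], [-1, 1]
print(svm_decision((2.5, 2.5), svs, alphas, ys, 0.0, rbf_kernel))  # 1
print(svm_decision((0.5, 0.5), svs, alphas, ys, 0.0, rbf_kernel))  # -1
```

Only the support vectors enter the decision function, so prediction cost grows with their number.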
Support Vector Machine (SVM) (Cont.)
Algorithms and when to use them:
Linear SVM: the data is linearly separable
Non-linear SVM: the data is not suitable for a C-class hypothesis
Non-separable SVM: noisy data is present
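For the linear and noisy (non-separable) cases, a soft-margin linear SVM can be trained by minimising the regularised hinge loss lam/2 * ||w||^2 + mean(max(0, 1 - y * <w, x>)). The sketch swaps the quadratic-programming solver for plain subgradient descent with a Pegasos-style step size 1/(lam * t); this is simpler to show but is not how production libraries solve the problem. The regularisation strength lam, the epoch count and the deterministic sweep over the data are assumptions, the bias is folded into w (and therefore regularised, a common simplification), and labels must be -1 or +1. Function names are hypothetical.

```python
def train_linear_svm(points, labels, lam=0.01, epochs=200):
    """Soft-margin linear SVM via subgradient descent on the hinge loss."""
    # Append a constant 1.0 feature so the last weight acts as the bias.
    data = [(tuple(x) + (1.0,), y) for x, y in zip(points, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in data:  # deterministic sweep; Pegasos proper samples at random
            t += 1
            eta = 1.0 / (lam * t)
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink w: the gradient step for the regulariser.
            w = [(1 - eta * lam) * wi for wi in w]
            # Hinge-loss step: only points inside the margin pull w towards them.
            if margin < 1:
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def predict_linear_svm(w, x):
    """Sign of <w, (x, 1)>; ties go to +1."""
    score = sum(wi * xi for wi, xi in zip(w, tuple(x) + (1.0,)))
    return 1 if score >= 0 else -1

pts = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
ys = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(pts, ys)
print(predict_linear_svm(w, (5, 4)), predict_linear_svm(w, (-4, -5)))  # 1 -1
```

Points already outside the margin contribute nothing to the update, which is how the soft-margin formulation tolerates a few noisy points. The non-linear case would replace the dot product with a kernel function.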
Comparative Study
Technique | Advantages | Disadvantages
Decision tree | Simple to understand and interpret; requires little data preparation | Makes only locally optimal decisions at each node; does not generalize well from the training data
Bayesian network | Able to handle noisy data; well suited for continuous features | Training time can be large; poor interpretability; requires a number of parameters to be set
K-nearest neighbor | Easy to understand and to implement as a classification technique | Computationally expensive; very sensitive to the local structure of the data; requires large storage
Support vector machine | Finds the best classification function for the training data; less prone to overfitting than other methods | Computationally expensive; requires a lot of time and storage; poor interpretability of results
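Comparative Study (Cont.)
Algorithm | Predictive accuracy | Fitting speed | Prediction speed | Memory usage | Easy to interpret | Handles categorical predictors
Trees | Low | Fast | Fast | Low | Yes | Yes
SVM | High | Medium | * | * | * | No
Naive Bayes | Low | ** | ** | ** | Yes | Yes
Nearest neighbor | *** | Fast*** | Medium | High | No | Yes***
* SVM prediction speed and memory usage are good when there are few support vectors and poor when there are many; with a kernel function the model is hard to interpret
** Naive Bayes speed and memory usage are good for simple distributions but can be poor for kernel distributions and large data sets
*** Nearest neighbor usually has good predictive accuracy in low dimensions but can be poor in high dimensions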
Conclusion
Here I have discussed several classification techniques: decision tree, Bayesian network, nearest neighbor and support vector machine
Decision trees and SVMs have complementary operational profiles: trees are fast and easy to interpret but have lower predictive accuracy, while SVMs are more accurate but slower to fit and harder to interpret
References
1. Seema Sharma, Jitendra Agrawal, Shikha Agarwal, Sanjeev Sharma, "Machine Learning Techniques for Data Mining: A Survey," IEEE, 2013, 979-1-4799-1597-2/13.
2. Krisztian Balog, Heri Ramampiaro, "Cumulative Citation Recommendation: Classification vs. Ranking," ACM, 2013, 978-1-4503-2034-4/13/07.
3. Mohd Fauzi bin Othman, Thomas Moh Shan Yau, "Comparison of Different Classification Techniques Using WEKA for Breast Cancer," Springer-Verlag Berlin Heidelberg, 2007, pp. 520-523.
4. Francesco Ricci, Lior Rokach, Bracha Shapira, Paul B. Kantor, Recommender Systems Handbook, Springer Science+Business Media, LLC, 2011, ISBN 978-0-387-85819-7.
