1

Introduction to WEKA
2
Outline
! Introduction
! History
! Architecture
! Pre-processing
! Operator
! Classification
! Clustering
! Association rule
! Visualization
! API
3
Introduction
! WEKA (Waikato Environment for Knowledge Analysis) is developed by the University of Waikato in New Zealand
! http://www.cs.waikato.ac.nz/ml/weka/
! Goal:
! Improve agricultural development in New Zealand
! WEKA is written in Java (Java 1.2) and is available for Linux, Windows, and Macintosh.
4
Introduction (Cont.)
! Algorithms contained in WEKA can either
be applied directly to a dataset or called
from your own Java code
! WEKA provides tools for data preprocessing, regression, visualization, and result evaluation
! WEKA is open-source software issued under the GNU General Public License
5
History
[Figure: timeline of WEKA releases showing the book version, the GUI version, and the development version, with the years 1999 and 2003 marked]
6
Architecture
7
Preprocessing
! Input
! The ARFF file
! Access
! URL
! Filter
! Select attributes
! Apply filters
8
Input: ARFF
! ARFF (Attribute-Relation File Format): an ASCII text file
! Header information:
! Contains the name of the relation, a list of attributes, and their types
! @relation <relation-name>
! @attribute <attribute-name> <datatype>
! Data information:
! Contains the data declaration line (@data) and the actual instance lines
! Missing value: ?
9
An example for the ARFF
% ARFF for the weather data
@relation weather
@attribute outlook {sunny,overcast,rainy}
@attribute temperature numeric
@attribute windy {true,false}
@attribute play {yes,no}
@data
sunny,85,false,no
overcast,80,true,yes
rainy,71,true,no
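The header/data layout of ARFF can be made concrete with a small reader. The Python sketch below is not WEKA's own loader. It is a minimal illustration that assumes only nominal and numeric attributes, no quoted values, and no sparse format. The names `parse_arff` and `SAMPLE` are hypothetical, and one temperature value is replaced by `?` to show how a missing value might be handled.

```python
def parse_arff(text):
    """Parse a tiny subset of ARFF: @relation, @attribute, @data, % comments, ? for missing."""
    relation = None
    attributes = []   # (name, kind); kind is "numeric" or a list of nominal values
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        lower = line.lower()
        if in_data:
            values = [v.strip() for v in line.split(",")]
            row = []
            for (name, kind), value in zip(attributes, values):
                if value == "?":
                    row.append(None)           # missing value
                elif kind == "numeric":
                    row.append(float(value))
                else:
                    row.append(value)          # nominal value kept as a string
            rows.append(row)
        elif lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, kind = line.split(None, 2)
            if kind.startswith("{"):
                kind = [v.strip() for v in kind.strip("{}").split(",")]
            else:
                kind = kind.lower()
            attributes.append((name, kind))
        elif lower.startswith("@data"):
            in_data = True
    return relation, attributes, rows


# Illustrative input modelled on the weather example (third row has a missing value).
SAMPLE = """% ARFF for the weather data
@relation weather
@attribute outlook {sunny,overcast,rainy}
@attribute temperature numeric
@attribute windy {true,false}
@attribute play {yes,no}
@data
sunny,85,false,no
overcast,80,true,yes
rainy,?,true,no
"""

relation, attributes, rows = parse_arff(SAMPLE)
print(relation, attributes, rows)
```

The header is read first because each data value can only be interpreted once its attribute type is known.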
10
Input: Access
! Supported in versions 3-2-1 and 3-3 or later
! Access goes through ODBC
! Locate the file DatabaseUtils.props and edit its contents
! Details are on the page:
http://www.cs.waikato.ac.nz/~ml/weka/opening_windows_DBs.html
11
Load a dataset
Your database name
12
Load a dataset (Cont.)
Attributes in the database
The values of the attribute frequency
13
Filter
! Attribute selection
! Attribute evaluator
! Consider one attribute at a time
! Consider a set of attributes together
! Search methods
! Apply filters
! It is used to delete specified attributes from
the dataset
14
Select attributes
The message about every action:
error, recommended resolutions
15
Select attributes-consider one attribute
Attribute evaluator:
InfoGainAttributeEval
Search method: Ranker
16
Select attributes-consider one attribute
(Cont.)
Select the attribute to use as the class
17
Select attributes-consider one attribute
(Cont.)
The information gain of each attribute with respect to the class
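To make the idea of scoring one attribute at a time concrete, here is a minimal Python sketch of information gain for nominal attributes, followed by a sort that orders attributes by score, in the spirit of a ranking search. It is not WEKA's implementation. The function names and the four-instance toy data are made up for illustration.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(values, labels):
    """H(class) - H(class | attribute) for one nominal attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Made-up toy data: four instances, two candidate attributes, one class.
outlook = ["sunny", "sunny", "overcast", "rainy"]
windy = ["false", "true", "false", "true"]
play = ["no", "no", "yes", "yes"]

# Rank attributes from most to least informative about the class.
ranking = sorted(
    {"outlook": info_gain(outlook, play), "windy": info_gain(windy, play)}.items(),
    key=lambda item: item[1],
    reverse=True,
)
print(ranking)
```

In this toy data `outlook` separates the classes perfectly while `windy` tells nothing about the class, so `outlook` is ranked first.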
18
Select attributes-consider a set of
attributes
Attribute evaluator:
CfsSubsetEval
Search method: BestFirst
19
Select attributes-consider a set of
attributes (Cont.)
Select the attribute to use as the class
20
Select attributes-consider a set of
attributes (Cont.)
The worth of the selected subset of attributes
21
Help
more
22
Apply filters
Push here !
Select
AttributeFilter
The attributes
you want to retain
23
Apply filters (Cont.)
Press Replace; only the attributes you retained will be shown
24
Visualize input
! Each training instance is represented as a
point in the visualization
! You can see the relationship between any two attributes you select
! View the details:
! View the detail of a point by clicking the point
! Select a set of points to observe
25
Visualize input (Cont.)
Zoom in/ out
The way you see details of data
26
Classification
! Select a classifier
! Select an algorithm
! Here, we introduce how to set J48
! Test options
! Specify how to train/test data
! Load testing data
! Percentage split
! Cross-validation
! Select the class
! Output
! Decision tree
! Statistical evaluation
27
Classification (Cont.)
! What is cross-validation?
! Select the number of folds (partitions of the data), e.g., 3
! Split the data into 3 approximately equal partitions
! Each partition in turn is used for testing; the remainder is used for training
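The fold rotation described above can be sketched in a few lines of Python. This is only an illustration of the partitioning idea. The function name `k_fold_splits` is hypothetical, and the sketch does no shuffling or stratification, which real tools typically perform before splitting.

```python
def k_fold_splits(n_instances, k):
    """Yield (train_indices, test_indices) once per fold."""
    indices = list(range(n_instances))
    # Deal indices round-robin into k folds; fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        # Every fold except the i-th is used for training.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


splits = list(k_fold_splits(10, 3))
for train, test in splits:
    print("train:", train, "test:", test)
```

Each instance appears in exactly one test fold, so every instance is used for testing once and for training k - 1 times.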
28
Classification (Cont.)
Press here !
Select J48
29
Settings of the J48
! reducedErrorPruning
! Whether to use reduced-error pruning
! confidenceFactor
! The confidence threshold for pruning (lower values prune more drastically)
! binarySplits
! Whether to use binary splits on nominal attributes when building the tree
! minNumObj
! The minimum number of instances per leaf
! numFolds
! The number of folds the data is partitioned into for reduced-error pruning (one fold is used for pruning, the rest for growing the tree)
! subtreeRaising
! Whether to consider the subtree-raising operation when pruning
30
Classification (Cont.)
How many folds to use for cross-validation
Select the class
31
Classification (Cont.)
Output: includes a decision tree and a statistical evaluation
32
Visualize the output of classification
Right-click and select the kind of display you want
33
View the decision tree
34
Clustering
! WEKA provides three algorithms
! We take KMeans as an example
! Input file: iris.arff
! Output
! Clusters
! Probability distribution for all attributes
! The likelihood of training data with respect to
the clustering that it generates
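For readers unfamiliar with k-means, the following Python sketch shows the two alternating steps (assign each point to its nearest centroid, then recompute each centroid as a mean). It is not WEKA's SimpleKMeans. Seeding with the first k points, the fixed iteration count, and the four made-up 2-D points are all assumptions chosen to keep the example deterministic.

```python
def k_means(points, k, iterations=10):
    """Plain k-means on numeric tuples; returns (centroids, assignment)."""
    centroids = [list(p) for p in points[:k]]   # assumption: seed with the first k points
    assignment = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignment


# Made-up 2-D points forming two obvious groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.0, 2.0), (5.0, 6.0)]
centroids, assignment = k_means(points, k=2)
print(centroids, assignment)
```

The returned centroids correspond to the "centroid for each cluster" that a k-means run reports, and the assignment lists the cluster index of each input point.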
35
Clustering (Cont.)
Select SimpleKMeans
Determine the number of clusters
36
Clustering (Cont.)
Centroid for each cluster
Select the way you
train the dataset
37
Visualize the result of clustering
Right-click and select the kind of display you want
38
View the clusters
39
Association rules
! WEKA supports only the Apriori algorithm
! It handles only nominal attributes
! Input file: weather.nominal.arff
! We can set the number of rules, the minimum support, and the minimum confidence
! The output is all rules that satisfy these constraints
40
Association rules (Cont.)
41
Association rules (Cont.)
! Main settings for the association rules
! lowerBoundMinSupport:
! The lower bound for the minimum support
! minMetric:
! The minimum confidence
! delta:
! The amount by which the minimum support is decreased in each iteration
! Apriori terminates when there are enough rules (numRules) or the support reaches the lower bound
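The interplay of these settings can be sketched as a loop that lowers the minimum support step by step. The Python code below is not WEKA's Apriori. It is a simplified illustration that only considers rules with a single item on each side, and the function names, the starting support of 1.0, and the toy transactions are assumptions made for the example.

```python
from itertools import permutations


def mine_rules(transactions, min_support, min_confidence):
    """Rules {a} => {b} over single items that meet both thresholds."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        count_a = sum(1 for t in transactions if a in t)
        count_ab = sum(1 for t in transactions if a in t and b in t)
        if count_a and count_ab / n >= min_support and count_ab / count_a >= min_confidence:
            rules.append((a, b, count_ab / n, count_ab / count_a))
    return rules


def lower_support_until_enough(transactions, num_rules, min_metric,
                               lower_bound, delta, start=1.0):
    """Decrease the minimum support by delta until enough rules are found
    or the next step would fall below the lower bound."""
    support = start           # assumption: begin at the highest possible support
    cycles = 0
    while True:
        cycles += 1
        rules = mine_rules(transactions, support, min_metric)
        if len(rules) >= num_rules or support - delta < lower_bound:
            return rules, support, cycles
        support = round(support - delta, 10)   # rounding limits floating-point drift


# Made-up transactions for illustration.
transactions = [{"bread", "milk"}, {"bread", "milk"}, {"bread", "butter"}, {"milk"}]
rules, final_support, cycles = lower_support_until_enough(
    transactions, num_rules=2, min_metric=0.6, lower_bound=0.25, delta=0.25
)
print(rules, final_support, cycles)
```

With these toy values no rule qualifies at support 1.0 or 0.75; at 0.5 two rules meet the confidence threshold, so the loop stops after three cycles.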
42
Association rules (Cont.)
43
View the association rules
Right-click
44
Association rules (Cont.)
! Output
! The minimum support when Apriori stops
! The number of iterations performed during rule generation under the conditions we set
! The large itemsets
! Rules:
! The number preceding the => indicates the rule support
! Rule confidence
45
API
WEKAHOME/doc/packages.html (WEKAHOME is the directory where you installed WEKA)
http://www.cs.waikato.ac.nz/~ml/weka/doc_gui/packages.html
46
Packages
! You can see all packages in
WEKAHOME/weka.jar
47
Packages (Cont.)
! weka.core
! FastVector
! Implements a fast vector class without synchronized methods.
(Synchronized methods tend to be slow)
! Attribute
! Class for handling an attribute
! Instance
! Class for handling an instance (a record)
! Instances
! Class for handling an ordered set of weighted instances (all
data)
! Other packages
48
How To Use API
! Example: the Iris dataset
! Set up the attribute information and create the dataset
! Build the instances
! Apply a filter
! Generate the classifier
Double-click this icon and use a text editor to open it.

49
How To Use API (Cont.)

Attributes

Instance
setDataset
50
API Tutorial
! http://www4.cs.umanitoba.ca/~jacky/Teaching/Courses/74.436/current/Software/weka-tutorial.pdf
! Section 8.4
51
More Documents
! http://www.cs.waikato.ac.nz/~ml/weka/gui_explorer.html
! http://www.cs.waikato.ac.nz/~ml/weka/Experiments.pdf
! WEKA Website