
Machine Learning with WEKA

Eibe Frank
Department of Computer Science,
University of Waikato, New Zealand

WEKA: A Machine Learning Toolkit
• The Explorer
  - Classification and Regression
  - Clustering
  - Association Rules
  - Attribute Selection
  - Data Visualization
• The Experimenter
• The Knowledge Flow GUI
• Conclusions

WEKA: the bird

Copyright: Martin Kramer (mkramer@wxs.nl)


6/29/2010 University of Waikato 2

Machine Learning for Data Mining 1


WEKA: the software
• Machine learning/data mining software written in Java (distributed under the GNU General Public License)
• Used for research, education, and applications
• Complements “Data Mining” by Witten & Frank
• Main features:
  - Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
  - Graphical user interfaces (incl. data visualization)
  - Environment for comparing learning algorithms


WEKA: versions
• There are several versions of WEKA:
  - WEKA 3.0: “book version” compatible with the description in the data mining book
  - WEKA 3.2: “GUI version” adds graphical user interfaces (the book version is command-line only)
  - WEKA 3.3: “development version” with lots of improvements
• This talk is based on the latest snapshot of WEKA 3.3 (soon to be WEKA 3.4)


WEKA only deals with “flat” files
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,?,no,not_present
...
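ARFF is simple enough that a reader for the subset shown above fits in a few lines. The following is a minimal plain-Python sketch, not WEKA's own (Java) loader. It handles only numeric and nominal attributes, treats `?` as a missing value (returned as `None`), skips `%` comment lines, and ignores quoting and the sparse format. The function name `parse_arff` is hypothetical.

```python
def parse_arff(text):
    """Parse a small ARFF subset: @relation, numeric/nominal @attribute, @data.

    Returns (relation, attributes, rows). Each attribute is (name, "numeric")
    or (name, [legal values]); a "?" in the data becomes None (missing).
    Quoted values, sparse rows, string and date attributes are not handled.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blank lines and ARFF comments
            continue
        if in_data:
            values = [v.strip() for v in line.split(",")]
            if len(values) != len(attributes):
                raise ValueError(f"expected {len(attributes)} values: {line}")
            row = []
            for (name, kind), v in zip(attributes, values):
                if v == "?":
                    row.append(None)  # missing value
                elif kind == "numeric":
                    row.append(float(v))
                elif v in kind:  # nominal: must be one of the declared values
                    row.append(v)
                else:
                    raise ValueError(f"{v!r} is not a legal value of {name}")
            rows.append(row)
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            _, name, spec = line.split(None, 2)
            if spec.startswith("{"):
                kind = [v.strip() for v in spec.strip("{}").split(",")]
            elif spec.lower() in ("numeric", "real", "integer"):
                kind = "numeric"
            else:
                raise ValueError(f"unsupported attribute type: {spec}")
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


SAMPLE = """@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
38,female,non_anginal,?,no,not_present
"""

relation, attributes, rows = parse_arff(SAMPLE)
```

Every row is one instance and every column one attribute. This is the "flat" table the slide refers to.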


Explorer: pre-processing the data

• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
• WEKA contains filters for:
  - Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
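Two of the filter types listed above can be sketched on a single numeric column: normalization and unsupervised equal-width discretization. This is plain Python illustrating the idea, not WEKA code. It assumes missing values are represented as `None`, and the function names are hypothetical. WEKA's filters operate on whole datasets and offer more options.

```python
def normalize(values):
    """Min-max scale numeric values to [0, 1]; missing values (None) pass through."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    span = hi - lo
    # A constant column (span == 0) is mapped to 0.0 to avoid dividing by zero
    return [None if v is None else ((v - lo) / span if span else 0.0) for v in values]


def discretize(values, bins):
    """Equal-width discretization into `bins` intervals, returning bin indices 0..bins-1."""
    present = [v for v in values if v is not None]
    lo, hi = min(present), max(present)
    width = (hi - lo) / bins
    out = []
    for v in values:
        if v is None:
            out.append(None)
        elif width == 0:
            out.append(0)
        else:
            # The maximum value would land in bin `bins`, so clamp it into the last bin
            out.append(min(int((v - lo) / width), bins - 1))
    return out
```

For example, the cholesterol values 233, 286, 229 and a missing value fall into two equal-width bins as `[0, 1, 0, None]`.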


Explorer: building “classifiers”

• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  - Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayes’ nets, …
• “Meta”-classifiers include:
  - Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
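The following plain-Python sketch (not WEKA's Java classes) shows the difference between a base learner and a "meta"-classifier. An instance-based classifier predicts from the nearest stored training instances. Bagging wraps any base learner, trains it on bootstrap samples and takes a majority vote. Bagging is usually paired with unstable learners such as decision trees; k-nearest-neighbour is used here only to keep the sketch short. All names are hypothetical.

```python
import math
import random
from collections import Counter


def knn_predict(train, query, k=1):
    """Instance-based classification: majority class of the k nearest neighbours.

    `train` is a list of (feature_tuple, class_label) pairs with numeric features.
    Distance is unweighted Euclidean on raw values (no normalization). On a tied
    vote, the label met first among the nearest neighbours wins.
    """
    nearest = sorted(train, key=lambda inst: math.dist(inst[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def bagging_predict(train, query, n_models=10, seed=1, k=1):
    """Bagging meta-classifier around knn_predict.

    Each model sees a bootstrap sample (drawn with replacement, same size as
    `train`); the final prediction is a majority vote over the models.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        sample = [rng.choice(train) for _ in train]
        votes[knn_predict(sample, query, k)] += 1
    return votes.most_common(1)[0][0]
```

Because `bagging_predict` only calls the base learner through its prediction function, any other classifier with the same signature could be plugged in. This is the sense in which meta-classifiers are independent of the scheme they wrap.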


Explorer: clustering data

• WEKA contains “clusterers” for finding groups of similar instances in a dataset
• Implemented schemes are:
  - k-Means, EM, Cobweb, X-means, FarthestFirst
• Clusters can be visualized and compared to “true” clusters (if given)
• Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
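The simplest of the listed schemes, k-means, alternates two steps. First, each instance is assigned to its nearest cluster centre. Then each centre is moved to the mean of its instances, and the loop repeats until the centres stop changing. The following is a minimal plain-Python sketch assuming numeric tuples, Euclidean distance and random initial centres. WEKA's own k-means implementation may differ in initialization and other details, and the function names are hypothetical.

```python
import math
import random


def assign(points, centroids):
    """Index of the nearest centroid for every point (ties go to the lower index)."""
    return [min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points]


def k_means(points, k, max_iter=100, seed=0):
    """Lloyd-style k-means; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = [tuple(map(float, p)) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        labels = assign(points, centroids)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep an empty cluster's centre where it was
            new.append(tuple(sum(xs) / len(members) for xs in zip(*members))
                       if members else centroids[j])
        if new == centroids:  # converged: assignments will not change any more
            break
        centroids = new
    return centroids, assign(points, centroids)
```

k-means produces hard assignments, not a probability distribution, so the log-likelihood evaluation mentioned above applies to schemes such as EM rather than to this sketch.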


Explorer: finding associations

• WEKA contains an implementation of the Apriori algorithm for learning association rules
  - Works only with discrete data
• Can identify statistical dependencies between groups of attributes:
  - milk, butter ⇒ bread, eggs (with confidence 0.9 and support 2000)
• Apriori can compute all rules that have a given minimum support and exceed a given confidence
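Apriori relies on the fact that every subset of a frequent itemset is itself frequent. It therefore grows itemsets level by level and only counts candidates whose subsets all passed the previous level. A rule X ⇒ Y then has confidence support(X ∪ Y) / support(X). The following is a minimal plain-Python sketch, not WEKA's implementation. Support is an absolute transaction count, as in the slide's example, and rules are kept if their confidence is at least the threshold. The function names and data layout are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search; returns {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    size = 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        size += 1
        # Join pairs of frequent itemsets; keep a candidate only if all its
        # (size - 1)-subsets are frequent (the Apriori pruning step)
        level = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(s) in current for s in combinations(union, size - 1)
            ):
                level.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """All rules lhs => rhs from frequent itemsets with confidence >= min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                confidence = support / frequent[lhs]  # lhs is frequent by downward closure
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules
```

Transactions here are sets of items. With WEKA's attribute-value data, each item would correspond to one attribute=value pair.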


Explorer: attribute selection

• Panel that can be used to investigate which (subsets of) attributes are the most predictive ones
• Attribute selection methods contain two parts:
  - A search method: best-first, forward selection, random, exhaustive, genetic algorithm, ranking
  - An evaluation method: correlation-based, wrapper, information gain, chi-squared, …
• Very flexible: WEKA allows (almost) arbitrary combinations of these two
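One of the simplest combinations above uses information gain as the evaluation method and ranking as the search method. Each attribute is scored by how much it reduces the entropy of the class, and the attributes are then sorted by that score. The following is a plain-Python sketch for nominal attributes only. It assumes a hypothetical dict mapping attribute names to value columns and is not WEKA's evaluator.

```python
import math
from collections import Counter


def entropy(labels):
    """Class entropy in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(column, labels):
    """Entropy of the class minus the expected entropy after splitting on `column`."""
    n = len(labels)
    remainder = 0.0
    for value in set(column):
        subset = [lab for v, lab in zip(column, labels) if v == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def rank_attributes(columns, labels):
    """Ranking search: score every attribute independently, best first."""
    scores = {name: info_gain(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Ranking scores attributes one at a time, so it cannot detect redundancy between attributes. Subset evaluators such as the correlation-based or wrapper methods, combined with a search like best-first, address that at a higher cost.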



Explorer: data visualization
• Visualization is very useful in practice: e.g., it helps to determine the difficulty of the learning problem
• WEKA can visualize single attributes (1-d) and pairs of attributes (2-d)
  - To do: rotating 3-d visualizations (XGobi-style)
• Color-coded class values
• “Jitter” option to deal with nominal attributes (and to detect “hidden” data points)
• “Zoom-in” function
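The jitter option can be pictured as adding a small random offset to each point's plotted position. Instances that share the same nominal value then no longer sit exactly on top of one another. The following is a plain-Python sketch assuming nominal values are plotted at integer positions. The `amount` parameter and function name are hypothetical, not WEKA's settings.

```python
import random


def jitter(values, categories, amount=0.2, seed=0):
    """Plot position for each nominal value: its category index plus uniform noise.

    Noise is drawn from [-amount, amount], so points stay near their category.
    """
    rng = random.Random(seed)
    index = {c: i for i, c in enumerate(categories)}
    return [index[v] + rng.uniform(-amount, amount) for v in values]
```

Without jitter, a hundred "male" instances would be drawn as a single dot. With jitter, they spread into a visible cloud whose density reveals how many points are hidden there.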

Performing experiments
• Experimenter makes it easy to compare the performance of different learning schemes
• For classification and regression problems
• Results can be written to a file or database
• Evaluation options: cross-validation, learning curve, hold-out
• Can also iterate over different parameter settings
• Significance testing built in!
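The core of such an experiment has two parts. First, split the data into k cross-validation folds so that every instance is tested exactly once. Then compare two schemes' per-fold scores with a paired significance test. The following plain-Python sketch uses an unstratified split and the textbook paired t statistic. This plain test ignores the overlap between training sets and tends to be overconfident, and WEKA's own tester may apply a correction. All names are hypothetical.

```python
import math
import random


def cv_folds(n, k, seed=0):
    """k-fold cross-validation: list of (train_indices, test_indices) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    tests = [idx[i::k] for i in range(k)]  # every index lands in exactly one fold
    return [(sorted(set(idx) - set(test)), sorted(test)) for test in tests]


def paired_t(scores_a, scores_b):
    """Paired t statistic for per-fold scores of two schemes on the same folds."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)
    if var == 0:
        return math.copysign(math.inf, mean) if mean else 0.0
    return mean / math.sqrt(var / n)
```

The statistic would be compared against a t distribution with n - 1 degrees of freedom. Pairing matters because both schemes see the same folds, so fold-to-fold variation cancels out of the differences.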


The Knowledge Flow GUI

• New graphical user interface for WEKA
• JavaBeans-based interface for setting up and running machine learning experiments
• Data sources, classifiers, etc. are beans and can be connected graphically
• Data “flows” through components: e.g., “data source” -> “filter” -> “classifier” -> “evaluator”
• Layouts can be saved and loaded again later
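The Knowledge Flow connects JavaBeans components graphically. The following sketch only imitates the data-flow idea in plain Python: each hypothetical component is a function whose output becomes the next component's input, mirroring the “data source” -> “filter” -> “classifier” -> “evaluator” chain above. The data set and all component names are invented for illustration.

```python
def source(_):
    """Hypothetical data source: (numeric value, class label) instances."""
    return [(1.0, "a"), (2.0, "a"), (8.0, "b"), (9.0, "b")]


def normalize_filter(data):
    """Filter: scale the attribute by its maximum."""
    hi = max(x for x, _ in data)
    return [(x / hi, y) for x, y in data]


def mean_classifier(data):
    """Classifier: nearest class mean on one numeric attribute; passes data along."""
    means = {}
    for label in sorted({y for _, y in data}):
        xs = [x for x, y in data if y == label]
        means[label] = sum(xs) / len(xs)

    def predict(x):
        return min(means, key=lambda lab: abs(x - means[lab]))

    return predict, data


def accuracy_evaluator(model_and_data):
    """Evaluator: accuracy on the data it receives (training data, for brevity)."""
    model, data = model_and_data
    return sum(model(x) == y for x, y in data) / len(data)


def run_flow(*components):
    """Pass each component's output to the next, like connected beans."""
    data = None
    for component in components:
        data = component(data)
    return data
```

For example, `run_flow(source, normalize_filter, mean_classifier, accuracy_evaluator)` runs the whole chain. Swapping one component, say a different filter, leaves the rest of the layout unchanged.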



Conclusion: try it yourself!
• WEKA is available at http://www.cs.waikato.ac.nz/ml/weka
  - Also has a list of projects based on WEKA
  - WEKA contributors:
    Abdelaziz Mahoui, Alexander K. Seewald, Ashraf M. Kibriya, Bernhard Pfahringer, Brent Martin, Peter Flach, Eibe Frank, Gabi Schmidberger, Ian H. Witten, J. Lindgren, Janice Boughton, Jason Wells, Len Trigg, Lucio de Souza Coelho, Malcolm Ware, Mark Hall, Remco Bouckaert, Richard Kirkby, Shane Butler, Shane Legg, Stuart Inglis, Sylvain Roy, Tony Voyle, Xin Xu, Yong Wang, Zhihai Wang
