
Weka Tutorial

1. Downloading and Installing Weka (version 3.6)
   Website: http://www.cs.waikato.ac.nz/ml/weka/
   You can also use the user manual and the documentation installed with Weka, or download them from the website.

2. Weka Textbook
   The primary reference for the Weka tutorials is Witten and Frank's book Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition.

3. Downloadable Datasets
   The largest and best-known library of machine learning datasets is the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/index.html).

These tutorial exercises introduce Weka and ask you to try out several machine learning, visualization, and preprocessing methods using a wide variety of datasets:
A. Learners: decision tree learner (J48), instance-based learner (IBk), Naïve Bayes (NB), Naïve Bayes Multinomial (NBM), support vector machine (SMO), association rule learner (Apriori)
B. Meta-learners: filtered classifier, attribute-selected classifiers
C. Visualization: visualize datasets, decision trees, classification errors
D. Preprocessing: remove attributes and instances
E. Testing: on the training set, on a supplied test set, using cross-validation; confidence and support of association rules

Exercise 1: Set up your environment and start the Explorer. Look at the Preprocess, Classify, and Visualize panels.
1. Load a dataset (weather.nominal) and look at it. Apply a filter (to remove attributes and instances).
2. In Visualize:
   i. Load a dataset (iris) and visualize it.
   ii. Examine instance info.
3. In Classify:
   i. Consider the weather.nominal dataset; you will find it under the Weka directory in the data folder, for example C:\Program Files\Weka-3-6\data. Now build a decision tree:
      - using different techniques
      - list five criteria for evaluating these classification methods

   ii. Examine the tree in the Classifier output panel.
   iii. Visualize the tree.
   iv. Interpret the classification accuracy and the confusion matrix.
   v. Visualize classifier errors.
   (Criteria to consider include mapping, accuracy, and interpretability.)

Exercise 2: Import the dataset segment-challenge.arff (you will find it under the Weka directory in the data folder, for example C:\Program Files\Weka-3-6\data) and apply the following tasks to it:
1. Apply the J48 decision tree (use 10-fold cross-validation and a percentage split), MultilayerPerceptron (use 10-fold cross-validation and a percentage split), and the Naïve Bayes classifier (use the training data, 10-fold cross-validation, and a percentage split), then:
   i. Visualize the curves.
   ii. Compare the confusion matrices.
   iii. Compare the accuracies.
   Try to analyze the accuracy estimates on (1) the training data, (2) cross-validation, and (3) a train/test split. Report major findings. Repeat the above questions using some other tree-, function-, and rule-based methods.
2. Apply the SimpleKMeans clusterer, specifying 6 clusters (using the classes-to-clusters evaluation mode):
   i. Visualize the clusters (cluster axes; variable Y; by changing color; etc.).
   ii. Try the different cluster modes.

3. Import the contact-lenses.arff dataset and apply the PredictiveApriori algorithm to extract the hidden rules in the data.

Exercise 3: Introduce the datasets vote, weather.nominal, and supermarket.
1. Apply an association rule learner (Apriori):
   i. Discuss the meaning of the rules.
   ii. Identify the support and the number of instances predicted correctly for certain rules.
2. Make association rules for the supermarket dataset:
   i. Load supermarket.
   ii. Generate association rules and discuss some inferences you would make from them.
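Exercise 1 asks you to build a decision tree. J48 (Weka's implementation of C4.5) chooses split attributes using information gain. As a rough self-contained illustration, here is that calculation in plain Python on a made-up two-attribute slice of weather-style data (this is not Weka's code, just the underlying arithmetic):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, labels, attr_index):
    """Entropy reduction achieved by splitting on the attribute at attr_index."""
    n = len(labels)
    splits = {}
    for row, label in zip(rows, labels):
        splits.setdefault(row[attr_index], []).append(label)
    remainder = sum(len(part) / n * entropy(part) for part in splits.values())
    return entropy(labels) - remainder

# Toy (outlook, windy) -> play data, loosely modeled on weather.nominal
rows = [("sunny", "false"), ("sunny", "true"), ("overcast", "false"),
        ("rainy", "false"), ("rainy", "true"), ("overcast", "true")]
labels = ["no", "no", "yes", "yes", "no", "yes"]

print(info_gain(rows, labels, 0))  # splitting on outlook: 2/3 bit
print(info_gain(rows, labels, 1))  # splitting on windy: ~0.082 bit
```

On this toy data the tree learner would split on outlook first, since it yields the larger gain.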
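Exercise 3 asks you to identify the support and the correctness of rules. Weka reports these in the Associator output; as a hedged, Weka-independent illustration, the two standard rule metrics can be computed on a toy basket list like this:

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Confidence of the rule lhs -> rhs: support(lhs and rhs) / support(lhs)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

# Four made-up shopping baskets
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

print(support(baskets, {"bread", "milk"}))       # 0.5 (2 of 4 baskets)
print(confidence(baskets, {"bread"}, {"milk"}))  # 2/3 (2 of the 3 bread baskets)
```

The "number of instances predicted correctly" for a rule is just its joint support count: here, bread -> milk is correct in 2 of the 3 baskets where it applies.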

Applying a Filter

Load the weather.nominal dataset. Use the filter weka.filters.unsupervised.instance.RemoveWithValues to remove all instances in which the humidity attribute has the value high. To do this, first make the field next to the Choose button show the text RemoveWithValues. Then click on it to get the Generic Object Editor window, and figure out how to change the filter settings appropriately. Undo the change to the dataset that you just performed, and verify that the data has reverted to its original state.

The Glass Dataset: The glass dataset glass.arff from the U.S. Forensic Science Service contains data on six types of glass. Glass is described by its refractive index and the chemical elements that it contains; the aim is to classify different types of glass based on these features. This dataset is taken from the UCI datasets, which have been collected by the University of California at Irvine and are freely available on the Web. They are often used as a benchmark for comparing data mining algorithms.

Find the dataset glass.arff and load it into the Explorer interface. For your own information, answer the following exercises:
- How many attributes are there in the dataset? What are their names?
- What is the class attribute?

Run the classification algorithm IBk (weka.classifiers.lazy.IBk). Use cross-validation to test its performance, leaving the number of folds at the default value of 10. Recall that you can examine the classifier options in the Generic Object Editor window that pops up when you click the text beside the Choose button. The default value of the KNN field is 1: this sets the number of neighboring instances to use when classifying.
- What is the accuracy of IBk (given in the Classifier output box)?

Run IBk again, but increase the number of neighboring instances to k = 5 by entering this value in the KNN field. Here and throughout this section, continue to use cross-validation as the evaluation method.
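To see what the KNN field actually controls, here is a minimal sketch of the nearest-neighbor scheme behind IBk, in plain Python on made-up 2-D points (not Weka's implementation, which also handles nominal attributes, distance weighting, and normalization):

```python
from collections import Counter
from math import dist  # Euclidean distance between two points (Python 3.8+)

def knn_predict(train_x, train_y, query, k=1):
    """Classify query by majority vote among its k nearest training instances."""
    neighbors = sorted(zip(train_x, train_y), key=lambda p: dist(p[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy numeric features standing in for glass.arff's refractive index / element columns
train_x = [(1.0, 1.0), (1.2, 0.9), (5.0, 5.0), (5.1, 4.8), (4.9, 5.2)]
train_y = ["a", "a", "b", "b", "b"]

print(knn_predict(train_x, train_y, (1.1, 1.0), k=1))  # "a": nearest point is class a
print(knn_predict(train_x, train_y, (3.5, 3.5), k=5))  # "b": 3 of the 5 neighbors are b
```

Raising k smooths the decision boundary: with k = 1 a single noisy instance can flip a prediction, while with k = 5 the majority of the neighborhood decides.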
- What is the accuracy of IBk with five neighboring instances (k = 5)?

Best Attributes: How do you select the best attributes? Record the best attribute set and the greatest accuracy obtained in each iteration. The best accuracy obtained in this process is quite a bit higher than the accuracy obtained on the full dataset.

Market Basket Analysis: Your job is to mine supermarket checkout data for associations. The data in supermarket.arff was collected from an actual New Zealand supermarket. Take a look at this file in a text editor to verify that you understand its structure. The main point of this exercise is to show you how difficult it is to find any interesting patterns in this type of data!
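The iterative search described under Best Attributes above can be organized as greedy forward selection: repeatedly add whichever attribute most improves accuracy, and stop when nothing helps. The sketch below uses a hypothetical stand-in scoring function (the attribute names and the `accuracy_of` function are made up for illustration); in Weka the score for each candidate subset would come from re-running the classifier with cross-validation:

```python
def forward_select(attrs, accuracy_of):
    """Greedy forward selection: grow the attribute set one attribute at a
    time, keeping each addition only if it raises the accuracy estimate."""
    chosen, best_acc = [], accuracy_of([])
    improved = True
    while improved:
        improved = False
        scored = [(accuracy_of(chosen + [a]), a) for a in attrs if a not in chosen]
        if scored:
            acc, attr = max(scored)
            if acc > best_acc:
                chosen.append(attr)
                best_acc = acc
                improved = True
    return chosen, best_acc

# Hypothetical attribute names and scoring function: pretend only RI and Mg
# carry signal, each adding 0.1 to a 0.5 baseline accuracy.
attrs = ["RI", "Na", "Mg", "Al"]
def accuracy_of(subset):
    return 0.5 + 0.1 * len(set(subset) & {"RI", "Mg"})

print(forward_select(attrs, accuracy_of))
```

Because each step keeps only helpful attributes, the final subset often scores higher than the full attribute set, which matches the observation above.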

Experiment with Apriori and investigate the effect of the various parameters described before. Write a brief report on the main findings of your investigation.
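Before experimenting with the parameters, it helps to know what Apriori does internally: it finds frequent itemsets level by level, relying on the Apriori principle that an itemset can only be frequent if all of its subsets are, and then turns the frequent itemsets into rules. A compact sketch of the itemset phase (illustrative plain Python, not Weka's implementation, which also iteratively lowers the support threshold until it has enough rules):

```python
from itertools import combinations

def apriori_itemsets(transactions, min_support):
    """Return every itemset whose support meets min_support, level by level."""
    n = len(transactions)
    candidates = [frozenset([i]) for i in sorted({i for t in transactions for i in t})]
    frequent, size = {}, 1
    while candidates:
        # Count support for this level's candidates and keep the frequent ones.
        level = {c: sum(c <= t for t in transactions) / n for c in candidates}
        level = {c: s for c, s in level.items() if s >= min_support}
        frequent.update(level)
        size += 1
        # Next-size candidates: unions of this level's sets, pruned so that
        # every subset one size smaller is itself frequent (Apriori pruning).
        candidates = [c for c in {a | b for a in level for b in level if len(a | b) == size}
                      if all(frozenset(s) in level for s in combinations(c, size - 1))]
    return frequent

# Four made-up baskets; with min_support = 0.5 an itemset must appear in 2+ baskets
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
freq = apriori_itemsets(baskets, min_support=0.5)
for itemset, sup in sorted(freq.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), sup)
```

Lowering min_support admits more (and longer) itemsets, hence more candidate rules; this is the main lever you will be adjusting in the supermarket experiment.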
