
Course Code MIT412

Description Advanced Database System

College / Department:
Assignment No. 8
Online Education

LABORATORY EXERCISE

Laboratory Exercise
To perform this activity, you need to download and install WEKA.

Data Transformation
The file formats WEKA processes most readily are .arff and .csv.
An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of
instances sharing a set of attributes.
A CSV (comma-separated values) file is a simple plain-text format used to store tabular data, such as a spreadsheet or a database table.
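WEKA can open both formats directly, but the transformation from CSV to ARFF is easy to picture: the CSV header row becomes the @attribute declarations and the remaining rows become the @data section. The sketch below uses a hypothetical helper, csv_to_arff, and assumes every column is nominal, declaring the distinct values in order of first appearance. It does not quote names or values containing spaces and does not detect numeric columns, both of which a real converter has to handle.

```python
import csv
import io


def csv_to_arff(csv_text: str, relation: str) -> str:
    """Hypothetical CSV-to-ARFF converter (sketch).

    Assumption: every column is nominal; its declared values are the
    distinct values in the order they first appear in the data.
    """
    rows = list(csv.reader(io.StringIO(csv_text.strip())))
    header, data = rows[0], rows[1:]

    lines = [f"@relation {relation}", ""]
    for col, name in enumerate(header):
        values = []
        for row in data:
            if row[col] not in values:
                values.append(row[col])
        # {{ and }} are literal braces around the nominal value list.
        lines.append(f"@attribute {name} {{{', '.join(values)}}}")

    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


# First three rows of the weather data, as they might appear in a CSV file.
sample_csv = """outlook,temperature,humidity,windy,play
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
"""

out = csv_to_arff(sample_csv, "weather.symbolic")
print(out)
```

Because only three rows are converted here, the declared value lists are incomplete (for example, outlook has no "rainy" yet); run on the full file, the same code would list every value.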

Sample ARFF File



The following is an example of an ARFF file.
Dataset description: the dataset consists of four (4) independent variables, or predictors, and one (1) dependent variable, or target variable.
@relation weather.symbolic

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
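To make the structure of the file concrete, here is a minimal parser for the ARFF example above. It is a hypothetical sketch, not WEKA's loader: it only understands nominal attributes written as {value, ...} lists, skips blank lines and % comments, and ignores quoting, numeric, string, date and sparse formats.

```python
WEATHER_ARFF = """
@relation weather.symbolic

@attribute outlook {sunny, overcast, rainy}
@attribute temperature {hot, mild, cool}
@attribute humidity {high, normal}
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,hot,high,FALSE,no
sunny,hot,high,TRUE,no
overcast,hot,high,FALSE,yes
rainy,mild,high,FALSE,yes
rainy,cool,normal,FALSE,yes
rainy,cool,normal,TRUE,no
overcast,cool,normal,TRUE,yes
sunny,mild,high,FALSE,no
sunny,cool,normal,FALSE,yes
rainy,mild,normal,FALSE,yes
sunny,mild,normal,TRUE,yes
overcast,mild,high,TRUE,yes
overcast,hot,normal,FALSE,yes
rainy,mild,high,TRUE,no
"""


def parse_arff(text):
    """Hypothetical minimal ARFF reader for nominal attributes only."""
    attributes = []  # list of (name, [declared values])
    instances = []   # list of rows, each a list of strings
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        lower = line.lower()
        if lower.startswith("@relation"):
            continue
        if lower.startswith("@attribute"):
            _, name, spec = line.split(None, 2)
            values = [v.strip() for v in spec.strip("{}").split(",")]
            attributes.append((name, values))
        elif lower.startswith("@data"):
            in_data = True
        elif in_data:
            instances.append(line.split(","))
    return attributes, instances


attributes, instances = parse_arff(WEATHER_ARFF)
print(f"{len(attributes)} attributes, {len(instances)} instances")
print("Class attribute (last):", attributes[-1][0])
```

Every data value should be one of the values declared for its attribute; that check is what makes the header a useful contract for the @data section.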

WEKA
The Explorer is the most convenient interface for testing an individual classifier. Clicking the Explorer button launches the Explorer interface.
1. Click the Explorer button.
The Explorer Interface

Opening a Data Set


2. In the Explorer window, click “Open file” and then use the file browser to navigate to the ‘data’ folder. Select the file named dataminingweatherdata.csv.

Data Visualization
The figure below shows that there are five attributes in the given dataset. The left panel of the figure summarizes the data using simple descriptive statistics.
The dataset contains 14 observations (instances) with five (5) attributes. The ‘play’ attribute will be selected as the class attribute.
The activity aims to determine the patterns behind playing ‘yes’ or playing ‘no’ based on the given attributes and observations.
3. Click Visualize All to display the frequency distribution of each attribute.
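The frequency distributions shown by Visualize All can be reproduced by hand. The sketch below tallies, for every attribute, how many instances take each value, split by the ‘play’ class value. The names DATA, ATTRIBUTES and frequency_table are hypothetical and chosen for this illustration only.

```python
from collections import Counter

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy", "play"]

# The 14 weather instances; the class value (play) is the last field.
DATA = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def frequency_table(data, attributes, class_index=-1):
    """For each attribute, map each value to a Counter of class labels."""
    tables = {}
    for i, name in enumerate(attributes):
        table = {}
        for row in data:
            table.setdefault(row[i], Counter())[row[class_index]] += 1
        tables[name] = table
    return tables


tables = frequency_table(DATA, ATTRIBUTES)
for name, table in tables.items():
    print(name)
    for value, counts in table.items():
        total = sum(counts.values())
        print(f"  {value:<9} total={total:2d}  yes={counts['yes']}  no={counts['no']}")
```

Already at this stage one pattern stands out: every ‘overcast’ instance has play = yes.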
Classify Tab
By default, the ZeroR classifier is selected. In this activity, select the “play” attribute in the dropdown list; this tells WEKA which attribute is the target variable. The ‘play’ attribute has been suggested as the class attribute (i.e., the one that will be predicted from the others).
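The default ZeroR classifier is WEKA's simplest baseline: it ignores every predictor and always predicts the most frequent class value. A sketch of the idea, using a hypothetical zero_r helper and the ‘play’ column of the 14 instances; the accuracy printed is measured on the training data itself, not with the test option WEKA uses.

```python
from collections import Counter

# The play column of the 14 weather instances, in file order.
PLAY = ["no", "no", "yes", "yes", "yes", "no", "yes",
        "no", "yes", "yes", "yes", "yes", "yes", "no"]


def zero_r(labels):
    """ZeroR: ignore all predictors, predict the most frequent class."""
    return Counter(labels).most_common(1)[0][0]


prediction = zero_r(PLAY)
accuracy = sum(label == prediction for label in PLAY) / len(PLAY)
print(f"ZeroR predicts '{prediction}' for every instance")
print(f"Accuracy on the training data: {accuracy:.2%}")
```

Any useful classifier should beat this majority-class rate, which is why it is a sensible default to compare against.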
4. Open the Classify panel by clicking the Classify tab, as shown below:

5. Next, select a machine learning classifier to apply to this data. The task is classification, so the classifier is chosen from the Classify tab near the top of the Explorer window.

6. Select J48 under the “trees” folder.
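J48 is WEKA's implementation of the C4.5 decision-tree learner, which picks each split by information gain and gain ratio. The simplified sketch below scores only the root split for the four predictors: among attributes whose information gain is at least the average, it chooses the one with the highest gain ratio. It is an illustration of that criterion, not WEKA's code, and it ignores pruning, missing values and numeric thresholds. The names DATA, PREDICTORS, gain_and_ratio and choose_root are hypothetical.

```python
import math
from collections import Counter

PREDICTORS = ["outlook", "temperature", "humidity", "windy"]

# The 14 weather instances; the class value (play) is the last field.
DATA = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def entropy(values):
    """Shannon entropy, in bits, of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_and_ratio(data, attr_index):
    """Information gain and gain ratio of splitting on one attribute."""
    labels = [row[-1] for row in data]
    groups = {}
    for row in data:
        groups.setdefault(row[attr_index], []).append(row[-1])
    remainder = sum(len(g) / len(data) * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy([row[attr_index] for row in data])
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def choose_root(data, names):
    """Highest gain ratio among attributes with at least average gain."""
    scores = {name: gain_and_ratio(data, i) for i, name in enumerate(names)}
    avg_gain = sum(g for g, _ in scores.values()) / len(scores)
    candidates = [n for n, (g, _) in scores.items() if g >= avg_gain]
    best = max(candidates, key=lambda n: scores[n][1])
    return best, scores


best, scores = choose_root(DATA, PREDICTORS)
for name, (gain, ratio) in scores.items():
    print(f"{name:<12} gain={gain:.3f}  gain ratio={ratio:.3f}")
print("Root split:", best)
```

The sketch selects ‘outlook’, which matches the root of the pruned tree J48 produces for this data.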


7. Then click Start.

J48 Results
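The results provide the following information:
Correctly Classified Instances      7     50 %
Incorrectly Classified Instances    7     50 %
The table below indicates that the model derived from the dataset using the J48 method has an accuracy of 50 percent (7 of the 14 instances classified correctly). WEKA's default test option in the Classify panel is 10-fold cross-validation, so, unless it was changed, this figure estimates accuracy on instances held out from training rather than on the training data itself.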
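The accuracy figure is simply the share of correctly classified instances. The sketch below, using a hypothetical accuracy_percent helper, recomputes it from the reported counts and, for comparison, computes the rate obtained by always predicting the majority class ‘yes’, which covers 9 of the 14 instances.

```python
def accuracy_percent(correct: int, incorrect: int) -> float:
    """Percentage of instances classified correctly."""
    total = correct + incorrect
    return 100.0 * correct / total


# Counts reported by WEKA for J48 in this exercise.
j48 = accuracy_percent(correct=7, incorrect=7)

# Majority-class rate: 9 of the 14 instances have play = yes.
majority = accuracy_percent(correct=9, incorrect=5)

print(f"J48 accuracy:        {j48:.2f}%")
print(f"Majority-class rate: {majority:.2f}%")
```

On a dataset this small, a single misclassified instance moves the estimate by about 7 percentage points, so the comparison should be read with caution.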
Visualizing the Data Model (Tree Diagram)
The panel on the lower left, headed ‘Result list (right-click for options)’, provides access to more information about the results. Right-clicking an entry produces a menu from which ‘Visualize Tree’ can be selected. This displays the decision tree in a more readable graphical format:
The Generated Rule Sets
J48 pruned tree
------------------

outlook = overcast: yes (4.0)
outlook = rainy
| windy = FALSE: yes (3.0)
| windy = TRUE: no (2.0)
outlook = sunny
| humidity = high: no (3.0)
| humidity = normal: yes (2.0)

Number of Leaves : 5

Size of the tree : 8


Answer the following:
Analysis: Based on the generated decision tree model, derive the five (5) rule sets.
Rule 1: If (outlook = sunny and humidity = high), then he will not play.
Rule 2: If (outlook = sunny and humidity = normal), then he will play.
Rule 3: If (outlook = overcast), then he will play.
Rule 4: If (outlook = rainy and windy = FALSE), then he will play.
Rule 5: If (outlook = rainy and windy = TRUE), then he will not play.
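The five rules can be checked directly against the data. The sketch below, with a hypothetical apply_rules function, encodes each rule as an if-statement, applies them to all 14 instances, and counts how many instances each rule covers. Note that this measures agreement with the training data, not the cross-validated accuracy WEKA reports.

```python
from collections import Counter

# The 14 weather instances: (outlook, temperature, humidity, windy, play).
DATA = [
    ("sunny", "hot", "high", "FALSE", "no"),
    ("sunny", "hot", "high", "TRUE", "no"),
    ("overcast", "hot", "high", "FALSE", "yes"),
    ("rainy", "mild", "high", "FALSE", "yes"),
    ("rainy", "cool", "normal", "FALSE", "yes"),
    ("rainy", "cool", "normal", "TRUE", "no"),
    ("overcast", "cool", "normal", "TRUE", "yes"),
    ("sunny", "mild", "high", "FALSE", "no"),
    ("sunny", "cool", "normal", "FALSE", "yes"),
    ("rainy", "mild", "normal", "FALSE", "yes"),
    ("sunny", "mild", "normal", "TRUE", "yes"),
    ("overcast", "mild", "high", "TRUE", "yes"),
    ("overcast", "hot", "normal", "FALSE", "yes"),
    ("rainy", "mild", "high", "TRUE", "no"),
]


def apply_rules(outlook, humidity, windy):
    """Return (rule number, predicted play) for one instance."""
    if outlook == "sunny" and humidity == "high":
        return 1, "no"
    if outlook == "sunny" and humidity == "normal":
        return 2, "yes"
    if outlook == "overcast":
        return 3, "yes"
    if outlook == "rainy" and windy == "FALSE":
        return 4, "yes"
    if outlook == "rainy" and windy == "TRUE":
        return 5, "no"
    raise ValueError("instance not covered by any rule")


fired = Counter()
correct = 0
for outlook, _temperature, humidity, windy, play in DATA:
    rule, predicted = apply_rules(outlook, humidity, windy)
    fired[rule] += 1
    correct += predicted == play

print(f"Correct on training data: {correct}/{len(DATA)}")
print("Instances per rule:", dict(sorted(fired.items())))
```

The per-rule counts (3, 2, 4, 3, 2) match the numbers in parentheses on the leaves of the pruned tree, and temperature is never consulted, which is consistent with J48 not using it in any split.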
