BE IT A 2
Experiment No: 4
A (smaller) version of the bank data can be found in the file "bank.arff", and the new unclassified
instances are in the file "bank-new.arff".
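WEKA stores data sets in the ARFF format: an `@relation` header, a list of `@attribute` declarations (numeric or nominal), and a `@data` section of comma-separated rows. As a rough illustration of that layout, here is a minimal, simplified reader sketch in Python (a real ARFF file also allows quoting, comments, and sparse rows, which WEKA's own loader handles):

```python
# Minimal ARFF reader sketch (illustrative only; WEKA's loader also
# handles quoted values, comments, and sparse-format data).
def parse_arff(text):
    relation, attributes, data = None, [], []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('%'):   # '%' starts a comment
            continue
        lower = line.lower()
        if lower.startswith('@relation'):
            relation = line.split(None, 1)[1]
        elif lower.startswith('@attribute'):
            name, dtype = line.split(None, 2)[1:3]
            attributes.append((name, dtype))
        elif lower.startswith('@data'):
            in_data = True
        elif in_data:
            data.append(line.split(','))
    return relation, attributes, data

# A toy two-attribute relation in the style of the bank data.
sample = """@relation bank
@attribute age numeric
@attribute pep {YES,NO}
@data
48,YES
40,NO"""
relation, attrs, rows = parse_arff(sample)
print(relation, len(attrs), len(rows))  # bank 2 2
```

The attribute declarations are what let WEKA know, for example, which attributes are numeric and which are nominal class labels.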
As usual, we begin by loading the data into WEKA, as seen in Figure A:
Figure A
Next, we select the "Classify" tab and click the "Choose" button to select the J48 classifier, as
depicted in Figures 21-a and 21-b. Note that J48 (an implementation of the C4.5 algorithm) does not
require discretization of numeric attributes, in contrast to the ID3 algorithm from which C4.5
evolved.
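The reason C4.5 (and hence J48) can use numeric attributes directly is that it considers binary splits of the form "value &lt;= t" and picks the threshold t with the highest information gain. A small self-contained sketch of that idea, using invented toy values rather than the real bank data:

```python
import math

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in (labels.count(l) for l in set(labels)))

def best_threshold(values, labels):
    """C4.5-style binary split on a numeric attribute: try midpoints
    between consecutive sorted values, keep the most informative one."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_t = 0.0, None
    for i in range(1, len(pairs)):
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        gain = (base
                - (len(left) / len(pairs)) * entropy(left)
                - (len(right) / len(pairs)) * entropy(right))
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t

# Toy data: age vs. an invented YES/NO class attribute.
ages = [22, 25, 30, 41, 52, 60]
pep = ['NO', 'NO', 'NO', 'YES', 'YES', 'YES']
gain, threshold = best_threshold(ages, pep)
print(round(gain, 3), threshold)  # 1.0 35.5
```

ID3, by contrast, splits only on nominal attributes, so numeric ones must first be discretized into ranges.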
Figure B
Now, we can specify the various parameters. These can be specified by clicking in the text box
to the right of the "Choose" button, as depicted in Figure 22. In this example we accept the
default values. The default configuration does perform some pruning (using the subtree-raising
approach), but does not perform reduced-error pruning. The selected parameters are depicted in
Figure B.
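For reference, the commonly cited J48 defaults can be summarized as follows (a sketch from the WEKA documentation; verify the exact values in your WEKA version's parameter dialog):

```python
# Commonly cited J48 defaults (check your WEKA version's dialog to confirm).
j48_defaults = {
    "confidenceFactor": 0.25,      # -C: smaller values mean heavier pruning
    "minNumObj": 2,                # -M: minimum instances per leaf
    "subtreeRaising": True,        # the -S flag would turn this off
    "reducedErrorPruning": False,  # -R: off, matching the text above
    "unpruned": False,             # -U: pruning is on by default
}
cli = "-C {confidenceFactor} -M {minNumObj}".format(**j48_defaults)
print(cli)  # -C 0.25 -M 2
```

This is the same option string WEKA shows next to the "Choose" button when J48 is selected with default settings.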
Figure D
Under "Test options" in the main panel we select 10-fold cross-validation as our evaluation
approach. Since we do not have a separate evaluation data set, this is necessary to get a
reasonable idea of the accuracy of the generated model. We now click "Start" to generate the
model. The ASCII version of the tree, as well as the evaluation statistics, will appear in the right
panel when model construction is completed (see Figure 23).
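In k-fold cross-validation, the data is split into k parts; each part is held out once while the model is trained on the remaining k-1 parts, and the k held-out accuracies are averaged. A plain sketch of the procedure, using an invented majority-class "learner" in place of J48:

```python
import random

def cross_val_accuracy(data, labels, train_and_predict, k=10, seed=1):
    """Plain k-fold cross-validation: hold out each fold once, train on
    the rest, and average the held-out accuracies."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for fold in folds:
        train = [i for i in idx if i not in fold]
        model = train_and_predict([data[i] for i in train],
                                  [labels[i] for i in train])
        correct = sum(model(data[i]) == labels[i] for i in fold)
        scores.append(correct / len(fold))
    return sum(scores) / k

# Toy stand-in for a real learner: always predict the majority class.
def majority(train_x, train_y):
    top = max(set(train_y), key=train_y.count)
    return lambda x: top

xs = list(range(40))
ys = ['YES'] * 30 + ['NO'] * 10
print(round(cross_val_accuracy(xs, ys, majority), 2))  # 0.75
```

Because every instance is used for both training and testing (in different folds), the averaged score is a far less biased estimate than accuracy measured on the training data itself.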
Figure E
We can view this information in a separate window by right-clicking the last result set (inside the
"Result list" panel on the left) and selecting "View in separate window" from the pop-up menu.
These steps and the resulting window containing the classification results are depicted in Figures
F and G.
Figure G
Note that the classification accuracy of our model is only about 69%. This may indicate that more
work is needed (either in preprocessing or in selecting better parameters for classification)
before building another model. In this example, however, we will continue with this model
despite its inaccuracy.
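The accuracy figure WEKA reports ("Correctly Classified Instances") is simply the diagonal of the confusion matrix divided by the total instance count. A short sketch with a hypothetical two-class matrix (the counts below are invented for illustration, not taken from the actual run):

```python
# Hypothetical confusion matrix for a two-class (YES/NO) problem.
# Keys are (actual, predicted); the counts are invented for illustration.
confusion = {('YES', 'YES'): 150, ('YES', 'NO'): 70,
             ('NO', 'YES'): 55, ('NO', 'NO'): 125}

correct = sum(n for (actual, predicted), n in confusion.items()
              if actual == predicted)
total = sum(confusion.values())
print(f"accuracy = {correct / total:.1%}")  # accuracy = 68.8%
```

Reading the off-diagonal cells is often more informative than the single accuracy number, since they show which class the model tends to confuse.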
WEKA also lets us view a graphical rendering of the classification tree. This can be done by
right-clicking the last result set (as before) and selecting "Visualize tree" from the pop-up menu.
The tree for this example is depicted in Figure 25. Note that by resizing the window and
selecting various menu items from inside the tree view (using the right mouse button), we can
adjust the tree view to make it more readable.
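The graphical view shows the same structure as the ASCII tree in the output panel: each internal node is an attribute test, each branch an outcome, and each leaf a class label. As a rough illustration of that ASCII style, here is a sketch that renders an invented nested-dict tree with WEKA-like indentation (the tree below is made up, not the model's actual output):

```python
def render_tree(node, depth=0):
    """Render a nested-dict decision tree in the indented ASCII style
    WEKA uses: '|   ' pipes for depth, ':' before each leaf's label."""
    attribute, branches = node
    lines = []
    for test, child in branches.items():
        prefix = "|   " * depth + f"{attribute} {test}"
        if isinstance(child, tuple):          # internal node: recurse
            lines.append(prefix)
            lines.extend(render_tree(child, depth + 1))
        else:                                 # leaf: class label
            lines.append(f"{prefix}: {child}")
    return lines

# Invented example over two bank-style attributes.
toy = ("children", {
    "<= 0": ("income", {"<= 30000": "NO", "> 30000": "YES"}),
    "> 0": "YES",
})
print("\n".join(render_tree(toy)))
```

Reading either rendition top-down traces the sequence of attribute tests that leads to each classification.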
Figure H
Conclusion: Thus, we have studied classification in data mining. We saw how a decision tree can
be built from a data set, and we understood that classification helps us assign items to classes
based on training and test models. The data sets are stored in the .arff format. The whole process
of data mining was understood with the example of the bank data and the J48 tree.