
Name: Samah Ziad Abudaia

Student ID: 220133612

Introduction: This report shows how data mining is carried out on a large database in order to reduce the size of the data and find useful relationships in it. We import the data, prepare it, and apply several data-mining methods; each step is described in detail and its results are explained.
The first step: importing the data
The data was taken from https://archive.ics.uci.edu/ml/datasets.html
Explanation of the data: the extracted data is a diabetes database that was used with a learning algorithm to predict the onset of diabetes in women. The outcome is a binary variable, 0 or 1, where 1 means a positive test. The test depends on the following variables:
1. Number of times pregnant
2. Plasma glucose concentration at 2 hours in an oral glucose tolerance test
3. Diastolic blood pressure (mm Hg)
4. Triceps skin fold thickness (mm)
5. 2-Hour serum insulin (mu U/ml)
6. Body mass index (weight in kg/(height in m)^2)
7. Diabetes pedigree function
8. Age (years)
9. Class variable (0 or 1)
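As an illustration only, the attribute list above can be mapped onto a plain Python structure. This minimal sketch assumes the data file is comma-separated with no header row and holds the nine attributes in the order listed; the short column names in `COLUMNS` and the helper `load_diabetes_csv` are hypothetical, and the two sample rows are illustrative values in the same format.

```python
import csv
import io

# Hypothetical short names, in the order of the attribute list above.
COLUMNS = [
    "pregnancies", "glucose", "blood_pressure", "skin_thickness",
    "insulin", "bmi", "pedigree", "age", "class",
]


def load_diabetes_csv(text):
    """Parse header-less comma-separated rows into dicts keyed by COLUMNS."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record:  # skip blank lines
            continue
        if len(record) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(record)}")
        rows.append({name: float(value) for name, value in zip(COLUMNS, record)})
    return rows


# Two illustrative rows in the same format.
sample = "6,148,72,35,0,33.6,0.627,50,1\n1,85,66,29,0,26.6,0.351,31,0\n"
data = load_diabetes_csv(sample)
print(len(data), data[0]["glucose"], data[1]["class"])  # 2 148.0 0.0
```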
1- Save the data in a file and load it into the program so that operations can be run on it.

2- After loading the data into the program, naming the columns, and saving the process, drag the data onto the process workspace.

3- Running the program shows the following table, which contains the data.

Step Two: Preparing the data so that it is ready for use, with three processes

The first process: Remove Duplicates


1- In the Operators box, look under Filtering, choose Remove Duplicates, run the process, and inspect the results.

2- After running the program: the data contains no duplicate rows.
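Duplicate removal can be sketched as follows. This assumes each row is a list of attribute values and that two rows count as duplicates only when every value matches (all attributes are compared); `remove_duplicates` is a hypothetical helper, not the program's operator.

```python
def remove_duplicates(rows):
    """Drop rows whose values all match an earlier row, keeping the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, so compare rows as tuples
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


rows = [[6, 148, 72], [1, 85, 66], [6, 148, 72]]
print(remove_duplicates(rows))  # [[6, 148, 72], [1, 85, 66]]
```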

The second process: missing values

1- In the Operators box, look under Data Transformation, then Data Cleansing, choose Replace Missing Values, run the process, and inspect the results.

2- After running the program: the data has no missing values.
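A minimal sketch of replacing missing values, assuming missing entries are stored as `None` or NaN and are filled with the column mean (the mean strategy is an assumption; other fill values are possible). `replace_missing` and `is_missing` are hypothetical helpers. Note that in this dataset zeros appear where they are biologically impossible, such as blood pressure, and very likely encode missing measurements; a step like this one does not treat those zeros as missing.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing; every other value (including 0) is kept."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def replace_missing(rows):
    """Replace missing entries in each column with the mean of that column's present values."""
    n_cols = len(rows[0]) if rows else 0
    means = []
    for j in range(n_cols):
        present = [r[j] for r in rows if not is_missing(r[j])]
        means.append(sum(present) / len(present) if present else None)
    return [[means[j] if is_missing(r[j]) else r[j] for j in range(n_cols)] for r in rows]


rows = [[1.0, None], [3.0, 4.0], [float("nan"), 8.0]]
print(replace_missing(rows))  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```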

The third process: outliers


1- In the Operators box, look under Data Transformation, then Data Cleansing, choose Detect Outlier (Distances), run the process, and inspect the results.

2- After running the program: the data has no missing values.
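The distance-based outlier idea can be sketched as follows: each example is scored by its distance to its k-th nearest neighbour, and the n examples with the largest scores are flagged as outliers. Euclidean distance and the values k = 10 and n = 10 are assumptions for this sketch; `detect_outliers` is a hypothetical helper and may differ in detail from the program's Detect Outlier (Distances) operator.

```python
import math


def detect_outliers(points, k=10, n=10):
    """Flag the n points whose distance to their k-th nearest neighbour is largest."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[min(k, len(dists)) - 1])  # distance to the k-th neighbour
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    outliers = set(ranked[:n])
    return [i in outliers for i in range(len(points))]  # True = outlier, False = not


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(detect_outliers(points, k=2, n=1))  # [False, False, False, False, True]
```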

After these three data-processing steps, the data is clean and ready for applying association rules, two classification methods, clustering, and outlier detection.
We will now apply association rules to the existing data.

Process: We drag the data set onto the process workspace, then look under Data Transformation, then Type Conversion, drag the Numerical to Binominal and Numerical to Numerical operators onto the workspace, and connect them to the data set. Then we look under Modeling, choose Association and Itemset Mining, choose FP-Growth, and drag it onto the workspace.
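The conversion and itemset search above can be sketched in Python. This sketch assumes the binominal conversion maps a value inside a range [low, high] (here exactly 0) to false and any other value to true; each row then becomes a transaction holding the attributes that are true. For brevity it swaps in a simple level-wise (Apriori-style) search instead of FP-Growth: both return the same frequent itemsets for a given minimum support, but FP-Growth scales better on large data. The helper names, the 0.5 minimum support, and the sample rows are hypothetical.

```python
from itertools import combinations


def numerical_to_binominal(rows, names, low=0.0, high=0.0):
    """Turn each row into a transaction: the set of attributes whose value lies outside [low, high]."""
    return [frozenset(n for n, v in zip(names, row) if not low <= v <= high) for row in rows]


def frequent_itemsets(transactions, min_support=0.5):
    """Level-wise search: keep itemsets whose support (share of transactions) >= min_support."""
    total = len(transactions)
    candidates = {frozenset([item]) for t in transactions for item in t}
    result = {}
    size = 1
    while candidates:
        level = {}
        for c in candidates:
            support = sum(1 for t in transactions if c <= t) / total
            if support >= min_support:
                level[c] = support
        result.update(level)
        size += 1
        # Next candidates: unions of two frequent itemsets that are exactly one item larger.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == size}
    return result


names = ["pregnancies", "insulin", "class"]  # hypothetical subset of attributes
rows = [[6, 0, 1], [1, 0, 0], [0, 94, 0], [3, 168, 1]]  # illustrative values
transactions = numerical_to_binominal(rows, names)
for itemset, support in frequent_itemsets(transactions).items():
    print(sorted(itemset), support)
```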

The association rules describe the relationships between the attributes.
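The strength of a rule X → Y is usually measured by its support (the share of transactions containing both X and Y) and its confidence, support(X ∪ Y) / support(X). A minimal sketch, restricted to rules with one item on each side and an assumed minimum confidence of 0.6; `association_rules` and the sample transactions are hypothetical, not results from this report.

```python
def association_rules(transactions, min_confidence=0.6):
    """Single-item rules X -> Y with support = P(X and Y) and confidence = P(X and Y) / P(X)."""
    total = len(transactions)
    items = sorted({item for t in transactions for item in t})

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / total

    rules = []
    for x in items:
        sup_x = support({x})  # > 0 because x occurs in at least one transaction
        for y in items:
            if x == y:
                continue
            sup_xy = support({x, y})
            confidence = sup_xy / sup_x
            if confidence >= min_confidence:
                rules.append((x, y, sup_xy, confidence))
    return rules


transactions = [  # illustrative transactions
    {"pregnancies", "class"},
    {"pregnancies"},
    {"insulin"},
    {"pregnancies", "insulin", "class"},
]
for x, y, sup, conf in association_rules(transactions):
    print(f"{x} -> {y}: support={sup:.2f}, confidence={conf:.2f}")
```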

Result: This table shows the relationships found between the variables.

This chart shows the association rules:

We will now apply classification to the existing data.


Process:
1- Drag the data set onto the process workspace and add Numerical to Binominal. Then define the label: under Data Transformation, choose Name and Role Modification and drag Set Role onto the workspace; in the properties on the right of the page, define the label attribute.

2- We add a splitter to divide the data into training and testing data. In the operator list, look under Evaluation, then Validation, and choose Split Validation. Double-clicking on Split Validation shows that it is divided into two parts. In the first part, the training section, look under Modeling, then Classification and Regression, then Tree Induction, and drag Decision Tree into it.
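To show what a decision tree learner does inside the training section, here is a minimal sketch of a binary decision tree. It assumes numeric attributes, splits on the threshold with the lowest weighted Gini impurity, and stops at an assumed maximum depth of 3; the program's own Decision Tree operator may use a different split criterion and pruning. `build_tree`, `predict`, and the sample rows (glucose, BMI) are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def build_tree(rows, labels, depth=0, max_depth=3, min_size=2):
    """Grow a tree; internal nodes are (feature, threshold, left, right), leaves are labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(labels) < min_size or len(set(labels)) == 1:
        return majority
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    if best is None:  # no threshold separates the rows
        return majority
    _, f, t = best
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li], depth + 1, max_depth, min_size),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri], depth + 1, max_depth, min_size))


def predict(tree, row):
    """Follow thresholds until a leaf label is reached (labels must not be tuples)."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


rows = [[85, 26.6], [89, 28.1], [183, 23.3], [137, 43.1]]  # illustrative (glucose, BMI)
labels = [0, 0, 1, 1]
tree = build_tree(rows, labels)
print(tree, predict(tree, [120, 30.0]))  # (0, 89, 0, 1) 1
```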

The second part is the testing section. Here we look under Model Application and drag Apply Model into it. Then, in the operator list, we look under Evaluation, then Performance Measurement, then Classification and Regression, and choose Performance (Classification) to measure the accuracy. Finally, we run the process.
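Split validation can be sketched independently of the learner: shuffle the rows, train on one share, apply the model to the rest, and count correct predictions in a confusion table. The 0.7 training ratio, the fixed seed, the `train` callback convention, and the toy learner `train_threshold` are assumptions made for this sketch.

```python
import random


def split_validation(rows, labels, train, ratio=0.7, seed=42):
    """Train on a shuffled `ratio` share of the rows and evaluate on the remainder.

    `train(rows, labels)` must return a function that predicts a label for one row.
    Returns (accuracy, confusion) where confusion maps (actual, predicted) to a count.
    """
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = round(len(idx) * ratio)
    if not 0 < cut < len(idx):
        raise ValueError("ratio must leave rows for both training and testing")
    train_idx, test_idx = idx[:cut], idx[cut:]
    model = train([rows[i] for i in train_idx], [labels[i] for i in train_idx])
    confusion = {}
    correct = 0
    for i in test_idx:
        predicted = model(rows[i])
        key = (labels[i], predicted)
        confusion[key] = confusion.get(key, 0) + 1
        correct += predicted == labels[i]
    return correct / len(test_idx), confusion


def train_threshold(rows, labels):
    """Toy learner (hypothetical): threshold the first attribute halfway between class means."""
    pos = [r[0] for r, y in zip(rows, labels) if y == 1]
    neg = [r[0] for r, y in zip(rows, labels) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda row: 1 if row[0] > cut else 0


rows = [[g] for g in range(80, 130, 5)] + [[g] for g in range(150, 200, 5)]
labels = [0] * 10 + [1] * 10
accuracy, confusion = split_validation(rows, labels, train_threshold)
print(accuracy, confusion)  # the toy classes are separated by a clear gap
```

With 20 rows and a 0.7 ratio, 14 rows are used for training and 6 for testing; on real data the accuracy will of course be lower than on this cleanly separated toy data.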

Result: New data is classified based on the old data, and the accuracy of the classification is measured. Classification analyzes the input data and builds an accurate description or model for each class using the features present in the data.

After running this process, the accuracy measured on the old and new data was 81% on average.

Chart:

Naive Bayes: the second classification method follows the same steps.
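For comparison with the decision tree, a minimal Gaussian Naive Bayes sketch: each class gets a prior and, for every numeric attribute, a mean and variance, and a new example is assigned to the class with the highest log posterior score. Modelling numeric attributes with a normal distribution is an assumption of this sketch; `train_gaussian_nb`, `predict_nb`, and the sample rows are hypothetical.

```python
import math
from collections import defaultdict


def train_gaussian_nb(rows, labels):
    """Estimate each class prior and a per-attribute (mean, variance) for each class."""
    by_class = defaultdict(list)
    for row, y in zip(rows, labels):
        by_class[y].append(row)
    model = {}
    for y, members in by_class.items():
        stats = []
        for column in zip(*members):
            mean = sum(column) / len(column)
            var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9  # avoid zero variance
            stats.append((mean, var))
        model[y] = (len(members) / len(rows), stats)
    return model


def predict_nb(model, row):
    """Pick the class with the highest log prior plus summed Gaussian log-likelihoods."""
    def log_score(prior, stats):
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            for x, (mean, var) in zip(row, stats)
        )
    return max(model, key=lambda y: log_score(*model[y]))


rows = [[85, 26.6], [89, 28.1], [183, 23.3], [137, 43.1]]  # illustrative (glucose, BMI)
labels = [0, 0, 1, 1]
model = train_gaussian_nb(rows, labels)
print(predict_nb(model, [86, 27.0]), predict_nb(model, [190, 40.0]))  # 0 1
```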


Clustering: drag the data set onto the process workspace and add Numerical to Binominal and Remove Duplicates. Then define the label: under Data Transformation, choose Name and Role Modification and drag Set Role onto the workspace; in the properties on the right of the page, define the label.
In the operator list, look under Evaluation, then Performance Measurement, and choose the clustering performance operator (performance clustering). Then, under Modeling, choose Clustering and Segmentation and drag k-Means onto the workspace. Finally, look under Data Transformation, then Attribute Set Reduction and Transformation, then Transformation, and choose Singular Value Decomposition.
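A minimal sketch of k-means (Lloyd's algorithm) with k = 2, matching the two clusters in the result. It assumes numeric points and Euclidean distance, draws the initial centroids from the data with a fixed seed, and omits the Singular Value Decomposition step; `kmeans` and the sample points are hypothetical.

```python
import math
import random


def kmeans(points, k=2, iterations=100, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid, then move centroids to cluster means."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k starting points drawn from the data
    assignment = None
    for _ in range(iterations):
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]  # illustrative 2-D data
labels, centers = kmeans(points)
print(labels, centers)
```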


Result: the data was split into two clusters.

The last process: Outliers


Drag the data set onto the process workspace, look under Data Transformation, then Data Cleansing, then Outlier Detection, and drag Detect Outlier (Distances) onto the workspace. The SVD (Singular Value Decomposition) operator is added as well.


Result: each example was labeled either
Outlier = true
or not outlier = false

This image shows the outlier statistics.



There are only ten outliers.

Outliers = 10
Not outliers = 758

This chart shows the percentages of outlier and non-outlier values: red represents the few outliers, and blue represents the non-outlier values, which make up by far the largest share.
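The chart's proportions follow directly from the counts above, out of 768 examples in total:

```latex
\frac{10}{10 + 758} = \frac{10}{768} \approx 0.0130 = 1.30\%,
\qquad
\frac{758}{768} \approx 0.9870 = 98.70\%
```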


Conclusion: this report showed how to import data, how to process it, and how to make it usable for data-mining techniques, including association rules, outlier detection, clustering, and classification.
Each technique works differently and gives a different result. For a large data set such as this one, these straightforward methods are needed to describe the data and identify its statistics.

