
DATA MINING

SESSION 6
WILDAN BUDIAWAN Z

OUTLINE
What is classification?
Preparing and comparing classification methods/algorithms

Classification by Nearest Neighbor


Classification by Naive Bayes Classifier

CLASSIFICATION OVERVIEW

Databases are rich with hidden information that can be used for making intelligent business decisions.

Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends.

A model is built describing a predetermined set of data classes or concepts. The model is constructed by analyzing database tuples described by attributes.


CLASSIFICATION OVERVIEW (2)


In the context of classification, data tuples are also referred to as samples or objects.

The data tuples analyzed to build the model collectively form the training data set.

The individual tuples making up the training set are referred to as training samples and are randomly selected from the sample population.


SUPERVISED LEARNING AND UNSUPERVISED LEARNING

Since the class label of each training sample is provided, this step is also known as supervised learning.

It contrasts with unsupervised learning, in which the class labels of the training samples are not known, and the number or set of classes to be learned may not be known in advance.

CLASSIFICATION 1ST STEP


CLASSIFICATION 2ND STEP


PREPARING THE DATA FOR CLASSIFICATION

Data cleaning


COMPARING CLASSIFICATION ALGORITHMS

Accuracy
Speed
Robustness
Scalability
Interpretability

NEAREST NEIGHBOR ALGORITHM

$$\text{similarity}(T, S) = \frac{\sum_{i=1}^{n} f(T_i, S_i) \times w_i}{\sum_{i=1}^{n} w_i}$$

T : new case

S : case held in storage

n : number of attributes

i : individual attribute, from 1 to n

f : similarity function for attribute i between case T and case S

w : weight given to attribute i
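The similarity formula is a weighted average of per-attribute closeness values, normalized by the total weight. A minimal Python sketch follows; the function name `weighted_similarity` and the two-attribute sample values are illustrative assumptions, not part of the slides.

```python
def weighted_similarity(closeness, weights):
    """Weighted average of per-attribute closeness values f(T_i, S_i).

    closeness[i] is f(T_i, S_i) and weights[i] is w_i for attribute i.
    """
    if len(closeness) != len(weights):
        raise ValueError("closeness and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(f * w for f, w in zip(closeness, weights)) / total_weight


# Hypothetical two-attribute case: identical on the first attribute,
# half-similar on the second, with the first attribute weighted double.
print(weighted_similarity([1.0, 0.5], [2.0, 1.0]))
```

Dividing by the sum of the weights keeps the result between 0 and 1 whenever every closeness value is, so scores from differently weighted attribute sets stay comparable.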


NEAREST NEIGHBOR: EXAMPLE CASE

No.  Gender  Education  Religion   Problematic
1    L       S1         Islam      Yes
2    ?       SMA        Christian  No
3    ?       SMA        Islam      No

L = male, P = female; S1 = bachelor's degree, SMA = senior high school.


NEAREST NEIGHBOR: EXAMPLE CASE (2)

Define the attribute weights
Define the closeness of values for the Gender attribute
Define the closeness of values for the Education attribute
Define the closeness of values for the Religion attribute

NEAREST NEIGHBOR: EXAMPLE CASE (3)

ATTRIBUTE WEIGHT DEFINITIONS

Attribute   Weight
Gender      0.5
Education   1
Religion    0.75

NEAREST NEIGHBOR: EXAMPLE CASE (4)

ATTRIBUTE CLOSENESS DEFINITIONS

Gender

Value 1    Value 2    Closeness
L          L          1
P          P          1
L          P          0.5
P          L          0.5

Education

Value 1    Value 2    Closeness
S1         S1         1
SMA        SMA        1
S1         SMA        0.4
SMA        S1         0.4

Religion

Value 1    Value 2    Closeness
Islam      Islam      1
Christian  Christian  1
Islam      Christian  0.75
Christian  Islam      0.75


NEAREST NEIGHBOR: EXAMPLE CASE (5)

NEW CASE

New case:
Gender    : L
Education : SMA
Religion  : Christian
Class     : ???

Closeness of the new case to case 1

a : closeness of the Gender values (male with male) = 1
b : weight of the Gender attribute = 0.5
c : closeness of the Education values (SMA with S1) = 0.4
d : weight of the Education attribute = 1
e : closeness of the Religion values (Christian with Islam) = 0.75
f : weight of the Religion attribute = 0.75

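NEAREST NEIGHBOR: EXAMPLE CASE (5)

CLOSENESS OF THE NEW CASE TO CASE NUMBER 1

$$
\begin{aligned}
\text{similarity} &= \frac{(a \times b) + (c \times d) + (e \times f)}{b + d + f} \\
&= \frac{(1 \times 0.5) + (0.4 \times 1) + (0.75 \times 0.75)}{0.5 + 1 + 0.75} \\
&= \frac{1.4625}{2.25}
\end{aligned}
$$

Similarity = 0.65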
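The comparison of the new case with case 1 can be reproduced with small lookup tables. The sketch below is one possible encoding, not the lecturer's implementation: the dictionary names and the `closeness` and `similarity` helpers are assumptions. It relies on each closeness table in the slides being symmetric, with a single value for any pair of different values.

```python
# Attribute weights as defined in the slides.
WEIGHTS = {"gender": 0.5, "education": 1.0, "religion": 0.75}

# Closeness when two values of an attribute differ; identical values score 1.
# This compact form works only because the slide tables are symmetric.
CLOSENESS_IF_DIFFERENT = {"gender": 0.5, "education": 0.4, "religion": 0.75}


def closeness(attribute, value_a, value_b):
    """Look up f(T_i, S_i) for one attribute."""
    return 1.0 if value_a == value_b else CLOSENESS_IF_DIFFERENT[attribute]


def similarity(new_case, stored_case):
    """Weighted closeness of two cases, normalized by the total weight."""
    weighted = sum(
        closeness(attr, new_case[attr], stored_case[attr]) * weight
        for attr, weight in WEIGHTS.items()
    )
    return weighted / sum(WEIGHTS.values())


new_case = {"gender": "L", "education": "SMA", "religion": "Christian"}
case_1 = {"gender": "L", "education": "S1", "religion": "Islam"}

print(round(similarity(new_case, case_1), 2))  # 0.65
```

To classify the new case, the same function would be applied to every stored case and the class of the most similar one taken as the prediction.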

NAIVE BAYES CLASSIFIER / BAYESIAN CLASSIFICATION

$$P(H \mid X) = \frac{P(X \mid H)\,P(H)}{P(X)}$$

Where:
X : data whose class is not yet known

H : hypothesis that data X belongs to a specific class

P(H | X) : probability of hypothesis H given condition X (posterior probability)

P(H) : probability of hypothesis H (prior probability)

P(X | H) : probability of X given hypothesis H

P(X) : probability of X
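The "naive" part of the classifier is an assumption that the formula above does not spell out: given the class, the attribute values of X are treated as conditionally independent. The likelihood then factorizes into one term per attribute. Because P(X) is the same for every class, it can be dropped when classes are only being compared. The symbols x_k (the value of the k-th attribute) and n (the number of attributes) are introduced here for illustration.

```latex
P(X \mid H) = \prod_{k=1}^{n} P(x_k \mid H),
\qquad
P(H \mid X) \propto P(H) \prod_{k=1}^{n} P(x_k \mid H)
```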


NAIVE BAYES CLASSIFIER: EXAMPLE CASE

ID  Age     Income  Student  Credit Rating  Buy Computer
1   <=30    High    No       Fair           No
2   <=30    High    No       Excellent      No
3   31..40  High    No       Fair           Yes
4   >40     Medium  No       Fair           Yes
5   >40     Low     Yes      Fair           Yes
6   >40     Low     Yes      Excellent      No
7   31..40  Low     Yes      Excellent      Yes
8   <=30    Medium  No       Fair           No
9   <=30    Low     Yes      Fair           Yes
10  >40     Medium  Yes      Fair           Yes
11  <=30    Medium  Yes      Excellent      Yes
12  31..40  Medium  No       Excellent      Yes
13  31..40  Low     Yes      Fair           Yes
14  >40     Medium  No       Excellent      No

NAIVE BAYES CLASSIFIER: EXAMPLE CASE (2)

New case:
Age           : <=30
Income        : Medium
Student       : Yes
Credit Rating : Fair
Class         : ???


NAIVE BAYES CLASSIFIER: EXAMPLE CASE (3)

Compute the class probabilities
Compute the probabilities for the Age attribute
Compute the probabilities for the Income attribute
Compute the probabilities for the Student attribute
Compute the probabilities for the Credit Rating attribute
Calculate the class

NAIVE BAYES CLASSIFIER: EXAMPLE CASE (4)


P(Buy Computer=yes) = 9/14 = 0.643
P(Buy Computer=no) = 5/14 = 0.357
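The class priors are relative frequencies of the Buy Computer label in the 14 training rows. A short sketch of that count, with the label list copied from the training table in ID order (the variable names are illustrative):

```python
from collections import Counter

# "Buy Computer" labels of the 14 training rows, in ID order.
labels = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
          "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

counts = Counter(labels)
priors = {cls: n / len(labels) for cls, n in counts.items()}

for cls, p in priors.items():
    print(f"P(Buy Computer={cls}) = {counts[cls]}/{len(labels)} = {p:.3f}")
```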


NAIVE BAYES CLASSIFIER: EXAMPLE CASE (5)

P(Age = <=30 | Buy Computer = Yes) = 2/9 = 0.222
P(Age = <=30 | Buy Computer = No) = 3/5 = 0.600
P(Income = Medium | Buy Computer = Yes) = 4/9 = 0.444
P(Income = Medium | Buy Computer = No) = 2/5 = 0.400

NAIVE BAYES CLASSIFIER: EXAMPLE CASE (6)

P(Student = Yes | Buy Computer = Yes) = 6/9 = 0.667
P(Student = Yes | Buy Computer = No) = 1/5 = 0.200
P(Credit Rating = Fair | Buy Computer = Yes) = 6/9 = 0.667
P(Credit Rating = Fair | Buy Computer = No) = 2/5 = 0.400
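Each conditional probability for Age, Income, Student and Credit Rating is a count within one class divided by the size of that class. The sketch below recomputes the fractions for the new case from the training table. The tuple layout and the `conditional` helper are assumptions made for illustration.

```python
# Training rows from the slides: (age, income, student, credit_rating, buy_computer).
ROWS = [
    ("<=30", "High", "No", "Fair", "No"),
    ("<=30", "High", "No", "Excellent", "No"),
    ("31..40", "High", "No", "Fair", "Yes"),
    (">40", "Medium", "No", "Fair", "Yes"),
    (">40", "Low", "Yes", "Fair", "Yes"),
    (">40", "Low", "Yes", "Excellent", "No"),
    ("31..40", "Low", "Yes", "Excellent", "Yes"),
    ("<=30", "Medium", "No", "Fair", "No"),
    ("<=30", "Low", "Yes", "Fair", "Yes"),
    (">40", "Medium", "Yes", "Fair", "Yes"),
    ("<=30", "Medium", "Yes", "Excellent", "Yes"),
    ("31..40", "Medium", "No", "Excellent", "Yes"),
    ("31..40", "Low", "Yes", "Fair", "Yes"),
    (">40", "Medium", "No", "Excellent", "No"),
]
ATTRIBUTES = ("age", "income", "student", "credit_rating")


def conditional(attribute, value, cls):
    """Return (matches, class_size) for P(attribute = value | Buy Computer = cls)."""
    col = ATTRIBUTES.index(attribute)
    in_class = [row for row in ROWS if row[-1] == cls]
    matches = sum(1 for row in in_class if row[col] == value)
    return matches, len(in_class)


new_case = {"age": "<=30", "income": "Medium", "student": "Yes", "credit_rating": "Fair"}
for attribute, value in new_case.items():
    for cls in ("Yes", "No"):
        k, n = conditional(attribute, value, cls)
        print(f"P({attribute}={value} | {cls}) = {k}/{n} = {k / n:.3f}")
```

Returning the raw counts rather than a float keeps the fractions visible, which makes them easy to check against the table by hand.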


NAIVE BAYES CLASSIFIER: EXAMPLE CASE (7)

P(X | Buy Computer = Yes) = 0.222 * 0.444 * 0.667 * 0.667 = 0.044
P(X | Buy Computer = No) = 0.600 * 0.400 * 0.200 * 0.400 = 0.019

Class = Yes
= P(X | Buy Computer = Yes) * P(Buy Computer = Yes)
= 0.044 * 0.643
= 0.028

Class = No
= P(X | Buy Computer = No) * P(Buy Computer = No)
= 0.019 * 0.357
= 0.007

Since 0.028 > 0.007, the new case is classified as Buy Computer = Yes.
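The final step multiplies each class prior by its per-attribute conditionals and picks the class with the larger score. The sketch below uses exact fractions taken from the counts in the training table, so no intermediate rounding is involved. The variable names are illustrative, and the normalization step is an addition that is not on the slides.

```python
from fractions import Fraction as F

# Priors and per-attribute conditionals for the new case, as counts from the training table.
prior = {"Yes": F(9, 14), "No": F(5, 14)}
likelihoods = {
    "Yes": [F(2, 9), F(4, 9), F(6, 9), F(6, 9)],  # age, income, student, credit rating
    "No": [F(3, 5), F(2, 5), F(1, 5), F(2, 5)],
}

scores = {}
for cls in prior:
    score = prior[cls]
    for p in likelihoods[cls]:
        score *= p  # naive assumption: attributes are independent given the class
    scores[cls] = score

predicted = max(scores, key=scores.get)

# Dividing by the sum of the scores plays the role of P(X) and yields posteriors.
total = sum(scores.values())
for cls, s in scores.items():
    print(f"{cls}: score = {float(s):.3f}, normalized = {float(s / total):.3f}")
print("Predicted class:", predicted)
```

The normalization does not change which class wins. It only rescales the two scores so that they sum to 1.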
