Procedure/Algorithm:
Step 1: Given an unknown sample X, assign it to the class label having the highest
posterior probability conditioned on X. That is, the Naïve Bayes classifier assigns
an unknown data sample X to the class label Ci if and only if P(Ci|X) > P(Cj|X)
for 1 <= j <= m, j != i.
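The decision rule in Step 1 can be sketched as follows; the class names and probability values here are hypothetical placeholders, not values from the datasets below:

```python
# MAP decision rule from Step 1: pick the class Ci that maximises P(Ci|X).
# Since P(X) is the same for every class, it suffices to compare
# P(X|Ci) * P(Ci).  The numbers below are illustrative only.
posteriors = {
    "C1": 0.6 * 0.3,   # P(X|C1) * P(C1)  (hypothetical values)
    "C2": 0.2 * 0.7,   # P(X|C2) * P(C2)  (hypothetical values)
}
predicted = max(posteriors, key=posteriors.get)
print(predicted)
```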
Step 6: P(Xk|Ci) is calculated using any one of the following options:
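The options themselves are not reproduced above; commonly, P(Xk|Ci) is estimated as a frequency ratio when Xk is categorical, or from an assumed density (e.g. Gaussian) when Xk is continuous. A minimal sketch of both, under that assumption:

```python
import math

# Option (a): categorical attribute -- frequency estimate:
#   P(Xk|Ci) = (# class-Ci tuples with value Xk) / (# class-Ci tuples)
def categorical_likelihood(count_xk_in_ci, count_ci):
    return count_xk_in_ci / count_ci

# Option (b): continuous attribute -- Gaussian density evaluated with the
# class's sample mean and standard deviation.
def gaussian_likelihood(x, mean, std):
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (math.sqrt(2 * math.pi) * std)

print(categorical_likelihood(3, 9))   # e.g. the 3/9 from the golf example below
```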
Demonstration:
An example of a feature vector and corresponding class variable can be: (refer 1st
row of dataset)
X = (Rainy, Hot, High, False)
y = No
So basically, P(X|y) here means the probability of “not playing golf” given that
the weather conditions are “rainy outlook”, “hot temperature”, “high humidity”
and “no wind”.
So, we calculate the following probabilities using Naïve Bayes Classifier.
So, in the figure above, we have calculated P(xi | yj) for each xi in X and yj in y
manually in tables 1-4. For example, the probability of playing golf given that the
temperature is cool, i.e. P(temp. = cool | play golf = Yes), is 3/9.
Also, we need to find the class probabilities P(y), which have been calculated in
table 5. For example, P(play golf = Yes) = 9/14.
P(Yes|today) = 0.67
P(No|today) = 0.33
Therefore, P(Yes|today) > P(No|today). So, the prediction is that golf would be
played: ‘Yes’.
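The 0.67/0.33 figures come from normalising the two class scores so they sum to 1. A sketch of that calculation, assuming the query today = (Sunny, Hot, Normal, False) and the conditional estimates from the standard play-golf tables (the tables are not reproduced above, so treat these numbers as assumptions):

```python
# Unnormalised class scores: product of the per-feature conditionals times
# the class prior.  Values assume today = (Sunny, Hot, Normal, False) and
# the usual play-golf frequency tables.
score_yes = (2/9) * (2/9) * (6/9) * (6/9) * (9/14)   # P(today|Yes) * P(Yes)
score_no  = (3/5) * (2/5) * (1/5) * (2/5) * (5/14)   # P(today|No)  * P(No)

# Normalise so the two posteriors sum to 1.
p_yes = score_yes / (score_yes + score_no)
p_no  = score_no  / (score_yes + score_no)
print(round(p_yes, 2), round(p_no, 2))   # -> 0.67 0.33
```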
Program Description:-
import numpy as np


class Classification(object):
    def __init__(self):
        self.classlabel = 2   # number of class labels (yes / no)
        # Training data: the classic 14-tuple buys_computer dataset.
        self.age = np.array(['<=30', '<=30', '31..40', '>40', '>40', '>40',
                             '31..40', '<=30', '<=30', '>40', '<=30',
                             '31..40', '31..40', '>40'], dtype=str)
        self.income = np.array(['high', 'high', 'high', 'medium', 'low',
                                'low', 'low', 'medium', 'low', 'medium',
                                'medium', 'medium', 'high', 'medium'],
                               dtype=str)
        self.student = np.array(['no', 'no', 'no', 'no', 'yes', 'yes', 'yes',
                                 'no', 'yes', 'yes', 'yes', 'no', 'yes',
                                 'no'], dtype=str)
        self.credit_rating = np.array(['fair', 'excellent', 'fair', 'fair',
                                       'fair', 'excellent', 'excellent',
                                       'fair', 'fair', 'fair', 'excellent',
                                       'excellent', 'fair', 'excellent'],
                                      dtype=str)
        self.buys_computer = np.array(['no', 'no', 'yes', 'yes', 'yes', 'no',
                                       'yes', 'no', 'yes', 'yes', 'yes',
                                       'yes', 'yes', 'no'], dtype=str)
        print(self.age)
        print(self.income)
        print(self.student)
        print(self.credit_rating)
        print(self.buys_computer)
        # Read the new, unlabelled sample from the user.
        print("Enter age, income, student and credit rating class labels:")
        self.a = input()
        self.i = input()
        self.s = input()
        self.c = input()

    def naive_bayes(self):
        # Count the training tuples in each class.
        c = 0    # class 'yes'
        c1 = 0   # class 'no'
        for label in self.buys_computer:
            if label == 'yes':
                c = c + 1
            else:
                c1 = c1 + 1
        # Prior probabilities P(yes) and P(no).
        prior_yes = c / len(self.buys_computer)
        prior_no = c1 / len(self.buys_computer)

        prob1 = np.zeros(4, dtype=float)   # P(xk | yes)
        prob2 = np.zeros(4, dtype=float)   # P(xk | no)
        p = p1 = p2 = p3 = 0               # attribute counts within class 'yes'
        q = q1 = q2 = q3 = 0               # attribute counts within class 'no'
        for j in range(len(self.age)):
            if self.buys_computer[j] == 'yes':
                if self.a == self.age[j]:
                    p = p + 1
                if self.i == self.income[j]:
                    p1 = p1 + 1
                if self.s == self.student[j]:
                    p2 = p2 + 1
                if self.c == self.credit_rating[j]:
                    p3 = p3 + 1
            else:
                if self.a == self.age[j]:
                    q = q + 1
                if self.i == self.income[j]:
                    q1 = q1 + 1
                if self.s == self.student[j]:
                    q2 = q2 + 1
                if self.c == self.credit_rating[j]:
                    q3 = q3 + 1
        # Conditional probabilities P(xk | Ci) for the four attributes.
        prob1[0], prob1[1], prob1[2], prob1[3] = p / c, p1 / c, p2 / c, p3 / c
        prob2[0], prob2[1], prob2[2], prob2[3] = q / c1, q1 / c1, q2 / c1, q3 / c1
        print(prob1[0], prob1[1], prob1[2], prob1[3])
        print(prob2[0], prob2[1], prob2[2], prob2[3])
        # P(X | Ci) * P(Ci) for each class.
        pro1 = prob1[0] * prob1[1] * prob1[2] * prob1[3] * prior_yes
        pro2 = prob2[0] * prob2[1] * prob2[2] * prob2[3] * prior_no
        print(pro1, "\t", pro2)
        print("Class label of buys_computer of the new data sample is:")
        if pro1 > pro2:
            print("yes")
        else:
            print("no")


C = Classification()
C.naive_bayes()
Output:-
Report :
1) Naive Bayes can be modeled in several different ways, including with normal,
lognormal, gamma and Poisson density functions.
2) Applying the Laplace correction to handle records with zero counts in the
X variables improves performance.
3) It is easy and fast to predict the class of a test data set. It also performs
well in multi-class prediction.
4) When the assumption of independence holds, a Naive Bayes classifier
performs better compared to other models such as logistic regression, and it
needs less training data.
5) It performs well with categorical input variables compared to
numerical variable(s). For numerical variables, a normal distribution is
assumed (a bell curve, which is a strong assumption).
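Point 2, the Laplace correction, can be sketched as follows; the counts used here are illustrative, not taken from the datasets above:

```python
# Laplace (add-one) correction: add 1 to every attribute-value count so that
# a value never seen in class Ci does not force P(Xk|Ci) = 0 and wipe out
# the whole product of conditionals.  'k' is the number of distinct values
# the attribute can take.
def laplace_likelihood(count_xk_in_ci, count_ci, k):
    return (count_xk_in_ci + 1) / (count_ci + k)

# Without the correction, an unseen value gives probability 0:
print(0 / 9)                        # -> 0.0
# With it, the estimate stays small but non-zero (1/12, about 0.083):
print(laplace_likelihood(0, 9, 3))
```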