
Project purpose

This project aims to classify currency notes as fake or genuine using a clustering algorithm.

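In [14]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans

# Load the banknote measurements
data = pd.read_csv("php50jXam.csv")
# dropna() returns a new DataFrame, so the result must be assigned back
data = data.dropna()
data.describe()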

Out[14]:
                V1           V2           V3           V4        Class
count  1372.000000  1372.000000  1372.000000  1372.000000  1372.000000
mean      0.433735     1.922353     1.397627    -1.191657     1.444606
std       2.842763     5.869047     4.310030     2.101013     0.497103
min      -7.042100   -13.773100    -5.286100    -8.548200     1.000000
25%      -1.773000    -1.708200    -1.574975    -2.413450     1.000000
50%       0.496180     2.319650     0.616630    -0.586650     1.000000
75%       2.821475     6.814625     3.179250     0.394810     2.000000
max       6.824800    12.951600    17.927400     2.449500     2.000000
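The summary table also reveals the class balance. As a small worked example, assuming that Class only takes the values 1 and 2 (which the min, max and quartiles suggest but do not prove), the mean of 1.444606 can be turned into approximate row counts per class:

```python
# Values copied from the describe() output above
n_rows = 1372
class_mean = 1.444606

# If Class is always 1 or 2 (an assumption), then mean = 1 + share of rows with Class == 2
share_class_2 = class_mean - 1
n_class_2 = round(share_class_2 * n_rows)
n_class_1 = n_rows - n_class_2

print(n_class_1, n_class_2)  # 762 610
```

On the real DataFrame the same counts could be read directly with data['Class'].value_counts().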

Data Analysis

As a last data analysis step, we look at the relationships between the different features in
our dataset. The pairplot() function takes the dataset as a parameter and plots every pair of
features against each other, as shown below:

In [6]: sns.pairplot(data)

Out[6]: <seaborn.axisgrid.PairGrid at 0x7fdfa25a1110>

The output shows that V4 (entropy) and V1 (variance) have a slight linear correlation.
Similarly, there is an inverse linear correlation between V3 (kurtosis) and V2 (skewness).
Finally, we can see that the values of V3 and V4 are slightly higher for real banknotes, while
the values of V2 and V1 are higher for the fake banknotes.
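Visual impressions of correlation can be checked numerically with a Pearson correlation matrix. The following is a minimal sketch on synthetic stand-in data (an assumption for illustration, not the banknote file): the column names V2 and V3 are reused, and V3 is deliberately generated to fall as V2 rises.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical, not the banknote dataset)
rng = np.random.default_rng(0)
v2 = rng.normal(size=500)
demo = pd.DataFrame({
    "V2": v2,
    # V3 is built to decrease as V2 increases, mimicking an inverse linear relationship
    "V3": -0.8 * v2 + rng.normal(scale=0.5, size=500),
})

# Pearson correlation: values near -1 or +1 indicate a strong linear relationship
corr = demo.corr()
print(corr.round(2))
```

On the real data, data.corr() would give the same kind of table for V1 to V4, so the "slight" and "inverse" relationships read off the pair plot can be quantified.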

Data Preprocessing
For clustering we use only two features, V1 (variance) and V2 (skewness), which were measured
for both fake and genuine notes.

In [10]: # Normalized dataset (min-max scaling of every column to [0, 1])
norm_data = (data - data.min()) / (data.max() - data.min())
norm_mean = [norm_data['V1'].mean(), norm_data['V2'].mean()]
norm_std = [norm_data['V1'].std(), norm_data['V2'].std()]
print("Mean: ", norm_mean, "Std dev: ", norm_std)

Mean: [0.5391136632764809, 0.5873013774145726] Std dev: [0.20500346769971414, 0.2196113237409729]
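Min-max normalization maps each column onto the interval [0, 1] using that column's own minimum and maximum. A point worth making explicit is that any later observation has to be scaled with the same statistics. The sketch below uses made-up values and two hypothetical helper names, fit_min_max and apply_min_max, which are not part of the original notebook.

```python
import pandas as pd

def fit_min_max(df):
    """Return the per-column minimum and range used for min-max scaling."""
    col_min = df.min()
    col_range = df.max() - col_min
    return col_min, col_range

def apply_min_max(df, col_min, col_range):
    """Scale df with previously computed statistics."""
    return (df - col_min) / col_range

# Made-up values for illustration only
train = pd.DataFrame({"V1": [-2.0, 0.0, 6.0], "V2": [-10.0, 0.0, 10.0]})
col_min, col_range = fit_min_max(train)
scaled_train = apply_min_max(train, col_min, col_range)

# A new observation is scaled with the training statistics, not its own
new_note = pd.DataFrame({"V1": [2.0], "V2": [5.0]})
scaled_new = apply_min_max(new_note, col_min, col_range)
print(scaled_new)
```

Here the new note lands at V1 = 0.5 and V2 = 0.75, because (2 - (-2)) / 8 = 0.5 and (5 - (-10)) / 20 = 0.75.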

In [12]: plt.xlabel('V1')
plt.ylabel('V2')
plt.scatter(norm_data['V1'], norm_data['V2'], alpha=0.25)
plt.scatter(norm_mean[0], norm_mean[1], label="Mean")

plt.title("Banknotes")
plt.legend()
plt.show()

In [13]: # Using K-means
from sklearn.cluster import KMeans

# Cluster on the two selected features only; fitting on every column of
# norm_data would also feed the true Class label to the algorithm
features = norm_data[['V1', 'V2']]
kmeans = KMeans(n_clusters=2).fit(features)

# Calculating the centres of our clusters
clusters = kmeans.cluster_centers_

# Mask with the classification of the elements
y_kmeans = kmeans.predict(features)

# Create a column with the labels (this overwrites the original Class values)
norm_data['Class'] = y_kmeans
class_1 = norm_data[norm_data['Class'] == 0]
class_2 = norm_data[norm_data['Class'] == 1]

# Plotting with two clusters
plt.xlabel('V1')
plt.ylabel('V2')
plt.scatter(class_1['V1'], class_1['V2'], label="Class 1", alpha=0.5)
plt.scatter(class_2['V1'], class_2['V2'], label="Class 2", alpha=0.5)
# Large translucent markers show the cluster centres
plt.scatter(clusters[:, 0], clusters[:, 1], c='black', s=10000, alpha=0.2)
plt.title("Fake notes x Authentic notes")
plt.legend()
plt.show()
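To make explicit what happens behind the fit call, here is a minimal sketch of the classic Lloyd iteration for K-means. It is not scikit-learn's implementation (which, among other things, uses a smarter initialization and several restarts); the function name lloyd_kmeans and the synthetic two-blob data are assumptions made for illustration.

```python
import numpy as np

def lloyd_kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd iteration: alternate nearest-centre assignment and centre update."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centre for every point
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centre to the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    # Final assignment so the labels match the returned centres
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return centers, dists.argmin(axis=1)

# Synthetic, well-separated blobs in the unit square (hypothetical data)
rng = np.random.default_rng(1)
blob_a = rng.normal(loc=[0.25, 0.25], scale=0.05, size=(100, 2))
blob_b = rng.normal(loc=[0.75, 0.75], scale=0.05, size=(100, 2))
points = np.vstack([blob_a, blob_b])

centers, labels = lloyd_kmeans(points, k=2)
print(np.sort(centers, axis=0).round(2))
```

Because the starting centres are random, which blob ends up as cluster 0 is arbitrary; only the grouping itself is meaningful.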

Discussion
With the banknote data normalized, we can apply the K-means algorithm, which finds the two
clusters and returns, for each element, the cluster it belongs to. We then add a label column
to the table and redraw the plot with a different colour for each class. If you run the code
several times, the positions of the centroids change, but only very slightly. Using this model
we can now assign any new data point to Class 1 or Class 2, provided it is first scaled with
the same minimum and maximum values as the training data.
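K-means numbers its clusters arbitrarily, so cluster 0 does not necessarily correspond to genuine notes. One way to check how well the clusters line up with known labels is to score both possible mappings and keep the better one. The sketch below uses toy arrays and a hypothetical helper, cluster_agreement, and only covers the two-cluster case.

```python
import numpy as np

def cluster_agreement(true_labels, cluster_labels):
    """Fraction of matching labels under the better of the two possible
    cluster-to-class mappings (two clusters, labels coded as 0 and 1)."""
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    direct = np.mean(true_labels == cluster_labels)
    flipped = np.mean(true_labels == 1 - cluster_labels)
    return max(direct, flipped)

# Toy labels for illustration only
true_labels = np.array([0, 0, 0, 1, 1, 1])
cluster_labels = np.array([1, 1, 0, 0, 0, 0])
print(cluster_agreement(true_labels, cluster_labels))
```

For the banknote data, the original labels are coded 1 and 2, so they would need to be shifted to 0 and 1 first (for example data['Class'] - 1, taken from the unmodified data table) before comparing them with the K-means output.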
