This project aims to classify currency notes as fake or genuine using a clustering algorithm.
import pandas as pd

data = pd.read_csv("php50jXam.csv")
# dropna() returns a new DataFrame, so assign the result to keep it
data = data.dropna()
data.describe()
Out[14]: [summary statistics table; columns: V1, V2, V3, V4, Class]
Data Analysis
As a last data analysis step, we wanted to see the relationship between the different features in
our dataset. The seaborn “pairplot()” function takes the dataset as a parameter and plots a grid of
graphs showing the relationship between every pair of features, as shown below:
In [6]: sns.pairplot(data)
It is visible from the output that V4 (entropy) and V1 (variance) have a slight linear correlation.
Similarly, there is an inverse linear correlation between V3 (kurtosis) and V2 (skewness). Finally,
we can see that the values for V3 and V4 are slightly higher for real banknotes, while the values
for V2 and V1 are higher for the fake banknotes.
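The correlations read off the pair plot can also be quantified. The following is a minimal sketch of the Pearson correlation coefficient in plain Python; the helper name `pearson` and the toy values are assumptions for illustration and are not taken from the banknote file.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy values for illustration only; these are NOT rows from the banknote file.
skew = [1.0, 2.0, 3.0, 4.0, 5.0]
kurtosis = [9.8, 8.1, 6.2, 3.9, 2.0]

r = pearson(skew, kurtosis)
# A value near -1 indicates a strong inverse linear relationship,
# a value near +1 a strong direct one, and a value near 0 none.
print(f"r = {r:.3f}")
```

With the DataFrame loaded earlier, `data.corr()` reports the same coefficient for every pair of columns at once.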
Data Preprocessing
For clustering we use only two features, V1 and V2, which are measured for
both fake and genuine notes.
In [12]: plt.xlabel('V1')
plt.ylabel('V2')
plt.scatter(norm_data['V1'], norm_data['V2'], alpha=0.25)
plt.scatter(norm_mean[0], norm_mean[1], label="Mean")
plt.title("Banknotes")
plt.legend()
plt.show()
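The cell above plots `norm_data` and `norm_mean`, but the step that produces them is not shown. One common choice is z-score standardization, sketched below under that assumption; the helper `zscore`, the names `norm_v1` and `norm_v2`, and the toy values are hypothetical, and the original notebook may have normalized the data differently.

```python
import statistics

def zscore(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

# Toy values for illustration only; NOT taken from the banknote file.
v1 = [3.6, 4.5, 3.9, -1.4, -2.5, 0.3]
v2 = [8.7, 8.2, -2.6, -4.9, 6.1, 1.2]

norm_v1 = zscore(v1)
norm_v2 = zscore(v2)

# After standardization the mean of each feature sits at the origin,
# which is where a "Mean" marker would land on the scatter plot.
norm_mean = (statistics.fmean(norm_v1), statistics.fmean(norm_v2))
print(norm_mean)
```

Putting both features on the same scale matters for K-means because the algorithm relies on Euclidean distance, so a feature with a larger range would otherwise dominate the clustering.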
In [13]: # Using K-means
from sklearn.cluster import KMeans
for i in range(1):
    kmeans = KMeans(n_clusters=2).fit(norm_data)
    # Cluster index (0 or 1) assigned to each row during fitting
    y_kmeans = kmeans.labels_
#print(clusters)
norm_data['Class'] = y_kmeans
plt.xlabel('V1')
plt.ylabel('V2')
plt.scatter(class_1['V1'], class_1['V2'], label="Class 1", alpha=0.5)
plt.scatter(class_2['V1'], class_2['V2'], label="Class 2", alpha=0.5)
Discussion
With our banknote data already normalized, we can now run the KMeans algorithm, which finds
the two clusters for us and returns the class each element belongs to. We can
then add a label column to our table and plot our graph again with different colours
and see the two classes. If you run the code several times, the positions of the centroids
change, but only very slightly. Using this model, we can now classify any new
data into Class 1 or Class 2.
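To make the two ideas above concrete (centroids that depend on where the algorithm starts, and classifying a new point by its nearest centroid), here is a minimal sketch of the basic K-means iteration in plain Python. It is not scikit-learn's implementation; the helpers `nearest` and `kmeans`, the explicit starting centroids, and the toy points are all assumptions made for illustration.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, init_centroids, max_iter=100):
    """Basic K-means: assign points, recompute centroids, repeat until stable."""
    centroids = [tuple(c) for c in init_centroids]
    for _ in range(max_iter):
        # Assignment step: group every point with its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
        if new_centroids == centroids:  # no centroid moved: converged
            break
        centroids = new_centroids
    return centroids

# Toy normalized points for illustration only; NOT rows from the banknote file.
points = [(-1.2, -0.9), (-1.0, -1.1), (-0.8, -1.0),
          (1.1, 0.9), (0.9, 1.2), (1.0, 0.9)]

# Two different starting positions that end at the same centroids.
run_a = kmeans(points, [points[0], points[3]])
run_b = kmeans(points, [points[2], points[5]])
print(run_a, run_b)

# Classify a new observation by its nearest centroid.
new_note = (0.7, 0.8)
label = nearest(new_note, run_a)
print(label)
```

scikit-learn's `KMeans` chooses its starting centroids randomly unless `random_state` is set, which is why repeated runs can differ slightly. With the fitted model from the notebook, `kmeans.predict(...)` performs the same nearest-centroid assignment for new data.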