Machine learning
K-Nearest Neighbor Classifier
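Question 1: Given two categories, category A = [2 4; 3 7; 5 4] and category B = [3 8; 5 9; 7 10; 6 8] (values as in the MATLAB implementation below), determine the category of the input X = [4 7] using k = 3.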
Training data:

Category   x    y
1          2    4
1          3    7
1          5    4
2          3    8
2          5    9
2          7    10
2          6    8

Data sorted by distance to the input X = (4, 7), with k = 3 (the distances are squared Euclidean distances):

x    y    Category   Sorted distance   Neighbor
3    7    1          1                 Yes
3    8    2          2                 Yes
5    9    2          5                 Yes
6    8    2          5                 No
5    4    1          10                No
2    4    1          13                No
7    10   2          18                No
Helwan University
5- Determine the category based on the majority vote.
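Input      Category
(4, 7)     2
Two of the three nearest neighbors belong to category 2, so the input is assigned to category 2.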
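MATLAB implementation:
close all; clear; clc
% Initialization: training points (one row per point) and the input point.
class1=[2 4;3 7;5 4];
class2=[3 8;5 9;7 10;6 8];
input=[4 7]; % note: this name shadows MATLAB's built-in input function
k=3;
labels=[ones(size(class1,1),1); 2*ones(size(class2,1),1)];
% Combine the two classes into one matrix
classes=[class1;class2];
% Compute the squared Euclidean distance from each point to the input
% (the square root is omitted because it does not change the ordering)
distances=zeros(size(classes,1),1);
for i=1:size(classes,1)
distances(i,1)=(classes(i,1)-input(1))^2+(classes(i,2)-input(2))^2;
end
% Sort the distances in ascending order
[dist,pos]=sort(distances);
% Gather the categories of the k nearest neighbors
neighborsIdx=pos(1:k);
neighborsLabels=labels(neighborsIdx);
% Majority vote: count the neighbors in each class
numClass1=sum(neighborsLabels==1);
numClass2=sum(neighborsLabels==2);
jointNum=[numClass1;numClass2];
% The index of the largest count is the predicted category (2 for this data);
% max(jointNum) alone would return the vote count, not the category
[~,category]=max(jointNum);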
Fig. 1: KNN plot. Blue squares represent class A, red circles represent class B, and the black asterisk represents the input X.
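As a minimal sketch of how the choice of k affects the vote, the same data can be classified for several values of k. The names points, query, and knnVote are illustrative and do not appear in the implementation above, and mode is used here in place of the explicit per-class counts.

```matlab
% Training data and input copied from the implementation above.
points = [2 4; 3 7; 5 4; 3 8; 5 9; 7 10; 6 8];
labels = [1; 1; 1; 2; 2; 2; 2];
query  = [4 7];

% Squared Euclidean distance from every point to the query
% (implicit expansion needs MATLAB R2016b or later, or GNU Octave).
sqDist = sum((points - query).^2, 2);

% Order the labels from nearest to farthest.
[~, order] = sort(sqDist);
sortedLabels = labels(order);

% Hypothetical helper: majority vote among the k nearest labels.
% mode returns the most frequent value (the smallest one on a tie).
knnVote = @(k) mode(sortedLabels(1:k));

for k = [1 3 5]
    fprintf('k = %d -> category %d\n', k, knnVote(k));
end
```

For this data the sketch assigns category 1 when k = 1 (the single nearest point, (3, 7), belongs to class A) and category 2 when k = 3 or k = 5, which illustrates that the result can depend on k.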
K-Means Clustering
Question 2: Determine which of the following data points belong to cluster one and which
belong to the other cluster:
Solution: Repeat the steps below until convergence, that is, until the centroids no longer change.
1- Determine the centroid coordinates.
Initialize the first two centroids, for example c1 = (3, 3) and c2 = (2, 3). In later iterations, each centroid is the mean of the points assigned to it.
2- Calculate the distance of each data point to the centroids (Euclidean distance).
3- Group the data points based on minimum distance.
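                      Iteration 1        Iteration 2         Iteration 3
                      c1 = (3, 3)        c1 = (3, 3)         c1 = (2.5, 3)
                      c2 = (2, 3)        c2 = (0.3, -2)      c2 = (-0.5, -4.5)
Num   Data (x, y)     d(c1)   d(c2)      d(c1)   d(c2)       d(c1)   d(c2)
1     (3, 3)          0       1          0       5.7         0.5     8.3
2     (-1, -4)        8.1     7.6        8.1     2.4         7.8     0.7
3     (2, 3)          1       0          1       5.3         0.5     7.9
4     (0, -5)         8.5     8.2        8.5     3           8.4     0.7

Each point joins the cluster of its nearer centroid. The assignments in iteration 3 are the same as in iteration 2, so the algorithm has converged: points 1 and 3 belong to cluster one, and points 2 and 4 belong to cluster two.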
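A single pass through steps 2 and 3, followed by the centroid update, can be sketched as below. This is only an illustration under stated assumptions: the data and starting centroids are those of the worked example, the distances are computed with a plain loop rather than a toolbox function, and the name newCentroid is introduced here for clarity.

```matlab
% Data points and initial centroids from the worked example.
data = [3 3; -1 -4; 2 3; 0 -5];
centroid = [3 3; 2 3];          % c1 and c2
k = size(centroid, 1);

% Step 2: Euclidean distance of every point to every centroid,
% computed with repmat so that no toolbox is needed.
dist = zeros(size(data, 1), k);
for j = 1:k
    diffs = data - repmat(centroid(j, :), size(data, 1), 1);
    dist(:, j) = sqrt(sum(diffs.^2, 2));
end

% Step 3: assign each point to its nearest centroid.
[~, clusters] = min(dist, [], 2);

% Step 1 of the next iteration: each centroid becomes the mean of its points.
newCentroid = zeros(size(centroid));
for j = 1:k
    newCentroid(j, :) = mean(data(clusters == j, :), 1);
end

disp(round(dist * 10) / 10);   % distances rounded to one decimal place
disp(clusters');               % cluster index of each point
disp(newCentroid);             % centroids for the next iteration
```

With these inputs, point 1 stays with c1 and points 2 to 4 are assigned to c2, which moves c2 to approximately (0.33, -2) for the second iteration.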
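MATLAB implementation:
clear; close all; clc
% Initialization: one data point per row
data=[3 3;-1 -4;2 3;0 -5];
% Number of clusters
k=2;
nRows = size(data,1);
% Determine the centroid coordinates: pick k distinct data points at random
% (the worked example above instead starts from c1 = (3,3) and c2 = (2,3))
r = randperm(nRows);
centroid = data(r(1:k),:);
clusters = zeros(nRows,1);
while (true)
% Remember the previous centroids to test for convergence
tempCentroid = centroid;
% Euclidean distance of every point to every centroid
% (pdist2 requires the Statistics and Machine Learning Toolbox)
dist = pdist2(data,centroid);
% Assign each point to its nearest centroid
[~,clusters] = min(dist,[],2);
% Move each centroid to the mean of its assigned points; the dimension
% argument keeps a single-point cluster from collapsing to a scalar
for i = 1 : k
centroid(i,:) = mean(data(clusters == i,:),1);
end
% Stop when the centroids no longer change
if isequal(tempCentroid,centroid)
break;
end
end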
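Because the listing above starts from a random permutation, its intermediate centroids differ from run to run. The variant below is a hedged sketch, not the original method: it assumes the fixed starting centroids of the worked example, replaces pdist2 with a plain loop, and adds a hypothetical iteration cap (maxIter) and an empty-cluster guard that the original does not have.

```matlab
data = [3 3; -1 -4; 2 3; 0 -5];
centroid = [3 3; 2 3];     % fixed start instead of randperm (assumption)
k = size(centroid, 1);
maxIter = 100;             % hypothetical safety cap, not in the original

for iter = 1:maxIter
    previous = centroid;
    % Assignment step: squared distances suffice to find the nearest centroid.
    sqDist = zeros(size(data, 1), k);
    for j = 1:k
        diffs = data - repmat(centroid(j, :), size(data, 1), 1);
        sqDist(:, j) = sum(diffs.^2, 2);
    end
    [~, clusters] = min(sqDist, [], 2);
    % Update step: keep the old centroid if a cluster happens to be empty.
    for j = 1:k
        members = data(clusters == j, :);
        if ~isempty(members)
            centroid(j, :) = mean(members, 1);
        end
    end
    % Converged once an update leaves every centroid unchanged.
    if isequal(previous, centroid)
        break;
    end
end

disp(clusters');
disp(centroid);
```

Started this way, the loop stops in the third iteration with points 1 and 3 in the first cluster, points 2 and 4 in the second, and centroids (2.5, 3) and (-0.5, -4.5).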