(a) Consider the data as 2-D data points. Given a new data point, x = (1.4,1.6) as a query, rank the
database points based on similarity with the query using Euclidean distance, Manhattan distance,
supremum distance, and cosine similarity.
(b) Normalize the data set to make the norm of each data point equal to 1. Use Euclidean distance on the
transformed data to rank the data points.
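Ans. (a) Euclidean distance between two points a = (a_1, a_2) and b = (b_1, b_2):
$d(a, b) = \sqrt{(a_1 - b_1)^2 + (a_2 - b_2)^2}$
Therefore, d(x,x1)=0.141
d(x,x2)=0.671
d(x,x3)=0.283
d(x,x4)=0.224
d(x,x5)=0.608
Thus, the rank of the data points from most to least similar to x using Euclidean distance is
x1, x4, x3, x5, x2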
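The Euclidean ranking can be checked with a short Python sketch. The data table is not reproduced in this text, so the coordinates in `points` are an assumption, chosen because they are consistent with the distances listed above; the variable names are illustrative only.

```python
import math

# Assumed coordinates (the data table is not shown in the text);
# they are consistent with the Euclidean distances listed in the answer.
points = {
    "x1": (1.5, 1.7),
    "x2": (2.0, 1.9),
    "x3": (1.6, 1.8),
    "x4": (1.2, 1.5),
    "x5": (1.5, 1.0),
}
query = (1.4, 1.6)

# math.dist (Python 3.8+) returns the Euclidean distance between two points.
euclidean = {name: math.dist(query, p) for name, p in points.items()}

# A smaller distance means a more similar point, so sort in ascending order.
ranking = sorted(euclidean, key=euclidean.get)

for name in ranking:
    print(f"d(x,{name}) = {euclidean[name]:.3f}")
```

Sorting by ascending distance puts the most similar point first, which is the same convention used for the cosine similarity ranking.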
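Manhattan distance between two points a = (a_1, a_2) and b = (b_1, b_2):
$d(a, b) = |a_1 - b_1| + |a_2 - b_2|$
Therefore, d(x,x1)=0.2
d(x,x2)=0.9
d(x,x3)=0.4
d(x,x4)=0.3
d(x,x5)=0.7
Thus, the rank of the data points from most to least similar to x using Manhattan distance is
x1, x4, x3, x5, x2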
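A minimal sketch of the Manhattan computation follows. The point coordinates are assumed (the data table does not appear in this text) and were picked to agree with the distances above; `manhattan` is a hypothetical helper name.

```python
# Assumed coordinates; the original data table is not reproduced in the text.
points = {
    "x1": (1.5, 1.7),
    "x2": (2.0, 1.9),
    "x3": (1.6, 1.8),
    "x4": (1.2, 1.5),
    "x5": (1.5, 1.0),
}
query = (1.4, 1.6)


def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(p - q) for p, q in zip(a, b))


manhattan_dist = {name: manhattan(query, p) for name, p in points.items()}

# Ascending distance: the most similar point comes first.
ranking = sorted(manhattan_dist, key=manhattan_dist.get)

for name in ranking:
    print(f"d(x,{name}) = {manhattan_dist[name]:.1f}")
```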
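Supremum distance between two points a = (a_1, a_2) and b = (b_1, b_2):
$d(a, b) = \max(|a_1 - b_1|, |a_2 - b_2|)$
Therefore, d(x,x1)=0.1
d(x,x2)=0.6
d(x,x3)=0.2
d(x,x4)=0.2
d(x,x5)=0.6
Thus, the rank of the data points from most to least similar to x using supremum distance is
x1, x4, x3, x5, x2 (x3 and x4 tie at 0.2, and x2 and x5 tie at 0.6)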
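Because the supremum distance produces ties here, it may help to group tied points explicitly. The sketch below does that under the assumption that the data points have the coordinates shown (the table is not included in this text, so they were chosen to match the listed distances). Rounding is used only to suppress floating-point noise so that genuine ties compare equal.

```python
# Assumed coordinates; the original data table is not reproduced in the text.
points = {
    "x1": (1.5, 1.7),
    "x2": (2.0, 1.9),
    "x3": (1.6, 1.8),
    "x4": (1.2, 1.5),
    "x5": (1.5, 1.0),
}
query = (1.4, 1.6)


def supremum(a, b):
    """Largest absolute coordinate difference (L-infinity distance)."""
    return max(abs(p - q) for p, q in zip(a, b))


# Round so that values such as 0.19999999999999996 and 0.20000000000000018
# are treated as the same distance.
supremum_dist = {name: round(supremum(query, p), 10) for name, p in points.items()}

# Group point names by distance, then order the groups from most to least similar.
tiers = {}
for name, d in supremum_dist.items():
    tiers.setdefault(d, []).append(name)
ranking = [tiers[d] for d in sorted(tiers)]

print(ranking)
```

Each inner list holds points at the same distance from the query, so the order within a group carries no meaning.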
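Cosine similarity:
$\cos(x, x_i) = \frac{x \cdot x_i}{\|x\| \, \|x_i\|}$
where, for a vector a = (a_1, ..., a_n), $\|a\| = \sqrt{a_1^2 + a_2^2 + \cdots + a_n^2}$.
For example, for x2 = (2, 1.9):
$\cos(x, x_2) = \frac{(1.4)(2) + (1.6)(1.9)}{\sqrt{(1.4^2 + 1.6^2)(2^2 + 1.9^2)}} = 0.99575$
Computing the others in the same way:
cos(x,x1)=0.99999
cos(x,x2)=0.99575
cos(x,x3)=0.99997
cos(x,x4)=0.99903
cos(x,x5)=0.96536
Thus, the rank of the data points from most to least similar to x using cosine similarity is x1, x3, x4, x2, x5.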
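The cosine similarities can be recomputed with a sketch like the one below. Only x2 = (2, 1.9) is visible in the worked computation; the remaining coordinates are assumptions chosen to agree with the distances in part (a), and `cosine_similarity` is a hypothetical helper name.

```python
import math

# Assumed coordinates; only x2 appears explicitly in the text.
points = {
    "x1": (1.5, 1.7),
    "x2": (2.0, 1.9),
    "x3": (1.6, 1.8),
    "x4": (1.2, 1.5),
    "x5": (1.5, 1.0),
}
query = (1.4, 1.6)


def cosine_similarity(a, b):
    """Dot product divided by the product of the two Euclidean norms."""
    dot = sum(p * q for p, q in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


cosine = {name: cosine_similarity(query, p) for name, p in points.items()}

# A larger cosine means a more similar direction, so sort in descending order.
ranking = sorted(cosine, key=cosine.get, reverse=True)

for name in ranking:
    print(f"cos(x,{name}) = {cosine[name]:.5f}")
```

Five decimal places are printed because x1 and x3 are indistinguishable at four.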
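Ans. (b) Dividing each point by its norm gives unit vectors. For unit vectors, $d(x, x_i) = \sqrt{2 - 2\cos(x, x_i)}$, so the ranking must agree with the cosine similarity ranking. The distances on the normalized data are:
D(x,x1)=0.0041
D(x,x2)=0.0922
D(x,x3)=0.0078
D(x,x4)=0.0441
D(x,x5)=0.2632
Thus, the rank of the data points from most to least similar to x using Euclidean distance on the normalized data is
x1, x3, x4, x2, x5.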
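Part (b) can be checked numerically with the following sketch, which scales every point to unit norm and then measures Euclidean distance. The coordinates are assumptions (the data table is not part of this text) chosen to agree with the part (a) distances, and `normalize` is a hypothetical helper name.

```python
import math

# Assumed coordinates; the original data table is not reproduced in the text.
points = {
    "x1": (1.5, 1.7),
    "x2": (2.0, 1.9),
    "x3": (1.6, 1.8),
    "x4": (1.2, 1.5),
    "x5": (1.5, 1.0),
}
query = (1.4, 1.6)


def normalize(v):
    """Scale a vector so that its Euclidean norm equals 1."""
    norm = math.hypot(*v)
    return tuple(c / norm for c in v)


unit_query = normalize(query)

# Euclidean distance between the unit-length query and each unit-length point.
normalized_dist = {
    name: math.dist(unit_query, normalize(p)) for name, p in points.items()
}

# Ascending distance: the most similar point comes first.
ranking = sorted(normalized_dist, key=normalized_dist.get)

for name in ranking:
    print(f"D(x,{name}) = {normalized_dist[name]:.4f}")
```

Since all normalized points lie on the unit circle, no distance can exceed 2, and points that point in nearly the same direction as the query end up very close to it.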