
Statistical Machine Learning

Assignment 03

01 - **************************************************************************
1.01
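For any given cluster assignment in K-Means, the corresponding cluster centers $\mu_k$ ($k = 1, \dots, K$)
are unique, since each center is the mean of the points assigned to it. Neither the assignment step nor
the update step can increase the objective, so every iteration that changes the assignment must move to
an assignment that has not been tried before (assuming ties are broken consistently). We also know that
the number of possible cluster assignments is finite (at most $K^n$ for $n$ data points). With these
conditions, K-Means converges in a finite number of iterations.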
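The argument can be checked numerically on a toy problem. The sketch below is a minimal 1-D implementation written only for illustration; the function name `kmeans_trace`, the data, and the starting centers are assumptions, not part of the assignment. It records the objective after every iteration so that the non-increasing behaviour is visible.

```python
def kmeans_trace(points, centers, max_iter=100):
    """Run K-Means on 1-D points and record the objective after each iteration."""
    centers = list(centers)  # work on a copy of the starting centers
    assign, trace = None, []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center
        # (ties go to the lowest cluster index).
        new_assign = [
            min(range(len(centers)), key=lambda k: (x - centers[k]) ** 2)
            for x in points
        ]
        if new_assign == assign:  # assignment unchanged, so we have converged
            break
        assign = new_assign
        # Update step: each center becomes the mean of its members.
        for k in range(len(centers)):
            members = [x for x, a in zip(points, assign) if a == k]
            if members:  # an empty cluster keeps its old center
                centers[k] = sum(members) / len(members)
        trace.append(sum((x - centers[a]) ** 2 for x, a in zip(points, assign)))
    return assign, centers, trace


# Hypothetical toy data: two well-separated groups on the real line.
points = [0.0, 0.5, 1.0, 4.0, 4.5, 5.0]
assign, centers, trace = kmeans_trace(points, centers=[0.0, 1.0])
print(assign, centers, trace)
```

On this data the loop stops once the assignment stops changing, and the recorded objective values never increase from one iteration to the next.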
1.02
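$$p(z_k = 1 \mid x) = \frac{\mathcal{N}(x \mid \mu_k, \sigma^2 I)}{\sum_{j=1}^{K} \mathcal{N}(x \mid \mu_j, \sigma^2 I)}$$

$$p(z_k = 1 \mid x) = \frac{\exp\left\{-\frac{1}{2\sigma^2}\|x - \mu_k\|^2\right\}}{\sum_{j=1}^{K} \exp\left\{-\frac{1}{2\sigma^2}\|x - \mu_j\|^2\right\}}$$

$$p(z_k = 1 \mid x) = \frac{1}{1 + \sum_{j \neq k} \exp\left\{\frac{1}{2\sigma^2}\left(\|x - \mu_k\|^2 - \|x - \mu_j\|^2\right)\right\}}$$

Then, if $k = \arg\min_j \|x - \mu_j\|^2$ (assuming the nearest center is unique), for each $j \neq k$
we have $\|x - \mu_k\|^2 - \|x - \mu_j\|^2 < 0$.

As $\sigma \to 0^+$, $\exp\left\{\frac{1}{2\sigma^2}\left(\|x - \mu_k\|^2 - \|x - \mu_j\|^2\right)\right\} \to 0$. So, $p(z_k = 1 \mid x) \to 1$.

But, if $k \neq \arg\min_j \|x - \mu_j\|^2$, let $j^* = \arg\min_j \|x - \mu_j\|^2$;
we have $\|x - \mu_k\|^2 - \|x - \mu_{j^*}\|^2 > 0$.

Thus as $\sigma \to 0^+$, $\exp\left\{\frac{1}{2\sigma^2}\left(\|x - \mu_k\|^2 - \|x - \mu_{j^*}\|^2\right)\right\} \to +\infty$.
So, $p(z_k = 1 \mid x) \to \frac{1}{1 + \infty} = 0$.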
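The limit can also be observed numerically. The following sketch assumes equal mixing weights and 1-D isotropic Gaussians; the function name `responsibilities` and the example values are illustrative assumptions only. It evaluates the posterior for a shrinking sigma.

```python
import math


def responsibilities(x, centers, sigma):
    """Posterior cluster probabilities for equal-weight 1-D Gaussians with shared sigma."""
    sq = [(x - mu) ** 2 for mu in centers]
    m = min(sq)  # subtract the smallest squared distance for numerical stability
    w = [math.exp(-(d - m) / (2.0 * sigma ** 2)) for d in sq]
    total = sum(w)
    return [wk / total for wk in w]


# Hypothetical example: x = 1 is closer to the center at 0 than to the center at 3.
for sigma in (2.0, 0.5, 0.05):
    print(sigma, responsibilities(1.0, [0.0, 3.0], sigma))
```

As sigma shrinks, the probability of the nearest center approaches 1 and the other approaches 0, which is the hard assignment used by K-Means.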

02 - **************************************************************************
2.01

Size constraint on F: (n x k)
Size constraint on G: (k x d)

Where n is the number of data points, k is the number of clusters and d is the number of
features.

The additional constraints on F are:


o All its values must be 0 or 1;
o Each row of F must contain exactly one 1, indicating the cluster membership of that
data point.

2.02
Step (a) corresponds to the Update step in K-Means, or the step that calculates the new
means to be the centroids of the observations in the new clusters.
Step (b) corresponds to the Assignment step in K-Means, or the step that assigns each
observation to the closest centroid.
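To make the correspondence concrete, here is a minimal sketch of the two alternating steps on plain Python lists, with F as an n x k one-hot membership matrix and G as a k x d matrix of centroids. The helper names `update_G` and `update_F`, the toy data, and the handling of empty clusters are assumptions made for this illustration.

```python
def update_G(X, F):
    """Step (a): row j of G is the mean of the rows of X assigned to cluster j."""
    k, d = len(F[0]), len(X[0])
    G = []
    for j in range(k):
        members = [x for x, f in zip(X, F) if f[j] == 1]
        if members:
            G.append([sum(col) / len(members) for col in zip(*members)])
        else:
            G.append([0.0] * d)  # assumption: an empty cluster gets a zero row
    return G


def update_F(X, G):
    """Step (b): each row of F is one-hot, marking the nearest row of G."""
    F = []
    for x in X:
        dists = [sum((a - b) ** 2 for a, b in zip(x, g)) for g in G]
        best = dists.index(min(dists))
        F.append([1 if j == best else 0 for j in range(len(G))])
    return F


# Hypothetical toy data: n = 4 points, d = 2 features, k = 2 clusters.
X = [[0.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0]]
F = [[1, 0], [0, 1], [0, 1], [0, 1]]  # deliberately poor starting membership
for _ in range(5):
    G = update_G(X, F)
    F = update_F(X, G)
print(F, G)
```

Every row of F keeps exactly one 1 throughout, so the constraints on F are preserved by construction.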

03 - **************************************************************************
a)

LOOCV error = 2/2 = 1. Using the circled data points as test points, we can verify that the SVM
under LOOCV classifies all the data points incorrectly.
b)

LOOCV error = 3/3 = 1. Using the circled data points as test points, we can verify that the SVM
under LOOCV classifies all the data points incorrectly.
c)

LOOCV error = 4/4 = 1. Using the circled data points as test points, we can verify that the SVM
under LOOCV classifies all the data points incorrectly.
d)

LOOCV error = 2/9. Using the circled data points as test points, we can verify that the SVM
under LOOCV classifies 2 of the 9 data points incorrectly.

04 - **************************************************************************
4.01

LOOCV error = 0/9 = 0. Using the circled point as the test point, we can verify that the SVM under
LOOCV classifies all data points correctly.

4.02
The (tight) upper bound on the LOOCV error of the SVM classifier is k/n, where k is the number of
support vectors and n is the number of data points. Only the support vectors are relevant for this
kind of validation: removing any of the other n-k data points does not change the decision
boundary, so each of those points is still classified correctly when it is held out.
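The bound can be illustrated without an SVM library in one special case. The sketch below assumes 1-D data in which every negative point lies to the left of every positive point, so the maximum-margin boundary is simply the midpoint between the two closest opposite-class points (the two support vectors). The function names and both data sets are hypothetical.

```python
def fit_threshold(xs, ys):
    """Max-margin threshold for separable 1-D data with negatives left of positives."""
    lo = max(x for x, y in zip(xs, ys) if y == -1)  # rightmost negative point
    hi = min(x for x, y in zip(xs, ys) if y == 1)   # leftmost positive point
    return (lo + hi) / 2.0


def loocv_error(xs, ys):
    """Hold out each point in turn, refit, and count misclassified held-out points."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        t = fit_threshold(train_x, train_y)
        pred = 1 if xs[i] > t else -1
        if pred != ys[i]:
            errors += 1
    return errors / len(xs)


# Both sets have k = 2 support vectors (the points at -1 and 1).
tight_x, tight_y = [-5.0, -1.0, 1.0, 5.0], [-1, -1, 1, 1]                  # n = 4
easy_x, easy_y = [-3.0, -2.0, -1.0, 1.0, 2.0, 3.0], [-1, -1, -1, 1, 1, 1]  # n = 6
print(loocv_error(tight_x, tight_y), loocv_error(easy_x, easy_y))
```

In the first set only the two support vectors are misclassified when held out, so the error equals k/n = 2/4; in the second set no held-out point is misclassified, so the error stays below k/n = 2/6.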

05 - **************************************************************************
5.01

5.02
We can verify that all the given data points lie on a straight line, as demonstrated in the figure below.

Thus, this line is the best fit to all the data points, with minimum (here zero) error. The first
principal component is the unit vector (or normalized vector) along this line. We use PCA to reduce
the data dimensionality (in this case from 2-D space to 1-D), and $u_1$ is the unit vector that
points in the direction of this line. Therefore, we need to find this unit vector.
The unit vector of a vector $v$ is defined as:

$$\hat{v} = \frac{v}{\|v\|}$$

Using the data point (1, 2) we have

$$u_1 = \frac{(1, 2)}{\sqrt{(1)^2 + (2)^2}} \quad\Rightarrow\quad u_1 = \begin{bmatrix} \frac{1}{\sqrt{5}} \\ \frac{2}{\sqrt{5}} \end{bmatrix}$$

5.03
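The second principal component is the unit vector perpendicular to the first one. Writing
$u_2 = (x, y)$, we have $u_1 \cdot u_2 = 0$, which is:

$$\left(\frac{1}{\sqrt{5}}, \frac{2}{\sqrt{5}}\right) \cdot (x, y) = 0 \quad\Rightarrow\quad \frac{1}{\sqrt{5}}x + \frac{2}{\sqrt{5}}y = 0 \quad\Rightarrow\quad x = -2y$$

As $u_2$ is also a unit vector, we have:

$$x^2 + y^2 = 1$$

Substituting $x = -2y$, we have:

$$4y^2 + y^2 = 1 \quad\Rightarrow\quad 5y^2 = 1 \quad\Rightarrow\quad y = \frac{1}{\sqrt{5}}$$

(taking the positive root; the negative root gives the same direction with the opposite sign).

Substituting back into $x = -2y$, we have:

$$x = -\frac{2}{\sqrt{5}}$$

This way, $u_2 = \begin{bmatrix} -\frac{2}{\sqrt{5}} \\ \frac{1}{\sqrt{5}} \end{bmatrix}$.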
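The two directions can be cross-checked numerically. The sketch below uses the closed-form principal-axis angle of a 2 x 2 covariance matrix; the three data points are an assumption (collinear points through (1, 2) and the origin), since the original data table is not reproduced here, and `principal_axes` is an illustrative name.

```python
import math


def principal_axes(points):
    """Return the two unit principal directions of a set of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    # Entries of the 2 x 2 covariance matrix of the centered data.
    sxx = sum((p[0] - mx) ** 2 for p in points) / n
    syy = sum((p[1] - my) ** 2 for p in points) / n
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    # Angle of the major axis of a symmetric 2 x 2 matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    u1 = (math.cos(theta), math.sin(theta))
    u2 = (-math.sin(theta), math.cos(theta))  # u1 rotated by 90 degrees
    return u1, u2


# Hypothetical collinear data on the line y = 2x.
points = [(-1.0, -2.0), (0.0, 0.0), (1.0, 2.0)]
u1, u2 = principal_axes(points)
print(u1, u2)
```

The first direction agrees with (1, 2) normalized, and the second is orthogonal to it with unit length; the overall sign of each direction is arbitrary.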

06 - **************************************************************************

6.01
The size of the state transition probability matrix in the HMM model is 3 x 3.
6.02
The size of the state-observation probability matrix is 3 x 4.
6.03
In a particular trial, we see 100 observations. The length of the path of states is 100.
6.04
There are $3^{99}$ different possible state paths.
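These sizes and counts follow from simple bookkeeping, sketched below. The count above is consistent with a path of 100 states whose first state is fixed; that reading is an assumption, since the problem statement is not reproduced here, and the function names are illustrative.

```python
from itertools import product


def count_paths(n_states, length, fixed_start=True):
    """Number of state paths of the given length."""
    free_positions = length - 1 if fixed_start else length
    return n_states ** free_positions


def enumerate_paths(n_states, length, start=0):
    """Brute-force list of all paths that begin in state `start` (small lengths only)."""
    return [(start,) + rest for rest in product(range(n_states), repeat=length - 1)]


n_states, n_symbols, n_obs = 3, 4, 100
transition_shape = (n_states, n_states)  # state transition matrix: 3 x 3
emission_shape = (n_states, n_symbols)   # state-observation matrix: 3 x 4

# Check the formula by brute force on a short path, then apply it to 100 steps.
print(len(enumerate_paths(n_states, 4)), count_paths(n_states, 4))
print(count_paths(n_states, n_obs))
```

If the first state were free rather than fixed, the same formula with `fixed_start=False` would give one extra factor of 3.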
