
Statistical Learning Assignment 3 Fall 2019

Assignment 3: Gaussian distribution, Linear Classification, and Linear Regression

1. (10 points) The Gaussian distribution


(a) Using the fact that √(β/(2π)) ∫_{−∞}^{∞} exp(−(β/2)(s − γ)²) ds = 1, show that ∫_{−∞}^{∞} exp(−as² + bs) ds = √(π/a) exp(b²/(4a)). (A sketch of the key step follows part (c).)
(b) Suppose that µ ∼ N (0, 1/α) and x|µ ∼ N (µ, 1/β). By integrating out µ, show that the marginal
distribution of x is given by x ∼ N (0, 1/β + 1/α) [Hint: Write down the joint distribution p(x, µ) =
p(x|µ)p(µ). To perform the integral over µ, use the identity from part (a).]
(c) In the lecture, we showed that a product of two Gaussian probability density functions is proportional
to a Gaussian density function, but we did not derive the proportionality-factor. Suppose that
p1(x) = N(x, µ1, 1/β1) and p2(x) = N(x, µ2, 1/β2). Find Z such that
p(x) = (1/Z) p1(x) p2(x) = N(x, (β1µ1 + β2µ2)/β, 1/β), where β = β1 + β2. (1)
[Hint: Go through the calculations we did in the lecture again, but carefully keep track of the factors
we had dropped.]
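A sketch of the key step for part (a), assuming the standard completing-the-square route (matching the result to the normalized Gaussian above is the remaining work):
−as² + bs = −a(s − b/(2a))² + b²/(4a),
so the integrand is proportional to a Gaussian density in s with γ = b/(2a) and β = 2a.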
2. (10 points) Linear Regression (based on Bishop exercise 3.3). Consider a data-set in which each data point tn has a weighting rn > 0, so that the sum-of-squares error function is
E(ω) = Σ_{n=1}^{N} rn (tn − ω⊤xn)². (2)

(a) Find the parameter-vector ω̂ which minimizes this error function.


(b) Describe two interpretations of this error function in terms of (i) replicated measurements and (ii) a data-dependent noise-variance.
3. (15 points) Linear classification and the logistic function. This exercise will be concerned with the
logistic function σ(s) = 1/(1 + exp(−s)) as well as its connection with linear classification in Gaussian
models.
(a) Show that the logistic function satisfies σ(−s) + σ(s) = 1 and find the first two derivatives of σ(s), σ′(s) and σ″(s).
(b) Plot σ(s) as well as log(σ(s)) as a function of s (either using Python or with pen and paper; a rough plot which captures the qualitative features of the functions is sufficient; a minimal plotting sketch follows part (c)). Explain why, for large s > 0, log(σ(s)) ≈ 0 and log(σ(−s)) ≈ −s.
(c) Suppose that we have data from two classes, and the data within each class is Gaussian distributed
with the same covariance, i.e. x|t = 1 ∼ N (µ+ , Σ) and x|t = −1 ∼ N (µ− , Σ), and that the
two classes have the same prior probabilities π+ = P(t = +1) = π− = P(t = −1) = 0.5. Show that
the conditional probability of belonging to the positive class can be written as a logistic function
P (t = 1|x) = σ(ω > x + ωo ) and identify the corresponding parameters ω and ωo .
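If you take the Python route for the plot in part (b), a minimal matplotlib sketch (variable names are placeholders, not prescribed) could look like this:

import numpy as np
import matplotlib.pyplot as plt

s = np.linspace(-10, 10, 500)
sig = 1.0 / (1.0 + np.exp(-s))      # logistic function sigma(s)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.plot(s, sig)
ax1.set_title('sigma(s)')
ax2.plot(s, np.log(sig))            # ~0 for large s, ~s for large negative s
ax2.set_title('log sigma(s)')
plt.show()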
4. (15 points) Linear Classification [Python] Download the file LinearClassification.mat, in which
you will find training data xTrain (a matrix of size N = 500 by D = 2) with labels tTrain. Your job
will be to train and compare two classification algorithms on this data.
(a) Calculate the means and the covariances of each of the two classes, as well as the average covariance Σ = ½Σ+ + ½Σ−. Use µ+, µ− and Σ to compute the weight vector ω and offset ωo of the Gaussian linear discriminant analysis used in the lectures (a code sketch for parts (a)–(c) follows part (e)).
(b) Plot the data as well as the decision boundary into a 2-D plot, and calculate the (training) error
rate of the algorithm, i.e. the proportion of points in the training set which were misclassified by
it. Use the data in xTest and tTest to also calculate its error rate on the test set.
(c) Calculate the parameters of the decision function y(x) = x⊤Ax + b⊤x + c of the ’quadratic discriminant analysis’ that can be derived by doing classification in a Gaussian model without assuming that Σ+ = Σ−, and calculate the training- and the test-error rate of this algorithm.
(d) For each data-point in the test-set, calculate its (scaled and signed) distance to the decision boundary (i.e. the value of y(x) for each x). Make a plot which contains a histogram of these values for the points in the positive class (in blue) as well as a histogram for the points in the negative class (in red).
(e) Calculate the decision boundary of the quadratic algorithm and add it to the plot used in (b).
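A minimal sketch for parts (a)–(c), assuming the labels in tTrain/tTest are ±1 and the standard equal-prior forms ω = Σ⁻¹(µ+ − µ−), ωo = −½(µ+⊤Σ⁻¹µ+ − µ−⊤Σ⁻¹µ−) for the linear discriminant and the log-likelihood-ratio form of y(x) for the quadratic one; treat it as a starting point, not the prescribed solution:

import numpy as np
from scipy.io import loadmat

data = loadmat('LinearClassification.mat')
X, t = data['xTrain'], data['tTrain'].ravel()

Xp, Xm = X[t == 1], X[t == -1]                 # split by class label
mu_p, mu_m = Xp.mean(axis=0), Xm.mean(axis=0)
Sig_p, Sig_m = np.cov(Xp, rowvar=False), np.cov(Xm, rowvar=False)
Sig = 0.5 * Sig_p + 0.5 * Sig_m                # average covariance

# (a) linear discriminant: score = w^T x + w0, equal priors
Sig_inv = np.linalg.inv(Sig)
w = Sig_inv @ (mu_p - mu_m)
w0 = -0.5 * (mu_p @ Sig_inv @ mu_p - mu_m @ Sig_inv @ mu_m)

# (b) error rate = fraction of sign disagreements with the labels
def error_rate(score, t):
    return np.mean(np.sign(score) != t)

X_te, t_te = data['xTest'], data['tTest'].ravel()
print('LDA train error:', error_rate(X @ w + w0, t))
print('LDA test error:', error_rate(X_te @ w + w0, t_te))

# (c) quadratic discriminant y(x) = x^T A x + b^T x + c
Sp_i, Sm_i = np.linalg.inv(Sig_p), np.linalg.inv(Sig_m)
A = 0.5 * (Sm_i - Sp_i)
b = Sp_i @ mu_p - Sm_i @ mu_m
c = (0.5 * (mu_m @ Sm_i @ mu_m - mu_p @ Sp_i @ mu_p)
     + 0.5 * np.log(np.linalg.det(Sig_m) / np.linalg.det(Sig_p)))

def y_quad(X):
    # evaluate the quadratic score for every row of X at once
    return np.einsum('ni,ij,nj->n', X, A, X) + X @ b + c

print('QDA train error:', error_rate(y_quad(X), t))
print('QDA test error:', error_rate(y_quad(X_te), t_te))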
5. (20 points) Regression with basis functions [Python] Download the file LinearRegression.mat,
in which you will find training data xTrain (a vector of length N = 20) with outputs tTrain. Your job
will be to train a nonlinear regression model from x to t using basis functions.
(a) We want to use a 50-dimensional basis-set, i.e. the ‘feature-vector’ z(x) should be 50-dimensional with zi(x) = exp(−(x − i)²/(2σ²)) with σ = 5 and i = 1, . . . , 50. Make a plot of the 50 basis functions (use the x-values in xPlot). Calculate the 50 × N matrix zTrain for which the n-th column is z(xn), and produce an image of the matrix (using matplotlib.pyplot.matshow). An end-to-end code sketch for this problem follows part (e).
(b) Using α = β = 1 (same notation as in lectures), calculate the posterior mean µ = E(ω|D) (a 50 × 1
vector) and plot it.
(c) The posterior mean µ is a vector of weights of the basis functions. Calculate the corresponding predictive mean fµ(x) = E(t(x)|D) = Σ_{i=1}^{50} µi zi(x) and plot the predictive mean and the observed training data into the same plot.
(d) Calculate the posterior covariance over weights Σ = Cov(ω|D) and display it as an image. Extract the diagonal of Σ to obtain the posterior variance, and use it to plot ±2 standard deviation error bars on the mean in part (b).
(e) Calculate, for each x (use the values in xPlot), the predictive variance Var(t|D, x), and use it to plot ’error bars’ for the predictive distribution, i.e. fµ(x) ± 2√Var(t|D, x).
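A minimal end-to-end sketch for this problem, assuming the Bayesian linear-regression posterior from the lectures with prior ω ∼ N(0, α⁻¹I) and noise precision β, i.e. S = (αI + βZZ⊤)⁻¹, µ = βSZt, and predictive variance Var(t|D, x) = 1/β + z(x)⊤Sz(x); it also assumes xPlot is stored in the same .mat file, so treat the loading details as assumptions:

import numpy as np
import matplotlib.pyplot as plt
from scipy.io import loadmat

data = loadmat('LinearRegression.mat')
x_tr = data['xTrain'].ravel()
t_tr = data['tTrain'].ravel()
x_plot = data['xPlot'].ravel()

sigma, centers = 5.0, np.arange(1, 51)         # i = 1, ..., 50

def z(x):
    # 50 x len(x) matrix of Gaussian basis functions
    return np.exp(-(x[None, :] - centers[:, None])**2 / (2 * sigma**2))

# (a) basis functions and the 50 x N matrix zTrain
Z = z(x_tr)
plt.plot(x_plot, z(x_plot).T)                  # the 50 basis functions
plt.matshow(Z)

# (b) posterior mean and (d) posterior covariance, alpha = beta = 1
alpha = beta = 1.0
S = np.linalg.inv(alpha * np.eye(50) + beta * Z @ Z.T)   # Cov(w|D)
mu = beta * S @ Z @ t_tr                                 # E(w|D)
plt.matshow(S)
plt.figure()
plt.errorbar(centers, mu, yerr=2 * np.sqrt(np.diag(S)), fmt='o')

# (c) predictive mean and (e) predictive error bars on xPlot
Zp = z(x_plot)
f_mu = Zp.T @ mu
pred_sd = np.sqrt(1.0 / beta + np.einsum('ip,ij,jp->p', Zp, S, Zp))
plt.figure()
plt.plot(x_plot, f_mu)
plt.fill_between(x_plot, f_mu - 2 * pred_sd, f_mu + 2 * pred_sd, alpha=0.3)
plt.scatter(x_tr, t_tr)
plt.show()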
