
University of Wisconsin Madison

Computer Sciences Department

CS 760 - Machine Learning


Spring 2010

Exam
11am-12:30pm, Monday, April 26, 2010, Room 1240 CS. CLOSED BOOK (one sheet of notes and a calculator allowed).

Write your answers on these pages and show your work. If you feel that a question is not fully specified, state any assumptions you need to make in order to solve the problem. You may use the backs of these sheets for scratch work. If you use the back for any of your final answers, be sure to clearly mark that on the front side of the sheets. Neatly write your name on this and all other pages of this exam.

Name

________________________________________________________________

Problem      1        2        3        4        5        TOTAL
Score        ______   ______   ______   ______   ______   ______
Max Score    20       20       20       20       20       100


Problem 1 Learning from Labeled Examples (20 points) You have a dataset that involves three features. Feature C's values are in [0, 1000]. The other two features are Boolean-valued.
          A    B    C      Category
Ex1       F    T    115    false
Ex2       T    F    890    false
Ex3       T    T    257    true
Ex4       F    F    509    true
Ex5       T    T    753    true

a) How much information about the category is gained by knowing whether or not the value of feature C is less than 333?

b) How much information is there in knowing whether or not features A and B have the same value?
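A minimal sketch of the entropy bookkeeping behind parts a) and b), assuming base-2 logarithms and the standard definition of information gain; the helper functions and variable names here are illustrative only:

    import math

    def entropy(labels):
        # Shannon entropy, in bits, of a list of labels.
        n = len(labels)
        return -sum((c / n) * math.log2(c / n)
                    for c in (labels.count(v) for v in set(labels)))

    def info_gain(labels, split):
        # Information gained about `labels` from a Boolean `split` feature.
        n = len(labels)
        yes = [y for y, s in zip(labels, split) if s]
        no = [y for y, s in zip(labels, split) if not s]
        remainder = (len(yes) / n) * entropy(yes) + (len(no) / n) * entropy(no)
        return entropy(labels) - remainder

    A = [False, True, True, False, True]
    B = [True, False, True, False, True]
    C = [115, 890, 257, 509, 753]
    category = [False, False, True, True, True]

    print(info_gain(category, [c < 333 for c in C]))            # part a
    print(info_gain(category, [a == b for a, b in zip(A, B)]))  # part b, read as gain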

c) A knowledgeable reviewer says that the above data set was not very well pre-processed for nearest-neighbor algorithms. Briefly explain why a reviewer might say that.


d) Assume a one-norm SVM puts weight = -3 on feature A, weight = 2 on feature B, and weight = 0 on feature C. What would the cost of this solution be, based on this question's five training examples? If you need to make any additional assumptions, be sure to state and briefly justify them.

The training examples, repeated for your convenience:

          A    B    C      Category
Ex1       F    T    115    false
Ex2       T    F    890    false
Ex3       T    T    257    true
Ex4       F    F    509    true
Ex5       T    T    753    true
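For part d), the sketch below scores the given weights with a hinge loss plus an L1 penalty, which is one standard reading of a one-norm SVM's cost. Encoding T/F as 1/0, mapping the categories to ±1, using no bias term, and weighting the penalty by 1 are all assumptions of the kind the question asks you to state:

    # Assumed encoding: T -> 1, F -> 0; category true -> +1, false -> -1.
    examples = [  # (A, B, C, label)
        (0, 1, 115, -1),
        (1, 0, 890, -1),
        (1, 1, 257, +1),
        (0, 0, 509, +1),
        (1, 1, 753, +1),
    ]
    w_a, w_b, w_c = -3.0, 2.0, 0.0
    bias, penalty = 0.0, 1.0  # assumptions: no bias term, unit L1 weight

    hinge = sum(max(0.0, 1.0 - y * (w_a * a + w_b * b + w_c * c + bias))
                for a, b, c, y in examples)
    l1 = abs(w_a) + abs(w_b) + abs(w_c)
    print(hinge + penalty * l1)  # total cost under these assumptions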


Problem 2 Aspects of Supervised Learning (20 points) a) Explain what active learning means. Also briefly describe how you might use Bagging to address the task of active learning.
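One common reading of the Bagging half of part a) is query-by-committee: train one model per bootstrap replicate of the labeled pool, then ask the teacher to label the unlabeled examples the committee splits on most evenly. A toy sketch, with made-up threshold learners and data:

    import random

    def bootstrap(data, rng):
        # Draw len(data) items with replacement -- the Bagging resample.
        return [rng.choice(data) for _ in data]

    def disagreement(models, x):
        # Fraction of committee votes in the minority on example x.
        votes = [m(x) for m in models]
        yes = sum(votes)
        return min(yes, len(votes) - yes) / len(votes)

    def fit_threshold(sample):
        # Trivial learner: predict True at or above the smallest positive value.
        pos = [x for x, y in sample if y]
        cut = min(pos) if pos else float("inf")
        return lambda x, c=cut: x >= c

    rng = random.Random(0)
    pool = [(0.1, False), (0.4, False), (0.6, True), (0.9, True)]  # labeled
    committee = [fit_threshold(bootstrap(pool, rng)) for _ in range(11)]

    unlabeled = [0.2, 0.5, 0.8]
    print(max(unlabeled, key=lambda x: disagreement(committee, x)))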

b) Assume we have a supervised-learning task where the examples are represented by 26 Boolean features, A-Z. We guess that the true concept is of the form:

Literal1 ∧ Literal2 ∧ Literal3

where Literal_i is one of the features A-Z or its negation, and where a given feature can appear at most once in the concept (so C ∧ M ∧ A is a valid concept, but C ∧ M ∧ M is not). If 90% of the time we want to learn a concept whose accuracy is at least 95%, how many training examples should we collect?
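A sketch of the standard PAC bound for a consistent learner, m ≥ (1/ε)(ln |H| + ln (1/δ)); counting the hypothesis space as unordered triples of distinct features, each possibly negated, is our reading of the problem statement:

    import math

    num_concepts = math.comb(26, 3) * 2**3  # pick 3 of 26 features, negate or not
    epsilon = 0.05  # want accuracy of at least 95%
    delta = 0.10    # want that to hold at least 90% of the time

    m = (1 / epsilon) * (math.log(num_concepts) + math.log(1 / delta))
    print(math.ceil(m))  # about 245 examples under these assumptions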


c) Assume that our learning algorithm is to simply (and stupidly) learn the model f(x) = maximum output value seen in the training set.

We want to estimate the error due to bias (in the bias-variance sense) of this algorithm, so we collect a number of possible training sets, where the notation N → M means that for input N the output is M (i.e., there is one input feature and the output is a single number):

{1 → 3, 2 → 2}   {4 → 5, 3 → 0}   {2 → 2, 4 → 5}   {3 → 0, 3 → 0}   {2 → 2, 1 → 3}

Based on this sample of possible training sets, what is the estimated error, due to this algorithm's bias, for the input value of 2? Be sure to show your work and explain your answer.
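A sketch of one common estimate of the bias term: average the learner's prediction for input 2 across the sampled training sets and square its gap from the true output (every occurrence of input 2 in the data has output 2). Treating that squared gap as the error due to bias is our assumption about the intended definition:

    training_sets = [
        [(1, 3), (2, 2)],
        [(4, 5), (3, 0)],
        [(2, 2), (4, 5)],
        [(3, 0), (3, 0)],
        [(2, 2), (1, 3)],
    ]

    # f(x) ignores x and predicts the largest output seen during training.
    predictions = [max(y for _, y in ts) for ts in training_sets]
    mean_prediction = sum(predictions) / len(predictions)

    true_output = 2  # input 2 always maps to output 2 in the sampled data
    print(predictions)                           # [3, 5, 5, 0, 3]
    print((mean_prediction - true_output) ** 2)  # squared bias at x = 2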


Problem 3 Reinforcement Learning (20 points) Consider the deterministic reinforcement-learning environment drawn below (let γ = 0.5). The numbers on the arcs indicate the immediate rewards. Once the agent reaches the end state, the current episode ends and the agent is magically transported to the start state. The probability of an exploration step is 0.02.
[Figure: the RL environment's state graph, from start to end; the recoverable arc labels (immediate rewards) are -3, -5, 2, 4, 7, and -1000.]

a) A one-step, Q-table learner follows the path start → b → end. On the graph below, show the Q values that have changed, and show your work to the right of the graph. Assume that for all legal actions, the initial values in the Q table are 6.

[Blank copy of the graph, from start to end, for your answer.]

b) Starting with the Q table you produced in Part a, again follow the path start → b → end and show the Q values below that have changed. Show your work to the right.
[Blank copy of the graph, from start to end, for your answer.]
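Parts a) and b) both use the deterministic one-step Q-learning backup, Q(s,a) ← r + γ max over a' of Q(s',a'). A minimal sketch with hypothetical state and action names; the rewards below are placeholders, not the values from the figure:

    GAMMA = 0.5

    def q_update(q, state, action, reward, next_state, actions):
        # Deterministic backup: Q(s,a) <- r + gamma * max_a' Q(s',a').
        best_next = max((q[(next_state, a2)] for a2 in actions[next_state]),
                        default=0.0)  # the end state has no actions, so 0
        q[(state, action)] = reward + GAMMA * best_next

    actions = {"start": ["to_b"], "b": ["to_end"], "end": []}
    q = {("start", "to_b"): 6.0, ("b", "to_end"): 6.0}  # all entries start at 6
    q_update(q, "start", "to_b", -5, "b", actions)  # sees the old Q(b, .) = 6
    q_update(q, "b", "to_end", 4, "end", actions)
    print(q)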


c) State and informally explain the optimal path from start to end that a Q-table learner will learn after a large number of trials in this environment. (You do not need to show the score of every possible path. The original RL graph appears below for convenience.)

d) Repeat Part c, but this time assume the SARSA algorithm is being used.
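The contrast parts c) and d) turn on is that SARSA is on-policy: it backs up the action the occasionally exploring agent actually takes next, rather than the greedy maximum. A sketch of the differing update, in the same hypothetical Q-table layout as above:

    def sarsa_update(q, state, action, reward, next_state, next_action, gamma=0.5):
        # On-policy backup: Q(s,a) <- r + gamma * Q(s', action actually taken).
        next_q = q[(next_state, next_action)] if next_action is not None else 0.0
        q[(state, action)] = reward + gamma * next_q

Because exploration steps occur with probability 0.02 and one arc costs -1000, the two updates can rank paths near that arc differently; that is the kind of difference worth discussing here.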

e) In class and in the text, a convergence proof for Q learning was presented. If we use a function approximator, this proof no longer applies. Briefly explain why.

Here again is the version of the RL graph with the immediate rewards shown. [Figure: the same graph, from start to end; the recoverable arc labels are -3, 9, -5, 2, 4, 7, and -1000.]


Problem 4 Experimental Methodology (20 points) a) Assume on some Boolean-prediction task, you train a perceptron on 1000 examples and get 850 correct, then test your learned model on a fresh set of 100 examples and find it predicts 80 correctly. Give an estimate, including the 95% confidence interval, for the expected accuracy on the next 100 randomly drawn examples.
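A sketch of the usual normal-approximation confidence interval for a binomial proportion, applied to part a); basing the estimate on the fresh 100-example test set rather than the training accuracy is the reading we assume:

    import math

    n, correct = 100, 80  # the fresh test set, not the 850/1000 training fit
    p_hat = correct / n

    half_width = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)
    print(f"{p_hat:.2f} +/- {half_width:.3f}")  # 0.80 +/- 0.078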

b) Sketch a pair of learning curves that might result from an experiment where one evaluated whether or not a given feature-selection algorithm helped. Be sure to label the axes and informally explain what your curves show.

Why would a learning curve even be used for an experiment like this?

c) Assume you have trained a Bayesian network for a Boolean-valued task. For each of the test-set examples below, the second column reports the probability the trained Bayesian network computed for this example, while the third column lists the correct category.

Example   Probability(Output is True)   Correct Category
1         0.99                          positive
3         0.81                          negative
2         0.53                          positive
4         0.26                          negative
5         0.04                          negative

Draw to the right of this table the ROC curve for this model (it is fine to simply connect the dots, that is, make your curve piece-wise linear). Be sure to label your axes.
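The dots on the ROC curve come from sweeping a decision threshold down through the predicted probabilities and recording the (false-positive rate, true-positive rate) pair after each example; a minimal sketch of that bookkeeping, with our own variable names:

    # Test examples sorted by predicted probability, highest first.
    scored = [(0.99, "positive"), (0.81, "negative"), (0.53, "positive"),
              (0.26, "negative"), (0.04, "negative")]

    pos = sum(1 for _, c in scored if c == "positive")
    neg = len(scored) - pos

    tp = fp = 0
    points = [(0.0, 0.0)]  # threshold above every score
    for _, category in scored:  # lower the threshold one example at a time
        if category == "positive":
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    print(points)  # (FPR, TPR) pairs to plot and connect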


Problem 5 Miscellaneous Short Answers (20 points) Briefly define and discuss the importance in machine learning of each of the following:

weight decay
definition:

importance:

kernels that compute the distance between graph-based examples [graph here is in the sense of arcs and nodes, as opposed to plots of x vs. f(x)]
definition:

importance:

structure search
definition:

importance:

State and briefly explain two ways that the Random Forest algorithm reduces the chances of overfitting a training set. i)

ii)


Feel free to tear off this page and use it for scratch paper.
