Problem 1.
Suppose we collect data for a group of students in a statistics class with variables X1 = hours studied, X2 = undergrad GPA, and Y = receive an A. We fit a logistic regression and produce estimated coefficients β0 = -4.5, β1 = 0.07, β2 = 1.2.

a. Estimate the probability that a student who studies for 44 hours and has an undergrad GPA of 3.5 gets an A
in the class.

p(X) = exp(β0 + β1X1 + β2X2) / (1 + exp(β0 + β1X1 + β2X2))

#store the given coefficient estimates
b.0 = -4.5
b.1 = 0.07
b.2 = 1.2

#making the prediction
prob.Y = exp(b.0 + (b.1*44) + (b.2*3.5)) / (1 + exp(b.0 + (b.1*44) + (b.2*3.5)))
prob.Y

## [1] 0.9415854

b. How many hours would the student in part (a) need to study to have a 50% chance of getting an A in the
class?

log(prob.Y/(1 - prob.Y)) = β0 + β1X1 + β2X2, so
log(0.5/(1 - 0.5)) = -4.5 + 0.07(X1) + 1.2(3.5)

b.X1 = (log(0.5/(1-0.5)) - b.0 - (b.2*3.5)) / b.1
b.X1

## [1] 4.285714

Problem 2.
Binary logistic regression can give poor results when the two classes are perfectly separated by a linear decision
boundary. One way to address this problem is to use the Lasso applied to logistic regression.

a. Write the likelihood function for the logistic regression problem in terms of x, y, β0 and β1. Assume for simplicity that we have n observations and only one variable (i.e. xi is a real number, for i = 1, ..., n).

L(β0, β1) = ∏(i: yi = 1) p(xi) × ∏(i: yi = 0) (1 - p(xi)),

where p(xi) = exp(β0 + β1xi) / (1 + exp(β0 + β1xi)) (Equation 4.5)
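
As a sanity check, here is a minimal R sketch of this likelihood evaluated on a small made-up dataset; the helper names p.x and lik and the vectors x.toy and y.toy are invented for illustration and are not part of the homework data.

#logistic probability p(x) for a given beta0 and beta1
p.x = function(x, b0, b1) exp(b0 + b1*x) / (1 + exp(b0 + b1*x))

#likelihood L(beta0, beta1): product of p(xi) over yi = 1, times product of 1 - p(xi) over yi = 0
lik = function(b0, b1, x, y) {
  p = p.x(x, b0, b1)
  prod(p[y == 1]) * prod(1 - p[y == 0])
}

#toy data, invented for illustration (note the classes are not perfectly separated)
x.toy = c(-2, -1, 0, 1, 2)
y.toy = c(0, 1, 0, 1, 1)
lik(0, 1, x.toy, y.toy)   #a single number strictly between 0 and 1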

b. Show that the likelihood function L(β0, β1) is always strictly less than 1.

As β0 + β1x goes to ∞, p(x) approaches exp(∞)/(1 + exp(∞)) = 1, per the logistic function definition, but never quite reaches it, because the numerator is always smaller than the denominator.

If β0 + β1x approaches -∞, then exp(β0 + β1x) approaches 0, so p(x) approaches 0 but never reaches it, since exp() is always strictly positive.

Finally, if β0 + β1x = 0, then p(x) simplifies to exp(0)/(1 + exp(0)) = 0.5.

Because the likelihood function is a product of factors p(xi) and 1 - p(xi), with p(X) = exp(β0 + β1X)/(1 + exp(β0 + β1X)), and each of these factors is strictly between 0 and 1, the likelihood L(β0, β1) always lies strictly between 0 and 1, and in particular is strictly less than 1.
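
A quick numeric illustration of those limits (the inputs -20, 0, and 20 are arbitrary):

#p(z) = exp(z)/(1 + exp(z)) for a very negative, zero, and very positive linear term
p.lin = function(z) exp(z) / (1 + exp(z))
p.lin(c(-20, 0, 20))   #roughly 2e-09, 0.5, and 1 - 2e-09: always strictly inside (0, 1)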

c. Recall that the logistic regression coefficients β0, β1 are obtained by maximizing L(β0, β1). Suppose that all of the xi corresponding to yi = 0 are negative, and all other xi are positive. In this case, note that we can get L(β0, β1) arbitrarily close to 1. Explain why this means that β0 and β1 are undefined.

When the classes are perfectly separated by a decision boundary, the logistic regression parameters (β) become unstable. As the likelihood approaches 1, the model must yield fitted probabilities that are essentially 0 or 1, and to achieve that the coefficients must head toward ±∞. Since any finite choice of (β0, β1) can always be improved by scaling it up, the likelihood never attains its maximum, so the maximizing β0 and β1 are undefined.
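
The same behaviour is easy to provoke in R. The sketch below uses invented, perfectly separated toy data (x.sep, y.sep); glm() should warn that fitted probabilities numerically 0 or 1 occurred and return very large, essentially arbitrary coefficients.

#perfectly separated toy data: every x with y = 0 is negative, every x with y = 1 is positive
x.sep = c(-3, -2, -1, 1, 2, 3)
y.sep = c(0, 0, 0, 1, 1, 1)

#glm cannot find a finite maximizer of the likelihood here
sep.fit = glm(y.sep ~ x.sep, family = binomial)
coef(sep.fit)   #huge coefficients; scaling them up further would only push the likelihood closer to 1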

d. For computational convenience, it is common to find β0 and β1 by minimizing the negative log-likelihood -log L(β0, β1), rather than by maximizing the likelihood itself. Explain why both these problems yield the same β0, β1.

Maximizing the log-likelihood is the same as minimizing the negative log-likelihood. The natural log is a strictly increasing function, so whenever the likelihood increases, the log-likelihood increases as well; the pair (β0, β1) that maximizes L(β0, β1) therefore also maximizes log L(β0, β1). Multiplying by -1 simply turns a maximization into a minimization: the negative log-likelihood decreases exactly when the likelihood increases, so maximizing one and minimizing the other pick out the same β0, β1.
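
To illustrate the equivalence, here is a small sketch (reusing the non-separated toy data x.toy, y.toy from the part (a) sketch) that minimizes the negative log-likelihood directly with optim() and compares the result to glm(), which maximizes the likelihood; the two sets of coefficients should agree up to numerical tolerance.

#negative log-likelihood for simple logistic regression: -sum(y*eta - log(1 + exp(eta)))
negloglik = function(beta, x, y) {
  eta = beta[1] + beta[2]*x
  -sum(y*eta - log(1 + exp(eta)))
}

opt = optim(c(0, 0), negloglik, x = x.toy, y = y.toy)
opt$par                                      #beta0, beta1 from minimizing -log L
coef(glm(y.toy ~ x.toy, family = binomial))  #beta0, beta1 from maximizing L -- essentially the same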

e. Inspired by the Lasso, suggest a way to modify the negative log-likelihood function -log L(β0, β1) so that β0 and β1 become defined even in the separable case above. Justify your answer.

Taking Equation 6.7 (the lasso criterion) from the ISL book, the lasso coefficients minimize

Σi (yi - β0 - Σj βj xij)² + λ Σj |βj| = RSS + λ Σj |βj|.

Analogously, we can minimize -log L(β0, β1) + λ(|β0| + |β1|). Adding λ times the sum of the absolute values of the coefficients penalizes coefficients that grow too large. As discussed in part (c), the problem in the separable case is that the coefficients run off to ±∞; with the penalty term that is no longer optimal, so the minimizer is finite and β0 and β1 are well defined.
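
As a rough sketch of this idea in R: the glmnet package (used in Problem 5 below) fits exactly this kind of lasso-penalized logistic regression, and applied to the perfectly separated toy data from the part (c) sketch the penalized coefficients stay finite. The added noise column and the value lambda = 0.1 are arbitrary choices for illustration; glmnet requires at least two predictor columns and does not penalize the intercept.

library(glmnet)
set.seed(1)
#pad the single predictor with a noise column, since glmnet needs a matrix with >= 2 columns
X.sep = cbind(x.sep, rnorm(length(x.sep)))

#lasso-penalized logistic regression at an arbitrary illustrative penalty
pen.fit = glmnet(X.sep, y.sep, family = "binomial", alpha = 1, lambda = 0.1)
coef(pen.fit)   #finite, modest coefficients, unlike the unpenalized fit above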

Problem 3
Suppose that we estimate the regression coefficients in a linear regression model by minimizing Σi (yi - β0 - Σj βj xij)² subject to Σj |βj| ≤ s for a particular value of s. For parts (a) through (e), indicate which of i. through v. is correct. Justify your answer.

a. As we decrease s from ∞ to 0, the training RSS will:

a. steadily increase. By decreasing s to 0, we are giving the coefficients a smaller and smaller budget, so the model becomes less flexible, which increases the training RSS.

b. Repeat (a) for test RSS:

e. decrease initially, and then eventually start increasing in a U shape. When s is very large the fit is essentially least squares and tends to overfit; as we decrease s, variance falls faster than squared bias rises, so test RSS decreases at first. Once the budget becomes too tight, however, the model is too inflexible, the increased bias dominates, and test RSS starts to rise again. (A small simulation illustrating parts (a) and (b) is sketched after part (e).)

c. Repeat (a) for variance.

b. steadily decrease. As we decrease s, our model becomes less flexible and the coefficients shrink toward zero under the tighter budget. Because of the decreased flexibility, there will likewise be a decrease in variance.

d. Repeat (a) for (squared) bias.

a. steadily increase. As noted earlier, the decrease in flexibility that results from a lower budget s causes increased bias, because flexibility and bias are inversely related.

e. Repeat (a) for the irreducible error.

c. remain constant. The irreducible error is independent of our model, regardless of its flexibility or bias, so no change in s should affect it.
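
The simulation sketch below illustrates parts (a) and (b). The data are entirely made up (a sparse linear model with Gaussian noise), and decreasing the budget s corresponds to increasing the lasso penalty lambda in glmnet. Exact curves depend on the seed, but training MSE should rise steadily as lambda grows, while test MSE traces a rough U shape.

library(glmnet)
set.seed(3017)
n = 60; p = 40
X.sim = matrix(rnorm(n * p), n, p)
beta.true = c(rep(1, 5), rep(0, p - 5))    #only the first 5 predictors matter
y.sim = drop(X.sim %*% beta.true) + rnorm(n)
X.new = matrix(rnorm(n * p), n, p)         #an independent test set
y.new = drop(X.new %*% beta.true) + rnorm(n)

lasso.path = glmnet(X.sim, y.sim, alpha = 1)   #lasso path; larger lambda = tighter budget s
train.mse = colMeans((predict(lasso.path, X.sim) - y.sim)^2)
test.mse = colMeans((predict(lasso.path, X.new) - y.new)^2)

plot(log(lasso.path$lambda), train.mse, type = "l",
     ylim = range(c(train.mse, test.mse)),
     xlab = "log(lambda)  (larger lambda = smaller budget s)", ylab = "MSE")
lines(log(lasso.path$lambda), test.mse, lty = 2)   #dashed curve = test MSE (roughly U-shaped)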

Problem 4.
This question should be answered using the Weekly dataset, which is part of the ISLR package. This data is similar in nature to the Smarket data used in Section 4.6 of our textbook, except that it contains 1,089 weekly returns for 21 years, from the beginning of 1990 to the end of 2010.

a. Use the full dataset to perform a logistic regression with Direction as the response and the five lag variables plus Volume as predictors. Call your model glm.fit. Use the summary function to print the results. Do any of the predictors appear to be statistically significant? If so, which ones?

#load the Weekly dataset from the ISLR package


require(ISLR)

## Loading required package: ISLR

data("Weekly")
names(Weekly)

## [1] "Year" "Lag1" "Lag2" "Lag3" "Lag4" "Lag5"


## [7] "Volume" "Today" "Direction"

dim(Weekly)

## [1] 1089 9

#create logistic regression function


glm.fit = glm(Direction~Lag1+Lag2+Lag3+Lag4+Lag5+Volume, data=Weekly, family=binomial)
summary(glm.fit)

le://localhost/Users/alexnutkiewicz/Desktop/Homework.2.html 3/21
2/15/2017 Homework.2.html

##
## Call:
## glm(formula = Direction ~ Lag1 + Lag2 + Lag3 + Lag4 + Lag5 +
## Volume, family = binomial, data = Weekly)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.6949 -1.2565 0.9913 1.0849 1.4579
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.26686 0.08593 3.106 0.0019 **
## Lag1 -0.04127 0.02641 -1.563 0.1181
## Lag2 0.05844 0.02686 2.175 0.0296 *
## Lag3 -0.01606 0.02666 -0.602 0.5469
## Lag4 -0.02779 0.02646 -1.050 0.2937
## Lag5 -0.01447 0.02638 -0.549 0.5833
## Volume -0.02274 0.03690 -0.616 0.5377
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1496.2 on 1088 degrees of freedom
## Residual deviance: 1486.4 on 1082 degrees of freedom
## AIC: 1500.4
##
## Number of Fisher Scoring iterations: 4

Based on the summary, Lag2 is the only predictor with a p-value below 0.05 (p ≈ 0.03), so there is some evidence that it is statistically significant; none of the other lag variables or Volume come close. Even that evidence is fairly weak, which makes sense because the stock market is traditionally so unpredictable. As a note, the negative coefficients on several of the lag variables suggest that if the market went up in a recent week, it is slightly less likely to go up this week.

b. Explain what the confusion matrix is telling you about the types of mistakes made by logistic regression.

glm.probs = predict(glm.fit, type = "response")


glm.pred = rep("Down", length(glm.probs))
glm.pred[glm.probs > 0.5] = "Up"
dir.vector = Weekly$Direction

log.table = table(glm.pred, dir.vector)


log.table

## dir.vector
## glm.pred Down Up
## Down 54 48
## Up 430 557

#overall fraction of correct predictions
sum(diag(log.table))/sum(log.table)

## [1] 0.5610652

Looking at this confusion matrix, we see that almost half of the weeks are classified incorrectly by our logistic regression model: the overall fraction of correct predictions is about 56.1%. The counts also show that the model leans heavily toward predicting Up rather than Down, calling Up in 987 of the 1,089 weeks.

c. Now fit the logistic regression model using a training data period from 1990 to 2007, with Lag1, Lag2, and Lag3 as the only predictors. Compute the confusion matrix and the overall fraction of correct predictions for the held-out data (that is, the data from 2008 to 2010).

#create training and test sets based on yearly values


train = subset(Weekly, Year<2008)
test = subset(Weekly, Year >= 2008)

#create new logistic regression model


train.glm.fit = glm(Direction~Lag1+Lag2+Lag3, data=train, family=binomial)
summary(train.glm.fit)

##
## Call:
## glm(formula = Direction ~ Lag1 + Lag2 + Lag3, family = binomial,
## data = train)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.457 -1.267 1.012 1.080 1.410
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.23889 0.06698 3.567 0.000361 ***
## Lag1 -0.04643 0.03256 -1.426 0.153810
## Lag2 0.04232 0.03259 1.298 0.194123
## Lag3 -0.01370 0.03235 -0.423 0.671932
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1280.6 on 932 degrees of freedom
## Residual deviance: 1276.2 on 929 degrees of freedom
## AIC: 1284.2
##
## Number of Fisher Scoring iterations: 4

This time none of the lag coefficients has a p-value anywhere near 0.05, so none of them would be considered statistically significant on the training data.

#confusion matrix
test.glm.probs = predict(train.glm.fit, test, type = "response")
test.glm.pred = rep("Down", length(test.glm.probs))
test.glm.pred[test.glm.probs > 0.5] = "Up"

#create direction vector from the test set


dir.vector = test[,"Direction"]

#create confusion matrix table


test.table = table(test.glm.pred, dir.vector)
test.table

## dir.vector
## test.glm.pred Down Up
## Down 10 9
## Up 62 75

#compute fraction of correct predictions on the test set


sum(diag(test.table))/sum(test.table)

## [1] 0.5448718

Evaluating this new logistic regression model on the held-out data, we get an overall fraction of correct predictions of about 54.5%.

d. Repeat (c) using LDA. Use library(MASS) to work with the lda() command.

#create LDA model


library(MASS)
lda.fit = lda(Direction~Lag1+Lag2+Lag3, data=train)
lda.fit

## Call:
## lda(Direction ~ Lag1 + Lag2 + Lag3, data = train)
##
## Prior probabilities of groups:
## Down Up
## 0.4415863 0.5584137
##
## Group means:
## Lag1 Lag2 Lag3
## Down 0.29930825 0.07329612 0.2227597
## Up 0.08858541 0.27110173 0.1431939
##
## Coefficients of linear discriminants:
## LD1
## Lag1 -0.33396602
## Lag2 0.30485524
## Lag3 -0.09792095

plot(lda.fit)

Based on the LDA output, the prior probabilities are π̂1 = 0.442 and π̂2 = 0.558. This means that about 44.2% of the training observations correspond to weeks when the market went down and 55.8% correspond to weeks when it went up. The group means are the average value of each predictor within each class.

The plot produced by plot(lda.fit) shows the distribution of the linear discriminant values for each class.

#create vector of predictions


lda.pred = predict(lda.fit, test)

#create confusion matrix


names(lda.pred)

## [1] "class" "posterior" "x"

lda.class = lda.pred$class
test.direction = test$Direction
lda.table = table(lda.class, test.direction)
lda.table

## test.direction
## lda.class Down Up
## Down 10 9
## Up 62 75

#classification rate
sum(diag(lda.table))/sum(lda.table)

## [1] 0.5448718

Our classification rate is about 54.5%, identical to the logistic regression model from part (c).

e. Repeat (c) using KNN with K = 1. Invoke library(class) to work with the knn() command. Set your seed to
3017 via set.seed(3017).

#create K-nearest neighbors predictions


library(class)
train.direction = train$Direction
train.X = cbind(train$Lag1, train$Lag2, train$Lag3)
test.X = cbind(test$Lag1, test$Lag2, test$Lag3)
set.seed(3017)
knn.pred = knn(train.X, test.X, train.direction, k=1)

#confusion matrix
knn.table = table(knn.pred,test.direction)
knn.table

## test.direction
## knn.pred Down Up
## Down 28 37
## Up 44 47

#classification rate
sum(diag(knn.table))/sum(knn.table)

## [1] 0.4807692

Our classification rate is now about 48.1%, clearly worse than our LDA or logistic regression models. This may be because K = 1 gives an extremely flexible fit that overfits the training data; a quick check with larger values of K is sketched below.
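
As a quick check of that explanation, the sketch below re-runs KNN for a few larger values of K, reusing train.X, test.X, train.direction, and test.direction from above; the exact accuracies depend on the random tie-breaking, but smoother fits with K > 1 would typically do at least a little better here.

#test-set accuracy for a few (arbitrary) values of K
set.seed(3017)
sapply(c(1, 5, 10, 25), function(k) {
  pred = knn(train.X, test.X, train.direction, k = k)
  mean(pred == test.direction)   #fraction of test weeks classified correctly
})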

f. Which of the models from parts (c), (d), and (e) appears to provide the best results on this data?

test.table

## dir.vector
## test.glm.pred Down Up
## Down 10 9
## Up 62 75

sum(diag(test.table))/sum(test.table)

## [1] 0.5448718

lda.table

## test.direction
## lda.class Down Up
## Down 10 9
## Up 62 75

sum(diag(lda.table))/sum(lda.table)

## [1] 0.5448718

knn.table

## test.direction
## knn.pred Down Up
## Down 28 37
## Up 44 47

sum(diag(knn.table))/sum(knn.table)

## [1] 0.4807692

Based on our models from parts (c), (d), and (e), the best results are provided by the logistic regression and LDA models (about 54.5% correct each), and the worst come from K-nearest neighbors (about 48.1% correct).

g. What is one scenario in which you might expect an LDA model to outperform a logistic regression model?

I would expect an LDA model to outperform a logistic regression model when the classes are well separated. In that situation, because the distribution of the predictors X is modeled separately within each response class, the parameter estimates for the LDA model are not as unstable as they are for logistic regression. Additionally, if n is small and the predictors are approximately normally distributed in each class, the LDA model is again more stable than the logistic regression model. LDA is also a popular option when dealing with more than two response classes.

h. What is one scenario in which you might expect a KNN model to outperform a logistic regression model?

I would expect a KNN model to outperform a logistic regression model when the decision boundary is highly non-linear. To make a prediction for an observation x, the K training observations closest to x are identified, and x is assigned to the class most common among those neighbors. Because of this, KNN is completely non-parametric: no assumptions are made about the shape of the decision boundary, so it can capture boundaries that a linear method such as logistic regression cannot.
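
A small simulation sketch of such a scenario (all data invented, with arbitrary sample size, radius, and K = 10): the true boundary is a circle, so logistic regression, which can only draw a linear boundary, ends up essentially predicting the majority class, while KNN follows the circular boundary much more closely.

#simulated two-class problem with a circular decision boundary
set.seed(3017)
n = 400
sim.dat = data.frame(x1 = runif(n, -1, 1), x2 = runif(n, -1, 1))
sim.dat$cls = factor(ifelse(sim.dat$x1^2 + sim.dat$x2^2 < 0.5, "In", "Out"))
sim.train = sim.dat[1:200, ]
sim.test = sim.dat[201:n, ]

#logistic regression: restricted to a linear boundary, so it mostly predicts the majority class
glm.sim = glm(cls ~ x1 + x2, data = sim.train, family = binomial)
glm.cls = ifelse(predict(glm.sim, sim.test, type = "response") > 0.5, "Out", "In")
mean(glm.cls == sim.test$cls)

#KNN: non-parametric, so it can trace the circular boundary
knn.cls = knn(sim.train[, c("x1", "x2")], sim.test[, c("x1", "x2")], sim.train$cls, k = 10)
mean(knn.cls == sim.test$cls)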

Problem 5

a. Load the als.RData file and fit a regression model to the training data using the Lasso. Select the regularization parameter via cross-validation. To do so, you'll need to install (and then load) the package glmnet. Use the function cv.glmnet() within this package to fit the model with cross-validation. Before you do so, use set.seed(3017) to ensure you get the same results as in the solutions. Store the result of cv.glmnet() in a variable called lasso.cv.

require(glmnet)

## Loading required package: glmnet

## Loading required package: Matrix

## Loading required package: foreach

## Loaded glmnet 2.0-5

load("/Users/alexnutkiewicz/Desktop/als.rdata")

#create grid of potential lambda values


grid=10^seq(10,-2,length=100)
#create lasso model, alpha = 1 refers to lasso method
lasso.mod = glmnet(as.matrix(train.X), train.y, alpha = 1, lambda = grid)
plot(lasso.mod,xvar="lambda",xlim=c(-5,5),label=TRUE)

#select regularization parameter via cross-validation


set.seed(3017)
lasso.cv = cv.glmnet(as.matrix(train.X), train.y, alpha = 1)

b. Produce a plot via plot(lasso.cv) to visualize the cross-validated error for different values of your parameter.

#create plot
plot(lasso.cv)

Looking at this plot, we see the familiar U-shaped cross-validation curve: for very small lambda the model overfits, for very large lambda it underfits, and the error is minimized in between, at a log(lambda) value of around -4. That model, however, keeps roughly 52 predictors, which is fairly complex. To simplify the model, we traditionally consider models within one standard error of the minimum, marked by the right-most dotted line. This gives a log(lambda) value of about -3 and keeps only about 16 predictors, which is much better.
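
These two choices can also be read off the fitted object directly, as in the sketch below; the exact log(lambda) values and non-zero counts depend on the cross-validation folds, so they may differ slightly from the numbers eyeballed from the plot.

log(lasso.cv$lambda.min)   #location of the minimum of the CV curve
log(lasso.cv$lambda.1se)   #the 1-standard-error choice (the right-most dotted line)

#number of predictors with non-zero coefficients at each choice
length(unlist(predict(lasso.cv, s = "lambda.min", type = "nonzero")))
length(unlist(predict(lasso.cv, s = "lambda.1se", type = "nonzero")))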

c. Let's use the 1-standard-error rule to pick the tuning parameter. Print the value of my.lambda.

my.lambda = lasso.cv$lambda.1se
my.lambda

## [1] 0.04908192

Based on our calculation, we want a lambda of about 0.049.

d. Display the predictors with non-zero coefficients. Out of the 323 original predictors, how many have non-zero coefficients?

nonzero = predict(lasso.cv, s = 'lambda.1se', type = 'nonzero')


colnames(train.X)[c(unlist(nonzero))]

## [1] "Onset.Delta" "sd.alsfrs.score"


## [3] "alsfrs.score.slope" "last.speech"
## [5] "meansquares.speech" "meansquares.dressing"
## [7] "fvc.liters.slope" "min.slope.alsfrs.score"
## [9] "sum.slope.alsfrs.score" "min.slope.speech"
## [11] "sum.slope.speech" "min.slope.turning"
## [13] "mean.slope.svc.liters" "mean.slope.weight"

Based on the above code, we see 14 predictors with non-zero coefficients.

e. Compute the test RMSE for the model fit in part (b), using the regularization parameter chosen in part (c).

lasso.pred = predict(lasso.mod, s = my.lambda, newx = as.matrix(test.X))


sqrt(mean((lasso.pred - test.y)^2))

## [1] 0.5209173

We get an RMSE of about 0.521.

f. Repeat parts (a), (d), and (e) using ridge regression instead of the Lasso. Again set the seed via set.seed(2016) before calling cv.glmnet(). Store the result of cv.glmnet() in a variable called ridge.cv. Comment on your findings.

#create ridge model, alpha = 0 refers to the ridge method


set.seed(2016)
ridge.mod = glmnet(as.matrix(train.X), train.y, alpha = 0, lambda = grid)

#select regularization parameter via cross-validation


ridge.cv = cv.glmnet(as.matrix(train.X), train.y, alpha = 0)
plot(ridge.cv)

We again see a U-shaped cross-validation curve, although over a rather different range of lambda values than for the lasso. Also note the constant 323 across the top axis of the plot: this reflects the fundamental behaviour of ridge regression, which shrinks all coefficients toward zero but never sets them exactly to zero, so every predictor stays in the model. Producing genuinely sparse models is one of the benefits of using the lasso instead.

#select optimal tuning parameter


ridge.lambda = ridge.cv$lambda.1se
ridge.lambda

## [1] 2.500301

#display predictors with non-zero coefficients


ridge.out = glmnet(as.matrix(train.X), train.y, alpha = 0)
ridge.coef = predict(ridge.out, type="coefficients", s=ridge.lambda)[1:323,]
ridge.coef[ridge.coef != 0] #we would expect this to be 323

## (Intercept) Onset.Delta
## -9.471124e-02 -7.263746e-05
## Symptom.Speech Symptom.WEAKNESS
## -1.184925e-02 3.682292e-03
## Site.of.Onset.Onset..Bulbar Site.of.Onset.Onset..Limb
## -1.329722e-02 1.237215e-02
## Race...Caucasian Age
## -1.176081e-02 1.780190e-04
## Sex.Female Sex.Male
## -1.368859e-02 -6.092950e-03
## Mother Family
## 3.608844e-04 2.983456e-03
## Study.Arm.PLACEBO Study.Arm.ACTIVE
## -9.718710e-03 -1.987171e-04
## max.alsfrs.score min.alsfrs.score
## -3.543227e-04 8.562095e-05
## last.alsfrs.score mean.alsfrs.score
## 2.784219e-04 -1.525390e-04
## num.alsfrs.score.visits sum.alsfrs.score
## -3.259999e-04 -2.493122e-05
## first.alsfrs.score.date last.alsfrs.score.date
## 2.956470e-05 -5.656635e-06
## meansquares.alsfrs.score sd.alsfrs.score
## 3.870616e-06 -9.213446e-03
## alsfrs.score.slope max.speech
## 6.064759e-03 4.370285e-03
## min.speech last.speech
## 4.786069e-03 5.840760e-03
## mean.speech sum.speech
## 4.682628e-03 8.158483e-04
## meansquares.speech sd.speech
## 1.224233e-03 -2.052212e-02
## speech.slope max.salivation
## 2.948051e-02 -4.644490e-03
## min.salivation last.salivation
## 1.078356e-03 1.063275e-03
## mean.salivation sum.salivation
## -1.367844e-03 -4.882882e-04
## meansquares.salivation sd.salivation
## -5.267394e-05 -2.070416e-02
## salivation.slope max.swallowing
## 1.566105e-02 8.485103e-03
## min.swallowing last.swallowing
## 4.606128e-03 6.692572e-03
## mean.swallowing sum.swallowing
## 6.025562e-03 7.362275e-04
## meansquares.swallowing sd.swallowing
## 1.068585e-03 -3.268278e-03
## swallowing.slope max.handwriting
## 2.740736e-02 -5.842774e-03
## min.handwriting last.handwriting
## -2.837310e-03 -2.053043e-03
## mean.handwriting sum.handwriting
## -5.002289e-03 -1.086859e-03
## meansquares.handwriting sd.handwriting
## -4.949645e-04 -8.674367e-03
## handwriting.slope max.cutting
## 9.294557e-03 -6.583103e-04
## min.cutting last.cutting
## 6.720428e-04 7.715364e-04
## mean.cutting sum.cutting
## -2.572927e-04 3.939645e-06
## meansquares.cutting sd.cutting
## 5.409603e-04 -1.276643e-02
## cutting.slope max.dressing
## 3.929666e-03 2.464514e-03
## min.dressing last.dressing
## 1.643566e-03 3.202094e-03
## mean.dressing sum.dressing
## 2.226016e-03 5.362644e-04
## meansquares.dressing sd.dressing
## 1.159819e-03 5.259115e-03
## dressing.slope max.turning
## 2.274409e-03 -1.233876e-03
## min.turning last.turning
## 2.383256e-03 1.100549e-03
## mean.turning sum.turning
## 6.010031e-04 1.186888e-04
## meansquares.turning sd.turning
## 6.292235e-04 -2.211976e-02
## turning.slope max.walking
## 1.548778e-02 -3.571445e-03
## min.walking last.walking
## -2.350995e-03 -1.748790e-03
## mean.walking sum.walking
## -2.868929e-03 -5.864747e-04
## meansquares.walking sd.walking
## -3.792148e-04 -8.783336e-03
## walking.slope max.climbing.stairs
## 1.717149e-02 -2.760396e-03
## min.climbing.stairs last.climbing.stairs
## -1.715845e-03 -2.132804e-03
## mean.climbing.stairs sum.climbing.stairs
## -2.602363e-03 -6.057730e-04
## meansquares.climbing.stairs sd.climbing.stairs
## -4.653829e-04 -1.016849e-02
## climbing.stairs.slope max.fvc.liters
## 6.810588e-03 6.529707e-04
## min.fvc.liters last.fvc.liters
## 2.187014e-03 3.390748e-03
## mean.fvc.liters num.fvc.liters.visits
## 1.336267e-03 -1.817522e-03
## sum.fvc.liters last.fvc.liters.date
## 1.221746e-04 9.312853e-05
## meansquares.fvc.liters sd.fvc.liters
## 1.638917e-04 -3.349519e-02
## fvc.liters.slope lessthan2.fvc.liters
## 7.189631e-02 -7.577860e-03
## no.fvc.liters.data max.svc.liters
## -7.653768e-03 4.392341e-03
## min.svc.liters last.svc.liters
## 6.284036e-03 7.548913e-03
## mean.svc.liters num.svc.liters.visits
## 5.519187e-03 -3.281258e-02
## sum.svc.liters last.svc.liters.date
## 8.939294e-04 -9.671412e-04
## meansquares.svc.liters sd.svc.liters
## 8.110504e-04 -1.887875e-02
## svc.liters.slope max.weight
## 5.887447e-02 -1.819626e-05
## min.weight last.weight
## -6.713220e-07 1.137940e-05
## mean.weight num.weight.visits
## -4.528221e-06 1.864666e-03
## sum.weight last.weight.date
## 1.326025e-05 2.423986e-05
## meansquares.weight sd.weight
## 2.559118e-07 -1.934736e-03
## weight.slope lessthan2.weight
## 3.687717e-03 -2.139268e-03
## max.height min.height
## -3.486922e-04 -3.382450e-04
## mean.height num.height.visits
## -3.436043e-04 -8.058173e-03
## sum.height last.height.date
## -4.613516e-05 -2.884705e-04
## meansquares.height no.height.data
## -1.054976e-06 1.301104e-02
## max.resp.rate min.resp.rate
## 3.262835e-04 -8.683191e-04
## last.resp.rate mean.resp.rate
## -2.326800e-04 -1.888957e-05
## num.resp.rate.visits sum.resp.rate
## -4.702950e-04 3.327157e-05
## last.resp.rate.date meansquares.resp.rate
## -3.087147e-05 7.385420e-06
## sd.resp.rate resp.rate.slope
## 2.368167e-03 -8.246077e-04
## lessthan2.resp.rate no.resp.rate.data
## -5.292537e-03 -6.020873e-03
## max.bp.diastolic min.bp.diastolic
## -6.307081e-04 -3.642241e-04
## last.bp.diastolic mean.bp.diastolic
## -3.240846e-04 -5.405992e-04
## num.bp.diastolic.visits sum.bp.diastolic
## 2.984199e-03 2.122371e-05
## last.bp.diastolic.date meansquares.bp.diastolic
## -1.245626e-05 -3.298911e-06
## sd.bp.diastolic bp.diastolic.slope
## -4.441197e-04 7.906537e-05
## lessthan2.bp.diastolic no.bp.diastolic.data
## -2.573690e-03 -3.489445e-03
## max.bp.systolic min.bp.systolic
## -7.192745e-05 1.031548e-05
## last.bp.systolic mean.bp.systolic
## -3.655396e-05 -4.361626e-05
## num.bp.systolic.visits sum.bp.systolic
## 3.133594e-03 2.096201e-05
## meansquares.bp.systolic sd.bp.systolic
## -7.901380e-08 -3.988512e-04
## bp.systolic.slope max.slope.alsfrs.score
## -4.841882e-04 1.471886e-03
## min.slope.alsfrs.score last.slope.alsfrs.score
## 2.548648e-03 1.324987e-03
## mean.slope.alsfrs.score num.slope.alsfrs.score.visits
## 5.451058e-03 -1.245137e-03
## sum.slope.alsfrs.score first.slope.alsfrs.score.date
## 1.775126e-03 1.022332e-04
## last.slope.alsfrs.score.date meansquares.slope.alsfrs.score
## 6.915809e-05 -5.133320e-05
## sd.slope.alsfrs.score slope.alsfrs.score.slope
## -2.034297e-03 9.933424e-04
## max.slope.speech min.slope.speech
## 6.566780e-03 1.290735e-02
## last.slope.speech mean.slope.speech
## 7.586128e-03 4.104672e-02
## sum.slope.speech meansquares.slope.speech
## 1.511597e-02 -9.847950e-03
## sd.slope.speech slope.speech.slope
## -1.118279e-02 2.845038e-03
## max.slope.salivation min.slope.salivation
## -4.045782e-03 -1.177369e-05
## last.slope.salivation mean.slope.salivation
## 2.304175e-04 -2.653829e-03
## sum.slope.salivation meansquares.slope.salivation
## -2.980576e-03 8.125658e-05
## sd.slope.salivation slope.salivation.slope
## -4.044578e-03 1.056792e-02
## max.slope.swallowing min.slope.swallowing
## -4.825094e-03 1.403095e-03
## last.slope.swallowing mean.slope.swallowing
## -1.370731e-02 -3.752619e-03
## sum.slope.swallowing meansquares.slope.swallowing
## -2.723125e-03 -3.115666e-03
## sd.slope.swallowing slope.swallowing.slope
## -5.361264e-03 -3.091051e-03
## max.slope.handwriting min.slope.handwriting
## 9.363576e-04 5.415301e-03
## last.slope.handwriting mean.slope.handwriting
## 5.810347e-03 1.445528e-02
## sum.slope.handwriting meansquares.slope.handwriting
## 5.421528e-03 6.135562e-04
## sd.slope.handwriting slope.handwriting.slope
## -6.246278e-03 6.592339e-04
## max.slope.cutting min.slope.cutting
## -2.462259e-03 6.854423e-03
## last.slope.cutting mean.slope.cutting
## 1.803098e-03 7.171304e-03
## sum.slope.cutting meansquares.slope.cutting
## 3.817408e-03 -2.278919e-03
## sd.slope.cutting slope.cutting.slope
## -8.811983e-03 1.355919e-03
## max.slope.dressing min.slope.dressing
## 7.042798e-03 3.270186e-03
## last.slope.dressing mean.slope.dressing
## 3.802627e-03 1.297696e-02
## sum.slope.dressing meansquares.slope.dressing
## 5.840847e-03 1.880906e-03
## sd.slope.dressing slope.dressing.slope
## 3.150878e-03 1.247906e-03
## max.slope.turning min.slope.turning
## -6.249151e-04 1.095738e-02
## last.slope.turning mean.slope.turning
## 6.260142e-03 1.155853e-02
## sum.slope.turning meansquares.slope.turning
## 2.528820e-03 1.161036e-04
## sd.slope.turning slope.turning.slope
## -7.083757e-03 -1.163152e-03
## max.slope.walking min.slope.walking
## 3.341237e-03 3.754184e-03
## last.slope.walking mean.slope.walking
## -6.928144e-03 6.586786e-03
## sum.slope.walking meansquares.slope.walking
## 3.156188e-03 -1.515454e-04
## sd.slope.walking slope.walking.slope
## -2.291958e-03 -1.070850e-02
## max.slope.climbing.stairs min.slope.climbing.stairs
## 3.536874e-03 3.830096e-03
## last.slope.climbing.stairs mean.slope.climbing.stairs
## 7.400436e-03 1.642270e-02
## sum.slope.climbing.stairs meansquares.slope.climbing.stairs
## 4.947554e-03 1.962554e-03
## sd.slope.climbing.stairs slope.climbing.stairs.slope
## -2.320875e-03 3.161874e-04
## max.slope.fvc.liters min.slope.fvc.liters
## 6.604573e-04 1.699255e-03
## last.slope.fvc.liters mean.slope.fvc.liters
## 1.959646e-02 2.339126e-02
## num.slope.fvc.liters.visits sum.slope.fvc.liters
## -5.280753e-03 4.838410e-03
## first.slope.fvc.liters.date last.slope.fvc.liters.date
## 5.267614e-04 -6.146828e-05
## meansquares.slope.fvc.liters sd.slope.fvc.liters
## -9.683821e-04 -2.075210e-03
## slope.fvc.liters.slope lessthan2.slope.fvc.liters
## 4.771321e-03 -6.054746e-03
## no.slope.fvc.liters.data max.slope.svc.liters
## -8.882399e-03 3.753652e-02
## min.slope.svc.liters last.slope.svc.liters
## 2.019616e-02 8.437915e-02
## mean.slope.svc.liters first.slope.svc.liters.date
## 8.993814e-02 -1.805908e-03
## last.slope.svc.liters.date meansquares.slope.svc.liters
## -2.157554e-03 -3.095133e-03
## sd.slope.svc.liters slope.svc.liters.slope
## 2.813078e-03 -3.217525e-02
## max.slope.weight min.slope.weight
## 1.635580e-03 4.090876e-04
## last.slope.weight mean.slope.weight
## 3.684090e-03 6.708575e-03
## num.slope.weight.visits sum.slope.weight
## 1.394721e-03 1.540022e-03
## first.slope.weight.date last.slope.weight.date
## 6.058102e-06 -2.601744e-04
## meansquares.slope.weight sd.slope.weight
## 1.006160e-04 1.095619e-03
## slope.weight.slope lessthan2.slope.weight
## -3.368484e-04 -1.101088e-03
## first.slope.height.date max.slope.resp.rate
## 9.919440e-04 4.008932e-04
## min.slope.resp.rate last.slope.resp.rate
## -4.800272e-04 -1.177635e-04
## mean.slope.resp.rate num.slope.resp.rate.visits
## -6.331433e-04 -1.868482e-03
## sum.slope.resp.rate first.slope.resp.rate.date
## -2.480037e-04 2.539826e-04
## last.slope.resp.rate.date meansquares.slope.resp.rate
## -2.911917e-04 7.774687e-05
## sd.slope.resp.rate slope.resp.rate.slope
## 7.299060e-04 7.020859e-05
## lessthan2.slope.resp.rate no.slope.resp.rate.data
## -4.970226e-03 -5.947214e-03
## max.slope.bp.diastolic min.slope.bp.diastolic
## -2.616815e-05 -3.895980e-05
## last.slope.bp.diastolic mean.slope.bp.diastolic
## 2.286470e-04 -2.251192e-04
## num.slope.bp.diastolic.visits sum.slope.bp.diastolic
## 3.218326e-03 -7.302417e-05
## first.slope.bp.diastolic.date last.slope.bp.diastolic.date
## -6.319174e-05 -3.499944e-04
## meansquares.slope.bp.diastolic sd.slope.bp.diastolic
## 1.208004e-06 -3.203965e-05
## slope.bp.diastolic.slope lessthan2.slope.bp.diastolic
## 1.439710e-04 -1.160081e-03
## max.slope.bp.systolic min.slope.bp.systolic
## -6.640468e-05 -3.789883e-05
## last.slope.bp.systolic mean.slope.bp.systolic
## 4.198933e-04 -1.389061e-04
## num.slope.bp.systolic.visits sum.slope.bp.systolic
## 3.421348e-03 -3.919103e-05
## first.slope.bp.systolic.date meansquares.slope.bp.systolic
## -6.448544e-05 -1.130313e-06
## sd.slope.bp.systolic
## -5.369808e-05

Because ridge regression never truly shrinks coefficients all the way to zero, essentially every one of the 323 predictors keeps a non-zero coefficient, as the (long) listing above shows.

#find test RMSE for ridge regression model


ridge.pred = predict(ridge.mod, s=ridge.lambda, newx = as.matrix(test.X))
sqrt(mean((ridge.pred - test.y)^2))

## [1] 0.5356745

Calculating the RMSE for the ridge regression model, we get about 0.536, slightly higher than the 0.521 from our lasso model.
