Practical No. 01
(a)
Code:
mydata <- read.table("clipboard",header=T)
attach(mydata)
model <- lm(y ~ x1 + x2 + x3, data= mydata)
summary(model)
Output:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13354.6016 6485.4190 2.059 0.0619 .
x1 -36.2819 6.3563 -5.708 9.79e-05 ***
x2 26.3375 10.1264 2.601 0.0232 *
x3 -0.1925 0.3069 -0.627 0.5422
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
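Interpretation:
Coefficient of x1 is -36.2819: holding x2 and x3 fixed, a one-unit ($1/dozen) increase in the wholesale price of roses is associated with an average decrease of 36.2819 units in the quantity of roses sold. This effect is highly significant (p = 9.79e-05).
Coefficient of x2 is 26.3375: holding x1 and x3 fixed, a one-unit increase in x2 is associated with an average increase of 26.3375 units in the quantity of roses sold. This effect is significant at the 5% level (p = 0.0232).
Coefficient of x3 is -0.1925: holding x1 and x2 fixed, a one-unit increase in x3 is associated with an average decrease of 0.1925 units in the quantity of roses sold. This effect is not significant (p = 0.5422).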
(b)
Code:
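cvec <- c(0, 1, -1, 0)                        # contrast for beta1 - beta2 (intercept comes first)
b <- coef(model)
est <- sum(cvec * b)                          # estimate of beta1 - beta2
sd <- sqrt(t(cvec) %*% vcov(model) %*% cvec)  # its standard error
tstat <- est / sd
pval <- 2 * (1 - pt(abs(tstat), df.residual(model)))   # residual df = 12
pval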
Output:
[,1]
[1,] 0.001010749
Interpretation:
Since the p-value 0.001010749 is less than 0.05, we reject the null hypothesis 𝐻0 : 𝛽1 = 𝛽2 at the 5% level of significance. Hence the effects of x1 and x2 on y differ significantly.
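The test in (b) works for any linear combination c'β of the coefficients. The statistic is t = c'b / sqrt(c'Vc), where V = vcov(model), and it has the residual degrees of freedom. Below is a minimal sketch of this as a reusable function. The name `contrast_test` is hypothetical and is not part of the practical. The rose data are not reproduced here, so the built-in `mtcars` data stand in for them.

```r
# Hypothetical helper: t-test and confidence interval for c'beta
# in a fitted lm model.
contrast_test <- function(fit, cvec, level = 0.95) {
  b   <- coef(fit)
  V   <- vcov(fit)                            # estimated covariance matrix of b
  est <- sum(cvec * b)                        # point estimate c'b
  se  <- sqrt(drop(t(cvec) %*% V %*% cvec))   # standard error sqrt(c' V c)
  rdf <- df.residual(fit)                     # residual degrees of freedom
  tstat <- est / se
  pval  <- 2 * pt(abs(tstat), rdf, lower.tail = FALSE)
  q <- qt(1 - (1 - level) / 2, rdf)           # t critical value
  list(estimate = est, se = se, t = tstat, p.value = pval,
       ci = c(lower = est - q * se, upper = est + q * se))
}

# Illustration on mtcars (stand-in data): H0: coefficient of wt = coefficient of hp
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
res <- contrast_test(fit, c(0, 1, -1, 0))
res$p.value
res$ci
```

When cvec picks out a single coefficient, the function reproduces the t value reported by summary() and the interval from confint().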
(c)
Code:
ll <- est + qt(0.025,12)*sd
ul <- est + qt(1-0.025,12)*sd
ci <- c(ll,ul)
ci
Output:
[1] -94.26238 -30.97643
Interpretation:
The 95% confidence interval for 𝛽1 - 𝛽2 is (-94.26238, -30.97643). The interval does not contain 0, which agrees with the test in (b): the coefficient of x1 is significantly smaller than that of x2.
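As a consistency check, the interval in (c) should be centred at b1 - b2 computed from the coefficients printed in (a). Dividing its half-width by t(0.975, 12) should recover the standard error used in (b). A small sketch that uses only the printed numbers:

```r
# Numbers copied from the outputs of (a) and (c)
b1 <- -36.2819                 # coefficient of x1
b2 <-  26.3375                 # coefficient of x2
est <- b1 - b2                 # point estimate of beta1 - beta2
ci  <- c(-94.26238, -30.97643) # 95% interval reported in (c)

centre <- mean(ci)                          # should equal est (up to rounding)
se <- (ci[2] - ci[1]) / 2 / qt(0.975, 12)   # implied standard error
tstat <- est / se
pval  <- 2 * pt(abs(tstat), 12, lower.tail = FALSE)  # should match the p-value in (b)
c(centre = centre, est = est, pval = pval)
```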
(d)
Code:
nestedmodel <- lm(y ~ x1, data= mydata)
anova(nestedmodel, model)
Output:
Analysis of Variance Table
Model 1: y ~ x1
Model 2: y ~ x1 + x2 + x3
Res.Df RSS Df Sum of Sq F Pr(>F)
1 14 24105953
2 12 13900824 2 10205128 4.4048 0.03677 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Interpretation:
Since the p-value 0.03677 < 0.05, we reject 𝐻0 : 𝛽2 = 𝛽3 = 0 at the 5% level of significance. There is enough evidence that x2 and x3 jointly affect y after adjusting for x1, i.e. at least one of 𝛽2, 𝛽3 is non-zero.
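The F statistic in the anova() table is the partial F-test F = ((RSS_reduced - RSS_full)/q) / (RSS_full/df_full), where q = 2 is the number of coefficients set to zero. The sketch below recomputes it from the printed RSS values.

```r
rss_reduced <- 24105953   # model 1: y ~ x1, 14 residual df
rss_full    <- 13900824   # model 2: y ~ x1 + x2 + x3, 12 residual df
df_reduced <- 14
df_full    <- 12
q <- df_reduced - df_full # number of restrictions (beta2 = beta3 = 0)

F_stat  <- ((rss_reduced - rss_full) / q) / (rss_full / df_full)
p_value <- pf(F_stat, q, df_full, lower.tail = FALSE)
c(F = F_stat, p = p_value)
```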
Practical No. 02
Hypothesis:
H0: μ1 = μ2 = μ3 (all treatment means are equal)
H1: At least two of the treatment means are unequal
Assumptions:
1. Observations are normally distributed
2. Observations are independent
3. Observations have equal population variances
Rcode:
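y <- c(82,71,64,93,62,73,61,85,87,74,94,91,69,78,56,70,66,78,53,71,87)
treat <- as.factor(rep(c("Y1","Y2","Y3"), 7))   # treatment labels in data-entry order
crd <- data.frame(treat, y)
model <- lm(y ~ treat, data = crd)               # one-way ANOVA model for the CRD
anova(model)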
Output:
Analysis of Variance Table
Response: y
Df Sum Sq Mean Sq F value Pr(>F)
treat 2 88.67 44.333 0.2837 0.7563
Residuals 18 2812.57 156.254
Interpretation:
As the p-value 0.7563 is greater than 0.05, we fail to reject the null hypothesis. There is no significant difference among the treatment means at the 5% level of significance.
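The one-way ANOVA table can be checked by hand. SS(treat) = Σ n_i (ȳ_i - ȳ)², SS(resid) = Σ (y_ij - ȳ_i)², and F = MS(treat)/MS(resid) on (k - 1, N - k) df. A sketch on the same data, assuming (as the original code does) that the labels repeat Y1, Y2, Y3 in data-entry order:

```r
y <- c(82,71,64,93,62,73,61,85,87,74,94,91,69,78,56,70,66,78,53,71,87)
treat <- as.factor(rep(c("Y1","Y2","Y3"), 7))

grand <- mean(y)
n_i   <- tapply(y, treat, length)              # replicates per treatment
means <- tapply(y, treat, mean)                # treatment means
ss_treat <- sum(n_i * (means - grand)^2)       # between-treatment SS
ss_resid <- sum((y - means[as.character(treat)])^2)  # within-treatment SS

k <- nlevels(treat)
N <- length(y)
F_stat  <- (ss_treat / (k - 1)) / (ss_resid / (N - k))
p_value <- pf(F_stat, k - 1, N - k, lower.tail = FALSE)
c(SS_treat = ss_treat, SS_resid = ss_resid, F = F_stat, p = p_value)
```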
Practical No 03
R code:
gpa <- c(2.4,3.9,2.8,2,3,2.7,2.1,4,2.8,3,3.9,3.3,3.1,3.8,3)
method <- as.factor(rep(c("Online","Lecture","Hybrid"), times = 5))
instructor <- as.factor(rep(1:5, each = 3))
rbd <- data.frame(gpa, method, instructor)
Assumptions:
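Hypothesis:
𝐻0 : 𝜇1 = 𝜇2 = 𝜇3
𝐻1 : At least two of the method means are unequal
where 𝜇1, 𝜇2 and 𝜇3 are the mean GPAs under the Online, Lecture and Hybrid teaching methods.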
R code:
m <- lm(gpa ~ method + instructor, data = rbd)   # treatments = method, blocks = instructor
anova(m)
Output:
Response: gpa
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Interpretation:
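Since the p-value for method is less than 0.05, we have enough evidence to reject the null hypothesis at the 5% level of significance: the mean GPA differs among the teaching methods.
From the ANOVA table we also get the additional information that the effect of instructor (the blocking factor) is significant in this case.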
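The sums of squares in a randomized block design can be computed by hand, with a methods and b blocks. SS(method) = b Σ (ȳ_i. - ȳ)², SS(block) = a Σ (ȳ_.j - ȳ)², and SS(error) is the remainder with (a - 1)(b - 1) df. A sketch on the Practical 3 data, as a check on anova(m):

```r
gpa <- c(2.4,3.9,2.8,2,3,2.7,2.1,4,2.8,3,3.9,3.3,3.1,3.8,3)
method <- as.factor(rep(c("Online","Lecture","Hybrid"), times = 5))
instructor <- as.factor(rep(1:5, each = 3))

a <- nlevels(method)        # number of treatments
b <- nlevels(instructor)    # number of blocks
grand <- mean(gpa)
ss_method <- b * sum((tapply(gpa, method, mean) - grand)^2)
ss_block  <- a * sum((tapply(gpa, instructor, mean) - grand)^2)
ss_total  <- sum((gpa - grand)^2)
ss_error  <- ss_total - ss_method - ss_block
df_error  <- (a - 1) * (b - 1)

F_method <- (ss_method / (a - 1)) / (ss_error / df_error)
F_block  <- (ss_block / (b - 1)) / (ss_error / df_error)
c(SS_method = ss_method, SS_block = ss_block, SS_error = ss_error,
  F_method = F_method, F_block = F_block)
```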
Practical No. 04
Code:
mydf <- read.table("clipboard",header=T)
attach(mydf)
head(mydf)
model <- glm(Admit ~ SAT + GPA + as.factor(Race), family="binomial")
summary(model)
Output:
Call:
glm(formula = Admit ~ SAT + GPA + as.factor(Race), family = "binomial")
Deviance Residuals:
Min 1Q Median 3Q Max
-1.5420 -0.8761 -0.6408 1.1510 2.0998
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.1535282 1.0638161 -3.904 9.45e-05 ***
SAT 0.0010930 0.0005273 2.073 0.038203 *
GPA 0.8304239 0.3127104 2.656 0.007917 **
as.factor(Race)2 -0.5188735 0.2988188 -1.736 0.082490 .
as.factor(Race)3 -1.1671195 0.3268685 -3.571 0.000356 ***
as.factor(Race)4 -1.4326967 0.4076618 -3.514 0.000441 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Code:
mydata <- data.frame(SAT=1600,GPA=4,Race=factor(1)) #for SAT score 1600, GPA 4 and Race 1.
predict(model,mydata,type="response")
Output:
1
0.7144157
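The value returned by predict(..., type = "response") is the inverse logit of the linear predictor. Race 1 is the baseline level, so no Race term enters. A sketch using the printed coefficients:

```r
b0    <- -4.1535282   # intercept
b_sat <-  0.0010930   # SAT coefficient
b_gpa <-  0.8304239   # GPA coefficient

eta  <- b0 + b_sat * 1600 + b_gpa * 4   # linear predictor for SAT 1600, GPA 4, Race 1
prob <- plogis(eta)                      # inverse logit: 1 / (1 + exp(-eta))
prob
```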
Interpretation:
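For SAT = 1600, GPA = 4 and Race 1, the predicted probability of admission is 0.714.
Race (2): Race (2) applicants have 100 × (1 - exp(-0.5188735)) ≈ 40.5% lower odds of getting admission compared to Race (1), keeping all other variables at a fixed level. This effect is not significant at the 5% level of significance (p = 0.0825).
Race (3): Race (3) applicants have 100 × (1 - exp(-1.1671195)) ≈ 68.9% lower odds of getting admission compared to Race (1), keeping all other variables at a fixed level. This effect is significant at the 5% level of significance (p = 0.000356).
Race (4): Race (4) applicants have 100 × (1 - exp(-1.4326967)) ≈ 76.1% lower odds of getting admission compared to Race (1), keeping all other variables at a fixed level. This effect is significant at the 5% level of significance (p = 0.000441).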
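The percentages above come from exponentiating each Race coefficient, which gives the odds ratio relative to Race 1. A short sketch with the printed estimates:

```r
race_coef  <- c(Race2 = -0.5188735, Race3 = -1.1671195, Race4 = -1.4326967)
odds_ratio <- exp(race_coef)            # odds of admission relative to Race 1
pct_lower  <- 100 * (1 - odds_ratio)    # % reduction in the odds of admission
round(pct_lower, 1)
```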
Practical No 05
R Code
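x1 <- c(3,5,5,7,7,7,8,9,10,11)
x2 <- c(2.3,1.9,1,.7,.3,1,1.05,.45,.7,.3)
X <- matrix(c(x1, x2), ncol = 2)
xbar <- apply(X, 2, mean)        # sample mean vector
S <- cov(X)                      # sample covariance matrix
d2 <- c()
for (i in 1:nrow(X)) {           # squared generalized (Mahalanobis) distances
  d2[i] <- (X[i,] - xbar) %*% solve(S) %*% (X[i,] - xbar)
}
d2
D2 <- sort(d2)                   # ordered distances
D2
n <- nrow(X)
Q <- qchisq((1:n - 0.5) / n, df = 2)   # chi-square(2) quantiles
plot(Q, D2)                      # chi-square plot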
Output
a) Squared distances
d2
[1] 4.058682373 2.109580755 2.107431801 0.636114372 3.265479436
0.007903377 0.521861631 0.647933621 2.059080327 2.585932308
b) Ordered distances
D2
[1] 0.007903377 0.521861631 0.636114372 0.647933621 2.059080327
2.107431801 2.109580755 2.585932308 3.265479436 4.058682373
Figure: Chi-square plot (ordered squared distances D2 against chi-square quantiles Q)
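The loop in part (a) can be replaced by base R's `mahalanobis()`, which computes the same squared distances. A useful check is that these distances, computed with the sample covariance matrix, always sum to (n - 1)p. Here that is 9 × 2 = 18. A sketch:

```r
x1 <- c(3,5,5,7,7,7,8,9,10,11)
x2 <- c(2.3,1.9,1,.7,.3,1,1.05,.45,.7,.3)
X  <- cbind(x1, x2)

d2 <- mahalanobis(X, center = colMeans(X), cov = cov(X))  # squared distances
n  <- nrow(X)
p  <- ncol(X)
Q  <- qchisq((seq_len(n) - 0.5) / n, df = p)  # chi-square plotting positions
D2 <- sort(d2)
sum(d2)            # equals (n - 1) * p
```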
c)
Hypothesis
H0: Mean vectors (fever, Pressure and Aches) for treatment and placebo are equal
H1: Mean vectors (fever, Pressure and Aches) for treatment and placebo are unequal
Rcode
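X1 <- colMeans(treat); X2 <- colMeans(placebo)   # mean vectors: n1 = 20 treated, n2 = 18 placebo
Sp <- ((20-1)*cov(treat) + (18-1)*cov(placebo))/(20+18-2)   # pooled covariance matrix
T2 <- t(X1-X2)%*%solve(((1/20)+(1/18))*Sp)%*%(X1-X2)
(pval <- pf((20+18-3-1)/((20+18-2)*3)*T2, 3, 20+18-3-1, lower.tail = FALSE))   # T2 -> F(3, 34); 3 variables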
Output
2.175351e-08
Interpretation
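As the p-value 2.175351e-08 is less than 0.05, we reject the null hypothesis at the 5% level of significance. The mean vectors of fever, pressure and aches differ between the treatment and placebo groups.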
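A minimal sketch of the two-sample Hotelling T² test as a function. The pooled covariance matrix weights each sample covariance by its degrees of freedom, n_i - 1. T² is converted to an F statistic with (p, n1 + n2 - p - 1) df. The name `hotelling_two_sample` is hypothetical. The `treat` and `placebo` data are not reproduced here, so simulated data illustrate the call.

```r
# Hypothetical helper: two-sample Hotelling T^2 with equal covariance matrices
hotelling_two_sample <- function(X1, X2) {
  X1 <- as.matrix(X1); X2 <- as.matrix(X2)
  n1 <- nrow(X1); n2 <- nrow(X2); p <- ncol(X1)
  dbar <- colMeans(X1) - colMeans(X2)                            # difference in mean vectors
  Sp <- ((n1 - 1) * cov(X1) + (n2 - 1) * cov(X2)) / (n1 + n2 - 2) # pooled covariance
  T2 <- drop(t(dbar) %*% solve((1 / n1 + 1 / n2) * Sp) %*% dbar)
  Fstat <- (n1 + n2 - p - 1) / ((n1 + n2 - 2) * p) * T2
  pval  <- pf(Fstat, p, n1 + n2 - p - 1, lower.tail = FALSE)
  list(T2 = T2, F = Fstat, df = c(p, n1 + n2 - p - 1), p.value = pval)
}

# Simulated stand-in data with the same sample sizes (20 and 18) and 3 variables
set.seed(1)
A <- matrix(rnorm(20 * 3), ncol = 3)
B <- matrix(rnorm(18 * 3, mean = 0.5), ncol = 3)
res <- hotelling_two_sample(A, B)
res$p.value
```

With a single variable, T² reduces to the square of the pooled two-sample t statistic, and the p-value matches t.test(..., var.equal = TRUE).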
Practical No 07
R code
Output
Test statistic = 13.63931, P Value = 0.02082779
Comment:
Since the p-value (0.0208) is less than 0.05, we have enough evidence to reject the null hypothesis at the 5% level of significance. Thus the mean vector of (BOD, SS) differs between the state lab and the private lab, i.e. the labs differ in at least one of BOD or SS.
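For a paired comparison, Hotelling's T² is converted to F = (n - p)/((n - 1)p) · T² with (p, n - p) df. The sketch below assumes n = 11 paired samples measured on p = 2 variables (BOD and SS). These values are not stated in the practical. With them, the reported statistic 13.63931 gives the reported p-value. The helper names `paired_T2` and `T2_to_p` are hypothetical.

```r
# Hypothetical helper: paired Hotelling T^2 from an n x p matrix of differences
paired_T2 <- function(D) {
  D <- as.matrix(D)
  n <- nrow(D); p <- ncol(D)
  dbar <- colMeans(D)                                  # mean difference vector
  T2 <- n * drop(t(dbar) %*% solve(cov(D)) %*% dbar)
  Fstat <- (n - p) / ((n - 1) * p) * T2
  list(T2 = T2, p.value = pf(Fstat, p, n - p, lower.tail = FALSE))
}

# Convert a reported T^2 to its p-value
T2_to_p <- function(T2, n, p) {
  pf((n - p) / ((n - 1) * p) * T2, p, n - p, lower.tail = FALSE)
}

T2_to_p(13.63931, n = 11, p = 2)   # assumed n = 11, p = 2
```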
Practical No 08
Creating data:
y<-c(28,36,18,31,25,32,19,30,27,32,23,29)
A<-as.factor(rep(1:2,times=6))
B<-as.factor(rep(1:2,each=2,times=3))
data<-data.frame(A,B,y)
out<-lm(y~A+B+A*B)
#Checking normality assumption
To check the normality assumption, we examine a normal Q-Q plot of the residuals of the fitted model (out). The code is
qqnorm(out$resid)
qqline(out$resid)
Comment:
The figure above presents the Q-Q plot. Since most of the points lie close to the reference line drawn by qqline(), we can say that the residuals approximately follow a normal distribution.
Comment
From the above plot, the residuals do not follow any systematic pattern, so the assumption of homoscedasticity (constant error variance) appears reasonable.
time.order<-c(sample(1:12))
time.order
[1] 10 5 7 11 3 12 6 1 2 9 8 4
plot(time.order,out$resid)
abline(h=0)
Figure 10.3: Checking independence of the observations
Comment:
From the above plot it is clear that there is no pattern, so we can conclude that the observations are independent.
Therefore, from the analysis we conclude that all of the assumptions are fulfilled for the given data.
Analysis of variance
In the analysis of variance we are interested in testing the following hypotheses:
𝐻01 : The main effect of A is zero
𝐻02 : The main effect of B is zero
𝐻03 : There is no interaction effect between A and B
First, we examine the interaction plots to see whether any interaction is present, using the following code.
par(mfrow=c(1,2))
interaction.plot(A,B,y)
interaction.plot(B,A,y)
Now we carry out an analysis of variance for the given data, using the following R code.
out<-lm(y~A+B+A*B)
anova(out)
We get the following analysis of variance table.
Analysis of Variance Table
Response: y
Comment :
Since the p-value for the main effect of A is < 0.001, we reject the null hypothesis (𝐻01 ), i.e. the main effect of A is significant.
Since the p-value for the main effect of B is < 0.01, we reject the null hypothesis (𝐻02 ), i.e. the main effect of B is significant.
Since the p-value for the interaction effect of A and B is > 0.1, we cannot reject the null hypothesis (𝐻03 ), i.e. the interaction effect of A and B is not significant.
Answer to Question 2
Yes, both main effects (A and B) are significant, but there is no significant interaction effect.
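For a balanced 2 × 2 factorial with r = 3 replicates, the sums of squares in the ANOVA table can be computed by hand. Main-effect SS come from the level means. SS(AB) is the between-cell SS minus both main effects. SS(error) is the within-cell SS. A sketch on the same data, as a check on anova(out):

```r
y <- c(28,36,18,31,25,32,19,30,27,32,23,29)
A <- as.factor(rep(1:2, times = 6))
B <- as.factor(rep(1:2, each = 2, times = 3))

r <- 3                                   # replicates per A x B cell
grand <- mean(y)
ss_A <- nlevels(B) * r * sum((tapply(y, A, mean) - grand)^2)
ss_B <- nlevels(A) * r * sum((tapply(y, B, mean) - grand)^2)
cell <- tapply(y, list(A, B), mean)      # 2 x 2 table of cell means
ss_cells <- r * sum((cell - grand)^2)    # between-cell SS
ss_AB <- ss_cells - ss_A - ss_B          # interaction SS
ss_E  <- sum((y - grand)^2) - ss_cells   # within-cell (error) SS

mse <- ss_E / (nlevels(A) * nlevels(B) * (r - 1))  # error df = 8
F_A  <- ss_A / mse                        # each effect has 1 df here
F_B  <- ss_B / mse
F_AB <- ss_AB / mse
c(SS_A = ss_A, SS_B = ss_B, SS_AB = ss_AB, SS_E = ss_E)
```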
Problem 9
Assumption:
1. The data are subject to random censoring.
2. The survival probability is constant on each interval [t_j, t_(j+1)) between consecutive event times.
3. The survival function is a non-increasing function.
R Code:
library(survival)
Time<-c(3,4,5,22,34, 2, 3,4, 7,11)
Indicator<-c(1,0, 1, 1, 1, 1, 1, 1, 1, 0)
Group<-c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
data<-data.frame(Time, Indicator, Group)
km.both.group<-survfit(Surv(Time,Indicator)~Group,data=data)
summary(km.both.group)
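Output
Group=1
time n.risk n.event survival
   3      5       1    0.800
   5      3       1    0.533
  22      2       1    0.267
  34      1       1    0.000
Group=2
time n.risk n.event survival
   2      5       1      0.8
   3      4       1      0.6
   4      3       1      0.4
   7      2       1      0.2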
From the tables above we can see the estimated survival probabilities for both groups.
To plot the survival curves of both groups on the same graph, we use the following code.
plot(km.both.group, xlab = "Time", ylab = "Survival probability", col = c("red","blue"), lwd = 2)
legend(25,.9,c("Yes","No"),col=c("red","blue"),lwd=2)
From the graph, the subjects who use the drug appear to have better survival than those who do not. The next step is to test whether the survival experience differs significantly between these two groups.
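The Kaplan-Meier (product-limit) estimates in the Group 1 table can be reproduced by hand. At each event time t_j, multiply by (1 - d_j/n_j), where n_j counts subjects still at risk (time ≥ t_j) and d_j counts events at t_j. Censored subjects leave the risk set without an event. A sketch for Group 1:

```r
time1  <- c(3, 4, 5, 22, 34)   # Group 1 times
event1 <- c(1, 0, 1, 1, 1)     # 1 = event, 0 = censored

event_times <- sort(unique(time1[event1 == 1]))
n_risk <- sapply(event_times, function(u) sum(time1 >= u))               # at risk
d      <- sapply(event_times, function(u) sum(time1 == u & event1 == 1)) # events
surv   <- cumprod(1 - d / n_risk)   # product-limit estimate S(t)
data.frame(time = event_times, n.risk = n_risk, n.event = d, survival = surv)
```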
Test of equality
Here we want to test the following hypotheses:
H0: Syes(t) = Sno(t)
H1: Syes(t) ≠ Sno(t)
eq.test<-survdiff(Surv(Time,Indicator)~Group,data=data)
eq.test
Call:
Interpretation:
The p-value is greater than 0.05, so we do not have enough evidence to reject the null hypothesis. That is, there is no significant difference between the two survival functions at the 5% level of significance.
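The log-rank statistic from survdiff() compares the observed number of events in Group 1 with its expected number under H0 at each distinct event time. It uses the hypergeometric variance d·(n1/n)·(n2/n)·(n - d)/(n - 1). A minimal base-R sketch on the same data, as a check on survdiff():

```r
Time      <- c(3, 4, 5, 22, 34, 2, 3, 4, 7, 11)
Indicator <- c(1, 0, 1, 1, 1, 1, 1, 1, 1, 0)
Group     <- c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)

ev_times <- sort(unique(Time[Indicator == 1]))
O1 <- 0; E1 <- 0; V <- 0
for (u in ev_times) {
  at_risk <- Time >= u
  n  <- sum(at_risk)                              # total at risk
  n1 <- sum(at_risk & Group == 1)                 # Group 1 at risk
  d  <- sum(Time == u & Indicator == 1)           # total events at u
  d1 <- sum(Time == u & Indicator == 1 & Group == 1)
  O1 <- O1 + d1                                   # observed events, Group 1
  E1 <- E1 + d * n1 / n                           # expected events, Group 1
  if (n > 1) V <- V + d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
}
chisq   <- (O1 - E1)^2 / V
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)
c(chisq = chisq, p = p_value)
```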
Problem 10
R-code:
data<-read.table("drugs.txt",header=T)
library(survival)
data
   Time Indicator Group
1     3         1     1
2     4         0     1
3     5         1     1
4    22         1     1
5    34         1     1
6     2         1     2
7     3         1     2
8     4         1     2
9     7         1     2
10   11         0     2
km.both.group<-survfit(Surv(Time,Indicator)~Group,data=data)
summary(km.both.group)
Group=1 (Yes)
time n.risk n.event survival
   3      5       1    0.800
   5      3       1    0.533
  22      2       1    0.267
  34      1       1    0.000
Group=2 (No)
time n.risk n.event survival
   2      5       1      0.8
   3      4       1      0.6
   4      3       1      0.4
   7      2       1      0.2
#Survival curves for two groups
plot(km.both.group, xlab = "Time", ylab = "Survival probability", col = c("red","blue"), lwd = 2)
legend(25,.9,c("Yes","No"),col=c("red","blue"),lwd=2)
Hypothesis
Ho:𝑆𝑦𝑒𝑠 (t)= 𝑆𝑁𝑜 (t)
H1: 𝑆𝑦𝑒𝑠 (t)≠ 𝑆𝑁𝑜 (t)
Where,
Syes(t)=population survival probability for drug used group.
Sno(t)= population survival probability for drug not used group.
eq.test<-survdiff(Surv(Time,Indicator)~Group,data=data)
eq.test
Call:
Comment:
As the p-value > 0.05, we cannot reject the null hypothesis at the 5% level of significance. Therefore, there is no significant evidence that the survival functions of the drug-used and drug-not-used groups differ.