
Practical No. 01
(a)
Code:
mydata <- read.table("clipboard", header=T)   # data copied to the clipboard
attach(mydata)
model <- lm(y ~ x1 + x2 + x3, data=mydata)    # multiple linear regression of y on x1, x2, x3
summary(model)

Output:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13354.6016 6485.4190 2.059 0.0619 .
x1 -36.2819 6.3563 -5.708 9.79e-05 ***
x2 26.3375 10.1264 2.601 0.0232 *
x3 -0.1925 0.3069 -0.627 0.5422
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Interpretation:
The coefficient of x1 is -36.2819: a one-unit increase in x1 (wholesale price of roses, $/dozen) is associated with an average decrease of 36.2819 units in the quantity of roses sold, holding x2 and x3 fixed.
The coefficient of x2 is 26.3375: a one-unit increase in x2 is associated with an average increase of 26.3375 units in the quantity of roses sold, holding x1 and x3 fixed.
The coefficient of x3 is -0.1925: a one-unit increase in x3 is associated with an average decrease of 0.1925 units in the quantity of roses sold, holding x1 and x2 fixed; note, however, that x3 is not significant (p = 0.5422).

(b)
Code:
c <- c(0,1,-1,0)                     # contrast vector picking out beta1 - beta2
b <- coef(model)                     # estimated coefficients
est <- sum(c*b)                      # estimate of beta1 - beta2
sd <- sqrt(t(c)%*%vcov(model)%*%c)   # standard error of the contrast
t <- est/sd                          # t statistic
pval <- 2*(1-pt(abs(t),12))          # two-sided p-value on 12 residual df
pval

Output:
[,1]
[1,] 0.001010749

Interpretation:
Since the p-value 0.001010749 is less than 0.05, we have enough evidence to reject the null hypothesis 𝐻0 : 𝛽1 = 𝛽2. Hence the effects of x1 and x2 on y differ significantly.
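
The same contrast can be tested in a single call with the car package (a minimal sketch, assuming car is installed; it should reproduce the p-value above up to rounding):

library(car)
linearHypothesis(model, "x1 = x2")   # F test of H0: beta1 = beta2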

(c)
Code:
ll <- est + qt(0.025,12)*sd      # lower limit of the 95% confidence interval
ul <- est + qt(1-0.025,12)*sd    # upper limit of the 95% confidence interval
ci <- c(ll,ul)
ci

Output:
[1] -94.26238 -30.97643

Interpretation:
The 95% confidence interval for 𝛽1 − 𝛽2 runs from -94.26238 (lower limit) to -30.97643 (upper limit). Since the interval does not contain 0, it agrees with the test in (b).
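
The same interval can also be obtained from the multcomp package (a sketch, assuming multcomp is installed; the contrast matrix has one column per coefficient of the model):

library(multcomp)
ct <- glht(model, linfct = rbind("x1 - x2" = c(0, 1, -1, 0)))
confint(ct)   # 95% confidence interval for beta1 - beta2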

(d)
Code:
nestedmodel <- lm(y ~ x1, data= mydata)
anova(nestedmodel, model)

Output:
Analysis of Variance Table

Model 1: y ~ x1
Model 2: y ~ x1 + x2 + x3
Res.Df RSS Df Sum of Sq F Pr(>F)
1 14 24105953
2 12 13900824 2 10205128 4.4048 0.03677 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Interpretation:
Since the p-value 0.03677 is less than 0.05, we have enough evidence to conclude that the covariates x2 and x3 jointly contribute to the model; their effects on y are not both zero.
Practical No. 02
Hypothesis:
H0: All treatments are equal
H1: At least two of them are unequal

Assumptions:
1. Observations are normally distributed
2. Observations are independent
3. Observations have equal population variances

Rcode:

y <- c(82,71,64,93,62,73,61,85,87,74,94,91,69,78,56,70,66,78,53,71,87)
treat <- as.factor(rep(c("Y1","Y2","Y3"),7))
crd <- data.frame(treat,y)

model <- lm(y ~ treat, data = crd)   # one-way ANOVA model for the CRD
anova(model)

Output:
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
treat 2 88.67 44.333 0.2837 0.7563
Residuals 18 2812.57 156.254

Interpretation:
As the p-value is 0.7563, which is greater than 0.05, we do not have enough evidence to reject the null hypothesis; the treatments do not differ significantly.
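
The stated assumptions can also be checked formally with base R tests (a minimal sketch; shapiro.test examines normality of the residuals and bartlett.test examines equality of variances across treatments):

shapiro.test(residuals(model))         # H0: residuals are normally distributed
bartlett.test(y ~ treat, data = crd)   # H0: treatment groups have equal variances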
Practical No 03
R code:

gpa <- c(2.4,3.9,2.8,2,3,2.7,2.1,4,2.8,3,3.9,3.3,3.1,3.8,3)
method <- as.factor(rep(c("Online","Lecture","Hybrid"),times=5))
instructor <- as.factor(rep(1:5,each=3))
rbd <- data.frame(gpa,method,instructor)

Assumptions:

1. Observations are normally distributed.
2. Observations are independent.
3. Populations have equal variances.

Hypothesis:

𝐻0 : 𝜇1 = 𝜇2 = 𝜇3

𝐻1 : at least two are unequal

here,

𝜇1 = average GPA under online method

𝜇2 = average GPA under lecture method

𝜇3 = average GPA under hybrid method

R code:

m <- lm(gpa ~ method + instructor, data = rbd)
anova(m)
Output:

Analysis of Variance Table

Response: gpa
           Df Sum Sq Mean Sq F value    Pr(>F)
method      2 3.7333 1.86667 23.0928 0.0004751 ***
instructor  4 1.2773 0.31933  3.9505 0.0466507 *
Residuals   8 0.6467 0.08083
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Interpretation:

From the above ANOVA table, the p-value for method is 0.0004751 < 𝛼 (0.05), so we have enough evidence to reject the null hypothesis at the 5% level of significance. That means GPA varies significantly depending on the type of instruction.

The ANOVA table gives the additional information that the effect of instructor (the blocking factor) is also significant in this case (p-value 0.0466507 < 0.05).
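
Since the method effect is significant, a natural follow-up is to identify which teaching methods differ. A minimal sketch using base R's Tukey procedure (the model is refitted with aov, which TukeyHSD requires):

m.aov <- aov(gpa ~ method + instructor, data = rbd)
TukeyHSD(m.aov, which = "method")   # pairwise comparisons of the three methods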
Practical No. 04
Code:
mydf <- read.table("clipboard",header=T)
attach(mydf)
head(mydf)
model <- glm(Admit ~ SAT + GPA + as.factor(Race), family="binomial")
summary(model)

Output:
Call:
glm(formula = Admit ~ SAT + GPA + as.factor(Race), family = "binomial")

Deviance Residuals:
Min 1Q Median 3Q Max
-1.5420 -0.8761 -0.6408 1.1510 2.0998

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.1535282 1.0638161 -3.904 9.45e-05 ***
SAT 0.0010930 0.0005273 2.073 0.038203 *
GPA 0.8304239 0.3127104 2.656 0.007917 **
as.factor(Race)2 -0.5188735 0.2988188 -1.736 0.082490 .
as.factor(Race)3 -1.1671195 0.3268685 -3.571 0.000356 ***
as.factor(Race)4 -1.4326967 0.4076618 -3.514 0.000441 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 548.84 on 441 degrees of freedom
Residual deviance: 507.15 on 436 degrees of freedom
AIC: 519.15

Number of Fisher Scoring iterations: 4

Code:
mydata <- data.frame(SAT=1600,GPA=4,Race=factor(1)) #for SAT score 1600, GPA 4 and Race 1.
predict(model,mydata,type="response")

Output:
1
0.7144157

Interpretation:

Intercept: -4.1535282 is the log-odds of admission when SAT = 0, GPA = 0 and Race = 1. At the 5% level of significance it is significant.

SAT: The odds of admission increase by 100×(exp(0.0010930) − 1)% ≈ 0.11% for a 1-unit increase in SAT score, keeping all other covariates at a fixed level. At the 5% level of significance it is significant.

GPA: The odds of admission increase by 100×(exp(0.8304239) − 1)% ≈ 129.4% for a 1-unit increase in GPA, keeping all other covariates at a fixed level. At the 5% level of significance it is significant.

Race (2): Race 2 has 100×(1 − exp(−0.5188735))% ≈ 40.5% lower odds of admission compared to Race 1, keeping all other variables at a fixed level. At the 5% level of significance it is insignificant (p = 0.082490).

Race (3): Race 3 has 100×(1 − exp(−1.1671195))% ≈ 68.9% lower odds of admission compared to Race 1, keeping all other variables at a fixed level. At the 5% level of significance it is significant.

Race (4): Race 4 has 100×(1 − exp(−1.4326967))% ≈ 76.1% lower odds of admission compared to Race 1, keeping all other variables at a fixed level. At the 5% level of significance it is significant.

Prediction: For a student with SAT score 1600, GPA 4 and Race 1, the predicted probability of admission is 0.7144157.
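
The percentage changes above come from exponentiating the coefficients; they can be computed directly, together with 95% confidence intervals on the odds-ratio scale (a minimal sketch; confint uses profile likelihood for glm objects, so it can take a moment):

exp(coef(model))     # odds ratios
exp(confint(model))  # 95% confidence intervals for the odds ratios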
Practical No 05
R Code

x1 <- c(3,5,5,7,7,7,8,9,10,11)
x2 <- c(2.3,1.9,1,.7,.3,1,1.05,.45,.7,.3)
X <- matrix(c(x1,x2),ncol=2)
xbar <- apply(X,2,mean)   # sample mean vector
S <- cov(X)               # sample covariance matrix
d2 <- c()
for(i in 1:nrow(X)){
  d2[i] <- (X[i,]-xbar)%*%solve(S)%*%(X[i,]-xbar)   # squared Mahalanobis distance
}
d2
D2 <- sort(d2)            # ordered squared distances
D2
Q <- c()
for(i in 1:length(D2)){
  Q[i] <- qchisq((i-.5)/length(D2), df=ncol(X))     # chi-square quantiles
}

plot(Q,D2)

Output

a) Squared distances
d2
[1] 4.058682373 2.109580755 2.107431801 0.636114372 3.265479436
0.007903377 0.521861631 0.647933621 2.059080327 2.585932308

b) Ordered distances
D2
[1] 0.007903377 0.521861631 0.636114372 0.647933621 2.059080327
2.107431801 2.109580755 2.585932308 3.265479436 4.058682373
c)

Figure: Chi-square plot for checking bivariate normality


Comment:
Commenting on this plot is difficult because the sample size is very small. However, from the plot we can see that most of the points do not lie along the line with slope 1, so we are not convinced that the data came from a bivariate normal distribution.
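
Since the comment refers to the line with slope 1, it helps to draw that reference line on the chi-square plot (a small addition to the code above):

plot(Q, D2, xlab="Chi-square quantiles", ylab="Ordered squared distances")
abline(0, 1)   # reference line with intercept 0 and slope 1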
Practical No 06

Hypothesis

H0: Mean vectors (fever, Pressure and Aches) for treatment and placebo are equal

H1: Mean vectors (fever, Pressure and Aches) for treatment and placebo are unequal

Rcode

placebo <- read.table("e:/placebo.txt", header=T)
treat <- read.table("e:/treat.txt", header=T)
X1 <- apply(treat,2,mean)     # mean vector of the treatment group
X2 <- apply(placebo,2,mean)   # mean vector of the placebo group

n1 <- nrow(treat); n2 <- nrow(placebo); p <- 3   # group sizes (20 and 18) and number of responses
Sp <- ((n1-1)*cov(treat)+(n2-1)*cov(placebo))/(n1+n2-2)   # pooled covariance, weighted by (n-1)

T2 <- t(X1-X2)%*%solve(((1/n1)+(1/n2))*Sp)%*%(X1-X2)      # two-sample Hotelling's T2

Pval <- pf(((n1+n2-p-1)/(p*(n1+n2-2)))*T2, p, n1+n2-p-1, lower.tail=0)   # F transformation of T2

Pval

Output
2.175351e-08

Interpretation
As the p-value 2.175351e-08 is less than 0.05, the null hypothesis is rejected at the 5% level of significance; the mean vectors (fever, pressure and aches) differ between the treatment and placebo groups.
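
As a cross-check, the same two-sample test is available in the ICSNP package (a sketch, assuming ICSNP is installed; it reports the test statistic on the F scale together with the p-value):

library(ICSNP)
HotellingsT2(treat, placebo)   # two-sample Hotelling's T2 test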

Practical No 07

R code

mydata <- read.table("clipboard",header=T)


attach(mydata)
X <- cbind(BOD_S - BOD_P, SS_S - SS_P)   # paired differences: state lab minus private lab
s <- cov(X)
dbar <- apply(X,2,mean)   # or dbar <- colMeans(X)
n <- 11
p <- 2
T2 <- n*(dbar%*%solve(s)%*%dbar)   # paired (one-sample) Hotelling's T2
T2
Pval <- pf(((n-p)/(p*(n-1)))*T2, p, n-p, lower.tail=0)   # F transformation of T2
Pval

Output
Test statistic = 13.63931, P Value = 0.02082779

Comment:

Since the p-value (0.02082779) is less than .05, we have enough evidence to reject the null hypothesis. Thus, at the 5% level of significance, the mean measurements (BOD and/or SS) differ between the state lab and the private lab.
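
To see which component drives the rejection, univariate paired t-tests on each difference column can follow the multivariate test (a minimal sketch; Bonferroni-adjusting the two p-values keeps the overall level at 5%):

t.test(X[,1])   # H0: mean BOD difference (state minus private) is 0
t.test(X[,2])   # H0: mean SS difference (state minus private) is 0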

Practical No 08

Creating data:
y<-c(28,36,18,31,25,32,19,30,27,32,23,29)
A<-as.factor(rep(1:2,times=6))
B<-as.factor(rep(1:2,each=2,times=3))
data<-data.frame(A,B,y)

1. Checking assumptions and computing the ANOVA table


Assumptions:

1. Observations are drawn from a normal distribution.
2. Observations are independent.
3. Populations have equal variance.

out <- lm(y ~ A*B)   # A*B expands to A + B + A:B

#Checking normality assumption

For checking the normality assumption we can simply examine the Q-Q plot of the residuals of our model (out). The code is:

qqnorm(out$resid)
qqline(out$resid)

Figure 8.1: Normal Q-Q plot for checking the normality assumption

Comment:
The above figure presents the Q-Q plot. Since most of the points lie on the 45° straight line, we can say that the residuals approximately follow a normal distribution.
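
A formal complement to this visual check is the Shapiro–Wilk test on the residuals (a minimal sketch using base R):

shapiro.test(out$resid)   # H0: residuals are normally distributed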

Checking assumption of constant variance

plot(predict(out), out$resid^2)

Figure 8.2: Plot of predicted values against squared residuals for checking homoscedasticity

Comment
From the above plot we can say that there is homoscedasticity, because the points do not follow any systematic pattern.

#Checking independence assumption

time.order <- sample(1:12)   # random time order in which the observations were collected

time.order

[1] 10 5 7 11 3 12 6 1 2 9 8 4

plot(time.order, out$resid)
abline(h=0)

Figure 8.3: Checking independence of the observations
Comment:

From the above plot it is clear that there is no pattern, so we can conclude that the observations are independent.

From this analysis we conclude that all of the assumptions are fulfilled for the given data.

Analysis of variance
In the analysis of variance we are interested in testing the following hypotheses:

𝐻01 : The main effect of A is not significant.
𝐻𝑎1 : The main effect of A is significant.
𝐻02 : The main effect of B is not significant.
𝐻𝑎2 : The main effect of B is significant.
𝐻03 : The interaction effect of A & B is not significant.
𝐻𝑎3 : The interaction effect of A & B is significant.

First we examine the interaction plots to see whether any interaction is present. For this we use the following code.

par(mfrow=c(1,2))
interaction.plot(A,B,y)
interaction.plot(B,A,y)

Figure 8.4: Plots for studying the interaction effect

From the above plots it seems that there is no interaction effect between A & B.

Now we carry out an analysis of variance for the given data. For this purpose we use the following R code.

out <- lm(y ~ A*B)
anova(out)
We get the following analysis of variance table.
Analysis of Variance Table

Response: y

           Df  Sum Sq Mean Sq F value    Pr(>F)
A           1 208.333 208.333 53.1915 8.444e-05 ***
B           1  75.000  75.000 19.1489  0.002362 **
A:B         1   8.333   8.333  2.1277  0.182776
Residuals   8  31.333   3.917
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Comment:

Since for the main effect of A the p-value < .001, we reject the null hypothesis (𝐻01 ); i.e. the main effect of A is significant.

Since for the main effect of B the p-value < .01, we reject the null hypothesis (𝐻02 ); i.e. the main effect of B is significant.

Since for the interaction effect of A & B the p-value > .1, we cannot reject the null hypothesis (𝐻03 ); i.e. the interaction effect of A & B is insignificant.

Answer to question 2
Yes, there are two significant main effects, A and B, but there is no significant interaction effect.
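
Since the interaction is insignificant, one may refit the additive model and base the final conclusions on it (a sketch; dropping A:B returns its degree of freedom and sum of squares to the residuals):

out2 <- lm(y ~ A + B)
anova(out2)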

Problem 9
Assumptions:
1. The data are subject to random censoring.
2. The survival probability is constant within each interval [tj, tj+1).
3. The survival function is a non-increasing function.

R Code:
library(survival)
Time <- c(3,4,5,22,34, 2,3,4,7,11)
Indicator <- c(1,0,1,1,1, 1,1,1,1,0)   # 1 = event observed, 0 = censored
Group <- c(1,1,1,1,1, 2,2,2,2,2)       # 1 = drug used ("Yes"), 2 = drug not used ("No")
data <- data.frame(Time, Indicator, Group)
km.both.group <- survfit(Surv(Time,Indicator)~Group, data=data)   # Kaplan-Meier fit per group
summary(km.both.group)
Output

Call: survfit(formula = Surv(Time, Indicator) ~ Group, data = data)

Group=1
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    3      5       1    0.800   0.179       0.5161            1
    5      3       1    0.533   0.248       0.2142            1
   22      2       1    0.267   0.226       0.0507            1
   34      1       1    0.000     NaN           NA           NA

Group=2
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    2      5       1      0.8   0.179       0.5161            1
    3      4       1      0.6   0.219       0.2933            1
    4      3       1      0.4   0.219       0.1367            1
    7      2       1      0.2   0.179       0.0346            1

From the above tables we can see the Kaplan–Meier survival probabilities for both groups.
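
These probabilities are Kaplan–Meier products of conditional survival factors, S(t) = ∏ over event times tj ≤ t of (1 − dj/nj). For Group 1, for example: S(3) = 4/5 = 0.800 and S(5) = 0.800 × (2/3) = 0.533; the censored time 4 reduces the risk set from 4 to 3 but contributes no factor.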

Now, to sketch the survival curves for both groups in the same graph, we use the following code.

plot(km.both.group, xlab="Time", ylab="Survival probability", col=c("red","blue"), lwd=2)
legend(25, .9, c("Yes","No"), col=c("red","blue"), lwd=2)
title("Survival curves for both groups")

and we get the following figure.

Figure 9.1: Survival curves for both groups

From the graph we can see that the subjects who use the drug experience better survival than those who do not. The next step is to test whether the survival experience differs significantly between the two groups.
Test of equality
Here we want to test the following hypothesis

H0: The two survival functions are equal

H1: They are not equal

R Code for testing hypothesis

eq.test<-survdiff(Surv(Time,Indicator)~Group,data=data)

eq.test

Call:

survdiff(formula = Surv(Time, Indicator) ~ Group, data = data)

        N Observed Expected (O-E)^2/E (O-E)^2/V
Group=1 5        4     5.28     0.311      1.16
Group=2 5        4     2.72     0.605      1.16

Chisq= 1.2 on 1 degrees of freedom, p= 0.281

Interpretation:

The p-value (0.281) is greater than .05, so we do not have enough evidence to reject the null hypothesis. That is, we can say the two survival functions are equal.
Problem 10
R-code:

data<-read.table("drugs.txt",header=T)

library(survival)

Loading required package: splines

data

   Time Indicator Group
1     3         1     1
2     4         0     1
3     5         1     1
4    22         1     1
5    34         1     1
6     2         1     2
7     3         1     2
8     4         1     2
9     7         1     2
10   11         0     2

km.both.group<-survfit(Surv(Time,Indicator)~Group,data=data)

summary(km.both.group)

Call: survfit(formula = Surv(Time, Indicator) ~ Group, data = data)


Group=1 (yes)
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    3      5       1    0.800   0.179       0.5161            1
    5      3       1    0.533   0.248       0.2142            1
   22      2       1    0.267   0.226       0.0507            1
   34      1       1    0.000     NaN           NA           NA


Group=2 (no)
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    2      5       1      0.8   0.179       0.5161            1
    3      4       1      0.6   0.219       0.2933            1
    4      3       1      0.4   0.219       0.1367            1
    7      2       1      0.2   0.179       0.0346            1

#Survival curves for two groups

plot(km.both.group, xlab="Time", ylab="Survival probability", col=c("red","blue"), lwd=2)
legend(25, .9, c("Yes","No"), col=c("red","blue"), lwd=2)
title("Survival curves for both groups")

Figure 10.1: Survival curves for two groups


Comment:
From Figure 10.1 it is clear that the survival probabilities for the "yes" group are greater than those for the "no" group at each time point. That is, the survival probabilities for the drug-used group are greater than those for the group that did not use the drug.

# Test for equality

Hypothesis
H0: Syes(t) = SNo(t)
H1: Syes(t) ≠ SNo(t)

where
Syes(t) = population survival function for the drug-used group,
SNo(t) = population survival function for the group that did not use the drug.

eq.test<-survdiff(Surv(Time,Indicator)~Group,data=data)

eq.test

Call:

survdiff(formula = Surv(Time, Indicator) ~ Group, data = data)

        N Observed Expected (O-E)^2/E (O-E)^2/V
Group=1 5        4     5.28     0.311      1.16
Group=2 5        4     2.72     0.605      1.16

Chisq= 1.2 on 1 degrees of freedom, p= 0.281


Comment:
As the p-value (0.281) is greater than 0.05, we cannot reject the null hypothesis at the 5% level of significance. Therefore we conclude that the survival functions for the two groups are the same.
