
Practical No. 01
(a)
Code:
mydata <- read.table("clipboard", header=T)   # data copied to the clipboard
attach(mydata)
model <- lm(y ~ x1 + x2 + x3, data=mydata)    # multiple linear regression of y on x1, x2, x3
summary(model)

Output:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 13354.6016 6485.4190 2.059 0.0619 .
x1 -36.2819 6.3563 -5.708 9.79e-05 ***
x2 26.3375 10.1264 2.601 0.0232 *
x3 -0.1925 0.3069 -0.627 0.5422
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Interpretation:
The coefficient of x1 is -36.2819: a one-unit increase in x1 (wholesale price of roses, $/dozen) is associated with an average decrease of 36.2819 units in the quantity of roses sold, holding x2 and x3 fixed.
The coefficient of x2 is 26.3375: a one-unit increase in x2 is associated with an average increase of 26.3375 units in the quantity of roses sold, holding x1 and x3 fixed.
The coefficient of x3 is -0.1925: a one-unit increase in x3 is associated with an average decrease of 0.1925 units in the quantity of roses sold, holding x1 and x2 fixed; note, however, that x3 is not significant (p = 0.5422).

(b)
Code:
c <- c(0,1,-1,0)                     # contrast vector picking out beta1 - beta2
b <- coef(model)                     # estimated coefficients
est <- sum(c*b)                      # estimate of beta1 - beta2
sd <- sqrt(t(c)%*%vcov(model)%*%c)   # standard error of the contrast
t <- est/sd                          # t statistic
pval <- 2*(1-pt(abs(t),12))          # two-sided p-value on 12 residual df
pval

Output:
[,1]
[1,] 0.001010749

Interpretation:
Since the p-value 0.001010749 is less than 0.05, we have enough evidence to reject the null hypothesis 𝐻0 : 𝛽1 = 𝛽2. Hence the effects of x1 and x2 on y differ significantly.
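
The same contrast can be tested in a single call with the car package (a minimal sketch, assuming car is installed; it should reproduce the p-value above up to rounding):

library(car)
linearHypothesis(model, "x1 = x2")   # F test of H0: beta1 = beta2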

(c)
Code:
ll <- est + qt(0.025,12)*sd      # lower limit of the 95% confidence interval
ul <- est + qt(1-0.025,12)*sd    # upper limit of the 95% confidence interval
ci <- c(ll,ul)
ci

Output:
[1] -94.26238 -30.97643

Interpretation:
The 95% confidence interval for 𝛽1 − 𝛽2 runs from -94.26238 (lower limit) to -30.97643 (upper limit). Since the interval does not contain 0, it agrees with the test in (b).
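
The same interval can also be obtained from the multcomp package (a sketch, assuming multcomp is installed; the contrast matrix has one column per coefficient of the model):

library(multcomp)
ct <- glht(model, linfct = rbind("x1 - x2" = c(0, 1, -1, 0)))
confint(ct)   # 95% confidence interval for beta1 - beta2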

(d)
Code:
nestedmodel <- lm(y ~ x1, data= mydata)
anova(nestedmodel, model)

Output:
Analysis of Variance Table

Model 1: y ~ x1
Model 2: y ~ x1 + x2 + x3
Res.Df RSS Df Sum of Sq F Pr(>F)
1 14 24105953
2 12 13900824 2 10205128 4.4048 0.03677 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Interpretation:
Since the p-value 0.03677 is less than 0.05, we have enough evidence to conclude that the covariates x2 and x3 jointly contribute to the model; their effects on y are not both zero.
Practical No. 02
Hypothesis:
H0: All treatments are equal
H1: At least two of them are unequal

Assumptions:
1. Observations are normally distributed
2. Observations are independent
3. Observations have equal population variances

Rcode:

y <- c(82,71,64,93,62,73,61,85,87,74,94,91,69,78,56,70,66,78,53,71,87)
treat <- as.factor(rep(c("Y1","Y2","Y3"),7))
crd <- data.frame(treat,y)

model <- lm(y ~ treat, data = crd)   # one-way ANOVA model for the CRD
anova(model)

Output:
Analysis of Variance Table

Response: y
Df Sum Sq Mean Sq F value Pr(>F)
treat 2 88.67 44.333 0.2837 0.7563
Residuals 18 2812.57 156.254

Interpretation:
As the p-value is 0.7563, which is greater than 0.05, we do not have enough evidence to reject the null hypothesis; the treatments do not differ significantly.
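
The stated assumptions can also be checked formally with base R tests (a minimal sketch; shapiro.test examines normality of the residuals and bartlett.test examines equality of variances across treatments):

shapiro.test(residuals(model))         # H0: residuals are normally distributed
bartlett.test(y ~ treat, data = crd)   # H0: treatment groups have equal variances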
Practical No 03
R code:

gpa <- c(2.4,3.9,2.8,2,3,2.7,2.1,4,2.8,3,3.9,3.3,3.1,3.8,3)
method <- as.factor(rep(c("Online","Lecture","Hybrid"),times=5))
instructor <- as.factor(rep(1:5,each=3))
rbd <- data.frame(gpa,method,instructor)

Assumptions:

1. Observations are normally distributed.
2. Observations are independent.
3. Populations have equal variances.

Hypothesis:

𝐻0 : 𝜇1 = 𝜇2 = 𝜇3

𝐻1 : at least two are unequal

here,

𝜇1 = average GPA under online method

𝜇2 = average GPA under lecture method

𝜇3 = average GPA under hybrid method

R code:

m <- lm(gpa ~ method + instructor, data = rbd)
anova(m)
Output:

Analysis of Variance Table

Response: gpa
           Df Sum Sq Mean Sq F value    Pr(>F)
method      2 3.7333 1.86667 23.0928 0.0004751 ***
instructor  4 1.2773 0.31933  3.9505 0.0466507 *
Residuals   8 0.6467 0.08083
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Interpretation:

From the above ANOVA table, the p-value for method is 0.0004751 < 𝛼 (0.05), so we have enough evidence to reject the null hypothesis at the 5% level of significance. That means GPA varies significantly depending on the type of instruction.

The ANOVA table gives the additional information that the effect of instructor (the blocking factor) is also significant in this case (p-value 0.0466507 < 0.05).
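
Since the method effect is significant, a natural follow-up is to identify which teaching methods differ. A minimal sketch using base R's Tukey procedure (the model is refitted with aov, which TukeyHSD requires):

m.aov <- aov(gpa ~ method + instructor, data = rbd)
TukeyHSD(m.aov, which = "method")   # pairwise comparisons of the three methods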
Practical No. 04
Code:
mydf <- read.table("clipboard",header=T)
attach(mydf)
head(mydf)
model <- glm(Admit ~ SAT + GPA + as.factor(Race), family="binomial")
summary(model)

Output:
Call:
glm(formula = Admit ~ SAT + GPA + as.factor(Race), family = "binomial")

Deviance Residuals:
Min 1Q Median 3Q Max
-1.5420 -0.8761 -0.6408 1.1510 2.0998

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.1535282 1.0638161 -3.904 9.45e-05 ***
SAT 0.0010930 0.0005273 2.073 0.038203 *
GPA 0.8304239 0.3127104 2.656 0.007917 **
as.factor(Race)2 -0.5188735 0.2988188 -1.736 0.082490 .
as.factor(Race)3 -1.1671195 0.3268685 -3.571 0.000356 ***
as.factor(Race)4 -1.4326967 0.4076618 -3.514 0.000441 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 548.84 on 441 degrees of freedom
Residual deviance: 507.15 on 436 degrees of freedom
AIC: 519.15

Number of Fisher Scoring iterations: 4

Code:
mydata <- data.frame(SAT=1600,GPA=4,Race=factor(1)) #for SAT score 1600, GPA 4 and Race 1.
predict(model,mydata,type="response")

Output:
1
0.7144157

Interpretation:

Intercept: -4.1535282 is the log-odds of admission when SAT = 0, GPA = 0 and Race = 1. At the 5% level of significance it is significant.

SAT: The odds of admission increase by 100×(exp(0.0010930) − 1)% ≈ 0.11% for a 1-unit increase in SAT score, keeping all other covariates at a fixed level. At the 5% level of significance it is significant.

GPA: The odds of admission increase by 100×(exp(0.8304239) − 1)% ≈ 129.4% for a 1-unit increase in GPA, keeping all other covariates at a fixed level. At the 5% level of significance it is significant.

Race (2): Race 2 has 100×(1 − exp(−0.5188735))% ≈ 40.5% lower odds of admission compared to Race 1, keeping all other variables at a fixed level. At the 5% level of significance it is insignificant (p = 0.082490).

Race (3): Race 3 has 100×(1 − exp(−1.1671195))% ≈ 68.9% lower odds of admission compared to Race 1, keeping all other variables at a fixed level. At the 5% level of significance it is significant.

Race (4): Race 4 has 100×(1 − exp(−1.4326967))% ≈ 76.1% lower odds of admission compared to Race 1, keeping all other variables at a fixed level. At the 5% level of significance it is significant.

Prediction: For a student with SAT score 1600, GPA 4 and Race 1, the predicted probability of admission is 0.7144157.
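
The percentage changes above come from exponentiating the coefficients; they can be computed directly, together with 95% confidence intervals on the odds-ratio scale (a minimal sketch; confint uses profile likelihood for glm objects, so it can take a moment):

exp(coef(model))     # odds ratios
exp(confint(model))  # 95% confidence intervals for the odds ratios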
Practical No 05
R Code

x1 <- c(3,5,5,7,7,7,8,9,10,11)
x2 <- c(2.3,1.9,1,.7,.3,1,1.05,.45,.7,.3)
X <- matrix(c(x1,x2),ncol=2)
xbar <- apply(X,2,mean)   # sample mean vector
S <- cov(X)               # sample covariance matrix
d2 <- c()
for(i in 1:nrow(X)){
  d2[i] <- (X[i,]-xbar)%*%solve(S)%*%(X[i,]-xbar)   # squared Mahalanobis distance
}
d2
D2 <- sort(d2)            # ordered squared distances
D2
Q <- c()
for(i in 1:length(D2)){
  Q[i] <- qchisq((i-.5)/length(D2), df=ncol(X))     # chi-square quantiles
}

plot(Q,D2)

Output

a) Squared distances
d2
[1] 4.058682373 2.109580755 2.107431801 0.636114372 3.265479436
0.007903377 0.521861631 0.647933621 2.059080327 2.585932308

b) Ordered distances
D2
[1] 0.007903377 0.521861631 0.636114372 0.647933621 2.059080327
2.107431801 2.109580755 2.585932308 3.265479436 4.058682373
c)

Figure: Chi-square plot for checking bivariate normality


Comment:
Commenting on this plot is difficult because the sample size is very small. However, from the plot we can see that most of the points do not lie along the line with slope 1, so we are not convinced that the data came from a bivariate normal distribution.
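
Since the comment refers to the line with slope 1, it helps to draw that reference line on the chi-square plot (a small addition to the code above):

plot(Q, D2, xlab="Chi-square quantiles", ylab="Ordered squared distances")
abline(0, 1)   # reference line with intercept 0 and slope 1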
Practical No 06

Hypothesis

H0: Mean vectors (fever, Pressure and Aches) for treatment and placebo are equal

H1: Mean vectors (fever, Pressure and Aches) for treatment and placebo are unequal

Rcode

placebo <- read.table("e:/placebo.txt", header=T)
treat <- read.table("e:/treat.txt", header=T)
X1 <- apply(treat,2,mean)     # mean vector of the treatment group
X2 <- apply(placebo,2,mean)   # mean vector of the placebo group

n1 <- nrow(treat); n2 <- nrow(placebo); p <- 3   # group sizes (20 and 18) and number of responses
Sp <- ((n1-1)*cov(treat)+(n2-1)*cov(placebo))/(n1+n2-2)   # pooled covariance, weighted by (n-1)

T2 <- t(X1-X2)%*%solve(((1/n1)+(1/n2))*Sp)%*%(X1-X2)      # two-sample Hotelling's T2

Pval <- pf(((n1+n2-p-1)/(p*(n1+n2-2)))*T2, p, n1+n2-p-1, lower.tail=0)   # F transformation of T2

Pval

Output
2.175351e-08

Interpretation
As the p-value 2.175351e-08 is less than 0.05, the null hypothesis is rejected at the 5% level of significance; the mean vectors (fever, pressure and aches) differ between the treatment and placebo groups.
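
As a cross-check, the same two-sample test is available in the ICSNP package (a sketch, assuming ICSNP is installed; it reports the test statistic on the F scale together with the p-value):

library(ICSNP)
HotellingsT2(treat, placebo)   # two-sample Hotelling's T2 test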

Practical No 07

R code

mydata <- read.table("clipboard",header=T)


attach(mydata)
X <- cbind(BOD_S - BOD_P, SS_S - SS_P)   # paired differences: state lab minus private lab
s <- cov(X)
dbar <- apply(X,2,mean)   # or dbar <- colMeans(X)
n <- 11
p <- 2
T2 <- n*(dbar%*%solve(s)%*%dbar)   # paired (one-sample) Hotelling's T2
T2
Pval <- pf(((n-p)/(p*(n-1)))*T2, p, n-p, lower.tail=0)   # F transformation of T2
Pval

Output
Test statistic = 13.63931, P Value = 0.02082779

Comment:

Since the p-value (0.02082779) is less than .05, we have enough evidence to reject the null hypothesis. Thus, at the 5% level of significance, the mean measurements (BOD and/or SS) differ between the state lab and the private lab.
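
To see which component drives the rejection, univariate paired t-tests on each difference column can follow the multivariate test (a minimal sketch; Bonferroni-adjusting the two p-values keeps the overall level at 5%):

t.test(X[,1])   # H0: mean BOD difference (state minus private) is 0
t.test(X[,2])   # H0: mean SS difference (state minus private) is 0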

Practical No 08

Creating data:
y<-c(28,36,18,31,25,32,19,30,27,32,23,29)
A<-as.factor(rep(1:2,times=6))
B<-as.factor(rep(1:2,each=2,times=3))
data<-data.frame(A,B,y)

1. Checking assumptions and computing the ANOVA table


Assumptions:

1. Observations are drawn from a normal distribution.
2. Observations are independent.
3. Populations have equal variance.

out <- lm(y ~ A*B)   # A*B expands to A + B + A:B

#Checking normality assumption

For checking the normality assumption we can simply examine the Q-Q plot of the residuals of our model (out). The code is:

qqnorm(out$resid)
qqline(out$resid)

Figure 8.1: Normal Q-Q plot for checking the normality assumption

Comment:
The above figure presents the Q-Q plot. Since most of the points lie on the 45° straight line, we can say that the residuals approximately follow a normal distribution.
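
A formal complement to this visual check is the Shapiro–Wilk test on the residuals (a minimal sketch using base R):

shapiro.test(out$resid)   # H0: residuals are normally distributed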

Checking assumption of constant variance

plot(predict(out), out$resid^2)

Figure 8.2: Plot of predicted values against squared residuals for checking homoscedasticity

Comment
From the above plot we can say that there is homoscedasticity, because the points do not follow any systematic pattern.

#Checking independence assumption

time.order <- sample(1:12)   # random time order in which the observations were collected

time.order

[1] 10 5 7 11 3 12 6 1 2 9 8 4

plot(time.order, out$resid)
abline(h=0)

Figure 8.3: Checking independence of the observations
Comment:

From the above plot it is clear that there is no pattern, so we can conclude that the observations are independent.

From this analysis we conclude that all of the assumptions are fulfilled for the given data.

Analysis of variance
In the analysis of variance we are interested in testing the following hypotheses:

𝐻01 : The main effect of A is not significant.
𝐻𝑎1 : The main effect of A is significant.
𝐻02 : The main effect of B is not significant.
𝐻𝑎2 : The main effect of B is significant.
𝐻03 : The interaction effect of A & B is not significant.
𝐻𝑎3 : The interaction effect of A & B is significant.

First we examine the interaction plots to see whether any interaction is present. For this we use the following code.

par(mfrow=c(1,2))
interaction.plot(A,B,y)
interaction.plot(B,A,y)

Figure 8.4: Plots for studying the interaction effect

From the above plots it seems that there is no interaction effect between A & B.

Now we carry out an analysis of variance for the given data. For this purpose we use the following R code.

out <- lm(y ~ A*B)
anova(out)
We get the following analysis of variance table.
Analysis of Variance Table

Response: y

           Df  Sum Sq Mean Sq F value    Pr(>F)
A           1 208.333 208.333 53.1915 8.444e-05 ***
B           1  75.000  75.000 19.1489  0.002362 **
A:B         1   8.333   8.333  2.1277  0.182776
Residuals   8  31.333   3.917
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Comment:

Since for the main effect of A the p-value < .001, we reject the null hypothesis (𝐻01 ); i.e. the main effect of A is significant.

Since for the main effect of B the p-value < .01, we reject the null hypothesis (𝐻02 ); i.e. the main effect of B is significant.

Since for the interaction effect of A & B the p-value > .1, we cannot reject the null hypothesis (𝐻03 ); i.e. the interaction effect of A & B is insignificant.

Answer to question 2
Yes, there are two significant main effects, A and B, but there is no significant interaction effect.
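
Since the interaction is insignificant, one may refit the additive model and base the final conclusions on it (a sketch; dropping A:B returns its degree of freedom and sum of squares to the residuals):

out2 <- lm(y ~ A + B)
anova(out2)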

Problem 9
Assumptions:
1. The data are subject to random censoring.
2. The survival probability is constant within each interval [tj, tj+1).
3. The survival function is a non-increasing function.

R Code:
library(survival)
Time <- c(3,4,5,22,34, 2,3,4,7,11)
Indicator <- c(1,0,1,1,1, 1,1,1,1,0)   # 1 = event observed, 0 = censored
Group <- c(1,1,1,1,1, 2,2,2,2,2)       # 1 = drug used ("Yes"), 2 = drug not used ("No")
data <- data.frame(Time, Indicator, Group)
km.both.group <- survfit(Surv(Time,Indicator)~Group, data=data)   # Kaplan-Meier fit per group
summary(km.both.group)
Output

Call: survfit(formula = Surv(Time, Indicator) ~ Group, data = data)

Group=1
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    3      5       1    0.800   0.179       0.5161            1
    5      3       1    0.533   0.248       0.2142            1
   22      2       1    0.267   0.226       0.0507            1
   34      1       1    0.000     NaN           NA           NA

Group=2
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    2      5       1      0.8   0.179       0.5161            1
    3      4       1      0.6   0.219       0.2933            1
    4      3       1      0.4   0.219       0.1367            1
    7      2       1      0.2   0.179       0.0346            1

From the above tables we can see the Kaplan–Meier survival probabilities for both groups.
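
These probabilities are Kaplan–Meier products of conditional survival factors, S(t) = ∏ over event times tj ≤ t of (1 − dj/nj). For Group 1, for example: S(3) = 4/5 = 0.800 and S(5) = 0.800 × (2/3) = 0.533; the censored time 4 reduces the risk set from 4 to 3 but contributes no factor.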

Now, to sketch the survival curves for both groups in the same graph, we use the following code.

plot(km.both.group, xlab="Time", ylab="Survival probability", col=c("red","blue"), lwd=2)
legend(25, .9, c("Yes","No"), col=c("red","blue"), lwd=2)
title("Survival curves for both groups")

and we get the following figure.

Figure 9.1: Survival curves for both groups

From the graph we can see that the subjects who use the drug experience better survival than those who do not. The next step is to test whether the survival experience differs significantly between the two groups.
Test of equality
Here we want to test the following hypothesis

H0: The two survival functions are equal

H1: They are not equal

R Code for testing hypothesis

eq.test<-survdiff(Surv(Time,Indicator)~Group,data=data)

eq.test

Call:

survdiff(formula = Surv(Time, Indicator) ~ Group, data = data)

        N Observed Expected (O-E)^2/E (O-E)^2/V
Group=1 5        4     5.28     0.311      1.16
Group=2 5        4     2.72     0.605      1.16

Chisq= 1.2 on 1 degrees of freedom, p= 0.281

Interpretation:

The p-value (0.281) is greater than .05, so we do not have enough evidence to reject the null hypothesis. That is, we can say the two survival functions are equal.
Problem 10
R-code:

data<-read.table("drugs.txt",header=T)

library(survival)

Loading required package: splines

data

   Time Indicator Group
1     3         1     1
2     4         0     1
3     5         1     1
4    22         1     1
5    34         1     1
6     2         1     2
7     3         1     2
8     4         1     2
9     7         1     2
10   11         0     2

km.both.group<-survfit(Surv(Time,Indicator)~Group,data=data)

summary(km.both.group)

Call: survfit(formula = Surv(Time, Indicator) ~ Group, data = data)


Group=1 (yes)
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    3      5       1    0.800   0.179       0.5161            1
    5      3       1    0.533   0.248       0.2142            1
   22      2       1    0.267   0.226       0.0507            1
   34      1       1    0.000     NaN           NA           NA


Group=2 (no)
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
    2      5       1      0.8   0.179       0.5161            1
    3      4       1      0.6   0.219       0.2933            1
    4      3       1      0.4   0.219       0.1367            1
    7      2       1      0.2   0.179       0.0346            1

#Survival curves for two groups

plot(km.both.group, xlab="Time", ylab="Survival probability", col=c("red","blue"), lwd=2)
legend(25, .9, c("Yes","No"), col=c("red","blue"), lwd=2)
title("Survival curves for both groups")

Figure 10.1: Survival curves for two groups


Comment:
From Figure 10.1 it is clear that the survival probabilities for the "yes" group are greater than those for the "no" group at each time point. That is, the survival probabilities for the drug-used group are greater than those for the group that did not use the drug.

# Test for equality

Hypothesis
H0: Syes(t) = SNo(t)
H1: Syes(t) ≠ SNo(t)

where
Syes(t) = population survival function for the drug-used group,
SNo(t) = population survival function for the group that did not use the drug.

eq.test<-survdiff(Surv(Time,Indicator)~Group,data=data)

eq.test

Call:

survdiff(formula = Surv(Time, Indicator) ~ Group, data = data)

        N Observed Expected (O-E)^2/E (O-E)^2/V
Group=1 5        4     5.28     0.311      1.16
Group=2 5        4     2.72     0.605      1.16

Chisq= 1.2 on 1 degrees of freedom, p= 0.281


Comment:
As the p-value (0.281) is greater than 0.05, we cannot reject the null hypothesis at the 5% level of significance. Therefore we conclude that the survival functions for the two groups are the same.
