
UNIVERSITY OF LUCKNOW

DEPARTMENT OF STATISTICS
SESSION 2017-18

Name: Amit Kumar


Class: M.Sc. Statistics
Semester: 2nd
Roll no. : 180012910002

1
Index
S.No. Experiment Date Page No. Remark

1. Test the hypothesis regarding population means. 22/01/2018 1-5
2. Test the hypothesis that the mean head length and breadth of the 1st son and the 2nd son are equal at significance level 0.01. 22/01/2018 6-9
3. Test the hypothesis at 5% level of significance, and test the significance of the difference in the mean vectors of two groups of females. 31/01/2018 10-16
4. Test for the significance of the difference in the mean vectors of two species. Obtain Fisher's linear discriminant function and classify into one of the two groups. Test the adequacy of the assigned discriminant function. Are x1 and x2 alone sufficient for discrimination? 31/01/2018 17-20
5. 1. State the regression model. Estimate the parameters of the model. Test for the significance of the average hourly temperature. Construct a 95% CI for the regression coefficient. 2. Predict the amount of coal that should be ordered. What will be the change in the amount of coal to be ordered? 06/02/2018 21-25
6. Estimate the parameters. Write 90% C.I. and test for the significance of β0, β1, β2. 06/02/2018 26-29
7. Test the hypothesis that the samples come from two trivariate normal populations with the same mean. 14/02/2018 30-35
8. Check whether the centroids of three populations are identical. 14/02/2018 36-39
9. Compare two types of coating for resistance to corrosion. 26/02/2018 40-48
10. Test the hypothesis H0: μ1 = μ2. 26/02/2018 44-54
11. Test whether two grades of eggs differ with respect to their means. What is the distance between the two grades of eggs? 12/03/2018 55-59
12. Obtain the best linear function which would discriminate between two groups. How would you classify an individual into either of the two groups on the basis of observations? 14/03/2018 59-66
13. Find an appropriate linear model, establish the analysis of variance table, derive estimable functions, find their estimates and test hypotheses about them. 19/03/2018 67-69
14. Find least squares estimates of the β's in the model. Write out the analysis of variance table. Test to determine if the overall regression is suitable for the given data. Calculate Var(β1), Var(β2) and Cov(β1, β2). How useful is the regression using X1 alone? What does X2 contribute given that X1 is already in the equation? 19/03/2018 70-75

Date: 22-Jan-2018

Experiment no 1
A random sample of 50 observations was taken on X1, X2, X3. The observations yielded the following
information. The sample corrected sum of squares and cross-products matrix is
X1 X2 X3
X1 19.4 9.04 9.76
X2 9.04 11.87 4.62
X3 9.76 4.62 12.3
Means are: X̅1 = 5.936
X̅2 = 2.77
X̅3 = 4.26
If μ1, μ2, μ3 denote the population means of X1, X2, X3, then test the following hypotheses.
1 – H0: μ1 = 10.0, μ2 = 5.0, μ3 = 10.0
2 – H0: μ1 = μ2 = μ3

Report Page
Object:- To test the hypothesis.
1- Ho:μ1 = 10.0 , μ2 = 5.0 , μ3= 10.0
2- Ho:μ1 = μ2 = μ3
Procedure / Formula used:-
1. Calculate the matrix A by the formula:
A = given matrix − N x̄ x̄′
2. Now calculate the matrix S by the formula:
S = A / (N − 1)
3. Calculate the value of the T² statistic by the formula:
T² = N (x̄ − μ)′ S⁻¹ (x̄ − μ)
4. Obtain the tabulated value T²₀ by the formula:
T²₀ = [(N − 1)p / (N − p)] F(α; p, N − p)
where μ = population mean vector, x̄ = sample mean vector, N is the number of observations,
p is the number of variables and α is the level of significance.
5. Now compare the tabulated value with the calculated value of T²: the hypothesis is
accepted if T² < T²₀ and rejected if T² > T²₀.
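The steps above can be sketched numerically. A short Python/NumPy version follows (Python is used here only as an independent cross-check of the R work; reading the matrix in the statement as the corrected SSCP matrix A is an assumption, and the F quantile is hard-coded from tables):

```python
import numpy as np

# One-sample Hotelling T^2, following steps 1-5 above.
N, p = 50, 3
A = np.array([[19.40,  9.04,  9.76],       # read as the corrected SSCP
              [ 9.04, 11.87,  4.62],       # matrix A (an assumption)
              [ 9.76,  4.62, 12.30]])
x_bar = np.array([5.936, 2.770, 4.260])    # sample mean vector
mu0   = np.array([10.0, 5.0, 10.0])        # hypothesised mean vector

S = A / (N - 1)                            # sample covariance matrix
d = x_bar - mu0
T2 = N * d @ np.linalg.solve(S, d)         # N (x-mu)' S^-1 (x-mu)

F_crit = 2.8023                            # F(0.05; 3, 47), from tables
T2_crit = (N - 1) * p / (N - p) * F_crit   # [(N-1)p/(N-p)] F
print(T2, T2_crit, T2 > T2_crit)
```

The hypothesis is rejected when T² exceeds the scaled F critical value; since S here is positive definite, T² is necessarily non-negative.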

Result:-
1. T²cal = 2346.153 and T²tab = 0.9738489 → calculated > tabulated.
Thus the null hypothesis H0: μ1 = 10.0, μ2 = 5.0, μ3 = 10.0 is rejected, as T²cal > T²tab.

2. T²cal = −50.29351 and T²tab = 0.9738489 → calculated < tabulated.
Thus the null hypothesis H0: μ1 = μ2 = μ3 is accepted, as T²cal < T²tab. (Note that T² is a
quadratic form in S⁻¹ and cannot be negative when S is computed correctly; a negative value
signals an error in the computation of S.)

Calculation
#1
> N<-50
> p<-3
> x<-c(19.4,9.04,9.76,9.04,11.87,4.62,9.76,4.62,12.3)
> A1<-matrix(data=x,nrow=3,ncol=3,byrow=T)
> colnames(A1)<-c("x1","x2","x3")
> rownames(A1)<-c("x1","x2","x3")
> A1
x1 x2 x3
x1 19.40 9.04 9.76
x2 9.04 11.87 4.62
x3 9.76 4.62 12.30
> x.bar<-c(5.936,2.77,4.26) #mean vector
> x.bar
[1] 5.936 2.770 4.260
> mx<-N*(x.bar%*%t(x.bar))
> mx
[,1] [,2] [,3]
[1,] 1761.805 822.136 1264.368
[2,] 822.136 383.645 590.010
[3,] 1264.368 590.010 907.380
> det(A1)
[1] 1097.704
> A<-A1-mx
> S<-A/N-1   # NB: operator precedence makes this (A/N)-1; the intended S is A/(N-1)
> S
x1 x2 x3
x1 -35.84810 -17.26192 -26.09216
x2 -17.26192 -8.43550 -12.70780
x3 -26.09216 -12.70780 -18.90160
> mu<-c(10,5,10)
> S.inv<-solve(S) #to calculate inverse
> Tsq<-N*((t(x.bar)-mu)%*%S.inv%*%(x.bar-mu)) #test statistic under H0
> Tsq
[,1]
[1,] 1128.303
> F<-2.8023  #F(0.05; p, N-p)
> T.tab<-((p*(N-1))/(N-p))*F  #tabulated value includes the F quantile
> T.tab
[1] 8.76464

Second-

> #2
> z<-c(1,-1,0,1,0,-1)
> C1<-matrix(data=z,nrow=2,ncol=3,byrow=T)
> C1
[,1] [,2] [,3]
[1,] 1 -1 0
[2,] 1 0 -1
> Y.bar<-C1%*%x.bar
> Y.bar
[,1]
[1,] 3.166
[2,] 1.676
> S1<-(C1%*%A%*%t(C1))/(N-1)
> S1
[,1] [,2]
[1,] -9.958935 -5.307976
[2,] -5.307976 -2.617731

> S1.inv<-solve(S1)
> S1.inv
[,1] [,2]
[1,] 1.243698 -2.521848
[2,] -2.521848 4.731544
> T.sq<-N*(t(Y.bar)%*%S1.inv%*%Y.bar)
> T.sq
[,1]
[1,] -50.29351

Date: 22-Jan-2018

Experiment No. 2

Four characteristics : head length of 1st son (X1) , head breadth of 1st son (X2), head length of
2nd son (X3) , head breadth of 2nd son (X4) were measured in a sample of 25 families & the
following were obtained .

X̄ = (18.57, 15.11, 18.38, 14.92)

S =
0.9529 0.5287 0.6966 0.4611
0.5287 0.5436 0.5131 0.3503
0.6966 0.5131 1.0081 0.5654
0.4611 0.3505 0.5654 0.4502

Test the hypothesis that the mean head length & breadth of the 1st son and the 2nd son are
equal at significance level 0.01. State the assumptions, if any.

Report Page
Object: To test the hypothesis that the mean head length and breadth of the 1st son and
the 2nd son are equal at significance level 0.01.
Procedure and formula used:

1. Let C be any (p − 1) × p matrix of rank (p − 1).
2. Calculate ȳ = CX̄, where X̄ is given and y ~ (Cμ, CΣC′).
3. Obtain A = (N − 1)S.
4. The test statistic is
T²cal = N ȳ′ S_y⁻¹ ȳ, where S_y = CAC′ / (N − 1).
5. The tabulated value T²₀ is obtained from
[(N − p + 1) / ((N − 1)(p − 1))] T²₀ = F(α; p − 1, N − p + 1).
6. The hypothesis is accepted if T²cal < T²tab and rejected otherwise.
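The contrast computation can be sketched as follows in Python/NumPy, here with a 2-row contrast matrix that tests "the 1st-son means equal the 2nd-son means" directly, i.e. μ1 = μ3 and μ2 = μ4. This choice of C is an assumption about the hypothesis (the R transcript instead uses a 3-row C, which additionally contrasts lengths against breadths and so tests equality of all four means); S is symmetrised using 0.3503 where the statement prints both 0.3503 and 0.3505:

```python
import numpy as np

N, p = 25, 4
x_bar = np.array([18.57, 15.11, 18.38, 14.92])
# Symmetrised S (the statement prints 0.3503 and 0.3505 for the same entry)
S = np.array([[0.9529, 0.5287, 0.6966, 0.4611],
              [0.5287, 0.5436, 0.5131, 0.3503],
              [0.6966, 0.5131, 1.0081, 0.5654],
              [0.4611, 0.3503, 0.5654, 0.4502]])

# Contrasts mu1 - mu3 and mu2 - mu4 (assumed reading of the hypothesis)
C = np.array([[1.0, 0.0, -1.0, 0.0],
              [0.0, 1.0, 0.0, -1.0]])
y_bar = C @ x_bar                  # contrast means: (0.19, 0.19)
Sy = C @ S @ C.T                   # covariance matrix of the contrasts
T2 = N * y_bar @ np.linalg.solve(Sy, y_bar)
print(T2)
```

With this C the statistic is only about 3.69, well below typical 1% critical values; the much larger value in the report comes from its 3-row C, which compares lengths with breadths as well.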

Result:
T2cal= 1373.355 and T2tab= 17.54657
→ T2ca > T2tab
Thus, the Hypothesis that the mean head length and breadth of 1st son and 2nd son are
equal will be rejected.

Calculation
> N<-25
> p<-4
> X<-matrix(data=c(0.9529,0.5287,0.6966,0.4611,0.5287,0.5436,0.5131,0.3503,0.6966,0.5131,1.0081,0.5654,0.4611,0.3505,0.5654,0.4502),nrow=4)
> X
[,1] [,2] [,3] [,4]
[1,] 0.9529 0.5287 0.6966 0.4611
[2,] 0.5287 0.5436 0.5131 0.3505
[3,] 0.6966 0.5131 1.0081 0.5654
[4,] 0.4611 0.3503 0.5654 0.4502
> Xbar<-matrix(data=c(18.57,15.11,18.38,14.92),nrow=4)
> Xbar
[,1]
[1,] 18.57
[2,] 15.11
[3,] 18.38
[4,] 14.92
> c<-matrix(data=c(1,-1,0,0,0,1,-1,0,0,0,1,-1),nrow=3)
> c
[,1] [,2] [,3] [,4]
[1,] 1 0 -1 0
[2,] -1 0 0 1
[3,] 0 1 0 -1
> ybar<-c%*%Xbar
> ybar
[,1]
[1,] 0.19
[2,] -3.65
[3,] 0.19
> A<-(N-1)*X
> A
[,1] [,2] [,3] [,4]
[1,] 22.8696 12.6888 16.7184 11.0664
[2,] 12.6888 13.0464 12.3144 8.4120
[3,] 16.7184 12.3144 24.1944 13.5696
[4,] 11.0664 8.4072 13.5696 10.8048
> S<-(c%*%A%*%(t(c))/(N-1)) # sample covariance
> S
[,1] [,2] [,3]
[1,] 0.5678 -0.3606 0.1199
[2,] -0.3606 0.4809 -0.1675
[3,] 0.1199 -0.1673 0.2930
> Sinv<-solve(S)
> Sinv
[,1] [,2] [,3]
[1,] 3.3639280 2.550834 0.08167142
[2,] 2.5501194 4.529386 1.54577749
[3,] 0.0795222 1.542393 4.26217123
> T2<- N*t(ybar)%*%Sinv%*%ybar # calculating t- sq statistic
> T2
[,1]
[1,] 1373.618
> F<-4.938193382
> Ttab<-((N-1)*(p-1)*F/(N-p-1))
> Ttab

[1] 17.7775

>

Date- 24- Jan-2018

Experiment 3

a) A process engineer of ABC Company is interested in understanding the features of the process
which produces product XYZ. She observed the following four characteristics of the finished
products coming out of the process at one of their production plants:

Units (X1) (X2) (X3) (X4)


1 23 276 289.6 51
2 22 281 289 51.7
3 22.8 270 288.2 51.3
4 22.1 278 288 52.3
5 22.5 275 288 53
6 22.2 273 288 51
7 22 275 290 53
8 22.1 268 289 54
9 22.5 277 289 52
10 22.5 278 289 52
11 22.3 269 287 54
12 21.8 274 287.6 52
13 22.3 270 288.4 51
14 22.2 273 290.2 51
15 22.1 274 286 51
16 22.1 277 287 52
17 21.8 277 287 51
18 22.6 276 290 51
19 22.3 278 287 51.7
20 23 266 289.1 51

Test whether the μ = (20, 275, 300, 50) at 5% level of significance.

b) The following table gives the estimates of the means and the common dispersion matrix of
three characters, x1 = way of living, x2 = way of eating, x3 = way of talking, of two groups
of females, one living in villages and the other in cities.

Character  Mean (villages)  Mean (cities)
x1         72.20            76.32
x2         30.56            30.28
x3         21.44            21.64

The dispersion matrix is

S =
18.9400  2.2488  5.8740
 2.2488  0.5652  0.8700
 5.8740  0.8700  2.2848

n1 = 22 (villages), n2 = 62 (cities)

Test for the significance of the difference in the mean vectors of the two groups of females.

Report Page

Object:

(a) To test the hypothesis that μ= (20, 275, 300, 50) at 5% level of significance.

(b) To test the significance of the difference in mean vector of two groups of females.

Formula Used/Procedure:
a) μ = (20, 275, 300, 50) at 5% level of significance.

• Form the matrix M of raw sums of squares and products:

M =
[ Σx1²   Σx1x2  …  Σx1xp ]
[ Σx1x2  Σx2²   …  Σx2xp ]
[   ⋮      ⋮          ⋮   ]
[ Σx1xp  Σx2xp  …  Σxp²  ]

• Calculate the matrix A by the formula:
A = M − N x̄ x̄′
• Calculate the matrix S by the formula:
S = A / (N − 1)
• Calculate the inverse of the matrix S.
• The value of the T² statistic is given by:
T² = N (x̄ − μ)′ S⁻¹ (x̄ − μ)
• The tabulated value of T² is:
T²₀ = [(N − 1)p / (N − p)] F(α; p, N − p)
• Compare the tabulated value with the calculated value of T²:
— if calculated > tabulated, reject the null hypothesis;
— if calculated < tabulated, accept the null hypothesis.
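Part (a) can be cross-checked in Python/NumPy directly from the raw table; note that the serial-number ("Units") column must be excluded before forming the SSCP matrix:

```python
import numpy as np

# The 20 rows of (X1, X2, X3, X4); the Units column is deliberately
# omitted, since including it would corrupt the SSCP matrix.
X = np.array([
    [23.0, 276, 289.6, 51.0], [22.0, 281, 289.0, 51.7],
    [22.8, 270, 288.2, 51.3], [22.1, 278, 288.0, 52.3],
    [22.5, 275, 288.0, 53.0], [22.2, 273, 288.0, 51.0],
    [22.0, 275, 290.0, 53.0], [22.1, 268, 289.0, 54.0],
    [22.5, 277, 289.0, 52.0], [22.5, 278, 289.0, 52.0],
    [22.3, 269, 287.0, 54.0], [21.8, 274, 287.6, 52.0],
    [22.3, 270, 288.4, 51.0], [22.2, 273, 290.2, 51.0],
    [22.1, 274, 286.0, 51.0], [22.1, 277, 287.0, 52.0],
    [21.8, 277, 287.0, 51.0], [22.6, 276, 290.0, 51.0],
    [22.3, 278, 287.0, 51.7], [23.0, 266, 289.1, 51.0]])
N, p = X.shape
mu0 = np.array([20.0, 275.0, 300.0, 50.0])

x_bar = X.mean(axis=0)                 # (22.31, 274.25, 288.355, 51.85)
S = np.cov(X, rowvar=False)            # equals A/(N-1)
d = x_bar - mu0
T2 = N * d @ np.linalg.solve(S, d)
print(x_bar, T2)
```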

b) The difference in the mean vectors of two groups of females.

• First find the inverse of the pooled dispersion matrix S.
• Compute the T² statistic by the formula:
T² = [n1n2 / (n1 + n2)] (X̄1 − X̄2)′ S⁻¹ (X̄1 − X̄2)
• The tabulated value of T² is:
T²₀ = [(n1 + n2 − 2)p / (n1 + n2 − p − 1)] F(α; p, n1 + n2 − p − 1)
• Compare the tabulated value with the calculated value of T²:
— if calculated > tabulated, reject the null hypothesis;
— if calculated < tabulated, accept the null hypothesis.
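A minimal Python/NumPy sketch of part (b). The (3,3) entry of S follows the R transcript's 2.848 (the problem statement prints 2.2848; the discrepancy cannot be resolved from the document), and the F quantile is hard-coded from tables:

```python
import numpy as np

n1, n2, p = 22, 62, 3
village = np.array([72.20, 30.56, 21.44])
city    = np.array([76.32, 30.28, 21.64])
# Pooled dispersion matrix; 2.848 in the (3,3) cell follows the R
# transcript (the statement prints 2.2848 -- an unresolved discrepancy)
S = np.array([[18.9400, 2.2488, 5.8740],
              [ 2.2488, 0.5652, 0.8700],
              [ 5.8740, 0.8700, 2.8480]])

d = village - city
T2 = (n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(S, d)

F_crit = 2.718                                   # F(0.05; 3, 80), tables
T2_crit = (n1 + n2 - 2) * p / (n1 + n2 - p - 1) * F_crit
print(T2, T2_crit)
```

The statistic (about 53.8) comfortably exceeds the critical value (about 8.4), so the difference in the two mean vectors is significant.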

Calculation
> N<-20
> p<-4
> x<-read.table("D:/amitr.csv",sep = ",",header = T)
> m<-as.matrix(x)
> m
Units X.X1. X.X2. X.X3. X.X4.
[1,] 1 23.0 276 289.6 51.0
[2,] 2 22.0 281 289.0 51.7
[3,] 3 22.8 270 288.2 51.3
[4,] 4 22.1 278 288.0 52.3
[5,] 5 22.5 275 288.0 53.0
[6,] 6 22.2 273 288.0 51.0
[7,] 7 22.0 275 290.0 53.0
[8,] 8 22.1 268 289.0 54.0
[9,] 9 22.5 277 289.0 52.0
[10,] 10 22.5 278 289.0 52.0
[11,] 11 22.3 269 287.0 54.0
[12,] 12 21.8 274 287.6 52.0
[13,] 13 22.3 270 288.4 51.0
[14,] 14 22.2 273 290.2 51.0
[15,] 15 22.1 274 286.0 51.0
[16,] 16 22.1 277 287.0 52.0
[17,] 17 21.8 277 287.0 51.0
[18,] 18 22.6 276 290.0 51.0
[19,] 19 22.3 278 287.0 51.7
[20,] 20 23.0 266 289.1 51.0
> x1bar<-sum(m[,2])/20  # column 1 of m is "Units", so the X's are columns 2-5
> x1bar
[1] 22.31
> x2bar<-sum(m[,3])/20
> x2bar
[1] 274.25
> x3bar<-sum(m[,4])/20
> x3bar
[1] 288.355
> x4bar<-sum(m[,5])/20
> x4bar
[1] 51.85
> a11<-sum(m[,1]*m[,1])  # caution: the sums below use columns 1-4 of m, which include the Units column; they should use m[,2]..m[,5]
> a11
[1] 2870
> a12<-sum(m[,1]*m[,2])
> a12
[1] 4681.6
> a13<-sum(m[,1]*m[,3])
> a13
[1] 57515
> a14<-sum(m[,1]*m[,4])
> a14
[1] 60521.4
> a22<-sum(m[,2]*m[,2])
> a22
[1] 9957.02
> a23<-sum(m[,2]*m[,3])
> a23
[1] 122362.4
> a24<-sum(m[,2]*m[,4])
> a24
[1] 128666.9
> a33<-sum(m[,3]*m[,3])

> a33
[1] 1504553
> a34<-sum(m[,3]*m[,4])
> a34
[1] 1581624
> a44<-sum(m[,4]*m[,4])
> a44
[1] 1662999
> m2<-matrix(data = c(a11,a12,a13,a14,a12,a22,a23,a24,a13,a23,a33,a34,a14,a24,a34,a44),nrow = 4,ncol = 4,byrow = T)
> m2
[,1] [,2] [,3] [,4]
[1,] 2870.0 4681.60 57515.0 60521.4
[2,] 4681.6 9957.02 122362.4 128666.9
[3,] 57515.0 122362.40 1504553.0 1581624.2
[4,] 60521.4 128666.90 1581624.2 1662998.6
> xbar<-matrix(data = c(x1bar,x2bar,x3bar,x4bar), nrow=4, byrow=F)
> xbar
[,1]
[1,] 22.310
[2,] 274.250
[3,] 288.355
[4,] 51.850
> A<-m2-(N*(xbar%*%t(xbar)))
> A
[,1] [,2] [,3] [,4]
[1,] -7084.722 -117688.8 -71149.0 37385.93
[2,] -117688.750 -1494304.2 -1459264.8 -155730.35
[3,] -71149.001 -1459264.8 -158419.1 1282600.06
[4,] 37385.930 -155730.3 1282600.1 1609230.12
> s<-A/(N-1)
> s
[,1] [,2] [,3] [,4]
[1,] -372.8801 -6194.145 -3744.684 1967.681
[2,] -6194.1447 -78647.591 -76803.409 -8196.334
[3,] -3744.6843 -76803.409 -8337.848 67505.267
[4,] 1967.6805 -8196.334 67505.267 84696.322
> mu<-matrix(data = c(20,275,300,50), nrow = 4,byrow = F)
> mu
[,1]
[1,] 20
[2,] 275
[3,] 300
[4,] 50
> sinverse<-solve(s)
> sinverse
[,1] [,2] [,3] [,4]
[1,] 0.028698505 -0.005422065 0.003677707 -0.004122672
[2,] -0.005422065 0.044996654 -0.050419536 0.044666175
[3,] 0.003677707 -0.050419536 0.056684887 -0.050144094
[4,] -0.004122672 0.044666175 -0.050144094 0.044396277
> Tsq<-N*(t(xbar-mu)%*%sinverse%*%(xbar-mu))
> Tsq
[,1]
[1,] 179.1756
> F<-3.0069
> Ttab<-(((N-1)*p)/(N-p))*F
> Ttab
[1] 14.28277

>

Second Part –

>villagem<-matrix(data = c(72.20,30.56,21.44),nrow = 3,byrow = F)
>villagem
[,1]
[1,] 72.20
[2,] 30.56
[3,] 21.44
>citym<-matrix(data = c(76.32,30.28,21.64),nrow = 3,byrow = F)
>citym
[,1]
[1,] 76.32
[2,] 30.28
[3,] 21.64
>s<-matrix(data = c(18.9400,2.2488,5.8740,2.2488,0.5652,0.8700,5.8740,0.8700,2.848),byrow = T,nrow = 3)
>s
[,1] [,2] [,3]
[1,] 18.9400 2.2488 5.874
[2,] 2.2488 0.5652 0.870
[3,] 5.8740 0.8700 2.848

> Sinverse<-solve(s)
> Sinverse
[,1] [,2] [,3]
[1,] 0.1629929 -0.2473598 -0.2606101
[2,] -0.2473598 3.7150238 -0.6246767
[3,] -0.2606101 -0.6246767 1.0794566
> n1<-22
> n2<-62
> p<-3
> Tsquare<-(n1*n2*(t(villagem-citym))%*%Sinverse%*%(villagem-citym))/(n1+n2)
> Tsquare
[,1]
[1,] 53.78594
> F<-2.718
> Tsqtab<-((p*(n1+n2-2))/(n1+n2-p-1))*F
> Tsqtab
[1] 8.35785

>

Date: 31/01/2018

Experiment-4

We have three measurements on two species of flea beetles, namely

x1 = length of the elytra (in 0.01 mm)

x2 = length of the second antennal joint (in microns)

x3 = length of the third antennal joint (in microns)

Population I: Haltica oleracea, n1 = 19,
x̄1 = (267.05, 137.37, 185.95)′

Population II: Haltica carduorum, n2 = 20,
x̄2 = (290.80, 157.20, 209.25)′

The unbiased estimate of the common covariance matrix is

S =
367.79 121.88 106.24
121.88 118.31  42.06
106.24  42.06 208.07

(i) Test for the significance of difference in the mean vectors of two species.
(ii) Obtain Fisher’s linear discriminant function and classify the observation vector (270.51, 150.12,
190.74) into one of the two groups or populations.
(iii) Test the adequacy of the assigned discriminant function 3x1 – 2x2 + x3 at level α = 0.05.
(iv) Are x1 and x2 alone sufficient for discrimination?

Report Page

Object: 1)Test for the significance of difference in the mean vectors of two species.

2) Obtain Fisher’s linear discriminant function and classify the observation vector (270.51,
150.12, 190.74) into one of the two groups or populations.

Procedure and Formula Used:

1. The difference in the mean vectors of two species:

a. First find the inverse of the pooled dispersion matrix S.
b. Compute the T² statistic by the formula:
T² = [n1n2 / (n1 + n2)] (X̄1 − X̄2)′ S⁻¹ (X̄1 − X̄2)
c. The tabulated value of T² is:
T²₀ = [(n1 + n2 − 2)p / (n1 + n2 − p − 1)] F(α; p, n1 + n2 − p − 1)
d. Compare the calculated value with the tabulated value:
• If calculated > tabulated, reject the null hypothesis.
• If calculated < tabulated, accept the null hypothesis.

2. Classification of an observation vector:

a. First find the inverse of the common covariance matrix S.
b. Compute the linear discriminant score of the new observation x0:
b = (X̄1 − X̄2)′ S⁻¹ x0 − ½ (X̄1 − X̄2)′ S⁻¹ (X̄1 + X̄2)
c. Classify by the sign of b:
• If b > 0, classify the observation into group 1.
• If b < 0, classify the observation into group 2.
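Both parts can be cross-checked in a few lines of Python/NumPy. The observation vector uses the statement's 190.74 as the third coordinate (the R transcript types 190.70; the classification is the same either way):

```python
import numpy as np

n1, n2 = 19, 20
x1_bar = np.array([267.05, 137.37, 185.95])   # Haltica oleracea
x2_bar = np.array([290.80, 157.20, 209.25])   # Haltica carduorum
S = np.array([[367.79, 121.88, 106.24],
              [121.88, 118.31,  42.06],
              [106.24,  42.06, 208.07]])      # common covariance estimate

d = x1_bar - x2_bar
T2 = (n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(S, d)

# Fisher discriminant score of the new observation
x0 = np.array([270.51, 150.12, 190.74])
a = np.linalg.solve(S, d)                     # coefficients S^-1 (x1-x2)
b = a @ x0 - 0.5 * a @ (x1_bar + x2_bar)      # classify to group 1 if b > 0
print(T2, b)
```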

Result :

1. H0: there is no significant difference in the mean vectors of the two species, i.e., the
two mean vectors are equal.
Since T²cal = 45.8794 > T²tab = 9.115003, we reject the null hypothesis.
Therefore there is a significant difference in the mean vectors of the two species, i.e., the
two mean vectors are not equal.

2. Value of the linear discriminant function = 0.1123424 > 0.

Hence we classify the observation vector into group 1.

Calculations

1.)
>n1<-19
>n2<-20
>p<-3
>X1_bar<-matrix(data=c(267.05,137.37,185.95),nrow=3)
>X1_bar
[,1]
[1,] 267.05
[2,] 137.37
[3,] 185.95
>X2_bar<-matrix(data=c(290.80,157.20,209.25),nrow=3)
>X2_bar
[,1]
[1,] 290.80
[2,] 157.20
[3,] 209.25
>S<-matrix(data=c(367.79,121.88,106.24,121.88,118.31,42.06,106.24,42.06,208.07),nrow=3) #Sample covariance
>S
[,1] [,2] [,3]
[1,] 367.79 121.88 106.24
[2,] 121.88 118.31 42.06
[3,] 106.24 42.06 208.07
>S_inv<-solve(S)
>S_inv
[,1] [,2] [,3]
[1,] 0.004509824 -0.0041236181 -0.0014691418
[2,] -0.004123618 0.0128773030 -0.0004975546
[3,] -0.001469142 -0.0004975546 0.0056567923
>T_sq<-n1*n2*(t(X1_bar-X2_bar)%*%S_inv%*%(X1_bar-X2_bar))/(n1+n2) #Hotelling T-sq
>T_sq
[,1]
[1,] 45.8794
>F=2.874187 #5% CI, p=3,(n1+n2-p-1)=35 d.f.
>T_tab<-(p*(n1+n2-2))/((n1+n2-p-1))*F
>T_tab
[1] 9.115003

2.)
>x1mean<-matrix(data =c(267.05,137.37,185.95),byrow = FALSE,nrow=3 )

>x1mean
[,1]
[1,] 267.05
[2,] 137.37
[3,] 185.95
>x2mean<-matrix(data=c(290.80,157.20,209.25),byrow=FALSE,nrow = 3)
>x2mean
[,1]
[1,] 290.80
[2,] 157.20
[3,] 209.25
>xo=matrix(data = c(270.51,150.12,190.70),byrow = FALSE,nrow = 3)
>s<-matrix(data=c(367.79,121.88,106.24,121.88,118.31,42.06,106.24,42.06,208.07),byrow =
TRUE,nrow = 3)
>s
[,1] [,2] [,3]
[1,] 367.79 121.88 106.24
[2,] 121.88 118.31 42.06
[3,] 106.24 42.06 208.07
>sinverse=solve(s)
>sinverse
[,1] [,2] [,3]
[1,] 0.004509824 -0.0041236181 -0.0014691418
[2,] -0.004123618 0.0128773030 -0.0004975546
[3,] -0.001469142 -0.0004975546 0.0056567923
>b=((t(x1mean-x2mean))%*%sinverse%*%xo)-((1/2)*(t(x1mean-x2mean))%*%sinverse%*%(x1mean+x2mean))
>b
[,1]
[1,] 0.1123424

Date:06- Feb-2018

Experiment 5

ABC Company owns a large nine-building complex in Central America and heats this complex by
using a modern coal-fuelled heating system. The company is facing problems in determining the
proper amount of coal to be ordered each week to heat the complex adequately for the next week.
You are approached by the company to develop a regression model to predict the amount of coal
(in tons) that should be ordered each week to heat the complex adequately for the next week,
on the basis of the following data;

Week Average Hourly Temperature Amount of Coal Ordered


( degree Fahrenheit) ( tons)
1 28 12.4
2 28 11.7
3 32.5 12.4
4 39 10.8
5 45.9 9.4
6 57.8 9.5
7 58.1 8
8 62.5 7.5

a) State the regression model along with its assumptions.

b) Estimate the parameters of the model and write down the fitted model.

c) Test for the significance of the average hourly temperature in predicting the amount of coal
ordered at 5% level of significance.

d) Construct 95% CI for regression coefficient.

e) Predict the amount of coal that should be ordered when value of average hourly temperature is 50
degree Fahrenheit.

f) What will be the change in the amount of coal to be ordered when average hourly temperature
increase by one degree Fahrenheit?

Report page

Object: From the given data:

1. State the regression model along with its assumptions.
2. Estimate the parameters of the model and write down the fitted model.
3. Test for the significance of the average hourly temperature in predicting the amount of
coal ordered at 5% level of significance.
4. Construct a 95% CI for the regression coefficient.
5. Predict the amount of coal that should be ordered when the value of the average hourly
temperature is 50 degrees Fahrenheit.
6. What will be the change in the amount of coal to be ordered when the average hourly
temperature increases by one degree Fahrenheit?

Procedure/ Formula Used:


a. Here the regression model can be written as:
Y = β0 + β1X1 + ε
where ε is the random error.
b. Fit the linear model using the R function lm( ) and obtain the coefficient estimates,
standard errors, t values and p-values with summary( ).
c. To obtain the confidence interval use:
Upper limit = estimate + t(α/2, n − 2) × Std. Error
Lower limit = estimate − t(α/2, n − 2) × Std. Error
where t(α/2, n − 2) is the critical t value (not the regression t statistic).
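Steps (b) and (c) can be sketched in Python/NumPy using the normal equations directly; t(0.025, 6) = 2.447 is taken from tables. Note that the confidence interval uses this critical t value, not the regression t statistic printed by summary():

```python
import numpy as np

x = np.array([28.0, 28.0, 32.5, 39.0, 45.9, 57.8, 58.1, 62.5])  # temp (F)
y = np.array([12.4, 11.7, 12.4, 10.8, 9.4, 9.5, 8.0, 7.5])      # coal (tons)
n = len(x)

Sxx = np.sum((x - x.mean()) ** 2)
Sxy = np.sum((x - x.mean()) * (y - y.mean()))
b1 = Sxy / Sxx                       # slope estimate
b0 = y.mean() - b1 * x.mean()        # intercept estimate

resid = y - (b0 + b1 * x)
s2 = np.sum(resid ** 2) / (n - 2)    # residual variance on n-2 = 6 d.f.
se_b1 = np.sqrt(s2 / Sxx)            # standard error of the slope

t_crit = 2.447                       # t(0.025; 6), from tables
ci_b1 = (b1 - t_crit * se_b1, b1 + t_crit * se_b1)   # 95% CI for beta1
pred_50 = b0 + b1 * 50.0             # predicted order at 50 deg F
print(b0, b1, ci_b1, pred_50)
```

The interval for the slope lies entirely below zero, which matches the significant negative temperature effect reported by the R fit.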

Result:
1. The regression model can be written as :
Y=β0+X1β1+ε
The assumptions of the regression model are as follows –
i. The regression model is linear in parameters.
ii. The mean of residuals is zero.
iii. Homoscedasticity of residuals or equal variance.
iv. No auto correlation of residuals.
v. The X variables and residuals are uncorrelated.
vi. The number of observations must be greater than the number of Xs.
vii. The variability in X values is positive.
viii. The regression model is correctly specified.
ix. No perfect multicollinearity.
x. Normality of residuals.
2. The fitted model is as follows:
Y = 15.83786 − 0.12792 X
i.e., the estimates of the parameters are intercept β0 = 15.83786 and slope β1 = −0.12792.
3. Here the amount of coal ordered (Y) is the dependent variable and average hourly
temperature (X) is the predictor variable. The regression model is Y = β0 + β1X + ε.
p-value = 0.0003301; as the p-value is much less than 0.05, we reject the null hypothesis
that β1 = 0. Hence average hourly temperature is a significant predictor of the amount of
coal ordered.
4. 95% confidence intervals, using the critical value t(0.025, 6) = 2.447:
β1: −0.12792 ± 2.447 × 0.01746 → (−0.1706, −0.0852)
β0: 15.83786 ± 2.447 × 0.80177 → (13.876, 17.800)
5. The amount of coal that should be ordered when the average hourly temperature is 50
degrees Fahrenheit = 9.44186 tons.
6. When the average hourly temperature increases by one degree Fahrenheit, the amount of
coal to be ordered changes by the slope β1, i.e., it decreases by about 0.128 tons.

CALCULATION
> x<-read.table("C:/Users/dell/Desktop/D.csv",sep = ",", header = FALSE)
> m<-as.matrix(x)
> m
V1 V2
[1,] 28.0 12.4
[2,] 28.0 11.7
[3,] 32.5 12.4
[4,] 39.0 10.8
[5,] 45.9 9.4
[6,] 57.8 9.5
[7,] 58.1 8.0
[8,] 62.5 7.5
> reg<-lm(m[,2]~m[,1])
> reg

Call:
lm(formula = m[, 2] ~ m[, 1])

Coefficients:
(Intercept) m[, 1]
15.8379 -0.1279
> summary(reg)

Call:
lm(formula = m[, 2] ~ m[, 1])

Residuals:
Min 1Q Median 3Q Max
-0.5663 -0.4432 -0.1958 0.2879 1.0560

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 15.83786    0.80177  19.754 1.09e-06 ***
m[, 1]      -0.12792    0.01746  -7.328  0.00033 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.6542 on 6 degrees of freedom
Multiple R-squared:  0.8995,	Adjusted R-squared:  0.8827
F-statistic: 53.69 on 1 and 6 DF,  p-value: 0.0003301
> y<-(-0.12792*50)+15.83786 #Predicted value of
amount of coal at 50 degree F
> y
[1] 9.44186
> tcrit<-qt(0.975,6)  # critical t for a 95% CI, 6 d.f.
> tcrit
[1] 2.446912
> cibeeta1u<-(-0.12792)+(tcrit*0.01746)
> cibeeta1u
[1] -0.0851969
> cibeeta1l<-(-0.12792)-(tcrit*0.01746)
> cibeeta1l
[1] -0.1706431
> cibeeta0u<-(15.83786)+(tcrit*0.80177)
> cibeeta0u
[1] 17.79972
> cibeeta0l<-(15.83786)-(tcrit*0.80177)
> cibeeta0l
[1] 13.876

>

Date:- 06/02/2018

Experiment 6

Wildcats are wells drilled to find and produce oil and/or gas in an unproved area, to find a new
reservoir in a field previously found to be productive of oil or gas, or to extend the limits of a
known oil or gas reservoir. The table gives data related to wildcat activity in Iran.

SR.No.  Thousands of wildcats (Y)  Per-barrel price in $ (X1)  GNP in $ hundred billion (X2)
1       8       4.8     4.8
2       9       4.8     4.9
3       10      4.6     5.3
4       12      4.4     5.7
5       13      4.3     5.9
6       13      4.5     6.2
7       13      4.6     6.1
8       14      4.5     6.5
9       16      4.4     6.6
10      14      4.7     6.8

Consider the model

Y = β0 + β1X1 + β2X2 + ε

1. Estimate the parameters of the model.
2. Write the 90% C.I. for β0, β1, β2.
3. Test for the significance of β0, β1, β2.

Report Page

Object: Our objective is to find-


1. Estimate the parameters of the model.
2. Write 90% C.I. for β0, β1, β2.
3. Test for the significance of β0, β1, β2.

Procedure/ Formula Used:


a. Here the regression model can be written as:
Y = β0 + β1X1 + β2X2 + ε
where ε is the random error.
b. Fit the linear model using the R function lm( ) and obtain the coefficient estimates,
standard errors, t values and p-values with summary( ).
c. To obtain the confidence intervals use:
Upper limit = estimate + t(α/2, n − k) × Std. Error
Lower limit = estimate − t(α/2, n − k) × Std. Error
where t(α/2, n − k) is the critical t value on the residual degrees of freedom (here 7).
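The same fit can be sketched in Python/NumPy via least squares; t(0.05, 7) = 1.895 is taken from tables for the 90% intervals:

```python
import numpy as np

y  = np.array([8, 9, 10, 12, 13, 13, 13, 14, 16, 14], dtype=float)
x1 = np.array([4.8, 4.8, 4.6, 4.4, 4.3, 4.5, 4.6, 4.5, 4.4, 4.7])
x2 = np.array([4.8, 4.9, 5.3, 5.7, 5.9, 6.2, 6.1, 6.5, 6.6, 6.8])

X = np.column_stack([np.ones_like(y), x1, x2])   # design matrix
beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # (b0, b1, b2)

resid = y - X @ beta
n, k = X.shape
s2 = resid @ resid / (n - k)                     # residual variance, 7 d.f.
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))  # std. errors

t_crit = 1.895                                   # t(0.05; 7), from tables
ci = [(b - t_crit * s, b + t_crit * s) for b, s in zip(beta, se)]
print(beta, se, ci)
```

This reproduces the R summary's estimates (11.2662, −3.6173, 2.9641) and yields the 90% intervals quoted in the result.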

Result:
1. The fitted model is as follows:
Y = 11.2662 − 3.6173 X1 + 2.9641 X2
i.e., the estimates of the parameters are β0 = 11.2662, β1 = −3.6173 and β2 = 2.9641.
2. 90% confidence intervals, using the critical value t(0.05, 7) = 1.895:
β0: 11.2662 ± 1.895 × 6.7810 → (−1.581, 24.113)
β1: −3.6173 ± 1.895 × 1.2521 → (−5.990, −1.245)
β2: 2.9641 ± 1.895 × 0.3068 → (2.383, 3.545)
3. The overall F test has p-value 1.235e-05, so the regression as a whole is highly
significant, i.e., the model statistically significantly predicts the outcome variable.
Individually, β1 (p = 0.0233) and β2 (p = 2.69e-05) are significant at the 5% level,
while the intercept β0 (p = 0.1406) is not.

CALCULATION

>x<-read.table("C:/Users/dell/Desktop/rajamit.CSV",sep = ",", header = TRUE)
> m<-as.matrix(x)
> m
Y X1 X2
[1,] 8 4.8 4.8
[2,] 9 4.8 4.9
[3,] 10 4.6 5.3
[4,] 12 4.4 5.7
[5,] 13 4.3 5.9
[6,] 13 4.5 6.2
[7,] 13 4.6 6.1
[8,] 14 4.5 6.5
[9,] 16 4.4 6.6
[10,] 14 4.7 6.8
> reg<-lm(m[,1]~m[,2]+m[,3])
> reg

Call:
lm(formula = m[, 1] ~ m[, 2] + m[, 3])

Coefficients:
(Intercept) m[, 2] m[, 3]
11.266 -3.617 2.964

> summary(reg)

Call:
lm(formula = m[, 1] ~ m[, 2] + m[, 3])

Residuals:
Min 1Q Median 3Q Max
-0.4205 -0.3158 -0.2225 0.1868 1.0871

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  11.2662     6.7810   1.661   0.1406    
m[, 2]       -3.6173     1.2521  -2.889   0.0233 *  
m[, 3]        2.9641     0.3068   9.660 2.69e-05 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5608 on 7 degrees of freedom
Multiple R-squared:  0.9604,	Adjusted R-squared:  0.9491
F-statistic: 84.91 on 2 and 7 DF,  p-value: 1.235e-05

> tcrit<-qt(0.95,7)  # critical t for a 90% CI, 7 d.f.
> tcrit
[1] 1.894579
> cibeeta1u<-(-3.6173)+(tcrit*1.2521)
> cibeeta1u
[1] -1.245098
> cibeeta1l<-(-3.6173)-(tcrit*1.2521)
> cibeeta1l
[1] -5.989502
> cibeeta0u<-(11.2662)+(tcrit*6.7810)
> cibeeta0u
[1] 24.11334
> cibeeta0l<-(11.2662)-(tcrit*6.7810)
> cibeeta0l
[1] -1.58094
> cibeeta2u<-(2.9641)+(tcrit*0.3068)
> cibeeta2u
[1] 3.545357
> cibeeta2l<-(2.9641)-(tcrit*0.3068)
> cibeeta2l
[1] 2.382843

>

Date: 12-02-2018

Experiment -7

The following measurements were taken from two trivariate normal populations:

Sample 1 Sample 2
X1 X2 X3 Y1 Y2 Y3
4.8 3.4 1.6 5.6 2.9 3.6
5.4 3.7 1.5 6.1 2.9 3.7
4.9 3.1 1.5 6 2.2 4
4.4 2.9 1.4 5.9 3 4.2
5 3.4 1.5 5 2 3.5
5.1 3.5 1.4 7 3.2 4.7
4.9 3 1.4 6.4 3.2 4.5
4.7 3.2 1.3 6.9 3.1 4.9
4.6 3.1 1.5 5.5 2.3 4
5 3.6 1.4 6.5 2.8 4.6
5.4 3.9 1.7 5.2 2.7 3.9
4.6 3.4 1.4 6.6 2.9 4.6
4.9 2.4 3.3
6.3 3.3 4.7
5.7 2.8 4.5

Test the hypothesis that the samples come from two trivariate normal populations with the
same mean(assume the variance-covariance matrix for the trivariate normal populations are the
same).

Report Page

Object: To test the hypothesis that the samples come from two trivariate normal populations
with the same mean(assume the variance-covariance matrix for the trivariate normal
populations are the same).

Procedure / Formula used:
1. Compute the sample mean vectors x̄ and ȳ and the corrected sum-of-squares-and-products
matrices A1 and A2 of the two samples.
2. Pool the covariance matrices: S = (A1 + A2) / (n1 + n2 − 2).
3. Compute the two-sample Hotelling statistic:
T² = [n1n2 / (n1 + n2)] (x̄ − ȳ)′ S⁻¹ (x̄ − ȳ)
4. The tabulated value is
T²₀ = [(n1 + n2 − 2)p / (n1 + n2 − p − 1)] F(α; p, n1 + n2 − p − 1);
reject H0 if T² > T²₀.
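The whole computation can be cross-checked from the raw samples in Python/NumPy (used only as an independent check of the R transcript):

```python
import numpy as np

# Sample 1 (n1 = 12) and Sample 2 (n2 = 15), from the statement
Z1 = np.array([[4.8,3.4,1.6],[5.4,3.7,1.5],[4.9,3.1,1.5],[4.4,2.9,1.4],
               [5.0,3.4,1.5],[5.1,3.5,1.4],[4.9,3.0,1.4],[4.7,3.2,1.3],
               [4.6,3.1,1.5],[5.0,3.6,1.4],[5.4,3.9,1.7],[4.6,3.4,1.4]])
Z2 = np.array([[5.6,2.9,3.6],[6.1,2.9,3.7],[6.0,2.2,4.0],[5.9,3.0,4.2],
               [5.0,2.0,3.5],[7.0,3.2,4.7],[6.4,3.2,4.5],[6.9,3.1,4.9],
               [5.5,2.3,4.0],[6.5,2.8,4.6],[5.2,2.7,3.9],[6.6,2.9,4.6],
               [4.9,2.4,3.3],[6.3,3.3,4.7],[5.7,2.8,4.5]])
n1, n2, p = len(Z1), len(Z2), Z1.shape[1]

d = Z1.mean(axis=0) - Z2.mean(axis=0)
A1 = (n1 - 1) * np.cov(Z1, rowvar=False)     # corrected SSCP matrices
A2 = (n2 - 1) * np.cov(Z2, rowvar=False)
S  = (A1 + A2) / (n1 + n2 - 2)               # pooled covariance
T2 = (n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(S, d)
print(T2)
```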

Result:

Here we get
T²cal = 760.9092
T²tab = 9.873908
Since T²cal > T²tab, we reject the null hypothesis at the 5% level of significance and conclude
that the samples do not come from two trivariate normal populations with the same mean, i.e.,
μ1 ≠ μ2.

Calculations

> Z1<-read.csv("F:/sem2/exp7(Sample1).csv",header = TRUE,sep=",")


> Z1
X1 X2 X3
1 4.8 3.4 1.6
2 5.4 3.7 1.5
3 4.9 3.1 1.5
4 4.4 2.9 1.4
5 5.0 3.4 1.5
6 5.1 3.5 1.4
7 4.9 3.0 1.4
8 4.7 3.2 1.3
9 4.6 3.1 1.5
10 5.0 3.6 1.4
11 5.4 3.9 1.7
12 4.6 3.4 1.4
> Z2<-read.csv("F:/sem2/exp7(Sample2).csv",header = TRUE,sep=",")
> Z2
Y1 Y2 Y3
1 5.6 2.9 3.6
2 6.1 2.9 3.7
3 6.0 2.2 4.0
4 5.9 3.0 4.2
5 5.0 2.0 3.5
6 7.0 3.2 4.7
7 6.4 3.2 4.5
8 6.9 3.1 4.9
9 5.5 2.3 4.0
10 6.5 2.8 4.6
11 5.2 2.7 3.9
12 6.6 2.9 4.6
13 4.9 2.4 3.3
14 6.3 3.3 4.7
15 5.7 2.8 4.5
> x_bar<- as.matrix(colMeans(Z1,na.rm = FALSE,dims = 1L))
> x_bar
[,1]
X1 4.900000
X2 3.350000
X3 1.466667
> y_bar<- as.matrix(colMeans(Z2,na.rm = FALSE,dims = 1L))
> y_bar
[,1]
Y1 5.973333
Y2 2.780000
Y3 4.180000
> n1<-12
> n2<-15
> p<-3
> #calculation of sum of square matrix for sample1
> a11<-sum(Z1[,1]*Z1[,1])
> a11
[1] 289.16
> a12<-sum(Z1[,1]*Z1[,2])
> a12
[1] 197.8
> a13<-sum(Z1[,1]*Z1[,3])
> a13
[1] 86.42
> a22<-sum(Z1[,2]*Z1[,2])
> a22
[1] 135.66
> a23<-sum(Z1[,2]*Z1[,3])
> a23
[1] 59.14
> a33<-sum(Z1[,3]*Z1[,3])
> a33
[1] 25.94
> ZX<-matrix(data = c(a11,a12,a13,a12,a22,a23,a13,a23,a33),nrow = 3,ncol = 3,byrow = T)

> ZX
[,1] [,2] [,3]
[1,] 289.16 197.80 86.42
[2,] 197.80 135.66 59.14
[3,] 86.42 59.14 25.94
> A1<-ZX-(n1*(x_bar%*%t(x_bar)))
> A1
X1 X2 X3
X1 1.04 0.82 0.1800000
X2 0.82 0.99 0.1800000
X3 0.18 0.18 0.1266667
> S1<-A1/(n1-1)
> S1
X1 X2 X3
X1 0.09454545 0.07454545 0.01636364
X2 0.07454545 0.09000000 0.01636364
X3 0.01636364 0.01636364 0.01151515
>
> #calculation of sum of square matrix for sample2
> b11<-sum(Z2[,1]*Z2[,1])
> b11
[1] 541.24
> b12<-sum(Z2[,1]*Z2[,2])
> b12
[1] 251.64
> b13<-sum(Z2[,1]*Z2[,3])
> b13
[1] 378.49
> b22<-sum(Z2[,2]*Z2[,2])
> b22
[1] 118.07
> b23<-sum(Z2[,2]*Z2[,3])
> b23
[1] 176.18
> b33<-sum(Z2[,3]*Z2[,3])

> b33
[1] 265.65
> ZY<-matrix(data = c(b11,b12,b13,b12,b22,b23,b13,b23,b33),nrow = 3,ncol = 3,byrow = T)
> ZY
[,1] [,2] [,3]
[1,] 541.24 251.64 378.49
[2,] 251.64 118.07 176.18
[3,] 378.49 176.18 265.65
> A2<-ZY-(n2*(y_bar%*%t(y_bar)))
> A2
Y1 Y2 Y3
Y1 6.029333 2.552 3.962
Y2 2.552000 2.144 1.874
Y3 3.962000 1.874 3.564
> S2<-A2/(n2-1)
> S2
Y1 Y2 Y3
Y1 0.4306667 0.1822857 0.2830000
Y2 0.1822857 0.1531429 0.1338571
Y3 0.2830000 0.1338571 0.2545714
> S_combined<-(((n1-1)*S1)+((n2-1)*S2))/(n1+n2-2)
> S_combined
X1 X2 X3
X1 0.2827733 0.13488 0.1656800
X2 0.1348800 0.12536 0.0821600
X3 0.1656800 0.08216 0.1476267
> S_inverse<-solve(S_combined)
> S_inverse
X1 X2 X3
X1 13.519734 -7.244606 -11.141163
X2 -7.244606 16.439384 -1.018606
X3 -11.141163 -1.018606 19.844359
> #Calculation of Hotelling T-square statistic
> t_sq<-((n1*n2)*(t(x_bar-y_bar)%*%S_inverse%*%(x_bar-y_bar)))/(n1+n2)
> t_sq

[,1]
[1,] 760.9092
> F<-3.027998384 #F(alpha=0.05,p,n1+n2-p-1)
> T_tab<-((n1+n2-2)*p*F)/(n1+n2-p-1) #Tabulated value of T_sq
> T_tab
[1] 9.873908

Date : 14/02/2018
Experiment No. 8
The researchers were interested in comparing three strategies for teaching reading
comprehension to fourth-grade students (our "analysis units"). One strategy was to teach students
a number of reading comprehension monitoring strategies. This approach was called "Think Aloud"
(TA). A second strategy was labeled "Directed Reading and Thinking Activity" (DRTA), which required
students to make predictions and evaluate their predictions as they read stories. The third strategy,
labeled "Directed Reading Activity" (DRA), was an instructed control condition using a common
approach to teaching reading comprehension. Following the intervention period, measures on three
outcome variables were obtained. The first variable was “Error Detection Task” (EDT, Y1) where
students were asked to identify inconsistencies (errors) in a story passage. The second variable was
measured via the “Degrees of Reading Power” (DRP, Y2), a standardized test of reading
comprehension. The third variable was based on a comprehension monitoring questionnaire that
asked students questions on the strategies they used while reading to increase their comprehension.
Scores on the Error Detection Task (Y1) and Degrees of Reading Power (Y2) for the Think Aloud (TA),
Directed Reading Activity (DRA) and Directed Reading and Think Aloud (DRTA) Groups are given below
      TA            DRTA            DRA
  Y1    Y2      Y1    Y2      Y1    Y2
   4    43       6    27       5    34
   4    34       6    36       9    36
   4    45       5    51       5    42
   3    39       5    51       7    37
   8    40       0    50       4    44
   1    27       6    55       9    49
   7    46       6    52       3    38
   7    39      11    48       4    38
   9    31       6    53       2    38
   6    39       8    45       5    50
   4    40       8    47       7    31
  12    52       3    51       8    49
  14    53       7    30      10    54
  12    53       7    50       9    52
   7    41       6    55      12    50
   5    41       9    48       5    35
   9    46       7    52       8    36
  13    52       6    46      12    46
  11    55       7    36       4    42
   5    36       6    45       8    47
  11    50       6    49       6    39
  15    54       6    49       5    38

Test whether the centroids of the three populations are identical.

Report Page
Object- To test whether the centroids of the three populations are identical.
Procedure and formula used-
1. The hypothesis to be tested is
HO: μ1 = μ2 = μ3
where μ1 is the population mean vector under the 1st strategy (TA), μ2 under the 2nd strategy
(DRTA) and μ3 under the 3rd strategy (DRA) for teaching reading comprehension to fourth-grade
students.
2. Since we have more than two population mean vectors, the null hypothesis is tested by
multivariate analysis of variance.
3. First calculate the error sum of squares and cross-products matrix E:
E = A + B + C
where A = ΣXiXi' − N1X̅X̅', B = ΣYiYi' − N2Y̅Y̅' and C = ΣZiZi' − N3Z̅Z̅'.
4. Then calculate the grand mean centroid by the formula
m̅ = (X̅ + Y̅ + Z̅)/3
and measure each group centroid from the grand mean centroid.
5. The SSCP for each group is then
SSCPj = nj (y̅j − m̅)(y̅j − m̅)'
where y̅j denotes the jth group centroid.
6. H is calculated by summing the SSCPs:
H = Σj SSCPj,  j = 1, 2, 3.
7. The hypothesis is then tested with Wilks' criterion:
λ = |E| / |H + E|
Result-
Using Wilks' criterion (Λ = 0.8409394), the test statistic computed from Λ exceeds its tabulated
value, so we reject the null hypothesis, i.e. the centroids of the three populations are not identical.

Calculation
> X<-read.table("C:/Users/hp/Desktop/Book8(1).csv",header=TRUE,sep=",")
#sample group of TA
>X
X1 X2
1 4 43
2 4 34
3 4 45
4 3 39
5 8 40
6 1 27
7 7 46
8 7 39
9 9 31
10 6 39
11 4 40
12 12 52
13 14 53
14 12 53
15 7 41
16 5 41
17 9 46
18 13 52
19 11 55
20 5 36
21 11 50
22 15 54
>
> Y<-read.table("C:/Users/hp/Desktop/Book8(2).csv",header=TRUE,sep=",")
#sample group of DRTA
>Y
Y1 Y2
1 6 27
2 6 36
3 5 51
4 5 51
5 0 50
6 6 55
7 6 52
8 11 48
9 6 53
10 8 45
11 8 47
12 3 51
13 7 30
14 7 50
15 6 55
16 9 48
17 7 52

18 6 46
19 7 36
20 6 45
21 6 49
22 6 49
>
> Z<-read.table("C:/Users/hp/Desktop/Book8(3).csv",header=TRUE,sep=",")
#sample group of DRA
>Z
Z1 Z2
1 5 34
2 9 36
3 5 42
4 7 37
5 4 44
6 9 49
7 3 38
8 4 38
9 2 38
10 5 50
11 7 31
12 8 49
13 10 54
14 9 52
15 12 50
16 5 35
17 8 36
18 12 46
19 4 42
20 8 47
21 6 39
22 5 38
>
> N1<-22 #size of sample 1
> N1
[1] 22
> N2<-22 #size of sample 2
> N2
[1] 22
> N3<-22 #size of sample 3
> N3
[1] 22
> p<-2 #no. of variables
>p
[1] 2
>
>X.bar<-colMeans(X,na.rm=FALSE,dim=1) #sample mean vector for TA group
>X.bar
X1 X2
7.772727 43.454545

>Y.bar<-colMeans(Y,na.rm=FALSE,dim=1) #sample mean vector for DRTA group
>Y.bar
Y1 Y2
6.227273 46.636364
>Z.bar<-colMeans(Z,na.rm=FALSE,dim=1) #sample mean vector for DRA group
>Z.bar
Z1 Z2
6.681818 42.045455
>
> a11<-sum(X[,1]*X[,1])
> a11
[1] 1653
> a12<-sum(X[,1]*X[,2])
> a12
[1] 7949
> a22<-sum(X[,2]*X[,2])
> a22
[1] 42840
> x<-matrix(data=c(a11,a12,a12,a22),nrow=2,ncol=2,byrow=T)
>x
[,1] [,2]
[1,] 1653 7949
[2,] 7949 42840
>
> A<-x-(N1*(X.bar%*%t(X.bar))) #corrected SSCP matrix for TA
>A
X1 X2
[1,] 323.8636 518.2727
[2,] 518.2727 1297.4545
>
> b11<-sum(Y[,1]*Y[,1])
> b11
[1] 945
> b12<-sum(Y[,1]*Y[,2])
> b12
[1] 6337
> b22<-sum(Y[,2]*Y[,2])
> b22
[1] 49076
> y<-matrix(data=c(b11,b12,b12,b22),nrow=2,ncol=2,byrow=T)
>y
[,1] [,2]
[1,] 945 6337
[2,] 6337 49076
>
> B<-y-(N2*(Y.bar%*%t(Y.bar))) #corrected SSCP matrix for DRTA
>B
Y1 Y2
[1,] 91.86364 -52.18182
[2,] -52.18182 1227.09091

>
> c11<-sum(Z[,1]*Z[,1])
> c11
[1] 1143
> c12<-sum(Z[,1]*Z[,2])
> c12
[1] 6372
> c22<-sum(Z[,2]*Z[,2])
> c22
[1] 39811
> z<-matrix(data=c(c11,c12,c12,c22),nrow=2,ncol=2,byrow=T)
>z
[,1] [,2]
[1,] 1143 6372
[2,] 6372 39811
>
> C<-z-(N3*(Z.bar%*%t(Z.bar))) #corrected SSCP matrix for DRA
>C
Z1 Z2
[1,] 160.7727 191.3182
[2,] 191.3182 918.9545
>
> #the error matrix is given by
> E<-A+B+C
>E
X1 X2
[1,] 576.5000 657.4091
[2,] 657.4091 3443.5000
>
> #the grand mean centroid is given as
>m.bar<-(X.bar+Y.bar+Z.bar)/3
> m.bar
X1 X2
6.893939 44.045455
>
> #SSCP for each group is computed as
> SSCP1<-N1*((X.bar-m.bar)%*%t(X.bar-m.bar))
> SSCP1
X1 X2
[1,] 16.98990 -11.424242
[2,] -11.42424 7.681818
> SSCP2<-N2*((Y.bar-m.bar)%*%t(Y.bar-m.bar))
> SSCP2
Y1 Y2
[1,] 9.777778 -38.0000
[2,] -38.000000 147.6818
> SSCP3<-N3*((Z.bar-m.bar)%*%t(Z.bar-m.bar))
> SSCP3

Z1 Z2
[1,] 0.989899 9.333333
[2,] 9.333333 88.000000
>
> H<-SSCP1+SSCP2+SSCP3
>H
X1 X2
[1,] 27.75758 -40.09091
[2,] -40.09091 243.36364
> #by Wilks' lambda criterion, the calculated value is given as
> lambda<-(det(E)/det(H+E))
> lambda
[1] 0.8409394
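A quick cross-check, not part of the original R session: Λ can be reproduced in Python/NumPy from the E and H matrices printed above and referred to a table via Bartlett's chi-square approximation (the sizes N = 66, p = 2 variables and g = 3 groups are taken from the experiment):

```python
import numpy as np

# Error and hypothesis SSCP matrices as printed in the R session above
E = np.array([[576.5000, 657.4091],
              [657.4091, 3443.5000]])
H = np.array([[27.75758, -40.09091],
              [-40.09091, 243.36364]])

lam = np.linalg.det(E) / np.linalg.det(E + H)   # Wilks' lambda, ~0.8409

# Bartlett's approximation: -(N - 1 - (p + g)/2) * ln(lambda) is approximately
# chi-square with p*(g - 1) = 4 degrees of freedom
N, p, g = 66, 2, 3
stat = -(N - 1 - (p + g) / 2) * np.log(lam)
crit = 9.488          # upper 5% point of chi-square with 4 d.f. (from tables)

print(round(lam, 7))
print(stat > crit)    # True -> reject H0 of equal centroids
```

The approximate chi-square statistic is about 10.83, which exceeds 9.488, consistent with the rejection stated in the Result.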

>

Date: 26/02/2018
Experiment – 9
To compare two types of coating for resistance to corrosion, 15 pieces of pipe were coated with
each type of coating (Kramer and Jensen 1969b). Two pipes, one with each type of coating, were
buried together and left for the same length of time at 15 different locations, providing a natural
pairing of the observations. Corrosion for the first type of coating was measured by two variables:
X1 = Maximum depth of pit in thousandths of an inch
X2 = Number of pits
with Y1 and Y2 defined analogously for the second coating.

Depth of Maximum Pits and Number of Pits of coated pipes


Location Coating 1 Coating 2
Depth X1 Number X2 Depth Y1 Number Y2
1 73 31 51 35
2 43 19 41 14
3 47 22 43 19
4 53 26 41 29
5 58 36 47 34
6 47 30 32 26
7 52 29 24 19
8 38 36 43 37
9 61 34 53 24
10 56 33 52 27
11 56 19 57 14
12 34 19 44 19
13 55 26 57 30
14 65 16 40 7
15 75 18 68 13

Report Page
Object: To compare two types of coating for resistance to corrosion.

Procedure and formula used:


a. First calculate the differences for each characteristic,
di = X1i − X2i
where i indexes the characteristic measured and 1, 2 represent the two coatings.
b. The difference matrix will be
D = [d1 d2]
c. Find the mean vector of the differences,
MD = [md1 md2]
where mdi denotes the mean of the ith characteristic.
d. Find the covariance matrix of D; call it S.
e. Now calculate T² calculated:
T² = N MD'S⁻¹MD
f. Now calculate the tabulated value of T² by
T² = [p(N − 1)/(N − p)] F(α, p, N − p)
g. Now compare the tabulated value with the calculated value of T²:
 If Calculated > Tabulated, reject the null hypothesis
 If Calculated < Tabulated, accept the null hypothesis

Result:
Ho: There is no significant difference between the two coatings.
Since the calculated value T² = 10.718 is greater than the tabulated value 8.197, we reject the null
hypothesis and conclude that there is a significant difference between the two coatings.

Calculations
>data<-read.table("D:/amit/Exp-9.csv",sep=",",header=TRUE)
>z<-as.matrix(data)
>z
Depth.X1 Number.X2 Depth.Y1 Number.Y2
[1,] 73 31 51 35
[2,] 43 19 41 14
[3,] 47 22 43 19
[4,] 53 26 41 29
[5,] 58 36 47 34
[6,] 47 30 32 26
[7,] 52 29 24 19
[8,] 38 36 43 37
[9,] 61 34 53 24
[10,] 56 33 52 27
[11,] 56 19 57 14
[12,] 34 19 44 19
[13,] 55 26 57 30
[14,] 65 16 40 7
[15,] 75 18 68 13
>n<-15
>p<-2
>d1<-z[,1]-z[,3]
>d2<-z[,2]-z[,4]
>n<-length(d1)
>d<-matrix(data = c(d1,d2),byrow = FALSE,nrow = n)
>d
[,1] [,2]
[1,] 22 -4
[2,] 2 5
[3,] 4 3
[4,] 12 -3
[5,] 11 2
[6,] 15 4
[7,] 28 10
[8,] -5 -1
[9,] 8 10
[10,] 4 6
[11,] -1 5
[12,] -10 0
[13,] -2 -4
[14,] 25 9
[15,] 7 5

>mean_d1<-mean(d1) #Mean of d1
>mean_d1
[1] 8
>mean_d2<-mean(d2) #Mean of d2
>mean_d2
[1] 3.133333
>mean_d<-matrix(data = c(mean_d1,mean_d2),byrow = FALSE,nrow = 2)
>mean_d
[,1]
[1,] 8.000000
[2,] 3.133333
>cv<-cov(d) #Variance Covariance Matrix
>cv
[,1] [,2]
[1,] 121.57143 18.28571
[2,] 18.28571 22.55238
>cvinverse<-solve(cv) #Inverse of covariance matrix
>cvinverse
[,1] [,2]
[1,] 0.009368105 -0.007595761
[2,] -0.007595761 0.050499941
>tsq<-(n*t(mean_d)%*%cvinverse%*%(mean_d)) #Calculated T sq
>tsq
[,1]
[1,] 10.71833
>f<-qf(0.95,df1=p,df2=(n-p)) #F Value
>f
[1] 3.805565
>tsqtab<-((p*(n-1))/(n-p))*f #Tabulated T sq
>tsqtab
[1] 8.196602
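As a cross-check (not part of the original R session), the paired T² can be reproduced in Python/NumPy from the difference vectors printed above; note that the critical value uses the one-sample multiplier p(n − 1)/(n − p):

```python
import numpy as np

# Differences d_i = coating1 - coating2 for (depth of pit, number of pits)
d = np.array([[22, -4], [2, 5], [4, 3], [12, -3], [11, 2],
              [15, 4], [28, 10], [-5, -1], [8, 10], [4, 6],
              [-1, 5], [-10, 0], [-2, -4], [25, 9], [7, 5]], dtype=float)
n, p = d.shape                           # 15 locations, 2 characteristics

dbar = d.mean(axis=0)                    # mean difference vector (8, 3.1333)
S = np.cov(d, rowvar=False)              # sample covariance of the differences
tsq = n * dbar @ np.linalg.solve(S, dbar)

F = 3.805565                             # F(0.05; p, n - p) from the session above
tsq_tab = p * (n - 1) * F / (n - p)      # one-sample T^2 critical value

print(round(tsq, 5))        # ~10.71833
print(tsq > tsq_tab)        # True -> the two coatings differ
```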

Date: 26/02/2018

Experiment No. 10

Twenty engineer apprentices and 20 pilots were given six tests (Travers 1939). Four of the variables
were
y1 = dynamometer, y2 = dotting, y3 = sensory motor coordination, y4 = perseveration

Comparison of Four Tests on Engineer Apprentices and Pilots


Engineer Apprentices Pilots
Y1 Y2 Y3 Y4 Y1 Y2 Y3 Y4
74 223 54 254 77 232 50 249
80 175 40 300 79 192 64 315
87 266 41 223 96 250 55 319
66 178 80 209 67 291 48 310
71 175 38 261 96 239 42 268
57 241 59 245 87 231 40 217
52 194 72 242 87 227 30 324
89 200 85 242 102 234 58 300
91 198 50 277 104 256 58 270
72 162 47 268 82 240 30 322
87 170 60 244 80 227 58 217
88 208 51 228 83 116 39 306
60 232 29 279 83 183 57 242
73 159 39 233 94 227 30 240
83 152 88 233 78 258 42 271
80 195 36 241 89 283 66 291
73 152 42 249 83 257 31 311
76 223 74 268 100 252 30 225
83 164 31 243 105 250 27 243
82 188 57 267 76 187 30 264

Report Page
Object: To test the hypothesis that the samples come from two four-variate normal populations with
the same mean vector.
Procedure/Formula Used:

a. Calculate the mean vector y̅.i = (y̅1i, y̅2i, ..., y̅pi)' for each group.
b. For each group form the matrix of raw sums of squares and products,
Mi = [ Σx1²  Σx1x2 ... Σx1xp
       Σx1x2 Σx2²  ... Σx2xp
       :
       Σx1xp Σx2xp ... Σxp² ]
c. The corrected sum of squares and cross-products matrix is
Ai = Mi − ni y̅.i (y̅.i)'
d. Now calculate the variance-covariance matrix Si by the formula
Si = Ai /(ni − 1)
e. Calculate S pooled by the formula
S = [(n1 − 1)S1 + (n2 − 1)S2] / (n1 + n2 − 2)
f. Finally the value of the T² statistic is given by
T² = [n1 n2 / (n1 + n2)] (y̅.1 − y̅.2)' S⁻¹ (y̅.1 − y̅.2)
g. Now calculate the tabulated value of T² by
T² = [(n1 + n2 − 2) p / (n1 + n2 − p − 1)] F(α, p, n1 + n2 − p − 1)
h. Now compare the tabulated value with the calculated value of T²:
 If Calculated > Tabulated, reject the null hypothesis
 If Calculated < Tabulated, accept the null hypothesis
Result:

Ho: The samples come from populations with the same mean vector.
Now, calculated value = 39.49975 and tabulated value = 11.47151.
Since the calculated value is greater than the tabulated value, we reject Ho and conclude that the
samples come from populations with different mean vectors.

Calculation
> data<-read.table("E:/amitwar/Exp-10.csv",sep=",",header=TRUE)

> m<-as.matrix(data)

>m

Y1 Y2 Y3 Y4 Y1.1 Y2.1 Y3.1 Y4.1

[1,] 74 223 54 254 77 232 50 249

[2,] 80 175 40 300 79 192 64 315

[3,] 87 266 41 223 96 250 55 319

[4,] 66 178 80 209 67 291 48 310

[5,] 71 175 38 261 96 239 42 268

[6,] 57 241 59 245 87 231 40 217

[7,] 52 194 72 242 87 227 30 324

[8,] 89 200 85 242 102 234 58 300

[9,] 91 198 50 277 104 256 58 270

[10,] 72 162 47 268 82 240 30 322

[11,] 87 170 60 244 80 227 58 317

[12,] 88 208 51 228 83 216 39 306

[13,] 60 232 29 279 83 183 57 242

[14,] 73 159 39 233 94 227 30 240

[15,] 83 152 88 23 78 258 42 271

[16,] 80 195 36 241 89 283 66 291

[17,] 73 152 42 249 83 257 31 311

[18,] 76 223 74 268 100 252 30 225

[19,] 83 164 31 243 105 250 27 243

[20,] 82 188 57 267 76 187 30 264

> n1<-20

> n2<-20

> p<-4

> cv<-cov(m)

> cv

Y1 Y2 Y3 Y4 Y1.1 Y2.1 Y3.1 Y4.1

Y1 121.116 -37.789 0.021 -79.221 33.968 31.926 45.053 76.484

Y2 -37.789 1000.197 -24.355 441.737 60.684 -140.684 125.487 -268.316

Y3 0.021 -24.355 322.450 -511.863 -39.484 127.905 -17.697 70.547

Y4 -79.221 441.737 -511.863 3057.642 126.926 -575.768 67.211 -40.274

Y1.1 33.968 60.684 -39.484 126.926 110.884 43.853 -7.947 -111.874

Y2.1 31.926 -140.684 127.905 -575.768 43.853 792.568 8.158 129.768

Y3.1 45.053 125.487 -17.697 67.211 -7.947 8.158 173.671 116.211

Y4.1 76.484 -268.316 70.547 -40.274 -111.874 129.768 116.211 1253.747

> s1<-cv[c(1,2,3,4),c(1,2,3,4)]

> s1

Y1 Y2 Y3 Y4

Y1 121.11578947 -37.78947 0.02105263 -79.22105

Y2 -37.78947368 1000.19737 -24.35526316 441.73684

Y3 0.02105263 -24.35526 322.45000000 -511.86316

Y4 -79.22105263 441.73684 -511.86315789 3057.64211

> s2<-cv[c(5,6,7,8),c(5,6,7,8)]

> s2

Y1.1 Y2.1 Y3.1 Y4.1

Y1.1 110.884211 43.852632 -7.947368 -111.8737

Y2.1 43.852632 792.568421 8.157895 129.7684

Y3.1 -7.947368 8.157895 173.671053 116.2105

Y4.1 -111.873684 129.768421 116.210526 1253.7474

> s<-((n1-1)*s1+(n2-1)*s2)/(n1+n2-2)

>s

Y1 Y2 Y3 Y4

Y1 116.000000 3.031579 -3.963158 -95.54737

Y2 3.031579 896.382895 -8.098684 285.75263

Y3 -3.963158 -8.098684 248.060526 -197.82632

Y4 -95.547368 285.752632 -197.826316 2155.69474

>

> yebar<-matrix(data = c(mean(m[,1]),mean(m[,2]),mean(m[,3]),mean(m[,4])),byrow =


FALSE,nrow = 4)

> yebar

[,1]

[1,] 76.20

[2,] 192.75

[3,] 53.65

[4,] 239.80

> yqbar<-matrix(data = c(mean(m[,5]),mean(m[,6]),mean(m[,7]),mean(m[,8])),byrow =


FALSE,nrow = 4)

> yqbar

[,1]

[1,] 87.40

[2,] 236.60

[3,] 44.25

[4,] 280.20

> sinverse<-solve(s)

> sinverse

Y1 Y2 Y3 Y4

Y1 0.0090306029 -0.0001759679 0.0005139580 0.0004707568

Y2 -0.0001759679 0.0011701872 -0.0001019960 -0.0001722762

Y3 0.0005139580 -0.0001019960 0.0043861027 0.0004388096

Y4 0.0004707568 -0.0001722762 0.0004388096 0.0005478587

> tsq<-(n1*n2*(t(yebar-yqbar))%*%sinverse%*%(yebar-yqbar))/(n1+n2)

> tsq

[,1]

[1,] 39.49975

> f<-qf(0.95,df1=p,df2=(n1+n2-p-1))

>f

[1] 2.641465

> tsqtab<-((p*(n1+n2-2))/(n1+n2-p-1))*f

> tsqtab

[1] 11.47151
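The two-sample T² can be reproduced in Python/NumPy from the pooled covariance matrix and mean vectors printed above (an illustrative cross-check, not part of the original session):

```python
import numpy as np

# Pooled covariance matrix S and group mean vectors, as printed in the R session
S = np.array([[116.000000,   3.031579,   -3.963158,  -95.547368],
              [  3.031579, 896.382895,   -8.098684,  285.752632],
              [ -3.963158,  -8.098684,  248.060526, -197.826316],
              [-95.547368, 285.752632, -197.826316, 2155.694740]])
y1 = np.array([76.20, 192.75, 53.65, 239.80])   # engineer apprentices
y2 = np.array([87.40, 236.60, 44.25, 280.20])   # pilots
n1 = n2 = 20
p = 4

d = y1 - y2
tsq = (n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(S, d)

F = 2.641465                                    # F(0.05; 4, 35) from the session
tsq_tab = (n1 + n2 - 2) * p * F / (n1 + n2 - p - 1)

print(round(tsq, 2))     # ~39.5
print(tsq > tsq_tab)     # True -> mean vectors differ
```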

>

Date: 12/03/2018
Experiment No– 11
Eggs are usually classified into two grades, A and B, by visual inspection. In order to examine whether
these grades differ in respect of four important characters, namely yolk shadow (x1), yolk colour (x2),
albumen index (x3) and albumen height (x4), 25 eggs of grade A and 33 eggs of grade B were
observed for these characters. The following table gives the mean values and the corrected sums of
squares and products:

Grade A (sample size 25)

Corrected Sum of squares and products


mean x1 x2 x3 x4
x1 7.16 106.32 10.32 3.6 -12.16
x2 13.92 85.84 -21.8 12.08
x3 21.6 536 -486.6
x4 26.04 532.96

Grade B (sample size 33)

Corrected sum of squares and products


mean x1 x2 x3 x4
x1 10.3 40.97 -1.03 146.67 -104.91
x2 15.3 64.97 -13.33 -5.91
x3 28.33 1133.33 -640
x4 20.09 506.73

Test whether the two grades of eggs differ with respect to their means. What is the distance
between two grades of eggs.

Report Page
Object: To test whether the two grades of eggs differ with respect to their means, and to find the
distance between the two grades of eggs.
Procedure/Formula Used:

a. Calculate the variance-covariance matrix of each grade by the formula
Si = Ai /(ni − 1)
b. Calculate S pooled by the formula
S = [(n1 − 1)S1 + (n2 − 1)S2] / (n1 + n2 − 2)
c. Finally the value of the T² statistic is given by
T² = [n1 n2 / (n1 + n2)] (x̅1 − x̅2)' S⁻¹ (x̅1 − x̅2)
d. Now calculate the tabulated value of T² by
T² = [(n1 + n2 − 2) p / (n1 + n2 − p − 1)] F(α, p, n1 + n2 − p − 1)
e. Now compare the tabulated value with the calculated value of T²:
 If Calculated > Tabulated, reject the null hypothesis
 If Calculated < Tabulated, accept the null hypothesis
f. For the distance between the two grades, the squared Mahalanobis distance is
Δ² = (x̅1 − x̅2)' Σ⁻¹ (x̅1 − x̅2), where Σ = [(n1 + n2 − 2)/(n1 + n2)] S

Result
Ho: The two grades of eggs do not differ with respect to their means.
Now, calculated value = 71.42995 and tabulated value = 10.76161.
Since the calculated value is greater than the tabulated value, we reject Ho and conclude that the two
grades of eggs differ significantly with respect to their means.
The squared distance between the two grades of eggs is Δ² = 5.20109, so the distance is Δ ≈ 2.28.

Calculation
> n1<-25
> n2<-33
> p<-4
> x1bar<-matrix(data=c(7.16,13.92,21.6,26.04),nrow=4,byrow=FALSE)
> x1bar
[,1]
[1,] 7.16
[2,] 13.92
[3,] 21.60
[4,] 26.04
> x2bar<-matrix(data=c(10.3,15.3,28.33,20.09),nrow=4,byrow=FALSE)
> x2bar
[,1]
[1,] 10.30
[2,] 15.30
[3,] 28.33
[4,] 20.09
> A1<-matrix(data = c(106.32,10.32,3.6,-12.16,10.32,85.84,-21.8,12.08,3.6,-21.8,536,-486.6,-12.16,12.08,-
486.6,532.96),nrow = 4,byrow = FALSE)
> A1
[,1] [,2] [,3] [,4]
[1,] 106.32 10.32 3.6 -12.16
[2,] 10.32 85.84 -21.8 12.08
[3,] 3.60 -21.80 536.0 -486.60
[4,] -12.16 12.08 -486.6 532.96
> A2<-matrix(data = c(40.97,-1.03,146.67,-104.91,-1.03,64.97,-13.33,-5.91,146.67,-13.33,1133.33,-640,-104.91,-5.91,-
640,506.73),nrow = 4,byrow = FALSE)
> A2
[,1] [,2] [,3] [,4]
[1,] 40.97 -1.03 146.67 -104.91
[2,] -1.03 64.97 -13.33 -5.91
[3,] 146.67 -13.33 1133.33 -640.00
[4,] -104.91 -5.91 -640.00 506.73
> S1<-A1/(n1-1)
> S1
[,1] [,2] [,3] [,4]
[1,] 4.4300000 0.4300000 0.1500000 -0.5066667
[2,] 0.4300000 3.5766667 -0.9083333 0.5033333
[3,] 0.1500000 -0.9083333 22.3333333 -20.2750000
[4,] -0.5066667 0.5033333 -20.2750000 22.2066667
> S2<-A2/(n2-1)
> S2
[,1] [,2] [,3] [,4]
[1,] 1.2803125 -0.0321875 4.5834375 -3.2784375
[2,] -0.0321875 2.0303125 -0.4165625 -0.1846875
[3,] 4.5834375 -0.4165625 35.4165625 -20.0000000
[4,] -3.2784375 -0.1846875 -20.0000000 15.8353125
> S<-((n1-1)*S1+(n2-1)*S2)/(n1+n2-2)
>S
[,1] [,2] [,3] [,4]

[1,] 2.6301786 0.1658929 2.6833929 -2.0905357
[2,] 0.1658929 2.6930357 -0.6273214 0.1101786
[3,] 2.6833929 -0.6273214 29.8094643 -20.1178571
[4,] -2.0905357 0.1101786 -20.1178571 18.5658929
> sinverse<-solve(S)
> sinverse
[,1] [,2] [,3] [,4]
[1,] 0.42423410 -0.03266269 -0.02421559 0.02172314
[2,] -0.03266269 0.37843027 0.02570235 0.02192724
[3,] -0.02421559 0.02570235 0.12773922 0.13553800
[4,] 0.02172314 0.02192724 0.13553800 0.20304605
> tsq<-(n1*n2*(t(x1bar-x2bar))%*%sinverse%*%(x1bar-x2bar))/(n1+n2)
> tsq
[,1]
[1,] 71.42995
> f<-qf(0.95,df1=4,df2=(n1+n2-p-1))
>f
[1] 2.546273
> tsqtab<-((p*(n1+n2-2))/(n1+n2-p-1))*f
> tsqtab
[1] 10.76161
> sigma<-((n1+n2-2)*S)/(n1+n2)
> sigma
[,1] [,2] [,3] [,4]
[1,] 2.5394828 0.1601724 2.5908621 -2.0184483
[2,] 0.1601724 2.6001724 -0.6056897 0.1063793
[3,] 2.5908621 -0.6056897 28.7815517 -19.4241379
[4,] -2.0184483 0.1063793 -19.4241379 17.9256897
> deltasq<-(t(x1bar-x2bar))%*%(solve(sigma))%*%(x1bar-x2bar)
> deltasq
[,1]
[1,] 5.20109
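These figures can be double-checked in Python/NumPy directly from the corrected SSCP matrices and mean vectors given in the problem (a sketch, not part of the original R session):

```python
import numpy as np

# Corrected SSCP matrices and mean vectors for grades A and B, from the problem
A1 = np.array([[106.32, 10.32, 3.6, -12.16],
               [10.32, 85.84, -21.8, 12.08],
               [3.6, -21.8, 536.0, -486.6],
               [-12.16, 12.08, -486.6, 532.96]])
A2 = np.array([[40.97, -1.03, 146.67, -104.91],
               [-1.03, 64.97, -13.33, -5.91],
               [146.67, -13.33, 1133.33, -640.0],
               [-104.91, -5.91, -640.0, 506.73]])
x1 = np.array([7.16, 13.92, 21.60, 26.04])
x2 = np.array([10.30, 15.30, 28.33, 20.09])
n1, n2, p = 25, 33, 4

S = (A1 + A2) / (n1 + n2 - 2)                   # pooled covariance matrix
d = x1 - x2
tsq = (n1 * n2 / (n1 + n2)) * d @ np.linalg.solve(S, d)

Sigma = (n1 + n2 - 2) / (n1 + n2) * S
delta_sq = d @ np.linalg.solve(Sigma, d)        # squared Mahalanobis distance

print(round(tsq, 3))                  # ~71.430
print(round(delta_sq ** 0.5, 3))      # distance ~2.281
```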

>

Date: 14/03/2018
Experiment – 12
The means of three biometrical characters and the matrix of pooled variances and covariances were
obtained for two groups of female desert locusts, one in the phase gregaria and the other in an
intermediate phase between gregaria and solitaria.
Matrix of pooled variances and covariances based on 90 d.f.
X1 X2 X3
X1 4.735 0.5622 1.4685
X2 0.1431 0.2174
X3 0.5702

Means
Phase X̅ 1 X̅ 2 X̅ 3
Gregaria (n=20) 25.8 7.81 10.77
Intermediate (n=72) 28.35 7.81 10.75

1. Obtain the best linear function which would discriminate between the two groups.
2. How would you classify an individual into either of the two groups on the basis of the
observations (27.06, 8.03, 11.36)?

Report Page
Object: To obtain the best linear discriminant function for the above data and classify the given
individual into one of the two populations.

Procedure and Formula used:

a. The discriminant function is
u = x'Σ⁻¹(μ1 − μ2)
where x' = (x1 x2 ... xp).
b. Classify the observation into population 1 if
x'Σ⁻¹(μ1 − μ2) ≥ ½ (μ1 + μ2)'Σ⁻¹(μ1 − μ2)
otherwise into population 2.

Result:
1. The best linear discriminant function for the data is
u = -2.72973491x1 - 0.02212616x2 + 7.07370385x3

2. The value of the discriminant function at (27.06, 8.03, 11.36) is 6.312976, which is greater than
the classification boundary 2.032676. Hence the given individual is classified into the first
population (gregaria).

Calculations
>S<-matrix(data = c(4.735,0.5622,1.4685,0.5622,0.1431,0.2174,1.4685,0.2174,0.5702),nrow
= 3,byrow = FALSE)
>S
[,1] [,2] [,3]
[1,] 4.7350 0.5622 1.4685
[2,] 0.5622 0.1431 0.2174
[3,] 1.4685 0.2174 0.5702
>x1bar<-matrix(data = c(25.8,7.81,10.77),nrow = 3,byrow = FALSE)
>x1bar
[,1]
[1,] 25.80
[2,] 7.81
[3,] 10.77
>x2bar<-matrix(data = c(28.35,7.81,10.75),nrow = 3,byrow = FALSE)
>x2bar
[,1]
[1,] 28.35
[2,] 7.81
[3,] 10.75
>x<-matrix(data = c(27.06,8.03,11.36),nrow = 3,byrow = FALSE)
>x
[,1]
[1,] 27.06
[2,] 8.03
[3,] 11.36
>lhs<-t(x)%*%solve(S)%*%(x1bar-x2bar)
>lhs
[,1]
[1,] 6.312976
>rhs<-((t(x1bar+x2bar))%*%solve(S)%*%(x1bar-x2bar))/2
>rhs
[,1]
[1,] 2.032676
>df<-solve(S)%*%(x1bar-x2bar)
>df
[,1]
[1,] -2.72973491
[2,] -0.02212616
[3,] 7.07370385
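The discriminant computations can be verified in Python/NumPy from the matrices given in the problem (a cross-check, not part of the original session):

```python
import numpy as np

# Pooled covariance matrix and phase mean vectors from the problem
S = np.array([[4.7350, 0.5622, 1.4685],
              [0.5622, 0.1431, 0.2174],
              [1.4685, 0.2174, 0.5702]])
m1 = np.array([25.80, 7.81, 10.77])   # gregaria
m2 = np.array([28.35, 7.81, 10.75])   # intermediate
x = np.array([27.06, 8.03, 11.36])    # individual to classify

a = np.linalg.solve(S, m1 - m2)       # discriminant coefficients
lhs = x @ a                           # value of the discriminant at x
rhs = 0.5 * (m1 + m2) @ a             # classification boundary

print(np.round(a, 4))    # ~(-2.7297, -0.0221, 7.0737)
print(lhs >= rhs)        # True -> classify into gregaria (population 1)
```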

Date – 19/03/2018
Experiment – 13
LINEAR MODEL NOT OF FULL RANK
A statistician reports an analysis of the rubber-producing plant guayule, for which plant weights
were available for 54 plants of three different kinds: 27 normal, 15 off-types and 12 aberrant.
A sample of six plants (3 normal, 2 off-types, 1 aberrant) is taken. The following table shows the
weights of these six plants.

TYPES OF PLANTS
NORMAL OFF TYPES ABERRANT
101 84 32
105 88
94

1. Find an appropriate linear model and establish analysis of variance table.


2. Derive estimable functions, find their estimates and test hypotheses about them.

Report Page
Object: To find an appropriate linear model, establish the analysis of variance table, derive
estimable functions, find their estimates and test hypotheses about them.

Procedure and formula used:


a. For the model not of full rank we consider
Y = Xβ + ε
and the linear model is represented as
yij = μ + αi + εij
b. For the model not of full rank, compute a generalized inverse G of X'X.
c. Then βᵒ = GX'Y.
d. Now calculate TSS = Y'Y.
e. SSR = Y'XGX'Y = Y'Xβᵒ
f. SSE = TSS − SSR
g. Mean sum of squares due to regression: MSR = SSR/r, where r = rank(X).
h. Mean sum of squares due to error: MSE = SSE/(n − r).
i. The functions μ + αᵢ, i = 1, 2, 3, are estimable and linearly independent; each is estimated as
μ + αᵢ = qᵢ'βᵒ

Result:
1. The linear model is represented as yij = μ + αi + εij.
ANOVA TABLE

Source of Variation   d.f   Sum of square   Mean sum of square   F-Ratio
Regression            3     45816           15272                654.5143
Error                 3     70              23.33333
Total                 6     45886

Since F calculated > F tabulated, we reject the null hypothesis.

2. H₀: all treatment effects are equal, i.e.

H₀: α₁ = α₂ = α₃, or equivalently α₁ − α₂ = α₁ − α₃ = 0.
The estimates of the estimable functions are as follows:
[μ + α₁] = 100
[μ + α₂] = 86
[μ + α₃] = 32
The tabulated F statistic is given by F₂,₃(0.05) = 9.552094.
Since Fcal > Ftab, the null hypothesis is rejected. Hence the treatment effects are not equal.

Calculation
>x<-read.table("D:/amit/Exp-13.csv",sep = ",",header = TRUE)
>x<-as.matrix(x)
>x
X0 X1 X2 X3
[1,] 1 1 0 0
[2,] 1 1 0 0
[3,] 1 1 0 0
[4,] 1 0 1 0
[5,] 1 0 1 0
[6,] 1 0 0 1
>y=matrix(data = c(101,105,94,84,88,32),byrow = FALSE,nrow = 6)
>y
[,1]
[1,] 101
[2,] 105
[3,] 94
[4,] 84
[5,] 88
[6,] 32
>xtx<-t(x)%*%x
>xtx
X0 X1 X2 X3
X0 6 3 2 1
X1 3 3 0 0
X2 2 0 2 0
X3 1 0 0 1
>library(MASS)
>G<-ginv(xtx)
>G
[,1] [,2] [,3] [,4]
[1,] 0.11458333 -0.03125 0.01041667 0.1354167
[2,] -0.03125000 0.28125 -0.09375000 -0.2187500
[3,] 0.01041667 -0.09375 0.36458333 -0.2604167
[4,] 0.13541667 -0.21875 -0.26041667 0.6145833
>xty<-t(x)%*%y
>xty
[,1]
X0 504
X1 300
X2 172
X3 32
>beta0<-G%*%xty
>beta0
[,1]
[1,] 54.5
[2,] 45.5
[3,] 31.5
[4,] -22.5

>tss<-t(y)%*%y
>tss
[,1]
[1,] 45886
>ssr<-t(xty)%*%beta0
>ssr
[,1]
[1,] 45816
>sse<-tss-ssr
>sse
[,1]
[1,] 70
>library(Matrix)
>r<-rankMatrix(x)
>r
[1] 3
attr(,"method")
[1] "tolNorm2"
attr(,"useGrad")
[1] FALSE
attr(,"tol")
[1] 1.332268e-15
>n<-6
>msr<-ssr/r
>msr
[,1]
[1,] 15272
attr(,"method")
[1] "tolNorm2"
attr(,"useGrad")
[1] FALSE
attr(,"tol")
[1] 1.332268e-15
>mse<-sse/(n-r)
>mse
[,1]
[1,] 23.33333
attr(,"method")
[1] "tolNorm2"
attr(,"useGrad")
[1] FALSE
attr(,"tol")
[1] 1.332268e-15
>F<-msr/mse
>F
[,1]
[1,] 654.5143
attr(,"method")
[1] "tolNorm2"

attr(,"useGrad")
[1] FALSE
attr(,"tol")
[1] 1.332268e-15
>qt<-matrix(data = c(1,1,1,1,0,0,0,1,0,0,0,1), byrow=FALSE,nrow = 3)
>qt
[,1] [,2] [,3] [,4]
[1,] 1 1 0 0
[2,] 1 0 1 0
[3,] 1 0 0 1
>qtbeta0<-qt%*%beta0
>qtbeta0
[,1]
[1,] 100
[2,] 86
[3,] 32
>f<-qf(0.95,2,3) #tabulated F at the 5% level of significance with 2,3 d.f.
>f
[1] 9.552094
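The less-than-full-rank computations can be cross-checked in Python/NumPy; numpy.linalg.pinv plays the role of MASS::ginv here (both return the Moore-Penrose generalized inverse), and the estimable functions μ + αᵢ and the sums of squares are invariant to the choice of generalized inverse:

```python
import numpy as np

# Design matrix (columns: mu, alpha1, alpha2, alpha3) and plant weights
X = np.array([[1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 0, 1, 0],
              [1, 0, 0, 1]], dtype=float)
y = np.array([101, 105, 94, 84, 88, 32], dtype=float)

G = np.linalg.pinv(X.T @ X)          # a generalized inverse of X'X
beta0 = G @ X.T @ y                  # one solution of the normal equations

tss = y @ y
ssr = y @ X @ beta0
sse = tss - ssr
r = np.linalg.matrix_rank(X)         # rank 3
F = (ssr / r) / (sse / (len(y) - r))

# Estimable functions mu + alpha_i (invariant to the choice of G)
Q = np.array([[1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]], dtype=float)
print(np.round(Q @ beta0, 1))        # ~(100, 86, 32)
print(round(F, 4))                   # ~654.5143
```

Note that the estimates μ + αᵢ are simply the group means: (101 + 105 + 94)/3 = 100, (84 + 88)/2 = 86 and 32.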

Date – 19/03/2018

Experiment – 14

Given the following data:

X0 X1 X2 Y

1 1 8 6

1 4 2 8

1 9 -8 1
1 11 -10 0

1 3 6 5

1 8 -6 3
1 5 0 2

1 10 -12 -4
1 2 4 10

1 7 -2 -3

1 6 -4 5

1. Find the least squares estimates of the β’s in the model.


y = β0 + β1X1 + β2X2 + ε

2. Write out the analysis of variance table.


3. Using α = 0.05, test to determine if the overall regression is suitable for the given data.
4. Calculate Var(β1), Var(β2) and Cov(β1,β2).
5. How useful is the regression using X1 alone? What does X2 contribute given that X1 is
already in the equation?

Report Page

Object: (1) Find the least squares estimates of the β’s in the model Y = β0 + β1X1 + β2X2 + ε.

(2) Write out the analysis of variance table.
(3) Using α = 0.05, test whether the overall regression is suitable for the given data.
(4) Calculate Var(β1), Var(β2) and Cov(β1, β2).
(5) How useful is the regression using X1 alone? What does X2 contribute given that X1 is already in
the equation?

Procedure and formula used:

a. The full-rank model is given by Y = Xβ + ε = β0X0 + β1X1 + β2X2 + ε

b. The least squares estimate of β is given by
β̂ = (X'X)⁻¹(X'Y)

c. The analysis of variance table (here n = 11 observations and p = 3 fitted parameters,
including the intercept) is as follows:

Source of Variation   d.f    Sum of Square         Mean sum of square   F ratio
Regression            p      SSR = Y'Xβ̂           MSR = SSR/p          F = MSR/MSE
Error                 n−p    SSE = Y'Y − Y'Xβ̂     MSE = SSE/(n−p)
Total                 n      TSS = Y'Y

d. To test whether the overall regression is suitable for the given data, compare the F ratio with
the tabulated value of F at α = 0.05 with 3 and 8 degrees of freedom.
e. To calculate Var(β1), Var(β2) and Cov(β1, β2), first calculate Var(β̂) by the formula
Var(β̂) = σ²(X'X)⁻¹, where σ² is estimated by MSE = SSE/(n − p)

f. For the regression using X₁ alone, consider the model Y = β₀X₀ + β₁X₁ + ε.
g. Calculate β* = (X*'X*)⁻¹(X*'Y)
where X* = [X₀, X₁]
h. Now calculate SSR(X₁), SSE(X₁), TSS, MSR(X₁) and MSE(X₁) using the formulas
SSR(X₁) = Y'X*(X*'X*)⁻¹X*'Y = Y'X*β*

TSS = Y'Y and SSE(X₁) = TSS − SSR(X₁)

MSR(X₁) = SSR(X₁)/2 and MSE(X₁) = SSE(X₁)/(n − 2)

i. Calculate the F ratio MSR(X₁)/MSE(X₁) and its tabulated value. If,

 Calculated > Tabulated, reject the null hypothesis
 Calculated < Tabulated, accept the null hypothesis

Result
1. The least squares estimates of the β’s in the model:

β̂ = (β₀, β₁, β₂)' = (14, -2, -0.5)'

2. The analysis of variance (ANOVA) table

Source of Variation   d.f   Sum of Square   Mean sum of square   F ratio
Regression            3     SSR = 221       MSR = 73.6667        F = 8.6667
Error                 8     SSE = 68        MSE = 8.5
Total                 11    TSS = 289

3. Using α = 0.05, test whether the overall regression is suitable for the given data.
F tabulated is given by F₃,₈(0.05) = 4.066181 and F calculated is 8.6667 from the ANOVA table.
Since Fcal > Ftab, the null hypothesis β₁ = β₂ = 0 is rejected, i.e. the overall regression is
significant and is suitable for the given data.

4. Var(β₁) = 1.4362519
Var (β₂) = 0.3590630

Cov(β₁,β₂) = 0.6985407

5. F-ratio = MSR(X₁)/MSE(X₁) = 13.0937769

Ftab at α = 0.05 with d.f. 2, 9 is given as F₂,₉(0.05) = 4.256494729

Since Fcal > Ftab, the hypothesis β₁ = 0 is rejected at α = 0.05, so the regression using X₁ alone
is already useful. The extra sum of squares for X₂ given X₁ is SSR(X₂|X₁) = 221 - 215.0818 ≈ 5.92,
which is small compared with the residual mean square 8.5, so X₂ contributes little once X₁ is
already in the equation.

Calculation
>x<-read.table("D:/amit/Exp-14.csv",sep = ",",header = TRUE)
>x<-as.matrix(x)
>x
X0 X1 X2
[1,] 1 1 8
[2,] 1 4 2
[3,] 1 9 -8
[4,] 1 11 -10
[5,] 1 3 6
[6,] 1 8 -6
[7,] 1 5 0
[8,] 1 10 -12
[9,] 1 2 4
[10,] 1 7 -2
[11,] 1 6 -4
>y<-matrix(c(6,8,1,0,5,3,2,-4,10,-3,5))
>y
[,1]
[1,] 6
[2,] 8
[3,] 1
[4,] 0
[5,] 5
[6,] 3
[7,] 2
[8,] -4
[9,] 10
[10,] -3
[11,] 5
>Ty<-t(y)
>Ty
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 6 8 1 0 5 3 2 -4 10 -3 5
>Tx<-t(x)
>Tx
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
X0 1 1 1 1 1 1 1 1 1 1 1
X1 1 4 9 11 3 8 5 10 2 7 6
X2 8 2 -8 -10 6 -6 0 -12 4 -2 -4
>u1<-Tx%*%x
>u1
X0 X1 X2
X0 11 66 -22
X1 66 506 -346
X2 -22 -346 484
>invu1<-solve(u1)
>invu1

X0 X1 X2
X0 4.3704790 -0.84946237 -0.40860215
X1 -0.8494624 0.16897081 0.08218126
X2 -0.4086022 0.08218126 0.04224270
>u2<-Tx%*%y
>u2
[,1]
X0 33
X1 85
X2 142
>esbeta<-invu1%*%u2 #Least estimate of beta in the model
>esbeta
[,1]
X0 14.0
X1 -2.0
X2 -0.5
>
>u4<-t(esbeta)
>u4
X0 X1 X2
[1,] 14 -2 -0.5
>u5<-u4%*%u2
>u5
[,1]
[1,] 221
>ssr<-u5
>ssr
[,1]
[1,] 221
>p1<-Ty%*%y
>p1
[,1]
[1,] 289
>sse<-p1-u5
>sse
[,1]
[1,] 68
>msr<-ssr/3
>msr
[,1]
[1,] 73.66667
>mse<-sse/8
>mse
[,1]
[1,] 8.5
>fcal<-msr/mse
>fcal
[,1]
[1,] 8.666667

>ftab<-qf(0.95,df1=3,df2=8)
>ftab
[1] 4.066181
>
>varbeta<-8.5*invu1
>varbeta
X0 X1 X2
X0 37.149071 -7.2204301 -3.4731183
X1 -7.220430 1.4362519 0.6985407
X2 -3.473118 0.6985407 0.3590630
>varb1<-1.43625
>varb2<-0.359063
>covb1b2<-0.698541
>
>x1=u1[c(1,2),c(1,2)]
>x1
X0 X1
X0 11 66
X1 66 506
>x1inverse<-solve(x1)
>x1inverse
X0 X1
X0 0.41818182 -0.054545455
X1 -0.05454545 0.009090909
>a<- t(x[,c(1,2)]) %*% y
>a
[,1]
X0 33
X1 85
>beta<-x1inverse %*% a
>beta
[,1]
X0 9.163636
X1 -1.027273
>SSRX1<-t(a) %*% beta
>SSRX1
[,1]
[1,] 215.0818
>TSSX1<-t(y) %*% y
>TSSX1
[,1]
[1,] 289
>SSEX1<-TSSX1 - SSRX1
>SSEX1
[,1]
[1,] 73.91818
>MSRX1<-SSRX1/2
>MSRX1
[,1]

[1,] 107.5409
>MSEX1<-SSEX1/(11-2)
>MSEX1
[,1]
[1,] 8.213131
>FRatio<-MSRX1/MSEX1
>FRatio
[,1]
[1,] 13.09378
>Ftab<-qf(0.95,df1=2,df2=9)
>Ftab
[1] 4.256495
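Finally, both the full model and the X₁-only fit can be reproduced in Python/NumPy via the normal equations (a cross-check, not part of the original R session):

```python
import numpy as np

X = np.array([[1, 1, 8], [1, 4, 2], [1, 9, -8], [1, 11, -10],
              [1, 3, 6], [1, 8, -6], [1, 5, 0], [1, 10, -12],
              [1, 2, 4], [1, 7, -2], [1, 6, -4]], dtype=float)
y = np.array([6, 8, 1, 0, 5, 3, 2, -4, 10, -3, 5], dtype=float)
n, p = X.shape                          # 11 observations, 3 parameters

beta = np.linalg.solve(X.T @ X, X.T @ y)
ssr = y @ X @ beta                      # uncorrected regression sum of squares
sse = y @ y - ssr
F_full = (ssr / p) / (sse / (n - p))

# X1 alone: drop the X2 column and refit
X1 = X[:, :2]
b1 = np.linalg.solve(X1.T @ X1, X1.T @ y)
ssr1 = y @ X1 @ b1
F_x1 = (ssr1 / 2) / ((y @ y - ssr1) / (n - 2))

print(np.round(beta, 1))     # beta-hat ~ (14, -2, -0.5)
print(round(F_full, 4))      # ~8.6667
print(round(ssr - ssr1, 2))  # extra SS for X2 given X1, ~5.92
print(round(F_x1, 4))        # ~13.0938
```

The small extra sum of squares for X₂ (about 5.92 against a residual mean square of 8.5) is what shows that X₂ adds little once X₁ is in the equation.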

