DEPARTMENT OF STATISTICS
SESSION 2017-18
Index
S.No. Experiment Date Page No. Remark
2. Test the hypothesis that the mean head length and head breadth of 22/01/2018 6–9
the 1st son and the 2nd son are equal at significance level 0.01.
3. Test a hypothesis at the 5% level of significance, and test the 31/01/2018 10 – 16
significance of the difference in the mean vectors of two groups of females.
4. Test for significance of difference in the mean vectors of two 31/01/2018 17 – 20
species. Obtain Fisher’s linear discriminant function and
classify into one of the two groups. Test the adequacy of the
assigned discriminant function. Are x1 and x2 alone sufficient
for discrimination?
5. 1. State the regression model. Estimate the parameters of the 06/02/2018 21 – 25
model. Test for the significance of the average hourly
temperature. Construct a 95% CI for the regression coefficient.
2. Predict the amount of coal that should be ordered. What will
be the change in the amount of coal to be ordered?
6. Estimate the parameters. Write 90% C.I.s and test for the 06/02/2018 26 – 29
significance of β0, β1, β2.
7. Test the hypothesis that the samples come from two 14/02/2018 30 – 35
trivariate normal populations with the same mean.
8. Check whether the centroids of three populations are identical. 14/02/2018 36-39
9. Compare two types of coating for resistance to corrosion. 26/02/2018 40-48
10. Test the hypothesis Ho:μ1=μ2 26/02/2018 44-54
11. Test whether two grades of eggs differ with respect to their 12/03/2018 55-59
means. What is the distance between the two grades of eggs?
12. Obtain the best linear function which would discriminate 14/03/2018 59-56
between two groups. How would you classify an individual
into either of the two groups on the basis of observations?
13. Find an appropriate linear model and establish analysis of 19/03/2018 67-69
variance table and derive estimable functions, find their
estimates and test hypothesis about them.
14. Find the least-squares estimates of the β’s in the model. Write out the analysis of 19/03/2018 70-75
variance table. Test to determine if the overall regression is
suitable for the given data. Calculate Var(β1), Var(β2) and
Cov(β1, β2). How useful is the regression using X1 alone? What
does X2 contribute given that X1 is already in the equation?
Date: 22-Jan-2018
Experiment No. 1
A random sample of 50 observations was taken on X1, X2, X3. The observations yielded the following
information. The sample sum of squares and products matrix is
X1 X2 X3
X1 19.4 9.04 9.76
X2 9.04 11.87 4.62
X3 9.76 4.62 12.3
Means are: X̅1 = 5.936
X̅2 = 2.77
X̅3 = 4.26
If μ1, μ2, μ3 denote the population means of X1, X2, X3, then test the following hypotheses.
1. Ho: μ1 = 10.0, μ2 = 5.0, μ3 = 10.0
2. Ho: μ1 = μ2 = μ3
Report Page
Object: To test the hypotheses
1. Ho: μ1 = 10.0, μ2 = 5.0, μ3 = 10.0
2. Ho: μ1 = μ2 = μ3
Procedure / Formula used:
1. Calculate the matrix A by the following formula:
A = given matrix − N x̄ (x̄)′
2. Now calculate the matrix S by the formula:
S = A / (N − 1)
3. Calculate the value of the T² statistic by the formula:
T² = N (x̄ − μ)′ S⁻¹ (x̄ − μ)
4. Obtain the tabulated value of T² by the formula:
T²tab = [(N − 1)p / (N − p)] · F(α; p, N − p)
where μ = population mean vector, x̄ = sample mean vector,
N is the number of observations and α is the level of significance.
5. Now compare the tabulated value with the calculated value of T²: the hypothesis is
accepted if T² < T²tab and rejected if T² > T²tab.
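The statistic in step 3 can also be sketched outside R. The following Python fragment is only an illustration of the formula T² = N(x̄ − μ)′S⁻¹(x̄ − μ); the helper names and the 2-variable toy inputs are hypothetical, not the data of this experiment.

```python
def mat_inv(M):
    """Invert a small square matrix by Gauss-Jordan elimination."""
    n = len(M)
    aug = [list(map(float, M[i])) + [1.0 if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    for c in range(n):
        # pivot on the largest entry in the column for numerical stability
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        p = aug[c][c]
        aug[c] = [v / p for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]

def hotelling_tsq(x_bar, mu, S, N):
    """One-sample Hotelling statistic: N * (x_bar - mu)' S^{-1} (x_bar - mu)."""
    d = [x - m for x, m in zip(x_bar, mu)]
    Sinv = mat_inv(S)
    quad = sum(d[i] * Sinv[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))
    return N * quad

# Toy check with S = identity: T^2 reduces to N * ||x_bar - mu||^2.
print(hotelling_tsq([1.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 10))
```

In the experiment itself, S⁻¹ is obtained with R's `solve(S)`, which plays the role of `mat_inv` here.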
Result:
1. T²cal = 2346.153 and T²tab = 0.9738489 → Calculated > Tabulated
Thus, the null hypothesis H0: μ1 = 10.0, μ2 = 5.0, μ3 = 10.0 will be rejected,
as T²cal > T²tab.
Calculation
#1
> N<-50
> p<-3
> x<-c(19.4,9.04,9.76,9.04,11.87,4.62,9.76,4.62,12.3)
> A1<-matrix(data=x,nrow=3,ncol=3,byrow=T)
> colnames(A1)<-c("x1","x2","x3")
> rownames(A1)<-c("x1","x2","x3")
> A1
x1 x2 x3
x1 19.40 9.04 9.76
x2 9.04 11.87 4.62
x3 9.76 4.62 12.30
> x.bar<-c(5.936,2.77,4.26) #mean vector
> x.bar
[1] 5.936 2.770 4.260
> mx<-N*(x.bar%*%t(x.bar))
> mx
[,1] [,2] [,3]
[1,] 1761.805 822.136 1264.368
[2,] 822.136 383.645 590.010
[3,] 1264.368 590.010 907.380
> det(A1)
[1] 1097.704
> A<-A1-mx
> S<-A/(N-1)
> S
x1 x2 x3
x1 -35.84810 -17.26192 -26.09216
x2 -17.26192 -8.43550 -12.70780
x3 -26.09216 -12.70780 -18.90160
> mu<-c(10,5,10)
> S.inv<-solve(S) #to calculate inverse
> Tsq<-N*((t(x.bar)-mu)%*%S.inv%*%(x.bar-mu)) #test statistic under H0
> Tsq
[,1]
[1,] 1128.303
> F<-2.8023
> T.tab<-(p*(N-1))/(N-p)*F
> T.tab
[1] 3.12766
Second-
> #2
> z<-c(1,-1,0,1,0,-1)
> C1<-matrix(data=z,nrow=2,ncol=3,byrow=T)
> C1
[,1] [,2] [,3]
[1,] 1 -1 0
[2,] 1 0 -1
> Y.bar<-C1%*%x.bar
> Y.bar
[,1]
[1,] 3.166
[2,] 1.676
> S1<-(C1%*%A%*%t(C1))/(N-1)
> S1
[,1] [,2]
[1,] -9.958935 -5.307976
[2,] -5.307976 -2.617731
> S1.inv<-solve(S1)
> S1.inv
[,1] [,2]
[1,] 1.243698 -2.521848
[2,] -2.521848 4.731544
> T.sq<-N*(t(Y.bar)%*%S1.inv%*%Y.bar)
> T.sq
[,1]
[1,] -50.29351
Date: 22-Jan-2018
Experiment No. 2
Four characteristics: head length of 1st son (X1), head breadth of 1st son (X2), head length of
2nd son (X3), head breadth of 2nd son (X4) were measured in a sample of 25 families and the
following results were obtained.
Test the hypothesis that the mean head length and breadth of the 1st son and the 2nd son are equal at
significance level 0.01. State the assumptions, if any.
Report Page
Object: To test the hypothesis that the mean head length and breadth of the 1st son and
the 2nd son are equal at significance level 0.01.
Procedure and formula used:
1. The hypothesis of equal mean head length and breadth for the two sons can be written
as Ho: Cμ = 0 for a suitable contrast matrix C.
2. Compute ȳ = C x̄, S = C A C′ / (N − 1) and the statistic T² = N ȳ′ S⁻¹ ȳ.
3. Reject Ho if T² exceeds the tabulated value obtained from the F distribution.
Result:
T2cal= 1373.355 and T2tab= 17.54657
→ T²cal > T²tab
Thus, the Hypothesis that the mean head length and breadth of 1st son and 2nd son are
equal will be rejected.
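The contrast computation used in this experiment can be sketched as follows. This Python fragment is illustrative only (the tiny helpers and the one-contrast toy inputs are hypothetical, not the head-measurement data): it evaluates T² = N (Cx̄)′(CSC′)⁻¹(Cx̄).

```python
def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A small and nonsingular)."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [v - M[r][c] * w for v, w in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def contrast_tsq(C, x_bar, S, N):
    """T^2 for H0: C mu = 0, i.e. N * (C x_bar)' (C S C')^{-1} (C x_bar)."""
    y = [sum(c * x for c, x in zip(row, x_bar)) for row in C]   # C x_bar
    Sy = matmul(matmul(C, S), transpose(C))                     # C S C'
    z = solve(Sy, y)                                            # Sy^{-1} y
    return N * sum(yi * zi for yi, zi in zip(y, z))

# Toy check: p = 2, one contrast (1, -1), S = identity, N = 4.
C = [[1.0, -1.0]]
print(contrast_tsq(C, [3.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], 4))
```

In the transcript below, the same quantities appear as `ybar<-c%*%Xbar`, `S<-(c%*%A%*%t(c))/(N-1)` and `T2<-N*t(ybar)%*%Sinv%*%ybar`.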
Calculation
> N<-25
> p<-4
> X<-matrix(data=c(0.9529,0.5287,0.6966,0.4611,0.5287,0.5436,0.5131,0.3503,
+ 0.6966,0.5131,1.0081,0.5654,0.4611,0.3505,0.5654,0.4502),nrow=4)
> X
[,1] [,2] [,3] [,4]
[1,] 0.9529 0.5287 0.6966 0.4611
[2,] 0.5287 0.5436 0.5131 0.3505
[3,] 0.6966 0.5131 1.0081 0.5654
[4,] 0.4611 0.3503 0.5654 0.4502
> Xbar<-matrix(data=c(18.57,15.11,18.38,14.92),nrow=4)
> Xbar
[,1]
[1,] 18.57
[2,] 15.11
[3,] 18.38
[4,] 14.92
> c<-matrix(data=c(1,-1,0,0,0,1,-1,0,0,0,1,-1),nrow=3)
> c
[,1] [,2] [,3] [,4]
[1,] 1 0 -1 0
[2,] -1 0 0 1
[3,] 0 1 0 -1
> ybar<-c%*%Xbar
> ybar
[,1]
[1,] 0.19
[2,] -3.65
[3,] 0.19
> A<-(N-1)*X
> A
[,1] [,2] [,3] [,4]
[1,] 22.8696 12.6888 16.7184 11.0664
[2,] 12.6888 13.0464 12.3144 8.4120
[3,] 16.7184 12.3144 24.1944 13.5696
[4,] 11.0664 8.4072 13.5696 10.8048
> S<-(c%*%A%*%(t(c))/(N-1)) # sample covariance
> S
[,1] [,2] [,3]
[1,] 0.5678 -0.3606 0.1199
[2,] -0.3606 0.4809 -0.1675
[3,] 0.1199 -0.1673 0.2930
> Sinv<-solve(S)
> Sinv
[,1] [,2] [,3]
[1,] 3.3639280 2.550834 0.08167142
[2,] 2.5501194 4.529386 1.54577749
[3,] 0.0795222 1.542393 4.26217123
> T2<- N*t(ybar)%*%Sinv%*%ybar # calculating t- sq statistic
> T2
[,1]
[1,] 1373.618
> F<-4.938193382
> Ttab<-((N-1)*(p-1)*F/(N-p-1))
> Ttab
[1] 17.7775
>
Experiment 3
a) The process engineer of ABC Company is interested in understanding the features of the process
which produces product XYZ. She observed the following four characteristics of the finished products
coming out of the process at one of their production plants:
b) The following table gives the estimates of the means and the common dispersion matrix of three
characters, x1 = way of living; x2 = way of eating; x3 = way of talking, of two groups of females, one
living in villages and the other in cities.
Character Mean (Villages) Mean (Cities)
x1 72.20 76.32
x2 30.56 30.28
x3 21.44 21.64
The dispersion matrix is
        x1      x2      x3
S = 18.9400  2.2488  5.8740
     2.2488   .5652   .8700
     5.8740   .8700  2.2848
Test for the significance in the mean vector of the two groups of females.
Report Page
Object:
(a) To test the hypothesis that μ= (20, 275, 300, 50) at 5% level of significance.
(b) To test the significance of the difference in mean vector of two groups of females.
Formula Used/Procedure:
a) For Ho: μ = (20, 275, 300, 50)′ at the 5% level of significance, compute
T² = N (x̄ − μ)′ S⁻¹ (x̄ − μ)
and compare it with the tabulated value
T²tab = [(N − 1)p / (N − p)] · F(α; p, N − p).
Now compare the tabulated value with the calculated value of T²:
if calculated > tabulated, reject the null hypothesis;
if calculated < tabulated, accept the null hypothesis.
b) For the difference of the two mean vectors, compute
T² = [n1 n2 / (n1 + n2)] (x̄1 − x̄2)′ S⁻¹ (x̄1 − x̄2)
and compare it with the tabulated value
T²tab = [(n1 + n2 − 2)p / (n1 + n2 − p − 1)] · F(α; p, n1 + n2 − p − 1).
Now compare the tabulated value with the calculated value of T²:
if calculated > tabulated, reject the null hypothesis;
if calculated < tabulated, accept the null hypothesis.
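The two-sample statistic in part b) can be sketched in Python as a cross-check; the toy means and identity dispersion matrix below are illustrative only, not the village/city data.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A small and nonsingular)."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [v - M[r][c] * w for v, w in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def two_sample_tsq(x1_bar, x2_bar, S, n1, n2):
    """T^2 = (n1*n2/(n1+n2)) * d' S^{-1} d, with d = x1_bar - x2_bar."""
    d = [a - b for a, b in zip(x1_bar, x2_bar)]
    z = solve(S, d)                        # S^{-1} d
    return (n1 * n2 / (n1 + n2)) * sum(di * zi for di, zi in zip(d, z))

# Toy check: S = identity, d = (1, 1), n1 = n2 = 2 gives (4/4) * 2 = 2.
print(two_sample_tsq([1.0, 1.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 2, 2))
```

This mirrors the R line `Tsquare<-(n1*n2*(t(villagem-citym))%*%Sinverse%*%(villagem-citym))/(n1+n2)` used in the transcript.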
Calculation
> N<-20
> p<-4
> x<-read.table("D:/amitr.csv",sep = ",",header = T)
> m<-as.matrix(x)
> m
Units X.X1. X.X2. X.X3. X.X4.
[1,] 1 23.0 276 289.6 51.0
[2,] 2 22.0 281 289.0 51.7
[3,] 3 22.8 270 288.2 51.3
[4,] 4 22.1 278 288.0 52.3
[5,] 5 22.5 275 288.0 53.0
[6,] 6 22.2 273 288.0 51.0
[7,] 7 22.0 275 290.0 53.0
[8,] 8 22.1 268 289.0 54.0
[9,] 9 22.5 277 289.0 52.0
[10,] 10 22.5 278 289.0 52.0
[11,] 11 22.3 269 287.0 54.0
[12,] 12 21.8 274 287.6 52.0
[13,] 13 22.3 270 288.4 51.0
[14,] 14 22.2 273 290.2 51.0
[15,] 15 22.1 274 286.0 51.0
[16,] 16 22.1 277 287.0 52.0
[17,] 17 21.8 277 287.0 51.0
[18,] 18 22.6 276 290.0 51.0
[19,] 19 22.3 278 287.0 51.7
[20,] 20 23.0 266 289.1 51.0
> x1bar<-sum(m[,1])/20
> x1bar
[1] 10.5
> x2bar<-sum(m[,2])/20
> x2bar
[1] 22.31
> x3bar<-sum(m[,3])/20
> x3bar
[1] 274.25
> x4bar<-sum(m[,4])/20
> x4bar
[1] 288.355
> a11<-sum(m[,1]*m[,1])
> a11
[1] 2870
> a12<-sum(m[,1]*m[,2])
> a12
[1] 4681.6
> a13<-sum(m[,1]*m[,3])
> a13
[1] 57515
> a14<-sum(m[,1]*m[,4])
> a14
[1] 60521.4
> a22<-sum(m[,2]*m[,2])
> a22
[1] 9957.02
> a23<-sum(m[,2]*m[,3])
> a23
[1] 122362.4
> a24<-sum(m[,2]*m[,4])
> a24
[1] 128666.9
> a33<-sum(m[,3]*m[,3])
> a33
[1] 1504553
> a34<-sum(m[,3]*m[,4])
> a34
[1] 1581624
> a44<-sum(m[,4]*m[,4])
> a44
[1] 1662999
> m2<-matrix(data =
c(a11,a12,a13,a14,a12,a22,a23,a24,a13,a23,a33,a34,a14,a24,a34,a44),nrow
= 4,ncol = 4,byrow = T)
> m2
[,1] [,2] [,3] [,4]
[1,] 2870.0 4681.60 57515.0 60521.4
[2,] 4681.6 9957.02 122362.4 128666.9
[3,] 57515.0 122362.40 1504553.0 1581624.2
[4,] 60521.4 128666.90 1581624.2 1662998.6
> xbar<-matrix(data = c(x1bar,x2bar,x3bar,x4bar), nrow=4, byrow=F)
> xbar
[,1]
[1,] 22.310
[2,] 274.250
[3,] 288.355
[4,] 51.850
> A<-m2-(N*(xbar%*%t(xbar)))
> A
[,1] [,2] [,3] [,4]
[1,] -7084.722 -117688.8 -71149.0 37385.93
[2,] -117688.750 -1494304.2 -1459264.8 -155730.35
[3,] -71149.001 -1459264.8 -158419.1 1282600.06
[4,] 37385.930 -155730.3 1282600.1 1609230.12
> s<-A/(N-1)
> s
[,1] [,2] [,3] [,4]
[1,] -372.8801 -6194.145 -3744.684 1967.681
[2,] -6194.1447 -78647.591 -76803.409 -8196.334
[3,] -3744.6843 -76803.409 -8337.848 67505.267
[4,] 1967.6805 -8196.334 67505.267 84696.322
> mu<-matrix(data = c(20,275,300,50), nrow = 4,byrow = F)
> mu
[,1]
[1,] 20
[2,] 275
[3,] 300
[4,] 50
> sinverse<-solve(s)
> sinverse
[,1] [,2] [,3] [,4]
[1,] 0.028698505 -0.005422065 0.003677707 -0.004122672
[2,] -0.005422065 0.044996654 -0.050419536 0.044666175
[3,] 0.003677707 -0.050419536 0.056684887 -0.050144094
[4,] -0.004122672 0.044666175 -0.050144094 0.044396277
> Tsq<-N*(t(xbar-mu)%*%sinverse%*%(xbar-mu))
> Tsq
[,1]
[1,] 179.1756
> F<-3.0069
> Ttab<-(((N-1)*p)/(N-p))*F
> Ttab
[1] 14.28277
>
Second Part –
>villagem<-matrix(data = c(72.20,30.56,21.44),nrow = 3,byrow = F)
>villagem
[,1]
[1,] 72.20
[2,] 30.56
[3,] 21.44
>citym<-matrix(data = c(76.32,30.28,21.64),nrow = 3,byrow = F)
>citym
[,1]
[1,] 76.32
[2,] 30.28
[3,] 21.64
>s<-matrix(data =
c(18.9400,2.2488,5.8740,2.2488,0.5652,0.8700,5.8740,0.8700,2.848),byrow =
T,nrow = 3)
>s
[,1] [,2] [,3]
[1,] 18.9400 2.2488 5.874
[2,] 2.2488 0.5652 0.870
[3,] 5.8740 0.8700 2.848
> Sinverse<-solve(s)
> Sinverse
[,1] [,2] [,3]
[1,] 0.1629929 -0.2473598 -0.2606101
[2,] -0.2473598 3.7150238 -0.6246767
[3,] -0.2606101 -0.6246767 1.0794566
> n1<-22
> n2<-62
> p<-3
> Tsquare<-(n1*n2*(t(villagem-
citym))%*%Sinverse%*%(villagem-citym))/(n1+n2)
> Tsquare
[,1]
[1,] 53.78594
> F<-2.718
> Tsqtab<-((p*(n1+n2-2))/(n1+n2-p-1))*F
> Tsqtab
[1] 8.35785
>
Date: 31/01/2018
Experiment-4
Population I: Haltica oleracea, n1 = 19, x̄1 = (267.05, 137.37, 185.95)′
Population II: Haltica carduorum, n2 = 20, x̄2 = (290.80, 157.20, 209.25)′
(i) Test for the significance of difference in the mean vectors of two species.
(ii) Obtain Fisher’s linear discriminant function and classify the observation vector (270.51, 150.12,
190.74) into one of the two groups or populations.
(iii) Test the adequacy of the assigned discriminant function 3x1 – 2x2 + x3 at level α = 0.05.
(iv) Are x1 and x2 alone sufficient for discrimination?
Report Page
Object: 1) Test for the significance of the difference in the mean vectors of two species.
2) Obtain Fisher’s linear discriminant function and classify the observation vector (270.51,
150.12, 190.74) into one of the two groups or populations.
Result :
1. Ho: There is no significant difference in the mean vectors of the two species, i.e., the two
means are equal.
Since T²cal = 45.8794 > T²tab = 9.115003, we reject the null hypothesis.
Therefore, there is a significant difference in the mean vectors of the two species, i.e., the two means are not
equal.
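The classification step in part 2 assigns an observation x0 to population I when (x̄1 − x̄2)′S⁻¹x0 − ½(x̄1 − x̄2)′S⁻¹(x̄1 + x̄2) > 0. A minimal Python sketch of this rule follows; the 2-variable means and identity S are toy values, not the beetle data.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A small and nonsingular)."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [v - M[r][c] * w for v, w in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def discriminant_score(x1_bar, x2_bar, S, x0):
    """Fisher score: positive => classify x0 into population I."""
    d = [a - b for a, b in zip(x1_bar, x2_bar)]
    w = solve(S, d)                                   # S^{-1} (x1_bar - x2_bar)
    mid = [(a + b) / 2.0 for a, b in zip(x1_bar, x2_bar)]
    return sum(wi * (xi - mi) for wi, xi, mi in zip(w, x0, mid))

# Toy check: means (1, 0) and (-1, 0), S = identity; x0 = (0.5, 0) scores
# positive, so it would be assigned to population I.
print(discriminant_score([1.0, 0.0], [-1.0, 0.0],
                         [[1.0, 0.0], [0.0, 1.0]], [0.5, 0.0]))
```

The transcript's `b` is exactly this score for the given observation vector; its positive sign is what drives the classification.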
Calculations
1.)
>n1<-19
>n2<-20
>p<-3
>X1_bar<-matrix(data=c(267.05,137.37,185.95),nrow=3)
>X1_bar
[,1]
[1,] 267.05
[2,] 137.37
[3,] 185.95
>X2_bar<-matrix(data=c(290.80,157.20,209.25),nrow=3)
>X2_bar
[,1]
[1,] 290.80
[2,] 157.20
[3,] 209.25
>S<-matrix(data=c(367.79,121.88,106.24,121.88,118.31,42.06,106.24,42.06,208.07),nrow=3) #Sample covariance
>S
[,1] [,2] [,3]
[1,] 367.79 121.88 106.24
[2,] 121.88 118.31 42.06
[3,] 106.24 42.06 208.07
>S_inv<-solve(S)
>S_inv
[,1] [,2] [,3]
[1,] 0.004509824 -0.0041236181 -0.0014691418
[2,] -0.004123618 0.0128773030 -0.0004975546
[3,] -0.001469142 -0.0004975546 0.0056567923
>T_sq<-n1*n2*(t(X1_bar-X2_bar)%*%S_inv%*%(X1_bar-X2_bar))/(n1+n2) #Hotelling T sq
>T_sq
[,1]
[1,] 45.8794
>F=2.874187 #5% CI, p=3,(n1+n2-p-1)=35 d.f.
>T_tab<-(p*(n1+n2-2))/((n1+n2-p-1))*F
>T_tab
[1] 9.115003
2.)
>x1mean<-matrix(data =c(267.05,137.37,185.95),byrow = FALSE,nrow=3 )
>x1mean
[,1]
[1,] 267.05
[2,] 137.37
[3,] 185.95
>x2mean<-matrix(data=c(290.80,157.20,209.25),byrow=FALSE,nrow = 3)
>x2mean
[,1]
[1,] 290.80
[2,] 157.20
[3,] 209.25
>xo=matrix(data = c(270.51,150.12,190.70),byrow = FALSE,nrow = 3)
>s<-matrix(data=c(367.79,121.88,106.24,121.88,118.31,42.06,106.24,42.06,208.07),byrow =
TRUE,nrow = 3)
>s
[,1] [,2] [,3]
[1,] 367.79 121.88 106.24
[2,] 121.88 118.31 42.06
[3,] 106.24 42.06 208.07
>sinverse=solve(s)
>sinverse
[,1] [,2] [,3]
[1,] 0.004509824 -0.0041236181 -0.0014691418
[2,] -0.004123618 0.0128773030 -0.0004975546
[3,] -0.001469142 -0.0004975546 0.0056567923
>b=((t(x1mean-x2mean))%*%sinverse%*%xo)-((1/2)*(t(x1mean-x2mean))%*%sinverse%*%(x1mean+x2mean))
>b
[,1]
[1,] 0.1123424
Since b > 0, the observation is classified into the first population (Haltica oleracea).
Date:06- Feb-2018
Experiment 5
ABC company owns a large nine-building complex in Central America and heats this complex
using a modern coal-fuelled heating system. The company is facing problems in determining the proper
amount of coal to be ordered each week to heat the complex adequately for the next week. You are
approached by the company to develop a regression model to predict the amount of coal (in tons) that
should be ordered each week to heat the complex adequately for the next week, on the basis of the
following data:
b) Estimate the parameters of the model and write down the fitted model.
c) Test for the significance of the average hourly temperature in predicting the amount of coal
ordered at 5% level of significance.
e) Predict the amount of coal that should be ordered when the value of the average hourly temperature is 50
degrees Fahrenheit.
f) What will be the change in the amount of coal to be ordered when the average hourly temperature
increases by one degree Fahrenheit?
Report page
Result:
1. The regression model can be written as :
Y=β0+X1β1+ε
The assumptions of the regression model are as follows –
i. The regression model is linear in parameters.
ii. The mean of residuals is zero.
iii. Homoscedasticity of residuals or equal variance.
iv. No auto correlation of residuals.
v. The X variables and residuals are uncorrelated.
vi. The number of observations must be greater than the number of Xs.
vii. The variability in X values is positive.
viii. The regression model is correctly specified.
ix. No perfect multicollinearity.
x. Normality of residuals.
2. The fitted model is as follows:
Y=Intercept + β*X
The estimates of the parameters, i.e., intercept (a) = 15.83786 and β = (-0.1279)
→ Y= 15.83786-0.1279*X
3. Here amount of coal ordered(Y) is the dependent variable and Average hourly
temperature(X) is the predictor variable. Regression model can be written as :
Y= a+ Xβ + ε
p-value = 0.0003301; as the p-value is much less than 0.05, we reject the null
hypothesis that β = 0. Hence there is a significant relationship between the variables in the
linear regression model of the data set.
4. Confidence interval for β1: (−0.2558, 2.688e−05)
Confidence interval for β0: (−0.0003, 31.676)
5. The amount of coal that should be ordered when the average hourly temperature is 50 degrees
Fahrenheit = 9.44186
6. The change in the amount of coal to be ordered when the average hourly temperature increases
by one degree Fahrenheit is given by the regression coefficient alone,
i.e., a decrease of 0.1279 tons.
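The least-squares fit behind the R call `lm(m[,2]~m[,1])` can be sketched with the closed-form formulas b1 = Sxy/Sxx and b0 = ȳ − b1 x̄. The toy points below lie on an exact line and are illustrative only, not the coal data.

```python
def simple_ols(x, y):
    """Closed-form simple linear regression: returns (b0, b1)."""
    n = len(x)
    xm = sum(x) / n
    ym = sum(y) / n
    sxx = sum((xi - xm) ** 2 for xi in x)
    sxy = sum((xi - xm) * (yi - ym) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ym - b1 * xm
    return b0, b1

def predict(b0, b1, x_new):
    return b0 + b1 * x_new

# Noiseless points on y = 3 - 0.5x are recovered exactly.
b0, b1 = simple_ols([0.0, 1.0, 2.0, 3.0], [3.0, 2.5, 2.0, 1.5])
print(b0, b1, predict(b0, b1, 10.0))
```

Note that increasing x by one unit changes the prediction by exactly b1, which is the point of result 6 above.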
CALCULATION
> x<-read.table("C:/Users/dell/Desktop/D.csv",sep = ",", header = FALSE)
> m<-as.matrix(x)
> m
V1 V2
[1,] 28.0 12.4
[2,] 28.0 11.7
[3,] 32.5 12.4
[4,] 39.0 10.8
[5,] 45.9 9.4
[6,] 57.8 9.5
[7,] 58.1 8.0
[8,] 62.5 7.5
> reg<-lm(m[,2]~m[,1])
> reg
Call:
lm(formula = m[, 2] ~ m[, 1])
Coefficients:
(Intercept) m[, 1]
15.8379 -0.1279
> summary(reg)
Call:
lm(formula = m[, 2] ~ m[, 1])
Residuals:
Min 1Q Median 3Q Max
-0.5663 -0.4432 -0.1958 0.2879 1.0560
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 15.83786    0.80177  19.754 1.09e-06 ***
m[, 1]      -0.12792    0.01746  -7.328  0.00033 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> cibeeta0l<-(15.83786)+(19.754*0.80177)
> cibeeta0l
[1] 31.67602
> cibeeta0u<-(15.83786)-(19.754*0.80177)
> cibeeta0u
[1] -0.00030458
>
Date:- 06/02/2018
Experiment 6
Wildcats are wells drilled to find and produce oil and/or gas in an unproved area, to find a new
reservoir in a field previously found to be productive of oil or gas, or to extend the limit of a known oil or
gas reservoir. The table gives data related to wildcat activity in Iran.
1 8 4.8 4.8
2 9 4.8 4.9
3 10 4.6 5.3
4 12 4.4 5.7
5 13 4.3 5.9
6 13 4.5 6.2
7 13 4.6 6.1
8 14 4.5 6.5
9 16 4.4 6.6
10 14 4.7 6.8
Y = β0 + β1X1 + β2X2 + ε
Report Page
Result:
1. The fitted model is as follows:
Y = Intercept + β1X1 + β2X2
Y = −0.6794 + (−1.4276)X1 + 3.2635X2
and the estimates of the parameters are
intercept (β0) = −0.6794, β1 = −1.4276 and β2 = 3.2635.
2. 90% C.I.s:
β0 lies between −1.358509 and −0.0002908;
β1 lies between −2.855795 and 0.0005945;
β2 lies between −0.0003398 and 6.52734.
3. p-value = 7.581e-05. As the p-value is much less than 0.05, we reject the null hypothesis
that β = 0. Hence there is a significant relationship between the variables in the linear
regression model of the data set and the regression model statistically significantly predicts
the outcome variable (i.e., it is a good fit for the data).
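A two-sided confidence interval for a coefficient is estimate ± t_crit × SE, where t_crit is the t quantile for the residual degrees of freedom (not the t statistic printed by `summary()`). A minimal Python sketch follows; the β2 estimate 3.2635 and SE 0.3991 come from the transcript below, while the critical value 1.895 (a 90% two-sided t quantile for 7 d.f.) is an assumed illustration.

```python
def coef_ci(estimate, std_error, t_crit):
    """Two-sided CI for a regression coefficient: estimate +/- t_crit * SE."""
    half = t_crit * std_error
    return estimate - half, estimate + half

# Sanity check: estimate 0, SE 1, t_crit 2 gives (-2, 2).
print(coef_ci(0.0, 1.0, 2.0))

# Illustrative 90% CI for beta2 using the transcript's estimate and SE.
lo, hi = coef_ci(3.2635, 0.3991, 1.895)
print(round(lo, 4), round(hi, 4))
```

In R the critical value would come from `qt()`; multiplying by the printed t statistic instead of the quantile produces intervals with one endpoint near zero, as seen in the listed C.I.s.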
CALCULATION
>x<-read.table("C:/Users/dell/Desktop/rajamit.CSV", sep = ",", header = TRUE)
> m<-as.matrix(x)
> m
Y X1 X2
[1,] 8 4.8 4.8
[2,] 9 4.8 4.9
[3,] 10 4.6 5.3
[4,] 12 4.4 5.7
[5,] 13 4.3 5.9
[6,] 13 4.5 6.2
[7,] 13 4.6 6.1
[8,] 14 4.5 6.5
[9,] 16 4.4 6.6
[10,] 14 4.7 6.8
> reg<-lm(m[,1]~m[,2]+m[,3])
> reg
Call:
lm(formula = m[, 1] ~ m[, 2] + m[, 3])
Coefficients:
(Intercept) m[, 2] m[, 3]
11.266 -3.617 2.964
> summary(reg)
Call:
lm(formula = m[, 1] ~ m[, 2] + m[, 3])
Residuals:
Min 1Q Median 3Q Max
-0.4205 -0.3158 -0.2225 0.1868 1.0871
Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  11.2662     6.7810   1.661   0.1406    
m[, 2]       -3.6173     1.2521  -2.889   0.0233 *  
m[, 3]        2.9641     0.3068   9.660 2.69e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> cibeeta1u<-(-1.4276)+(-0.877*1.6285)
> cibeeta1u
[1] -2.855795
> cibeeta1l<-(-1.4276)-(-0.877*1.6285)
> cibeeta1l
[1] 0.0005945
> cibeeta0u<-(-0.6794)+(-0.077*8.8196)
> cibeeta0u
[1] -1.358509
> cibeeta0l<-(-0.6794)-(-0.077*8.8196)
> cibeeta0l
[1] -0.0002908
> cibeeta2u<-(3.2635)+(8.178*0.3991)
> cibeeta2u
[1] 6.52734
> cibeeta2l<-(3.2635)-(8.178*0.3991)
> cibeeta2l
[1] -0.0003398
>
Date: 12-02-2018
Experiment -7
The following measurements were taken on two trivariate normal populations:
Sample 1 Sample 2
X1 X2 X3 Y1 Y2 Y3
4.8 3.4 1.6 5.6 2.9 3.6
5.4 3.7 1.5 6.1 2.9 3.7
4.9 3.1 1.5 6 2.2 4
4.4 2.9 1.4 5.9 3 4.2
5 3.4 1.5 5 2 3.5
5.1 3.5 1.4 7 3.2 4.7
4.9 3 1.4 6.4 3.2 4.5
4.7 3.2 1.3 6.9 3.1 4.9
4.6 3.1 1.5 5.5 2.3 4
5 3.6 1.4 6.5 2.8 4.6
5.4 3.9 1.7 5.2 2.7 3.9
4.6 3.4 1.4 6.6 2.9 4.6
4.9 2.4 3.3
6.3 3.3 4.7
5.7 2.8 4.5
Test the hypothesis that the samples come from two trivariate normal populations with the
same mean (assume the variance-covariance matrices of the two trivariate normal populations are the
same).
Report Page
Object: To test the hypothesis that the samples come from two trivariate normal populations
with the same mean (assume the variance-covariance matrices of the two trivariate normal
populations are the same).
Result:
Here we get
T²cal = 760.9092
T²tab = 9.873908
Since T²cal > T²tab, we reject the null hypothesis at the 5% level of significance and conclude
that the samples do not come from two trivariate normal populations with the same mean, i.e.,
μ1 ≠ μ2.
Calculations
> ZX
[,1] [,2] [,3]
[1,] 289.16 197.80 86.42
[2,] 197.80 135.66 59.14
[3,] 86.42 59.14 25.94
> A1<-ZX-(n1*(x_bar%*%t(x_bar)))
> A1
X1 X2 X3
X1 1.04 0.82 0.1800000
X2 0.82 0.99 0.1800000
X3 0.18 0.18 0.1266667
> S1<-A1/(n1-1)
> S1
X1 X2 X3
X1 0.09454545 0.07454545 0.01636364
X2 0.07454545 0.09000000 0.01636364
X3 0.01636364 0.01636364 0.01151515
>
> #calculation of sum of square matrix for sample2
> b11<-sum(Z2[,1]*Z2[,1])
> b11
[1] 541.24
> b12<-sum(Z2[,1]*Z2[,2])
> b12
[1] 251.64
> b13<-sum(Z2[,1]*Z2[,3])
> b13
[1] 378.49
> b22<-sum(Z2[,2]*Z2[,2])
> b22
[1] 118.07
> b23<-sum(Z2[,2]*Z2[,3])
> b23
[1] 176.18
> b33<-sum(Z2[,3]*Z2[,3])
> b33
[1] 265.65
> ZY<-matrix(data = c(b11,b12,b13,b12,b22,b23,b13,b23,b33),nrow = 3,ncol = 3,byrow = T)
> ZY
[,1] [,2] [,3]
[1,] 541.24 251.64 378.49
[2,] 251.64 118.07 176.18
[3,] 378.49 176.18 265.65
> A2<-ZY-(n2*(y_bar%*%t(y_bar)))
> A2
Y1 Y2 Y3
Y1 6.029333 2.552 3.962
Y2 2.552000 2.144 1.874
Y3 3.962000 1.874 3.564
> S2<-A2/(n2-1)
> S2
Y1 Y2 Y3
Y1 0.4306667 0.1822857 0.2830000
Y2 0.1822857 0.1531429 0.1338571
Y3 0.2830000 0.1338571 0.2545714
> S_combined<-(((n1-1)*S1)+((n2-1)*S2))/(n1+n2-2)
> S_combined
X1 X2 X3
X1 0.2827733 0.13488 0.1656800
X2 0.1348800 0.12536 0.0821600
X3 0.1656800 0.08216 0.1476267
> S_inverse<-solve(S_combined)
> S_inverse
X1 X2 X3
X1 13.519734 -7.244606 -11.141163
X2 -7.244606 16.439384 -1.018606
X3 -11.141163 -1.018606 19.844359
> #Calculation of Hotelling T-square statistic
> t_sq<-((n1*n2)*(t(x_bar-y_bar)%*%S_inverse%*%(x_bar-y_bar)))/(n1+n2)
> t_sq
[,1]
[1,] 760.9092
> F<-3.027998384 #F(alpha=0.05,p,n1+n2-p-1)
> T_tab<-((n1+n2-2)*p*F)/(n1+n2-p-1) #Tabulated value of T_sq
> T_tab
[1] 9.873908
Date : 14/02/2018
Experiment No. 8
The researchers were interested in comparing three strategies for teaching reading
comprehension to fourth-grade students (our “analysis units”). One strategy was to teach students a
number of reading comprehension monitoring strategies. This approach was called “Think Aloud”
(TA). A second strategy was labeled “Directed Reading and Thinking Activity” (DRTA) which required
students to make predictions and evaluate their predictions as they read stories. The third strategy,
labeled “Directed Reading Activity” (DRA) was an instructed control condition using a common
approach to teaching reading comprehension. Following the intervention period, measures on three
outcome variables were obtained. The first variable was “Error Detection Task” (EDT, Y1) where
students were asked to identify inconsistencies (errors) in a story passage. The second variable was
measured via the “Degrees of Reading Power” (DRP, Y2), a standardized test of reading
comprehension. The third variable was based on a comprehension monitoring questionnaire that
asked students questions on the strategies they used while reading to increase their comprehension.
Scores on the Error Detection Task (Y1) and Degrees of Reading Power (Y2) for the Think Aloud (TA),
Directed Reading Activity (DRA) and Directed Reading and Thinking Activity (DRTA) groups are given below:
TA (Y1, Y2): (4,43) (4,34) (4,45) (3,39) (8,40) (1,27) (7,46) (7,39) (9,31) (6,39) (4,40)
(12,52) (14,53) (12,53) (7,41) (5,41) (9,46) (13,52) (11,55) (5,36) (11,50) (15,54)
DRTA (Y1, Y2): (6,27) (6,36) (5,51) (5,51) (0,50) (6,55) (6,52) (11,48) (6,53) (8,45) (8,47)
(3,51) (7,30) (7,50) (6,55) (9,48) (7,52) (6,46) (7,36) (6,45) (6,49) (6,49)
DRA (Y1, Y2): (5,34) (9,36) (5,42) (7,37) (4,44) (9,49) (3,38) (4,38) (2,38) (5,50) (7,31)
(8,49) (10,54) (9,52) (12,50) (5,35) (8,36) (12,46) (4,42) (8,47) (6,39) (5,38)
Test whether the centroids of the three populations are identical.
Report Page
Object- To check whether the centroids of the three populations are identical.
Procedure and formula used-
1. Here , the hypothesis to be tested is –
Ho: μ1 = μ2 = μ3
where μ1 is the population mean vector using the 1st strategy (TA), μ2 is the
population mean vector using the 2nd strategy (DRTA) and μ3 is the population mean vector
using the 3rd strategy (DRA) for teaching reading comprehension to fourth-grade students.
2. Since here we have more than two population mean vectors, we use multivariate
analysis for testing our null hypothesis.
3. First we calculate the error sum of squares and cross-products matrix E and the
matrix for the hypothesis H, using the formulas
E = A + B + C
where A = ∑XX′ − N1X̄X̄′, B = ∑YY′ − N2Y̅Y̅′ and C = ∑ZZ′ − N3Z̄Z̄′.
4. Then calculate the grand mean centroid by the formula
m̄ = (X̄ + Y̅ + Z̄) / 3
and take the deviations of the group centroids from the grand mean centroid.
5. Then the SSCP matrix for each group is calculated as
nj (ȳj − m̄)(ȳj − m̄)′, where ȳj is used as the general term for a group centroid.
6. H is calculated by summing the SSCP matrices:
H = ∑ SSCPj, j = 1, 2, 3.
7. Then, by Wilks’ criterion, we test the hypothesis using
λ = |E| / |H + E|
Result-
From all criteria we conclude that the calculated value is greater than the tabulated value, so
we reject the null hypothesis, i.e., the centroids of the populations are not the same.
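The Wilks' criterion in step 7 can be sketched as follows; the 2×2 determinant matches the two-variable case of this experiment, but the E and H matrices below are toy values, not the computed ones.

```python
def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def wilks_lambda(E, H):
    """lambda = |E| / |H + E| for the 2x2 case."""
    HE = [[E[i][j] + H[i][j] for j in range(2)] for i in range(2)]
    return det2(E) / det2(HE)

# With no between-group dispersion (H = 0), lambda = 1; a larger H pushes
# lambda toward 0, which is evidence against equal centroids.
E = [[2.0, 0.0], [0.0, 2.0]]
H = [[2.0, 0.0], [0.0, 2.0]]
print(wilks_lambda(E, H))
```

In the R transcript the same quantity is computed as `lambda<-(det(E)/det(H+E))`.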
Calculation
> X<-read.table("C:/Users/hp/Desktop/Book8(1).csv",header=TRUE,sep=",")
#sample group of TA
>X
X1 X2
1 4 43
2 4 34
3 4 45
4 3 39
5 8 40
6 1 27
7 7 46
8 7 39
9 9 31
10 6 39
11 4 40
12 12 52
13 14 53
14 12 53
15 7 41
16 5 41
17 9 46
18 13 52
19 11 55
20 5 36
21 11 50
22 15 54
>
> Y<-read.table("C:/Users/hp/Desktop/Book8(2).csv",header=TRUE,sep=",")
#sample group of DRTA
>Y
Y1 Y2
1 6 27
2 6 36
3 5 51
4 5 51
5 0 50
6 6 55
7 6 52
8 11 48
9 6 53
10 8 45
11 8 47
12 3 51
13 7 30
14 7 50
15 6 55
16 9 48
17 7 52
18 6 46
19 7 36
20 6 45
21 6 49
22 6 49
>
> Z<-read.table("C:/Users/hp/Desktop/Book8(3).csv",header=TRUE,sep=",")
#sample group of DRA
>Z
Z1 Z2
1 5 34
2 9 36
3 5 42
4 7 37
5 4 44
6 9 49
7 3 38
8 4 38
9 2 38
10 5 50
11 7 31
12 8 49
13 10 54
14 9 52
15 12 50
16 5 35
17 8 36
18 12 46
19 4 42
20 8 47
21 6 39
22 5 38
>
> N1<-22 #size of sample 1
> N1
[1] 22
> N2<-22 #size of sample 2
> N2
[1] 22
> N3<-22 #size of sample 3
> N3
[1] 22
> p<-2 #no. of variables
>p
[1] 2
>
>X.bar<-colMeans(X,na.rm=FALSE,dim=1) #sample mean vector for TA group
>X.bar
X1 X2
7.772727 43.454545
>Y.bar<-colMeans(Y,na.rm=FALSE,dim=1) #sample mean vector for DRTA group
>Y.bar
Y1 Y2
6.227273 46.636364
>Z.bar<-colMeans(Z,na.rm=FALSE,dim=1) #sample mean vector for DRA group
>Z.bar
Z1 Z2
6.681818 42.045455
>
> a11<-sum(X[,1]*X[,1])
> a11
[1] 1653
> a12<-sum(X[,1]*X[,2])
> a12
[1] 7949
> a22<-sum(X[,2]*X[,2])
> a22
[1] 42840
> x<-matrix(data=c(a11,a12,a12,a22),nrow=2,ncol=2,byrow=T)
>x
[,1] [,2]
[1,] 1653 7949
[2,] 7949 42840
>
> A<-x-(N1*(X.bar%*%t(X.bar))) #variance covariance matrix for TA
>A
X1 X2
[1,] 323.8636 518.2727
[2,] 518.2727 1297.4545
>
> b11<-sum(Y[,1]*Y[,1])
> b11
[1] 945
> b12<-sum(Y[,1]*Y[,2])
> b12
[1] 6337
> b22<-sum(Y[,2]*Y[,2])
> b22
[1] 49076
> y<-matrix(data=c(b11,b12,b12,b22),nrow=2,ncol=2,byrow=T)
>y
[,1] [,2]
[1,] 945 6337
[2,] 6337 49076
>
> B<-y-(N2*(Y.bar%*%t(Y.bar))) #variance covariance matrix for DRTA
>B
Y1 Y2
[1,] 91.86364 -52.18182
[2,] -52.18182 1227.09091
>
> c11<-sum(Z[,1]*Z[,1])
> c11
[1] 1143
> c12<-sum(Z[,1]*Z[,2])
> c12
[1] 6372
> c22<-sum(Z[,2]*Z[,2])
> c22
[1] 39811
> z<-matrix(data=c(c11,c12,c12,c22),nrow=2,ncol=2,byrow=T)
>z
[,1] [,2]
[1,] 1143 6372
[2,] 6372 39811
>
> C<-z-(N3*(Z.bar%*%t(Z.bar))) #variance covariance matrix for DRA
>C
Z1 Z2
[1,] 160.7727 191.3182
[2,] 191.3182 918.9545
>
> #the error matrix is given by
> E<-A+B+C
>E
X1 X2
[1,] 576.5000 657.4091
[2,] 657.4091 3443.5000
>
> #the grand mean centroid is given as
>m.bar<-(X.bar+Y.bar+Z.bar)/3
> m.bar
X1 X2
6.893939 44.045455
>
> #SSCP for each group is computed as
> SSCP1<-N1*((X.bar-m.bar)%*%t(X.bar-m.bar))
> SSCP1
X1 X2
[1,] 16.98990 -11.424242
[2,] -11.42424 7.681818
> SSCP2<-N2*((Y.bar-m.bar)%*%t(Y.bar-m.bar))
> SSCP2
Y1 Y2
[1,] 9.777778 -38.0000
[2,] -38.000000 147.6818
> SSCP3<-N3*((Z.bar-m.bar)%*%t(Z.bar-m.bar))
> SSCP3
Z1 Z2
[1,] 0.989899 9.333333
[2,] 9.333333 88.000000
>
> H<-SSCP1+SSCP2+SSCP3
>H
X1 X2
[1,] 27.75758 -40.09091
[2,] -40.09091 243.36364
> #by Wilk's lambda criterion, the calculated value is given as
> lambda<-(det(E)/det(H+E))
> lambda
[1] 0.8409394
>
Date: 26/02/2018
Experiment – 9
To compare two types of coating for resistance to corrosion, 15 pieces of pipe were coated with
each type of coating (Kramer and Jensen 1969b). Two pipes, one with each type of coating, were
buried together and left for the same length of time at 15 different locations, providing a natural
pairing of the observations. Corrosion for the first type of coating was measured by two variables:
X1 = Maximum depth of pit in thousandths of an inch
X2 = Number of pits
with Y1 and Y2 defined analogously for the second coating.
Report Page
Object: To compare two types of coating for resistance to corrosion.
Result:
Ho: There is no significant difference in the two coatings.
Since the calculated value = 10.718 > tabulated value = 1.766, we reject the null hypothesis and conclude
that there is a significant difference in the two coatings.
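The paired analysis used here forms the differences d_i between the two coatings at each location and then computes T² = n d̄′S_d⁻¹d̄, with S_d the sample covariance matrix of the differences. The Python fragment below sketches this with three toy difference vectors, not the corrosion data.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def cov_matrix(rows):
    """Sample covariance matrix (divisor n - 1) of rows of observations."""
    n = len(rows)
    m = mean_vector(rows)
    p = len(m)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination (A small and nonsingular)."""
    n = len(A)
    M = [list(map(float, A[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [v - M[r][c] * w for v, w in zip(M[r], M[c])]
    return [M[i][n] for i in range(n)]

def paired_tsq(diffs):
    """T^2 = n * d_bar' S_d^{-1} d_bar for paired multivariate data."""
    n = len(diffs)
    d_bar = mean_vector(diffs)
    S = cov_matrix(diffs)
    z = solve(S, d_bar)
    return n * sum(di * zi for di, zi in zip(d_bar, z))

print(paired_tsq([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

This mirrors the transcript's `cov(d)` and `tsq<-(n*t(mean_d)%*%cvinverse%*%(mean_d))`.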
Calculations
>data<-read.table("D:/amit/Exp-9.csv",sep=",",header=TRUE)
>z<-as.matrix(data)
>z
Depth.X1 Number.X2 Depth.Y1 Number.Y2
[1,] 73 31 51 35
[2,] 43 19 41 14
[3,] 47 22 43 19
[4,] 53 26 41 29
[5,] 58 36 47 34
[6,] 47 30 32 26
[7,] 52 29 24 19
[8,] 38 36 43 37
[9,] 61 34 53 24
[10,] 56 33 52 27
[11,] 56 19 57 14
[12,] 34 19 44 19
[13,] 55 26 57 30
[14,] 65 16 40 7
[15,] 75 18 68 13
>n<-15
>p<-2
>d1<-z[,1]-z[,3]
>d2<-z[,2]-z[,4]
>n<-length(d1)
>d<-matrix(data = c(d1,d2),byrow = FALSE,nrow = n)
>d
[,1] [,2]
[1,] 22 -4
[2,] 2 5
[3,] 4 3
[4,] 12 -3
[5,] 11 2
[6,] 15 4
[7,] 28 10
[8,] -5 -1
[9,] 8 10
[10,] 4 6
[11,] -1 5
[12,] -10 0
[13,] -2 -4
[14,] 25 9
[15,] 7 5
>mean_d1<-mean(d1) #Mean of d1
>mean_d1
[1] 8
>mean_d2<-mean(d2) #Mean of d2
>mean_d2
[1] 3.133333
>mean_d<-matrix(data = c(mean_d1,mean_d2),byrow = FALSE,nrow = 2)
>mean_d
[,1]
[1,] 8.000000
[2,] 3.133333
>cv<-cov(d) #Variance Covariance Matrix
>cv
[,1] [,2]
[1,] 121.57143 18.28571
[2,] 18.28571 22.55238
>cvinverse<-solve(cv) #Inverse of covariance matrix
>cvinverse
[,1] [,2]
[1,] 0.009368105 -0.007595761
[2,] -0.007595761 0.050499941
>tsq<-(n*t(mean_d)%*%cvinverse%*%(mean_d)) #Calculated T sq
>tsq
[,1]
[1,] 10.71833
>f<-qf(0.95,df1=p,df2=(n-p)) #F Value
>f
[1] 3.805565
>tsqtab<-((p*(n-1))/(n-p))*f #Tabulated T sq
>tsqtab
[1] 1.76687
Date: 26/02/2018
Experiment No. 10
Twenty engineer apprentices and 20 pilots were given six tests (Travers 1939). The variables
were
y1 = dynamometer, y2 = dotting, y3 = sensory motor coordination, y4 = perseveration
Report Page
Object: To test the hypothesis that the two samples (engineer apprentices and pilots) come from
two normal populations with the same mean.
Procedure/Formula Used:
a. Calculate the mean vector ȳi = (y1i, y2i, y3i, y4i)′ for each group.
b. Find the matrix M of raw sums of squares and products, whose (j, k)-th element is ∑ xj xk.
Ho: The samples come from populations with the same mean vector.
Now, calculated value = 39.49975 and tabulated value = 11.47151.
Since the calculated value exceeds the tabulated value, we reject Ho and conclude that the samples
come from populations with different means.
Calculation
> data<-read.table("E:/amitwar/Exp-10.csv",sep=",",header=TRUE)
> m<-as.matrix(data)
>m
> n1<-20
> n2<-20
> p<-4
> cv<-cov(m)
> cv
> s1<-cv[c(1,2,3,4),c(1,2,3,4)]
> s1
Y1 Y2 Y3 Y4
> s2<-cv[c(5,6,7,8),c(5,6,7,8)]
> s2
> s<-((n1-1)*s1+(n2-1)*s2)/(n1+n2-2)
>s
Y1 Y2 Y3 Y4
>
> yebar
[,1]
[1,] 76.20
[2,] 192.75
[3,] 53.65
[4,] 239.80
> yqbar
[,1]
[1,] 87.40
[2,] 236.60
[3,] 44.25
[4,] 280.20
> sinverse<-solve(s)
> sinverse
Y1 Y2 Y3 Y4
> tsq<-(n1*n2*(t(yebar-yqbar))%*%sinverse%*%(yebar-yqbar))/(n1+n2)
> tsq
[,1]
[1,] 39.49975
> f<-qf(0.95,df1=p,df2=(n1+n2-p-1))
>f
[1] 2.641465
> tsqtab<-((p*(n1+n2-2))/(n1+n2-p-1))*f
> tsqtab
[1] 11.47151
>
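As a cross-check (not part of the transcript), the T² statistic above can be converted to its exact F equivalent and a p-value, using only n1, n2, p and the T² value reported above:

```r
# Convert two-sample Hotelling's T^2 to its exact F statistic and p-value
n1 <- 20; n2 <- 20; p <- 4
T2 <- 39.49975                                      # calculated T^2 from above
Fstat <- ((n1 + n2 - p - 1) / (p * (n1 + n2 - 2))) * T2
Ftab  <- qf(0.95, p, n1 + n2 - p - 1)               # 2.641465, as above
pval  <- 1 - pf(Fstat, p, n1 + n2 - p - 1)
# Fstat > Ftab (equivalently pval < 0.05), so Ho is rejected
```

Comparing Fstat with Ftab gives exactly the same decision as comparing T² with its tabulated value.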
Date: 12/03/2018
Experiment No. 11
Eggs are usually classified into two grades, A and B, by visual inspection. In order to examine whether
these grades differ in respect of four important characters - yolk shadow (x1), yolk colour (x2),
albumen index (x3) and albumen height (x4) - 25 eggs of grade A and 33 eggs of grade B were observed
for these characters. The following table gives the mean values and corrected sums of squares and
products.
Test whether the two grades of eggs differ with respect to their means. What is the distance
between the two grades of eggs?
Report Page
Object: To test whether the two grades of eggs differ with respect to their means, and to find the
distance between the two grades of eggs.
Procedure/Formula Used:
Pool the corrected SSCP matrices, S = (A1 + A2)/(n1+n2-2), compute
T² = n1n2/(n1+n2) * (x̅1 - x̅2)' S⁻¹ (x̅1 - x̅2),
and compare it with the tabulated value p(n1+n2-2)/(n1+n2-p-1) * F(0.05; p, n1+n2-p-1). The squared
distance between the grades is the Mahalanobis D² = (x̅1 - x̅2)' Σ̂⁻¹ (x̅1 - x̅2), with
Σ̂ = (n1+n2-2)S/(n1+n2).
Result
Ho: The two grades of eggs do not differ with respect to their means.
Now, calculated value = 71.42995 and tabulated value = 10.76161.
Since the calculated value exceeds the tabulated value, we reject Ho and conclude that the two
grades of eggs differ with respect to their means.
The squared distance between the two grades of eggs is D² = 5.20109.
Calculation
> n1<-25
> n2<-33
> p<-4
> x1bar<-matrix(data=c(7.16,13.92,21.6,26.04),nrow=4,byrow=FALSE)
> x1bar
[,1]
[1,] 7.16
[2,] 13.92
[3,] 21.60
[4,] 26.04
> x2bar<-matrix(data=c(10.3,15.3,28.33,20.09),nrow=4,byrow=FALSE)
> x2bar
[,1]
[1,] 10.30
[2,] 15.30
[3,] 28.33
[4,] 20.09
> A1<-matrix(data = c(106.32,10.32,3.6,-12.16,10.32,85.84,-21.8,12.08,3.6,-21.8,536,-486.6,-12.16,12.08,-
486.6,532.96),nrow = 4,byrow = FALSE)
> A1
[,1] [,2] [,3] [,4]
[1,] 106.32 10.32 3.6 -12.16
[2,] 10.32 85.84 -21.8 12.08
[3,] 3.60 -21.80 536.0 -486.60
[4,] -12.16 12.08 -486.6 532.96
> A2<-matrix(data = c(40.97,-1.03,146.67,-104.91,-1.03,64.97,-13.33,-5.91,146.67,-13.33,1133.33,-640,-104.91,-5.91,-
640,506.73),nrow = 4,byrow = FALSE)
> A2
[,1] [,2] [,3] [,4]
[1,] 40.97 -1.03 146.67 -104.91
[2,] -1.03 64.97 -13.33 -5.91
[3,] 146.67 -13.33 1133.33 -640.00
[4,] -104.91 -5.91 -640.00 506.73
> S1<-A1/(n1-1)
> S1
[,1] [,2] [,3] [,4]
[1,] 4.4300000 0.4300000 0.1500000 -0.5066667
[2,] 0.4300000 3.5766667 -0.9083333 0.5033333
[3,] 0.1500000 -0.9083333 22.3333333 -20.2750000
[4,] -0.5066667 0.5033333 -20.2750000 22.2066667
> S2<-A2/(n2-1)
> S2
[,1] [,2] [,3] [,4]
[1,] 1.2803125 -0.0321875 4.5834375 -3.2784375
[2,] -0.0321875 2.0303125 -0.4165625 -0.1846875
[3,] 4.5834375 -0.4165625 35.4165625 -20.0000000
[4,] -3.2784375 -0.1846875 -20.0000000 15.8353125
> S<-((n1-1)*S1+(n2-1)*S2)/(n1+n2-2)
>S
[,1] [,2] [,3] [,4]
[1,] 2.6301786 0.1658929 2.6833929 -2.0905357
[2,] 0.1658929 2.6930357 -0.6273214 0.1101786
[3,] 2.6833929 -0.6273214 29.8094643 -20.1178571
[4,] -2.0905357 0.1101786 -20.1178571 18.5658929
> sinverse<-solve(S)
> sinverse
[,1] [,2] [,3] [,4]
[1,] 0.42423410 -0.03266269 -0.02421559 0.02172314
[2,] -0.03266269 0.37843027 0.02570235 0.02192724
[3,] -0.02421559 0.02570235 0.12773922 0.13553800
[4,] 0.02172314 0.02192724 0.13553800 0.20304605
> tsq<-(n1*n2*(t(x1bar-x2bar))%*%sinverse%*%(x1bar-x2bar))/(n1+n2)
> tsq
[,1]
[1,] 71.42995
> f<-qf(0.95,df1=4,df2=(n1+n2-p-1))
>f
[1] 2.546273
> tsqtab<-((p*(n1+n2-2))/(n1+n2-p-1))*f
> tsqtab
[1] 10.76161
> sigma<-((n1+n2-2)*S)/(n1+n2)
> sigma
[,1] [,2] [,3] [,4]
[1,] 2.5394828 0.1601724 2.5908621 -2.0184483
[2,] 0.1601724 2.6001724 -0.6056897 0.1063793
[3,] 2.5908621 -0.6056897 28.7815517 -19.4241379
[4,] -2.0184483 0.1063793 -19.4241379 17.9256897
> deltasq<-(t(x1bar-x2bar))%*%(solve(sigma))%*%(x1bar-x2bar)
> deltasq
[,1]
[1,] 5.20109
>
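The calculation above can be condensed into one function taking only the summary statistics; a sketch (the name two_sample_T2 is illustrative), reusing the means and SSCP matrices given in the problem:

```r
# Two-sample Hotelling's T^2 from mean vectors and corrected SSCP matrices A1, A2
two_sample_T2 <- function(xbar1, xbar2, A1, A2, n1, n2) {
  S <- (A1 + A2) / (n1 + n2 - 2)        # pooled covariance, since A_i = (n_i - 1) S_i
  diff <- xbar1 - xbar2
  (n1 * n2 / (n1 + n2)) * as.numeric(t(diff) %*% solve(S) %*% diff)
}

x1bar <- c(7.16, 13.92, 21.6, 26.04)
x2bar <- c(10.3, 15.3, 28.33, 20.09)
A1 <- matrix(c(106.32, 10.32, 3.6, -12.16,
               10.32, 85.84, -21.8, 12.08,
               3.6, -21.8, 536, -486.6,
               -12.16, 12.08, -486.6, 532.96), nrow = 4)
A2 <- matrix(c(40.97, -1.03, 146.67, -104.91,
               -1.03, 64.97, -13.33, -5.91,
               146.67, -13.33, 1133.33, -640,
               -104.91, -5.91, -640, 506.73), nrow = 4)
tsq <- two_sample_T2(x1bar, x2bar, A1, A2, 25, 33)   # 71.42995, as above
```

Dividing A1 + A2 directly by n1 + n2 - 2 is equivalent to pooling S1 and S2 with their degrees of freedom, as done step by step in the transcript.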
Date: 14/03/2018
Experiment No. 12
The means of three biometrical characters and the matrix of pooled variances and covariances were
obtained for two groups of female desert locusts - one in the phase gregaria and the other in an
intermediate phase between gregaria and solitaria.
Matrix of pooled variances and covariances based on 90 d.f.:
      X1       X2       X3
X1    4.735    0.5622   1.4685
X2             0.1431   0.2174
X3                      0.5702
Means:
Phase                 X̅1      X̅2      X̅3
Gregaria (n=20)       25.80    7.81    10.77
Intermediate (n=72)   28.35    7.81    10.75
1. Obtain the best linear function which would discriminate between the two groups.
2. How would you classify an individual into either of the two groups on the basis of the
observations (27.06, 8.03, 11.36)?
Report Page
Object: To obtain the best linear discriminant function for the above data and classify the given
individual into one of the two populations.
Result:
1. The best linear discriminant function for the data is:
-2.72973491x1 - 0.02212616x2 + 7.07370385x3
2. The value of the discriminant function, 6.312976, is greater than the classification cut-off,
2.032676. Hence the given individual is classified into the first population (gregaria).
Calculations
>S<-matrix(data = c(4.735,0.5622,1.4685,0.5622,0.1431,0.2174,1.4685,0.2174,0.5702),nrow
= 3,byrow = FALSE)
>S
[,1] [,2] [,3]
[1,] 4.7350 0.5622 1.4685
[2,] 0.5622 0.1431 0.2174
[3,] 1.4685 0.2174 0.5702
>x1bar<-matrix(data = c(25.8,7.81,10.77),nrow = 3,byrow = FALSE)
>x1bar
[,1]
[1,] 25.80
[2,] 7.81
[3,] 10.77
>x2bar<-matrix(data = c(28.35,7.81,10.75),nrow = 3,byrow = FALSE)
>x2bar
[,1]
[1,] 28.35
[2,] 7.81
[3,] 10.75
>x<-matrix(data = c(27.06,8.03,11.36),nrow = 3,byrow = FALSE)
>x
[,1]
[1,] 27.06
[2,] 8.03
[3,] 11.36
>lhs<-t(x)%*%solve(S)%*%(x1bar-x2bar)
>lhs
[,1]
[1,] 6.312976
>rhs<-((t(x1bar+x2bar))%*%solve(S)%*%(x1bar-x2bar))/2
>rhs
[,1]
[1,] 2.032676
>df<-solve(S)%*%(x1bar-x2bar)
>df
[,1]
[1,] -2.72973491
[2,] -0.02212616
[3,] 7.07370385
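The decision step can be written as a small classifier; a sketch under the same notation (the name classify_fisher is illustrative):

```r
# Fisher's rule: assign x to population 1 when
#   l'x >= (1/2) l'(xbar1 + xbar2),  where  l = S^{-1}(xbar1 - xbar2)
classify_fisher <- function(x, xbar1, xbar2, S) {
  l <- solve(S) %*% (xbar1 - xbar2)      # discriminant coefficients
  score  <- sum(l * x)                   # value of the discriminant function at x
  cutoff <- sum(l * (xbar1 + xbar2)) / 2 # midpoint between the two group scores
  if (score >= cutoff) 1L else 2L
}

S <- matrix(c(4.735, 0.5622, 1.4685,
              0.5622, 0.1431, 0.2174,
              1.4685, 0.2174, 0.5702), nrow = 3)
grp <- classify_fisher(c(27.06, 8.03, 11.36),
                       c(25.8, 7.81, 10.77),
                       c(28.35, 7.81, 10.75), S)   # 1: gregaria
```

The score and cutoff inside the function are the lhs (6.312976) and rhs (2.032676) computed step by step in the transcript.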
Date: 19/03/2018
Experiment No. 13
LINEAR MODEL NOT OF FULL RANK
A statistician reports an analysis of the rubber-producing plant guayule, for which plant weights
were available for 54 plants of three different kinds: 27 normal, 15 off-types and 12 aberrant. A
sample of six plants (3 normal, 2 off-types, 1 aberrant) is taken. The following table shows the
weights of these six plants.
TYPES OF PLANTS
NORMAL   OFF TYPES   ABERRANT
101      84          32
105      88
94
Report Page
Object: To find an appropriate linear model, establish the analysis of variance table, derive
estimable functions, find their estimates and test hypotheses about them.
Result:
1. The linear model is yij = μ + βi + εij, i = 1, 2, 3.
ANOVA TABLE
Source of Variation   d.f.   Sum of squares   Mean sum of squares   F-Ratio
Regression              3            45816               15272      654.5143
Error                   3               70            23.33333
Total                   6            45886
Since F calculated > F tabulated, we reject the null hypothesis. Hence the treatment effects are
not equal.
Calculation
>x<-read.table("D:/amit/Exp-13.csv",sep = ",",header = TRUE)
>x<-as.matrix(x)
>x
X0 X1 X2 X3
[1,] 1 1 0 0
[2,] 1 1 0 0
[3,] 1 1 0 0
[4,] 1 0 1 0
[5,] 1 0 1 0
[6,] 1 0 0 1
>y=matrix(data = c(101,105,94,84,88,32),byrow = FALSE,nrow = 6)
>y
[,1]
[1,] 101
[2,] 105
[3,] 94
[4,] 84
[5,] 88
[6,] 32
>xtx<-t(x)%*%x
>xtx
X0 X1 X2 X3
X0 6 3 2 1
X1 3 3 0 0
X2 2 0 2 0
X3 1 0 0 1
>library(MASS)
>G<-ginv(xtx)
>G
[,1] [,2] [,3] [,4]
[1,] 0.11458333 -0.03125 0.01041667 0.1354167
[2,] -0.03125000 0.28125 -0.09375000 -0.2187500
[3,] 0.01041667 -0.09375 0.36458333 -0.2604167
[4,] 0.13541667 -0.21875 -0.26041667 0.6145833
>xty<-t(x)%*%y
>xty
[,1]
X0 504
X1 300
X2 172
X3 32
>beta0<-G%*%xty
>beta0
[,1]
[1,] 54.5
[2,] 45.5
[3,] 31.5
[4,] -22.5
>tss<-t(y)%*%y
>tss
[,1]
[1,] 45886
>ssr<-t(xty)%*%beta0
>ssr
[,1]
[1,] 45816
>sse<-tss-ssr
>sse
[,1]
[1,] 70
>library(Matrix)
>r<-rankMatrix(x)
>r
[1] 3
attr(,"method")
[1] "tolNorm2"
attr(,"useGrad")
[1] FALSE
attr(,"tol")
[1] 1.332268e-15
>n<-6
>msr<-ssr/r
>msr
[,1]
[1,] 15272
attr(,"method")
[1] "tolNorm2"
attr(,"useGrad")
[1] FALSE
attr(,"tol")
[1] 1.332268e-15
>mse<-sse/(n-r)
>mse
[,1]
[1,] 23.33333
attr(,"method")
[1] "tolNorm2"
attr(,"useGrad")
[1] FALSE
attr(,"tol")
[1] 1.332268e-15
>F<-msr/mse
>F
[,1]
[1,] 654.5143
attr(,"method")
[1] "tolNorm2"
attr(,"useGrad")
[1] FALSE
attr(,"tol")
[1] 1.332268e-15
>qt<-matrix(data = c(1,1,1,1,0,0,0,1,0,0,0,1), byrow=FALSE,nrow = 3)
>qt
[,1] [,2] [,3] [,4]
[1,] 1 1 0 0
[2,] 1 0 1 0
[3,] 1 0 0 1
>qtbeta0<-qt%*%beta0
>qtbeta0
[,1]
[1,] 100
[2,] 86
[3,] 32
>f<-qf(0.95,2,3) #F tabulated at the 5% level of significance with 2,3 d.f.
>f
[1] 9.552094
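The tabulated F(2,3) above belongs to the test of equal treatment effects; a sketch of that test using the corrected (about-the-mean) sums of squares for the guayule data:

```r
# One-way ANOVA test of equal treatment effects for the six guayule plants
y <- c(101, 105, 94, 84, 88, 32)
g <- factor(c(1, 1, 1, 2, 2, 3))        # normal, off-type, aberrant
ni <- as.numeric(table(g))              # group sizes: 3, 2, 1
Ti <- tapply(y, g, sum)                 # treatment totals: 300, 172, 32
cf <- sum(y)^2 / length(y)              # correction factor
ss_trt <- sum(Ti^2 / ni) - cf           # treatment sum of squares
sse <- sum(y^2) - sum(Ti^2 / ni)        # 70, as in the transcript
Fcal <- (ss_trt / 2) / (sse / 3)        # compare with qf(0.95, 2, 3) = 9.552094
```

Since Fcal exceeds the tabulated F(2,3), the hypothesis of equal treatment effects is rejected, matching the report's conclusion.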
Date: 19/03/2018
Experiment No. 14
X0 X1 X2 Y
1 1 8 6
1 4 2 8
1 9 -8 1
1 11 -10 0
1 3 6 5
1 8 -6 3
1 5 0 2
1 10 -12 -4
1 2 4 10
1 7 -2 -3
1 6 -4 5
Report Page
d. To test whether the overall regression is suitable for the given data, calculate the
tabulated value of F at α = 0.05 with 3 and 8 degrees of freedom.
e. To calculate Var(β1), Var(β2) and Cov(β1, β2), first calculate Var(β̂) by the
formula
Var(β̂) = σ̂²(X'X)⁻¹, where σ̂² = SSE/(n - (p + 1)).
f. Now, using X1 alone, find the least squares estimate of β̂ and check whether the
regression using X1 alone is suitable for the data.
g. For the regression using X₁ alone, consider the model Y = β₀X₀ + β₁X₁.
h. Calculate β* = (X*'X*)⁻¹(X*'Y),
where X* = [X₀, X₁].
i. Now calculate SSR(X₁), SSE(X₁), TSS(X₁), MSR(X₁) and MSE(X₁) using the formula
SSR(X₁) = Y'X*(X*'X*)⁻¹(X*'Y) = Y'X*β*.
If the calculated F is less than the tabulated F, accept the null hypothesis; otherwise reject it.
Result
1. Least squares estimate of the β's in the model:
β̂ = (β₀, β₁, β₂)' = (14, -2, -0.5)'
3. Using α = 0.05, test whether the overall regression is suitable for the given data.
F tabulated is F₃,₈(0.05) = 4.066181 and F calculated, obtained in (ii), is F = 8.6667.
Since Fcal > Ftab, the null hypothesis is rejected and the overall regression is suitable for the
given data.
4. Var(β₁) = 1.4362519
Var(β₂) = 0.3590630
Cov(β₁, β₂) = 0.6985407
The regression using X₁ alone doesn't serve the purpose.
Calculation
>x<-read.table("D:/amit/Exp-14.csv",sep = ",",header = TRUE)
>x<-as.matrix(x)
>x
X0 X1 X2
[1,] 1 1 8
[2,] 1 4 2
[3,] 1 9 -8
[4,] 1 11 -10
[5,] 1 3 6
[6,] 1 8 -6
[7,] 1 5 0
[8,] 1 10 -12
[9,] 1 2 4
[10,] 1 7 -2
[11,] 1 6 -4
>y<-matrix(c(6,8,1,0,5,3,2,-4,10,-3,5))
>y
[,1]
[1,] 6
[2,] 8
[3,] 1
[4,] 0
[5,] 5
[6,] 3
[7,] 2
[8,] -4
[9,] 10
[10,] -3
[11,] 5
>Ty<-t(y)
>Ty
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,] 6 8 1 0 5 3 2 -4 10 -3 5
>Tx<-t(x)
>Tx
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
X0 1 1 1 1 1 1 1 1 1 1 1
X1 1 4 9 11 3 8 5 10 2 7 6
X2 8 2 -8 -10 6 -6 0 -12 4 -2 -4
>u1<-Tx%*%x
>u1
X0 X1 X2
X0 11 66 -22
X1 66 506 -346
X2 -22 -346 484
>invu1<-solve(u1)
>invu1
X0 X1 X2
X0 4.3704790 -0.84946237 -0.40860215
X1 -0.8494624 0.16897081 0.08218126
X2 -0.4086022 0.08218126 0.04224270
>u2<-Tx%*%y
>u2
[,1]
X0 33
X1 85
X2 142
>esbeta<-invu1%*%u2 #Least estimate of beta in the model
>esbeta
[,1]
X0 14.0
X1 -2.0
X2 -0.5
>
>u4<-t(esbeta)
>u4
X0 X1 X2
[1,] 14 -2 -0.5
>u5<-u4%*%u2
>u5
[,1]
[1,] 221
>ssr<-u5
>ssr
[,1]
[1,] 221
>p1<-Ty%*%y
>p1
[,1]
[1,] 289
>sse<-p1-u5
>sse
[,1]
[1,] 68
>msr<-ssr/3
>msr
[,1]
[1,] 73.66667
>mse<-sse/8
>mse
[,1]
[1,] 8.5
>fcal<-msr/mse
>fcal
[,1]
[1,] 8.666667
>ftab<-qf(0.95,df1=3,df2=8)
>ftab
[1] 4.066181
>
>varbeta<-8.5*invu1
>varbeta
X0 X1 X2
X0 37.149071 -7.2204301 -3.4731183
X1 -7.220430 1.4362519 0.6985407
X2 -3.473118 0.6985407 0.3590630
>varb1<-1.43625
>varb2<-0.359063
>covb1b2<-0.698541
>
>x1=u1[c(1,2),c(1,2)]
>x1
X0 X1
X0 11 66
X1 66 506
>x1inverse<-solve(x1)
>x1inverse
X0 X1
X0 0.41818182 -0.054545455
X1 -0.05454545 0.009090909
>a<- t(x[,c(1,2)]) %*% y
>a
[,1]
X0 33
X1 85
>beta<-x1inverse %*% a
>beta
[,1]
X0 9.163636
X1 -1.027273
>SSRX1<-t(a) %*% beta
>SSRX1
[,1]
[1,] 215.0818
>TSSX1<-t(y) %*% y
>TSSX1
[,1]
[1,] 289
>SSEX1<-TSSX1 - SSRX1
>SSEX1
[,1]
[1,] 73.91818
>MSRX1<-SSRX1/2
>MSRX1
[,1]
[1,] 107.5409
>MSEX1<-SSEX1/(11-2)
>MSEX1
[,1]
[1,] 8.213131
>FRatio<-MSRX1/MSEX1
>FRatio
[,1]
[1,] 13.09378
>Ftab<-qf(0.95,df1=2,df2=9)
>Ftab
[1] 4.256495
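The hand computations above can be cross-checked with lm(); the extra-sum-of-squares F test below, for X2 given X1, is an added illustration and not part of the original transcript:

```r
# Reproduce the two least squares fits and test SSR(X2 | X1)
X1 <- c(1, 4, 9, 11, 3, 8, 5, 10, 2, 7, 6)
X2 <- c(8, 2, -8, -10, 6, -6, 0, -12, 4, -2, -4)
y  <- c(6, 8, 1, 0, 5, 3, 2, -4, 10, -3, 5)
full <- lm(y ~ X1 + X2)                 # coefficients: 14, -2, -0.5
red  <- lm(y ~ X1)                      # coefficients: 9.163636, -1.027273
sse_full <- sum(resid(full)^2)          # 68, as in the transcript
sse_red  <- sum(resid(red)^2)           # 73.91818 = SSE(X1)
# Partial F for X2 given X1, on 1 and 8 d.f.
Fpartial <- (sse_red - sse_full) / (sse_full / 8)
# Fpartial < qf(0.95, 1, 8), so X2 contributes little once X1 is in the model
```

The residual sums of squares from lm() agree with SSE and SSE(X₁) computed by hand above, since including the intercept leaves the residuals unchanged.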