The sample variance:

    S_ere² = Σ(ere_i − ēre)² / (n − 1) = 0.01960

The sample standard deviation:

    S_ere = √0.01960 = 0.14
Here, the variance and standard deviation show that there is little variability in the data.
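The two statistics above can be reproduced with a short script. The data list below is a hypothetical stand-in, since the original 45 relative-error values are not listed in this section:

```python
import statistics

# Hypothetical stand-in for the observed relative errors; the original
# 45 values are not reproduced in this section.
ere = [0.12, 0.25, 0.38, 0.22, 0.05, 0.31, 0.18, 0.27]

var_ere = statistics.variance(ere)  # sample variance, divisor (n - 1)
sd_ere = statistics.stdev(ere)      # sample standard deviation
print(var_ere, sd_ere)
```

Note that `statistics.variance` uses the (n − 1) divisor, matching the sample-variance formula above.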
***** 3 *****
Graphical Representation of the Data (Sample) on XY-plot and on Box-plot
Graphical presentation of the data using an XY plot and a box plot is used to visualize the data.
The box plot visualizes the symmetry of the relative error through quartiles and percentiles, and also shows outliers and extreme outliers.
The features of the data that can be seen are central tendency, spread, and departure from symmetry.
q1 = 1st quartile = (n+1)/4 = (45+1)/4 = 11.5th value.
Hence, 1st quartile relative error = 0.125
q2 = 2nd quartile = 23rd value, as sample size n = 45.
Hence, 2nd quartile relative error = 0.22
q3 = 3rd quartile = 3(n+1)/4 = 3(45+1)/4 = 34.5th value.
Hence, 3rd quartile relative error = 0.38
Inter-quartile range (IQR) = q3 − q1 = 0.38 − 0.125 = 0.255
IQR is a measure of variability and is less sensitive to extreme values.
1.5(IQR) = 1.5(0.255) = 0.3825
[Box plot of the relative error: minimum = 0.02, q1 = 0.125, q2 = 0.22, q3 = 0.38, with the 1.5·IQR and 3·IQR fences marked.]
Here we can say that there are no outliers (and hence no extreme outliers), and therefore no significant variability in the collected data.
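The quartile positions and outlier fences above can be sketched in code. The data list is hypothetical; the `(n+1)·q/4` position rule with linear interpolation matches the rule used in the text:

```python
# Quartiles by the (n+1)*q/4 position rule used in the text, with linear
# interpolation between ranks. The data list below is a hypothetical
# stand-in; the original 45 relative-error values are not reproduced here.

def quartile(data, q):
    """q-th quartile (q = 1, 2, 3) at position (n+1)*q/4."""
    s = sorted(data)
    pos = (len(s) + 1) * q / 4
    lo = int(pos) - 1                # zero-based rank just below the position
    frac = pos - int(pos)
    if lo + 1 >= len(s):             # position beyond the last value
        return s[-1]
    return s[lo] + frac * (s[lo + 1] - s[lo])

data = [0.02, 0.05, 0.125, 0.18, 0.22, 0.27, 0.31, 0.38, 0.40, 0.43]
q1, q2, q3 = (quartile(data, k) for k in (1, 2, 3))
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(q1, q2, q3, iqr, outliers)
```

Values outside the 1.5·IQR fences would be flagged as outliers, as in the box plot.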
***** 4 *****
Use of Central Limit Theorem
For the point estimation and confidence interval of the mean of the relative error.
This theorem is used to draw inference about the population mean. We are interested in the mean relative error of the whole population.
From the available 1000 values, 31 random samples, each having 40 values of relative error, are drawn.
The mean of each sample is computed.
The mean of these mean values is the point estimate for the true mean relative error (μ_ere).
A confidence interval of 95% is considered for the true mean μ_ere.
α = 0.05
Number of samples: n = 31
The mean computed: ēre = 0.25
And standard deviation: S_ere = 0.03
Hence, the confidence interval for the population mean is

    ēre − t_{0.025,(n−1)} · S_ere/√n ≤ μ_ere ≤ ēre + t_{0.025,(n−1)} · S_ere/√n

    0.25 − 2.042 (0.03/√31) ≤ μ_ere ≤ 0.25 + 2.042 (0.03/√31)

    0.25 − 0.011 ≤ μ_ere ≤ 0.25 + 0.011

    0.239 ≤ μ_ere ≤ 0.261

Therefore, we can say with 95% confidence that the mean relative error of the population falls between 0.239 and 0.261.
Equivalently, if we repeated this sampling procedure 100 times, about 95 of the resulting intervals would contain the true mean μ_ere.
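The interval can be checked numerically. The inputs are the values stated in the text; the critical value t_{0.025,30} = 2.042 is hard-coded rather than computed, since the t distribution is not in the Python standard library:

```python
import math

# Values from the text: 31 sample means, mean 0.25, standard deviation 0.03.
n = 31
mean_ere = 0.25
s_ere = 0.03
t_crit = 2.042  # two-sided 95% critical value, t_{0.025,30}, from a t table

margin = t_crit * s_ere / math.sqrt(n)
ci = (mean_ere - margin, mean_ere + margin)
print(ci)
```

The margin works out to about 0.011, giving an interval of roughly (0.239, 0.261).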
****** 5 *****
Hypothesis test for the mean of the relative error.
The relative error between the calculated diameter of the spindle and the corresponding value approximated by the Artificial Neural Network is expected to be 0.05.
We will test our hypothesis for the same.
The null hypothesis H₀: μ ≤ 0.050 mm
Alternative hypothesis H₁: μ > 0.050 mm
Sample mean: ēre = 0.25 mm
The Z-statistic:

    Z₀ = (ēre − 0.050) / (S_ere/√n) = (0.25 − 0.050) / (0.03/√31) = 37.11
This is a right-tail hypothesis test.
The acceptance region for the null hypothesis H₀ is Z₀ < z₀.₀₅, and the rejection region is Z₀ > z₀.₀₅. Hence, z₀.₀₅ is the critical value for the test.
We have z₀.₀₅ = 1.645.
So, Z₀ > z₀.₀₅.
This leads us to reject the null hypothesis that the true mean relative error is at most 0.05.
Here, we can conclude that the Artificial Neural Network model failed to conform to the required performance.
We have strong evidence that the model predicts a relative error higher than 0.05.
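The right-tail test above can be sketched as follows, using the values stated in the text:

```python
import math

# Values from the text.
n = 31
mean_ere = 0.25
s_ere = 0.03
mu0 = 0.050
z_crit = 1.645  # right-tail critical value at alpha = 0.05

z0 = (mean_ere - mu0) / (s_ere / math.sqrt(n))
reject = z0 > z_crit  # reject H0 if the statistic exceeds the critical value
print(z0, reject)
```

The statistic comes out near 37.11, far beyond 1.645, so the null hypothesis is rejected.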
**** 6 ****
Regression Analysis
Check the relationship between the number of neurons and the relative error.
Predict the relative error for a corresponding value of the number of neurons.
Further analysis is required to check for the relationship between the number of
neurons (non) and the relative error (ere).
We have seen from the graph that approximately a straight line can be fitted (subjectively) to the plot.
This shows that we can use simple linear regression analysis to predict the
corresponding values of relative error (ere) for the known or certain values of
number of neurons (non).
As the point on the regression line at any value of non is the expected mean
value of corresponding ere, we need to find out the regression line/model.
    ere_i = β₀ + β₁·non_i + ε_i

where β₀ is the intercept of the line, β₁ is the slope of the line, and ε_i is the random error term.
The expected value of the intercept of the line is β₀ and the expected value of the slope is β₁.
With sums taken over i = 1 to 45:

    S_xx = Σ non_i² − (Σ non_i)²/45 = 701037 − 24710841/45 = 151907.2

    S_xy = Σ (ere_i · non_i) − (Σ ere_i)(Σ non_i)/45 = 1522 − (10.57)(4971)/45 = 354.37

    β̂₁ = S_xy / S_xx = 354.37 / 151907.2 = 0.0023
This value is used in further analysis, as well as to compute β̂₀ (the intercept of the regression line).
    β̂₀ = ēre − β̂₁·n̄on = (1/45) Σ ere_i − 0.0023 · (1/45) Σ non_i
        = (1/45)(10.57) − 0.0023 × 110.47
        = 0.2348 − 0.2541
        = −0.01921

Hence, the fitted linear regression model is

    ere = −0.01921 + 0.0023·non + ε

As the intercept is negative, it is evident that the regression (relationship) line would intersect the Y-axis below zero.
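The least-squares fit can be reproduced from the summary sums quoted in the text. Note that the text rounds β̂₁ to 0.0023 before computing β̂₀, which yields −0.01921; carrying full precision, as below, gives a slightly more negative intercept:

```python
# Least-squares slope and intercept from the summary statistics given in
# the text (sums over the 45 (non, ere) pairs).
n = 45
sum_non = 4971
sum_non_sq = 701037
sum_ere = 10.57
sum_cross = 1522  # sum of non_i * ere_i

s_xx = sum_non_sq - sum_non ** 2 / n
s_xy = sum_cross - sum_ere * sum_non / n

beta1 = s_xy / s_xx                        # slope
beta0 = sum_ere / n - beta1 * sum_non / n  # intercept, full precision
print(s_xx, s_xy, beta1, beta0)
```

The slope matches the text's 0.0023; the intercept differs slightly from −0.01921 only because of the intermediate rounding in the text.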
**** 7 ****
Test intercept and slope (confidence intervals)
We will construct confidence intervals for the intercept and the slope.
As assumed before, we again consider significance level α = 0.05.
Hence, the confidence interval for the slope is

    β̂₁ − t_{α/2,(n−2)} √(σ̂²/S_xx) ≤ β₁ ≤ β̂₁ + t_{α/2,(n−2)} √(σ̂²/S_xx)

Degrees of freedom: (n−2) = (45−2) = 43.
σ̂² = variance of the error term = SSE/(n−2), where SSE is the error sum of squares:

    SST = Σ ere_i² − 45·ēre² = 3.365 − 45(0.23)² = 0.985

    SSE = SST − β̂₁ S_xy = 0.985 − 0.815 = 0.17

    σ̂² = 0.17 / (45 − 2) = 0.004

We know that S_xx = 151907.2. Hence,

    0.0023 − 2.021 √(0.004/151907.2) ≤ β₁ ≤ 0.0023 + 2.021 √(0.004/151907.2)

    0.0020 ≤ β₁ ≤ 0.0026
Equivalently, if we repeated the sampling 100 times, about 95 of the resulting intervals would contain the true slope β₁.
The confidence interval for the intercept is

    β̂₀ − t_{α/2,(n−2)} √(σ̂² [1/n + n̄on²/S_xx]) ≤ β₀ ≤ β̂₀ + t_{α/2,(n−2)} √(σ̂² [1/n + n̄on²/S_xx])

    −0.01921 − 2.021 √(0.004 [1/45 + 12202.90/151907.2]) ≤ β₀ ≤ −0.01921 + 2.021 √(0.004 [1/45 + 12202.90/151907.2])

    −0.01921 − 0.0409 ≤ β₀ ≤ −0.01921 + 0.0409

    −0.0601 ≤ β₀ ≤ 0.0217

Here, we can say with 95% confidence that the true intercept β₀ falls between these two values.
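Both margins can be recomputed from the stated inputs (σ̂² = 0.004, S_xx = 151907.2, t = 2.021); a minimal sketch:

```python
import math

# Values from the text; t_{0.025,43} is taken as 2.021, as in the text.
n = 45
s_xx = 151907.2
sigma2 = 0.004        # estimated error variance, SSE/(n-2)
beta1_hat = 0.0023
beta0_hat = -0.01921
mean_non = 4971 / 45  # about 110.47
t_crit = 2.021

m_slope = t_crit * math.sqrt(sigma2 / s_xx)
m_intercept = t_crit * math.sqrt(sigma2 * (1 / n + mean_non ** 2 / s_xx))

print((beta1_hat - m_slope, beta1_hat + m_slope))
print((beta0_hat - m_intercept, beta0_hat + m_intercept))
```

The intercept margin is naturally wider than the slope margin here because of the n̄on²/S_xx term.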
**** 8 and 9 ****
Test for the mean response of the relative error at given values of the number of neurons (confidence intervals)
and
Prediction interval for future observations of the relative error.
We can construct the confidence interval for the mean response, i.e. the mean relative error, at number of neurons (non) = 31, with α = 0.05.
Hence, the 95% confidence interval for the mean relative error (ere) is:
    êre|non₀ − t_{α/2,(n−2)} √(σ̂² [1/n + (non₀ − n̄on)²/S_xx]) ≤ μ_ere|non₀ ≤ êre|non₀ + t_{α/2,(n−2)} √(σ̂² [1/n + (non₀ − n̄on)²/S_xx])

Here,

    êre|31 = −0.01921 + 0.0023 × 31 = 0.05

Hence, the confidence interval is

    0.05 ± 2.021 √(0.004 [1/45 + (31 − 110.46)²/151907.2]) = 0.05 ± 0.0323

Hence, the desired 95% confidence interval on μ_ere|31 is 0.0177 to 0.0823.
Equivalently, if we repeated the sampling 100 times, about 95 of the resulting intervals would contain the true mean response at non = 31.
The 95% prediction interval for a future observation of the relative error at non₀ = 31 is

    êre₀ − t_{α/2,(n−2)} √(σ̂² [1 + 1/n + (non₀ − n̄on)²/S_xx]) ≤ ere₀ ≤ êre₀ + t_{α/2,(n−2)} √(σ̂² [1 + 1/n + (non₀ − n̄on)²/S_xx])

Here,

    êre₀ = −0.01921 + 0.0023 × 31 = 0.05

Hence,

    0.05 ± 2.021 √(0.004 [1 + 1/45 + (31 − 110.46)²/151907.2]) = 0.05 ± 0.1318

So we can say that a future value of the relative error will, with 95% confidence, fall between −0.0818 and 0.1818.
The results are plotted and are shown on the next page.
**** 10 ****
Test for intercept and slope (Hypothesis Testing)
We need to test the slope β₁ through hypothesis testing.
For β₁, we will test whether there is a true linear relationship or no linear relationship.
Null hypothesis H₀: β₁ = 0
Alternative hypothesis H₁: β₁ ≠ 0
We have α = 0.05, β̂₁ = 0.0023, n = 45, S_xx = 151907.2 and σ̂² = 0.004.
We will use the t-statistic

    t₀ = β̂₁ / √(σ̂²/S_xx) = 0.0023 / √(0.004/151907.2) = 14.17
The reference critical value from the Student's t distribution table is t_{0.025,43} ≈ 2.021.
26
This shows that the test statistic falls far into the critical region, which implies that the null hypothesis H₀: β₁ = 0 must be rejected.
We conclude that the number of neurons has a linear effect on the relative error.
Also, we can say that the straight line fitted to the plot adequately describes the relationship between the two.
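The slope test can be sketched as follows, using the quantities stated above:

```python
import math

# Values from the text; t_{0.025,43} is taken as 2.021.
beta1_hat = 0.0023
sigma2 = 0.004
s_xx = 151907.2
t_crit = 2.021

t0 = beta1_hat / math.sqrt(sigma2 / s_xx)  # t-statistic for H0: slope = 0
reject = abs(t0) > t_crit                  # two-sided test
print(t0, reject)
```

The statistic comes out near 14.17, well beyond 2.021, so H₀: β₁ = 0 is rejected.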
**** 11 ****
Test of the strength of the relationship between the number of neurons and the corresponding relative error.
We have another test to check how strong the linear relationship is between the number of neurons and the corresponding value of the relative error. For that, we use the coefficient of determination r².
    r² = SSR/SST = sum of squares for regression / total sum of squares
       = 1 − SSE/SST
       = 1 − Σ(ere_i − êre_i)² / Σ(ere_i − ēre)²
       = 1 − 0.05/0.878
       = 0.943
Here, we can say that the linear regression relationship between the number of neurons and the relative error explains about 94% of the variability in the data.
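The computation is a one-liner from the sums of squares quoted above:

```python
# Coefficient of determination from the sums of squares given in the text.
sse = 0.05   # error sum of squares, sum of (ere_i - fitted_i)^2
sst = 0.878  # total sum of squares, sum of (ere_i - mean)^2

r2 = 1 - sse / sst
print(r2)
```
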
Q.1.b.
The hypotheses:
Null hypothesis H₀: μ = μ₀
Alternative hypothesis H₁: μ ≠ μ₀
The population is normal with σ known.
Steps to perform this hypothesis test:
The proposed hypothesis is checked with a two-tail test.
Here, we have assumed that (1 − α) is the confidence coefficient.
As we are interested in the population parameter, the true mean (μ₀), we need to find X̄, the sample mean.
We collect a sample at random (generally sample size n > 30).
As the population is normal, the sampling distribution of X̄ is also normal, as shown in the figure.
As mentioned earlier, the significance level is α, and we have a two-tail hypothesis test, as shown in the figure.
The acceptance region for the null hypothesis H₀ is

    (μ₀ − z_{α/2} σ/√n) < X̄ < (μ₀ + z_{α/2} σ/√n)

Here, Z₀ is the test statistic, given by

    Z₀ = (X̄ − μ₀) / (σ/√n)

and −z_{α/2}, +z_{α/2} are the critical values. If

    X̄ < μ₀ − z_{α/2} σ/√n    or    X̄ > μ₀ + z_{α/2} σ/√n

we reject the null hypothesis H₀; otherwise we accept it.
b(i)
Now consider μ = μ₀, with critical values −z_{α/2} and +z_{α/2}.
A type I error occurs when the mean of the selected sample unfortunately falls in the rejection region, even though the true mean is indeed the hypothesized mean, i.e. we reject the null hypothesis H₀: μ = μ₀ even though it is true.
α is nothing but the probability of making a type I error:

    α = P(type I error) = P(reject H₀ | H₀ is true)

The probability of accepting H₀ is

    P(−z_{α/2} < Z₀ < z_{α/2}) = 1 − [α/2 + α/2] = 1 − α

The probability of a type I error, i.e. of rejecting H₀, is

    1 − [1 − (α/2 + α/2)] = 1 − 1 + α/2 + α/2 = α
b(ii)
Now suppose the true mean has shifted:

    H₀: μ = μ₀
    H₁: μ = μ₀ + δ,  δ > 0

β = the probability of accepting the null hypothesis even though it is false.
The test statistic is

    Z₀ = (X̄ − μ₀) / (σ/√n) = (X̄ − (μ₀ + δ)) / (σ/√n) + δ√n/σ

The distribution of Z₀ when H₁ is true is

    Z₀ ~ N(δ√n/σ, 1)
If H₁ is true, a type II error occurs only if Z₀ falls in the acceptance region, i.e. the probability of a type II error is the probability that Z₀ falls between −z_{α/2} and +z_{α/2}, given that H₁ is true:

    β = Φ(z_{α/2} − δ√n/σ) − Φ(−z_{α/2} − δ√n/σ)

Here, Φ(z) is the probability on the left side of z in the standard normal distribution.
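The β formula can be evaluated numerically with the standard library's normal distribution. All the inputs below (σ, n, δ, α) are hypothetical illustration values, not taken from the text:

```python
from statistics import NormalDist

# Numeric illustration of the beta formula, under assumed values:
# sigma = 1, n = 25, delta = 0.5, alpha = 0.05 (all hypothetical).
phi = NormalDist().cdf
z_half = NormalDist().inv_cdf(1 - 0.05 / 2)  # z_{alpha/2} = 1.96

sigma, n, delta = 1.0, 25, 0.5
shift = delta * n ** 0.5 / sigma  # mean of Z0 under H1

beta = phi(z_half - shift) - phi(-z_half - shift)
power = 1 - beta
print(beta, power)
```

Increasing n (or δ) increases the shift and drives β toward zero, which is the basis of the sample-size arguments later in the document.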
c(i)
μ₂ = mean absorption time of the competitor's product.
μ₁ = mean absorption time of our company's product.
σ₁² and σ₂² are the respective variances.
The assumptions I considered are:
1. The samples of our company's pills are selected at random from a large population.
2. The samples selected from the competitor's population are also random, and its population is also large.
3. The two samples are independent, so that no relationship between them arises at the beginning of the statistical analysis or interferes with the further analysis.
4. Both populations are normal.
5. The sample sizes are large, i.e. larger than 30.
6. The critical region is considered at α = 0.05.
Null hypothesis H₀: μ₁ − 2μ₂ = 0
Alternative hypothesis H₁: μ₁ − 2μ₂ < 0
n₁ and n₂ are the respective sample sizes.
X̄₁ and X̄₂ are the sample means corresponding to μ₁ and μ₂, respectively.
Now, we can consider X̄₁ − 2X̄₂ as the point estimate for μ₁ − 2μ₂. This is based on

    E(X̄₁ − 2X̄₂) = E(X̄₁) − 2E(X̄₂) = μ₁ − 2μ₂

The variance is

    V(X̄₁ − 2X̄₂) = V(X̄₁) + 4V(X̄₂) = σ₁²/n₁ + 4σ₂²/n₂

In order to standardize, we have

    Z = (X̄₁ − 2X̄₂ − (μ₁ − 2μ₂)) / √(σ₁²/n₁ + 4σ₂²/n₂)

So, the test statistic is

    Z₀ = (X̄₁ − 2X̄₂ − 0) / √(σ₁²/n₁ + 4σ₂²/n₂)

The conclusions that we can draw are:
Reject H₀: μ₁ − 2μ₂ = 0 if Z₀ < −z_α, with α = 0.05, i.e. we reject H₀ at the critical value −z₀.₀₅ (left-tail test) and accept the alternative hypothesis.
In that case, our company's new gastric relieving pills work at less than twice the speed of the competitor's pills.
The probability P(Z < Z₀) is the probability of rejecting the null hypothesis (the p-value of the test).
Otherwise, we accept the null hypothesis, and we can say that our company's pills work at equal or greater speed than that of the competitor's.
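The test statistic can be sketched as follows. Since the problem statement gives no numbers, every value below (means, variances, sample sizes) is hypothetical:

```python
import math

# Hypothetical numbers to illustrate the test; none of these values are
# given in the original problem.
x1_bar, x2_bar = 18.0, 10.0  # sample mean absorption times
var1, var2 = 4.0, 3.0        # known population variances
n1, n2 = 40, 40
z_crit = 1.645               # z_{0.05}; the left-tail test uses -z_crit

z0 = (x1_bar - 2 * x2_bar) / math.sqrt(var1 / n1 + 4 * var2 / n2)
reject = z0 < -z_crit        # reject H0: mu1 - 2*mu2 = 0 in favor of "< 0"
print(z0, reject)
```

Note the factor 4 multiplying the second variance term, which comes from V(2X̄₂) = 4V(X̄₂).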
C(ii)
Here, I have considered and implemented two methods.

1st Method
Here, the problem is minimizing the cost for a specified variance of the difference in the sample means.
The variance of the difference in sample means is

    V(X̄₁ − X̄₂) = V(X̄₁) + V(X̄₂) = σ₁²/n₁ + σ₂²/n₂ = K

Using a Lagrange multiplier (λ), we form a function to optimize the cost:

    f(n₁, n₂, λ) = c₁n₁ + c₂n₂ + λ(σ₁²/n₁ + σ₂²/n₂ − K)

Differentiate f with respect to n₁, n₂ and λ, and equate to 0. Therefore,

    ∂f/∂n₁ = c₁ − λσ₁²/n₁² = 0
    ∂f/∂n₂ = c₂ − λσ₂²/n₂² = 0

So,

    n₁ = σ₁√(λ/c₁)    and    n₂ = σ₂√(λ/c₂)

Here, we have,
    σ₁²/n₁ + σ₂²/n₂ = K

Substituting the values of n₁ and n₂ from above:

    (σ₁√c₁ + σ₂√c₂)/√λ = K

so

    √λ = (σ₁√c₁ + σ₂√c₂)/K

For n₁, substituting √λ into n₁ = σ₁√(λ/c₁), we get

    n₁ = σ₁(σ₁√c₁ + σ₂√c₂) / (K√c₁)

Similarly for n₂, from n₂ = σ₂√(λ/c₂), we get

    n₂ = σ₂(σ₁√c₁ + σ₂√c₂) / (K√c₂)
2nd Method
For this 2nd method, we consider the costs r₁ and r₂ rather than c₁ and c₂.
We have:
H₀: μ₁ = μ₂ (null hypothesis)
H₁: μ₁ ≠ μ₂ (alternative hypothesis)
r₁ = cost of one observation in the sample of size n₁
r₂ = cost of one observation in the sample of size n₂
To minimize the total cost of the study, we have

    n₁/n₂ = √(r₂/r₁)

In this square-root rule, if the costs are not different, then n₁ = n₂ can be chosen.
The total cost of sampling is

    C = r₁n₁ + r₂n₂

and we need to minimize 1/n₁ + 1/n₂ subject to the total cost C.
Then we have the minimum-cost sample sizes

    n₁ = C / (r₁ + √(r₁r₂))    and    n₂ = C / (r₂ + √(r₁r₂))
These n₁ and n₂ values can be put into the equation

    n = (σ₁² + σ₂²) / (σ₁²/n₁ + σ₂²/n₂)

Also, we have

    d = |μ₁ − μ₂ − Δ₀| / √(σ₁² + σ₂²)

As we know the values of d and n, we can use these two values with the O.C. curve at α = 0.05.
If the value of β read from the curve is the same as what we expected, then we can say that the calculated n₁ and n₂ are the correct smallest sample sizes with minimum cost of the experiment. On the other hand, if the value of β is not the same as the expected one, then a slight adjustment to the values of n₁ and n₂ can be made to find new sample sizes.
c (iii)
Here, in this problem we have to minimize β; that is, we have to maximize the power of the test (1 − β). We can define the power of the test as the probability of rejecting the null hypothesis H₀ when the alternative hypothesis is true. It is the probability of correctly rejecting a false null hypothesis.
We have

    β = Φ(z_{α/2} − δ/√(σ₁²/n₁ + σ₂²/n₂)) − Φ(−z_{α/2} − δ/√(σ₁²/n₁ + σ₂²/n₂))

In order to minimize β, we should make δ/√(σ₁²/n₁ + σ₂²/n₂) large, that is, make σ₁²/n₁ + σ₂²/n₂ as small as possible.
Using a Lagrange multiplier, we minimize σ₁²/n₁ + σ₂²/n₂ with N = n₁ + n₂ as a constraint:

    f(n₁, n₂, λ) = σ₁²/n₁ + σ₂²/n₂ + λ(n₁ + n₂ − N)

Differentiate f with respect to n₁, n₂ and λ, and set the derivatives equal to 0; hence,

    ∂f/∂n₁ = −σ₁²/n₁² + λ = 0
    ∂f/∂n₂ = −σ₂²/n₂² + λ = 0
Solving,

    n₁ = σ₁/√λ    and    n₂ = σ₂/√λ

Now we have n₁ + n₂ = N, so we get

    (σ₁ + σ₂)/√λ = N,    i.e.    1/√λ = N/(σ₁ + σ₂)

To get n₁ we put 1/√λ = N/(σ₁ + σ₂) into n₁ = σ₁/√λ:

    n₁ = N σ₁ / (σ₁ + σ₂)

Similarly, for n₂:

    n₂ = N σ₂ / (σ₁ + σ₂)
------------------------------------------------------------------------------------------------------------
References:
[1] Montgomery D.C., Runger G.C. Applied Statistics and Probability for Engineers. John Wiley
and Sons, ISBN: 0-471-20454-4, 2003.
[2] http://www.microbiologybytes.com/maths/1011-20.html
[3] Anand J. Kulkarni, René V. Mayorga: The Design of the Spindle of the Cylindrical Grinding
Machine through Neural Networks and Genetic Algorithm. IICAI 2005: 3323-3336.