
Engineering Research Methodology


(M-7806)


Assignment-2

Independent Study Assignment


Anand Jayant Kulkarni
PhD Student (MAE)




Date: Oct 30th, 2007






a. Importance of Statistics in Engineering Research:

A research engineer plays an important role in society by solving problems that matter to it, and does so through the efficient application of scientific principles [1].

Statistics is the science of measuring and characterizing variation. It covers the collection, presentation, modeling, and analysis of data. Research engineers use such data to decide how to solve problems and how to modify, refine, or develop designs, processes, and products. Almost every engineering discipline works with data, so engineers need statistics to make the best use of it.

Engineering research follows the steps shown in Figure 1, and statistics is needed from the very first step onward.



Fig. 1 The Engineering Method [1]

It is generally not possible to collect data on a whole population. The data collected are therefore incomplete (a sample) and arrive in raw form, and the relationships among the variables must be established before the research can proceed.

Because of this, almost all engineering problems involve uncertainty, risk, and variability in both the process and the results. The collected data may be random and incomplete. To reveal the trend in the variability, the data can be organized into a meaningful form with statistical techniques such as correlation and regression (Figure 2), and then used to improve the process. The data should be checked for a linear or non-linear relationship, and the strength of that relationship can be quantified. In research, such quantitative values carry more weight than subjective judgment.

Fig. 2 Example of Linear Regression

Statistics plays a role from the first step of engineering research. For example, how and why does the diameter of shafts produced in a machine shop vary? Measuring samples of shafts over a shift, a day, a week, or a month gives data on the variability of the diameter about its expected value. Is the variation within the desired limits? The observed variability depends on many parameters, such as machine vibration, tool setting, the temperature of the job during machining, and error in the measuring instrument.

Statistics provides the framework for describing such variability and for learning which potential sources of variability have the greatest impact on the diameter. This lets the engineer concentrate on the high-impact parameters and suggest modifications. The effect of an individual parameter on system performance can be checked by altering that parameter slightly and measuring the corresponding change in performance. A statistical comparison of the performance before and after the alteration then shows which parameter to focus on. The data for this comparison can come from an observational study or from comparative experimentation. Design of Experiments (DOE) is very useful in this regard.

Research engineers often want to apply new tools or techniques to existing processes in order to achieve better performance. Two or more existing processes, techniques, models, or plant performances can be compared, and the high-impact parameters checked, before further decisions are made. The decision on why and when to apply a proposed technique rests on a statistical conclusion. In [3], for example, the decision to apply a new technique to the optimization of Artificial Neural Networks was analyzed by comparing the standard traditional method with the proposed one. How much better the proposed model performs was determined from a plot of the relative error and its trend.

Another important feature of statistics for a research engineer is inference. Statistical inference draws conclusions about a whole population from limited and incomplete information. It relies on a suitable probability model fitted to the available data, which then supports future decisions on modifications and refinements. In the shaft example above, the variability of the diameter is observed in samples taken over a shift, a day, a week, and so on. From these samples the engineer has to infer the probable variability of the diameter in the total population (see Figure 3). Figure 4 illustrates the probability that a performance parameter falls into a given region. The shaded region on the right is the rejection region, where performance is below expectation; performance is adequate in the region on the left.



Fig. 3 Statistical Inference

Fig. 4 Probability model for the hypothesis test [2]


Physical experiments are carried out to validate a proposed method or phenomenon and are based on assumptions and prior scientific knowledge. The proposed model can be refined on the basis of a statistical analysis of the experimental data (for example, by comparing a simulated model with an analytical model). The refined model or method helps in deciding whether to adopt the proposed model as the better solution to the problem at hand. The research engineer can present the model together with a confidence interval. A confidence interval estimates a performance parameter of the whole population from one or more samples, and it indicates how reliable the model results are (see the example illustrated in this report).

Knowing how to collect data effectively simplifies the analysis and understanding of the population under study. Data can come from an observational study (a process observed over time), from historical records, or from a designed experiment. Depending on the type of data and the sample size, different models are available for presenting and interpreting the data. For example, the dot diagram (see Figure 5) is useful when the data set is small and a comparison is needed, such as checking the improvement of a process after certain parameters have been altered.


Fig.5 Dot Diagram


The model is refined through iterations, and the choice of iterations and of the parameters to vary depends in turn on the statistical analysis of the previous iteration's results. Statistics is therefore an essential part of every step of engineering research. The research engineer needs it for better decision making, inference, and presentation, for sound analysis, and for pointing the research in the direction that leads to better results.

References:
[1] Montgomery D.C., Runger G.C. Applied Statistics and Probability for Engineers. John Wiley and Sons, ISBN: 0-471-20454-4, 2003.
[2] http://www.microbiologybytes.com/maths/1011-20.html
[3] Anand J. Kulkarni, Rene V. Mayorga: The Design of the Spindle of the Cylindrical Grinding Machine through Neural Networks and Genetic Algorithm. IICAI 2005: 3323-3336








The points made above about the importance of statistics in engineering are applied in the following sections, where I statistically analyze part of the research work I did during my Masters studies.

Background

During my Masters studies with Dr. Rene V. Mayorga, we successfully implemented the optimization of Artificial Neural Networks (ANN) through Genetic Algorithms (GA). The problem selected was to map the relation between the force applied on the spindle of a cylindrical grinding machine during machining and the corresponding required spindle diameter. During the intermediate stages, the GA was used to randomly select the number of neurons in the ANN. The results collected at that stage were set aside because we tackled the problem in another way. Those unused data are analyzed statistically here.

The data collected are the relative error values and the corresponding numbers of neurons. The relative error is the error between the calculated values of the spindle diameter of the cylindrical grinding machine and the values approximated by the neural network.

The number of neurons was selected randomly by the genetic algorithm, and the corresponding relative error was noted. The simulation was run for 1000 random values. Out of these 1000 values I selected 45 at random for the statistical analysis.

For the central limit theorem, more samples were drawn; the procedure is described in the corresponding section.

The tests conducted are explained in the following sections.
The purpose of the tests is to judge the performance of the Artificial Neural Network that approximates the diameter of the spindle. The relative error of the approximation was expected to be below 0.05.
The required tests include hypothesis testing for the mean error. The data are also presented with a box plot to show properties such as central tendency.
The relation between the number of neurons and the relative error is also checked for future prediction.
The tests to be done are:
1. Finding out mean
For central tendency of the data collected.
For further calculations.

2. Sample standard deviation and variance
To extract more information regarding variability of the data collected.
For further calculations.

3. Presentation of the data using XY-plot and Box-plot
Visualization of the data through quartiles, percentiles, outliers, extreme outliers, central tendency, spread, and departure from symmetry.

4. Use of central limit theorem
For point estimation and confidence interval of the population mean.

5. Hypothesis Testing
For the mean of the relative error.

6. Regression Analysis
Check the relationship between number of neurons and relative error.
Prediction of relative error for corresponding values of number of neurons.

7. Test for intercept and slope (Confidence Intervals)

8. Test mean response at number of neurons values (Confidence Intervals)

9. Prediction interval for future observations of the relative error

10. Test for intercept and slope (Hypothesis Test)

11. Test the strength of the relationship between number of neurons and relative error.















***** 1 and 2 *****

Sample Mean
For central tendency of the data collected.
For future calculations.

Sample Standard Deviation and Variance
For extracting information regarding variability of data collected.
For future calculations.

The statistical analysis starts with the sample mean of the relative error values. Let $ere$ denote the relative error and $\overline{ere}$ its sample mean:

$$\overline{ere} = \frac{ere_1 + ere_2 + \dots + ere_{45}}{45} = \frac{0.02 + 0.03 + \dots + 0.46}{45} = 0.22$$

Relative error has no unit.

This value is needed in the further calculations and analysis of the available data.

To extract more information, such as the scatter of the data, the sample variance and standard deviation are calculated.

The sample variance is

$$S_{ere}^2 = \frac{\sum_{i=1}^{45} (ere_i - \overline{ere})^2}{45 - 1} = 0.01960$$

The sample standard deviation is

$$S_{ere} = \sqrt{0.01960} = 0.14$$

In absolute terms the variance and standard deviation indicate a small spread in the data, although a standard deviation of 0.14 is not small relative to the mean of 0.22.
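The two formulas above can be sketched in a few lines of Python. The 45 measured values are not listed in this report, so the `ere` list below is a hypothetical stand-in and the printed numbers will not match 0.22 and 0.14; only the procedure is illustrated.

```python
import statistics

# Hypothetical relative-error values (assumption): the report does not list
# the 45 actual observations, so these are placeholders for illustration.
ere = [0.02, 0.03, 0.10, 0.15, 0.22, 0.30, 0.38, 0.46]

mean_ere = statistics.mean(ere)      # sample mean
var_ere = statistics.variance(ere)   # sample variance, divisor n - 1
std_ere = statistics.stdev(ere)      # sample standard deviation

print(f"mean = {mean_ere:.4f}, variance = {var_ere:.5f}, std = {std_ere:.4f}")
```

`statistics.variance` and `statistics.stdev` use the divisor n − 1, which matches the (45 − 1) in the sample variance formula.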

























***** 3 *****

Graphical Representation of the Data (Sample) on XY-plot and on Box-plot

The data are presented graphically with an XY-plot and a box plot.
The box plot shows the symmetry of the relative error through its quartiles and percentiles, and it also reveals outliers and extreme outliers.
The features of the data that can be seen are central tendency, spread, and departure from symmetry.

$q_1$ = 1st quartile, at position (n+1)/4 = (45+1)/4 = 11.5, i.e. the 11.5th ordered value.
Hence, 1st quartile relative error = 0.125

$q_2$ = 2nd quartile = 23rd ordered value, as the sample size is 45.
Hence, 2nd quartile relative error = 0.22

$q_3$ = 3rd quartile, at position 3(n+1)/4 = 3(45+1)/4 = 34.5, i.e. the 34.5th ordered value.
Hence, 3rd quartile relative error = 0.38

Inter-quartile range (IQR) = $q_3 - q_1$ = 0.38 - 0.125 = 0.255

The IQR is a measure of variability that is less sensitive to extreme values than the range.
1.5(IQR) = 1.5(0.255) = 0.3825 and 3(IQR) = 3(0.255) = 0.765

[Figure: Box plot of the relative error, marking the minimum and maximum relative error, $q_1$ = 0.125, $q_2$ = 0.22, $q_3$ = 0.38, and the 1.5 IQR and 3 IQR limits on either side of the box]

Box Plot

No observation lies more than 1.5 IQR beyond the quartiles, so there are no outliers, and hence no extreme outliers, in the collected data.
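The outlier check behind the box plot can be sketched as follows. The quartiles are the ones stated above; taking 0.02 and 0.46 as the smallest and largest observations is an assumption based on the end points shown in the sample-mean calculation, and the helper name `quartile_position` is hypothetical.

```python
def quartile_position(n: int, k: int) -> float:
    """Position of the k-th quartile among n ordered values, using k(n+1)/4."""
    return k * (n + 1) / 4

q1, q3 = 0.125, 0.38          # quartiles stated in the text
x_min, x_max = 0.02, 0.46     # assumed smallest and largest observations

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr  # values below this would be outliers
upper_fence = q3 + 1.5 * iqr  # values above this would be outliers
lower_extreme = q1 - 3 * iqr  # values below this would be extreme outliers
upper_extreme = q3 + 3 * iqr  # values above this would be extreme outliers

has_outliers = x_min < lower_fence or x_max > upper_fence

print(quartile_position(45, 1), quartile_position(45, 3))
print(f"IQR = {iqr:.3f}, fences = ({lower_fence:.4f}, {upper_fence:.4f})")
print("outliers present:", has_outliers)
```

Checking only the minimum and maximum is enough here, because every other observation lies between them.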
















***** 4 *****

Use of Central Limit Theorem
For the point estimation and confidence interval of the mean of the relative error.

The central limit theorem is used to make inferences about the population mean. We are interested in the mean relative error of the whole population, $\mu_{ere}$.
From the available 1000 values, 31 random samples of 40 relative error values each are drawn.
The mean of each sample is computed.
The mean of these sample means is the point estimate of the true mean relative error $\mu_{ere}$.

A 95% confidence level is used for the true mean $\mu_{ere}$, so $\alpha = 0.05$.

Number of samples = $n_1$ = 31
The mean computed = $\overline{ere}$ = 0.25
Standard deviation = $S_{ere}$ = 0.03

Hence, the confidence interval for the population mean is

$$\overline{ere} - t_{0.025,\,n_1-1}\,\frac{S_{ere}}{\sqrt{n_1}} \le \mu_{ere} \le \overline{ere} + t_{0.025,\,n_1-1}\,\frac{S_{ere}}{\sqrt{n_1}}$$

$$0.25 - 2.042\,\frac{0.03}{\sqrt{31}} \le \mu_{ere} \le 0.25 + 2.042\,\frac{0.03}{\sqrt{31}}$$

$$0.25 - 0.0110 \le \mu_{ere} \le 0.25 + 0.0110$$

$$0.2390 \le \mu_{ere} \le 0.2610$$

Here 2.042 is the t value for an upper-tail area of 0.025 with 30 degrees of freedom.

Therefore, we can say with 95% confidence that the population mean of the relative error lies between 0.239 and 0.261.

Put differently, if the sampling were repeated many times, about 95 out of every 100 intervals constructed in this way would contain the true mean $\mu_{ere}$.
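As a quick numerical check, the interval $\overline{ere} \pm t \cdot S_{ere}/\sqrt{n_1}$ can be evaluated directly from the summary values quoted in this section. This is only a sketch: the variable names are hypothetical, and the critical value 2.042 is taken from the text rather than computed.

```python
import math

mean_of_means = 0.25   # mean of the 31 sample means (from the text)
s_ere = 0.03           # standard deviation (from the text)
n1 = 31                # number of samples (from the text)
t_crit = 2.042         # critical value used in the text

half_width = t_crit * s_ere / math.sqrt(n1)
lower = mean_of_means - half_width
upper = mean_of_means + half_width

print(f"half-width = {half_width:.4f}")
print(f"95% CI for the mean relative error: ({lower:.4f}, {upper:.4f})")
```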













****** 5 *****

Hypothesis Test for mean of the relative error.

The relative error between the calculated diameter of the spindle and the diameter approximated by the Artificial Neural Network is expected to be at most 0.05.

We test this hypothesis.
The null hypothesis $H_0: \mu_{ere} \le 0.050$
Alternative hypothesis $H_1: \mu_{ere} > 0.050$

Sample mean $\overline{ere}$ = 0.25 (relative error has no unit), with $S_{ere}$ = 0.03 and $n_1$ = 31 as in the previous section.

The Z-statistic:

$$Z_0 = \frac{\overline{ere} - 0.050}{S_{ere}/\sqrt{n_1}} = \frac{0.25 - 0.050}{0.03/\sqrt{31}} = 37.12$$

This is a right-tail hypothesis test.

[Figure: Standard normal curve for the right-tail test, with the "Accept H0" region to the left of the critical value and the "Reject H0" region to the right]

The acceptance region for the null hypothesis $H_0$ is $Z < Z_{0.05}$ and the rejection region is $Z > Z_{0.05}$.
Hence, $Z_{0.05}$ is the critical value for the test.
We have $Z_{0.05}$ = 1.645
So, $Z_0 > Z_{0.05}$

We therefore reject the null hypothesis that the true mean of the relative error is at most 0.05.

We conclude that the Artificial Neural Network model failed to conform to the required performance.

There is strong evidence that the mean relative error of the model is higher than 0.05.
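The right-tail test can be reproduced from the summary values with the Python standard library. This is a sketch under the same inputs as the text (sample mean 0.25, standard deviation 0.03, 31 samples, hypothesized mean 0.05); the variable names are hypothetical.

```python
import math
from statistics import NormalDist

sample_mean = 0.25   # from the text
mu_0 = 0.05          # hypothesized upper limit on the mean relative error
s_ere = 0.03         # from the text
n1 = 31              # from the text
alpha = 0.05

z0 = (sample_mean - mu_0) / (s_ere / math.sqrt(n1))   # test statistic
z_crit = NormalDist().inv_cdf(1 - alpha)              # right-tail critical value
reject_h0 = z0 > z_crit

print(f"z0 = {z0:.2f}, critical value = {z_crit:.3f}, reject H0: {reject_h0}")
```

`NormalDist().inv_cdf(0.95)` returns the same critical value (about 1.645) that would be read from a standard normal table.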


















**** 6 ****

Regression Analysis
Check relationship between number of neurons and relative error.
Prediction of relative error for corresponding value of number of neurons.

Further analysis is required to check the relationship between the number of neurons (non) and the relative error (ere).
The XY-plot suggests (subjectively) that an approximately straight line can be fitted to the data.

We can therefore use simple linear regression to predict the relative error (ere) for known values of the number of neurons (non).
As the point on the regression line at any value of non is the expected mean value of the corresponding ere, we need to find the regression line (model):

$$ere_i = \beta_0 + \beta_1\, non_i + \epsilon_i$$

where $\beta_0$ is the intercept of the line, $\beta_1$ is the slope of the line, and $\epsilon_i$ is the random error.

The estimated value of the intercept of the line = $\hat{\beta}_0$

The estimated value of the slope of the line = $\hat{\beta}_1$

Hence, the equation is

$$ere_i = \hat{\beta}_0 + \hat{\beta}_1\, non_i + e_i$$

where $e_i$ is the residual. The slope estimate is

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{i=1}^{45} non_i\, ere_i - \dfrac{\left(\sum_{i=1}^{45} non_i\right)\left(\sum_{i=1}^{45} ere_i\right)}{45}}{\sum_{i=1}^{45} non_i^2 - \dfrac{\left(\sum_{i=1}^{45} non_i\right)^2}{45}}$$

$$= \frac{1522 - \dfrac{(10.57)(4971)}{45}}{701037 - \dfrac{24710841}{45}} = \frac{354.37}{151907.2} = 0.0023$$

This value is used in further analysis as well as to compute $\hat{\beta}_0$ (the intercept of the regression line):

$$\hat{\beta}_0 = \overline{ere} - \hat{\beta}_1\, \overline{non} = \frac{1}{45}\sum_{i=1}^{45} ere_i - 0.0023 \cdot \frac{1}{45}\sum_{i=1}^{45} non_i$$

$$= \frac{1}{45}(10.57) - 0.0023\,(110.50) = 0.2348 - 0.2541 \approx -0.01921$$

Hence, the fitted linear regression model is

$$\widehat{ere} = -0.01921 + 0.0023\, non$$

As the intercept is negative, the regression line (relationship line) intersects the Y axis below zero.
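The least-squares estimates can be recomputed from the sums quoted in the calculation above (n = 45, Σnon = 4971, Σere = 10.57, Σnon·ere = 1522, Σnon² = 701037). The sketch below keeps full precision throughout, so its intercept differs slightly from the hand calculation, which rounds the slope to 0.0023 first; the variable names are hypothetical.

```python
n = 45
sum_non = 4971         # sum of number-of-neurons values (from the text)
sum_ere = 10.57        # sum of relative-error values (from the text)
sum_non_ere = 1522     # sum of products non_i * ere_i (from the text)
sum_non_sq = 701037    # sum of squares of non_i (from the text)

s_xy = sum_non_ere - sum_non * sum_ere / n
s_xx = sum_non_sq - sum_non ** 2 / n

beta1_hat = s_xy / s_xx                              # slope
beta0_hat = sum_ere / n - beta1_hat * sum_non / n    # intercept

print(f"S_xy = {s_xy:.2f}, S_xx = {s_xx:.1f}")
print(f"slope = {beta1_hat:.4f}, intercept = {beta0_hat:.4f}")
```

Carrying unrounded intermediate values is the usual practice, since the intercept is sensitive to small changes in the slope when the mean of non is large.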
**** 7 ****

Test intercept and slope (confidence intervals)

We now construct confidence intervals for the intercept and the slope.
As before, the significance level is $\alpha = 0.05$.

Hence, the confidence interval for the slope is

$$\hat{\beta}_1 - t_{\alpha/2,\,n-2}\sqrt{\frac{\hat{\sigma}^2}{S_{xx}}} \le \beta_1 \le \hat{\beta}_1 + t_{\alpha/2,\,n-2}\sqrt{\frac{\hat{\sigma}^2}{S_{xx}}}$$

(n-2) = Degrees of freedom = (45-2) = 43

$\hat{\sigma}^2$ = Variance of the error term = SSE / (n-2)

SSE = error sum of squares = $SST - \hat{\beta}_1 S_{xy}$, where SST = total sum of squares.

$$SSE = \sum_{i=1}^{45} ere_i^2 - 45\,(\overline{ere})^2 - 0.0023\,(354.37)$$

$$= 3.365 - 45\,(0.23)^2 - 0.815 = 0.17$$

$$\hat{\sigma}^2 = \frac{0.17}{45 - 2} = 0.004$$

We know that $S_{xx}$ = 151907.2. The t value used is 2.021, the tabulated value for 40 degrees of freedom, which is the table entry nearest to 43.

$$0.0023 - 2.021\sqrt{\frac{0.004}{151907.2}} \le \beta_1 \le 0.0023 + 2.021\sqrt{\frac{0.004}{151907.2}}$$

$$0.0020 \le \beta_1 \le 0.0026$$

We can say with 95% confidence that the true slope lies between these two values.

The confidence interval for the intercept is

$$\hat{\beta}_0 - t_{\alpha/2,\,n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{\overline{non}^2}{S_{xx}}\right]} \le \beta_0 \le \hat{\beta}_0 + t_{\alpha/2,\,n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{\overline{non}^2}{S_{xx}}\right]}$$

$$-0.01921 - 2.021\sqrt{0.004\left[\frac{1}{45} + \frac{12202.90}{151907.2}\right]} \le \beta_0 \le -0.01921 + 2.021\sqrt{0.004\left[\frac{1}{45} + \frac{12202.90}{151907.2}\right]}$$

$$-0.01921 - 0.0409 \le \beta_0 \le -0.01921 + 0.0409$$

$$-0.0601 \le \beta_0 \le 0.0217$$

Here, we can say with 95% confidence that the true intercept $\beta_0$ lies between these two values.
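The half-widths of the two intervals follow directly from the formulas $t\sqrt{\hat{\sigma}^2/S_{xx}}$ for the slope and $t\sqrt{\hat{\sigma}^2[1/n + \overline{non}^2/S_{xx}]}$ for the intercept. The sketch below evaluates them with the rounded inputs quoted in this section; the variable names are hypothetical.

```python
import math

n = 45
beta0_hat = -0.01921      # intercept estimate (from the text)
beta1_hat = 0.0023        # slope estimate (from the text)
sigma2_hat = 0.004        # estimated error variance (from the text)
s_xx = 151907.2           # from the text
mean_non_sq = 12202.90    # square of the mean number of neurons (from the text)
t_crit = 2.021            # critical value used in the text

hw_slope = t_crit * math.sqrt(sigma2_hat / s_xx)
hw_intercept = t_crit * math.sqrt(sigma2_hat * (1 / n + mean_non_sq / s_xx))

slope_ci = (beta1_hat - hw_slope, beta1_hat + hw_slope)
intercept_ci = (beta0_hat - hw_intercept, beta0_hat + hw_intercept)

print(f"slope CI:     ({slope_ci[0]:.4f}, {slope_ci[1]:.4f})")
print(f"intercept CI: ({intercept_ci[0]:.4f}, {intercept_ci[1]:.4f})")
```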



**** 8 and 9 ****

Test for mean response of relative error for the values of number of neurons (confidence intervals)
And
Test for the prediction interval for future observations of the relative error

We can construct the confidence interval for the mean response, i.e. the mean relative error, at number of neurons $non_0$ = 31 with $\alpha = 0.05$.

Hence, the 95% confidence interval for the mean relative error is

$$\hat{\mu}_{ere|non_0} \pm t_{\alpha/2,\,n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{(non_0 - \overline{non})^2}{S_{xx}}\right]}$$

which at $non_0$ = 31 becomes

$$\hat{\mu}_{ere|31} \pm t_{\alpha/2,\,n-2}\sqrt{\hat{\sigma}^2\left[\frac{1}{n} + \frac{(31 - \overline{non})^2}{S_{xx}}\right]}$$

Here,

$$\hat{\mu}_{ere|non_0} = -0.01921 + 0.0023\, non_0 = -0.01921 + 0.0023\,(31) = 0.05$$

Hence, the confidence interval is

$$0.05 \pm 2.021\sqrt{0.004\left[\frac{1}{45} + \frac{(31 - 110.46)^2}{151907.2}\right]} = 0.05 \pm 0.0323$$

Hence, the desired 95% confidence interval on $\mu_{ere|31}$ is 0.0177 to 0.0823.
We can say with 95% confidence that the mean relative error at 31 neurons lies in this range.

The 95% prediction interval for a future observation of the relative error at $non_0$ is

$$\widehat{ere}_0 \pm t_{\alpha/2,\,n-2}\sqrt{\hat{\sigma}^2\left[1 + \frac{1}{n} + \frac{(non_0 - \overline{non})^2}{S_{xx}}\right]}$$

Here,

$$\widehat{ere}_0 = \hat{\beta}_0 + \hat{\beta}_1\,(non_0) = -0.01921 + 0.0023\,(31) = 0.05$$

Hence,

$$0.05 \pm 2.021\sqrt{0.004\left[1 + \frac{1}{45} + \frac{(31 - 110.46)^2}{151907.2}\right]} = 0.05 \pm 0.1318$$

So we can say with 95% confidence that a future value of the relative error at 31 neurons will fall between -0.0818 and 0.1818.

The results are plotted and shown on the next page.
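The mean-response interval and the prediction interval differ only by the extra "1 +" under the square root. The sketch below evaluates both half-widths at 31 neurons with the rounded values used in this report; the function name `interval_half_widths` and its default arguments are hypothetical.

```python
import math

def interval_half_widths(non0, n=45, mean_non=110.46, s_xx=151907.2,
                         sigma2_hat=0.004, t_crit=2.021):
    """Return (mean-response half-width, prediction half-width) at non0."""
    leverage = 1 / n + (non0 - mean_non) ** 2 / s_xx
    ci_half = t_crit * math.sqrt(sigma2_hat * leverage)
    pi_half = t_crit * math.sqrt(sigma2_hat * (1 + leverage))
    return ci_half, pi_half

fitted = -0.01921 + 0.0023 * 31          # fitted relative error at 31 neurons
ci_half, pi_half = interval_half_widths(31)

print(f"fitted value = {fitted:.4f}")
print(f"mean response: {fitted - ci_half:.4f} to {fitted + ci_half:.4f}")
print(f"prediction:    {fitted - pi_half:.4f} to {fitted + pi_half:.4f}")
```

The prediction interval is always the wider of the two, because it also accounts for the scatter of an individual future observation about the regression line.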
**** 10 ****

Test for intercept and slope (Hypothesis Testing)

The intercept $\beta_0$ and the slope $\beta_1$ can be tested through hypothesis testing.
For $\beta_1$, we test whether there is a true linear relationship or no linear relationship.

Null Hypothesis $H_0: \beta_1 = 0$
Alternative hypothesis $H_1: \beta_1 \ne 0$

We have $\alpha = 0.05$, $\hat{\beta}_1 = 0.0023$, $n = 45$, $S_{xx} = 151907.2$ and $\hat{\sigma}^2 = 0.004$.

We will use the t-statistic

$$t_0 = \frac{\hat{\beta}_1}{\sqrt{\hat{\sigma}^2 / S_{xx}}} = \frac{0.0023}{\sqrt{0.004 / 151907.2}} = 14.17$$

The reference value of t from the Student distribution table is 2.021.

The test statistic is far into the critical region ($|t_0| > 2.021$), which implies that the null hypothesis $H_0: \beta_1 = 0$ must be rejected.

The conclusion is that the number of neurons has a linear effect on the relative error.

We can also say that the straight line fitted in the plot adequately describes the relationship between the two.
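The slope test can be checked numerically by evaluating $t_0 = \hat{\beta}_1 / \sqrt{\hat{\sigma}^2 / S_{xx}}$ with the values listed in this section. This is a sketch with hypothetical variable names; the critical value 2.021 is the one quoted from the table.

```python
import math

beta1_hat = 0.0023    # slope estimate (from the text)
sigma2_hat = 0.004    # estimated error variance (from the text)
s_xx = 151907.2       # from the text
t_crit = 2.021        # tabulated critical value used in the text

std_error_slope = math.sqrt(sigma2_hat / s_xx)
t0 = beta1_hat / std_error_slope
reject_h0 = abs(t0) > t_crit   # two-sided test of H0: slope = 0

print(f"t0 = {t0:.2f}, reject H0: {reject_h0}")
```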























**** 11 ****

Test of strength of relationship between number of neurons and corresponding relative error.

Another test checks how strong the linear relationship is between the number of neurons and the corresponding relative error. For that, we use the coefficient of determination $r^2$.

$$r^2 = \frac{SSR}{SST} = \frac{\text{Sum of squares for regression}}{\text{Total sum of squares}} = 1 - \frac{SSE}{SST}$$

$$= 1 - \frac{\sum_{i=1}^{45} (ere_i - \widehat{ere}_i)^2}{\sum_{i=1}^{45} (ere_i - \overline{ere})^2} = 1 - \frac{0.05}{0.878} \approx 0.943$$

Here, we can say that the linear regression on the number of neurons accounts for about 94% of the variability in the relative error.




Q.1.b.

The Hypothesis,

Null Hypothesis $H_0: \mu = \mu_0$
Alternative hypothesis $H_1: \mu \ne \mu_0$

The population is normal with known $\sigma$.

Steps to perform this hypothesis test:

The proposed hypothesis is checked with a two-tailed test.

Here, we have assumed that $(1 - \alpha)$ is the confidence coefficient.

As we are interested in the population mean $\mu$ (hypothesized value $\mu_0$), we need to find $\bar{x}$, where $\bar{x}$ = sample mean.

We collect a random sample of size n (generally n > 30).

As the population is normal, the sampling distribution of the sample mean is also normal, as shown in the figure.

As mentioned earlier, the significance level is $\alpha$, and we have a two-tailed hypothesis test, as shown in the figure.

The acceptance region for the null hypothesis ($H_0$) is

$$(-z_{\alpha/2}) < z_0 < (+z_{\alpha/2})$$

Here, $z_0$ is the test statistic given by

$$z_0 = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$

Here $-z_{\alpha/2}$ and $+z_{\alpha/2}$ are the two critical values of the test.

We find the above critical values at the specified value of $\alpha/2$.
If $z_0 < -z_{\alpha/2}$ or $z_0 > +z_{\alpha/2}$, we reject the null hypothesis $H_0$; otherwise we do not reject it.














b(i)

Now consider $0 < \epsilon < \alpha$, with the critical values $z_{\epsilon}$ and $-z_{\alpha - \epsilon}$.

$\alpha$ is the probability of a type I error. A type I error occurs when the mean of the selected sample falls in the rejection region even though the true mean is indeed the hypothesized mean, i.e. we reject the null hypothesis $H_0: \mu = \mu_0$ even though it is true.

$$\alpha = P(\text{Type I error}) = P(\text{reject } H_0 \mid H_0 \text{ is true})$$

The probability of accepting $H_0$ when it is true is

$$P(-z_{\alpha-\epsilon} \le z_0 \le z_{\epsilon}) = 1 - [\epsilon + (\alpha - \epsilon)]$$

The probability of a type I error, i.e. of rejecting $H_0$, is

$$1 - [1 - [\epsilon + (\alpha - \epsilon)]] = 1 - 1 + \epsilon + \alpha - \epsilon = \alpha$$
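The result can be checked numerically under one reading of the problem, taken here as an assumption: the rejection region is split unevenly, with probability `eps` in the upper tail and `alpha - eps` in the lower tail. The function name `type_one_error` is hypothetical.

```python
from statistics import NormalDist

def type_one_error(alpha: float, eps: float) -> float:
    """P(reject H0 | H0 true) when we reject for z0 > z_eps or z0 < -z_(alpha-eps)."""
    std = NormalDist()
    upper_crit = std.inv_cdf(1 - eps)             # z_eps
    lower_crit = -std.inv_cdf(1 - (alpha - eps))  # -z_(alpha - eps)
    upper_tail = 1 - std.cdf(upper_crit)          # area to the right of z_eps
    lower_tail = std.cdf(lower_crit)              # area to the left of -z_(alpha-eps)
    return upper_tail + lower_tail

for eps in (0.01, 0.025, 0.04):
    print(eps, round(type_one_error(0.05, eps), 6))
```

Whatever value of `eps` between 0 and `alpha` is chosen, the two tail areas add up to `alpha`, which is the point of the derivation above.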
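b(ii)

Now suppose the true mean is $\mu = \mu_0 + \delta$, so that $H_0: \mu = \mu_0$ is false and $H_1$ is true.

$\beta$ = probability of accepting the null hypothesis even though it is false.
Assuming $\delta > 0$, the test statistic is

$$z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{\bar{x} - (\mu_0 + \delta)}{\sigma/\sqrt{n}} + \frac{\delta\sqrt{n}}{\sigma}$$

The distribution of $z_0$ when $H_1$ is true is

$$z_0 \sim N\!\left(\frac{\delta\sqrt{n}}{\sigma},\, 1\right)$$

If $H_1$ is true, a type II error occurs only if

$$-z_{\alpha-\epsilon} \le z_0 \le z_{\epsilon}$$

I.e. the probability of the type II error is the probability that $z_0$ falls between $-z_{\alpha-\epsilon}$ and $z_{\epsilon}$, given that $H_1$ is true:

$$\beta = \Phi\!\left(z_{\epsilon} - \frac{\delta\sqrt{n}}{\sigma}\right) - \Phi\!\left(-z_{\alpha-\epsilon} - \frac{\delta\sqrt{n}}{\sigma}\right)$$

Here, $\Phi(z)$ is the probability to the left of z in the standard normal distribution.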
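A type II error probability of this form can be evaluated with the standard normal CDF. The sketch below assumes the same uneven split of the rejection region as before (upper-tail probability `eps`, lower-tail probability `alpha - eps`) and a true mean shifted by `delta`; the function name and the example numbers are hypothetical.

```python
import math
from statistics import NormalDist

def type_two_error(alpha, eps, delta, sigma, n):
    """beta = Phi(z_eps - shift) - Phi(-z_(alpha-eps) - shift), shift = delta*sqrt(n)/sigma."""
    std = NormalDist()
    z_eps = std.inv_cdf(1 - eps)
    z_rest = std.inv_cdf(1 - (alpha - eps))
    shift = delta * math.sqrt(n) / sigma
    return std.cdf(z_eps - shift) - std.cdf(-z_rest - shift)

# Hypothetical example: sigma = 1, n = 30, alpha = 0.05 split evenly.
for delta in (0.0, 0.25, 0.5, 1.0):
    print(delta, round(type_two_error(0.05, 0.025, delta, 1.0, 30), 4))
```

With `delta = 0` the function returns 1 − alpha, and as `delta` grows the value falls toward zero, i.e. the test becomes more likely to detect the shift.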



















c(i)

$\mu_2$ = Mean absorption time of the competitor's product.
$\mu_1$ = Mean absorption time of our company's product.

$\sigma_1^2$ and $\sigma_2^2$ are the respective variances.

The assumptions I considered are:

1. The samples of our company's pills are selected at random from a large population.
2. The samples selected from the competitor's population are also random, and its population is also large.
3. The two samples are independent, so that no relationship between them interferes with the statistical analysis.
4. Both populations are normal.
5. The sample size is large, i.e. larger than 30.
6. The critical region is considered at $\alpha = 0.05$.

Null hypothesis $H_0: \mu_1 - 2\mu_2 = 0$
Alternative hypothesis $H_1: \mu_1 - 2\mu_2 < 0$

$n_1$ and $n_2$ are the respective sample sizes.
$\bar{X}_1$ and $\bar{X}_2$ are the sample means corresponding to $\mu_1$ and $\mu_2$ respectively.

Now, we can consider $\bar{X}_1 - 2\bar{X}_2$ as the point estimate for $\mu_1 - 2\mu_2$.
This is based on

$$E(\bar{X}_1 - 2\bar{X}_2) = E(\bar{X}_1) - E(2\bar{X}_2) = \mu_1 - 2\mu_2$$

Because the samples are independent, the variance is

$$V(\bar{X}_1 - 2\bar{X}_2) = V(\bar{X}_1) + V(2\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{4\sigma_2^2}{n_2}$$

In order to standardize, we have

$$Z_0 = \frac{\bar{X}_1 - 2\bar{X}_2 - (\mu_1 - 2\mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{4\sigma_2^2}{n_2}}} \qquad (11)$$

So, the test statistic will be

$$z_0 = \frac{\bar{x}_1 - 2\bar{x}_2 - 0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{4\sigma_2^2}{n_2}}}$$

The conclusions that we can draw are:

Reject $H_0: \mu_1 - 2\mu_2 = 0$ if $z_0 < -z_{\alpha}$, with $\alpha = 0.05$.

I.e. we reject $H_0$ at the critical value $-z_{0.05}$ (left-tail test) and accept the alternative hypothesis: the mean absorption time of our company's new gastric relief pills is less than twice that of the competitor's pills.

The probability of rejecting the null hypothesis is

$$P(z_0 < -z_{\alpha}) \qquad (12)$$

Otherwise, we do not reject the null hypothesis, and we cannot conclude that the mean absorption time of our company's pills is less than twice that of the competitor's.
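A left-tail z test on a linear combination of two means with known variances can be sketched as below. It assumes the combination being tested is the first mean minus twice the second, which is how the garbled hypotheses are read here; the function name `left_tail_test` and all input numbers are hypothetical.

```python
import math
from statistics import NormalDist

def left_tail_test(xbar1, xbar2, sigma1, sigma2, n1, n2, alpha=0.05):
    """Test H0: mu1 - 2*mu2 = 0 against H1: mu1 - 2*mu2 < 0 (variances known)."""
    std_error = math.sqrt(sigma1 ** 2 / n1 + 4 * sigma2 ** 2 / n2)
    z0 = (xbar1 - 2 * xbar2) / std_error
    z_crit = -NormalDist().inv_cdf(1 - alpha)   # left-tail critical value
    return z0, z_crit, z0 < z_crit

# Hypothetical absorption-time data (minutes): our product vs competitor.
z0, z_crit, reject = left_tail_test(xbar1=10.0, xbar2=6.0,
                                    sigma1=2.0, sigma2=1.5, n1=40, n2=40)
print(f"z0 = {z0:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

The factor 4 in the standard error comes from V(2X) = 4V(X), as in the variance expression above.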





















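C(ii)
Here, I have considered and implemented two methods.

1st Method
Here, the problem is to minimize the cost for a specified variance of the difference in the sample means. Let $c_1$ and $c_2$ be the costs per observation in the two samples.

So, the variance of the difference in sample means is

$$V(\bar{X}_1 - \bar{X}_2) = V(\bar{X}_1) + V(\bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = K$$

Here we form a function, using the Lagrange multiplier ($\lambda$), to optimize the cost:

$$f(n_1, n_2, \lambda) = c_1 n_1 + c_2 n_2 + \lambda\left(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} - K\right)$$

Differentiate f with respect to $n_1$, $n_2$ and $\lambda$, and equate to 0.
Therefore,

$$\frac{\partial f}{\partial n_1} = c_1 - \frac{\lambda \sigma_1^2}{n_1^2} = 0$$

$$\frac{\partial f}{\partial n_2} = c_2 - \frac{\lambda \sigma_2^2}{n_2^2} = 0$$

So,

$$n_1 = \sigma_1\sqrt{\frac{\lambda}{c_1}} \quad \text{and} \quad n_2 = \sigma_2\sqrt{\frac{\lambda}{c_2}}$$

Here, we have

$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = K$$

Put the values of $n_1$ and $n_2$ as above.

So,

$$\frac{\sigma_1^2}{\sigma_1\sqrt{\lambda/c_1}} + \frac{\sigma_2^2}{\sigma_2\sqrt{\lambda/c_2}} = K$$

$$\sqrt{\lambda} = \frac{\sigma_1\sqrt{c_1} + \sigma_2\sqrt{c_2}}{K}$$

For $n_1$, put $\sqrt{\lambda} = \dfrac{n_1\sqrt{c_1}}{\sigma_1}$.

We get

$$n_1 = \frac{\sigma_1\left(\sigma_1\sqrt{c_1} + \sigma_2\sqrt{c_2}\right)}{K\sqrt{c_1}}$$

Similarly for $n_2$, put $\sqrt{\lambda} = \dfrac{n_2\sqrt{c_2}}{\sigma_2}$.

We get

$$n_2 = \frac{\sigma_2\left(\sigma_1\sqrt{c_1} + \sigma_2\sqrt{c_2}\right)}{K\sqrt{c_2}}$$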
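The closed-form sample sizes obtained by minimizing the total cost c1·n1 + c2·n2 subject to σ1²/n1 + σ2²/n2 = K can be checked numerically. In the sketch below the cost symbols `c1`, `c2`, the function name, and the input numbers are all assumptions made for illustration; in practice the results would be rounded up to integers.

```python
import math

def min_cost_sample_sizes(sigma1, sigma2, c1, c2, k):
    """Sample sizes minimizing c1*n1 + c2*n2 subject to sigma1^2/n1 + sigma2^2/n2 = k."""
    weighted = sigma1 * math.sqrt(c1) + sigma2 * math.sqrt(c2)
    n1 = sigma1 * weighted / (k * math.sqrt(c1))
    n2 = sigma2 * weighted / (k * math.sqrt(c2))
    return n1, n2

# Hypothetical inputs: standard deviations 2 and 3, unit costs 4 and 1, K = 0.5.
n1, n2 = min_cost_sample_sizes(sigma1=2.0, sigma2=3.0, c1=4.0, c2=1.0, k=0.5)
achieved_variance = 2.0 ** 2 / n1 + 3.0 ** 2 / n2
total_cost = 4.0 * n1 + 1.0 * n2

print(f"n1 = {n1:.1f}, n2 = {n2:.1f}")
print(f"variance = {achieved_variance:.3f}, total cost = {total_cost:.1f}")
```

The more expensive and less variable population receives the smaller sample, which is what the ratio n1/n2 = (σ1/σ2)·√(c2/c1) implies.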


2nd Method
For this 2nd method, the costs are denoted $r_1$ and $r_2$ rather than $c_1$ and $c_2$.

We have,

$H_0: \mu_1 = \mu_2$ (null hypothesis)
$H_1: \mu_1 \ne \mu_2$ (alternative hypothesis)

We have,
$r_1$ = cost of one observation in the sample of size $n_1$
$r_2$ = cost of one observation in the sample of size $n_2$

To minimize the total cost of the study, we have

$$\frac{n_1}{n_2} = \sqrt{\frac{r_2}{r_1}}$$

In this square root rule, if the costs are not different, then $n_1 = n_2$ can be chosen.
The total cost of sampling is

$$C = r_1 n_1 + r_2 n_2$$

and we need to minimize $\dfrac{1}{n_1} + \dfrac{1}{n_2}$ subject to the total cost C.
Then we have the minimum cost sample sizes

$$n_1 = \frac{C}{r_1 + \sqrt{r_1 r_2}}, \qquad n_2 = \frac{C}{r_2 + \sqrt{r_1 r_2}}$$

These $n_1$ and $n_2$ values can be put into the equation for the equivalent sample size,

$$n = \frac{\sigma_1^2 + \sigma_2^2}{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$$

Also we have

$$d = \frac{|\mu_1 - \mu_2 - \Delta_0|}{\sqrt{\sigma_1^2 + \sigma_2^2}}$$

As we know the values of d and n, we can use them with the O.C. curve for $\alpha = 0.05$.

If the value of $\beta$ read from the curve is the same as the expected one, then the calculated $n_1$ and $n_2$ are the smallest sample sizes with the minimum cost of the experiment. On the other hand, if the value of $\beta$ is not the same as the expected one, then $n_1$ and $n_2$ can be adjusted slightly to find the new sample sizes.
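The square root rule and the equivalent sample size can be sketched as two small functions. The function names and the example costs, budget, and standard deviations are hypothetical; real sample sizes would be rounded to integers before use.

```python
import math

def sizes_for_budget(r1, r2, total_cost):
    """Sample sizes minimizing 1/n1 + 1/n2 subject to r1*n1 + r2*n2 = total_cost."""
    root = math.sqrt(r1 * r2)
    return total_cost / (r1 + root), total_cost / (r2 + root)

def equivalent_sample_size(sigma1, sigma2, n1, n2):
    """Equal-sample-size equivalent n used to enter an O.C. curve."""
    return (sigma1 ** 2 + sigma2 ** 2) / (sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

# Hypothetical inputs: unit costs 4 and 1, budget 120.
n1, n2 = sizes_for_budget(r1=4.0, r2=1.0, total_cost=120.0)
n_equiv = equivalent_sample_size(sigma1=2.0, sigma2=3.0, n1=n1, n2=n2)

print(f"n1 = {n1:.1f}, n2 = {n2:.1f}, ratio = {n1 / n2:.2f}")
print(f"equivalent n = {n_equiv:.2f}")
```

In this example the ratio n1/n2 equals √(r2/r1) = 0.5 and the whole budget is spent, consistent with the square root rule.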





























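c (iii)
Here, in the problem we have to minimize $\beta$. That is, we have to maximize the power of the test ($1 - \beta$). The power of a test is the probability of rejecting the null hypothesis $H_0$ when the alternative hypothesis is true, i.e. the probability of correctly rejecting a false null hypothesis.

We have, with $\delta$ denoting the true difference between the means,

$$\beta = \Phi\!\left(z_{\alpha/2} - \frac{\delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}\right) - \Phi\!\left(-z_{\alpha/2} - \frac{\delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}\right)$$

In order to minimize $\beta$, we should make $\dfrac{\delta}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$ large, that is, make $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$ as small as possible.
Using a Lagrange multiplier, minimize

$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}$$

As we can consider $n_1 + n_2 = N$ as a constraint, we have

$$f(n_1, n_2, \lambda) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} + \lambda\,(n_1 + n_2 - N)$$

Differentiate f with respect to $n_1$, $n_2$ and $\lambda$, and make the derivatives equal to 0. Hence,

$$\frac{\partial f}{\partial n_1} = -\frac{\sigma_1^2}{n_1^2} + \lambda = 0$$

$$\frac{\partial f}{\partial n_2} = -\frac{\sigma_2^2}{n_2^2} + \lambda = 0$$

Solving,

$$n_1 = \frac{\sigma_1}{\sqrt{\lambda}} \quad \text{and} \quad n_2 = \frac{\sigma_2}{\sqrt{\lambda}}$$

Now we have

$$n_1 + n_2 = N$$

So we get

$$\sqrt{\lambda} = \frac{\sigma_1 + \sigma_2}{N}$$

To get $n_1$ we put $\sqrt{\lambda} = \sigma_1 / n_1$.

We get

$$n_1 = \frac{\sigma_1 N}{\sigma_1 + \sigma_2}$$

And to get $n_2$ we put $\sqrt{\lambda} = \sigma_2 / n_2$:

$$n_2 = \frac{\sigma_2 N}{\sigma_1 + \sigma_2}$$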
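The allocation that minimizes σ1²/n1 + σ2²/n2 for a fixed total N splits the observations in proportion to the standard deviations. The sketch below checks this against an equal split; the function names and input numbers are hypothetical.

```python
def allocate(sigma1, sigma2, total_n):
    """Split total_n in proportion to the standard deviations."""
    n1 = sigma1 * total_n / (sigma1 + sigma2)
    n2 = sigma2 * total_n / (sigma1 + sigma2)
    return n1, n2

def variance_of_difference(sigma1, sigma2, n1, n2):
    """Variance of the difference in sample means for independent samples."""
    return sigma1 ** 2 / n1 + sigma2 ** 2 / n2

# Hypothetical inputs: standard deviations 2 and 3, total of 50 observations.
n1, n2 = allocate(2.0, 3.0, 50)
best = variance_of_difference(2.0, 3.0, n1, n2)
equal_split = variance_of_difference(2.0, 3.0, 25, 25)

print(f"n1 = {n1:.0f}, n2 = {n2:.0f}")
print(f"variance: optimal = {best:.3f}, equal split = {equal_split:.3f}")
```

The more variable population receives more observations, and the resulting variance is no larger than with the equal split, so the power of the test is at least as high.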

