Statistics For Analytical Chemistry


Recommended textbook:
"Statistics for Analytical Chemistry", J.C. Miller and J.N. Miller,
Second Edition, 1992, Ellis Horwood Limited
"Fundamentals of Analytical Chemistry",
Skoog, West and Holler, 7th Ed., 1996
(Saunders College Publishing)

Analysis for quality control, and "reverse engineering"
(i.e. finding out what your competitors are doing).
Familiar to those who attended the second year
"Environmental Chemistry" modules. A very wide range of problems and
types of analyte.
Dealing with many problems from first two.
Of great interest to many of my colleagues. I will not be dealing with this type of problem.

Select sample
Extract analyte(s) from matrix
Separate analytes
Detect, identify and quantify analytes
Determine reliability and significance of results

Impossible to eliminate errors.
How reliable are our data?
Data of unknown quality are useless!
Carry out replicate measurements
Analyse accurately known standards
Perform statistical tests on data

Mean
Defined as follows:
$$\bar{x} = \frac{\sum_{i=1}^{N} x_i}{N}$$
where $x_i$ = individual values of $x$ and $N$ = number of replicate measurements.

Median
The middle result when data are arranged in order of size (for even
numbers, the mean of the middle two). The median can be preferred when
there is an "outlier" - one reading very different from the rest. The median
is less affected by an outlier than is the mean.

Results of 6 determinations of the Fe(III) content of a solution, known to
contain 20 ppm: the mean value is 19.78 ppm.
Precision
Relates to reproducibility of results.
How similar are values obtained in exactly the same way?
Useful for measuring this:
Deviation from the mean: $d_i = |x_i - \bar{x}|$

Accuracy
Measurement of agreement between experimental mean and
true value (which may not be known!).
Measures of accuracy:
Absolute error: $E = \bar{x} - x_t$ (where $x_t$ = true or accepted value)
Relative error: $E_r = \frac{\bar{x} - x_t}{x_t} \times 100\%$
(latter is more useful in practice)
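The two measures of accuracy can be illustrated with the Fe(III) figures quoted earlier (mean 19.78 ppm, known content 20 ppm). This is a minimal sketch; the function and variable names are chosen here for illustration and are not part of the original slides.

```python
def absolute_error(mean, true_value):
    """Absolute error: experimental mean minus true (accepted) value."""
    return mean - true_value


def relative_error_percent(mean, true_value):
    """Relative error: absolute error as a percentage of the true value."""
    return (mean - true_value) / true_value * 100.0


mean_fe = 19.78  # ppm, mean of the Fe(III) determinations
true_fe = 20.0   # ppm, known Fe(III) content

abs_err = absolute_error(mean_fe, true_fe)
rel_err = relative_error_percent(mean_fe, true_fe)
print(f"Absolute error = {abs_err:.2f} ppm, relative error = {rel_err:.1f} %")
```

The negative sign shows that the experimental mean lies below the accepted value.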

[Figure: accuracy and precision. Panels: low accuracy, low precision; low accuracy, high precision; high accuracy, low precision; high accuracy, high precision.]

[Figure: results of four analysts for benzyl isothiourea hydrochloride and nicotinic acid.]
Analyst 1: precise, accurate
Analyst 2: imprecise, accurate
Analyst 3: precise, inaccurate
Analyst 4: imprecise, inaccurate
Types of Error in Experimental Data
Three types:
(1) Random (indeterminate) error
Data scattered approx. symmetrically about a mean value.
Affects precision - dealt with statistically (see later).
(2) Systematic (determinate) error
Several possible sources - later. Readings all too high
or too low. Affects accuracy.
(3) Gross errors
Usually obvious - give "outlier" readings.
Detectable by carrying out sufficient replicate
measurements.

Sources of Systematic Error
1. Instrument error
Need frequent calibration - both for apparatus such as
volumetric flasks, burettes etc., and for electronic
devices such as spectrometers.
2. Method error
Due to inadequacies in physical or chemical behaviour
of reagents or reactions (e.g. slow or incomplete reactions).
Example from earlier overhead - nicotinic acid does not
react completely under normal Kjeldahl conditions for
nitrogen determination.
3. Personal error
e.g. insensitivity to colour changes; tendency to estimate
scale readings to improve precision; preconceived idea of
"true" value.

Systematic errors can be
constant (e.g. error in burette reading -
less important for larger values of reading) or
proportional (e.g. presence of given proportion of
interfering impurity in sample; equally significant
for all values of measurement).
Minimise instrument errors by careful recalibration and good
maintenance of equipment.
Minimise personal errors by care and self-discipline.
Method errors - most difficult. "True" value may not be known.

Three approaches to minimise:
analysis of certified standards
use 2 or more independent methods
analysis of blanks

Random Errors
There are always a large number of small, random errors
in making any measurement.
These can be small changes in temperature or pressure;
random responses of electronic detectors ("noise") etc.
Suppose there are 4 small random errors possible.
Assume all are equally likely, and that each causes an error
of ±U in the reading.
Possible combinations of errors are shown on the next slide:

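Combination of errors   Magnitude of error   No. of combinations   Relative frequency
+U1+U2+U3+U4            +4U                  1                     1/16 = 0.0625
-U1+U2+U3+U4            +2U                  4                     4/16 = 0.250
+U1-U2+U3+U4
+U1+U2-U3+U4
+U1+U2+U3-U4
-U1-U2+U3+U4            0                    6                     6/16 = 0.375
-U1+U2-U3+U4
-U1+U2+U3-U4
+U1-U2-U3+U4
+U1-U2+U3-U4
+U1+U2-U3-U4
+U1-U2-U3-U4            -2U                  4                     4/16 = 0.250
-U1+U2-U3-U4
-U1-U2+U3-U4
-U1-U2-U3+U4
-U1-U2-U3-U4            -4U                  1                     1/16 = 0.0625
The next overhead shows this in graphical form.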
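The counting of combinations can be checked by brute force. The sketch below enumerates every way four equally likely errors of +U or -U can combine and tallies the total error; the unit size U = 1 is an arbitrary assumption made only so that the totals are integers.

```python
from collections import Counter
from itertools import product

U = 1          # size of each individual error, arbitrary units (assumption)
N_ERRORS = 4   # number of small random errors

# Each error is either +U or -U, giving 2**4 = 16 combinations in total.
totals = Counter(sum(signs) for signs in product((+U, -U), repeat=N_ERRORS))
n_combinations = sum(totals.values())

for total in sorted(totals, reverse=True):
    count = totals[total]
    print(f"{total:+d}U  combinations: {count}  relative frequency: {count / n_combinations:.4f}")
```

Raising `N_ERRORS` to 10 or more shows the distribution of total errors approaching a smooth bell shape.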

[Figure: frequency distributions for 4 random uncertainties, 10 random uncertainties and a very large number of random uncertainties. The last is a "Gaussian" or normal error curve, symmetrical about the mean.]

Trial  Volume (ml)   Trial  Volume (ml)   Trial  Volume (ml)
1      9.988         18     9.975         35     9.976
2      9.973         19     9.980         36     9.990
3      9.986         20     9.994         37     9.988
4      9.980         21     9.992         38     9.971
5      9.975         22     9.984         39     9.986
6      9.982         23     9.981         40     9.978
7      9.986         24     9.987         41     9.986
8      9.982         25     9.978         42     9.982
9      9.981         26     9.983         43     9.977
10     9.990         27     9.982         44     9.977
11     9.980         28     9.991         45     9.986
12     9.989         29     9.981         46     9.978
13     9.978         30     9.969         47     9.983
14     9.971         31     9.985         48     9.980
15     9.982         32     9.977         49     9.983
16     9.983         33     9.976         50     9.979
17     9.988         34     9.983
Mean volume 9.982 ml      Median volume 9.982 ml
Spread 0.025 ml           Standard deviation 0.0056 ml
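The summary figures for the 50 volumes can be reproduced with Python's standard `statistics` module. This is only a sketch; the variable names are illustrative, and `statistics.stdev` is used because it applies the N - 1 (sample) form of the standard deviation.

```python
import statistics

# The 50 replicate volumes (ml) from the table, in trial order.
volumes_ml = [
    9.988, 9.973, 9.986, 9.980, 9.975, 9.982, 9.986, 9.982, 9.981, 9.990,
    9.980, 9.989, 9.978, 9.971, 9.982, 9.983, 9.988, 9.975, 9.980, 9.994,
    9.992, 9.984, 9.981, 9.987, 9.978, 9.983, 9.982, 9.991, 9.981, 9.969,
    9.985, 9.977, 9.976, 9.983, 9.976, 9.990, 9.988, 9.971, 9.986, 9.978,
    9.986, 9.982, 9.977, 9.977, 9.986, 9.978, 9.983, 9.980, 9.983, 9.979,
]

mean_ml = statistics.mean(volumes_ml)
median_ml = statistics.median(volumes_ml)
spread_ml = max(volumes_ml) - min(volumes_ml)   # range: highest minus lowest value
s_ml = statistics.stdev(volumes_ml)             # sample standard deviation (N - 1)

print(f"Mean {mean_ml:.3f} ml, median {median_ml:.3f} ml")
print(f"Spread {spread_ml:.3f} ml, standard deviation {s_ml:.4f} ml")
```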

[Figure: A = a histogram of experimental results; B = a Gaussian curve with the same mean value, the same precision (see later) and the same area under the curve as for the histogram.]
Sample = a finite number of observations
Population = a total (infinite) number of observations
Properties of Gaussian curve defined in terms of population.
Then see where modifications needed for small samples of data.
Main properties of Gaussian curve:
Population mean ($\mu$): defined as earlier ($N \to \infty$). In absence of systematic error,
$\mu$ is the true value (maximum on Gaussian curve).
Remember, sample mean ($\bar{x}$) defined for small values of $N$.
(Sample mean $\approx$ population mean when $N \geq 20$)

Population standard deviation ($\sigma$) - defined on next overhead
$\sigma$: measure of precision of a population of data,
given by:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
where $\mu$ = population mean; $N$ is very large.
The equation for a Gaussian curve is defined in terms of $\mu$ and $\sigma$, as follows:
$$y = \frac{e^{-(x - \mu)^2 / 2\sigma^2}}{\sigma\sqrt{2\pi}}$$
[Figure: two Gaussian curves with two different standard deviations, $\sigma_A$ and $\sigma_B$ ($= 2\sigma_A$).]
[Figure: general Gaussian curve plotted in units of $z$, where $z = (x - \mu)/\sigma$, i.e. deviation from the mean of a datum in units of standard deviation. Plot can be used for data with any given value of mean and standard deviation.]
From equation above, and illustrated by the previous curves,
68.3% of the data lie within $\pm\sigma$ of the mean ($\mu$), i.e. 68.3% of
the area under the curve lies between $\pm\sigma$ of $\mu$.
Similarly, 95.5% of the area lies between $\pm 2\sigma$, and 99.7%
between $\pm 3\sigma$.
There are 68.3 chances in 100 that for a single datum the
random error in the measurement will not exceed $\pm\sigma$.
The chances are 95.5 in 100 that the error will not exceed $\pm 2\sigma$.
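These areas follow from integrating the Gaussian equation. As a rough check, the fraction of the area within ±z standard deviations of the mean equals erf(z/√2), which Python's `math.erf` can evaluate; the function name `area_within` is made up for this sketch.

```python
import math


def area_within(z):
    """Fraction of the area under a Gaussian curve within +/- z standard deviations."""
    return math.erf(z / math.sqrt(2.0))


for z in (1, 2, 3):
    print(f"within +/-{z} sigma: {100 * area_within(z):.2f} % of the area")
```

The same function gives about 95.0 % for z = 1.96, which is the value of z used later for 95 % confidence limits.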

Sample Standard Deviation (s)
The equation for $\sigma$ must be modified for small samples of data, i.e. small $N$:
$$s = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}}$$
Two differences cf. the equation for $\sigma$:
1. Use sample mean instead of population mean.
2. Use degrees of freedom, $N - 1$, instead of $N$.

Reason is that in working out the mean, the sum of the
differences from the mean must be zero. If $N - 1$ values are
known, the last value is defined. Thus only $N - 1$ degrees
of freedom. For large values of $N$, used in calculating
$\sigma$, $N$ and $N - 1$ are effectively equal.

An equivalent expression for s:
$$s = \sqrt{\frac{\sum_{i=1}^{N} x_i^2 - \left(\sum_{i=1}^{N} x_i\right)^2 / N}{N - 1}}$$
Note: NEVER round off figures before the end of the calculation.
Example: reproducibility of a method for determining
the selenium content of foods. 9 measurements
were made on a single batch of brown rice.
Sample   Selenium content (μg/g), x_i   x_i²
1        0.07                           0.0049
2        0.07                           0.0049
3        0.08                           0.0064
4        0.07                           0.0049
5        0.07                           0.0049
6        0.08                           0.0064
7        0.08                           0.0064
8        0.09                           0.0081
9        0.08                           0.0064
         Σx_i = 0.69                    Σx_i² = 0.0533

Mean = Σx_i/N = 0.077 μg/g     (Σx_i)²/N = 0.4761/9 = 0.0529
Standard deviation: $s = \sqrt{\frac{0.0533 - 0.0529}{9 - 1}} = 0.00707106 \approx 0.007$ μg/g
Coefficient of variance = 9.2%     Concentration = 0.077 ± 0.007 μg/g
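The selenium calculation can be repeated without intermediate rounding, using the sum and sum-of-squares form of the standard deviation. This is an illustrative sketch; the variable names are not from the slides.

```python
import math

# Selenium content (micrograms per gram) of the 9 brown rice measurements.
se_ug_per_g = [0.07, 0.07, 0.08, 0.07, 0.07, 0.08, 0.08, 0.09, 0.08]

n = len(se_ug_per_g)
sum_x = sum(se_ug_per_g)
sum_x2 = sum(x * x for x in se_ug_per_g)

mean = sum_x / n
# Sample standard deviation from the sums; rounding is left to the very end.
s = math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))
cv_percent = s / mean * 100.0  # coefficient of variance, %

print(f"mean = {mean:.3f}, s = {s:.5f}, CV = {cv_percent:.1f} %")
```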

Standard Error of the Mean
The standard deviation relates to the probable error in a single measurement.
If we take a series of measurements, the probable error of the mean is less than
the probable error of any one measurement.
The standard error of the mean, $s_m$, is defined as follows:
$$s_m = \frac{s}{\sqrt{N}}$$
Pooled Data
To achieve a value of s which is a good approximation to $\sigma$, i.e. $N \geq 20$,
it is sometimes necessary to pool data from a number of sets of measurements
(all taken in the same way).
Suppose that there are $t$ small sets of data, comprising $N_1, N_2, \ldots, N_t$ measurements.
The equation for the resultant sample standard deviation is:
$$s_{pooled} = \sqrt{\frac{\sum_{i=1}^{N_1} (x_i - \bar{x}_1)^2 + \sum_{j=1}^{N_2} (x_j - \bar{x}_2)^2 + \sum_{k=1}^{N_3} (x_k - \bar{x}_3)^2 + \ldots}{N_1 + N_2 + N_3 + \ldots - t}}$$
(Note: one degree of freedom is lost for each set of data)
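The pooling rule can be sketched as a short function. The two data sets below are hypothetical numbers invented purely to exercise the code (they are not from the slides), and the name `pooled_std` is likewise an assumption.

```python
import math


def pooled_std(data_sets):
    """Pooled sample standard deviation for several sets measured in the same way."""
    sum_sq = 0.0
    n_total = 0
    for values in data_sets:
        set_mean = sum(values) / len(values)
        # Deviations are taken from each set's own mean.
        sum_sq += sum((x - set_mean) ** 2 for x in values)
        n_total += len(values)
    # One degree of freedom is lost for each set of data.
    degrees_of_freedom = n_total - len(data_sets)
    return math.sqrt(sum_sq / degrees_of_freedom)


# Hypothetical illustration only: two small sets of replicate results.
hypothetical_sets = [[1.0, 2.0, 3.0], [4.0, 6.0]]
s_pooled = pooled_std(hypothetical_sets)
print(f"s_pooled = {s_pooled:.4f}")
```

Here the squared deviations sum to 2 + 2 = 4 with 5 - 2 = 3 degrees of freedom, so the result is √(4/3).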

Example: analysis of 6 bottles of wine
for residual sugar.
[Table: columns are bottle number, sugar % (w/v), number of observations and deviations of individual results from the mean.]
The sum of squared deviations from the mean is calculated for bottle 1, and similarly for all $n$.
Total sum of squares = 0.1326, from 23 observations on 6 bottles:
$$s_{pooled} = \sqrt{\frac{0.1326}{23 - 6}} = 0.088\%$$
Two alternative methods for measuring the precision of a set of results:
VARIANCE: This is the square of the standard deviation:
$$s^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$
COEFFICIENT OF VARIANCE (CV)
(or RELATIVE STANDARD DEVIATION):
Divide the standard deviation by the mean value and express as a percentage:
$$CV = \left(\frac{s}{\bar{x}}\right) \times 100\%$$

How can we relate the observed mean value ($\bar{x}$) to the true mean ($\mu$)?
The latter can never be known exactly.
The range of uncertainty depends how closely s corresponds to $\sigma$.
We can calculate the limits (above and below) around $\bar{x}$ within which $\mu$ must lie,
with a given degree of probability.

Define some terms:
CONFIDENCE LIMITS
interval around the mean that probably contains $\mu$.
CONFIDENCE INTERVAL
the magnitude of the confidence limits
CONFIDENCE LEVEL
fixes the level of probability that the mean is within the confidence limits
First assume that the known s is a good approximation to $\sigma$.
50% of area lies between $\pm 0.67\sigma$
80%  "  $\pm 1.29\sigma$
90%  "  $\pm 1.64\sigma$
95%  "  $\pm 1.96\sigma$
99%  "  $\pm 2.58\sigma$
What this means, for example, is that 80 times out of 100 the true mean will lie
between $\pm 1.29\sigma$ of any measurement we make.
Thus, at a confidence level of 80%, the confidence limits are $\pm 1.29\sigma$.
For a single measurement: CL for $\mu = x \pm z\sigma$ (values of $z$ on next overhead)
For the sample mean of $N$ measurements ($\bar{x}$), the equivalent expression is:
$$\text{CL for } \mu = \bar{x} \pm \frac{z\sigma}{\sqrt{N}}$$

Confidence level, %   z
50                    0.67
68                    1.0
80                    1.29
90                    1.64
95                    1.96
96                    2.00
99                    2.58
99.7                  3.00
99.9                  3.29
Note: these figures assume that an excellent approximation
to the real standard deviation is known.
Confidence limits - example
Atomic absorption analysis for copper concentration in aircraft engine oil gave a value
of 8.53 μg Cu/ml. Pooled results of many analyses showed s → σ = 0.32 μg Cu/ml.
Calculate 90% and 99% confidence limits if the above result were based on (a) 1, (b) 4,
(c) 16 measurements.
(a) 90% CL = $8.53 \pm \frac{(1.64)(0.32)}{\sqrt{1}} = 8.53 \pm 0.52$ μg/ml, i.e. 8.5 ± 0.5 μg/ml
    99% CL = $8.53 \pm \frac{(2.58)(0.32)}{\sqrt{1}} = 8.53 \pm 0.83$ μg/ml, i.e. 8.5 ± 0.8 μg/ml
(b) 90% CL = $8.53 \pm \frac{(1.64)(0.32)}{\sqrt{4}} = 8.53 \pm 0.26$ μg/ml, i.e. 8.5 ± 0.3 μg/ml
    99% CL = $8.53 \pm \frac{(2.58)(0.32)}{\sqrt{4}} = 8.53 \pm 0.41$ μg/ml, i.e. 8.5 ± 0.4 μg/ml
(c) 90% CL = $8.53 \pm \frac{(1.64)(0.32)}{\sqrt{16}} = 8.53 \pm 0.13$ μg/ml, i.e. 8.5 ± 0.1 μg/ml
    99% CL = $8.53 \pm \frac{(2.58)(0.32)}{\sqrt{16}} = 8.53 \pm 0.21$ μg/ml, i.e. 8.5 ± 0.2 μg/ml
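The six confidence limits in the copper example all come from the same expression, zσ/√N. A minimal sketch follows, with the z values taken from the table of confidence levels; the function and dictionary names are illustrative assumptions.

```python
import math

Z_VALUES = {90: 1.64, 99: 2.58}  # z for each confidence level (%), from the table


def half_width(z, sigma, n):
    """Half-width of the confidence limits for the mean of n measurements."""
    return z * sigma / math.sqrt(n)


mean_cu = 8.53   # micrograms Cu per ml
sigma_cu = 0.32  # micrograms Cu per ml, from pooled results

half_widths = {
    (level, n): half_width(z, sigma_cu, n)
    for level, z in Z_VALUES.items()
    for n in (1, 4, 16)
}

for (level, n), hw in sorted(half_widths.items()):
    print(f"{level}% CL, N = {n:2d}: {mean_cu} +/- {hw:.2f}")
```

Quadrupling the number of measurements halves the width of the interval, because the uncertainty falls with √N.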
If we have no information on $\sigma$, and only have a value for s -
the confidence interval is larger,
i.e. there is a greater uncertainty.
Instead of $z$, it is necessary to use the parameter $t$, defined as follows:
$t = (x - \mu)/s$
i.e. just like $z$, but using s instead of $\sigma$.
By analogy we have:
$$\text{CL for } \mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$$
(where $\bar{x}$ = sample mean for $N$ measurements)
The calculated values of $t$ are given on the next overhead.

Values of t for various levels of probability

Degrees of freedom (N-1)   80%    90%    95%    99%
1                          3.08   6.31   12.7   63.7
2                          1.89   2.92   4.30   9.92
3                          1.64   2.35   3.18   5.84
4                          1.53   2.13   2.78   4.60
5                          1.48   2.02   2.57   4.03
6                          1.44   1.94   2.45   3.71
7                          1.42   1.90   2.36   3.50
8                          1.40   1.86   2.31   3.36
9                          1.38   1.83   2.26   3.25
19                         1.33   1.73   2.10   2.88
59                         1.30   1.67   2.00   2.66
∞                          1.29   1.64   1.96   2.58
Note: (1) As (N-1) → ∞, so t → z
(2) For all values of (N-1) < ∞, t > z, i.e. greater uncertainty
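Confidence limits - example
Analysis of an insecticide gave the following values for % of the chemical lindane:
7.47, 6.98, 7.27. Calculate the CL for the mean value at the 90% confidence level.
x_i (%)     x_i²
7.47        55.8009
6.98        48.7204
7.27        52.8529
Σx_i = 21.72     Σx_i² = 157.3742

$$\bar{x} = \frac{\sum x_i}{N} = \frac{21.72}{3} = 7.24\%$$
$$s = \sqrt{\frac{\sum x_i^2 - (\sum x_i)^2 / N}{N - 1}} = \sqrt{\frac{157.3742 - (21.72)^2 / 3}{2}} = 0.246 \approx 0.25\%$$
$$90\%\ \text{CL} = \bar{x} \pm \frac{ts}{\sqrt{N}} = 7.24 \pm \frac{(2.92)(0.25)}{\sqrt{3}} = 7.24 \pm 0.42\%$$
If repeated analyses showed that s → σ = 0.28%:
$$90\%\ \text{CL} = \bar{x} \pm \frac{z\sigma}{\sqrt{N}} = 7.24 \pm \frac{(1.64)(0.28)}{\sqrt{3}} = 7.24 \pm 0.27\%$$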
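The lindane calculation can be checked as below. The t and z values are copied from the tables in these notes rather than computed, and the variable names are illustrative; the sketch keeps s unrounded until the end, so intermediate digits differ slightly from the hand calculation.

```python
import math
import statistics

lindane_percent = [7.47, 6.98, 7.27]
T_90_2DOF = 2.92  # t for 90% confidence, N - 1 = 2 degrees of freedom (from the table)
Z_90 = 1.64       # z for 90% confidence (from the table)

n = len(lindane_percent)
mean = statistics.mean(lindane_percent)
s = statistics.stdev(lindane_percent)  # sample standard deviation (N - 1)

# Only s is known: use t.
half_width_t = T_90_2DOF * s / math.sqrt(n)
# A good estimate of sigma (0.28%) is known: use z.
half_width_z = Z_90 * 0.28 / math.sqrt(n)

print(f"mean = {mean:.2f} %, s = {s:.3f} %")
print(f"90% CL using t: +/- {half_width_t:.2f} %")
print(f"90% CL using z: +/- {half_width_z:.2f} %")
```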
Testing for Bias
Carry out measurements on an accurately known standard.
Experimental value is different from the true value.
Is the difference due to a systematic error (bias) in the method - or simply to random error?
Assume that there is no bias
(NULL HYPOTHESIS),
and calculate the probability
that the experimental error
is due to random errors.
[Figure: (A) the curve for the true value ($\mu_A = \mu_t$) and (B) the experimental curve ($\mu_B$).]
Bias = $\mu_B - \mu_A = \mu_B - x_t$.
Test for bias by comparing $\bar{x} - x_t$ with the
difference caused by random error.
Remember confidence limit for $\mu$ (assumed to be $x_t$, i.e. assume no bias)
is given by:
$$\text{CL for } \mu = \bar{x} \pm \frac{ts}{\sqrt{N}}$$
At the desired confidence level, random
errors can lead to:
$$\bar{x} - x_t = \pm \frac{ts}{\sqrt{N}}$$
If $|\bar{x} - x_t| > \frac{ts}{\sqrt{N}}$, then at the desired
confidence level bias (systematic error)
is likely (and vice versa).
Detection of bias - example
A standard material known to contain 38.9% Hg was analysed by
atomic absorption spectroscopy.
The results were 38.9%, 37.4% and 37.1%. At the 95% confidence level,
is there any evidence for a systematic error in the method?
$\bar{x} = 37.8\%$, so $\bar{x} - x_t = -1.1\%$
$\sum x_i = 113.4$, $\sum x_i^2 = 4288.38$
$$s = \sqrt{\frac{4288.38 - (113.4)^2 / 3}{2}} = 0.964\%$$
Assume null hypothesis (no bias). Only reject this if
$$|\bar{x} - x_t| > \frac{ts}{\sqrt{N}}$$
But $t$ (from Table) = 4.30, $s$ (calc. above) = 0.964 and $N$ = 3, so
$$\frac{ts}{\sqrt{N}} = \frac{4.30 \times 0.964}{\sqrt{3}} = 2.39$$
i.e. $|\bar{x} - x_t| < \frac{ts}{\sqrt{N}}$.
Therefore the null hypothesis is maintained, and there is no
evidence for systematic error at the 95% confidence level.
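The test for bias in the mercury example can be sketched as follows. The standard deviation is recomputed directly from the three results, and t = 4.30 (95% level, 2 degrees of freedom) is taken from the table of t values; the variable names are illustrative assumptions.

```python
import math
import statistics

hg_percent = [38.9, 37.4, 37.1]  # experimental results, % Hg
true_value = 38.9                # known Hg content of the standard, %
T_95_2DOF = 4.30                 # t for 95% confidence, 2 degrees of freedom (from the table)

n = len(hg_percent)
mean = statistics.mean(hg_percent)
s = statistics.stdev(hg_percent)  # sample standard deviation (N - 1)

difference = mean - true_value
critical = T_95_2DOF * s / math.sqrt(n)  # largest difference random error can explain

# Reject the null hypothesis (no bias) only if the difference exceeds the critical value.
bias_indicated = abs(difference) > critical

print(f"mean = {mean:.1f} %, s = {s:.3f} %")
print(f"|difference| = {abs(difference):.1f} %, critical value = {critical:.2f} %")
print("Bias indicated" if bias_indicated else "No evidence for bias at this level")
```

With only three results the critical value is large, so a difference of about 1.1% cannot be distinguished from random error.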
Comparison of Two Experimental Means
Suppose two samples are analysed under identical conditions.
Sample 1 → $\bar{x}_1$ from $N_1$ replicate analyses
Sample 2 → $\bar{x}_2$ from $N_2$ replicate analyses
Are these significantly different?

Using definition of pooled standard deviation, the equation on the last
overhead can be re-arranged:
$$\bar{x}_1 - \bar{x}_2 = \pm\, t\, s_{pooled} \sqrt{\frac{N_1 + N_2}{N_1 N_2}}$$
Only if the difference between the two samples is greater than the term on
the right-hand side can we assume a real difference between the samples.

Comparison of two means - example
Two different methods for the analysis of boron in plant samples
gave the following mean results (μg/g):
28.0 (spectrophotometry)
26.25 (fluorimetry)
Each based on 5 replicate measurements.
At the 99% confidence level, are the mean values significantly
different?
Calculate $s_{pooled}$ = 0.267. There are 8 degrees of freedom,
therefore (Table) $t$ = 3.36 (99% level).
Level for rejecting null hypothesis is
$\pm\, t\, s_{pooled} \sqrt{(N_1 + N_2)/(N_1 N_2)}$, i.e. $\pm (3.36)(0.267)\sqrt{10/25}$,
i.e. ±0.5674, or ±0.57 μg/g.
But $\bar{x}_1 - \bar{x}_2 = 28.0 - 26.25 = 1.75$ μg/g,
i.e. $|\bar{x}_1 - \bar{x}_2| > t\, s_{pooled} \sqrt{(N_1 + N_2)/(N_1 N_2)}$.
Therefore, at this confidence level, there is a significant
difference, and there must be a systematic error in at least
one of the methods of analysis.
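The comparison for the boron example reduces to one function. This is a sketch using only the figures quoted in the example (s_pooled = 0.267, t = 3.36, five replicates per method, difference between means 28.0 - 26.25); the function name is an assumption.

```python
import math


def critical_difference(t, s_pooled, n1, n2):
    """Largest difference between two means that random error alone can explain."""
    return t * s_pooled * math.sqrt((n1 + n2) / (n1 * n2))


crit = critical_difference(t=3.36, s_pooled=0.267, n1=5, n2=5)
observed = 28.0 - 26.25  # difference between the two mean values, micrograms per gram
significant = abs(observed) > crit

print(f"critical difference = {crit:.2f}, observed difference = {observed:.2f}")
print("Significant difference" if significant else "No significant difference")
```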
Detection of Gross Errors
A set of results may contain an outlying result
- out of line with the others.
Should it be retained or rejected?
There is no universal criterion for deciding this.
One rule that can give guidance is the Q test.
Consider a set of results.
The parameter $Q_{exp}$ is defined as follows:
$$Q_{exp} = \frac{|x_q - x_n|}{w}$$
where $x_q$ = questionable result
$x_n$ = nearest neighbour
$w$ = spread of entire set
$Q_{exp}$ is then compared to a set of values $Q_{crit}$:

                      Q_crit (reject if Q_exp > Q_crit)
No. of observations   90%     95%     99% confidence level
3                     0.941   0.970   0.994
4                     0.765   0.829   0.926
5                     0.642   0.710   0.821
6                     0.560   0.625   0.740
7                     0.507   0.568   0.680
8                     0.468   0.526   0.634
9                     0.437   0.493   0.598
10                    0.412   0.466   0.568
Rejection of outlier recommended if $Q_{exp} > Q_{crit}$ for the desired confidence level.
Note:
1. The higher the confidence level, the less likely is
rejection to be recommended.
2. Rejection of outliers can have a marked effect on mean
and standard deviation, esp. when there are only a few
data points.
3. If outliers are to be retained, it is often better to report
the median value rather than the mean.
Q test - example
The following values were obtained for
the concentration of nitrite ions in a sample
of river water: 0.403, 0.410, 0.401, 0.380 mg/l.
Should the last reading be rejected?
$$Q_{exp} = \frac{|0.380 - 0.401|}{0.410 - 0.380} = 0.7$$
But $Q_{crit}$ = 0.829 (at 95% level) for 4 values.
Therefore, $Q_{exp} < Q_{crit}$, and we cannot reject the suspect value.
Suppose 3 further measurements taken, giving total values of:
0.403, 0.410, 0.401, 0.380, 0.400, 0.413, 0.411 mg/l. Should
0.380 still be retained?
$$Q_{exp} = \frac{|0.380 - 0.400|}{0.413 - 0.380} = 0.606$$
But $Q_{crit}$ = 0.568 (at 95% level) for 7 values.
Therefore, $Q_{exp} > Q_{crit}$, and rejection of 0.380 is recommended.
But note that 5 times in 100 it will be wrong to reject this suspect value!
Also note that if 0.380 is retained, s = 0.011 mg/l, but if it is rejected,
s = 0.0056 mg/l, i.e. precision appears to be twice as good, just by
rejecting one value.
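The Q test for the nitrite data can be sketched as below. The critical values are the 95% column of the Q_crit table; the function names are illustrative, and the sketch assumes the suspect value is the highest or lowest result in the set.

```python
import statistics

# Q_crit at the 95% confidence level, keyed by number of observations (from the table).
Q_CRIT_95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
             7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def q_experimental(values, suspect):
    """Q_exp = |suspect - nearest neighbour| / spread of the entire set."""
    others = list(values)
    others.remove(suspect)
    nearest = min(others, key=lambda x: abs(x - suspect))
    spread = max(values) - min(values)
    return abs(suspect - nearest) / spread


def rejection_recommended(values, suspect, q_crit=Q_CRIT_95):
    """True if Q_exp exceeds Q_crit for this number of observations."""
    return q_experimental(values, suspect) > q_crit[len(values)]


first_set = [0.403, 0.410, 0.401, 0.380]  # mg/l
full_set = first_set + [0.400, 0.413, 0.411]

for data in (first_set, full_set):
    q = q_experimental(data, 0.380)
    print(f"N = {len(data)}: Q_exp = {q:.3f}, reject: {rejection_recommended(data, 0.380)}")

# Effect of rejection on the apparent precision of the seven results.
s_retained = statistics.stdev(full_set)
s_rejected = statistics.stdev([x for x in full_set if x != 0.380])
print(f"s retained = {s_retained:.3f} mg/l, s rejected = {s_rejected:.4f} mg/l")
```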
No problem - sample representative.

Take a number of small samples at random from throughout the bulk - this will
give a suitable representative sample.

Take small samples from each homogeneous region and
mix these in the same proportions as between each
region and the whole.
If it is suspected, but not certain, that a bulk material is heterogeneous, then
it is necessary to grind the sample to a fine powder, and mix this very
thoroughly before taking random samples from the bulk.
May be many analytes present - separation - see later.
May be small amounts of analyte(s) in bulk material.
Need to concentrate these before analysis, e.g. heavy metals in
animal tissue, additives in polymers, herbicide residues in flour etc.
May be helpful to concentrate complex mixtures selectively.
Most general type of pre-treatment: EXTRACTION

Classical extraction method is: SOXHLET EXTRACTION
(named after developer).

[Figure: apparatus.]
Sample in porous thimble.
Exhaustive reflux for up to 1 - 2 days.
Solution of analyte(s) in volatile solvent (e.g. CH2Cl2, CHCl3 etc.)
Evaporate to dryness or suitable concentration, for separation/analysis.
