There are at least two ways to do regression fitting in Excel: the manual way and the built-in
automatic way. The reason not to be fully reliant on the automatic way is that it's part of a "Data
Analysis" add-in (which, confusingly, shows up under "Tools" on the menu in versions of Excel prior to
2007, but is on the "Data" tab thereafter). Further, some Apple/Macintosh versions of Excel lack the
"Data Analysis" add-in completely.
Let y_i be the absorbances you measured and x_i be the concentrations of the solutions. If a working curve
is linear, i.e. if Beer's Law holds, there's no interference, or the concentration of an interference is constant
(including in the blank), then

    y_i = b + m x_i + e_i        (10)

where b = extrapolated blank when x_i = 0, m = slope of the working curve (the change in absorbance per
unit concentration), and e_i = the measurement error in y_i. The mean of e_i is presumed to be 0. We assume
that the error in x_i is negligible or else independent of concentration, so that the contribution from errors in
sample preparation can be lumped in with e_i.
See Harris, Chapter 5, Section 1 (or analogous sections of Harvey), to see where the math comes from. It
turns out that
    Σy_i = bN + mΣx_i                  (11a)

    Σx_i y_i = bΣx_i + mΣx_i²         (11b)

(all sums run from i = 1 to N)
where N is the number of independent data points. If you make replicate measurements at a particular
concentration, each measurement is independent, so if you make triplicate measurements on each of 5
solutions, N = 3*5 = 15. What happened to the errors, e_i? They average to 0, as does the product x_i e_i.
Finding N is easy – just count how many measurements you made. Then make a table, where column A
has all the concentrations and column B has the corresponding absorbance values. Total the entries in
column A to get Σx_i. Total the entries in column B to get Σy_i. In column C, find the square of each
entry in column A (for example, in cell C5, type =A5*A5). Add that column up to get Σx_i². In column
D, find the product of x_i and y_i, e.g. D5 would read =A5*B5. Now you have all the sums, and you have
two equations with two unknowns. It's very handy to have as an intermediate quantity D defined as:
    D = NΣx_i² - (Σx_i)²                        (12)

    b = [Σy_i Σx_i² - Σx_i y_i Σx_i] / D        (13a)

    m = [NΣx_i y_i - Σy_i Σx_i] / D             (13b)
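The column-sum recipe and Eqs. (11)–(13) can be sketched in a few lines of Python. This is a minimal illustration, not part of the handout's procedure; the concentration and absorbance values are made up, and the variable names are my own:

```python
# Column A = concentrations, column B = absorbances (made-up data)
xs = [50.0, 100.0, 150.0, 200.0]
ys = [0.12, 0.25, 0.37, 0.50]

N = len(xs)                                     # number of independent points
sum_x = sum(xs)                                 # total of column A:  Σx_i
sum_y = sum(ys)                                 # total of column B:  Σy_i
sum_x2 = sum(x * x for x in xs)                 # total of column C:  Σx_i²
sum_xy = sum(x * y for x, y in zip(xs, ys))     # total of column D:  Σx_i·y_i

D = N * sum_x2 - sum_x ** 2                     # Eq. (12)
b = (sum_y * sum_x2 - sum_xy * sum_x) / D       # Eq. (13a): intercept
m = (N * sum_xy - sum_y * sum_x) / D            # Eq. (13b): slope
print("slope =", m, "intercept =", b)
```

The same four sums are exactly what the spreadsheet columns A–D give you, so checking the Python result against the Excel totals is a quick sanity test of the arithmetic.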
In each case, the sums are quantities you have already computed, so they can be used directly. Always
check that the least squares fitted line goes through the center of your data by using the m and b you
calculated to plot the fitted line. If the line doesn't go through the data, check to ensure that you did the
arithmetic correctly. Something else you should always do with any fit is to plot the residuals vs. the
independent variable, i.e. y_i - y_fit, where y_i is the experimentally determined value of y at x_i, and
y_fit is the y value predicted by the model at x_i. Figure 4 shows an example of a fit to a linear
absorbance working curve with the residuals plotted separately at the bottom. If examination of the
residual plot reveals a trend, this is a good indication that the model (straight line in this example) is not
adequate to describe the experimental behavior.

[Figure 4: Fit to a linear absorbance working curve (absorbance vs. concentration, µM), with the
residuals (×10⁻³) plotted separately at the bottom.]
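The residual check described above amounts to computing y_i - y_fit at each point. A minimal sketch, using illustrative data and slope/intercept values from a prior fit of that data:

```python
# Residuals y_i - y_fit for a fitted straight line (all values illustrative)
m, b = 0.00252, -0.005                          # slope and intercept from a fit
xs = [50.0, 100.0, 150.0, 200.0]                # concentrations
ys = [0.12, 0.25, 0.37, 0.50]                   # measured absorbances

residuals = [y - (m * x + b) for x, y in zip(xs, ys)]

# Plot residuals vs. xs; a smooth trend (rather than random scatter)
# suggests the straight-line model is inadequate.
for x, r in zip(xs, residuals):
    print(x, r)
```

A useful property to remember: for a least-squares straight-line fit, the residuals sum to zero (to within rounding), so a nonzero sum is itself a sign of an arithmetic slip.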
How do the uncertainties in b and m determine the uncertainties in a concentration estimated from
measuring an absorbance, i.e. what's the uncertainty in x for a single measurement y? From Harris, we
need an intermediate quantity:
    s_y = sqrt[ Σ(y_i - m x_i - b)² / (N - 2) ]        (14)
Then the standard deviations of the slope and intercept are given by

    s_m = s_y sqrt(N / D)            (15a)

    s_b = s_y sqrt(Σx_i² / D)        (15b)
We now know the uncertainty in the slope, the intercept, and in an individual measurement of the
absorbance or analytical signal (that's s_y). So if we now measure a signal y, we can use propagation of
error calculations to compute the uncertainty in the apparent concentration x. Error propagation is
discussed in section VII.
[Figure 5]

The least squares regression is done automatically: slopes, intercepts, and residuals are all computed.
After a little moving around of graphs and adjusting their size, it looks like Fig. 6.
[Figure 6]

Multiple R          The correlation coefficient, i.e. the degree to which the model fits the data.
                    +1 = perfect match. -1 = anti-correlated, i.e. as the independent variable
                    increases, the dependent variable decreases. 0 = uncorrelated, i.e. there's
                    no functional relationship between the independent and dependent variables.
R Square            The square of the correlation coefficient.
Adjusted R Square   The correlation coefficient, scaled for the effect of non-infinite degrees
                    of freedom: R²_adj = 1 - (1 - R²)(N - 1)/(N - p'), where p' is the number
                    of parameters fit (2 in the case of a straight line on a Cartesian plane).
Standard Error      Related to standard deviation. Computed from the sum of the squares of the
                    distances from the regressed data to the fitted line.
Observations        Number of data points, what we've called N up to now.
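The summary-table quantities for a straight-line fit (p' = 2) can be reproduced by hand, which is a good check on what the automatic output means. A sketch with illustrative data and fit values:

```python
# R Square and Adjusted R Square for a straight-line fit (p' = 2); data illustrative
xs = [50.0, 100.0, 150.0, 200.0]
ys = [0.12, 0.25, 0.37, 0.50]
N, p = len(xs), 2

m, b = 0.00252, -0.005                          # fitted slope and intercept
y_mean = sum(ys) / N
ss_res = sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))   # residual sum of squares
ss_tot = sum((y - y_mean) ** 2 for y in ys)                    # total sum of squares

r2 = 1 - ss_res / ss_tot                        # R Square
multiple_r = r2 ** 0.5                          # Multiple R (slope sign gives +/-)
r2_adj = 1 - (1 - r2) * (N - 1) / (N - p)       # Adjusted R Square
```

Since (N - 1)/(N - p') > 1, the adjusted value is always slightly below R² for a finite data set; the two converge as N grows.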
This is more than you really need. What you are looking for:
- Is there a pattern to the residuals, indicating that the model is inadequate? The residual graph
  can give you a visual idea of this (if it's a smooth curve, rather than random, there's a
  problem). r² is a numerical means of judging: anything less than 0.999 says there's some
  other significant source of variance that slope and intercept don't account for.
- What are the uncertainties in the model parameters (slope and intercept in this case)?