
Appendix on Linear Regression in Excel

There are at least two ways to do regression fitting in Excel: the manual way and the built-in
automatic way. The reason not to be fully reliant on the automatic way is that it is part of a "Data
Analysis" add-in (which, confusingly, shows up under "Tools" on the menu in versions of Excel prior to
2007, but is on the "Data" tab thereafter). Further, some Apple/Macintosh versions of Excel lack the
"Data Analysis" add-in completely.

Let y_i be the absorbances you measured and x_i be the concentrations of the solutions. If a working curve
is linear, i.e. if Beer's Law holds, there is no interference, or the concentration of an interference is constant
(including in the blank), then

$$ y_i = b + m x_i + \epsilon_{y_i} \qquad (10) $$

where b = the extrapolated blank (the value of y when x_i = 0), m = the slope of the working curve, i.e. the
change in absorbance per unit concentration, and \epsilon_{y_i} = the measurement error in y_i. The mean of
\epsilon_{y_i} is presumed to be 0. We assume that the error in x_i is negligible, or else independent of
concentration, so that the contribution from errors in sample preparation can be lumped in with \epsilon_{y_i}.

See Harris, Chapter 5, Section 1 (or the analogous sections of Harvey), to see where the math comes from. It
turns out that

$$ \sum_{i=1}^{N} y_i = bN + m \sum_{i=1}^{N} x_i \qquad (11a) $$

$$ \sum_{i=1}^{N} x_i y_i = b \sum_{i=1}^{N} x_i + m \sum_{i=1}^{N} x_i^2 \qquad (11b) $$

where N is the number of independent data points. If you make replicate measurements at a particular
concentration, each measurement is independent, so if you make triplicate measurements on each of 5
solutions, N = 3*5 = 15. What happened to the errors, \epsilon_{y_i}? They average to 0, as does the
product x_i \epsilon_{y_i}.

Finding N is easy – just count how many measurements you made. Then make a table, where column A
has all the concentrations and column B has the corresponding absorbance values. Total the entries in
column A to get \sum x_i. Total the entries in column B to get \sum y_i. In column C, find the square of each
entry in column A (for example, in cell C5, type =A5*A5). Add that column up to get \sum x_i^2. In column
D, find the product of x_i and y_i; e.g. D5 would read =A5*B5. Now you have all the sums, and you have
two equations with two unknowns. It's very handy to have as an intermediate quantity D defined as:

$$ D = N \sum_{i=1}^{N} x_i^2 - \left( \sum_{i=1}^{N} x_i \right)^2 \qquad (12) $$

Then b and m can be found by:

$$ b = \frac{\sum_{i=1}^{N} y_i \sum_{i=1}^{N} x_i^2 - \sum_{i=1}^{N} x_i y_i \sum_{i=1}^{N} x_i}{D} \qquad (13a) $$

$$ m = \frac{N \sum_{i=1}^{N} x_i y_i - \sum_{i=1}^{N} y_i \sum_{i=1}^{N} x_i}{D} \qquad (13b) $$
In each case, the sums are quantities you have already computed, so they can be used directly. Always
check that the least squares fitted line goes through the center of your data by using the m and b you
calculated to plot the fitted line. If the line doesn't go through the data, check to ensure that you did the
arithmetic correctly. Something else you should always do with any fit is to plot the residuals vs. the
independent variable, i.e. y_i – y_fit, where y_i is the experimentally determined value of y at x_i, and
y_fit is the y value predicted by the model at x_i. Figure 4 shows an example of a fit to a linear
absorbance working curve with the residuals plotted separately at the bottom. If examination of the
residual plot reveals a trend, this is a good indication that the model (a straight line in this example)
is not adequate to describe the experimental behavior.

[Figure 4: absorbance (0.0 to 0.5) vs. concentration (µM, 50 to 200) with the fitted line, and the
residuals (×10⁻³, roughly ±100) plotted separately beneath.]
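The column sums and equations (11)–(13) can be cross-checked in a few lines of any language; here is a minimal Python sketch. The concentration/absorbance values are hypothetical, made up for illustration, not data from the text:

```python
# Hand computation of the least-squares slope and intercept from the
# column sums, following equations (11)-(13). The data are hypothetical
# illustrative values, not measurements from the text.
xs = [50.0, 100.0, 150.0, 200.0]   # concentrations (uM), hypothetical
ys = [0.12, 0.24, 0.37, 0.48]      # absorbances, hypothetical

N = len(xs)                                # number of independent points
Sx = sum(xs)                               # column-A total: sum of x_i
Sy = sum(ys)                               # column-B total: sum of y_i
Sxx = sum(x * x for x in xs)               # column-C total: sum of x_i^2
Sxy = sum(x * y for x, y in zip(xs, ys))   # column-D total: sum of x_i * y_i

D = N * Sxx - Sx ** 2                      # equation (12)
b = (Sy * Sxx - Sxy * Sx) / D              # equation (13a)
m = (N * Sxy - Sy * Sx) / D                # equation (13b)

# Residuals y_i - y_fit: these are what you plot against x to judge the fit.
residuals = [y - (m * x + b) for x, y in zip(xs, ys)]
print(f"m = {m:.4g} per uM, b = {b:.3g}")
```

Note that the least-squares residuals always sum to (essentially) zero; a pattern in their signs along x, not their sum, is what flags a bad model.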

How do the uncertainties in b and m determine the uncertainty in a concentration estimated from
measuring an absorbance, i.e. what's the uncertainty in x for a single measurement y? From Harris, we
need an intermediate quantity:

$$ s_y = \sqrt{\frac{\sum_{i=1}^{N} (y_i - m x_i - b)^2}{N - 2}} \qquad (14) $$

Then the standard deviations of the slope and intercept are given by:

$$ s_m = s_y \sqrt{\frac{N}{D}} \qquad (15a) $$

$$ s_b = s_y \left( \frac{\sum_{i=1}^{N} x_i^2}{D} \right)^{1/2} \qquad (15b) $$

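Equations (14) and (15) follow directly from the fitted m and b. Continuing the earlier hypothetical working-curve numbers (again, illustrative values only):

```python
import math

# Standard deviations of the fit, following equations (14)-(15a, 15b).
# Same hypothetical working-curve data as in the earlier sketch (uM).
xs = [50.0, 100.0, 150.0, 200.0]
ys = [0.12, 0.24, 0.37, 0.48]

N = len(xs)
Sx = sum(xs)
Sy = sum(ys)
Sxx = sum(x * x for x in xs)
Sxy = sum(x * y for x, y in zip(xs, ys))
D = N * Sxx - Sx ** 2                      # equation (12)
b = (Sy * Sxx - Sxy * Sx) / D              # equation (13a)
m = (N * Sxy - Sy * Sx) / D                # equation (13b)

# Equation (14): scatter of the points about the fitted line,
# with N - 2 degrees of freedom (two parameters were fit).
s_y = math.sqrt(sum((y - m * x - b) ** 2 for x, y in zip(xs, ys)) / (N - 2))

# Equations (15a) and (15b): standard deviations of slope and intercept.
s_m = s_y * math.sqrt(N / D)
s_b = s_y * math.sqrt(Sxx / D)

print(f"s_y = {s_y:.3g}, s_m = {s_m:.3g}, s_b = {s_b:.3g}")
```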
We now know the uncertainty in the slope, the intercept, and in an individual measurement of the
absorbance or analytical signal (that's s_y). So if we now measure a signal y, we can use propagation of
error calculations to compute the uncertainty in the apparent concentration x. Error propagation is
discussed in Section VII.

For the less adventurous: if you put your working curve data in columns A and B of an Excel
spreadsheet, then pop up the Regression window, you set up the fitting problem as shown in Fig. 5.

[Figure 5: the Excel Regression dialog, with the input Y and X ranges filled in.]

The least squares regression is done automatically: slope, intercept, and residuals are all computed.
After a little moving around of the graphs and adjusting their size, the result looks like Fig. 6.

Excel spreadsheet terms are defined below.

[Figure 6: the regression output worksheet, with summary statistics, the ANOVA table, the coefficient
table, and the residual plot.]

Multiple R	The correlation coefficient, i.e. the degree to which the model fits the data. +1
	= perfect match. –1 = anti-correlated, i.e. as the independent variable increases, the dependent
	variable decreases. 0 = uncorrelated, i.e. there's no functional relationship between the
	independent and dependent variables.
R Square The square of the correlation coefficient.

Adjusted R Square	The squared correlation coefficient, scaled for the effect of non-infinite degrees of
	freedom: $ R_{adj}^2 = 1 - (1 - R^2)\frac{N - 1}{N - p'} $, where p' is the number of parameters fit (2 in the case of a
	straight line on a Cartesian plane).
Standard Error	Related to standard deviation: the square root of the sum of squared distances from the
	data to the fitted line, divided by the degrees of freedom (this is the s_y of equation (14)).
Observations Number of data points, what we've called N up to now.

ANOVA Acronym for Analysis of Variance


Regression Description of the results of the analysis of variance
Residual How much of the raw data is NOT described by the model
Total Sum of Regression and Residual factors
Df Degrees of freedom – how many parameters can one legitimately extract from the data?
SS Sum of squares
MS	Mean square, i.e. sum of squares divided by degrees of freedom
F	The statistical significance of a non-zero slope. The bigger, the better:
	$$ F = \frac{m^2 \sum_{i=1}^{N} (x_i - \bar{x})^2}{s_y^2} $$
Significance F The statistical significance of the F statistic. The smaller, the better.
Coefficients Coefficients of the model (intercept and slope for the straight line)
Intercept Self-explanatory
X Variable 1 Slope
Standard Error Uncertainty in the coefficients
t Statistic	Student's t for the hypothesis that the coefficient (intercept or slope) is non-zero.
Upper/Lower Limits Confidence intervals for each computed value.
Residual Output	A table giving the computed value of the dependent variable for each of the
	input values of the independent variable, and the residual, i.e. observed – predicted

This is more than you really need. What you are looking for:

-	Is there a pattern to the residuals, indicating that the model is inadequate? The residual graph
	can give you a visual idea of this (if it's a smooth curve rather than random scatter, there's a
	problem). R² is a numerical means of judging: anything less than 0.999 says there's some
	other significant source of variance that slope and intercept don't account for.
-	What are the uncertainties in the model parameters (slope and intercept in this case)?
