Physics-Based Multiplicative Scatter Correction Approaches For Improving The Performance of Calibration Models

Article in Applied Spectroscopy · April 2006

DOI: 10.1366/000370206776342535 · Source: PubMed
S. N. Thennadil*, H. Martens and A. Kohler
* Corresponding Author
Merz Court
Chemical Engineering and Advanced Materials,
University of Newcastle upon Tyne,
Newcastle upon Tyne
NE1 7RU
United Kingdom
Phone: +44 (0) 191 222 5466
Fax: +44(0) 191 222 5748
Email: s.n.thennadil@ncl.ac.uk
Harald Martens (1, 2, 3)
(1) The Norwegian Food Research Institute, (2) CIGENE Center for Integrative Genetics,
(3) IKBM/Norwegian U. of Life Sciences, Norway.
Osloveien 1
N-1432 Ås, Norway
Phone: +47 64 97 0100
Email: harald.martens@matforsk.no
Achim Kohler
Matforsk/Norwegian Food Research Institute
Osloveien 1
N-1430 Ås, Norway
Phone: +47 64 97 02 40
Email: achimkohler@matforsk.no
Abstract
Light scattering effects pose a major problem in the estimation of chemical properties
of particulate systems such as blood, tissue and pharmaceutical solids. Recently,
Martens et al. proposed an Extended Multiplicative Signal Correction (EMSC)
approach where light scattering effects were taken into account in an empirical
manner. It is possible to include causal, first-principles mathematical models based on
the physics of light scattering into the EMSC framework. This could lead to
significant improvements in the separation of absorption and scattering effects. A pre-
conditioning step prior to application of EMSC whereby a transformation based on
the physics of light scattering is used to convert the spectra into a form where the
absorption and scattering effects are separable (an underlying assumption of EMSC)
is proposed. Results indicate that the transformation followed by EMSC gives better
calibration models than the direct application of EMSC to the absorbance spectra.
Keywords: Extended Multiplicative Scatter Correction, Pre-processing, NIR
Spectroscopy, Light Scattering.
Introduction
Light scattering effects pose a major problem in the estimation of chemical properties
of particulate systems such as blood, tissue and pharmaceutical solids. Various
empirical signal correction methods have been proposed to overcome this
problem.1,2,3,4,5 These methods attempt to remove “non-chemical” variations in the
spectra including those due to light scattering in an empirical manner. While light
scattering is strongly dependent on wavelength, these methods do not take this aspect
into account. Application of MSC in a piece-wise manner has been one of the few
attempts at accounting for wavelength dependence of the variations.2 Recently,
Martens et al.5 proposed an Extended Multiplicative Signal Correction (EMSC)
approach where the wavelength dependence of light scattering effects was taken into
account through a second order polynomial.
It is possible to include causal, first-principles mathematical models based on the
physics of light scattering into the EMSC framework. This could lead to significant
improvements in the separation of absorption and scattering effects thereby leading to
better calibration models. There are two levels at which physics-based models can
be incorporated into the EMSC approach. One is to modify the EMSC equation itself
by replacing the polynomial wavelength dependent terms with a first-principles based
model.6 The second, which can be considered as a higher level pre-conditioning step,
is to find a transformation based on the physics of light scattering, that would convert
the spectra into a form where the absorption and scattering effects are additive (an
underlying assumption of EMSC) and then apply the EMSC technique. This pre-
conditioning approach can be implemented by invoking theories of light propagation
such as the Kubelka-Munk theory or the more rigorous Transport theory.7
In this paper, a method for transforming spectra into a form suitable for the
subsequent application of EMSC is proposed, based on a simple Beer’s law type equation
written in terms of the transport theory parameters. The derivation of the transformation is
given in the next section. The two-step pre-processing consisting of applying the
transformation followed by EMSC on the transformed spectra is then compared with
the direct application of EMSC on the absorbance spectra. Two forms of the EMSC
are considered: one which includes the 2nd order polynomial to describe wavelength
dependence of the scattering variations5 and the other which includes a logarithmic
function of wavelength to account for the light scattering variations.6
Method for Transforming Spectra
According to the transport theory,7 light propagation at wavelength $\lambda$ through a
particulate medium can be completely described given three optical parameters: the bulk
absorption coefficient $\mu_a(\lambda)$, the bulk scattering coefficient $\mu_s(\lambda)$ and the anisotropy
parameter $g(\lambda)$. The bulk absorption coefficient is the sum of the contributions of the
individual species i.e. the sum of the concentration times the absorption cross-section
of each chemical species in the mixture. Using the transport equation, it is possible to
extract the optical parameters of a sample provided suitable sets of measurements (e.g.
diffuse reflectance, diffuse transmission and collimated transmission) are made on the
sample.8 This approach has been extensively used in the medical diagnostics area
where bulk optical parameters are of interest. The complexity of both the
measurement and the subsequent inversion of the transport equation makes it
impractical for most situations where fast measurements and simple instrumentation
are required. Therefore it is desirable to develop methods for extracting the relevant
information through a simple (and single) measurement such as reflectance,
transmittance or transflectance.
A logical starting point would be to extract from the measurement the ratio $\mu_a/\mu_s'$, where
$\mu_s' = \mu_s(1 - g)$ is the reduced bulk scattering coefficient. It is not feasible to
obtain this ratio using the exact transport equation. Therefore, an approximate
equation that would adequately fit the measurements has to be used. A plausible
equation would be a Beer’s law form:
$$x(\lambda) = \exp\left[-C_0\,\mu_a(\lambda)\,\delta(\lambda)\right] \tag{1}$$
where $x(\lambda)$ is the measurement such as reflectance or transmittance at wavelength $\lambda$,
C0 is a constant dependent on the measurement configuration and $\delta$ is the penetration
depth given by:
$$\delta(\lambda) = \left\{3\,\mu_a(\lambda)\left[\mu_a(\lambda) + \mu_s'(\lambda)\right]\right\}^{-0.5} \tag{2}$$
The penetration depth is related to the average distance travelled by the photons
through the sample before reaching the detector. The form of Eq. 1 was suggested by
Jacques9 in the context of total diffuse reflectance measurements. It is reasonable to
assume that the same form should apply equally well for transmittance or
transflectance measurements. By fitting Eq. 1 to simulated data for a typical range of
values of the optical parameters for tissue in the NIR region, Jacques obtained a value
of C0 = 7.8 for diffuse reflectance measurements. However, it was pointed out that C0
would be dependent on the range of the optical parameters used for the fit. For the
current study, C0 is treated as an unknown to be determined by cross-validation as
discussed in the next section.
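As an illustration of Eqs. 1 and 2, the following minimal sketch evaluates the Beer's law type signal for a few made-up optical parameters. The function names, the parameter values and the use of NumPy are assumptions introduced here for illustration; only the two equations and the value C0 = 7.8 quoted above come from the text.

```python
import numpy as np

def penetration_depth(mu_a, mu_s_red):
    """Eq. 2: delta = {3 mu_a [mu_a + mu_s']}^(-0.5)."""
    mu_a = np.asarray(mu_a, dtype=float)
    mu_s_red = np.asarray(mu_s_red, dtype=float)
    return (3.0 * mu_a * (mu_a + mu_s_red)) ** -0.5

def beer_type_signal(mu_a, mu_s_red, c0=7.8):
    """Eq. 1: x = exp(-C0 * mu_a * delta)."""
    mu_a = np.asarray(mu_a, dtype=float)
    return np.exp(-c0 * mu_a * penetration_depth(mu_a, mu_s_red))

# Illustrative (made-up) optical parameters on a 5-point wavelength grid.
mu_a = np.array([0.05, 0.10, 0.20, 0.40, 0.80])   # absorption, arbitrary units
mu_s_red = np.full(5, 10.0)                       # reduced scattering, same units
x = beer_type_signal(mu_a, mu_s_red)              # decreases as absorption increases
```

With the scattering held fixed, the simulated signal falls as the absorption rises, which is the qualitative behaviour expected of a reflectance or transmittance measurement.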
Substituting Eq. 2 in Eq. 1 we get,
$$x(\lambda) = \exp\left\{-C\left[\frac{\mu_a(\lambda)}{\mu_a(\lambda) + \mu_s'(\lambda)}\right]^{0.5}\right\} \tag{3}$$
where $C = C_0/\sqrt{3}$.
Taking the logarithm of this expression leads to
$$A(\lambda) = -\log x(\lambda) = C\left[\frac{\mu_a(\lambda)}{\mu_a(\lambda) + \mu_s'(\lambda)}\right]^{0.5} \tag{4}$$
From Eq. 4, it is seen that even if the measurements could be accurately described
using a simplified expression such as Eq. 1, the strictly linear separation of absorption
and scattering which is assumed by the EMSC method is not satisfied. It is plausible
that a transformation of the spectra into a form that enhances the performance of EMSC
could be derived from Eq. 4. An obvious way is to extract the term:
$$u(\lambda) = \frac{\mu_a(\lambda)}{\mu_s'(\lambda)} = \frac{C_1 A(\lambda)^2}{1 - C_1 A(\lambda)^2} \tag{5}$$
from Eq. 4, where $C_1 = C^{-2}$. Taking the logarithm of Eq. 5 we get
$$\log u(\lambda) = \log \mu_a(\lambda) - \log \mu_s'(\lambda) \tag{6}$$
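The transformation of Eqs. 5 and 6 is simple enough to sketch in a few lines. The code below is an illustrative sketch, not the authors' implementation: the function names, the made-up optical parameters and the value of C are assumptions, and the base-10 logarithm is chosen to match the power of 10 used later in Eq. 14. The round trip builds an absorbance from Eq. 4 and checks that Eq. 5 recovers the ratio of absorption to reduced scattering.

```python
import numpy as np

def absorbance_to_u(absorbance, c1):
    """Eq. 5: u = C1*A^2 / (1 - C1*A^2), an estimate of mu_a / mu_s'."""
    a2 = c1 * np.asarray(absorbance, dtype=float) ** 2
    if np.any(a2 >= 1.0):
        raise ValueError("C1 * A^2 must be < 1 for u to be positive and finite")
    return a2 / (1.0 - a2)

def absorbance_to_log_u(absorbance, c1):
    """Eq. 6: base-10 logarithm of u (undefined where A = 0)."""
    return np.log10(absorbance_to_u(absorbance, c1))

# Round trip on made-up optical parameters: build A from Eq. 4, recover mu_a / mu_s'.
C = 3.5                                   # hypothetical value of the constant in Eq. 4
mu_a = np.array([0.05, 0.10, 0.20, 0.40])
mu_s_red = np.array([12.0, 11.0, 10.0, 9.0])
A = C * np.sqrt(mu_a / (mu_a + mu_s_red))
u = absorbance_to_u(A, c1=C ** -2)        # matches mu_a / mu_s_red
```

The explicit check on C1 * A^2 guards against values of C1 that would make u negative or infinite.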
We now have a form where the scattering and absorption are separated albeit in log
scale. Comparing this with the EMSC equation of Martens et al:5
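$$z = a + b\,z_{chem} + d\lambda + e\lambda^2 \tag{7}$$
we see that Eq. 6 is of the form given by Eq. 7 with b = 1 and
$$\begin{aligned}
-\log \mu_s'(\lambda) &= a + d\lambda + e\lambda^2 \\
z_{chem}(\lambda) &= \log \mu_a(\lambda) \\
z(\lambda) &= \log u(\lambda)
\end{aligned} \tag{8}$$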
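On the other hand, if $\mu_s'$ is of the form:6
$$\mu_s'(\lambda) = \alpha\lambda^{\beta} \tag{9}$$
then,
$$\log \mu_s'(\lambda) = \log \alpha + \beta \log \lambda \tag{10}$$
Substituting (10) in (6) we get,
$$\log u(\lambda) = \log \mu_a(\lambda) + \alpha' + \beta' \log \lambda, \qquad \alpha' = -\log \alpha \ \text{and} \ \beta' = -\beta \tag{11}$$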
This is of the same form as the EMSC equation used by Thennadil and Martin:6
$$z = a + b\,z_{chem} + d \log \lambda \tag{12}$$
with b = 1, z and $z_{chem}$ as in Eq. 8, $a = \alpha'$ and $d = \beta'$.
In either case, note that after EMSC correction of the transformed spectra we get,
$$z_{corr} = \log \mu_a \tag{13}$$
While $\mu_a$ is linear in the concentration of species, $\log \mu_a$ is not. Therefore, the
exponential of Eq. 13 should be used as the corrected spectra for the subsequent
model building step:
$$z'_{corr} = 10^{z_{corr}} = \mu_a \tag{14}$$
One further point to note is that the introduction of spectral information of the pure
species as done in EMSC is not straightforward when using log u, as is seen from Eq.
11, because of the $\log \mu_a$ term. Here it is assumed that this term can be approximated
by a linear combination of the logs of the pure component spectra, i.e.
$$\log \mu_a \approx \log \bar{x} + c_1 \log k_1 + c_2 \log k_2 + \dots \tag{15}$$
where $\bar{x}$ is the reference or mean spectrum and $k_1, k_2, \dots$ are the spectra of pure
components. Although in the derivation the correspondence between the derived
equations and EMSC is obtained by letting b = 1, in this study b was estimated as part
of the EMSC step since the derivation involves approximations.
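The EMSC step applied to the transformed spectra can be sketched as an ordinary least-squares fit followed by the back-transformation of Eq. 14. The code below is a minimal sketch under stated assumptions: the function name `emsc_correct`, the synthetic spectra and the correction convention (subtract the fitted offset and wavelength terms, then divide by b, keeping the chemical part) are not given in this text and are assumed here from the usual EMSC formulation. The argument `k` stands for whatever pure-component information is supplied in the same domain as z.

```python
import numpy as np

def emsc_correct(Z, wavelengths, reference, k=None, baseline="poly"):
    """Fit z = a + b*reference + h*k + baseline terms by least squares, per spectrum.

    baseline="poly" uses d*lam + e*lam^2 (Eq. 7); baseline="log" uses d*log(lam) (Eq. 12).
    Returns (z - a - baseline) / b, so the chemical part is kept (assumed convention).
    """
    lam = np.asarray(wavelengths, dtype=float)
    if baseline == "poly":
        s = (lam - lam.mean()) / (lam.max() - lam.min())   # rescaled for conditioning
        phys = [s, s ** 2]
    else:
        phys = [np.log10(lam)]
    cols = [np.ones_like(lam), np.asarray(reference, dtype=float)]
    if k is not None:
        cols.append(np.asarray(k, dtype=float))
    n_chem = len(cols)                                  # offset, reference, optional k
    M = np.column_stack(cols + phys)
    Z = np.atleast_2d(np.asarray(Z, dtype=float))
    coef, *_ = np.linalg.lstsq(M, Z.T, rcond=None)      # shape (n_terms, n_spectra)
    a, b = coef[0], coef[1]
    physical = M[:, n_chem:] @ coef[n_chem:]            # fitted wavelength-dependent part
    return ((Z.T - a - physical) / b).T

# Synthetic check with made-up spectra: offset, scaling and wavelength terms are removed.
lam = np.linspace(1000.0, 2000.0, 60)
m = 1.0 + 0.5 * np.sin(lam / 150.0)              # stand-in for the mean log(u) spectrum
k = np.exp(-((lam - 1500.0) / 60.0) ** 2)        # stand-in for pure-component information
z = 0.2 + 1.3 * (m + 0.4 * k) + 1e-3 * lam + 1e-7 * lam ** 2
z_corr = emsc_correct(z, lam, m, k)[0]           # expected to be close to m + 0.4 * k
mu_a_like = 10.0 ** z_corr                       # back-transform as in Eq. 14
```

Rescaling the wavelength axis inside the polynomial fit does not change the fitted baseline, since the rescaled terms span the same space as 1, λ and λ²; it only improves numerical conditioning.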
Analysis and Results
In this section, we investigate the effectiveness of the two-step (Transformation –
EMSC) approach in improving the performance of calibration models using the
gluten-starch data of Martens et al.5 We examine the effect of applying the transform
in conjunction with the full EMSC equation which includes both the physical and
chemical effects. Both forms of the EMSC, given by Eq. 7 (henceforth referred to as
EMSCWP) and Eq. 12 (henceforth referred to as EMSCLP), were considered with and
without applying the transformation given by Eqs. 5 and 6 to the spectra. The
performance of the resulting PLS calibration models is compared with that
obtained using the absorbance spectra without any transformation and that obtained using
EMSC with only the physical wavelength dependent terms of Eqs. 7 and 12 taken into
account (these will be referred to as EMSCW and EMSCL respectively).
While in the last section it was deduced that the log u transform (i.e. Eq. 5 followed
by Eq. 6) will lead to spectra which satisfy the EMSC form, analysis was also
done with the u spectra (i.e. using only Eq. 5). This is mainly used as a check on the
assumptions involved in using Eq. 1 (i.e. C0 is independent of the optical properties of
the samples) and Eq. 15 ($\log \mu_a$ can be adequately represented by a linear
combination of the log of the pure component spectra and the reference spectrum). If
the u spectra led to better PLS models then these assumptions would have to be called into
question. On the other hand, if the log(u) spectra lead to better models and are thus
consistent with the derived theory, it would indicate that these assumptions are valid
for the dataset considered.
The data set of Martens et al.5 consisted of 5 different mixtures of gluten and starch (0,
25, 50, 75, 100% gluten by weight) with 20 samples at each concentration, with the
transmission spectra being collected with different packing (firm or loose) and using
different cuvettes. For details of sample preparation and experimental design, the
reader is referred to the original paper. Samples with the same gluten/starch ratio will
be considered as replicates for PLS modelling and analysis purposes. PLS with
cross-validation (leaving out one sample, i.e. its 20 replicates, at a time) was used to build
the calibration models for predicting the concentration of gluten. For pure spectral
information k, the difference between sample 1 (100% gluten) and sample 91 (100% starch)
was used. When applying EMSC methods after transforming the spectra, the reference
spectrum was computed as the mean of the transformed spectra (u or log u, depending on
the transformation used) of all 100 samples. When applying EMSCWP or
EMSCLP without transforming the data, the reference was computed in the same way
as Martens et al., that is, as the average of the pure gluten and
starch spectra. When EMSCW and EMSCL were used, the reference was taken to be
the mean absorbance spectrum of the 100 samples.
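The cross-validation scheme described above, in which all replicates of one mixture are left out together, can be sketched as follows. This is an illustrative sketch only: the one-component PLS1 fit is written out directly in NumPy, the function names are hypothetical, and the toy data merely mimic the five-level design rather than reproducing the gluten-starch spectra.

```python
import numpy as np

def pls1_one_lv_fit(X, y):
    """One-component PLS1: returns (x_mean, y_mean, regression vector)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc
    w /= np.linalg.norm(w)            # weight vector of the first latent variable
    t = Xc @ w                        # scores of the first latent variable
    q = (t @ yc) / (t @ t)            # inner regression coefficient
    return x_mean, y_mean, w * q

def rmsecv_leave_group_out(X, y, groups):
    """Leave out all spectra of one group at a time; pool the squared errors."""
    X, y, groups = np.asarray(X, float), np.asarray(y, float), np.asarray(groups)
    sq_err = np.empty_like(y)
    for g in np.unique(groups):
        test = groups == g
        x_mean, y_mean, beta = pls1_one_lv_fit(X[~test], y[~test])
        sq_err[test] = (y_mean + (X[test] - x_mean) @ beta - y[test]) ** 2
    return float(np.sqrt(sq_err.mean()))

# Toy data shaped like the design above: 5 gluten levels, replicates share a group.
y = np.repeat([0.0, 25.0, 50.0, 75.0, 100.0], 4)
groups = np.repeat(np.arange(5), 4)
rng = np.random.default_rng(0)
k = rng.normal(size=30)                               # made-up pure-component direction
X = 2.0 + np.outer(y, k) + 0.01 * rng.normal(size=(20, 30))
score = rmsecv_leave_group_out(X, y, groups)
```

Grouping the replicates keeps spectra of the left-out mixture from influencing the model that predicts them, which is the point of leaving out a whole concentration level at a time.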
Optimum value of C1
In order to transform the spectra using Eq. 5, the value of C1 has to be determined.
One way would be to choose the value of C1 along with the number of latent variables
(LV) using cross-validation. Since this dataset consists of spectra of a binary mixture,
if the chemical and light scattering variations were completely separated by the
transform, only one LV would be required to model the data. Therefore, in this study
the optimum value of C1 was chosen as that which leads to the lowest root mean
square error of cross-validation (RMSECV) when a one LV model is used. Noting
that the value of u(λ) should be positive for it to be physically realistic (since it is the
ratio of two positive quantities), from Eq. 5 it is seen that the value of C1 should
satisfy the condition:
$$1 - C_1 A(\lambda)^2 > 0 \tag{16}$$
or
$$C_1 < \frac{1}{A(\lambda)^2} \tag{17}$$
From this equation, it is seen that the value of C1 should be less than the right-hand
side expression for the maximum absorbance value in the dataset since in this study,
C1 is taken to be constant across all wavelength ranges and samples. For the gluten-
starch dataset, it is found that C1 < 0.102. Thus the optimum value has to lie in the
range 0 < C1,opt < 0.102. It can be expected that the value of C1,opt will depend on the
EMSC method used and also on how the extracted u is used subsequently, i.e.
whether the EMSC step is applied to u or log(u). Figure 1(a) shows the RMSECV for
a 1 LV PLS model as a function of C1 when EMSCWP and EMSCLP are applied to
log(u). For these cases, the reference spectrum was taken to be the mean of log(u) of
all the 100 samples. It is seen that the RMSECV curve exhibits a clear minimum for
both methods with C1,opt = 0.066 for EMSCWP and 0.072 for EMSCLP. It can be
seen that EMSCWP leads to lower error than EMSCLP for the range of C1
considered. However, when the optimal value for C1 is chosen, the difference in
RMSECV between the two methods is very small.
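The selection of C1 described above amounts to a one-dimensional search below the bound of Eq. 17. The sketch below is illustrative: the function names are hypothetical, the absorbance values are made up (a maximum of 3.13 is used because 1/3.13² is close to the 0.102 bound quoted above), and the quadratic `score` is only a stand-in for the 1 LV RMSECV obtained after the transform and EMSC steps.

```python
import numpy as np

def c1_upper_bound(absorbance):
    """Eq. 17 evaluated at the largest absorbance in the data set."""
    return 1.0 / np.max(np.asarray(absorbance, dtype=float)) ** 2

def choose_c1(absorbance, score, n_grid=50):
    """Return the C1 on an open grid in (0, bound) that minimises score(C1)."""
    bound = c1_upper_bound(absorbance)
    grid = np.linspace(0.0, bound, n_grid + 2)[1:-1]   # drop both end points
    scores = np.array([score(c1) for c1 in grid])
    best = int(np.argmin(scores))
    return float(grid[best]), float(scores[best])

A = np.array([[0.8, 1.9, 3.13], [0.7, 1.6, 2.90]])    # made-up absorbances
bound = c1_upper_bound(A)                              # roughly 0.102 for max(A) = 3.13
# Stand-in score with a minimum at 0.066; in practice this would be the 1 LV RMSECV.
best_c1, best_score = choose_c1(A, score=lambda c1: (c1 - 0.066) ** 2)
```

Excluding both end points of the grid keeps u strictly positive and finite for every spectrum in the data set.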
Figure 1(b) shows the RMSECV curves for the two methods when u is used. For
these cases, the reference spectrum was taken to be the mean of u of all the 100
samples. Here, the optimal values for C1 are 0.088 and 0.064 for EMSCWP and
EMSCLP respectively. In this case, EMSCLP performs better than EMSCWP,
although it is evident from the figures that the minimum values of RMSECV
obtained by using u are larger than those obtained when log u is used.
Figure 2(a) shows the RMSECV of a 1 LV PLS model as a function of C1 when
EMSCW and EMSCL, i.e. EMSC with only the physical effects taken into
consideration, are applied to log(u). The RMSECV profiles do not exhibit a clear
minimum. RMSECV for EMSCW appears insensitive to C1 whereas for EMSCL
there is a slight monotonic decrease with decreasing C1. EMSCL produces lower
RMSECV values than EMSCW regardless of what values
of C1 are chosen for the two methods. Since the RMSECV curve is flat for EMSCW,
the optimal value was chosen as 0.05. For EMSCL, a value of 0.07 was chosen since
from the figure, it is seen that the greatest rate of change in the RMSECV occurs in
the region of 0.1 to 0.07, after which the RMSECV values fall much more slowly as C1 is
reduced. Similar behaviour is exhibited in Figure 2(b), which shows the RMSECV vs.
C1 when the u spectra are used. The subsequent analysis uses these values of C1 for the
different Transform-EMSC approaches.
Effect of Transform-EMSC on spectra and PLS scores
Figure 3 shows the 1 LV PLS model performance when the absorbance, i.e. log(1/T),
spectra are used without subjecting them to the pre-processing or the transform step. The
absorbance versus wavelength for the 100 samples is shown in figure 3(a) and figure
3(b) shows the same when plotted against the reference spectrum. It can be seen that
there is a significant overlap of samples with different concentrations due to the high
“within sample” variation i.e. the variation in the spectra of samples with the same
gluten concentration. Figure 3(c) shows the gluten concentration plotted against the
scores of the first LV. The large variance in the scores for samples with the same
concentrations is evident and this leads to large errors when a 1 LV model is used as
seen in figure 3(d).
Comparing figure 4(a) and (b) with figure 3(a) and (b), it is seen that the application
of EMSCW i.e. EMSC with only the physical (quadratic wavelength dependent)
effects taken into account, significantly reduces “within sample” variance in the
spectra leading to a separation of samples with different concentrations. This is
reflected in the plot of gluten concentration against the scores of the first latent
variable, figure 4(c), where the “within sample” variance is greatly reduced. However,
a distinct curvature is evident, which results in a systematic distribution of errors as
seen from figure 4(d).
The results of applying EMSCWP, i.e. the inclusion of pure component information in
the correction scheme, to the log(1/T) spectra are illustrated in Figure 5. The corrected
absorbance shown in figures 5(a) and 5(b) exhibits a greater extent of separation
between samples of different concentrations. There is still some non-linearity in the
gluten concentration versus scores of the first latent variable as seen from Figure 5(c) and (d).
The results when EMSCWP is applied to spectra transformed using Eq. 5, i.e. after converting
the absorbance to u, are shown in Figure 6. Comparing with figure 5, we see that the “within sample”
variations (i.e. variations in samples with the same gluten concentration) in the
corrected u spectra are higher compared to applying EMSCWP to the absorbance
spectra. The residual plot still exhibits a slight curvature (figure 6(f)).
When the spectra are transformed using Eq. 5, EMSCWP is applied to the log(u)
spectra (i.e. Eq. 6) and a PLS model is built after transforming back to u, a linear
relationship between the gluten concentration and the scores of the first latent variable
is evident from figures 7(e) and (f). Thus using EMSCWP on log(u) instead of
directly on u leads to lower “within sample” variation as well as a linear relationship
between gluten concentration and the scores of the first latent variable. This result is
consistent with the logic presented in the derivation of the transform in the section on
the method for transforming spectra.
From this analysis, it is seen that the two-step Transform-EMSCWP approach
presented here includes the reduction of “within sample” variation through the EMSC
step and the linearization of the response curve of gluten concentration as a function
of the first LV by the transformation step. The above analysis was repeated with
EMSCL and EMSCLP, i.e. using the log wavelength relationship to model the physical
(light scattering) effects as given by Eq. 12. The effects on the spectra and first LV
scores were visually similar to the corresponding EMSCW and EMSCWP plots and
therefore are not shown here. However, they do differ in the magnitude of RMSECV,
as will be seen in the next section, where the discussion centres on the RMSECV
profiles when different Transform-EMSC techniques are used.
Comparison of RMSECV profiles
The RMSECV curves for PLS models where EMSCW is used with absorbance
spectra (A), u and log(u) are shown in figure 8(a) along with the curve using raw
absorbance spectra without the application of EMSCW. It is seen that there is very
little difference in the various methods when considering a 1 LV model. The “best
model” i.e. the one with the lowest RMSECV is obtained when using the log(u)
transformed spectra with a 4 LV model. To evaluate the extent of improvement
obtained by including the pure component information in the EMSC scheme, the
various combinations of Transform-EMSCWP methods were compared to this “best
EMSCW model” in figure 8(b). It is seen that when a 1 LV model is used, the
inclusion of pure component information leads to an appreciable reduction in
RMSECV. The lowest RMS errors for a 1 LV model are obtained when EMSCWP is
used on the log(u) spectra. It is also interesting to note that when this transform is
used the minimum RMSECV is obtained with a model using only one latent variable.
In fact, it outperforms the other methods even when they include more than 1 LV in
their models.
The analysis was repeated using the EMSCL and EMSCLP variants of the EMSC
technique. When EMSCL is used (Figure 9(a)), there is an appreciable decrease in error
for a 1 LV model when compared to using the raw absorbance spectra without
applying the technique. There are slight differences when the transform is applied in
conjunction with EMSCL. Further, as in the case of EMSCW, the “best model” is
obtained using EMSCL with the log(u) spectra. Again, when EMSCLP is used the
best 1 LV model is obtained when the log(u) spectra are used, though there is very little
difference when the u spectra are used.
Figure 10 shows the results from “best models” taken from figures 8 and 9. All these
models use the log(u) spectra followed by the EMSC step. It is seen that for a 1 LV
model, EMSCL gives lower error compared to EMSCW. However, when the pure
component information is included, their counterparts, viz. EMSCWP and EMSCLP,
have similar errors.
Conclusions
This study indicates that the proposed two-step Transform-EMSC method could lead
to improvements in PLS model performance. The physics-based transformation of the
log(1/T) spectra used in this study is a very simple form with a parameter C1 which
was assumed to be independent of wavelength (or the optical properties). In the case
of the gluten-starch data considered here, this approximation appears to be
satisfactory. The methodology described here can, in principle, be used with more
sophisticated models of light propagation in place of Eq. 1 for the transformation step.
It was found that, in terms of reducing the sample-to-sample physical variations, the
application of EMSC with pure component information taken into account seems to
have the largest influence. The transform step reduces the non-linearities present in
the data and allows the relevant information to be well represented by a linear 1 LV
model.
Two forms of representing the wavelength dependent physical variation in the EMSC
step were considered: a quadratic form of wavelength dependence and a logarithmic
form of wavelength dependence. Analysis suggests that the latter form performs better
when only physical effects are taken into account in the EMSC step. When pure
component chemical information is included, both forms give similar results.
List of Figures
Figure 1. RMSECV as a function of C1 for a 1 LV PLS model using EMSCWP and
EMSCLP. (a) With spectra transformed to log(u); (b) With spectra
transformed to u.
Figure 2. RMSECV as a function of C1 for a 1 LV PLS model using EMSCW and
EMSCL. (a) With spectra transformed to log(u); (b) With spectra
transformed to u.
Figure 3. Results when absorbance spectra without pre-processing are used.
Figure 4. Results for EMSCW applied to absorbance spectra.
Figure 5. Results for EMSCWP applied to absorbance spectra.
Figure 6. Results of EMSCWP applied to u.
Figure 7. Results for EMSCWP applied to log(u).
Figure 8. RMSECV curves for (a) EMSCW and (b) EMSCWP when A, u, and log u
are used.
Figure 9. Results for EMSCL and EMSCLP.
Figure 10. Comparison between the “best” Transform-EMSC models.
References
1. P. Geladi, D. MacDougall, and H. Martens, Appl. Spectrosc. 39, 491 (1985).
2. T. Isaksson and B. Kowalski, Appl. Spectrosc. 47, 702 (1993).
3. R. J. Barnes, M. S. Dhanoa, and S. J. Lister, Appl. Spectrosc. 43, 772 (1989).
4. M. Blanco, J. Coello, I. Montoliu, and M. A. Romero, Anal. Chim. Acta 434, 125 (2001).
5. H. Martens, J. P. Nielsen, and S. B. Engelsen, Anal. Chem. 75, 394 (2003).
6. S. N. Thennadil and E. B. Martin, “Empirical pre-processing methods and their impact on NIR calibrations. A simulation study”, J. Chemometrics, in press.
7. A. Ishimaru, Wave propagation and scattering in random media (IEEE Press, New York, 1997).
8. T. L. Troy and S. N. Thennadil, J. Biomedical Optics 6, 167 (2001).
9. S. L. Jacques, Biomedical optical instrumentation and laser-assisted Biotechnology, Proceedings of the NATO Advanced Science Institute (Erice, Sicily, Kluwer Academic Publishers, Nov 10-22, 1995), eds. A.M. Verga and Scheggi.