3 authors:
Achim Kohler
Norwegian University of Life Sciences (NMBU)
All content following this page was uploaded by Suresh N Thennadil on 28 November 2016.
* Corresponding Author
1
Merz Court
Chemical Engineering and Advanced Materials,
University of Newcastle upon Tyne,
Newcastle upon Tyne
NE1 7RU
United Kingdom
Phone: +44 (0) 191 222 5466
Fax: +44(0) 191 222 5748
Email: s.n.thennadil@ncl.ac.uk
Harald Martens1,2,3
1The Norwegian Food Research Institute, 2CIGENE Center for Integrative Genetics,
3IKBM/Norwegian U. of Life Sciences, Norway.
Osloveien 1
N-1432 Ås, Norway
Phone: +47 64 97 0100
Email: harald.martens@matforsk.no
Achim Kohler
Matforsk/Norwegian Food Research Institute
Osloveien 1
N-1430 Ås, Norway
Phone: +47 64 97 02 40
Email: achimkohler@matforsk.no
Abstract
Light scattering effects pose a major problem in the estimation of chemical properties
of particulate systems such as blood, tissue and pharmaceutical solids. Recently,
Martens et al. proposed an Extended Multiplicative Signal Correction (EMSC)
approach in which light scattering effects are taken into account in an empirical
manner. Causal, first-principles mathematical models based on the physics of light
scattering can be incorporated into the EMSC framework, which could significantly
improve the separation of absorption and scattering effects. Here, a pre-conditioning
step prior to EMSC is proposed, in which a transformation based on the physics of
light scattering converts the spectra into a form where the absorption and scattering
effects are separable (an underlying assumption of EMSC). Results indicate that the
transformation followed by EMSC gives better calibration models than the direct
application of EMSC to the absorbance spectra.
Introduction
Light scattering effects pose a major problem in the estimation of chemical properties
spectra including those due to light scattering in an empirical manner. While light
scattering is strongly dependent on wavelength, these methods do not take this aspect
into account. Application of MSC in a piece-wise manner has been one of the few
approach where the wavelength dependence of light scattering effects was taken into
physics of light scattering into the EMSC framework. This could lead to significant
better calibration models. There are two levels at which the physics-based models can
be incorporated into the EMSC approach. One is to modify the EMSC equation itself
model.6 The second, which can be considered a higher-level pre-conditioning step,
is to find a transformation based on the physics of light scattering that would convert
the spectra into a form where the absorption and scattering effects are additive (an
underlying assumption of EMSC) and then apply the EMSC technique. This pre-
In this paper, a method for transforming spectra into a form suitable for the
subsequent application of EMSC is proposed, based on a simple Beer's-law-type equation
expressed in terms of transport-theory parameters. The derivation of the transformation is
given in the next section. The two-step pre-processing consisting of applying the
the direct application of EMSC to the absorbance spectra. Two forms of EMSC
are considered: one that includes a second-order polynomial in wavelength to describe the wavelength
dependence of the scattering variations5 and another that includes a logarithmic
Method for Transforming Spectra
particulate media can be completely described by three optical parameters: the bulk
absorption coefficient $\mu_a(\lambda)$, the bulk scattering coefficient $\mu_s(\lambda)$ and the anisotropy
parameter $g(\lambda)$. The bulk absorption coefficient is the sum of the contributions of the
individual species, i.e., the sum over all chemical species in the mixture of the concentration
times the absorption cross-section. Using the transport equation, it is possible to
extract the optical parameters of a sample provided suitable sets of measurements (e.g.,
diffuse reflectance, diffuse transmission and collimated transmission) are made on the
sample.8 This approach has been used extensively in medical diagnostics,
where bulk optical parameters are of interest. The complexity of both the
impractical for most situations where fast measurements and simple instrumentation
are required. Therefore, it is desirable to develop methods for extracting the relevant
transmittance or transflectance.
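The additive composition of the bulk absorption coefficient described above can be written compactly as follows, where $c_i$ (concentration of species $i$) and $\sigma_{a,i}(\lambda)$ (its absorption cross-section) are notation introduced here for illustration rather than symbols from the original text:

$$\mu_a(\lambda) = \sum_i c_i\,\sigma_{a,i}(\lambda)$$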
A logical starting point would be to extract from the measurement the ratio $\mu_a/\mu_s'$, where $\mu_s' = \mu_s(1-g)$ is
the reduced bulk scattering coefficient. It is not feasible to
obtain this ratio using the exact transport equation. Therefore, an approximate
equation that adequately fits the measurements has to be used. A plausible
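where $x(\lambda)$ is the measurement, such as reflectance or transmittance, at wavelength $\lambda$, and the penetration depth $\delta(\lambda)$ is given by
$$\delta(\lambda) = \left[3\,\mu_a(\lambda)\left(\mu_a(\lambda) + \mu_s'(\lambda)\right)\right]^{-0.5} \qquad (2)$$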
The penetration depth is related to the average distance travelled by the photons
through the sample before reaching the detector. The form of Eq. 1 was suggested by
assume that the same form should apply equally well for transmittance or
transflectance measurements. By fitting Eq. 1 to simulated data for a typical range of
values of the optical parameters for tissue in the NIR region, Jacques obtained a value
of C0 = 7.8 for diffuse reflectance measurements. However, it was pointed out that C0
depends on the range of the optical parameters used for the fit. For the
$$x(\lambda) = \exp\left[-C\left(\frac{\mu_a(\lambda)}{\mu_a(\lambda) + \mu_s'(\lambda)}\right)^{0.5}\right] \qquad (3)$$
where $C = C_0/\sqrt{3}$.
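As a numerical check on the constant in Eq. 3, the sketch below evaluates the penetration depth of Eq. 2 and compares two forms of the signal. Because Eq. 1 itself is not reproduced in this text, the code *assumes* it has the Jacques-type form $x = \exp[-C_0\,\mu_a\,\delta]$; under that assumption Eq. 3 follows with $C = C_0/\sqrt{3}$. The optical-parameter values are arbitrary illustrations, $C_0 = 7.8$ is the diffuse-reflectance value quoted above, and the function names are hypothetical.

```python
import math


def penetration_depth(mu_a: float, mu_s_red: float) -> float:
    """Eq. 2: delta = [3 * mu_a * (mu_a + mu_s')] ** -0.5."""
    return (3.0 * mu_a * (mu_a + mu_s_red)) ** -0.5


def signal_assumed_eq1(mu_a: float, mu_s_red: float, c0: float = 7.8) -> float:
    """ASSUMED form of Eq. 1 (not reproduced in the text): x = exp(-C0 * mu_a * delta)."""
    return math.exp(-c0 * mu_a * penetration_depth(mu_a, mu_s_red))


def signal_eq3(mu_a: float, mu_s_red: float, c0: float = 7.8) -> float:
    """Eq. 3: x = exp(-C * sqrt(mu_a / (mu_a + mu_s'))) with C = C0 / sqrt(3)."""
    c = c0 / math.sqrt(3.0)
    return math.exp(-c * math.sqrt(mu_a / (mu_a + mu_s_red)))


if __name__ == "__main__":
    # Arbitrary illustrative values (same length units for both coefficients).
    mu_a, mu_s_red = 0.5, 10.0
    print(signal_assumed_eq1(mu_a, mu_s_red), signal_eq3(mu_a, mu_s_red))
```

The two printed values agree because $\mu_a\delta = \tfrac{1}{\sqrt{3}}\left[\mu_a/(\mu_a+\mu_s')\right]^{0.5}$, which is where the factor $\sqrt{3}$ in $C$ comes from.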
$$A(\lambda) = -\log x(\lambda) = C\left(\frac{\mu_a(\lambda)}{\mu_a(\lambda) + \mu_s'(\lambda)}\right)^{0.5} \qquad (4)$$
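Squaring Eq. 4 and solving for the ratio $u(\lambda) = \mu_a(\lambda)/\mu_s'(\lambda)$ shows where the transform of Eq. 5 comes from. The identification $C_1 = 1/C^2$ is our inference from the algebra; the text does not state the relation explicitly.

$$
\begin{aligned}
\frac{A(\lambda)^2}{C^2} &= \frac{\mu_a(\lambda)}{\mu_a(\lambda)+\mu_s'(\lambda)} = \frac{u(\lambda)}{1+u(\lambda)}, \qquad u(\lambda) = \frac{\mu_a(\lambda)}{\mu_s'(\lambda)},\\
C_1 A(\lambda)^2\,\bigl(1+u(\lambda)\bigr) &= u(\lambda), \qquad C_1 = \frac{1}{C^2},\\
u(\lambda) &= \frac{C_1 A(\lambda)^2}{1 - C_1 A(\lambda)^2}, \qquad \log u(\lambda) = \log\mu_a(\lambda) - \log\mu_s'(\lambda).
\end{aligned}
$$

In the last expression, absorption and scattering enter as a difference of two terms, whereas in Eq. 4 they are coupled inside a square root.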
From Eq. 4, it is seen that even if the measurements could be described accurately
by a simplified expression such as Eq. 1, the strict linear separation of absorption
and scattering assumed by the EMSC method is not satisfied. It is plausible
that transforming the spectra into a form that will enhance the performance of EMSC
$$u(\lambda) = \frac{\mu_a(\lambda)}{\mu_s'(\lambda)} = \frac{C_1 A(\lambda)^2}{1 - C_1 A(\lambda)^2} \qquad (5)$$
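A minimal sketch of the transform in Eqs. 5 and 6, assuming natural-log absorbance and a single scalar C1 for all wavelengths and samples (as in this study). The round-trip example additionally assumes $C_1 = 1/C^2$ (our reading of the notation) so that Eq. 5 exactly inverts Eq. 4; the function names are hypothetical.

```python
import math
from typing import List, Sequence


def u_transform(absorbance: Sequence[float], c1: float) -> List[float]:
    """Eq. 5 applied element-wise: u = C1*A^2 / (1 - C1*A^2)."""
    if c1 <= 0.0:
        raise ValueError("C1 must be positive")
    u = []
    for a in absorbance:
        t = c1 * a * a
        if t >= 1.0:
            # Eq. 16 violated: u would be infinite or negative.
            raise ValueError(f"C1*A^2 = {t:.3f} >= 1; choose a smaller C1")
        u.append(t / (1.0 - t))
    return u


def log_u_transform(absorbance: Sequence[float], c1: float) -> List[float]:
    """Eq. 6: log u. Requires every A != 0 so that u > 0."""
    return [math.log(v) for v in u_transform(absorbance, c1)]


if __name__ == "__main__":
    # Round trip under the stated assumptions (natural-log absorbance, C1 = 1/C^2).
    c = 7.8 / math.sqrt(3.0)                       # C = C0 / sqrt(3)
    mu_a, mu_s_red = 0.5, 10.0                     # arbitrary illustrative values
    a = c * math.sqrt(mu_a / (mu_a + mu_s_red))    # Eq. 4
    print(u_transform([a], 1.0 / c ** 2))          # ≈ [0.05], i.e. mu_a / mu_s'
```

The explicit check on `C1*A^2 >= 1` enforces the positivity condition on u that is used below to bound C1.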
We now have a form where the scattering and absorption are separated albeit in log
$$z = a + b\,z_{\text{chem}} + d\lambda + e\lambda^2 \qquad (7)$$
we see that Eq. 6 is of the form given by Eq. 7 with b=1 and
then,
This is of the same form as the EMSC equation used by Thennadil and Martin:6
$$z = a + b\,z_{\text{chem}} + d\log\lambda \qquad (12)$$
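The sketch below fits either the quadratic-wavelength form (Eq. 7) or the log-wavelength form (Eq. 12) by ordinary least squares and returns the corrected spectrum $(z - a - \text{physical terms})/b$. Two details are assumptions on our part rather than specifics given in the text: the optional constituent spectrum $k$ (e.g., the gluten-minus-starch difference spectrum) is included in the fit but not subtracted, following the usual EMSC convention, and the wavelength axis is rescaled to $[-1, 1]$ for the quadratic terms purely for numerical conditioning (this spans the same polynomial space). The function name `emsc_correct` is hypothetical.

```python
import numpy as np


def emsc_correct(spectra, reference, wavelengths, basis="quadratic", constituent=None):
    """Hypothetical EMSC helper.

    Fits each spectrum z as a + b*reference + physical terms (+ h*constituent), where the
    physical terms are d*lam + e*lam**2 (Eq. 7, basis="quadratic") or d*log(lam)
    (Eq. 12, basis="log"). Returns (z - a - physical terms) / b for every spectrum.
    """
    z = np.atleast_2d(np.asarray(spectra, dtype=float))      # (n_spectra, n_wavelengths)
    ref = np.asarray(reference, dtype=float)
    lam = np.asarray(wavelengths, dtype=float)

    if basis == "quadratic":
        # Rescale to [-1, 1]; {1, t, t^2} spans the same space as {1, lam, lam^2}.
        t = (lam - 0.5 * (lam.max() + lam.min())) / (0.5 * (lam.max() - lam.min()))
        physical = [t, t ** 2]
    elif basis == "log":
        physical = [np.log(lam)]                              # requires lam > 0
    else:
        raise ValueError("basis must be 'quadratic' or 'log'")

    columns = [np.ones_like(lam), ref] + physical
    if constituent is not None:
        # Modelled so it is not mistaken for scattering, but kept in the corrected spectrum.
        columns.append(np.asarray(constituent, dtype=float))
    X = np.column_stack(columns)                              # (n_wavelengths, n_terms)

    coef, *_ = np.linalg.lstsq(X, z.T, rcond=None)            # (n_terms, n_spectra)
    a, b = coef[0], coef[1]
    n_phys = len(physical)
    phys = X[:, 2:2 + n_phys] @ coef[2:2 + n_phys]            # (n_wavelengths, n_spectra)
    return ((z.T - a - phys) / b).T                           # (n_spectra, n_wavelengths)
```

When EMSC is applied to log(u) spectra, the corrected output would then be exponentiated back to u before PLS modelling, as described in the text.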
In either case, note that after EMSC correction of the transformed spectra we get,
exponential of Eq. 13 should be used as the corrected spectra for the subsequent
One further point to note is that introducing spectral information of the pure
species, as is done in EMSC, is not straightforward when using log u, as can be seen from Eq.
11, because of the $\log\mu_a$ term. Here it is assumed that this term can be approximated
by a linear combination of the logs of the pure component spectra, i.e.,
$$\log \mu_a \approx \log \bar{x} + c_1 \log k_1 + c_2 \log k_2 + \dots \qquad (15)$$
where $\bar{x}$ is the reference or mean spectrum and $k_1, k_2, \dots$ are the spectra of pure
components. While from the derivations, the correspondence between the derived
gluten-starch data of Martens et al.5 We examine the effect of applying the transform
in conjunction with the full EMSC equation which includes both the physical and
chemical effects. Both forms of EMSC, given by Eq. 7 (referred to henceforth as
EMSCWP) and Eq. 12 (referred to henceforth as EMSCLP), were considered with and
without applying the transformation given by Eqs. 5 and 6 to the spectra. The
performance of the resulting PLS calibration models is compared with that of models
obtained using the absorbance spectra without any transformation and of models using
EMSC with only the physical wavelength-dependent terms of Eqs. 7 and 12 taken into
Although in the previous section it was deduced that the log u transform (i.e., Eq. 5 followed
by Eq. 6) will lead to spectra that satisfy the EMSC form, the analysis was also
performed with the u spectra (i.e., using only Eq. 5). This is mainly used as a check on the
combination of the log of the pure component spectra and the reference spectrum). If
the u spectra led to better PLS models, these assumptions would have to be called into
question. On the other hand, if the log(u) spectra led to better models, consistent
with the derived theory, this would indicate that these assumptions are valid
The data set of Martens et al.5 consisted of five different mixtures of gluten and starch (0,
25, 50, 75 and 100% gluten by weight) with 20 samples at each concentration; the
transmission spectra were collected with different packing (firm or loose) and
different cuvettes. For details of sample preparation and experimental design, the
reader is referred to the original paper. Samples with the same gluten/starch ratio are
considered as replicates for PLS modelling and analysis purposes. PLS with
cross-validation (leaving out one sample, i.e., its 20 replicates, at a time) was used to build the calibration
models for predicting the gluten concentration. For the pure spectral information k, the
difference between sample 1 (100% gluten) and sample 91 (100% starch) was used.
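For a binary mixture, a one-latent-variable PLS1 model and the leave-one-sample-out (all replicates of a mixture held out together) cross-validation described above can be sketched as follows. This is an illustrative single-component PLS1 written from the standard equations, not the software used in the study; `groups` labels the replicates of each mixture (here the gluten level), and the helper names are hypothetical.

```python
import numpy as np


def pls1_one_lv_fit(X, y):
    """One-latent-variable PLS1 on mean-centred X (n_samples, n_wavelengths) and y."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc
    w = w / np.linalg.norm(w)          # unit weight vector
    t = Xc @ w                         # first-LV scores
    q = (t @ yc) / (t @ t)             # y-loading
    return x_mean, y_mean, w, q


def pls1_one_lv_predict(model, X):
    x_mean, y_mean, w, q = model
    return y_mean + ((X - x_mean) @ w) * q


def rmsecv_leave_group_out(X, y, groups):
    """RMSECV where every replicate of one group (e.g. one gluten level) is held out per fold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    y_hat = np.empty_like(y)
    for g in np.unique(groups):
        test = groups == g
        model = pls1_one_lv_fit(X[~test], y[~test])
        y_hat[test] = pls1_one_lv_predict(model, X[test])
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))
```

For the gluten-starch data, `X` would hold the (pre-processed) spectra of the 100 samples, `y` the gluten percentages, and `groups` could simply be `y`, since the 20 replicates of each mixture share one gluten level.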
The reference spectrum was computed as the mean of the transformed spectra (u
or log u, depending on the transformation used) of all 100 samples when applying
EMSCLP without transforming the data, the reference was computed in the same way
as Martens et al. That is, the reference was taken as the average of the pure gluten and
starch spectra. When EMSCW and EMSCL were used, the reference was taken to be
Optimum value of C1
In order to transform the spectra using Eq. 5, the value of C1 has to be determined.
One way would be to choose the value of C1 along with the number of latent variables
(LV) using cross-validation. Since this dataset consists of spectra of a binary mixture,
if the chemical and light scattering variations were completely separated by the
transform, only one LV would be required to model the data. Therefore, in this study
the optimum value of C1 was chosen as that which leads to the lowest root mean
that the value of u(λ) should be positive for it to be physically realistic (since it is the
ratio of two positive quantities), from Eq. 5 it is seen that C1 should satisfy
$$1 - C_1 A(\lambda)^2 > 0 \qquad (16)$$
or
$$C_1 < \frac{1}{A(\lambda)^2} \qquad (17)$$
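Because C1 is a single constant here, Eq. 17 has to hold at the largest absorbance in the data set. A small sketch of that bound, and of a grid of admissible C1 values that could then be scanned by RMSECV, assuming the absorbance values are held in an array of any shape (hypothetical helper names; the 0.99 safety margin and grid size are arbitrary choices):

```python
import numpy as np


def c1_upper_bound(absorbance):
    """Eq. 17 with one C1 for all samples and wavelengths: C1 < 1 / max(A^2)."""
    a = np.asarray(absorbance, dtype=float)
    return 1.0 / float(np.max(a ** 2))


def c1_candidates(absorbance, n=50, margin=0.99):
    """n evenly spaced C1 values in (0, margin * bound], all satisfying Eq. 16."""
    upper = margin * c1_upper_bound(absorbance)
    return np.linspace(upper / n, upper, n)
```

As illustrative arithmetic only, a largest absorbance of 3.13 would give a bound of 1/3.13² ≈ 0.102.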
From this inequality, it is seen that C1 must be less than the right-hand
side evaluated at the maximum absorbance value in the dataset, since in this study
C1 is taken to be constant across all wavelengths and samples. For the
gluten-starch dataset, it is found that C1 < 0.102. Thus the optimum value has to lie in the
range 0 < C1,opt < 0.102. It can be expected that the value of C1,opt will depend on the
EMSC method used and also on how the extracted u is used subsequently, i.e.,
whether the EMSC step is applied to u or log(u). Figure 1(a) shows the RMSECV for
log(u). For these cases, the reference spectrum was taken to be the mean of log(u) of
all the 100 samples. It is seen that the RMSECV curve exhibits a clear minimum for
both methods with C1,opt = 0.066 for EMSCWP and 0.072 for EMSCLP. It can be
seen that EMSCWP leads to lower error than EMSCLP for the range of C1
considered. However, when the optimal value for C1 is chosen, the difference in the
Figure 1(b) shows the RMSECV curves for the two methods when u is used. For
these cases, the reference spectrum was taken to be the mean of u of all the 100
samples. Here, the optimal values for C1 are 0.088 and 0.064 for EMSCWP and
although it is evident from the figures that the minimum values of RMSECV
Figure 2(a) shows the RMSECV of a 1 LV PLS model as a function of C1 when
EMSCW and EMSCL, i.e., EMSC with only the physical effects taken into
consideration, are applied to log(u). The RMSECV profiles do not exhibit a clear
there is a slight monotonic decrease with decreasing C1. EMSCL produces lower
RMSECV values than EMSCW regardless of the values of C1 chosen for the two
methods. Since the RMSECV curve is flat for EMSCW,
the optimal value was chosen as 0.05. For EMSCL, a value of 0.07 was chosen since,
as seen from the figure, the greatest rate of change in the RMSECV occurs in
the region from 0.1 to 0.07, after which the RMSECV decreases much more slowly as C1 is
reduced. Similar behaviour is exhibited in Figure 2(b), which shows the RMSECV vs.
C1 when the u spectra are used. The subsequent analysis uses these values of C1 for the
Figure 3 shows the 1 LV PLS model performance when the absorbance, i.e., log(1/T),
spectra are used without pre-processing or transformation. The
absorbance versus wavelength for the 100 samples is shown in figure 3(a), and figure
3(b) shows the same spectra plotted against the reference spectrum. It can be seen that
there is significant overlap between samples with different concentrations due to the high
"within sample" variation, i.e., the variation in the spectra of samples with the same
gluten concentration. Figure 3(c) shows the gluten concentration plotted against the
scores of the first LV. The large variance in the scores for samples with the same
concentration is evident, and this leads to large errors when a 1 LV model is used as
Comparing figure 4(a) and (b) with figure 3(a) and (b), it is seen that the application
of EMSCW, i.e., EMSC with only the physical (quadratic, wavelength-dependent)
effects taken into account, significantly reduces the "within sample" variance in the
reflected in the plot of gluten concentration against the scores of the first latent
variable, figure 4(c), where the “within sample” variance is greatly reduced. However,
The results of applying EMSCWP, i.e., including pure component information in
the correction scheme, to the log(1/T) spectra are illustrated in Figure 5. The corrected
absorbance spectra shown in figures 5(a) and 5(b) exhibit a greater degree of separation
gluten versus scores of the first latent variable as seen from Figure 5(c) and (d).
The results of applying EMSCWP after transforming the spectra using Eq. 5, i.e., converting the absorbance
to u, are shown in Figure 6. Comparing with figure 5, we see that the "within sample"
variations (i.e., variations in samples with the same gluten concentration) in the
spectra. The residual plot still exhibits a slight curvature (figure 6(f)).
When the spectra are transformed using Eq. 5, EMSCWP is applied to the log(u)
spectra (i.e., Eq. 6) and a PLS model is built after transforming back to u, a linear
relationship between the gluten concentration and the scores of the first latent variable
is evident from figures 7(e) and (f). Thus, using EMSCWP on log(u) instead of on
between gluten concentration and the scores of the first latent variable. This result is
consistent with the logic presented in the derivation of the transform in section 2.
presented here includes the reduction of “within sample” variation through the EMSC
step and the linearization of the response curve of gluten concentration as a function
of the first LV by the transformation step. The above analysis was repeated with
EMSCL and EMSCLP, i.e., using the log-wavelength relationship to model the physical
(light scattering) effects as given by Eq. 12. The effects on the spectra and first-LV
scores were visually similar to the corresponding EMSCW and EMSCWP plots and
therefore are not shown here. However, they do differ in the magnitude of RMSECV
as will be seen in section 3(c) below where the discussion centres on the RMSECV
The RMSECV curves for PLS models where EMSCW is used with absorbance
spectra (A), u and log(u) are shown in figure 8(a) along with the curve using raw
absorbance spectra without the application of EMSCW. It is seen that there is very
little difference between the various methods when considering a 1 LV model. The "best
model", i.e., the one with the lowest RMSECV, is obtained when using the log(u)
obtained by including the pure component information in the EMSC scheme, the
various combinations of Transform-EMSCWP methods were compared to this "best
EMSCW model” in figure 8(b). It is seen that when a 1 LV model is used, the
RMSECV. The lowest RMS errors for a 1 LV model are obtained when EMSCWP is
used on the log(u) spectra. It is also interesting to note that when this transform is
used, the minimum RMSECV is obtained with a model using only one latent variable.
In fact, it outperforms the other methods even when they include more than 1 LV in
their models.
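The RMSECV-versus-number-of-LVs comparisons discussed here can be produced by repeating the leave-one-sample-out cross-validation for increasing numbers of latent variables. The sketch below uses a textbook NIPALS PLS1 with the regression vector $W(P^\top W)^{-1}q$; it is our illustration, not the authors' code, the helper names are hypothetical, and `max_lv` must not exceed the rank of the centred training spectra.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """NIPALS PLS1; returns (x_mean, y_mean, beta) so that y_hat = y_mean + (x - x_mean) @ beta."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w = w / np.linalg.norm(w)       # weights
        t = E @ w                       # scores
        tt = t @ t
        p = E.T @ t / tt                # x-loadings
        q = (f @ t) / tt                # y-loading
        E = E - np.outer(t, p)          # deflate X
        f = f - q * t                   # deflate y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    beta = W @ np.linalg.solve(P.T @ W, Q)
    return x_mean, y_mean, beta


def rmsecv_curve(X, y, groups, max_lv):
    """RMSECV for 1..max_lv latent variables, holding out one group per fold."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    curve = []
    for n_lv in range(1, max_lv + 1):
        y_hat = np.empty_like(y)
        for g in np.unique(groups):
            test = groups == g
            x_mean, y_mean, beta = pls1_fit(X[~test], y[~test], n_lv)
            y_hat[test] = y_mean + (X[test] - x_mean) @ beta
        curve.append(float(np.sqrt(np.mean((y - y_hat) ** 2))))
    return curve
```

Entry `curve[n - 1]` is the RMSECV of an n-LV model, so a pre-processing scheme that reaches its minimum at `curve[0]` corresponds to the one-latent-variable behaviour reported above.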
The analysis was repeated using the EMSCL and EMSCLP variants of the EMSC
technique. When EMSCL is used (Figure 9a), there is an appreciable decrease in error
for a 1 LV model compared to using the raw absorbance spectra without
applying the technique. There are slight differences when the transform is applied in
conjunction with EMSCL. Further, as in the case of EMSCW, the "best model" is
obtained using EMSCL with the log(u) spectra. Again, when EMSCLP is used, the
best 1 LV model is obtained when the log(u) spectra are used, though there is very little
Figure 10 shows the results from the "best models" taken from figures 8 and 9. All these
models use the log(u) spectra followed by the EMSC step. It is seen that for a 1 LV
model, EMSCL gives a lower error than EMSCW. However, when the pure
Conclusions
This study indicates that the proposed two-step Transform-EMSC method could lead
log(1/T) spectra used in this study is a very simple form with a parameter C1 which
was assumed to be independent of wavelength (or the optical properties). In the case
satisfactory. The methodology described here can, in principle, be used with more
sophisticated models of light propagation in place of Eq. 1 for the transformation step.
It was found that, in terms of reducing the sample-to-sample physical variations, the
application of EMSC with pure component information taken into account seems to
have the largest influence. The transform step reduces the non-linearities present in
the data and allows the relevant information to be well represented by a linear 1 LV
model.
Two forms of representing the wavelength dependent physical variation in the EMSC
form of wavelength dependence. Analysis suggests that the latter form performs better
when only physical effects are taken into account in the EMSC step. When pure
List of Figures
transformed to u.
transformed to u.
Figure 8. RMSECV curves for (a) EMSCW and (b) EMSCWP when A, u, and log u
are used.
Figure 1. RMSECV as a function of C1 for a 1 LV PLS model using EMSCWP and
EMSCLP. (a) With spectra transformed to log(u); (b) With spectra transformed to u.
Figure 2. RMSECV as a function of C1 for a 1 LV PLS model using EMSCW and
EMSCL. (a) With spectra transformed to log(u); (b) With spectra transformed to u.
Figure 3. Results when absorbance spectra without pre-processing are used.
Figure 4. Results for EMSCW applied to absorbance spectra.
Figure 5. Results for EMSCWP applied to absorbance spectra.
Figure 6. Results of EMSCWP applied to u.
Figure 7. Results for EMSCWP applied to log(u).
Figure 8. RMSECV curves for (a) EMSCW and (b) EMSCWP when A, u, and log u
are used.
Figure 9. Results for EMSCL and EMSCLP.
Figure 10. Comparison between the “best” Transform-EMSC models.
References
1. P. Geladi, D. MacDougall, and H. Martens, Appl. Spectrosc. 39, 491 (1985).
2. T. Isaksson and B. Kowalski, Appl. Spectrosc. 47, 702 (1993).
3. R. J. Barnes, M. S. Dhanoa, and S. J. Lister, Appl. Spectrosc. 43, 772 (1989).
4. M. Blanco, J. Coello, I. Montoliu, and M. A. Romero, Anal. Chim. Acta 434, 125 (2001).
5. H. Martens, J. P. Nielsen, and S. B. Engelsen, Anal. Chem. 75, 394 (2003).
6. S. N. Thennadil and E. B. Martin, "Empirical pre-processing methods and their
7. A. Ishimaru, Wave Propagation and Scattering in Random Media (IEEE Press, New York, 1997).
8. T. L. Troy and S. N. Thennadil, J. Biomed. Opt. 6, 167 (2001).
9. S. L. Jacques, Biomedical optical instrumentation and laser-assisted biotechnology, Academic Publishers, Nov 10-22, 1995), eds. A.M. Verga and Scheggi.