discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/303687650
4 authors, including:
Russell Jones
Abt Environmental Research, Boulder, Colora…
All content following this page was uploaded by Pierre Goovaerts on 07 June 2016.
Research article
Article info

Article history:
Received 9 February 2016
Received in revised form 13 April 2016
Accepted 17 May 2016

Keywords:
Marsh
Geographically-weighted regression
Cokriging
Indicator kriging
False positives

Abstract

Stranded oil covering soil and plant stems in fragile Louisiana marshes was one of the most visible impacts of the 2010 Deepwater Horizon (DWH) oil spill. As part of the assessment of marsh injury after the DWH spill, plant stem oiling was broken into five categories (0%, 0–10%, 10–50%, 50–90%, 90–100%) and used as the independent variable for estimating death of vegetation, accelerated erosion, and other metrics of injury. The length of shoreline falling into each of these stem oiling categories was therefore a key measure of the total extent of marsh injury, and its accurate estimation is the focus of this paper. First, we used geographically-weighted logistic regression (GWR) to explore and model spatially varying relationships between stem oiling field data and secondary information (oiling exposure category) collected during shoreline surveys. We then combined GWR probability estimates with field data using indicator cokriging to predict the probability of exceeding four stem oiling thresholds (0, 10, 50, and 90%) at 50 m intervals along the Louisiana shoreline. Cross-validation using Receiver Operating Characteristic (ROC) curves demonstrates the greater prediction accuracy of the multivariate geostatistical approach relative to either aspatial regression or indicator kriging that ignores secondary information.

© 2016 Published by Elsevier Ltd.
http://dx.doi.org/10.1016/j.jenvman.2016.05.041
0301-4797/© 2016 Published by Elsevier Ltd.
P. Goovaerts et al. / Journal of Environmental Management 180 (2016) 264–271
category but nonzero stem oiling observations within this category were generally clustered in space.

An alternative is to use a geostatistical approach that can account for the geographical location of the environmental data and their spatial correlation, as modeled by variograms (e.g., Goovaerts et al., 2008; Kitsiou and Karydis, 2011). The application of geostatistics to this particular dataset, however, presented several challenges. First, field data were collected with different spatial resolutions and degrees of reliability. Some data represent precise measurements of percentage of stem oiling at specific locations (“hard data”). At other specific locations, we only know whether the vegetation was oiled or not; we do not know the percentage of stem oiling (“soft data”). Finally, although the shoreline exposure database provides a more comprehensive spatial coverage, the oiling descriptors in this dataset are more qualitative with respect to stem oiling (“secondary information”).

The second major challenge arises from the complex geometry of the site. Louisiana has a deeply dissected and crenulated marsh coastline, and oil was transported into this region via the bays and inlets that dissect it. As a result, spatial correlation of stem oiling may be more directly related to over-water distance than to straight-line (Euclidean) distance (Barabás et al., 2001; Money et al., 2009). Third, the heterogeneity of the shoreline and the fact that surveys were conducted by different teams at different times likely have a local impact on the relationship between secondary data (oiling exposure category) and percentage of stem oiling measured in the field (i.e. non-stationary relationships). Fourth, the sheer size of the datasets analyzed (e.g. 1100 field data and 118,151 shoreline grid nodes to predict) precluded the use of Bayesian methodologies based on traditional MCMC (Markov chain Monte Carlo) schemes (Gelfand et al., 2003), while more powerful approaches, e.g. the INLA (integrated nested Laplace approximations) methodology, rely heavily on numerical methods and computer programming that are beyond the scope of this study (Martins et al., 2013).

This paper describes the procedure developed to estimate the expected lengths of mainland herbaceous shoreline in Louisiana falling into four stem oiling categories: 0–10%, 10–50%, 50–90%, and 90–100%. The varying reliability of the different pieces of information was integrated using a soft and hard indicator coding of the data (Goovaerts, 1997; Hu et al., 2005), whereas geographically-weighted logistic regression (Fotheringham et al., 2002; Goovaerts et al., 2015; van Donkelaar et al., 2015) was used to explore and model spatially varying relationships between stem oiling field data and secondary information collected during shoreline surveys. Probabilities of exceeding stem oiling thresholds estimated from field measurement and survey data were combined using indicator cokriging (Goovaerts and Journel, 1995). Sensitivity analysis and cross-validation helped guide the choice of optimal sets of parameters and investigate the impact of search strategy and distance metrics on prediction accuracy.

2. Materials and methods

2.1. Datasets

Detailed measurements of stem oiling were collected at 911 discrete points within mainland herbaceous marshes of coastal Louisiana (Fig. 1). These marshes are located along the edges of saline to brackish estuaries and bays throughout Louisiana, and are dominated by the marsh vegetation Spartina alterniflora. Stem oiling measurements were collected in the late summer and early fall of 2010, as part of a study referred to as the Marsh Pre-Assessment Study (Deepwater Horizon NRDA, 2010a). At each of these 911 locations, field data were recorded as the percent of stem height oiled, and these raw measurements were then condensed into one of the five stem oiling categories described above (0%, 0–10%, 10–50%, 50–90%, or 90–100%). The marsh pre-assessment dataset also includes 185 additional sites where stem oiling was simply categorized as “oiled” or “not oiled.” These 185 “soft” data provide additional information on the spatial distribution of oiled plant stems (see Fig. 1).

Between 2010 and 2013, spatially continuous descriptions of oiling were also collected along the Louisiana shoreline as part of the shoreline cleanup and assessment technique (SCAT) program (Michel et al., 2013; NOAA, 2013). These observations were collected primarily to inform response activities, and summarized oiling along the shoreline using qualitative, categorical descriptors. As a supplement to the SCAT data, spatially continuous, qualitative descriptions of shoreline oiling were also collected for the NRDA as part of the Rapid Assessment program (Deepwater Horizon NRDA, 2010b). Although both of these data sources provide spatially continuous coverage, they did not include detailed measurements of stem oiling.

During the injury assessment phase of the NRDA, the SCAT and Rapid Assessment datasets were combined into a single database referred to as the shoreline exposure database (Nixon, 2015), displayed in Fig. 2. Oiling exposure in this dataset is classified into one of the following four categories: NOO, lighter oiling, heavier oiling, and heavier persistent oiling. For the purposes of our study, the categorical descriptors of shoreline oiling within this dataset are referred to as “secondary information” on stem oiling. Of the 911 locations with hard data on stem oiling, 729 were co-located with secondary information from the shoreline exposure database (Fig. 3).

The geospatial analysis was thus based on four main types of data:

(1) Measurement of the percentage of plant stem oiling from the pre-assessment dataset (911 “hard” data)
(2) Indicators of presence/absence of plant stem oiling from the pre-assessment dataset (185 “soft” data)
(3) Oiling exposure category (“secondary information”) surveyed along approximately 1600 km of mainland herbaceous marsh coastline and at 729 of the 911 hard data locations.
(4) Length of shoreline located within the 118,151 squares of 50 × 50 m discretizing the Louisiana coastline.

2.2. Methodology

The analysis was conducted using the following software: 1) SpaceStat 4.0 (Jacquez et al., 2014) for geographically-weighted regression and variogram modeling, 2) SAS 9.3 (SAS Institute Inc., 2011) for aspatial logistic regression and the creation of ROC curves, 3) SGeMS (Remy et al., 2008) and GSLIB (Deutsch and Journel, 1998) for cross-variogram modeling and indicator cokriging, and 4) code written by Dr. Goovaerts for data manipulation and computation of expected lengths of shoreline in different categories of plant stem oiling. The flowchart in Fig. 4 illustrates the main steps in the analysis, as described below.

2.2.1. Indicator coding of plant stem oiling data

The analysis started with the coding of each percentage of stem oiling data into a vector of indicators of exceedance of four thresholds z_c = 0, 10, 50, and 90%. Let u_α = (x_α, y_α) be a vector of UTM coordinates representing the geographical location of a stem oiling data point, denoted z(u_α) for hard data and s(u_α) for soft data. The set of four indicators at any hard data location u_α was then constructed as:
Fig. 1. Geographical location of hard (-) and soft (+) data on percentage of plant stem oiling that were used in the geospatial analysis. Red color denotes locations where oiling was observed while blue symbols correspond to zero percentage of plant stem oiling. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 2. Continuous shoreline exposure data (secondary information) compiled on mainland herbaceous marsh shorelines. This linear dataset was discretized into 118,151
computational nodes on a regular, 50-m grid.
Fig. 3. Oiling exposure category recorded at 729 pre-assessment sites, superimposed on shoreline exposure data shown in Fig. 2.
Fig. 4. Flowchart describing the different steps of the geospatial analysis for the computation of expected length of shoreline falling into specific classes of percentage of plant stem
oiling. Grey boxes represent the input data, light blue boxes correspond to outputs, and green boxes denote key (geo)statistical methods. (For interpretation of the references to
colour in this figure legend, the reader is referred to the web version of this article.)
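
The indicator coding introduced in Section 2.2.1 can be illustrated with a minimal Python sketch. It assumes the conventional definition of an exceedance indicator (1 when the measured percentage is strictly above the threshold, 0 otherwise) and is not a reproduction of the paper's Eqs. (1) and (2). The function names and the use of NaN for unknown soft indicators are hypothetical choices made for illustration only.

```python
import math

# Thresholds z_c (percent of stem height oiled) quoted in the text.
THRESHOLDS = (0.0, 10.0, 50.0, 90.0)


def hard_indicators(z_percent):
    """Exceedance indicators for a measured stem oiling percentage.

    Assumption: strict exceedance, i.e. the indicator is 1 if z > z_c.
    """
    return [1.0 if z_percent > zc else 0.0 for zc in THRESHOLDS]


def soft_indicators(oiled):
    """Indicators for a soft datum, where only "oiled"/"not oiled" is known.

    A site that is not oiled exceeds no threshold. An oiled site exceeds
    the 0% threshold, but the higher thresholds are unknown; they are
    marked NaN here (an illustrative convention, not the paper's).
    """
    if not oiled:
        return [0.0, 0.0, 0.0, 0.0]
    return [1.0, math.nan, math.nan, math.nan]


if __name__ == "__main__":
    print(hard_indicators(35.0))   # a measurement falling in the 10-50% class
    print(soft_indicators(True))   # oiled, percentage unknown
```

Keeping the unknown soft indicators explicit, rather than guessing a value, makes it clear which thresholds a soft datum can inform directly.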
shoreline lengths within each 50 × 50 m square and the probability that oiling is above z_c within that square. Only locations in the vicinity of a PA site were included in the computation to avoid extrapolating results to sparsely sampled segments of shoreline. The distances of 1 and 2 km correspond to the size of the cokriging search windows.

3. Results and discussion

3.1. Data configuration and summary statistics

Fig. 1 shows where hard and soft data on percentage of plant stem oiling were collected at pre-assessment (PA) sites. Summary statistics indicate that stem oiling was observed at 39.6% of hard data sites and only 6% of soft data sites (Table 1). Similarly, the NOO exposure category is more prevalent at soft data sites (81.4%) relative to hard data sites (47.9%); see Table 2. This table also highlights the fact that secondary information on oil exposure was not recorded at a number of sites: 20% (182) of the hard data sites and 36.2% (67) of the soft data sites did not have oil exposure data. In other words, the PA survey data extend beyond the boundaries of the shoreline exposure dataset.

The oiling exposure category surveyed along the coastline, which was discretized using a 50 m spacing grid, is mapped in Fig. 2. The entire grid includes 118,151 nodes under mainland herbaceous marsh, and their distribution among the four categories of oiling exposure is listed in the final column of Table 2. The percentage of observations in the NOO category is much larger in the shoreline exposure dataset compared to the hard dataset: 85% of the shoreline exposure data are NOO, compared to 47.9% of the PA points, which reflects the preferential sampling of oiled locations during the PA survey.

3.2. Continuous versus categorical secondary data

Table S3 (supplementary material) summarizes the results for the aspatial logistic regression between the oiling exposure categories (4 levels) and the indicator of exceedance of one of the four thresholds z_c for the percentage of plant stem oiling. Incorporating the exposure categories as a categorical versus continuous variable has a moderate impact on the estimated probabilities of exceedance, yet the AUC statistic indicates that the predictive power of the two types of model is identical. This result justifies the use of a continuous exposure variable in geographically-weighted regression (GWR), which allowed using smaller search windows.

3.3. Aspatial versus geographically-weighted regression

A sensitivity analysis (described in Note S3, supplementary material) was conducted to support the choice of a search strategy for GWR and the use of a probability threshold T for merging aspatial and GWR estimates. The best predictive power, as measured by the AUC statistic, was achieved when using the 50 closest observations and T = 0.0075. The results listed in Table 3 (top 3 rows) demonstrate the greater accuracy of merged probabilities relative to the separate application of aspatial regression or GWR.

3.4. Cross-validation analysis

The stem oiling probabilities derived by merging aspatial and GWR estimates are referred to as “prior” probabilities, as they are based solely on the oiling exposure category recorded at each grid node. Cokriging was applied to update these “prior” probabilities using the additional information provided by indicator coding of hard and soft data (Eqs. (1) and (2)). For each threshold, direct and cross indicator variograms (Goovaerts, 1997; Goovaerts and Journel, 1995) were computed and modeled using a combination of one exponential variogram model with a range of 600 m, and another exponential model with a range of 15 km for thresholds z_c = 0 and 10%, 10 km for threshold z_c = 50%, and 5 km for threshold z_c = 90%. The benefit of this updating was assessed using a cross-validation similar to the one conducted during the sensitivity analysis. Table 3 indicates that indicator cokriging increases the accuracy of the prediction compared to aspatial regression and to indicator kriging, which uses only PA data. However, indicator cokriging fails to improve over GWR results. Because of the geographical clustering of vegetation oiling data and the fact that the soft data could not be cross-validated, cokriging is still expected to benefit the prediction and was retained as the interpolation technique of choice.

3.5. Euclidean distance versus over-water distance

For both geographically-weighted regression and cokriging, the selection of neighbors and computation of spatial weight functions were based on Euclidean distances. Due to the complex shape of the Louisiana marsh edge, and the mechanisms by which oil was transported over water during the spill, we might expect over-water distance to be a better metric of spatial correlation than Euclidean distance. However, translating from Euclidean to over-water distance for all of the 118,151 nodes in the domain was prohibitively expensive computationally.

To explore the sensitivity of the results to the use of Euclidean distances, we instead performed a coordinate transformation for the set of 729 locations where both the percentage of stem oiling and oiling exposure categories were recorded, following the procedure described by Løland and Høst (2003). This approach, which

Table 1
Distribution of hard and soft data between the five classes of percentage of plant stem oiling. Numbers in parentheses are percentages of the total number of non-missing data. Soft data record only whether stems were oiled, so a single count is reported for the four nonzero classes combined.

% Stem oiling    Hard data (n = 911)    Soft data (n = 185)
0%               551 (60.4%)            174 (94.0%)
0–10%            59 (6.5%)              11 (6.0%), nonzero classes combined
10–50%           169 (18.6%)
50–90%           80 (8.8%)
>90%             52 (5.7%)
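
The expected-length computation mentioned before Section 3, which combines the shoreline length within each grid square with the probability that oiling exceeds each threshold z_c, can be illustrated with a minimal Python sketch. The weighting used here (length multiplied by the difference between successive exceedance probabilities) is an assumption on our part, since the formula itself is not shown in this text. The function name, the input layout, and the clipping of negative differences are hypothetical choices for illustration.

```python
CLASS_LABELS = ("0-10%", "10-50%", "50-90%", "90-100%")


def expected_lengths(nodes):
    """Sum expected shoreline length (same unit as the input) per oiling class.

    nodes: iterable of (length, probs) pairs, where probs holds the four
    exceedance probabilities P(Z > 0), P(Z > 10), P(Z > 50), P(Z > 90).

    Assumption: the probability of a class is the difference between the
    exceedance probabilities at its lower and upper bounds, with
    P(Z > 100) = 0. Negative differences, which can occur when separately
    estimated probabilities are not monotone, are clipped to zero.
    """
    totals = {label: 0.0 for label in CLASS_LABELS}
    for length, probs in nodes:
        upper = list(probs[1:]) + [0.0]
        for label, p_low, p_high in zip(CLASS_LABELS, probs, upper):
            totals[label] += length * max(p_low - p_high, 0.0)
    return totals


if __name__ == "__main__":
    # Two made-up grid squares: 50 m and 30 m of shoreline.
    demo = [(50.0, [0.8, 0.5, 0.2, 0.1]), (30.0, [0.0, 0.0, 0.0, 0.0])]
    for label, total in expected_lengths(demo).items():
        print(label, round(total, 3))
```

Restricting the input to nodes within a chosen distance of a PA site, as the text describes for the 1 and 2 km search windows, would simply mean filtering `nodes` before the call.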
Table 2
Distribution of hard and soft data between the four categories of oiling exposure. The last column reports the number of shoreline nodes within each oiling exposure category. Numbers in parentheses are percentages of the total number of data points (excluding missing data).

Oiling exposure category    Hard data (n = 911)    Soft data (n = 185)    Shoreline (n = 118,151)

0–10%     305     199
10–50%    730     628
50–90%    279     225
>90%      156     109
Total     1470    1161

We are grateful to two anonymous reviewers for their constructive comments on this manuscript, and to Andy Finley for comments on an earlier version of the paper. This study was conducted within the Deepwater Horizon NRDA investigation and was funded by the State of Louisiana.
Appendix A. Supplementary data

Supplementary data related to this article can be found at http://dx.doi.org/10.1016/j.jenvman.2016.05.041.

References

Anastas, P.T., Sonich-Mullin, C., Fried, B., 2010. Designing science in a crisis: the Deepwater Horizon oil spill. Environ. Sci. Technol. 44, 9250–9251.
Barabás, N., Goovaerts, P., Adriaens, P., 2001. Geostatistical assessment and validation of uncertainty for three-dimensional dioxin data from sediments in an estuarine river. Environ. Sci. Technol. 35 (16), 3294–3301.
Deepwater Horizon NRDA, 2010a. Deepwater Horizon/MC252/BP Shoreline/Vegetation NRDA Pre-assessment Data Collection Plan. July 12, 2010. Available. https://pub-dwhdatadiver.orr.noaa.gov/dwh-ar-documents/900/DWH-AR0013202.pdf.
Deepwater Horizon NRDA, 2010b. Deepwater Horizon/MC252/BP Shoreline Vegetation Rapid Oiling Survey NRDA Pre-assessment Data Collection Plan. December 15, 2010. Available. https://pub-dwhdatadiver.orr.noaa.gov/dwh-ar-documents/900/DWH-AR0013221.pdf.
Deutsch, C.V., Journel, A.G., 1998. GSLIB: Geostatistical Software Library and User Guide. Oxford University Press, New York.
Fotheringham, A.S., Brunsdon, C., Charlton, M., 2002. Geographically Weighted Regression: the Analysis of Spatially Varying Relationships. John Wiley & Sons, Chichester.
Gelfand, A.E., Kim, H.-J., Sirmans, C.F., Banerjee, S., 2003. Spatial modeling with spatially varying coefficient processes. J. Am. Stat. Assoc. 98, 387–396.
Goovaerts, P., 1997. Geostatistics for Natural Resources Evaluation. Oxford University Press, New York.
Goovaerts, P., Glass, G., 2014. Geostatistical modeling of the spatial distribution of surface soil arsenic around a smelter. J. Jpn. Soc. Soil Phys. 128, 5–10.
Goovaerts, P., Journel, A.G., 1995. Integrating soil map information in modelling the spatial variation of continuous soil properties. Eur. J. Soil Sci. 46 (3), 397–414.
Goovaerts, P., Trinh, H.T., Demond, A.H., Franzblau, A., Garabrant, D., Gillespie, B., Lepkowski, J., Adriaens, P., 2008. Geostatistical modeling of the spatial distribution of soil dioxin in the vicinity of an incinerator. 1. Theory and application to Midland, Michigan. Environ. Sci. Technol. 42 (10), 3648–3654.
Goovaerts, P., Schofield, J., Telech, J., 2009. Geostatistical estimation of contaminated sediment volumes: review of common challenges and solution. In: Proceedings of StatGIS 2009, Milos, Greece.
Goovaerts, P., Xiao, H., Adunlin, G., Ali, A., Tan, F., Gwede, C.K., Huang, Y., 2015. Geographically-weighted regression analysis of percentage of late-stage prostate cancer diagnosis in Florida. Appl. Geogr. 62, 191–200.
Hester, M.W., Willis, J.M., Rouhani, S., Steinhoff, M., Baker, M., 2015. Impacts of the Deepwater Horizon Oil Spill on the Salt Marsh Vegetation of Louisiana. Available. https://pub-dwhdatadiver.orr.noaa.gov/dwh-ar-documents/901/DWH-AR0270701.pdf.
Hu, K., Huang, Y., Li, H., Li, B., Chen, D., White, R.E., 2005. Spatial variability of shallow groundwater level, electrical conductivity and nitrate concentration, and risk assessment of nitrate contamination in North China Plain. Environ. Int. 31, 896–903.
Jacquez, G.M., Goovaerts, P., Kaufmann, A., Rommel, R., 2014. SpaceStat 4.0 User Manual: Software for the Space-time Analysis of Dynamic Complex Systems, 4th ed. BioMedware, Ann Arbor. https://www.biomedware.com/files/SpaceStat_4.0_Documentation.pdf.
Kitsiou, D., Karydis, M., 2011. Coastal marine eutrophication assessment: a review on data analysis. Environ. Int. 37, 778–801.
Løland, A., Høst, G., 2003. Spatial covariance modelling in a complex coastal domain by multidimensional scaling. Environmetrics 14 (3), 307–321.
Martins, T.G., Simpson, D., Lindgren, F., Rue, H., 2013. Bayesian computing with INLA: new features. Comput. Stat. Data Anal. 67, 68–83.
Michel, J., Owens, E.H., Zengel, S., Graham, A., Nixon, Z., et al., 2013. Extent and degree of shoreline oiling: Deepwater Horizon oil spill, Gulf of Mexico. PLoS One 8, e65087. doi:10.1371/journal.pone.0065087.
Money, E.S., Carter, G.P., Serre, M.L., 2009. Modern space/time geostatistics using river distances: data integration of turbidity and E. coli measurements to assess fecal contamination along the Raritan River in New Jersey. Environ. Sci. Technol. 43 (10), 3736–3742.
Nixon, Z., 2015. Deepwater Horizon Shoreline Oil Exposure Mapping and Database. Technical Report. Prepared for National Oceanic and Atmospheric Administration. Available. https://pub-dwhdatadiver.orr.noaa.gov/dwh-ar-documents/901/DWH-AR0270684.pdf.
NOAA, 2013. Shoreline Assessment Manual, fourth ed. U.S. Dept. of Commerce, Seattle, WA. Emergency Response Division, Office of Response and Restoration, National Oceanic and Atmospheric Administration. 73 pp + appendices. Available. http://response.restoration.noaa.gov/sites/default/files/manual_shore_assess_aug2013.pdf.
Peterson, C.H., Anderson, S.S., Cherr, G.N., Ambrose, R.F., Anghera, S., et al., 2012. A tale of two spills: novel science and policy implications of an emerging new oil spill model. Bioscience 62, 461–469. http://dx.doi.org/10.1525/bio.2012.62.5.7.
Remy, N., Boucher, A., Wu, J., 2008. Applied Geostatistics with SGeMS: a User’s Guide. Cambridge University Press, Cambridge.
SAS Institute Inc, 2011. SAS/STAT 9.3 User’s Guide. SAS Institute Inc, Cary, NC.
Silliman, B., He, Q., Dixon, P., Wobus, C., Willis, J., Hester, M., 2015. Accelerated marsh loss following the BP Deepwater Horizon oil spill: a region wide survey. DWH NRDA shoreline technical working group report. Prepared for the Louisiana Coastal Protection and Restoration Authority.
Swets, J.A., 1988. Measuring the accuracy of diagnostic systems. Science 240, 1285–1293.
van Donkelaar, A., Martin, R.V., Spurr, R.J.D., Burnett, R.T., 2015. High-resolution satellite-derived PM2.5 from optimal estimation and geographically weighted regression over North America. Environ. Sci. Technol. 49 (17), 10482–10491. http://dx.doi.org/10.1021/acs.est.5b02076.