
Accepted Manuscript
Research papers
Simulation-Optimization Framework for Multi-Site Multi-Season Hybrid Stochastic Streamflow Modeling
Roshan Srivastav, K. Srinivasan, K.P. Sudheer
PII: S0022-1694(16)30579-0
DOI: http://dx.doi.org/10.1016/j.jhydrol.2016.09.025
Reference: HYDROL 21521
To appear in: Journal of Hydrology
Received Date: 23 January 2016

Revised Date: 12 August 2016
Accepted Date: 8 September 2016
Please cite this article as: Srivastav, R., Srinivasan, K., Sudheer, K.P., Simulation-Optimization Framework for
Multi-Site Multi-Season Hybrid Stochastic Streamflow Modeling, Journal of Hydrology (2016), doi:
http://dx.doi.org/10.1016/j.jhydrol.2016.09.025
Roshan Srivastav (a, b), K. Srinivasan (c), K.P. Sudheer (c)
(a) Associate Professor, School of Civil and Chemical Engineering, VIT University, Vellore, Tamilnadu, India 632014
(b) Former PhD Scholar, EWRE Division, Dept. of Civil Engineering, IIT Madras, Chennai, Tamilnadu, India 600036
(c) Professor, EWRE Division, Dept. of Civil Engineering, IIT Madras, Chennai, Tamilnadu, India 600036
Abstract
A simulation-optimization (S-O) framework is developed for the hybrid stochastic modeling of
multi-site multi-season streamflows. The multi-objective optimization model formulated is the
driver and the multi-site, multi-season hybrid matched block bootstrap model (MHMABB) is the
simulation engine within this framework. The multi-site multi-season simulation model is the
extension of the existing single-site multi-season simulation model. A robust and efficient
evolutionary search based technique, namely, the non-dominated sorting genetic algorithm
(NSGA-II), is employed as the solution technique for the multi-objective optimization within the
S-O framework. The objective functions employed are related to the preservation of the multi-
site critical deficit run sum and the constraints introduced are concerned with the hybrid model
parameter space, and the preservation of certain statistics (such as inter-annual dependence
and/or skewness of aggregated annual flows). The efficacy of the proposed S-O framework is
brought out through a case example from the Colorado river basin. The proposed multi-site
multi-season model AMHMABB (whose parameters are obtained from the proposed S-O
framework) preserves the temporal as well as the spatial statistics of the historical flows. Also,
the other multi-site deficit run characteristics namely, the number of runs, the maximum run
length, the mean run sum and the mean run length are well preserved by the AMHMABB model.
Overall, the proposed AMHMABB model is able to show better streamflow modeling
performance when compared with the simulation based SMHMABB model, plausibly due to the
significant role played by: (i) the objective functions related to the preservation of multi-site
critical deficit run sum; (ii) the huge hybrid model parameter space available for the evolutionary
search and (iii) the constraint on the preservation of the inter-annual dependence. Split-sample
validation results indicate that the AMHMABB model is able to predict the characteristics of the
multi-site multi-season streamflows under an uncertain future. Also, the AMHMABB model is
found to perform better than the linear multi-site disaggregation model (MDM) in preserving the
statistical as well as the multi-site critical deficit run characteristics of the observed flows.
However, a major drawback of the hybrid models persists in the AMHMABB model as
well: it cannot synthetically generate a sufficient number of flows beyond the observed
extreme flows, nor can it generate values that are quite different from the observed
flows.
Key words: Stochastic streamflow models; Simulation-Optimization; NSGA-II; Evolutionary
Algorithms; Hybrid matched block bootstrap.

1. Introduction
Starting with Fiering (1964) and Matalas (1967), there have been a number of attempts in
hydrology to model multi-site/multi-variate streamflows. These belong to one of the two basic
types: (i) parametric and (ii) non-parametric models. A detailed review of the parametric type of
multivariate/multi-site time series models used in hydrology is presented by Salas et al. (1980),
Salas (1993) and McLeod and Hipel (1994), while the various types of non-parametric models in
use are reviewed by Lall (1995), Lall and Sharma (1996), Srinivas and Srinivasan (2005) and
Salas and Lee (2010).
The parametric type of models may be classified as: (i) periodic vector AR/ARMA (PAR/PARMA) models; (ii)
contemporaneous AR/ARMA models; and (iii) disaggregation models. The PAR/PARMA models
need to estimate a large number of parameters jointly, to account for the periodic space-time
dependence, especially at shorter time scale, with the available historical samples of limited
record length. Moreover, the parameter estimates may be unstable and may lead to poor
reproduction of some of the important statistics. This motivated the development of a simplified
set of models known as contemporaneous AR/ARMA models (CAR/CARMA), wherein the
preservation of dependence structure of concurrent streamflows at the various stations was
effected through model decoupling (Stedinger et al., 1985; Salas et al., 1985). However, the
complex structure of some of the individual site models could impede the exact preservation of
the spatial cross-correlations of flows (Rasmussen et al., 1996).
The need to preserve the statistical properties at more than one level necessitated the
development of disaggregation models in the hydrologic literature (Harms and Campbell, 1967;
Valencia and Schaake, 1973; Mejia and Rousselle, 1976). Preservation of a wide range of
statistical relationships across multiple time and space scales requires accurate
estimation of a large number of parameters in disaggregation models, which may not be
feasible with the limited hydrologic data available. Hence, staged disaggregation models (Lane,
1982; Stedinger and Vogel, 1984; Grygier and Stedinger, 1988; Santos and Salas, 1992) and
condensed disaggregation models (Lane, 1982; Stedinger and Pei, 1982; Pereira et al., 1984;
Oliveira et al., 1988; Stedinger et al., 1985; Grygier and Stedinger, 1988) were developed with a
view to reducing the number of parameters and making them computationally more amenable.
Moreover, empirical adjustment procedures were suggested by Grygier and Stedinger (1988) to
restore the summability of the disaggregated flows to the aggregate flows, especially when
normalizing transformations were applied to flows. The traditional linear parametric models of
streamflows of the AR/ARMA (Box-Jenkins) type including the multi-site parametric
disaggregation models can provide only a linear control system representation of watershed
processes, while the various physical components of streamflow such as snowmelt runoff, soil
water retention as well as soil drainage are dynamic, non-linear processes. Also, non-stationarity
trends owing to the underlying dynamics of the physical processes may not be captured
effectively.
Koutsoyiannis (1999) developed a parsimonious nonlinear multi-variate dynamic disaggregation
model (DDM) that followed a two-step approach for simulation of hydrologic time series.
Following this, a generalized mathematical framework for stochastic simulation and forecasting
problems in hydrology was proposed by Koutsoyiannis (2000) for modeling stochastic processes
with short- or long-term memory structure, in which a generalized autocovariance function was
implemented within a generalized moving average generating scheme. Although the DDM and
the further developments (Koutsoyiannis, 2000, 2001) were reported to reproduce long-term
dependence, and were validated for practical water resources use, the computational complexity
involved was high. Langousis (2006) proposed an approach that directly deals with the
hydrologic data at the seasonal time scale, but still preserves both the seasonal and the annual
statistics and the over-year scaling behavior without resorting to disaggregation techniques.
However, it is reported to be quite complex owing to several steps of nonlinear multi-variate
optimization (Langousis, 2006). Recently, Efstratiadis et al. (2014) have presented a multi-variate
parametric stochastic modeling framework that preserves the important statistical characteristics
of the data at multiple sites and at daily, monthly and annual time scales which also involves a
number of computational complexities concerning multi-variate parameter estimation and the
adjustments and refinements required to reduce the biases in the simulations. The limitations of
the parametric stochastic disaggregation models concerning preservation of complex spatial and
temporal dependence structure and reproduction of the non-standard marginal distributions have
been brought out by Sharma and O'Neill (2002). Recently, copula-based multisite stochastic
simulation models have been proposed by Chen et al. (2015). The spatial and temporal
dependencies were modeled by combining bivariate copulas and conditional probability
distributions. The main advantages of this method are (i) the parameters of the model can be
easily estimated and (ii) the computational time is less.
On the other hand, non-parametric models can provide more accurate representation of the non-
linear dynamics of the physical watershed processes by way of effectively modeling the complex
dependence structure present in the streamflow data. Also, they can successfully mimic the bi-
modality present in the marginal distributions in certain months that may be caused due to
different runoff generating mechanisms. A unique feature of the non-parametric technique is to
reproduce the empirical structure of multi-variate datasets without recourse to assumptions about
data or model structure. Moreover, the complexities associated with parameter estimation are not
experienced. Silverman (1986) discusses a wide range of non-parametric methods, while
Lall (1995) provides a review of the non-parametric techniques applied to a variety of water and
environmental applications. Non-parametric methods have been applied to a wide variety of
hydro-climate modeling problems that include stochastic daily weather generation (Rajagopalan
and Lall, 1999; Yates et al., 2003), streamflow simulation (Lall and Sharma, 1996; Sharma et al.,
1997; Prairie et al., 2006), streamflow forecasting (Grantz et al., 2006; Singhrattna et al., 2005),
and flood frequency estimation (Moon and Lall, 1994). Some of the non-parametric techniques
that are often used in hydrology are: moving block bootstrap (MBB) (Vogel and Shallcross,
1996); k-nearest neighbor (k-NN) bootstrap (Lall and Sharma, 1996) and its variations and
improvements (Prairie et al., 2007; Lee et al., 2010; Salas and Lee, 2010); kernel based methods
(Sharma et al., 1997; Tarboton et al., 1998); and matched block bootstrap (MABB) (Srinivas and
Srinivasan, 2005b). Salas and Lee (2010) have presented a review of the non-parametric models
used in streamflow modeling, clearly bringing out the limitations of each model.
Prairie et al. (2006) proposed a modified k-NN approach that enables the simulation of values
not seen in the historical record, which has recently been improved by Li and Singh (2014)
through the implementation of a multi-model simulation scheme. Also, Salas and Lee (2010)
have employed the k-nearest neighbor resampling algorithm with gamma kernel perturbation to
generate the seasonal data by conditioning the annual data. Although these models perform well
in simulating multi-season streamflows, they are applicable only to modeling single site data.
Prairie et al. (2007) presented a parsimonious non-parametric disaggregation model for space-
time simulation of streamflows at river basin level, extending the single site temporal
disaggregation scheme of Tarboton et al. (1998) by replacing the tedious kernel based methods
with the k-NN approach. Although this method captures the distributional characteristics and the
spatial dependencies well, a number of limitations have been pointed out by Lee et al. (2010)
concerning underestimation of critical drought characteristics and repetitious nature of the data
patterns being generated. Lee et al. (2010) have proposed a spatio-temporal disaggregation
model that generates the higher level variable (e.g., annual flow data) based on any parametric or
non-parametric model, then generates the lower level sequence (e.g., seasonal flow data) by
applying k-nearest neighbor resampling in such a way that their sum is close to the higher level
generated flow data. Moreover, genetic algorithm based mixing is implemented to achieve
variety in the generated data. This multi-site multi-season non-parametric disaggregation model
is reported to yield better simulations than that of Prairie et al. (2007). More recently, based on
the maximum entropy bootstrap (MEB) modeling approach proposed by Vinod (2006) for
economic time series, Srivastav and Simonovic (2014) have developed a computationally less
demanding and simple procedure to model multi-site, multi-season streamflows. The orthogonal
transformation is used with MEB to capture the spatial dependence present in the multi-site
collinear data. Ilich and Despotovic (2008) and Ilich (2014) have developed a three step
non-parametric algorithm for multi-site generation of hydrologic series. This involves the generation
of random variables that reproduce any arbitrary marginal, followed by reordering and permuting
of the generated data such that the serial correlations, cross-correlations, annual level
autocorrelations and the correlations between the end of the previous year and the beginning of
the current year are preserved. Following this, Markovic et al. (2015) have introduced two
modifications to the above algorithm to model high skew and outliers present in the data and to
obtain a number of extreme dry years in the simulated series. In recent times, data-driven models
(Ahmed and Sharma, 2007; Sudheer et al., 2008; Ünes et al., 2015) are also used in the stochastic
hydrology literature to model hydrologic data. However, these prediction models seem to be
limited to single site modeling. Moreover, these models cannot generate data outside the
observed range.
Srinivas and Srinivasan (2000, 2001) introduced hybrid stochastic streamflow models based on
the post-blackening approach proposed by Davison and Hinkley (1997). This approach used a
parsimonious linear parametric model for partial pre-whitening of the observed streamflows,
followed by resampling of the residuals extracted using moving block bootstrap (MBB) to
generate innovations which were then post-blackened to synthesize stochastic replicates of the
observed flows. The single site multi-season HMBB model of Srinivas and Srinivasan (2001)
was extended to multi-site multi-season model by Srinivas and Srinivasan (2005). Moreover,
Srinivas and Srinivasan (2006) proposed the hybrid matched block bootstrap (HMABB) for
modeling single site multi-season streamflows, using the rank matching idea of Carlstein et al.
(1998) for resampling the residuals. In comparison to the low-order linear parametric models and
the HMBB, the HMABB model was shown to provide better simulations of multi-season
streamflows with complex dependence structure. Moreover, the HMABB model is able to yield
sufficient variability of the streamflow characteristics owing to the use of smaller within-year
block sizes. However, the following limitations of the HMBB models seem to be present in case
of the HMABB as well: (i) poor preservation of the statistics at the aggregated time scale which
affects the preservation of the critical drought characteristics at higher truncation levels; (ii) the
smoothing and the extrapolation value added is limited, since the generated flows lie close to the
observed flow values; (iii) the identification of the appropriate hybrid model is quite tedious.
Further improvement to the single site multi-season HMABB model was done by Srivastav and
Srinivasan (2011) by way of automating the selection of the appropriate HMABB model through
a simulation-optimization framework and introducing a constraint into the framework for the
preservation of the streamflow statistics at the aggregated (annual) level. This resulted in
better preservation of the storage and the drought characteristics.
The current research study proposes an extension of the simulation-optimization (S-O)
framework developed for single-site multi-season streamflows by Srivastav et al. (2011) to
multi-site multi-season streamflows. Another contribution is the extension of the single-site
multi-season simulation model HMABB proposed by Srinivas and Srinivasan (2006) to the multi-site
multi-season simulation model (MHMABB) for use as the simulation module within the
proposed S-O framework for modeling the multi-site multi-season streamflows. A robust and
efficient evolutionary search based technique, namely, the non-dominated sorting genetic
algorithm (NSGA-II) (Deb et al., 2002), is employed as the solution technique for the
multi-objective optimization within the S-O framework. The multi-objective optimization model
formulated will be the driver and the multi-site, multi-season hybrid matched block bootstrap
model (MHMABB) will be the simulation engine within this framework. Incorporating the
relevant water-use related objective functions explicitly into the simulation-optimization
framework directs the evolutionary search to explore the wide parameter space of the multi-site,
multi-season hybrid model MHMABB, subject to the necessary constraints on the hybrid model
parameter space and to find the appropriate hybrid model (described by a combination of
parametric and non-parametric components). In addition to the constraints on the model
parameter space, some specific constraints regarding the preservation of certain statistics (such
as inter-annual dependence or skewness of aggregated annual flows) are introduced explicitly
into the modeling framework with a view to arrive at a hybrid stochastic model with an improved
performance in terms of preserving the statistics at more than one level and consequently,
preserving the multi-site deficit run (drought) characteristics accurately. The efficacy of the
proposed S-O framework in simulating the multi-site multi-season streamflows is shown through
a case example from the Colorado River basin.
2. Simulation-Optimization Framework for Multi-site Multi-season Streamflow Modeling
Simulation-optimization modeling can be defined as the process of finding the best input
variable values from among all possibilities without explicitly evaluating each possibility. The
output of a simulation model is used by an optimization strategy to provide feedback on progress
of the search for the optimal solution. This in turn guides further input to the simulation model
(Carson and Maria, 1997). A comprehensive review on theory and applications of simulation-
optimization modeling has been presented by Tekin and Ihsan (2004). The S-O methodology has
been employed beneficially in a number of research works in the field of water resources
planning and management, which have been documented by Nicklow et al. (2010).
In the last few decades, there has been an increasing interest in using Evolutionary Algorithms
(EAs) in simulation-optimization problems mainly because they do not require restrictive
assumptions or prior knowledge about the shape of the response surface (Back and Schwefel,
1993). Evolutionary Algorithms (EAs) are heuristic search methods that implement ideas from
the evolution process. As opposed to a single solution used in traditional methods, EAs work on
a population of solutions in such a way that poor solutions become extinct, whereas the good
solutions are likely to reach the optimum (survival of the fittest). When the response surface is
high-dimensional, discontinuous, and non-differentiable, the traditional methods may often fail
to find the optimal solution, while methods such as evolutionary algorithms can be applied
successfully to these types of problems (Azadivar and Tompkins, 1999; Pierreval and Paris,
2000). In general, an EA for simulation-optimization can be described as follows: (i) generate a
population of solutions; (ii) evaluate these solutions through a simulation model; (iii) perform
selection, apply genetic operators to produce a new offspring (or solution), and insert it into the
population; and (iv) repeat until some stopping criterion is reached. From the literature, the most
popular EAs are known to be Genetic Algorithms (GAs) (Goldberg, 1989). In general, each point
in the solution space is represented by a string of values for the decision variables. The use of
appropriate cross-over and mutation operators reduces the probability of becoming trapped in a local
optimum. The elitism property enables the carry-over of competent solutions through successive
generations.
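The four steps listed above can be illustrated with a minimal sketch of an elitist GA coupled to a simulator. The function `simulate`, the real-valued encoding, the binary tournament selection and all parameter values are assumptions introduced only for illustration; they are not taken from this study, and `simulate` is merely a stand-in for a simulation model.

```python
import random


def simulate(x):
    # Hypothetical stand-in for a simulation model: returns a cost to minimize.
    return sum((xi - 0.3) ** 2 for xi in x)


def evolve(n_vars=3, pop_size=20, generations=40, p_mut=0.2, seed=1):
    rng = random.Random(seed)
    # (i) generate a population of candidate solutions in [0, 1]^n_vars
    pop = [[rng.random() for _ in range(n_vars)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        # (ii) evaluate every candidate through the simulation model
        scored = sorted(pop, key=simulate)
        best_history.append(simulate(scored[0]))
        children = [scored[0]]  # elitism: the best solution is carried over unchanged
        while len(children) < pop_size:
            # (iii) binary tournament selection, one-point crossover, mutation
            p1 = min(rng.sample(pop, 2), key=simulate)
            p2 = min(rng.sample(pop, 2), key=simulate)
            cut = rng.randrange(1, n_vars)
            child = p1[:cut] + p2[cut:]
            if rng.random() < p_mut:
                j = rng.randrange(n_vars)
                child[j] = min(1.0, max(0.0, child[j] + rng.gauss(0.0, 0.1)))
            children.append(child)
        pop = children
    # (iv) stopping criterion: a fixed number of generations
    return min(pop, key=simulate), best_history
```

Because the best solution is always copied into the next generation, the recorded best cost cannot increase from one generation to the next, which is the practical effect of the elitism property mentioned above.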
2.1. Application of GA in Time Series Modeling
Rolf et al. (1997) stated that the aim of combining traditional ARMA modeling knowledge and
evolutionary algorithms would be to provide a tool that would be able to automate the three step
process of time series modeling. Following this, in the last decade, a few research studies (Cortez
et al., 2004; Voss and Feng, 2002; Minerva and Poli, 2001; Peng and Chen, 2003; Ong et al.,
2005; Chen et al., 2002) have employed evolutionary search algorithms for automating the three-
step time series modeling approach of Box-Jenkins ARMA models. The above research studies
bring out the efficacy of evolutionary techniques in model identification and parameter
estimation of Box-Jenkins type of models (AR, ARMA, ARIMA, SARIMA, FARIMA). For
model identification, fitness functions such as AIC, BIC are used in the GA framework, while
some form of statistical performance criteria (such as minimization of sum of squared errors,
maximization of likelihood functions) are used in case of parameter estimation. In case of non-
parametric models (such as k-NN, Kernel based models), the fitness function can be to minimize
the generalized cross-validation score. However, in case of the more complex multi-site, multi-
season hybrid models, no such statistical criteria are available for model identification and
parameter estimation. Hence, it has been decided to adopt water-use (reservoir
storage/drought) related criteria (mentioned in the following section) as the objective functions in
the Multi-objective GA (MOGA) based framework proposed in this study. Incidentally, these
criteria can be expected to preserve the basic statistical characteristics (such as summary
statistics, marginal distributions). It is to be mentioned that this approach has already been
successfully applied to single-site, multi-season streamflow modeling by Srivastav and
Srinivasan (2011).
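As a small illustration of how an information criterion can act as a GA fitness function for model identification, the sketch below scores two candidate orders of a zero-mean autoregressive model by AIC. The helper names `ar1_series` and `aic_fitness`, the restriction to orders 0 and 1, and the least-squares estimator are assumptions made for brevity; the studies cited above use richer model families and estimators.

```python
import math
import random


def ar1_series(n, phi, seed=0):
    # Synthetic zero-mean AR(1) data, used only to exercise the fitness function.
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def aic_fitness(x, order):
    """AIC of a zero-mean AR(0) or AR(1) model fitted by least squares (lower is better)."""
    if order not in (0, 1):
        raise ValueError("this sketch only handles orders 0 and 1")
    target, lagged = x[1:], x[:-1]
    phi = 0.0
    if order == 1:
        phi = sum(a * b for a, b in zip(target, lagged)) / sum(b * b for b in lagged)
    rss = sum((a - phi * b) ** 2 for a, b in zip(target, lagged))
    n = len(target)
    # Gaussian AIC up to a constant; the parameter count includes the noise variance.
    return n * math.log(rss / n) + 2 * (order + 1)
```

A GA would evaluate `aic_fitness` for each candidate order encoded in a chromosome and favor the candidates with the lower score.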
2.2. Simulation-Optimization Modeling Framework
The proposed simulation-optimization (S-O) modeling framework for multi-site multi-season
streamflow modeling is shown in Fig. 1. It consists of the multi-objective optimization model (as
the driver), and the multi-site multi-season hybrid matched block bootstrap model MHMABB
developed in this study as the simulator embedded into it. The multi-site multi-season hybrid
simulation model MHMABB is an extension of the single-site HMABB model proposed by
Srinivas and Srinivasan (2006).
Figure 1: Simulation-Optimization Framework for Stochastic Modeling of Multi-site
Multi-season Streamflows
As discussed earlier, the S-O modeling framework primarily aims to enhance the performance of
the hybrid stochastic models in simulating the streamflows for water resources planning use. The
secondary aim of the framework is to minimize the drudgery, judgment and subjectivity involved
in the selection of the most appropriate hybrid stochastic model. The special features introduced
into the S-O framework to achieve the above are: i) critical water-use related objective functions
in the driver of the framework; ii) a powerful multi-objective evolutionary search based tool
(NSGA-II) (Deb et al., 2002) to explore the huge hybrid model parameter space and obtain a set
of competent hybrid stochastic models automatically; and iii) a constraint to enable the
preservation of the inter-annual dependence, which may be helpful in the preservation of the
critical water-use related statistics at higher truncation levels.
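NSGA-II ranks candidate solutions by Pareto dominance rather than by a single fitness value (Deb et al., 2002). The sketch below shows only that core idea, namely the dominance test and the extraction of the non-dominated set, assuming that all objectives are minimized; it omits the full non-dominated sorting, crowding distance and constraint handling of NSGA-II, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if objective vector a dominates b when all objectives are minimized."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(points):
    """Return the points that are not dominated by any other point (first Pareto front)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]
```

Each tuple passed to `non_dominated` would hold the objective function values of one candidate hybrid model, so the returned set corresponds to the competent models among which no one is better in every objective.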
2.2.1. Multi-site Multi-season HMABB as the Simulator
In this study, the single-site multi-season hybrid matched block bootstrap (HMABB) proposed
by Srinivas and Srinivasan (2006) is extended to multi-site multi-season hybrid matched block
bootstrap (MHMABB). The hybrid model effectively blends the parametric component (the
low-order PAR(1) model at each site) and the non-parametric component (multi-site multi-season
matched block bootstrap). The proposed extension of the simulation algorithm is presented
below.
Proposed Algorithm for MHMABB
Let the time series of historical streamflows be denoted by $q^{k}_{v,\tau}$, where the superscript $k$
denotes the site index ($k = 1,\dots,n_k$), $v$ is the index for year ($v = 1,\dots,N$) and $\tau$ denotes the index for
season (period) within the year ($\tau = 1,\dots,\omega$); $n_k$ refers to the number of sites; $N$ represents the
number of years of historical record and $\omega$ denotes the number of periods within the year. The
modeling steps are as follows:
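1. Standardize the elements of the historical streamflows, $q^{k}_{v,\tau}$, to obtain $y^{k}_{v,\tau}$, using
$$y^{k}_{v,\tau} = \frac{q^{k}_{v,\tau} - \bar{q}^{k}_{\tau}}{s^{k}_{\tau}} \tag{1}$$
where $\bar{q}^{k}_{\tau}$ and $s^{k}_{\tau}$ represent the mean and the standard deviation, respectively, of the observed
streamflows in the period $\tau$ at the kth site. Note that the historical streamflows are not transformed
to remove skewness.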
2. Pre-whiten the standardized historical streamflows, $y^{k}_{v,\tau}$, partially, using a parsimonious
periodic model PAR(1), and extract the residuals, $\varepsilon^{k}_{v,\tau}$, at each
site k, using
$$\varepsilon^{k}_{v,\tau} = y^{k}_{v,\tau} - \phi^{k}_{1,\tau}\, y^{k}_{v,\tau-1} \tag{2}$$
where $\phi^{k}_{1,\tau}$ is the first order periodic autoregressive parameter for period $\tau$, at the kth site
(for $\tau = 1$, the preceding value is that of the last period of the previous year). The
purpose of partial pre-whitening using a parsimonious PAR(1) structure at each site is to utilize
the potential of the proposed non-parametric component, multi-site multi-season MABB, that
can capture the weak linear dependence structure and the non-linear dependence structure
present in the multi-site multi-season residuals effectively.
3. Obtain one set of replicates of the simulated innovations, $\varepsilon^{*k}_{v,\tau}$,
at each site k, by contemporaneous resampling of the multi-site, within-year
non-overlapping blocks of residuals, using the proposed multi-site rank-matched block bootstrap
(MABB) method. The key steps involved in the resampling algorithm are as follows:
(a) For each site k, prepare n non-overlapping within-year blocks (such as $B^{k}_{v,1},\dots,B^{k}_{v,n}$) using
the residuals, with the respective lengths being $L_1,\dots,L_n$ such that the lengths of all the
within-year blocks sum to $\omega$, i.e., $\sum_{i=1}^{n} L_i = \omega$. Note that the block lengths are the
same at all the sites to enable resampling of contemporaneous blocks of residuals, so that the
site-to-site cross-correlations (dependence across the sites) are captured. Herein, $B^{k}_{v,i}$ denotes the
ith within-year block for the year v of the record, at site k. Let $e^{k}_{v,i}$ denote the end element of $B^{k}_{v,i}$.
Form the sets $E^{k}_{i} = \{e^{k}_{1,i},\dots,e^{k}_{N,i}\}$, $i = 1,\dots,n$.
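(b) For the contemporaneous selection of the within-year blocks, the end elements of block i
at the individual sites have to be combined using an appropriate strategy to obtain a fictitious
contemporaneous end element. In this research work, the strategy based on the Euclidean
distance (ED) is adopted and presented here. Form the sets $E^{c}_{i} = \{e^{c}_{1,i},\dots,e^{c}_{N,i}\}$, where
$e^{c}_{v,i}$ is the contemporaneous end element, which can be obtained from the end elements $e^{k}_{v,i}$ of block i in year v at the sites $k = 1,\dots,n_k$ using
$$e^{c}_{v,i} = \sqrt{\sum_{k=1}^{n_k} \left(e^{k}_{v,i}\right)^{2}} \tag{3}$$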
(c) Arrange the elements of $E^{c}_{i}$ (the contemporaneous end elements of block i) in ascending (or descending) order of their magnitude and assign
ranks. Let $r_{v,i}$ denote the contemporaneous rank of $e^{c}_{v,i}$, where $v = 1,\dots,N$ and $i = 1,\dots,n$. The
algorithm is initialized by randomly selecting one of the N first within-year blocks
contemporaneously. Let it be the current contemporaneous within-year block for all the sites.
The following are the steps in the resampling algorithm:
i. Identify the rank corresponding to the current contemporaneous within-year block. Let it be
denoted by $r_c$.
ii. Select all the contemporaneous end elements whose ranks fall within a bandwidth w (= 2m+1),
ranging from $\max(1, r_c - m)$ to $\min(N, r_c + m)$, where m is the window parameter which is a small positive
integer. These form the set of nearest neighbors to the current contemporaneous end element
(which has rank $r_c$). From this, randomly select one of the neighboring contemporaneous end
elements. This requires generating a uniform random number U in the range of integers
$\max(1, r_c - m)$ to $\min(N, r_c + m)$.
iii. Obtain the contemporaneous within-year block that follows the selected contemporaneous
within-year block (which corresponds to the contemporaneous end element selected in (ii)) and
append it to the current within-year block. Note that the corresponding neighboring
contemporaneous within-year block is to be appended at all the sites.
iv. The recently appended contemporaneous within-year block becomes the new current
contemporaneous within-year block. This holds for all the sites.
v. To generate more innovations, repeat steps (i) to (iv) till the desired length of one
replicate of simulated innovations ($\varepsilon^{*k}_{v,\tau}$) is obtained.
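4. Post-blacken the resampled innovation series, $\varepsilon^{*k}_{v,\tau}$, to obtain the standardized synthetic streamflows, $y^{*k}_{v,\tau}$, using
$$y^{*k}_{v,\tau} = \phi^{k}_{1,\tau}\, y^{*k}_{v,\tau-1} + \varepsilon^{*k}_{v,\tau} \tag{4}$$
where $\phi^{k}_{1,\tau}$ is the PAR(1) parameter used for pre-whitening in step 2.
5. Inverse standardize $y^{*k}_{v,\tau}$, using the seasonal mean $\bar{q}^{k}_{\tau}$ and standard deviation $s^{k}_{\tau}$ of step 1, to obtain the synthetic streamflow series $q^{*k}_{v,\tau} = \bar{q}^{k}_{\tau} + s^{k}_{\tau}\, y^{*k}_{v,\tau}$.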
Note that, for a single site (number of sites equal to 1), this algorithm reduces to the single-site multi-season hybrid matched block
bootstrap model (HMABB) proposed by Srinivas and Srinivasan (2006).
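The parametric part of the algorithm (steps 1, 2 and 4) can be sketched for one site as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the PAR(1) parameters are estimated by least squares (the estimator is not specified in the text above), and the very first value, which has no predecessor, is kept as its own residual. In the multi-site setting the same operations would be applied at each site separately.

```python
import math


def seasonal_stats(q):
    """Mean and standard deviation of each season (column) of q[year][season]."""
    n, w = len(q), len(q[0])
    mean = [sum(row[t] for row in q) / n for t in range(w)]
    sd = [math.sqrt(sum((row[t] - mean[t]) ** 2 for row in q) / (n - 1))
          for t in range(w)]
    return mean, sd


def standardize(q, mean, sd):
    # Step 1: seasonal standardization, with no skewness-removing transformation.
    return [[(row[t] - mean[t]) / sd[t] for t in range(len(row))] for row in q]


def par1_coefficients(y):
    """Least-squares PAR(1) parameter for each season (an assumed estimator)."""
    flat = [val for row in y for val in row]
    w = len(y[0])
    phi = []
    for t in range(w):
        idx = [i for i in range(1, len(flat)) if i % w == t]
        num = sum(flat[i] * flat[i - 1] for i in idx)
        den = sum(flat[i - 1] ** 2 for i in idx)
        phi.append(num / den)
    return phi


def prewhiten(y, phi):
    # Step 2: residual = value minus phi(season) times the preceding value.
    flat = [val for row in y for val in row]
    w = len(y[0])
    return [flat[0]] + [flat[i] - phi[i % w] * flat[i - 1]
                        for i in range(1, len(flat))]


def post_blacken(eps, phi):
    # Step 4: rebuild the standardized series from innovations (inverse of step 2).
    w = len(phi)
    y = [eps[0]]
    for i in range(1, len(eps)):
        y.append(phi[i % w] * y[-1] + eps[i])
    return y
```

Post-blackening the unresampled residuals returns the standardized historical series exactly, which is a convenient check that the pre-whitening and post-blackening steps are mutually consistent; in the hybrid model the residuals are resampled between these two steps.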
The use of short contemporaneous within-year block sizes ensures a reasonable amount of
variability in the synthetic replicates generated at the various sites. Moreover, the site-to-site
dependence is preserved due to the contemporaneous resampling of the within-year blocks.
Meanwhile, the window size selected based on the rank matching approach ensures that one of the
nearest neighbors to the current within-year block is selected and appended.
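The contemporaneous rank-matched resampling of step 3 can be sketched as below. Several details are assumptions rather than statements of the original algorithm: the contemporaneous end element is taken as the Euclidean norm of the site end elements, ranks are assigned in ascending order, the window parameter m is at least 1, and a neighbor that has no following block on record (the last block of the last year) is excluded from the draw. The function name and the data layout `resid[site][year][season]` are hypothetical.

```python
import math
import random


def matched_block_bootstrap(resid, block_lengths, m, n_years_out, seed=0):
    """Return (innovations per site as flat lists, list of (year, block) picks)."""
    rng = random.Random(seed)
    n_sites, n_years, omega = len(resid), len(resid[0]), len(resid[0][0])
    assert sum(block_lengths) == omega and m >= 1
    n_blocks = len(block_lengths)
    starts = [sum(block_lengths[:i]) for i in range(n_blocks)]
    ends = [s + length - 1 for s, length in zip(starts, block_lengths)]

    def end_c(v, i):
        # Assumed combination rule: Euclidean norm of the end elements across sites.
        return math.sqrt(sum(resid[k][v][ends[i]] ** 2 for k in range(n_sites)))

    # rank[i][v]: rank (1 = smallest) of year v's contemporaneous end element, block i
    rank = []
    for i in range(n_blocks):
        order = sorted(range(n_years), key=lambda v: end_c(v, i))
        r = [0] * n_years
        for pos, v in enumerate(order):
            r[v] = pos + 1
        rank.append(r)

    def successor(v, i):
        # Block that follows (v, i) in the record; None for the very last block.
        if i + 1 < n_blocks:
            return v, i + 1
        return (v + 1, 0) if v + 1 < n_years else None

    def append_block(out, v, i):
        for k in range(n_sites):  # the same year and block is used at every site
            out[k].extend(resid[k][v][starts[i]:ends[i] + 1])

    out = [[] for _ in range(n_sites)]
    v_cur, i_cur = rng.randrange(n_years), 0  # random first within-year block
    append_block(out, v_cur, i_cur)
    picks = [(v_cur, i_cur)]
    while len(out[0]) < n_years_out * omega:
        r_cur = rank[i_cur][v_cur]
        lo, hi = max(1, r_cur - m), min(n_years, r_cur + m)
        neighbours = [v for v in range(n_years)
                      if lo <= rank[i_cur][v] <= hi
                      and successor(v, i_cur) is not None]
        v_sel = rng.choice(neighbours)
        v_cur, i_cur = successor(v_sel, i_cur)
        append_block(out, v_cur, i_cur)
        picks.append((v_cur, i_cur))
    return out, picks
```

Because every appended block is taken from the same year and the same within-year position at all the sites, the values at any position of the output series are contemporaneous across sites, which is how the site-to-site dependence is carried into the innovations.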

2.2.2. Multi-Objective Optimization Model
Conventional wisdom in stochastic modeling of streamflows suggests that if a stochastic
streamflow model preserves the summary statistics, the marginal distributions and the
dependence structure present in the historical streamflows well, then, it is likely to preserve the
water-use characteristics such as the storage capacity and the critical drought characteristics.
However, there is no explicit proof for this and there is no general functional relationship
between the accuracy of preservation of the water-use characteristics and the accuracy of
reproduction of the basic statistical characteristics of streamflows and/or the stochastic model
parameters. Moreover, in case of hybrid models, there are no statistical criteria (such as AIC,
BIC) for the selection of the hybrid model parameters. On the other hand, manually exploring the
huge parameter space of the multi-site multi-season hybrid model (MHMABB) through a large
number of simulations to find the best hybrid model, would involve drudgery and subjectivity.
Given this, it would be pragmatic to formulate a simulation-optimization (S-O) modeling
framework that would explicitly relate the objective functions based on the accuracy of
preservation of the water-use characteristics (such as critical drought characteristics) to the
hybrid model parameter space.
All extreme (or critical) streamflow droughts encounter large deficits. On the other hand, a long
drought duration may not necessarily signify an extreme (or critical) drought if the
corresponding deficit volume encountered during the drought event is not large. Likewise, a low
mean discharge may not necessarily indicate an extreme drought if its duration is short. The
variation of drought duration is primarily governed by climate, while the deficit volume is more
related to catchment characteristics. According to Zelenhasic and Salvai (1987) and Zelenhasic
(1997), the stochastic process of streamflow droughts can be described by nine descriptive
parameters, of which the critical drought deficit volume is the most informative parameter.
Hence, the critical drought deficit volume may be considered the single pivotal
characteristic that effectively represents the process of critical streamflow droughts. Accordingly, the
effective preservation of the critical drought deficit volumes estimated from the historical
streamflows at various pre-specified truncation levels is vital for the effective
stochastic simulation of streamflows.
The streamflow drought characteristics are often described using the theory of runs (Yevjevich,
1967). Specifically, negative runs of streamflow sequences with respect to a specified truncation
level represent deficit conditions. A number of stochastic models preserve the deficit run
(drought) characteristics at either lower or higher truncation levels, but not both. A good
synthetic streamflow model, however, is expected to preserve the run characteristics with minimum bias
and root mean square error (over all truncation levels considered) when compared with the
corresponding estimates from the historical streamflows, while ensuring sufficient variability to
account for future uncertainty. Quite often, if the bias of an estimate is reduced, its
variance increases, and vice versa. If only the R-RMSE related objective function
is used, the hybrid stochastic model identified may have the minimum ∑R-RMSE(MARS) but
may result in a high value of ∑|R-Bias(MARS)|, which is undesirable. Hence, in this
research work, the following two objectives are employed in the simulation-optimization framework
proposed for multi-site multi-season hybrid stochastic streamflow modeling: (i) minimize the sum
of the absolute values of the relative bias in the preservation of the multi-site critical deficit run
sum over all truncation levels considered; and (ii) minimize the sum of the relative RMSE in the
preservation of the multi-site critical deficit run sum over all truncation levels considered.
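As an illustration only, the identification of negative (deficit) runs with respect to a truncation level may be sketched in Python as follows. The function name deficit_runs and the flow values are hypothetical, and a single constant truncation level at a single site is assumed for simplicity.

```python
def deficit_runs(flows, truncation_level):
    """Return a list of (run_length, run_sum) for each negative run.

    A negative run is an unbroken sequence of flows below the truncation
    level; its run sum is the accumulated deficit below that level.
    """
    runs = []
    length, deficit = 0, 0.0
    for q in flows:
        if q < truncation_level:
            length += 1
            deficit += truncation_level - q
        elif length > 0:
            runs.append((length, deficit))
            length, deficit = 0, 0.0
    if length > 0:  # close a run that reaches the end of the record
        runs.append((length, deficit))
    return runs


flows = [10, 4, 3, 9, 12, 5, 2, 1, 11]  # made-up flow values
runs = deficit_runs(flows, truncation_level=6)
max_run_sum = max(run_sum for _, run_sum in runs)      # critical deficit volume
max_run_length = max(length for length, _ in runs)     # critical duration
```

In this toy series, two deficit runs occur: one of length 2 with a deficit of 5, and one of length 3 with a deficit of 10, so the maximum run sum is 10 and the maximum run length is 3.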

2.2.3. Mathematical Formulation
Objective Functions
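Based on a detailed exploration of different plausible water-use related objective
functions, the following two objective functions are proposed within the framework: (i)
minimize the aggregated relative bias and (ii) minimize the aggregated relative RMSE, in the
preservation of the maximum multi-site deficit run sum (MARS) over the truncation levels
varying from 50% to 95% of the historical mean monthly flow (MMF) at intervals of 5% MMF.
\[ \text{Minimize } \sum_{i} \left| \text{R-Bias}(\text{MARS})_i \right| \tag{5} \]
\[ \text{Minimize } \sum_{i} \text{R-RMSE}(\text{MARS})_i \tag{6} \]
in which
\[ \text{R-Bias}(\text{MARS})_i = \frac{E[\text{MARS}_i^{g}] - \text{MARS}_i^{h}}{\text{MARS}_i^{h}} \tag{7} \]
\[ \text{R-RMSE}(\text{MARS})_i = \frac{\sqrt{\left( E[\text{MARS}_i^{g}] - \text{MARS}_i^{h} \right)^2 + \text{var}[\text{MARS}_i^{g}]}}{\text{MARS}_i^{h}} \tag{8} \]
where the superscripts h and g denote the historical and the generated streamflows, respectively, and
\(\text{MARS}_i^{h}\) is the estimated MARS based on the historical streamflows at the ith truncation
level. The maximum run sum (MARS) is expressed as: MARS = max(ds1, ds2, …, dsnr), where the
multi-site run-sum for a specified truncation level and run is defined as \(ds_j = \sum_{k=1}^{n_k} ds_{j,k}\), wherein
j denotes the run number, k refers to the site number, nk denotes the total number of sites being
modeled and nr denotes the total number of runs. In eq. 7, \(E[\text{MARS}_i^{g}]\) is the mean value of
MARS corresponding to the ith truncation level, estimated over Nr synthetically generated
replicates and is expressed as:
\[ E[\text{MARS}_i^{g}] = \frac{1}{N_r} \sum_{r=1}^{N_r} \text{MARS}_{i,r}^{g} \tag{9} \]
In eq. 8, \(\text{var}[\text{MARS}_i^{g}]\) is the variance of MARS at the ith truncation level estimated over the Nr
synthetically generated replicates and is expressed as:
\[ \text{var}[\text{MARS}_i^{g}] = \frac{1}{N_r} \sum_{r=1}^{N_r} \left( \text{MARS}_{i,r}^{g} - E[\text{MARS}_i^{g}] \right)^2 \tag{10} \]
It is possible to use other water-use objective functions in place of the two objective functions
mentioned (eqs. 5 and 6).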
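As an illustration only, the evaluation of the two aggregated objectives from a set of replicates may be sketched in Python as follows. The sketch assumes that the relative RMSE is the square root of the squared bias plus the variance over the replicates (with a divisor of Nr), divided by the historical value; the function names and all the numerical values are hypothetical.

```python
import math


def relative_bias_and_rmse(generated, historical):
    """Relative bias and relative RMSE of one statistic over the replicates."""
    n = len(generated)
    mean = sum(generated) / n
    variance = sum((x - mean) ** 2 for x in generated) / n  # divisor Nr (assumed)
    r_bias = (mean - historical) / historical
    r_rmse = math.sqrt((mean - historical) ** 2 + variance) / historical
    return r_bias, r_rmse


def objectives(generated_by_level, historical_by_level):
    """Aggregate |R-Bias| and R-RMSE over all the truncation levels."""
    f1 = f2 = 0.0
    for generated, historical in zip(generated_by_level, historical_by_level):
        r_bias, r_rmse = relative_bias_and_rmse(generated, historical)
        f1 += abs(r_bias)
        f2 += r_rmse
    return f1, f2


# Made-up MARS values: two truncation levels, four replicates each.
generated = [[90.0, 110.0, 100.0, 100.0], [210.0, 230.0, 220.0, 220.0]]
historical = [100.0, 200.0]
f1, f2 = objectives(generated, historical)
```

In this toy case, the first level is unbiased but still contributes to the second objective through its variance, which illustrates why the two objectives can pull in different directions.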
Constraints
Constraints on Model Parameters: Certain constraints are developed within the proposed S-O
framework to describe the model parameter space.
Parametric Component: In the simulation-optimization multi-season hybrid model proposed in
this study, the partial pre-whitening is done using a parsimonious parametric model, namely, the
periodic autoregressive model of order 1 (PAR(1)). This means that the parameter space of the
parametric component of the multi-season hybrid model is defined by the range of values taken
by the periodic autoregressive parameter of order 1, \(\phi_{1,\tau}\). For the stationarity condition, the roots
of the characteristic equation must lie within the unit circle. However, in most practical situations
of stochastic modeling of the hydrologic random process "streamflow", physical
considerations suggest that the lag-1 serial correlation coefficient (ρ1) be positive, which means
that \(\phi_{1,\tau} > 0\) (Hipel and McLeod, 1994). Accordingly, the following constraint on the first
order PAR parameter (\(\phi_{1,\tau}^{k}\)) has been introduced into the simulation-optimization framework:
\[ 0 < \phi_{1,\tau}^{k} < 1 \tag{11} \]
where \(\phi_{1,\tau}^{k}\) refers to the periodic autoregressive parameter of order ‘1’ for month ‘τ’ at site k.
Non-Parametric Component: In the proposed framework, the multi-site multi-season MABB
model has been used as the non-parametric component. The conditional resampling is done
contemporaneously on non-overlapping within-year blocks formed from the residuals at each
site. The parameters of the multi-site multi-season MABB model are: (i) the non-overlapping
within-year block sizes and (ii) the bandwidth. In the case of within-year blocks, there exists a large
number of possible combinations of non-overlapping block sizes. However, the sum of all the
within-year block sizes should be equal to the total number of periods within a year (ω = 12 for
monthly flows), i.e.,
L1 + L2 + . . . + Ln = ω (12)
Further, in the selection of the bandwidth (w), it is observed from various trials that adopting a
large w increases the bias in the preservation of the historical dependence structure and in the
prediction of storage capacities at different demand levels, while adopting a small w
reduces the variety of the replicates in synthetic generation (Srinivas and Srinivasan, 2006). Hence,
based on the experience gained by the authors in modeling periodic streamflows of various rivers
using multi-season HMABB hybrid models, the bandwidth is restricted to fall between 3 and 13:
3 ≤ w ≤ 13 (13)
Constraints on Statistical Characteristics: In this research work, the issue of preserving the
inter-annual dependence is addressed by introducing an explicit constraint that ensures the
preservation of the dependence at the aggregated annual level. This is done through a constraint
on the R-bias in preserving the lag-1 correlation of flows at the aggregated annual level, which is
usually effective in modeling the inter-annual dependence. This is expected to enable the
preservation of the various statistics at the aggregated annual level and, as a result, enhance the
preservation of storage capacity at higher demand levels. In general, the modeler can explicitly
introduce any appropriate constraints into the S-O framework, depending on the statistics to be
preserved at the monthly level and/or the annual level:
\[ \left| \text{R-Bias}(S_{\tau}^{k}) \right| \le \varepsilon_{S,\tau}^{k}; \quad \left| \text{R-Bias}(A^{k}) \right| \le \varepsilon_{A}^{k}; \quad \left| \text{R-Bias}(\rho) \right| \le \varepsilon_{\rho} \tag{14} \]
where \(S_{\tau}^{k}\) denotes the basic periodic statistical characteristic(s) at any site k (such as the mean,
standard deviation and skewness of a month) and \(\varepsilon_{S,\tau}^{k}\) is the allowable upper limit of the relative bias
(that can be specified by the modeler), for each month (τ) and each statistical characteristic
considered at each site k. In eq. (14), \(A^{k}\) represents the basic aggregated annual statistical
characteristic at any site k (such as the mean, standard deviation, skewness and autocorrelation),
while \(\varepsilon_{A}^{k}\) is the allowable upper limit of the relative bias at each site k (that can be specified by
the modeler), for each statistical characteristic (A) considered at the aggregated annual level. In
addition, \(\rho\) represents the site-to-site correlations and \(\varepsilon_{\rho}\) denotes the allowable upper limit of
the relative bias (that can be specified by the modeler) for the site-to-site correlations.
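As an illustration only, a check of the constraint on the relative bias of the annual lag-1 autocorrelation may be sketched in Python as follows. The function names, the tolerance and the sample values are hypothetical, the constraint is assumed to apply to the absolute value of the relative bias of the replicate mean, and the particular lag-1 estimator used is one common choice rather than necessarily the one adopted in this study.

```python
def annual_totals(monthly, periods_per_year=12):
    """Aggregate a monthly series (whole years only) into annual totals."""
    return [
        sum(monthly[i:i + periods_per_year])
        for i in range(0, len(monthly), periods_per_year)
    ]


def lag1_autocorrelation(series):
    """Lag-1 autocorrelation: lagged cross-products over the sum of squares."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1))
    den = sum((v - mean) ** 2 for v in series)
    return num / den


def satisfies_constraint(historical_value, replicate_values, tolerance):
    """True if |relative bias| of the replicate mean is within the tolerance."""
    mean_generated = sum(replicate_values) / len(replicate_values)
    r_bias = (mean_generated - historical_value) / historical_value
    return abs(r_bias) <= tolerance


# Made-up values: historical annual lag-1 of 0.30, three replicate estimates.
ok = satisfies_constraint(0.30, [0.28, 0.33, 0.29], tolerance=0.10)
violated = not satisfies_constraint(0.30, [0.10, 0.20], tolerance=0.10)
```

A candidate hybrid model that violates such a constraint would be treated as infeasible by the search, irrespective of its objective function values.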
The hybrid model parameter space of the multi-site multi-season hybrid streamflow model
(MHMABB) consists of two components, i.e., the parametric component at each site and the
non-parametric component, and is quite huge. The parameters of the parametric component of
the model can take combinations of real values within the unit circle resulting from multiple sites
and multiple seasons (12 in the case of monthly modeling). The non-parametric component, the matched
block bootstrap, contains bl within-year blocks and m window sizes. The
size of each of these blocks can take any integer value between 1 and 12, such that the sum of
all the within-year block sizes equals 12, and a reasonable range for the bandwidth is from 3 to 13.
Thus, the total number of HMABB models possible, considering the parametric
and non-parametric components together, is quite large.
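As an illustration only, the size of the non-parametric part of this parameter space may be sketched in Python as follows. The sketch assumes that the order of the within-year blocks matters and that every integer bandwidth from 3 to 13 is admissible; the function name compositions is hypothetical.

```python
def compositions(total):
    """Yield every ordered tuple of positive integers that sums to total."""
    if total == 0:
        yield ()
        return
    for first in range(1, total + 1):
        for rest in compositions(total - first):
            yield (first,) + rest


block_patterns = list(compositions(12))   # ordered within-year block sizes
n_patterns = len(block_patterns)          # 2**11 ordered patterns
window_sizes = range(3, 14)               # integer bandwidths 3..13 (assumed)
n_nonparametric = n_patterns * len(window_sizes)
```

Under these assumptions there are 2048 block-size patterns and 22528 non-parametric configurations, before the continuous PAR(1) parameters at each site and month are even considered.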
2.2.4. Solution Technique - NSGA-II
There is no known explicit functional relationship between the accuracy of preservation of the
critical drought characteristics and the reproduction of basic statistical characteristics of
streamflows and/or the stochastic model parameters, especially for the complex hybrid stochastic
model, HMABB, considered in this study. Hence, traditional optimization techniques cannot be
employed to find the optimal hybrid model. Moreover, the hybrid parameter space is too large
and complex to be explored using only simulations. In such problems, multi-objective
evolutionary algorithms (MOEA) are known to be appropriate, since the objective functions can
be explicitly evaluated by interacting with the simulation model. Moreover, their inherent ability
to handle complex problems, including features such as discontinuities, multi-modality, disjoint
feasible space and noisy functions makes MOEA appropriate for complex real world problems
(Fonseca and Fleming, 1995). Also, these algorithms are efficient and can obtain a number of
non-dominated solutions from a random initial population in a single run (Deb et al., 2002).
Moreover, discrete and continuous variables can be handled simultaneously, as
in the case of the hybrid parameter space (block sizes and window size being discrete and
parameters of the PAR(1) model being continuous). In the proposed simulation-optimization
framework (Fig. 1), the multi-objective evolutionary technique, Non-dominated Sorting Genetic
Algorithm - II (NSGA-II) developed by Deb et al. (2002), is adopted. Although the number of
alternative hybrid models to be searched appears to be very large, the NSGA-II based genetic
search used in this research work, being an efficient, robust and elitist non-dominated search
based approach, converges to the near Pareto-optimal solutions within a reasonable number of
evaluations.
The decision vector consists of both discrete and continuous variables represented within the
NSGA-II string as a chromosome. All the variables are coded in binary strings to represent both
the parametric component (such as ϕ1, ϕ2, . . ., ϕ12 of the PAR(1) model, defined in a continuous space)
and the non-parametric model parameters (such as one window parameter and the within-year block
sizes of the MABB model, defined in a discrete space). In the decision vector, the first discrete
variable in a chromosome represents the contemporaneous window parameter for the multi-site
multi-season HMABB model. The next twelve discrete variables represent the within-year block
sizes of a maximum possible 12 blocks, selected in such a way that the sum of the within-year
block sizes equals 12 (the total number of months in a year). If the number
of within-year blocks is less than 12, the remaining variables (out of 12) are set
as dummy variables. Moreover, if the sum of the within-year block sizes happens to be
greater than 12 in any of the chromosomes (in a given population), that chromosome is not
passed through the hybrid model simulator; instead, a large positive value is assigned
to the fitness function in order to eliminate that string. The next 12k continuous
variables in the chromosome represent the parametric component (PAR(1)) corresponding to the
k sites of the multi-site multi-season HMABB model.
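As an illustration only, the layout of the decision vector and the penalty for infeasible block sizes may be sketched in Python as follows. The sketch works on already-decoded numerical values rather than binary strings, represents dummy block variables by zeros, and penalizes any block-size sum different from 12 (only the case of a sum greater than 12 is described above); the names decode, evaluate, simulate and PENALTY are hypothetical.

```python
PENALTY = 1.0e9  # hypothetical "large positive value" for infeasible strings
MONTHS = 12


def decode(chromosome, n_sites):
    """Split a flat decision vector into window, block sizes and PAR(1) values."""
    window = int(chromosome[0])
    raw_blocks = [int(v) for v in chromosome[1:1 + MONTHS]]
    blocks = [b for b in raw_blocks if b > 0]  # zeros act as dummy variables
    phi_flat = chromosome[1 + MONTHS:1 + MONTHS + MONTHS * n_sites]
    phi = [phi_flat[s * MONTHS:(s + 1) * MONTHS] for s in range(n_sites)]
    return window, blocks, phi


def evaluate(chromosome, n_sites, simulate):
    """Return the two objective values, or penalties if the blocks are infeasible."""
    window, blocks, phi = decode(chromosome, n_sites)
    if sum(blocks) != MONTHS:
        return PENALTY, PENALTY  # the simulator is not run for this string
    return simulate(window, blocks, phi)


# Stand-in for the hybrid model simulator (returns made-up objective values).
def fake_simulator(window, blocks, phi):
    return 0.1, 0.2


feasible = [5] + [3, 4, 5] + [0] * 9 + [0.5] * 24
infeasible = [5] + [6, 6, 6] + [0] * 9 + [0.5] * 24
result_ok = evaluate(feasible, n_sites=2, simulate=fake_simulator)
result_bad = evaluate(infeasible, n_sites=2, simulate=fake_simulator)
```

Skipping the simulator for infeasible strings saves the cost of generating replicates for models that cannot be used in any case.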

2.2.5. Functioning of the Simulation-Optimization Framework
To evaluate the fitness functions based on the reservoir storage statistics, the generated
chromosomes from NSGA-II (each chromosome represents a multi-site HMABB model) are sent
to the synthetic simulation module. Once the synthetic replicates are generated, the simulation
module computes the required statistics (summary statistics, distribution related statistics,
correlations, storage capacity required at the specified demand levels) and sends the same to the
NSGA-II module to evaluate the fitness functions and the constraints formulated. Based on the
fitness function values evaluated, the solutions are then sorted according to the fast elitist-based
non-dominated approach (Deb et al., 2002) to identify the different levels of non-dominated
fronts. Reproduction based on tournament selection picks only the best among
the existing population. The cross-over and the mutation operations are performed to introduce
variability among the generations. To handle both the discrete and the continuous variable spaces, a
uniform cross-over operator is adopted. The crowded comparison operator enables diversity
preservation, and the elitism operator helps in significantly speeding up the search process and
preserving the good non-dominated solutions. For further details on the NSGA-II approach and
the genetic operators used, the readers are referred to Deb et al. (2002).
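As an illustration only, two of the ingredients mentioned above, namely Pareto dominance for minimization and the uniform cross-over operator, may be sketched in Python as follows. This is not the full NSGA-II procedure: only the first non-dominated front is extracted, and the crowding distance, tournament selection and mutation are omitted. All the names and values are hypothetical.

```python
import random


def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points):
    """Return the points not dominated by any other point (first front only)."""
    return [
        p for p in points
        if not any(dominates(q, p) for q in points if q is not p)
    ]


def uniform_crossover(parent_a, parent_b, rng, swap_probability=0.5):
    """Swap each gene between the two parents with a fixed probability."""
    child_a, child_b = list(parent_a), list(parent_b)
    for i in range(len(child_a)):
        if rng.random() < swap_probability:
            child_a[i], child_b[i] = child_b[i], child_a[i]
    return child_a, child_b


# Made-up objective pairs (sum of |R-Bias|, sum of R-RMSE).
points = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
front = non_dominated(points)

parent_a, parent_b = [1, 2, 3, 4], [5, 6, 7, 8]
child_a, child_b = uniform_crossover(parent_a, parent_b, random.Random(0))
```

Because the uniform operator exchanges whole genes position by position, it can be applied to discrete and continuous variables alike, which is consistent with its use for the mixed decision vector.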
3. Simulation based Multi-site HMABB (SMHMABB) Model
In this research work, the single-site multi-season HMABB model proposed by Srinivas and
Srinivasan (2006) has been extended to the multi-site multi-season HMABB (MHMABB) model.
The modeling steps involved in the synthetic generation of streamflows using the proposed
multi-site multi-season HMABB model are presented in section 2.2.1. The simulation based
MHMABB models are herein referred to as SMHMABB models. The SMHMABB model building
is divided into two stages. In stage 1, the parametric model parameters for each site are obtained
using the method of moments. In stage 2, the parameters of the nonparametric
model (i.e., the contemporaneous block size and the window size) are selected through numerous trials
based on the overall performance of the model. It is to be mentioned that in this study, only equal
within-year block sizes (1, 2, 3, 4 and 6) and the window sizes 3, 5, 7, 9, 11 and 13 are tried.
Thus, if equal within-year block sizes are used, the combinations of the
parametric and the nonparametric components result in 30 hybrid models. It is to be noted that
if unequal within-year block sizes are to be used, the total number of hybrid models will be
quite large and the manual inspection and selection will be extremely tedious. In fact, the
parameter space of the MHMABB model is huge, and it is under-explored in the case of the
simulation based MHMABB (SMHMABB) model, since the model parameters
of the parametric and the non-parametric components of the SMHMABB model are
selected independently and the residual space explored by the non-parametric model is limited.
The drawbacks of the simulation based hybrid models can be summarized as: (i) joint parameter
space exploration is not done; (ii) conditioning of variables for the reproduction of statistics at
the aggregated level is not possible; and (iii) the manual effort involved in the inspection and selection of
the alternative hybrid models is enormous.
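As an illustration only, the 30 candidate non-parametric configurations considered in the simulation based approach may be enumerated in Python as follows. The dictionary layout and the variable names are hypothetical; the block sizes and the window sizes are those listed above.

```python
from itertools import product

block_sizes = [1, 2, 3, 4, 6]          # equal within-year block sizes (months)
window_sizes = [3, 5, 7, 9, 11, 13]    # window sizes tried

# Each candidate repeats one block size until the 12 months are covered.
candidates = [
    {"blocks": [size] * (12 // size), "window": window}
    for size, window in product(block_sizes, window_sizes)
]
n_candidates = len(candidates)  # 5 block sizes x 6 window sizes
```

Each of these candidates is paired with the same PAR(1) parameters estimated in stage 1, so the count of 30 reflects only the non-parametric choices.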
4. Application to Upper Colorado River Basin
The efficacy of the AMHMABB model obtained from the proposed S-O framework in modeling
multi-site multi-season streamflows is evaluated by applying it to the monthly streamflows
measured at four streamflow stations located in the Upper Colorado River basin. The monthly
naturalized streamflows at the following four streamflow gauging stations for the 102-year
period (1906 to 2007) (source: http://www.usbr.gov/lc/region/g4000/NaturalFlow/index.html) are
considered for the multi-site multi-season streamflow modeling application: Colorado River near
Cisco, Utah (site 1); Green River at Green River, Utah (site 2); San Juan River near Bluff, Utah
(site 3); and Colorado River at Lees Ferry, Arizona (site 4). The locations of the stations are
presented in Table 1 and Fig. 2. These streamflow data sets have been chosen for the study
because they exhibit complex dependence, and also bimodality in a few months. Also, these
benchmark data sets have been used by Prairie et al. (2007) and Salas and Lee (2010) for
multi-site multi-season modeling of streamflows.
Figure 2: Location of Streamflow Stations - Colorado River Basin (source: Google Maps)
Table 1: Location of the selected stations for the multi-site multi-season flow modeling
The efficacy of the AMHMABB model is shown through: (i) a comparison with the selected
simulation based hybrid model (SMHMABB), in order to bring out the advantages of the
proposed model in terms of model performance due to the automation achieved by the S-O
framework and the preservation of inter-annual dependence; (ii) a comparison with the multi-site
parametric disaggregation model (MDM) fitted using SAMS2007 (Sveinsson et al., 2007), a
state-of-the-art stochastic streamflow modeling package; and (iii) a split-sample validation test to
assess the performance of the proposed AMHMABB model in capturing the statistics of the
multi-site streamflows that may occur in the uncertain future.
The performance comparisons are based on the ability of the models to preserve the following
statistics: (i) summary statistics (mean, standard deviation and skewness coefficient) at within-
year (monthly) and aggregated annual time scales at each site; (ii) marginal distribution of
monthly flows at each site; (iii) lag-1 autocorrelation at aggregated annual level at each site; (iv)
monthly serial correlations at each site; (v) serial state-dependent correlations (Sharma et al.,
1997) of monthly flows at each site (representing nonlinear dependence); (vi) lag-zero site-to-site
correlations at the monthly level; (vii) minimum and maximum monthly flows at each site; and
(viii) the multi-site deficit run characteristics (Yevjevich, 1972; Haltiner and Salas, 1988)
expressed in terms of (a) maximum deficit run sum; (b) maximum deficit run length; (c) mean
deficit run sum; and (d) mean deficit run length.
4.1. Models Considered and Selection of Model
For the AMHMABB and the SMHMABB, the details of the models considered for the selection
and the selected model for comparison are discussed in the following paragraphs.
AMHMABB model: Since this model is to be obtained from the S-O framework based on the
multi-objective evolutionary search using NSGA-II (Deb et al., 2002), a sensitivity analysis is
performed for the application example considered in this study. The sensitivity analysis on
MOGA-parameters (population size, number of generations, cross-over probability, mutation
probability and random seed) has been carried out with an intention to obtain the non-dominated
Pareto-optimal solutions. The MOGA parameters adopted based on the sensitivity analysis are as
follows: population size = 100; number of generations = 300; probability of cross-over = 0.6;
mutation probability = 0.001; random seed = 0.3. The non-dominated front obtained for the
application example using the evolutionary search based technique NSGA-II (Deb et al., 2002),
is presented in Fig. 3. In Fig. 3, the solutions A and C represent the AMHMABB models
corresponding to the two extremes on the non-dominated front, one with the "minimum ∑|R-
bias(MARS)|" and the other with the "minimum ∑R-RMSE(MARS)" respectively; the solution B
represents the AMHMABB model that corresponds to a typical compromise solution between
the two extremes. The compromise solution is the one located closest to the origin on
the Pareto front presented in Fig. 3. It is to be noted from Fig. 3 that the Pareto front has a
narrow range, resulting in practically very close solutions, which is plausibly due to the
inter-annual dependence constraint introduced into the framework. Hence, in this study, only the
AMHMABB-A solution is used for the comparisons, and it is hereafter referred to as
AMHMABB.
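As an illustration only, the selection of a compromise solution as the point of the non-dominated front closest to the origin may be sketched in Python as follows. The Euclidean distance in unscaled objective units is assumed here, since the distance measure is not specified, and the objective values attached to the labels A, B and C are made up and do not correspond to Fig. 3.

```python
import math


def closest_to_origin(front):
    """Return the label of the front member nearest to the origin."""
    return min(front, key=lambda label: math.hypot(*front[label]))


# Made-up (sum |R-Bias|, sum R-RMSE) pairs for three front members.
front = {"A": (0.10, 0.90), "B": (0.30, 0.40), "C": (0.80, 0.20)}
compromise = closest_to_origin(front)
```

If the two objectives differ strongly in magnitude, scaling them before computing the distance would be a reasonable alternative design choice.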
SMHMABB model: While the parametric component of the SMHMABB model is restricted to
PAR(1) at all the sites for partial pre-whitening, the non-parametric components of the
SMHMABB model are picked from one of the combinations resulting from: (i) equal sized
within-year contemporaneous blocks of 1,2,3,4,6 months and (ii) window sizes of 3,5,7,9,11,13.
It is to be noted that the PAR(1) model parameters are estimated independently at each station
using the method of moments and the within-year block sizes adopted for resampling the
residuals are equal and contemporaneous, since the unequal block sizes result in a large number
of possible hybrid models, which would be too cumbersome to evaluate manually. The above
combinations of parametric and non-parametric components result in 30 hybrid models. For the
purpose of comparison, the most competent model is chosen based on the reproduction of all the
temporal as well as the spatial statistics and the preservation of the deficit run characteristics.
The SMHMABB model selected herein for the Colorado River basin has the PAR(1) model at
each site as the parametric component, and a contemporaneous within-year block size of 4
months and a window size of 9 as the nonparametric model components used for resampling
the residuals (Table 2).
Table 2: Parameters for the selected multi-site HMABB models SMHMABB and
AMHMABB-Colorado River Basin
Figure 3: Pareto-front between R-Bias (MARS) and R-RMSE (MARS)
4.2. Comparison with SMHMABB Model
A comparison of model performance between the selected AMHMABB model and the selected
SMHMABB model is presented in the next few paragraphs. Table 2 summarizes the parameters
of the selected SMHMABB and AMHMABB models for Colorado River Basin, from which it
can be observed that the parameters of the AMHMABB model (both the parametric component
and the non-parametric component) are quite different from those of the SMHMABB model for
the streamflows at all the sites. This is because, in case of the SMHMABB model, the periodic
parameters (parametric component) are obtained at each site (independently) by fitting a PAR(1)
model using the method of moments (SAMS 2007). Following this, the multi-site residuals are
contemporaneously resampled using a multi-site HMABB with a set of equal within-year block
sizes and a pre-selected window size. Subsequently, the post-blackening operation is performed.
Thus, there are multiple steps, which have to be performed sequentially. In the case
of the AMHMABB model, on the other hand, the parameters of the parametric component are
obtained simultaneously with the contemporaneous non-parametric component using the
multi-objective evolutionary search technique, NSGA-II, efficiently guided by objective functions that
are based on multi-site critical deficit run sum preservation and the constraint on preservation of
inter-annual dependence. Moreover, in the proposed framework, a huge parameter space
is available for the search (unlike the simulation based SMHMABB model). As mentioned
earlier, the AMHMABB model yielding the minimum ∑|R-bias(MARS)| from the Pareto-optimal
front is used for the comparison with the SMHMABB model.
4.2.1. Reproduction of Summary Statistics and Dependence Structure
For both the AMHMABB and the SMHMABB models, the reproduction of summary statistics
and the preservation of the serial correlations at monthly level are presented in Figs. 4 and 5
respectively. For brevity, the results are shown for only two sites of the Colorado River basin,
since a similar trend of results is observed at the other two sites. The reproduction of the
summary statistics and the lag-1 autocorrelation of the aggregated annual flows are presented in
Table 3. The summary statistics of the flows are well reproduced by both the models at the
monthly level (Fig. 4) and at the aggregated annual level (Table 3). However, the standard
deviation at the annual level is deflated by the SMHMABB model. It is seen from Fig. 5 that the
monthly serial correlations at all the 4 lags are well preserved by the AMHMABB model,
whereas, the SMHMABB model shows considerable bias in preserving the lag-2, lag-3 and lag-4
correlations, and this bias is found to increase with the order of the lag. The lag-1 autocorrelation
at the aggregated annual level is well preserved at all the sites by the AMHMABB model (Table
3), due to the inter-annual dependence constraint introduced into the framework. On the other
hand, the SMHMABB model (Table 3) does not preserve the lag-1 autocorrelation at the annual
level at any of the four sites, since the simulation based hybrid model is not conditioned to
preserve the same. The lag-zero site-to-site correlations (Fig. 6) are well reproduced by both the
models, due to the residual resampling using contemporaneous within-year blocks. Although the
state-dependent correlations (Fig. 7) are well reproduced at all the four sites by both the models,
the SMHMABB model exhibits relatively more bias in a few months in comparison with the
AMHMABB model.
Table 3: Reproduction of the Aggregated Annual Flow Statistics - Comparison between
SMHMABB and AMHMABB models (values in parentheses denote the standard deviation
over 300 replicates)
Figure 4: Reproduction of Summary Statistics - A Comparison between AMHMABB and
SMHMABB Models (flow units: Mm3/month): a) at site 2; b) at site 4.
Figure 5: Preservation of Serial Correlations - A Comparison between AMHMABB and
SMHMABB Models: a) at site 2; b) at site 4.
Figure 6: Preservation of Lag-zero site-to-site monthly correlations - A Comparison
between AMHMABB and SMHMABB Models: a) site 1 to site 3; b) site 1 to site 4; c) site 2
to site 4.
Figure 7: Preservation of state-dependent correlations - A Comparison between
AMHMABB and SMHMABB Models: a) at site 2; b) at site 4.
4.2.2. Preservation of Marginal Distributions and minimum and maximum monthly flows
Typical results of preservation of the marginal distributions of monthly streamflows at site 1 for
July and December months are presented in Figs. 8 and 9 respectively and the same at site 2 for
August month is presented in Fig. 10. For brevity, the results are presented and discussed only
for flows of a few typical months that exhibit peakedness and/or bimodality. In general, it is
observed that the AMHMABB model is able to mimic the distribution characteristics of
historical flows very well (especially the non-normal features such as peakedness and
bimodality), when compared with the SMHMABB model. It is also observed that at all the four
sites, both the models show limited smoothing as well as limited extrapolation beyond the extremes
(minimum and maximum flows). The preservation of the minimum and the maximum flows at
the key site 4 is presented only for AMHMABB in Fig. 11, since the behavior of SMHMABB is
quite similar. Neither hybrid model preserves the minimum and the maximum flows
effectively, since only a very limited number of flows are generated beyond the historical extremes. In
summary, the automated model obtained from the simulation-optimization framework
(AMHMABB) outperforms the simulation based model SMHMABB.
Figure 8: Preservation of the marginal Distribution of the July month streamflows at site 1-
A Comparison between AMHMABB and SMHMABB Models (flow units: Mm3/month)
Figure 9: Preservation of the marginal Distribution of the December month streamflows at
site 1 - A Comparison between AMHMABB and SMHMABB Models (flow units:
Mm3/month)
Figure 10: Preservation of the marginal Distribution of the August month streamflows at
site 2 - A Comparison between AMHMABB and SMHMABB Models (flow units:
Mm3/month)
Figure 11: Preservation of a) minimum flows and b) maximum flows at site 4 - Model:
AMHMABB (flow units: Mm3/month)
4.2.3. Preservation of Multi-site Deficit Run Characteristics
The results of preservation of the multi-site deficit run characteristics are presented in Fig. 12 for
the selected SMHMABB model and the selected AMHMABB model. From Fig. 12, it is
observed that the selected AMHMABB model clearly outperforms the SMHMABB model in
preserving the number of runs, the critical and the mean deficit run characteristics at all the
truncation levels. Also, a good and consistent percent of exceedance of the various deficit run
characteristics (compared to their historical flow counterparts) is noted in case of the generated
streamflows from the AMHMABB model, which is not shown here for brevity. It is to be noted
that the selected AMHMABB model is able to preserve the critical run sum accurately owing to
the objective functions adopted (that are explicitly related to ∑|R-bias(MARS)| and
∑R-RMSE(MARS) over the various truncation levels). On the other hand, the SMHMABB model
shows high bias at the lower and/or the higher truncation levels. Herein, it is to be mentioned that
although no objective functions/constraints are introduced into the S-O framework with regard to
preserving the other deficit run characteristics (such as number of runs, maximum run length,
mean run sum and mean run length), these are well preserved by the selected AMHMABB
model, when compared with the simulation based SMHMABB model (Fig. 12).
Figure 12: Preservation of historical drought characteristics - A comparison between
SMHMABB and AMHMABB models
The automated multi-site hybrid model AMHMABB proposed in this study scores over the
simulation based hybrid model SMHMABB, plausibly due to the more effective combination of
the parametric and the non-parametric components. This combination results from the exploration
of the huge parameter space of HMABB, enabled by the objective function that minimizes the aggregated
errors in the preservation of the multi-site critical drought magnitude over a wide range of
threshold levels, subject to the constraint for the preservation of the inter-annual dependence of
the simulated streamflows at the various sites. The provision for unequal within-year block sizes
in the structure of the AMHMABB model is expected to offer a better representation of the
short-term persistence due to the recession of the seasonal ground water flows in the sub-basins
considered and the pronounced seasonality due to seasonal storage in snow packs. Moreover, the
implementation of the constraint on conditional preservation of the inter-annual (long-term)
dependence of the streamflows at the various sites considered, is expected to represent the over-
year response time of deep ground water runoff more accurately (Claps and Murrone, 20??). In
general, the better preservation of the streamflow drought durations at the various sites exhibited
by AMHMABB is indicative of the better representation of the storage and the response times of
the different catchments considered in this study. The observation that both the multisite drought
durations and the drought severities (magnitudes) at various threshold levels are better preserved
by the AMHMABB model indicates that the non-linear dynamics behind the propagation of the
streamflow droughts are better captured (Loon and Laaha, 2015).
4.3. Comparison with Multi-site Parametric Disaggregation Model (MDM)
The following paragraphs bring out the performance comparison between the selected
AMHMABB model obtained from the proposed S-O framework and the multi-site parametric
disaggregation model (MDM). Twenty-eight different multi-site parametric disaggregation
models arising out of the combinations resulting from: i) aggregated annual model at the key
site; ii) the available schemes for the spatial and the temporal disaggregations; and iii) the
sequence of disaggregation adopted, are fitted using SAMS 2007. The best model, based on the
preservation of spatial and temporal statistics as well as deficit run characteristics, is selected.
The selected MDM model adopts AR(1) for generation of the aggregated annual flows at the key
site, Lees Ferry on the Colorado river (Table 1, Fig. 2), and the Valencia and Schaake model for spatial
disaggregation of the generated annual streamflows at the key site, followed by temporal
disaggregation using Lane’s model. For brevity, only the results for the key site (site 4) are
presented in this section, since a similar trend of performance is observed for the other three sites
as well. For both AMHMABB and MDM, the reproduction of the summary statistics,
preservation of serial correlations, preservation of state-dependent correlations, marginal
distributions at the key site (site 4) and multi-site drought characteristics are presented in Figs.
13-17, respectively.
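As an illustration of the annual component of the MDM, a stationary AR(1) generator can be sketched as below. This is a minimal sketch under stated assumptions, not the SAMS 2007 implementation: the function name generate_ar1, its parameters and the use of normally distributed innovations are all hypothetical choices made for the example, and the spatial and temporal disaggregation steps are not shown.

```python
import math
import random


def generate_ar1(mean, std, phi, n_years, seed=0):
    """Generate n_years of annual flows from a stationary AR(1) process.

    x[t] = mean + phi * (x[t-1] - mean) + e[t], with e[t] ~ N(0, std^2 * (1 - phi^2)),
    so that the generated series has variance std^2 and lag-1 correlation phi.
    Normal innovations are an assumption of this sketch; nothing here prevents
    negative values, so a transformation may be needed for skewed flows.
    """
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie strictly between -1 and 1 for stationarity")
    rng = random.Random(seed)
    innovation_std = std * math.sqrt(1.0 - phi ** 2)
    # Start from the stationary distribution so that no warm-up period is needed.
    flow = rng.gauss(mean, std)
    flows = []
    for _ in range(n_years):
        flow = mean + phi * (flow - mean) + rng.gauss(0.0, innovation_std)
        flows.append(flow)
    return flows


# Hypothetical parameter values, chosen only to exercise the function.
annual_flows = generate_ar1(mean=100.0, std=20.0, phi=0.3, n_years=50, seed=1)
```

In a disaggregation scheme of the kind described above, such annual values at the key site would then be passed to the spatial and temporal disaggregation models.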
Fig 13: Reproduction of Summary Statistics for Colorado River Basin at Site 4 - A
comparison between AMHMABB model and Disaggregation model (Flow units -
Mm3/month)
Fig 14: Preservation of Serial Correlations for Colorado River Basin at Site 4 - A
comparison between AMHMABB model and Disaggregation model
Fig 15: Preservation of State-dependent Correlations for Colorado River Basin at Site 4 - A
Fig 16: Preservation of Marginal Distribution of March and June month flows at Site 4 for
Colorado River Basin - A comparison between Disaggregation model (MDM) and
AMHMABB models (Flow units - Mm3/month)
Fig 17: Preservation of multi-site drought characteristics (a) Number of runs; (b)
Maximum Run Length; (c) Maximum Run Sum for Colorado River Basin - A comparison
between Disaggregation model (MDM) and AMHMABB models
It is observed from Figure 13 that the monthly means and standard deviations of flows at Lees
Ferry (Site 4) are well reproduced by both the models, although the MDM model exhibits some
bias in a few months in terms of preserving the skewness coefficient. A detailed performance
comparison of the preservation of monthly serial correlations has been done in this study, but for
brevity, the results are presented only for the lag-1 and lag-2 (lower lags) and the lag-3 and lag-4
(higher lags) monthly serial correlations in Figure 14. In case of the AMHMABB model, it is
observed that both the lower lag (lag-1 and lag-2) and the higher lag (lag-3 and lag-4) correlations
are well preserved for Site 4. In case of MDM, the lower lag serial correlations are reasonably
well preserved, whereas the higher lag serial correlations are not preserved. This is because the
selected MDM is not designed to preserve the serial correlations
beyond lag-1 at the seasonal level. The results for the preservation of the lag-1 state-dependent
correlations for Site 4 are presented in Figure 15. It is observed from this figure that, in
general, the AMHMABB model reproduces the monthly lag-1 state-dependent
correlations very well. On the other hand, the disaggregation model fails to
preserve the same. Being a linear parametric model, the MDM is not expected to preserve the
state-dependent correlations that are indicative of the nonlinear dependence present in the data
(Sharma et al., 1997).
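The idea of a state-dependent correlation can be sketched as follows. The sketch assumes one common reading of the statistic: the correlation between the flow in the current month and the flow in the neighbouring month (the next month for "forward", the previous month for "backward"), computed only over those years in which the current-month flow lies above (or below) its median. The function names and the exact conditioning rule are assumptions of this example and may differ from the definition used in the study.

```python
import statistics


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def state_dependent_corr(current, neighbour, state="above"):
    """Correlation of paired flows, conditioned on the state of the current flow.

    current[i] and neighbour[i] are the flows of the current month and of the
    neighbouring month in year i. Only the years in which the current flow is
    above (state="above") or not above (state="below") its median are kept.
    """
    median = statistics.median(current)
    if state == "above":
        pairs = [(c, n) for c, n in zip(current, neighbour) if c > median]
    else:
        pairs = [(c, n) for c, n in zip(current, neighbour) if c <= median]
    if len(pairs) < 2:
        raise ValueError("need at least two years in the selected state")
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Toy data (not from the study): dependence differs between the two states,
# which a single unconditional correlation would average out.
current = [1, 2, 3, 4, 5, 6]
neighbour = [3, 2, 1, 8, 10, 12]
above = state_dependent_corr(current, neighbour, "above")
below = state_dependent_corr(current, neighbour, "below")
```

In the toy data the "above" and "below" correlations have opposite signs, which is the kind of asymmetry that a linear parametric model cannot be expected to reproduce.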
Typical results of preservation of the marginal distribution of flows of the Colorado River Basin
for the AMHMABB model and the multi-site disaggregation model (MDM) are presented in
Figure 16. For brevity, the results are presented only for the March and June month flows. It is
observed that the AMHMABB model is able to mimic the distribution characteristics of
historical flows very well, when compared to MDM. However, being a parametric model, MDM
offers better smoothing when compared to AMHMABB (hybrid model). Moreover, it is
observed that both the models show only limited extrapolation near the minimum and/or
maximum flows. It is to be noted that the selected AMHMABB model is found to preserve the
statistical characteristics of multi-site multi-season streamflows better than the selected multi-site
disaggregation model (MDM), although the objective functions used in the proposed S-O
framework for the AMHMABB model are based only on the preservation of the multi-site critical
deficit run sum.

The results for preservation of the multi-site critical deficit run characteristics are presented in
Figure 17 for both AMHMABB and MDM. From Figure 17, it is observed that the selected
AMHMABB model is able to preserve the deficit run characteristics better compared to the
MDM. It is to be noted that although there are no objective functions/constraints used to achieve
the preservation of the other deficit run characteristics (such as number of runs and maximum
run length), these characteristics are also well preserved by the AMHMABB model when
compared to the MDM. Overall, it is observed that the selected AMHMABB model shows better
performance in simulating the historical streamflows of the Colorado river basin, when
compared with the selected disaggregation model (MDM). The better performance of the
AMHMABB model in comparison with the MDM may be attributed to the better preservation of
the marginal distributions, the higher lag serial correlations, skewness coefficient of aggregated
annual flows and the state-dependent correlations of the monthly flows.
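The deficit run characteristics compared above (number of runs, maximum run length and maximum run sum) can be computed for a single flow series as sketched below. This is a minimal single-site sketch: the function name deficit_runs is hypothetical, the truncation level is passed as an absolute flow value, and the multi-site aggregation used in the study is not reproduced here.

```python
def deficit_runs(flows, threshold):
    """Return (number of runs, maximum run length, maximum run sum).

    A deficit run is a maximal sequence of consecutive periods in which the
    flow stays below the threshold (truncation level). The run length is the
    number of periods in the run and the run sum is the accumulated deficit
    (threshold minus flow) over the run.
    """
    lengths, sums = [], []
    length, total = 0, 0.0
    for flow in flows:
        if flow < threshold:
            length += 1
            total += threshold - flow
        elif length > 0:
            # The run has just ended: store it and reset the counters.
            lengths.append(length)
            sums.append(total)
            length, total = 0, 0.0
    if length > 0:
        # The series ended while still in deficit.
        lengths.append(length)
        sums.append(total)
    if not lengths:
        return 0, 0, 0.0
    return len(lengths), max(lengths), max(sums)


# Toy series (not from the study): two runs below a threshold of 4.
print(deficit_runs([5, 3, 2, 6, 0, 7], 4))  # (2, 2, 4.0)
```

Repeating the calculation for a range of truncation levels, for the historical record and for each synthetic replicate, gives the kind of comparison shown in the drought characteristic figures.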
4.4 Split-sample Validation
Split sample validation is conducted with a view to ensure that the AMHMABB model obtained
from the simulation-optimization framework is able to capture the repeatable statistical structure
present in the historical streamflow data. In other words, this kind of validation will endorse the
adaptability of the proposed model for the possible streamflow sequences that may occur in the
uncertain future. The split sample validation is carried out in two phases, namely, calibration and
validation, using the 102-year multisite monthly streamflows measured at the four streamflow
stations located on the Upper Colorado River basin (Table 1, Fig. 2). In the calibration phase, the
first 60 years of the streamflow data are employed in obtaining the parameters of the parametric
and the non-parametric components using the S-O framework. The AMHMABB model obtained
in the calibration stage is then used to model the validation data set (remaining 42 years of
historical streamflows). The stochastic model tested is considered acceptable for practical use, if
it can provide a very good simulation of the measured streamflows at the calibration as well as
the validation phases by way of reproducing the basic statistical characteristics of the streamflow
data as well as preserving both the critical and the mean deficit run characteristics obtained from
the historical flow data, at various truncation levels, with minimum errors. For brevity, only the
results for the Site 4 are presented in this section since similar statistical performance is
observed for the other sites as well.
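The chronological split described above can be sketched in a few lines. The function name split_sample and the list-of-years data layout are assumptions of this example; in practice each yearly record would hold the monthly flows at the four sites.

```python
def split_sample(yearly_records, n_calibration):
    """Split chronologically ordered yearly records into two consecutive parts.

    The first n_calibration years form the calibration set and the remaining
    years form the validation set. The order is kept, since shuffling would
    destroy the inter-annual dependence that the model is meant to capture.
    """
    if not 0 < n_calibration < len(yearly_records):
        raise ValueError("n_calibration must leave at least one year in each set")
    return yearly_records[:n_calibration], yearly_records[n_calibration:]


# Placeholder records: year indices 1..102 stand in for the yearly flow data.
years = list(range(1, 103))
calibration, validation = split_sample(years, 60)
print(len(calibration), len(validation))  # 60 42
```

The model parameters would be obtained from the calibration part only, and the same statistics would then be evaluated on the validation part without re-fitting.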
Fig 18: Reproduction of Summary Statistics for Colorado River Basin (Calibration) at Site 4 -
Model: AMHMABB
Fig 19: Reproduction of Summary Statistics for Colorado River Basin (Validation) at Site 4 -
Model: AMHMABB
Fig 20: Preservation of Serial Correlations for Colorado River Basin (Calibration) at Site 4 -
Model: AMHMABB
Fig 21: Preservation of Serial Correlations for Colorado River Basin (Validation) at Site 4 -
Model: AMHMABB
Fig 22: Preservation of Marginal Distribution of the February and April month flows for
Colorado River Basin (Calibration and Validation) at Site 4 - Model: AMHMABB
Fig 23: Preservation of multi-site drought characteristics (Number of runs and Maximum Run
Sum) for Colorado River Basin (Calibration and Validation) - Model: AMHMABB
The results of the reproduction of the summary statistics, the preservation of the dependence
structure of the historical streamflows and the preservation of marginal distributions for the
calibration and validation data sets are presented in Figs. 18-22. It is observed from Figures 18
and 19 that the mean and the standard deviation are well reproduced for all the months in case of
both calibration and validation. On the other hand, although the skewness is well preserved in the
calibration dataset, it is slightly deflated during three of the high skewness months (November,
December and January) in case of validation. In the calibration stage, the AMHMABB model is
able to preserve the lower lag (lag-1 and lag-2) as well as the higher lag (lag-3 and lag-4) serial
correlations (Fig. 20) well, although a small bias is noted in a few months, in case of higher lag
cross-year serial correlations. In case of validation, some bias is observed in the higher
lag serial correlations (lag-3 and lag-4) in a few months (Fig. 21). It may be observed from
Figure 22 that AMHMABB is able to mimic the distribution characteristics of the monthly
historical flows in both calibration and validation data sets. However, it may be noted that in
both calibration and validation stages, the AMHMABB model exhibits only very limited
extrapolation beyond the extremes.
For brevity, only the results of the preservation of number of runs, maximum (critical) run length
and the maximum (critical deficit) run sum for both calibration and validation datasets are
presented in Figure 23. It can be observed that the AMHMABB model is able to preserve the
maximum run sum well at all the truncations considered in calibration as well as validation. This
is expected in case of calibration data set, since the proposed model uses an objective function
that is explicitly related to preserving the critical deficit run sum characteristics. However, it can
be observed that in case of validation also, the maximum run sum at the various truncation
levels specified is preserved; moreover, the number of runs is well preserved in both the
calibration and the validation stages (Fig. 23), although it is not explicitly specified in the
objective functions or constraints of the framework. It is to be noted that the critical run length is
slightly underestimated at both the calibration as well as the validation stages (Fig. 23). Thus, the
split sample validation performed in this study brings out the ability of the proposed hybrid
multi-season streamflow model (AMHMABB) in predicting the statistical as well as the multi-
site critical deficit run characteristics of the multi-site streamflows likely to occur in future. It
also shows that the hybrid model parameters obtained through evolutionary search using the
maximum run sum based objective functions and the constraint related to the preservation of the
inter-annual dependence, do not result in overfitting the calibration data set.
5. Summary and Conclusions
The simulation-optimization (S-O) framework developed for modeling the single-site multi-
season streamflows by Srivastav and Srinivasan (2011) is extended to modeling the multi-site
multi-season streamflows. Moreover, the single-site streamflow simulation model HMABB
proposed by Srinivas and Srinivasan (2006) is extended to the multi-site streamflow simulation
model and the same is also used as the simulation module within the proposed S-O framework
for modeling the multi-site multi-season streamflows.
The multi-objective optimization model formulated is the driver and the multi-site, multi-season
hybrid matched block bootstrap model is the simulation engine within this framework. In
addition to the constraints on the hybrid model parameter space, some specific constraints
regarding the preservation of certain statistics (such as inter-annual dependence or skewness of
aggregated annual flows) are introduced explicitly into the modeling framework with a view to
arrive at a hybrid stochastic model with an improved performance in terms of preserving the
statistics at more than one level and consequently, preserving the deficit run (drought)
characteristics accurately. A robust and efficient evolutionary search based technique, namely,
non-dominated sorting based genetic algorithm (NSGA - II) (Deb et al., 2002) is employed as the
solution technique for the multi-objective optimization within the S-O framework. The use of the
deficit sum related objective functions and the constraints imposed explicitly into the simulation-
optimization framework apparently enable the evolutionary search to effectively explore the
wide parameter space of the multi-site, multi-season hybrid model HMABB. The efficacy of the
proposed framework in simulating the multi-site multi-season streamflows is shown through a
case example from the Colorado river basin.
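The notion of a Pareto front that underlies the multi-objective search can be illustrated with a simple non-dominated filter. This is only a sketch of the dominance test for minimised objectives; it is not NSGA-II itself, which additionally ranks solutions into successive fronts, uses crowding distance and evolves the population through selection, crossover and mutation. The function names and the toy objective values are hypothetical.

```python
def dominates(a, b):
    """True if solution a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that are not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Toy objective pairs (not results from the study), e.g. two error measures.
candidates = [(1, 5), (2, 3), (4, 1), (3, 4), (4, 5)]
print(pareto_front(candidates))  # [(1, 5), (2, 3), (4, 1)]
```

Each point on the resulting front represents a trade-off in which one error measure cannot be reduced without increasing the other, which is why a selection step among the Pareto-optimal solutions is still needed.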
The proposed hybrid model AMHMABB preserves the temporal statistics (summary statistics,
linear dependence structure, state-dependent correlations (non-linear dependence) and the
marginal distributions at each site at monthly and aggregated annual levels) as well as the spatial
statistics (site-to-site lag-zero correlations) very well. Also, the other deficit run characteristics
namely, the number of runs, the maximum run length, the mean run sum and the mean run length
are well preserved by the AMHMABB model.
Overall, the AMHMABB model (obtained from the proposed S-O framework) is able to show
better streamflow modeling performance when compared with the simulation based SMHMABB
model (which is an extension of the single-site HMABB model proposed by Srinivas and
Srinivasan, 2006). The improved performance of the proposed AMHMABB over the
SMHMABB model is plausibly due to the significant role played by: (i) the objective functions
related to the preservation of deficit run sum, which drive (direct) the search effectively; (ii)
the huge hybrid model parameter space available for the search and (iii) the constraint on the
preservation of the inter-annual dependence. Further, the split-sample validation results indicate
that the AMHMABB model is able to perform well in both calibration and validation phases,
indicating that the parameters derived from the proposed S-O framework are robust in predicting
the statistical and the critical deficit characteristics of the multi-site multi-season streamflows
likely to occur in the uncertain future. Moreover, the proposed AMHMABB model is found to
perform better in preserving the statistical as well as the critical deficit run
characteristics of the observed flows, when compared with the multi-site parametric
disaggregation model (MDM).
Limitations and/or Future extensions of the AMHMABB models
1. The model does not generate values that are quite different from the observed flows and
hence only limited smoothing is achieved. One way of alleviating this issue is to apply
suitable perturbation methods to the resampled multi-site residuals. The perturbations
could be restricted to each within-year block or applied to the entire set of residuals.
2. Only a limited number of flows are generated beyond the extremes.
3. Although the AMHMABB is automated and searches the multi-dimensional space
effectively, it is computationally demanding. Parallel computing could be resorted to in
order to reduce the computational time.
4. In place of the linear parametric model, non-linear regression based models or data-
driven models such as ANN can be adopted.
5. Instead of the matched block bootstrap for resampling the residuals, some of the other
bootstrap methods such as multisite maximum entropy bootstrap (Srivastav et al., 2014)
can be tried.
Acknowledgments
The authors wish to thank the Indian Institute of Technology Madras, Chennai, India for the
continuous support and facilities offered to carry out this research work. The help rendered by
Prof. V.V. Srinivas, Indian Institute of Science, Bengaluru, India through sharing some of the
computer codes for verifying and validating the stochastic model is gratefully acknowledged.
The constructive suggestions and the insightful comments of the anonymous reviewers, the
associate editor and the Editor, Andreas Bardossy, were helpful in improving the quality of the
manuscript. The first author wishes to thank VIT University for providing the required
computational facilities through RGEMS.
References
Azadivar, F., Tompkins, G., 1999. Simulation optimization with qualitative variables and
structural model changes - a genetic algorithm approach. European Journal of Operational
Research 113, 169-182.
Back, T., Schwefel, H. P., 1993. An overview of evolutionary algorithms for parameter
optimization. Evolutionary Computation 1 (1), 1-24.
Carlstein, E., Do, K. A., Hall, P., Hesterberg, T., Kunsch, H., 1998. Matched-block bootstrap for
dependent data. Bernoulli 4, 305-328.

Carson, Y., Maria, A., 1997. Simulation optimization: methods and applications. Proceedings of
the 1997 Winter Simulation Conference, 118-126.
Chen, B. S., Lee, B. K., Peng, S. C., 2002. Maximum likelihood parameter estimation of
FARIMA processes using the genetic algorithm in the frequency domain. IEEE Transactions on
Signal Processing 50 (9), 2208-2220.
Chen, L., Singh, V.P., Guo, S.L., Zhou, J.Z., Zhang, J.H., 2015. Copula-based method for multisite
monthly and daily streamflow simulation. Journal of Hydrology 528, 369-384.
Claps, P., Murrone, F., 1994. Optimal parameter estimation of conceptually-based streamflow
models by time series aggregation. Stochastic and Statistical Methods in Hydrology and
Environmental Engineering, Springer, Netherlands, 421-434.
Cortez, P., Rocha, M., Neves, J., 2004. Evolving time series forecasting ARMA models. Journal
of Heuristics 10 (4), 415-429.
Davison, A. C., Hinkley, D. V. (Eds.), 1997. Bootstrap methods and their application. Cambridge
University Press, Cambridge.
Deb, K., Pratap, A., Agrawal, S., Meyarivan, T., 2002. A fast and elitist multi-objective genetic
algorithm NSGA-II. IEEE Transactions on Evolutionary Computation 6 (2), 182-197.
Efstratiadis, A., Dialynas, Y.G., Kozanis, S., Koutsoyiannis, D., 2014. A multivariate
stochastic model for the generation of synthetic time series at multiple time scales reproducing
long-term persistence, Environmental Modeling & Software 62, 139-152.
Fiering, J. D., 1964. Multivariate technique for synthetic hydrology. Journal of Hydraulic
Engineering Division, ASCE 90, 43-60.

Fonseca, C. M., Fleming, P. J., 1995. An overview of evolutionary algorithms in multi-objective
optimization. Evolutionary Computation Journal 3 (1), 1-16.
Grantz, K., Rajagopalan, B., Clark, M., Zagona, E., 2006. A technique for incorporating large-
scale climate information in basin-scale ensemble streamflow forecasts. Water Resources
Research 41, W10410.
Grygier, J. C., Stedinger, J. R., 1988. Condensed disaggregation procedures and conservation
corrections for stochastic hydrology. Water Resources Research 24 (10), 1574-1584.
Haltiner, J. P., Salas, J. D., 1988. Development and testing of a multivariate seasonal ARMA
(1,1) model. Journal of Hydrology 104, 247-272.
Harms, A. A., Campbell, T. H., 1967. An extension to the Thomas-Fiering model for the
sequential generation of streamflow. Water Resources Research 3 (3), 653-661.
Hipel, K. W., Mcleod, A. I., 1994. Time series modeling of water resources and environmental
systems. Elsevier Science.
Ilich, N., 2014. An effective three-step algorithm for multi-site generation of weekly
stochastic hydrologic time series. Hydrol. Sci. J. 59 (1), 85-98.
Ilich, N., Despotovic, J., 2008. A simple method for effective multi-site generation of stochastic
hydrologic time series. Stochastic Environ Res Risk Assess 22 (2), 265-279.
Izzeldin, M., Murphy, A., 2000. Bootstrapping the small sample critical values of the rescaled
range statistic. The Economic and Social Review 31 (4), 351-359.
Koutsoyiannis, D., 1999. A nonlinear disaggregation model with a reduced parameter set for
simulation of hydrologic series. Water Resources Research 28 (12), 3175-3191.

Koutsoyiannis, D., 2000. A generalized mathematical framework for stochastic simulation and
forecast of hydrologic time series. Water Resources Research 36 (6), 1519-1533.
Lall, U., 1995. Recent advances in nonparametric function estimation. Reviews of Geophysics,
1093-1102.
Lall, U., Sharma, A., 1996. A nearest neighbour bootstrap for resampling hydrologic time series.
Water Resources Research 32 (3), 679-693.
Lane, W. L. (Ed.), 1982. Corrected parameter estimates for disaggregation schemes. In V. P.
Singh, Statistical Analysis of Rainfall and Runoff. Water Resources Publications, Littleton,
Colorado.
Langousis, 2006. A stochastic methodology for generation of seasonal time series reproducing
over year scaling behavior. Journal of Hydrology 322 (1-4), 138-154.
Lee, T., Salas, J. D., Prairie, J., 2010. An enhanced nonparametric streamflow disaggregation
model with genetic algorithm. Water Resources Research 46 (W08545),
doi:10.1029/2009WR007761.
Li, C. and Singh, V.P., 2014. A Multi-model Regression-sampling Algorithm for generating rich
streamflow scenarios, Water Resources Research, 50, 5958-5979.
Loon, V.A., Laaha, G., 2015. Hydrological drought severity explained by climate and catchment
characteristics. J. Hydrol. 526, 3-14.
Marković, Đ., Plavšić, J., Ilich, N., Ilić, S., 2015. Non-parametric Stochastic Generation of
Streamflow Series at Multiple Locations. Water Resources Management 29, 4787.
doi:10.1007/s11269-015-1090-z
Matalas, N. C., 1967. Mathematical assessment of synthetic hydrology. Water Resources
Research 3 (4), 937-945.
Mejia, J. M., Rousselle, J., 1976. Disaggregation models in hydrology revisited. Water
Resources Research 13 (2), 679-693.
Minerva, T., Poli, I., 2001. Building ARMA models with genetic algorithms. Lecture Notes in
Computer Science 2037, 335-342.
Moon, Y. I., Lall, U., 1994. Kernel function estimator for flood frequency analysis. Water
Resources Research 30 (11), 3095-3103.
Oliveira, G. C., Kelman, J., Pereira, M. V. F., Stedinger, J. R., 1988. A representation of spatial
correlations in large stochastic seasonal streamflow models. Water Resources Research 24 (5),
781-785.
Ong, C. S., Huang, J. J., Tzeng, G. H., 2005. Model identification of ARIMA family using
genetic algorithms. Applied Mathematics and Computation 164 (3), 885-912.
Peng, P., Chen, Q., 2003. Improved genetic algorithm and application to ARMA modeling. SICE
Annual Conference in Fukui August 4-6.
Pereira, M. V. F., Oliveira, G. C., Costa, C. C. G., Kelman, J., 1984. Stochastic streamflow
models for hydroelectric systems. Water Resources Research 20 (3), 379-390.
Pierreval, H., Paris, J. L., 2000. Distributed evolutionary algorithms for simulation optimization.
IEEE Transaction on Systems, Man and Cybernetics 20 (11), 15-24.

Prairie, J., Rajagopalan, B., Lall, U., Fulp, T., 2007. A stochastic nonparametric technique for
space-time disaggregation of streamflows. Water Resources Research 43 (3).
Prairie, J., Rajagopalan, B., Fulp, T. J., Zagona, E. A., 2006. Modified k-NN model for stochastic
stream flow simulation. Journal of hydrology 11 (4), 371-378.
Rajagopalan, B., Lall, U., 1999. A k-nearest-neighbour simulator for daily precipitation and
other weather variables. Water Resources Research 35 (10), 3089-3101.
Rasmussen, P. F., Salas, J. D., Fagherazzi, L., Rassam, J.-C., Bobee, B., 1996. Estimation and
validation of contemporaneous PARMA models for streamflow simulation. Water Resources
Research 32 (10), 3151-3160.
Rolf, S., Sprave, J., Urfer, W., 1997. Model identification and parameter estimation of ARMA
models by means of evolutionary algorithms. In Proc. of IEEE/IAFE Conf. Computational
Intelligence for Financial Engineering (CIFEr'97), 237-243.
Salas, J. D. (Ed.), 1993. Analysis and modeling of hydrologic time series, in Handbook of
Hydrology, edited by D. R. Maidment, McGraw Hill, New York.
Salas, J. D., Delleur, J. W., Yevjevich, V., Lane, W. (Eds.), 1980. Applied Modeling of
Hydrologic Time Series. Water Resources Publications, Littleton, CO, USA.
Salas, J. D., Guillermo, Q., III, T., Bartolini, P., 1985. Approaches to multivariate modeling of
water resources time series. Water Resources Research 21 (4), 683-708.
Santos, E. G., Salas, J. D., 1992. Stepwise disaggregation scheme for synthetic hydrology.
Journal of Hydraulic Engineering 118 (5), 765-784.

Sharma, A., Tarboton, D. G., Lall, U., 1997. Streamflow simulation: A nonparametric approach.
Water Resources Research 33 (2), 291-308.
Silverman, B. W., 1986. Density estimation for statistics and data analysis: monograph on
statistics and applied probability.
Singhrattna, N., Rajagopalan, B., Clark, M., Kumar, K. K., 2005. Forecasting Thailand summer
monsoon rainfall. International Journal of Climatology 25 (5), 649-664.
Srinivas, V. V., Srinivasan, K., 2000. Post-blackening approach for modeling dependent annual
streamflows. Journal of Hydrology 230 (1-2), 86-126.
Srinivas, V. V., Srinivasan, K., 2001. A hybrid stochastic model for multiseason streamflow
simulation. Water Resources Research 37 (10), 2537-2549.
Srinivas, V.V., Srinivasan, K., 2005. Hybrid moving block bootstrap for stochastic simulation of
multi-site multi-season streamflows. J. Hydrol. 302 (1-4), 307-330.
Srinivas, V. V., Srinivasan, K., 2006. Hybrid matched-block bootstrap for stochastic simulation
of multi-season streamflows. Journal of Hydrology 329 (1-2), 1-15.
Srivastav, R.K., Simonovic, S.P., 2014. An analytical procedure for multi-site, multiseason
streamflow generation using maximum entropy bootstrapping, Environmental Modelling &
Software, 59, 59-75.
Srivastav, R.K., Srinivasan, K., Sudheer, K.P., 2011. Simulation-Optimization framework
for multi-season hybrid stochastic models, Journal of Hydrology 404 (3-4), 209-225.
Stedinger, J. R., Pei, D. (Eds.), 1982. An annual-monthly streamflow model for incorporating
parameter uncertainty into reservoir simulation. Time Series Methods in Hydroscience, Dev.
Water Sci., Elsevier, New York.
Stedinger, J. R., Pei, D., Cohn, T. A., 1985. A condensed disaggregation model for incorporating
parameter uncertainty into monthly reservoir simulation. Water Resources Research 21 (5), 665-
675.
Stedinger, J. R., Vogel, R.M., 1984. Disaggregation procedures for generating serially correlated
flow vectors. Water Resources Research 20 (11), 47-56.
Sudheer, K.P., Srinivasan, K., Neelakantan, T.R., Srinivas, V.V., 2008. A non-linear data-driven
model for synthetic generation of annual stream-flows. Hydrol. Process. 22 (12), 1831-1845.
Tarboton, D. G., Sharma, A., Lall, U., 1998. Disaggregation procedures for stochastic hydrology
based on nonparametric density estimation. Water Resources Research 34 (1), 107-119.
Tekin, E., Ihsan, S., 2004. Simulation optimization: A comprehensive review on theory and
applications. IIE Transactions 36 (11), 1067-1081.
Ünes, F., Demirci, M., Kişi, Ö. (2015). Prediction of Millers Ferry Dam reservoir level in USA
using Artificial Neural Network. Periodica Polytechnica Civil Engineering, 59 (3) 309-318.
Valencia, Schaake, 1973. Disaggregation processes in stochastic hydrology. Water Resources
Research 9 (3), 580-585.
Vinod, H.D., 2006. Maximum entropy ensembles for time series inference in economics,
J. Asian Econ. 17 (6), 955-978.
Vogel, R. M., Shallcross, A. L., 1996. The moving blocks bootstrap versus parametric time
series models. Water Resources Research 32 (6), 1875-1882.
Voss, M. S., Feng, X., 2002. ARMA model selection using particle swarm optimization and AIC
criteria. 15th Triennial World Congress, Barcelona, Spain.
Yates, D., Gangopadhyay, S., Rajagopalan, B., Strzepek, K., 2003. A technique for generating
regional climate scenarios using a nearest neighbor bootstrap. Water Resources Research 39 (7),
1199.
Yevjevich, V. (Ed.), 1976. Stochastic Processes in Hydrology. Water Resources Publications,
Fort Collins, CO.
Zelenhasic, E., 2002. On the Extreme Streamflow Drought Analysis, Water Resources
Management, 16, 105-132.
Zelenhasic, E. and Salvai, A., 1987. A Method of Streamflow Drought Analysis, Water
Resources Research, 23(1), 156-168.

Figure 1: Simulation-Optimization Framework for Stochastic Modeling of
Multi-site Multi-season Streamflows
Figure 2: Location of Streamflow Stations - Colorado River Basin (source: Google Maps)
[Figure 3 plot: R-RMSE in MARS (vertical axis, 2.08 to 2.20) against R-Bias in MARS (horizontal axis, 0.46 to 0.60); series: Pareto Front, Selected Solutions (A, B, C)]
Figure 3: Pareto-front between R-Bias(MARS) and R-RMSE(MARS)

[Figure 4 plots, panels a) and b): monthly Mean, Standard Deviation and Skewness against Month (Oct to Sep); series: Historical, AMHMABB, SMHMABB]
Figure 4: Reproduction of Summary Statistics - A Comparison between AMHMABB and
SMHMABB Models (flow units: Mm3/month): a) at site 2; b) at site 4
[Figure 5 plots, panels a) Site 2 and b) Site 4: Lag 1, Lag 2, Lag 3 and Lag 4 Correlation (0.0 to 1.0) against Month (Oct to Sep); series: Historical, AMHMABB, SMHMABB]
Figure 5: Preservation of Serial Correlations - A Comparison between AMHMABB and
SMHMABB Models: a) at site 2; b) at site 4
[Figure 6 plots, panels a) Site 1 to Site 3, b) Site 1 to Site 4 and c) Site 2 to Site 4: Lag 0 correlation against Month; series: Hist, SMHMABB, AMHMABB]
Figure 6: Preservation of Lag-zero site-to-site monthly correlations - A Comparison between
AMHMABB and SMHMABB Models: a) site 1 to site 3; b) site 1 to site 4; c) site 2 to site 4
[Figure 7 plots, panels a) and b): Above & Backward, Above & Forward, Below & Backward and Below & Forward correlations against Month; series: Historical, AMHMABB, SMHMABB]
Figure 7: Preservation of state-dependent correlations - A Comparison between
AMHMABB and SMHMABB Models: a) at site 2; b) at site 4
Figure 8: Preservation of the marginal Distribution of the July month streamflows at site 1 - A Comparison between AMHMABB and
SMHMABB Models (flow units: Mm3/month)
Figure 9: Preservation of the marginal Distribution of the December month streamflows at site 1 - A Comparison between
AMHMABB and SMHMABB Models (flow units: Mm3/month)
Figure 10: Preservation of the marginal Distribution of the August month streamflows at site 2 - A Comparison between AMHMABB
and SMHMABB Models (flow units: Mm3/month)
Figure 11: Preservation of a) minimum flows and b) maximum flows at site 4 -
Model: AMHMABB (flow units: Mm3/month)
Figure 12: Preservation of Deficit Run (Drought) Characteristics - A Comparison between
AMHMABB and SMHMABB Models
(MARL: maximum run length; MARS: maximum run sum; MERL: mean run length; MERS: mean run sum)
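The four deficit-run statistics named in the Figure 12 caption can be illustrated with a minimal sketch. It assumes that a deficit run is a maximal stretch of consecutive flows below a truncation level, that run length counts the periods in the run, and that run sum accumulates the shortfall below the level. The function name `deficit_run_stats` and its arguments are hypothetical and are not taken from the manuscript.

```python
from typing import Dict, List, Sequence


def deficit_run_stats(flows: Sequence[float], threshold: float) -> Dict[str, float]:
    """Summarize deficit runs (consecutive flows below `threshold`).

    Assumed definitions (not confirmed by the manuscript):
      run length = number of consecutive periods below the threshold
      run sum    = cumulative shortfall (threshold - flow) over the run
    """
    lengths: List[int] = []
    sums: List[float] = []
    run_len, run_sum = 0, 0.0

    for q in flows:
        if q < threshold:
            # Still inside a deficit run: extend it.
            run_len += 1
            run_sum += threshold - q
        elif run_len:
            # The run just ended: record it and reset the counters.
            lengths.append(run_len)
            sums.append(run_sum)
            run_len, run_sum = 0, 0.0

    if run_len:
        # The series ended while still in deficit.
        lengths.append(run_len)
        sums.append(run_sum)

    n = len(lengths)
    return {
        "n_runs": n,
        "MARL": max(lengths, default=0),       # maximum run length
        "MARS": max(sums, default=0.0),        # maximum run sum
        "MERL": sum(lengths) / n if n else 0.0,  # mean run length
        "MERS": sum(sums) / n if n else 0.0,     # mean run sum
    }
```

For example, `deficit_run_stats([5, 3, 2, 6, 1, 7], 4)` finds two runs (lengths 2 and 1, deficit sums 3 and 3). Applying the same function to the historical record and to each synthetic replicate would give the kind of comparison the figure reports.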
Comparison to Disaggregation Model
Fig 13: Reproduction of Summary Statistics for Colorado River Basin at Site 4 - A comparison
between AMHMABB model and Disaggregation model (Flow units - Mm3/month)
Fig 14: Preservation of Serial Correlations for Colorado River Basin at Site 4 - A comparison
between AMHMABB model and Disaggregation model
Fig 15: Preservation of State-dependent Correlations for Colorado River Basin at Site 4 - A
Fig 16: Preservation of Marginal Distribution of March and June month flows at Site 4 for
Colorado River Basin - A comparison between Disaggregation model (MDM) and AMHMABB
models (Flow units - Mm3/month)
Fig 17: Preservation of multi-site drought characteristics (a) Number of runs; (b) Maximum Run
Length; (c) Maximum Run Sum for Colorado River Basin - A comparison between
Disaggregation model (MDM) and AMHMABB models
Split-Sample Validation
Fig 18: Reproduction of Summary Statistics for Colorado River Basin (Calibration) at Site 4 -
Model: AMHMABB
Fig 19: Reproduction of Summary Statistics for Colorado River Basin (Validation) at Site 4 -
Model: AMHMABB
Fig 20: Preservation of Serial Correlations for Colorado River Basin (Calibration) at Site 4 -
Model: AMHMABB
Fig 21: Preservation of Serial Correlations for Colorado River Basin (Validation) at Site 4 -
Model: AMHMABB
Fig 22: Preservation of Marginal Distribution of the February and April month flows for
Colorado River Basin (Calibration and Validation) at Site 4 - Model: AMHMABB
[Figure 23 plots: Maximum Run Sum, Number of Runs and Maximum Run Length versus Threshold, each in two panels (Historical vs. AMHMABB - Calibration; Historical vs. AMHMABB - Validation)]
Fig 23: Preservation of multi-site drought characteristics Maximum Run Sum, Number of runs and Maximum Run Length for Colorado River Basin (Calibration and Validation) - Model: AMHMABB
Table 1: Location of the selected stations for the multi-site multi-season flow modeling
River (USGS Station Number)   Location of the station   Record Duration   Referred as in this study
Colorado River (9180500)      Near Cisco, Utah          1906-2007         Site 1
Green River (9315000)         Utah                      1906-2007         Site 2
San Juan River (9379500)      Bluff, Utah               1906-2007         Site 3
Colorado River (09380000)     Lees Ferry, Arizona       1906-2007         Site 4 (Key Site)
Table 2: Parameters for the selected multi-site HMABB models SMHMABB and AMHMABB - Colorado River Basin
Site Number  Parametric Model  Oct     Nov     Dec     Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep
SMHMABB (non-parametric component: MABB; window size 9; block size 4,4,4)
Site 1       PAR(1)            0.6247  0.7778  0.8548  0.7492  0.6710  0.5147  0.4539  0.5107  0.6178  0.8515  0.8421  0.6764
Site 2       PAR(1)            0.6515  0.8395  0.7528  0.5319  0.4871  0.3135  0.4968  0.6179  0.5903  0.7946  0.8152  0.7117
Site 3       PAR(1)            0.3054  0.7107  0.7641  0.5903  0.5503  0.5681  0.6455  0.7505  0.7589  0.8379  0.4850  0.5154
Site 4       PAR(1)            0.5104  0.7558  0.8249  0.6546  0.5492  0.4679  0.4507  0.5917  0.6252  0.8365  0.7860  0.6437
AMHMABB (non-parametric component: MABB; window size 5; block size 6,6)
Site 1       PAR(1)            0.9279  0.9500  0.9426  0.9500  0.9389  0.2865  0.6330  0.4708  0.3049  0.5556  0.0800  0.0505
Site 2       PAR(1)            0.9500  0.9500  0.9500  0.9316  0.9058  0.0395  0.7546  0.9389  0.1243  0.9095  0.4081  0.4671
Site 3       PAR(1)            0.1132  0.3823  0.2607  0.5851  0.1833  0.5224  0.4782  0.4671  0.0948  0.2127  0.1464  0.3897
Site 4       PAR(1)            0.9389  0.9463  0.9389  0.9389  0.9353  0.3270  0.7141  0.8357  0.9242  0.9095  0.7952  0.8468
Note: the tabulated monthly values are the first-order periodic (monthly) autoregressive parameters of the hybrid models.
Table 3: Reproduction of the Aggregated Annual Flow Statistics - Comparison between SMHMABB and AMHMABB models (values in parentheses denote the standard deviation over 300 replicates)
Aggregated Annual Flow Statistics
             Mean              Standard Deviation   Skewness       Lag 1 Correlation
Site 1
Historical   8378.9            2413.8               0.25           0.292
SMHMABB      8377.3 (239.2)    2266.4 (148.4)       0.26 (0.2)     0.029 (0.098)
AMHMABB      8382.7 (293.1)    2405.4 (179.6)       0.32 (0.22)    0.229 (0.098)
Site 2
Historical   6642.9            2010.7               0.35           0.291
SMHMABB      6645.6 (199.8)    1877.2 (125.7)       0.27 (0.19)    0.037 (0.099)
AMHMABB      6620.3 (234.0)    1916.6 (138.9)       0.23 (0.17)    0.242 (0.095)
Site 3
Historical   2638.2            1079.4               0.36           0.113
SMHMABB      2632.4 (99.4)     936.8 (65.3)         0.38 (0.19)    0.007 (0.097)
AMHMABB      2632.2 (115.8)    977.2 (59.8)         0.31 (0.19)    0.130 (0.102)
Site 4
Historical   18504.9           5345.6               0.16           0.275
SMHMABB      18500 (523.2)     4985.4 (314.4)       0.18 (0.18)    0.027 (0.098)
AMHMABB      18498.1 (703.2)   5579.6 (375.1)       0.16 (0.19)    0.251 (0.095)
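As an illustration of how the four aggregated annual statistics in Table 3 could be computed for one series, the sketch below sums monthly flows to annual totals and evaluates the mean, standard deviation, skewness and lag-1 correlation. The specific estimators are assumptions, since the table does not state them: the sample standard deviation uses n-1, skewness is the simple moment ratio, and the lag-1 correlation uses the common autocorrelation estimator. The function name `annual_flow_stats` is hypothetical.

```python
import numpy as np


def annual_flow_stats(monthly):
    """Aggregate a (n_years, 12) array of monthly flows to annual totals
    and return summary statistics of the annual series.

    Estimator choices are illustrative assumptions, not the manuscript's.
    """
    monthly = np.asarray(monthly, dtype=float)
    annual = monthly.sum(axis=1)           # one total per water year
    dev = annual - annual.mean()           # deviations from the annual mean

    m2 = np.mean(dev ** 2)                 # second central moment
    m3 = np.mean(dev ** 3)                 # third central moment

    return {
        "mean": float(annual.mean()),
        "std": float(annual.std(ddof=1)),  # sample standard deviation
        "skewness": float(m3 / m2 ** 1.5),
        # Lag-1 autocorrelation: covariance of successive years
        # divided by the total sum of squared deviations.
        "lag1_corr": float(np.sum(dev[:-1] * dev[1:]) / np.sum(dev ** 2)),
    }
```

Running the function on each of the 300 synthetic replicates and then taking the mean and standard deviation of each statistic across replicates would give entries in the format of the table (value, with the spread in parentheses).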
Highlight Points:
• Simulation-optimization model for hybrid multi-site multi-season streamflows
• Extended the single-site hybrid matched block bootstrap model to multi-site multi-season
• Better performance due to deficit sum based objective functions and inter-annual dependence constraint
• Efficacy of the proposed model illustrated by a case example of Colorado River basin