Research papers
PII: S0022-1694(16)30579-0
DOI: http://dx.doi.org/10.1016/j.jhydrol.2016.09.025
Reference: HYDROL 21521
Please cite this article as: Srivastav, R., Srinivasan, K., Sudheer, K.P., Simulation-Optimization Framework for
Multi-Site Multi-Season Hybrid Stochastic Streamflow Modeling, Journal of Hydrology (2016), doi:
http://dx.doi.org/10.1016/j.jhydrol.2016.09.025
This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers
we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting proof before it is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Simulation-Optimization Framework for Multi-Site Multi-Season Hybrid Stochastic Streamflow Modeling
a
Associate Professor, School of Civil and Chemical Engineering, VIT University, Vellore, Tamilnadu, India 632014
b
Former PhD Scholar, EWRE Division, Dept. of Civil Engineering, IIT Madras, Chennai, Tamilnadu, India 600036
c
Professor, EWRE Division, Dept. of Civil Engineering, IIT Madras, Chennai, Tamilnadu, India 600036
Abstract
driver and the multi-site, multi-season hybrid matched block bootstrap model (MHMABB) is the
simulation engine within this framework. The multi-site multi-season simulation model is the
extension of the existing single-site multi-season simulation model. A robust and efficient
evolutionary search based technique, namely, non-dominated sorting based genetic algorithm
(NSGA-II) is employed as the solution technique for the multi-objective optimization within the
S-O framework. The objective functions employed are related to the preservation of the multi-
site critical deficit run sum and the constraints introduced are concerned with the hybrid model
parameter space, and the preservation of certain statistics (such as inter-annual dependence
and/or skewness of aggregated annual flows). The efficacy of the proposed S-O framework is
brought out through a case example from the Colorado river basin. The proposed multi-site
multi-season model AMHMABB (whose parameters are obtained from the proposed S-O
framework) preserves the temporal as well as the spatial statistics of the historical flows. Also,
the other multi-site deficit run characteristics namely, the number of runs, the maximum run
length, the mean run sum and the mean run length are well preserved by the AMHMABB model.
Overall, the proposed AMHMABB model is able to show better streamflow modeling
performance when compared with the simulation based SMHMABB model, plausibly due to the
significant role played by: (i) the objective functions related to the preservation of multi-site
critical deficit run sum; (ii) the huge hybrid model parameter space available for the evolutionary
search and (iii) the constraint on the preservation of the inter-annual dependence. Split-sample
validation results indicate that the AMHMABB model is able to predict the characteristics of the
multi-site multi-season streamflows under an uncertain future. Also, the AMHMABB model is
found to perform better than the linear multi-site disaggregation model (MDM) in preserving the
statistical as well as the multi-site critical deficit run characteristics of the observed flows.
However, a major drawback of the hybrid models persists in the case of the AMHMABB model as
well: it cannot synthetically generate a sufficient number of flows beyond the observed
extreme flows, nor can it generate values that differ appreciably from the observed
flows.
Starting with Fiering (1964) and Matalas (1967), there have been a number of attempts in
hydrology to model multi-site/multi-variate streamflows. These belong to one of the two basic
types, i) parametric and ii) non-parametric models. A detailed review of the parametric type of
Salas (1993) and McLeod and Hipel (1994), while the various types of non-parametric models in
use are reviewed by Lall (1995), Lall and Sharma (1996), Srinivas and Srinivasan (2005) and
The parametric type of models may be classified as: i) periodic vector AR/ARMA models; ii)
contemporaneous AR/ARMA models; iii) disaggregation models. The PAR/PARMA models
need to estimate a large number of parameters jointly, to account for the periodic space-time
dependence, especially at shorter time scale, with the available historical samples of limited
record length. Moreover, the parameter estimates may be unstable and may lead to poor
reproduction of some of the important statistics. This motivated the development of a simplified
effected through model decoupling (Stedinger et al., 1985 and Salas et al., 1985). However, the
complex structure of some of the individual site models could impede the exact preservation of
The need to preserve the statistical properties at more than one level necessitated the
estimation of a large number of parameters in case of disaggregation models which may not be
feasible with the limited hydrologic data available. Hence, staged disaggregation models (Lane,
1982; Stedinger and Vogel, 1984; Grygier and Stedinger, 1988; Santos and Salas, 1992) and
condensed disaggregation models (Lane, 1982; Stedinger and Pei, 1982; Pereira et al., 1984;
Oliveira et al., 1988; Stedinger et al., 1985; Grygier and Stedinger, 1988) were developed with a
view to reducing the number of parameters to make them computationally more amenable.
Moreover, empirical adjustment procedures were suggested by Grygier and Stedinger (1988) to
restore the summability of the disaggregated flows to the aggregate flows, especially when
normalizing transformations were applied to flows. The traditional linear parametric models of
disaggregation models can provide only a linear control system representation of watershed
processes, while the various physical components of streamflow such as snowmelt runoff, soil
water retention as well as soil drainage are dynamic, non-linear processes. Also, non-stationarity
trends owing to the underlying dynamics of the physical processes may not be captured
effectively.
model (DDM) that followed a two-step approach for simulation of hydrologic time series.
Following this, a generalized mathematical framework for stochastic simulation and forecasting
problems in hydrology was proposed by Koutsoyiannis (2000) for modeling stochastic processes
with short- or long-term memory structure, in which a generalized autocovariance function was
implemented within a generalized moving average generating scheme. Although the DDM and
the further developments (Koutsoyiannis, 2000, 2001) were reported to reproduce long-term
dependence, and were validated for practical water resources use, the computational complexity
involved was high. Langousis (2006) proposed an approach that directly deals with the
hydrologic data at the seasonal time scale, but still preserves both the seasonal and the annual
statistics and the over-year scaling behavior without resorting to disaggregation techniques.
parametric stochastic modeling framework that preserves the important statistical characteristics
of the data at multiple sites and at daily, monthly and annual time scales which also involves a
adjustments and refinements required to reduce the biases in the simulations. The limitations of
the parametric stochastic disaggregation models concerning preservation of complex spatial and
temporal dependence structure and reproduction of the non-standard marginal distributions have
been brought out by Sharma and O'Neill (2002). Recently, copula-based multisite stochastic
simulation models have been proposed by Chen et al. (2015). The spatial and temporal
distributions. The main advantages of this method are (i) the parameters of the model can be
On the other hand, non-parametric models can provide more accurate representation of the non-
linear dynamics of the physical watershed processes by way of effectively modeling the complex
dependence structure present in the streamflow data. Also, they can successfully mimic the bi-
modality present in the marginal distributions in certain months that may be caused due to
reproduce the empirical structure of multi-variate datasets without recourse to assumptions about
data or model structure. Moreover, the complexities associated with parameter estimation are not
Lall (1995) provides a review of the non-parametric techniques applied to a variety of water and
hydro-climate modeling problems that include stochastic daily weather generation (Rajagopalan
and Lall, 1999; Yates et al., 2003), streamflow simulation (Lall and Sharma, 1996; Sharma et al.,
1997; Prairie et al., 2006), streamflow forecasting (Grantz et al., 2006; Singhrattna et al., 2005),
and flood frequency estimation (Moon and Lall, 1994). Some of the non-parametric techniques
that are often used in hydrology are: moving block bootstrap (MBB) (Vogel and Shallcross,
1996); k-nearest neighbor (k-NN) bootstrap (Lall and Sharma, 1996) and its variations and
improvements (Prairie et al., 2007; Lee et al., 2010; Salas and Lee, 2010); kernel based methods
(Sharma et al., 1997; Tarboton et al., 1998); and matched block bootstrap (MABB) (Srinivas and
Srinivasan, 2005b). Salas and Lee (2010) have presented a review of the non-parametric models
used in streamflow modeling, clearly bringing out the limitations of each model.
Prairie et al. (2006) proposed a modified k-NN approach that enables the simulation of values
not seen in the historical record, which has recently been improved by Li and Singh (2014)
through the implementation of a multi-model simulation scheme. Also, Salas and Lee (2010)
have employed the k-nearest neighbor resampling algorithm with gamma kernel perturbation to
generate the seasonal data by conditioning the annual data. Although these models perform well
in simulating multi-season streamflows, they are applicable only to modeling single site data.
Prairie et al. (2007) presented a parsimonious non-parametric disaggregation model for space-
time simulation of streamflows at river basin level, extending the single site temporal
disaggregation scheme of Tarboton et al. (1998) by replacing the tedious kernel based methods
with the k-NN approach. Although this method captures the distributional characteristics and the
spatial dependencies well, a number of limitations have been pointed out by Lee et al. (2010)
concerning underestimation of critical drought characteristics and repetitious nature of the data
patterns being generated. Lee et al. (2010) have proposed a spatio-temporal disaggregation
model that generates the higher level variable (e.g., annual flow data) based on any parametric or
non-parametric model, then generates the lower level sequence (e.g., seasonal flow data) by
applying k-nearest neighbor resampling in such a way that their sum is close to the higher level
generated flow data. Moreover, genetic algorithm based mixing is implemented to achieve
variety in the generated data. This multi-site multi-season non-parametric disaggregation model
is reported to yield better simulations than that of Prairie et al. (2007). More recently, based on
the maximum entropy bootstrap (MEB) modeling approach proposed by Vinod (2006) for
economic time series, Srivastav and Simonovic (2014) have developed a computationally less
demanding and simple procedure to model multi-site, multi-season streamflows. The orthogonal
transformation is used with MEB to capture the spatial dependence present in the multi-site
collinear data. Ilich and Despotovic (2008) and Ilich (2014) have developed a three step non-
parametric algorithm for multi-site generation of hydrologic series. This involves the generation
of random variables that reproduce any arbitrary marginal, followed by reordering and permuting
of the generated data such that the serial correlations, cross-correlations, annual level
autocorrelations and the correlations between the end of the previous year and the beginning of
the current year are preserved. Following this, Markovic et al. (2015) have introduced two
modifications to the above algorithm to model high skew and outliers present in the data and to
obtain a number of extreme dry years in the simulated series. In recent times, data-driven models
(Ahmed and Sharma, 2007; Sudheer et al., 2008; Ünes et al., 2015) are also used in the stochastic
hydrology literature to model hydrologic data. However, these prediction models seem to be
limited to single site modeling. Moreover, these models cannot generate data outside the
observed range.
Srinivas and Srinivasan (2000, 2001) introduced hybrid stochastic streamflow models based on
the post-blackening approach proposed by Davison and Hinkley (1997). This approach used a
parsimonious linear parametric model for partial pre-whitening of the observed streamflows,
followed by resampling of the residuals extracted using moving block bootstrap (MBB) to
generate innovations which were then post-blackened to synthesize stochastic replicates of the
observed flows. The single site multi-season HMBB model of Srinivas and Srinivasan (2001)
was extended to multi-site multi-season model by Srinivas and Srinivasan (2005). Moreover,
Srinivas and Srinivasan (2006) proposed the hybrid matched block bootstrap (HMABB) for
modeling single site multi-season streamflows, using the rank matching idea of Carlstein et al.
(1998) for resampling the residuals. In comparison to the low-order linear parametric models and
the HMBB, the HMABB model was shown to provide better simulations of multi-season
streamflows with complex dependence structure. Moreover, the HMABB model is able to yield
sufficient variability of the streamflow characteristics owing to the use of smaller within-year
block sizes. However, the following limitations of the HMBB models seem to be present in case
of the HMABB as well: (i) poor preservation of the statistics at the aggregated time scale which
affects the preservation of the critical drought characteristics at higher truncation levels; (ii) the
smoothing and the extrapolation value added is limited, since the generated flows lie close to the
observed flow values; (iii) the identification of the appropriate hybrid model is quite tedious.
Further improvement to the single site multi-season HMABB model was done by Srivastav and
Srinivasan (2011) by way of automating the selection of the appropriate HMABB model through
a simulation-optimization framework and introducing a constraint into the framework for the
preservation of the streamflow statistics at the aggregated (annual) level. This resulted in
better preservation of the storage and the drought characteristics.
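The pre-whitening, resampling and post-blackening sequence described above can be illustrated with a minimal single-site sketch. It is deliberately simplified and is not the HMBB or HMABB model itself: it assumes a non-periodic AR(1) filter with an assumed coefficient `phi`, a plain moving block bootstrap, and Gaussian stand-in data; all function names are hypothetical.

```python
import random

def prewhiten_ar1(y, phi):
    """Residuals e[t] = y[t] - phi * y[t-1]; the first value is kept as-is."""
    return [y[0]] + [y[t] - phi * y[t - 1] for t in range(1, len(y))]

def moving_block_bootstrap(residuals, block_len, n_out, rng):
    """Concatenate randomly chosen overlapping blocks until n_out values exist."""
    out = []
    last_start = len(residuals) - block_len
    while len(out) < n_out:
        start = rng.randint(0, last_start)  # inclusive bounds
        out.extend(residuals[start:start + block_len])
    return out[:n_out]

def postblacken_ar1(innovations, phi):
    """Rebuild a series y*[t] = phi * y*[t-1] + e*[t] from resampled innovations."""
    y = [innovations[0]]
    for e in innovations[1:]:
        y.append(phi * y[-1] + e)
    return y

rng = random.Random(1)
observed = [rng.gauss(0.0, 1.0) for _ in range(60)]  # stand-in for standardized flows
phi = 0.4                                            # assumed AR(1) coefficient
residuals = prewhiten_ar1(observed, phi)
innovations = moving_block_bootstrap(residuals, 4, 60, rng)
synthetic = postblacken_ar1(innovations, phi)
```

Post-blackening the unresampled residuals returns the observed series exactly, which is a convenient check that the two filters are inverses of each other.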
site multi-season streamflows. Another contribution is the extension of the single-site multi-
season simulation model HMABB proposed by Srinivas and Srinivasan (2006) to the multi-site
multi-season simulation model (MHMABB) for use as the simulation module within the
proposed S-O framework for modeling the multi-site multi-season streamflows. A robust and
efficient evolutionary search based technique, namely, non-dominated sorting based genetic
algorithm (NSGA-II) (Deb et al., 2002) is employed as the solution technique for the multi-
objective optimization within the S-O framework. The multi-objective optimization model
formulated will be the driver and the multi-site, multi-season hybrid matched block bootstrap
model (MHMABB) will be the simulation engine within this framework. The idea of using the
framework directs the evolutionary search to explore the wide parameter space of the multi-site,
multi-season hybrid model HMABB, subject to the necessary constraints on the hybrid model
parameter space and to find the appropriate hybrid model (described by a combination of
parameter space, some specific constraints regarding the preservation of certain statistics (such
into the modeling framework with a view to arrive at a hybrid stochastic model with an improved
performance in terms of preserving the statistics at more than one level and consequently,
preserving the multi-site deficit run (drought) characteristics accurately. The efficacy of the
proposed S-O framework in simulating the multi-site multi-season streamflows is shown through
Simulation-optimization modeling can be defined as the process of finding the best input
variable values from among all possibilities without explicitly evaluating each possibility. The
of the search for the optimal solution. This in turn guides further input to the simulation model
(Carson and Maria, 1997). A comprehensive review on theory and applications of simulation-
optimization modeling has been presented by Tekin and Ihsan (2004). The S-O methodology has
been employed beneficially in a number of research works in the field of water resources
planning and management, which have been documented by Nicklow et al. (2010).
In the last few decades, there has been an increasing interest in using Evolutionary Algorithms
assumptions or prior knowledge about the shape of the response surface (Back and Schwefel,
1993). Evolutionary Algorithms (EAs) are heuristic search methods that implement ideas from
the evolution process. As opposed to a single solution used in traditional methods, EAs work on
a population of solutions in such a way that poor solutions become extinct, whereas the good
solutions are likely to reach the optimum (survival of the fittest). When the response surface is
high-dimensional, discontinuous, and non-differentiable, the traditional methods may often fail
to find the optimal solution, while methods such as evolutionary algorithms can be applied
successfully to these types of problems (Azadivar and Tompkins, 1999; Pierreval and Paris,
2000). In general, an EA for simulation-optimization can be described as follows: (i) generate a
population of solutions; (ii) evaluate these solutions through a simulation model; (iii) perform
selection, apply genetic operators to produce a new offspring (or solution), and insert it into the
population; and (iv) repeat until some stopping criterion is reached. From the literature, the most
popular EAs are known to be Genetic Algorithms (GAs) (Goldberg, 1989). In general, each point
in the solution space is represented by a string of values for the decision variables. The use of
appropriate cross-over and mutation operators reduces the probability of becoming trapped in a local
optimum. The elitism property enables the carry-over of competent solutions through successive
generations.
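The four-step loop (i) to (iv) can be sketched as follows. This is a toy single-objective example, not the NSGA-II driver used in this study: the `simulate` function is a hypothetical noisy simulator with its optimum placed at x = 3, and the population size, operators and stopping rule are assumed values chosen only for illustration.

```python
import random

def simulate(x, rng, n_rep=30):
    """Hypothetical stochastic simulator: mean noisy response, optimum near x = 3."""
    return sum((x - 3.0) ** 2 + rng.gauss(0.0, 0.1) for _ in range(n_rep)) / n_rep

def evolve(rng, pop_size=20, generations=40, lo=-10.0, hi=10.0):
    # (i) generate an initial population of candidate solutions
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        # (ii) evaluate every candidate through the simulation model
        scored = sorted(pop, key=lambda x: simulate(x, rng))
        # (iii) selection: the better half survives unchanged (elitism)
        parents = scored[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = 0.5 * (a + b)          # arithmetic cross-over
            child += rng.gauss(0.0, 0.3)   # mutation
            children.append(min(hi, max(lo, child)))
        pop = parents + children
    # (iv) stopping criterion: a fixed number of generations
    return min(pop, key=lambda x: simulate(x, rng))

best = evolve(random.Random(7))
```

The optimizer never sees the form of the response surface; it only receives simulated responses, which is the defining feature of the simulation-optimization coupling.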
Rolf et al. (1997) stated that the aim of combining traditional ARMA modeling knowledge and
evolutionary algorithms would be to provide a tool that would be able to automate the three step
process of time series modeling. Following this, in the last decade, a few research studies (Cortez
et al., 2004; Voss and Feng, 2002; Minerva and Poli, 2001; Peng and Chen, 2003; Ong et al.,
2005; Chen et al., 2002) have employed evolutionary search algorithms for automating the three-
step time series modeling approach of Box-Jenkins ARMA models. The above research studies
bring out the efficacy of evolutionary techniques in model identification and parameter
estimation of Box-Jenkins type of models (AR, ARMA, ARIMA, SARIMA, FARIMA). For
model identification, fitness functions such as AIC, BIC are used in the GA framework, while
some form of statistical performance criteria (such as minimization of sum of squared errors,
maximization of likelihood functions) are used in case of parameter estimation. In case of non-
parametric models (such as k-NN, Kernel based models), the fitness function can be to minimize
the generalized cross-validation score. However, in case of the more complex multi-site, multi-
season hybrid models, no such statistical criteria are available for model identification and
parameter estimation. Hence, it has been decided to adopt water-use (reservoir
storage/drought) related criteria (mentioned in the following section) as the objective functions in
the Multi-objective GA (MOGA) based framework proposed in this study. Incidentally, these
criteria can be expected to preserve the basic statistical characteristics (such as summary
statistics, marginal distributions). It is to be mentioned that this approach has already been
Srinivasan (2011).
streamflow modeling is shown in Fig. 1. It consists of the multi-objective optimization model (as
the driver), and the multi-site multi-season hybrid matched block bootstrap model MHMABB
developed in this study as the simulator embedded into it. The multi-site multi-season hybrid
Multi-season Streamflows
As discussed earlier, the S-O modeling framework primarily aims to enhance the performance of
the hybrid stochastic models in simulating the streamflows for water resources planning use. The
secondary aim of the framework is to minimize the drudgery, judgment and subjectivity involved
in the selection of the most appropriate hybrid stochastic model. The special features introduced
into the S-O framework to achieve the above are: i) critical water-use related objective functions
in the driver of the framework; ii) a powerful multi-objective evolutionary search based tool
(NSGA-II) (Deb et al., 2002) to explore the huge hybrid model parameter space and obtain a set
of competent hybrid stochastic models automatically; and iii) a constraint to enable the
preservation of the inter-annual dependence, which may be helpful in the preservation of the
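A "set of competent hybrid stochastic models" in a two-objective search corresponds to the non-dominated (Pareto) set. The sketch below shows only the dominance test and the extraction of the first non-dominated front; it omits the successive fronts, crowding distance and genetic operators of the full NSGA-II, and the candidate objective values are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Return the points that no other point dominates (the first Pareto front)."""
    return [p for p in points if not any(dominates(q, p) for q in points if q is not p)]

# Hypothetical (sum of |R-Bias|, sum of R-RMSE) pairs for five candidate hybrid models.
candidates = [(0.10, 0.90), (0.20, 0.60), (0.40, 0.55), (0.30, 0.70), (0.50, 0.50)]
front = non_dominated(candidates)
```

Here the candidate (0.30, 0.70) is dropped because (0.20, 0.60) is better in both objectives; the remaining four models represent different trade-offs between bias and RMSE.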
In this study, the single-site multi-season hybrid matched block bootstrap (HMABB) proposed
by Srinivas and Srinivasan (2006) is extended to multi-site multi-season hybrid matched block
bootstrap (MHMABB). The hybrid model effectively blends the parametric component (the low-
order PAR(1) model at each site) and the non-parametric component (multi-site multi-season
matched block bootstrap). The proposed extension of the simulation algorithm is presented
below.
Let the time series of historical streamflows be denoted by the vector $q^{k}_{v,\tau}$, where the superscript k
denotes the site index (k = 1,...,nk), v is the index for year (v = 1,...,N) and $\tau$ denotes the index for
season (period) within the year ($\tau$ = 1,...,ω); nk refers to the number of sites; N represents the
number of years of historical record and ω denotes the number of periods within the year. The
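1. Standardize the elements of the historical streamflows, i.e., the vector $q^{k}_{v,\tau}$, using
$y^{k}_{v,\tau} = \dfrac{q^{k}_{v,\tau} - \bar{q}^{k}_{\tau}}{s^{k}_{\tau}}$ (1)
where $y^{k}_{v,\tau}$ is the standardized streamflow, and $\bar{q}^{k}_{\tau}$ and $s^{k}_{\tau}$ represent the mean and the standard deviation, respectively, of the observed
streamflows in the period $\tau$ at the kth site. Note that the historical streamflows are not transformed
to remove skewness.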
site k, using
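$\varepsilon^{k}_{v,\tau} = y^{k}_{v,\tau} - \phi^{k}_{1,\tau}\, y^{k}_{v,\tau-1}$ (2)
where $y^{k}_{v,\tau}$ is the standardized streamflow, $\varepsilon^{k}_{v,\tau}$ is the residual, and $\phi^{k}_{1,\tau}$ is the first order periodic autoregressive parameter for period $\tau$, at the kth site. The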
purpose of partial pre-whitening using a parsimonious PAR(1) structure at each site is to utilize
the potential of the proposed non-parametric component, multi-site multi-season MABB, that
can capture the weak linear dependence structure and the non-linear dependence structure
overlapping blocks of residuals, using the proposed multi-site rank-matched block bootstrap
(MABB) method. The key steps involved in the resampling algorithm are as follows:
(a) For each site k, prepare n non-overlapping within-year blocks (such as $B^{k}_{v,1},\ldots,B^{k}_{v,n}$) using
the residuals with the respective lengths being L1,...,Ln such that the lengths of all the within-
year blocks sum to ω, i.e., $\sum_{i=1}^{n} L_i = \omega$. Note that the lengths of all the within-year blocks are the
same for all the sites to enable resampling of contemporaneous blocks of residuals, so that the
site-to-site cross-correlations (dependence across the sites) are captured. Herein, $B^{k}_{v,i}$ denotes the
ith within-year block for the year v of the record, at site k. Let $e^{k}_{v,i}$ denote the end element of $B^{k}_{v,i}$.
(b) For the contemporaneous selection of the within-year blocks, the end elements of the block i
for each site k have to be combined by using an appropriate strategy, to obtain a fictitious
contemporaneous end element. In this research work, the strategy based on the Euclidean
distance (ED) is adopted and presented here. Form the sets where
(3)
(c) Arrange the elements of each set in ascending (or descending) order of their magnitude and assign
algorithm is initialized by randomly selecting one of the “N” first within-year blocks
contemporaneously. Let it be the current contemporaneous within-year block for all the sites.
i. Identify the rank corresponding to the current contemporaneous within-year block. Let it be
denoted by $r$.
ii. Select all the contemporaneous end elements whose ranks fall within a bandwidth w (= 2m+1),
ranging from $r-m$ to $r+m$, where m is the window parameter which is a small positive
integer. These form the set of nearest neighbors to the current contemporaneous end element
(which has rank $r$). From this, randomly select one of the neighboring contemporaneous end
elements. This requires generating a uniform random number "U" in the range of integers
$r-m$ and $r+m$.
iii. Obtain the contemporaneous within-year block that follows the selected contemporaneous
within-year block (which corresponds to the contemporaneous end element selected in (ii)) and
append it to the current within-year block. It is to be noted that the appending of the
corresponding neighboring contemporaneous within-year block is to be done for all the ‘k’ sites.
iv. The recently appended contemporaneous within-year block becomes the new current
v. To generate more innovations, repeat steps from (i) to (iv) till the desired length of one
4. Post-blacken the resampled innovation series, to obtain the standardized synthetic streamflows
(4)
Note that, for k = 1, this algorithm reduces to single-site multi-season hybrid matched block
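The contemporaneous rank-matched resampling of step 3 can be sketched as below. Several details are assumptions made only for this illustration: the end elements are combined across sites by their Euclidean norm, the rank window is clipped at the ends of the ranking, and a neighbor whose block is the last one in the record (and so has no follower) is skipped. The function name and the Gaussian stand-in residuals are hypothetical.

```python
import math
import random

def rank_matched_resample(resid, block_lens, m, n_years_out, rng):
    """resid[k][v][t]: residual at site k, year v, season t. Returns one flat list per site."""
    n_sites, n_years, n_blocks = len(resid), len(resid[0]), len(block_lens)
    starts = [sum(block_lens[:i]) for i in range(n_blocks)]

    def combined_end(v, i):
        # ASSUMPTION: Euclidean norm across sites of the last residual of block i in year v.
        t_end = starts[i] + block_lens[i] - 1
        return math.sqrt(sum(resid[k][v][t_end] ** 2 for k in range(n_sites)))

    # For each block position, years ordered by their combined end element (rank = index).
    order = [sorted(range(n_years), key=lambda v: combined_end(v, i)) for i in range(n_blocks)]

    def follower(v, i):
        """Block that follows block i of year v in the record, or None at the record end."""
        if i + 1 < n_blocks:
            return v, i + 1
        return (v + 1, 0) if v + 1 < n_years else None

    innov = [[] for _ in range(n_sites)]

    def append_block(v, i):
        # The same (year, block) is used at every site, preserving cross-site dependence.
        for k in range(n_sites):
            innov[k].extend(resid[k][v][starts[i]:starts[i] + block_lens[i]])

    v, i = rng.randrange(n_years), 0          # initialize with a random first block
    append_block(v, i)
    while len(innov[0]) < n_years_out * sum(block_lens):
        r = order[i].index(v)                 # rank of the current end element
        lo, hi = max(0, r - m), min(n_years - 1, r + m)
        neighbours = [u for u in order[i][lo:hi + 1] if follower(u, i) is not None]
        u = rng.choice(neighbours)            # pick one rank-neighbor at random
        v, i = follower(u, i)                 # its successor becomes the current block
        append_block(v, i)
    return innov

rng = random.Random(3)
resid = [[[rng.gauss(0, 1) for _ in range(12)] for _ in range(20)] for _ in range(2)]
innov = rank_matched_resample(resid, [3, 4, 5], m=2, n_years_out=5, rng=rng)
```

Because block i is always followed by block i+1 (or by the first block of a following year), every resampled value stays in its original season, and each synthetic year is assembled from blocks that may come from different historical years. The sketch requires m to be at least 1, in line with m being a small positive integer.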
The use of short contemporaneous within-year block sizes ensures reasonable amount of
variability in the synthetic replicates to be generated at various sites. Moreover, the site-to-site
While, the window size selected based on the rank matching approach ensures that one of the
streamflow model preserves the summary statistics, the marginal distributions and the
dependence structure present in the historical streamflows well, then, it is likely to preserve the
water-use characteristics such as the storage capacity and the critical drought characteristics.
However, there is no explicit proof for this and there is no general functional relationship
between the accuracy of preservation of the water-use characteristics and the accuracy of
reproduction of the basic statistical characteristics of streamflows and/or the stochastic model
parameters. Moreover, in case of hybrid models, there are no statistical criteria (such as AIC,
BIC) for the selection of the hybrid model parameters. On the other hand, manually exploring the
huge parameter space of the multi-site multi-season hybrid model (MHMABB) through a large
number of simulations to find the best hybrid model, would involve drudgery and subjectivity.
framework that would explicitly relate the objective functions based on the accuracy of
All extreme (or critical) streamflow droughts encounter large deficits. On the other hand, a long
drought duration may not necessarily signify an extreme (or critical) drought if the
corresponding deficit volume encountered during the drought event is not large. Likewise, a low
mean discharge may not indicate necessarily an extreme drought if its duration is short. The
variation of drought duration is primarily governed by climate, while the deficit volume is more
related to catchment characteristics. According to Zelenhasic and Salvai (1987) and Zelenhasic
(1997), the stochastic process of streamflow droughts can be described by nine descriptive
parameters, of which the critical drought deficit volume is the most informative parameter.
Hence, critical drought deficit volume may be considered to be the essential and single pivotal
characteristic that effectively represents the process of critical streamflow droughts. Hence, the
efficacy of preservation of the critical drought deficit volumes estimated from the historical
streamflows corresponding to various pre-specified truncation levels, is vital for the effective
The streamflow drought characteristics are often described using the theory of runs (Yevjevich,
1967). Specifically, negative runs of streamflow sequences with respect to a specified truncation
level, represent deficit conditions. A number of stochastic models preserve the deficit run
(drought) characteristics either at lower or higher truncation levels, but not both. But, a good
synthetic streamflow model is expected to preserve the run characteristics with minimum bias
and root mean square error (over all truncation levels considered) when compared with the
corresponding estimates from the historical streamflows, while ensuring sufficient variability to
account for future uncertainty. Quite often, if the bias of the estimate is reduced, then the
variance of the same may increase and vice-versa. If only the R-RMSE related objective function
is used, then, the hybrid stochastic model identified may have minimum ∑R-RMSE(MARS), but
may result in a high value of ∑|R-Bias(MARS)|, which is undesirable. Hence, in this
research work, i) Minimize the sum of absolute values of the relative Bias in the preservation of
the multi-site critical deficit run sum over all truncation levels considered; ii) Minimize the sum
of relative RMSE in the preservation of the multi-site critical deficit run sum over all truncation
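The notion of a negative (deficit) run with respect to a truncation level can be made concrete with a short single-site sketch. The function name and the flow values are hypothetical, and the multi-site aggregation of run sums used in this study is not shown.

```python
def deficit_runs(flows, truncation):
    """Return (run_length, run_sum) for every run of consecutive flows below the truncation level."""
    runs, length, total = [], 0, 0.0
    for q in flows:
        if q < truncation:
            length += 1
            total += truncation - q      # accumulate the deficit volume
        elif length:
            runs.append((length, total))  # a run has just ended
            length, total = 0, 0.0
    if length:                            # close a run that reaches the end of the record
        runs.append((length, total))
    return runs

flows = [12.0, 7.0, 5.0, 11.0, 9.0, 14.0, 6.0, 4.0, 8.0, 13.0]
runs = deficit_runs(flows, truncation=10.0)
max_run_sum = max(s for _, s in runs)
max_run_length = max(n for n, _ in runs)
```

For these values the three deficit runs are (2, 8.0), (1, 1.0) and (3, 12.0), so the maximum run sum is 12.0 and the maximum run length is 3. Repeating the calculation at several truncation levels gives the critical run characteristics at each level.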
Objective Functions
Based on a detailed exploration of the use of different plausible water-use related objective
functions, the following two objective functions are proposed within the framework: (i)
Minimize the aggregated relative bias and (ii) Minimize the aggregated relative RMSE, in the
preservation of the maximum multi-site deficit run sum (MARS) over the truncation levels
varying from 50% to 95% of the historical mean monthly flow (MMF) at intervals of 5% MMF.
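Minimize $\sum_{i=1}^{n_t} \left| R\text{-}Bias(MARS_i) \right|$ (5)
Minimize $\sum_{i=1}^{n_t} R\text{-}RMSE(MARS_i)$ (6)
in which $R\text{-}Bias(MARS_i) = \dfrac{E[MARS^{s}_{i}] - MARS^{h}_{i}}{MARS^{h}_{i}}$ (7)
$R\text{-}RMSE(MARS_i) = \dfrac{\sqrt{\left(E[MARS^{s}_{i}] - MARS^{h}_{i}\right)^{2} + \mathrm{var}[MARS^{s}_{i}]}}{MARS^{h}_{i}}$ (8)
where $n_t$ is the number of truncation levels considered and $MARS^{h}_{i}$ is the estimated MARS based on the historical streamflows at the ith truncation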
level. The maximum run sum (MARS) is expressed as: MARS = max($ds_1, ds_2, \ldots, ds_{nr}$), where the
multi-site run-sum for a specified truncation level and run is defined as: $ds_j = \sum_{k=1}^{nk} ds^{k}_{j}$, wherein
j denotes the run number, k refers to the site number, nk denotes the total number of sites being
modeled and nr denotes the total number of runs. In eq. 7, $E[MARS^{s}_{i}]$ is the mean value of
MARS corresponding to the ith truncation level, estimated over Nr synthetically generated replicates:
$E[MARS^{s}_{i}] = \dfrac{1}{N_r} \sum_{r=1}^{N_r} MARS^{s}_{i,r}$ (9)
In eq. 8, var[$MARS^{s}_{i}$] is the variance of MARS at the ith truncation level estimated over the Nr
(10)
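A minimal sketch of how the two objective values could be evaluated from replicate MARS values is given below. It assumes that the relative RMSE combines the squared bias and the variance of the replicates, and it uses the population variance (division by the number of replicates) so that this combination equals the mean squared error exactly; the divisor actually used in this study is not confirmed here. Function names and numbers are hypothetical.

```python
import math
from statistics import mean, pvariance

def r_bias(sim_values, hist_value):
    """Relative bias of the replicate mean with respect to the historical value."""
    return (mean(sim_values) - hist_value) / hist_value

def r_rmse(sim_values, hist_value):
    """Relative RMSE: sqrt(bias^2 + variance) scaled by the historical value."""
    bias = mean(sim_values) - hist_value
    return math.sqrt(bias ** 2 + pvariance(sim_values)) / hist_value  # ASSUMPTION: population variance

def objectives(sim_by_level, hist_by_level):
    """Sum |R-Bias| and R-RMSE over all truncation levels."""
    f1 = sum(abs(r_bias(s, h)) for s, h in zip(sim_by_level, hist_by_level))
    f2 = sum(r_rmse(s, h) for s, h in zip(sim_by_level, hist_by_level))
    return f1, f2

# Two truncation levels, four synthetic replicates each (illustrative numbers only).
hist = [100.0, 200.0]
sim = [[90.0, 110.0, 100.0, 100.0], [180.0, 220.0, 260.0, 220.0]]
f1, f2 = objectives(sim, hist)
```

At the first level the replicate mean equals the historical value, so the bias is zero while the RMSE is not; this is why the two measures are treated as separate objectives.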
It is possible to use other water use objective functions in place of the two objective functions
Constraints
Constraints on Model Parameters: Certain constraints are developed within the proposed S-O
this study, the partial pre-whitening is done using a parsimonious parametric model, namely,
periodic autoregressive model of order 1 (PAR(1)). This means that the parameter space of the
parametric component of the multi-season hybrid model is defined by the range of values taken
by the periodic autoregressive parameter of order 1, $\phi^{k}_{1,\tau}$. For the stationarity condition, the roots
of the characteristic equation must lie within the unit circle. However, in most practical situations
considerations suggest that the lag-1 serial correlation coefficient (ρ1) be positive, which means
that $\phi_{1} > 0$ (Hipel and McLeod, 1994). Accordingly, the following constraint on the first
order PAR parameter ($\phi^{k}_{1,\tau}$) has been introduced into the simulation-optimization framework:
$0 < \phi^{k}_{1,\tau} < 1$ (11)
where $\phi^{k}_{1,\tau}$ refers to the periodic autoregressive parameter of order ‘1’ for month ‘$\tau$’ at site k.
Non-Parametric Component: In the proposed framework, the multi-site multi-season MABB
model has been used as the non-parametric component. The conditional resampling is done
site. The parameters of the multi-site multi-season MABB model are: (i) the non-overlapping
within-year block sizes and (ii) the band width. In case of within-year blocks, there exist a large
number of possible combinations of non-overlapping block sizes. However, the sum of all the
within-year blocks should be equal to the total number of periods within a year (ω= 12 for
monthly), i.e.,
L1 + L2 + . . . + Ln = ω (12)
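To give a feel for the size of this part of the search space, the ordered sets of block sizes satisfying eq. (12) can be enumerated. The sketch below assumes that the blocks are contiguous and ordered within the year (so that, e.g., block sizes 5, 7 and 7, 5 count as different choices); the function name is hypothetical. The number of ordered ways of writing ω as a sum of positive integers is 2^(ω-1), which gives 2048 for ω = 12.

```python
def compositions(total):
    """Yield every ordered tuple of positive integers that sums to `total`."""
    if total == 0:
        yield ()
        return
    for first in range(1, total + 1):
        for rest in compositions(total - first):
            yield (first,) + rest

# All ordered within-year block-size combinations for monthly data (omega = 12).
blocks = list(compositions(12))
```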
Further, regarding the selection of the bandwidth (w), it is observed from various trials that adopting
a large w increases the bias in the preservation of the historical dependence structure and in the
prediction of storage capacities at different demand levels, while adopting a low w leads to the
based on the experience gained by the authors in modeling periodic streamflows of various rivers
using multi-season HMABB hybrid models, the bandwidth is restricted to fall between 3 and 13.
3 ≤ w ≤ 13 (13)
Constraints on Statistical Characteristics. In this research work, the issue of preserving the inter-
annual dependence is addressed by introducing an explicit constraint that can ensure the
preservation of the dependence at the aggregated annual level. This is done through a constraint
on R-bias in preserving the lag-1 correlation of flows at the aggregated annual level, which is
usually effective in modeling the inter-annual dependence. This is expected to enable the
preservation of the various statistics at the aggregated annual level, and as a result, enhance the
preservation of storage capacity at higher demand levels. In general, the modeler can introduce
any appropriate constraints into the S-O framework explicitly, depending on the statistics to be
(14)
where denotes the basic periodic statistical characteristic(s) at any site k (such as mean,
standard deviation, skewness of month) and is the allowable upper limit of the relative bias
(that can be specified by the modeler), for each month ( ) for each statistical characteristic
considered for each site k. In eq. (14), represents the basic aggregated annual statistical
characteristic at any site, k (such as mean, standard deviation, skewness and autocorrelation),
while, is the allowable upper limit of the relative bias at each site, k (that can be specified by
the modeler), for each statistical characteristic (A) considered at the aggregated annual level. In
addition, represents the site-to-site correlations and denotes the allowable upper limit of
the relative bias (that can be specified by the modeler), for the site-to-site correlations.
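A constraint of this kind can be checked numerically once the statistics of the synthetic replicates are available. The sketch below is illustrative only: it assumes that the relative bias is |mean over replicates - historical| / |historical| and that a violation is measured as the total excess over the allowable limit; neither the function names nor these forms are taken from eq. (14).

```python
import numpy as np

def relative_bias(simulated, historical):
    """|mean over replicates - historical| / |historical|, element-wise (assumed form)."""
    simulated = np.asarray(simulated, dtype=float)    # shape: (replicates, ...)
    historical = np.asarray(historical, dtype=float)  # shape: (...)
    return np.abs(simulated.mean(axis=0) - historical) / np.abs(historical)

def constraint_violation(simulated, historical, limit):
    """Total amount by which the relative bias exceeds `limit`; 0.0 means feasible."""
    excess = relative_bias(simulated, historical) - limit
    return float(np.clip(excess, 0.0, None).sum())
```

The same two functions could be applied to the monthly statistics, the aggregated annual statistics and the site-to-site correlations, each with its own modeler-specified limit.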
The hybrid model parameter space of the multi-site multi-season hybrid streamflow model
(MHMABB) consists of two components, i.e., the parametric component at each site and the
non-parametric component, and is quite huge. The parameters of the parametric component of
the model can take combinations of real values within the unit circle resulting from multiple sites
and multiple seasons (12 in case of monthly modeling). The non-parametric component, matched
block bootstrap, contains bl number of within-year blocks and m number of window sizes. The
sizes of each of these blocks can take any integer value between 1 and 12, such that the sum of
all such within-year blocks equals 12 and a reasonable range of band width can be from 3 to 13.
Thus, the total number of combinations of HMABB models possible considering the parametric
There is no known explicit functional relationship between the accuracy of preservation of the
streamflows and/or the stochastic model parameters, especially for the complex hybrid stochastic
model, HMABB, considered in this study. Hence, traditional optimization techniques cannot be
employed to find the optimal hybrid model. Moreover, the hybrid parameter space is too large
evolutionary algorithms (MOEA) are known to be appropriate, since the objective functions can
be explicitly evaluated by interacting with the simulation model. Moreover, their inherent ability
feasible space and noisy functions makes MOEA appropriate for complex real world problems
(Fonseca and Fleming, 1995). Also, these algorithms are efficient and can obtain a number of
non-dominated solutions from a random initial population in a single run (Deb et al., 2002).
Moreover, both discrete and continuous variables can be handled together simultaneously such
as in case of the hybrid parameter space (block sizes and window size being discrete and
Algorithm - II (NSGA-II) developed by Deb et al. (2002) is adopted. Although the number of
alternative hybrid models to be searched appears to be very large, the NSGA-II based genetic
search used in this research work, being an efficient, robust and elitist non-dominated search
based approach, converges to the near Pareto-optimal solutions within a reasonable number of
evaluations.
The decision vector consists of both discrete and continuous variables represented within the
NSGA-II string as a chromosome. All the variables are coded in binary strings to represent both
the parametric component (such as ϕ1, ϕ2,. . ., ϕ12 of PAR(1) model defined in a continuous space)
and the non-parametric model parameters (such as one window parameter and within-year block
sizes of MABB model defined in a discrete space). In the decision vector, the first discrete
variable in a chromosome represents the contemporaneous window parameter for the multi-site
multi-season HMABB model. The next twelve discrete variables represent the within-year block
sizes of at most 12 blocks. The within-year block sizes are selected such that their sum equals
12 (the total number of months in a year). If the number of within-year blocks is less than 12,
the remaining variables (out of 12) are set as dummy variables. Moreover, if the sum of the
within-year block sizes exceeds 12 in any chromosome of a given population, that chromosome is
not passed through the hybrid model simulator; instead, a large positive value is assigned
to the fitness function in order to eliminate that string. The next 12k continuous
variables in the chromosome represent the parametric component (PAR(1)) corresponding to the
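The layout of the decision vector and the penalty for inadmissible block sizes can be sketched as below. This is a hypothetical decoding written for illustration: real-valued rather than binary coding is used, dummy block variables are assumed to be stored as zeros, the 12k PAR(1) parameters are assumed to be ordered site by site, any block-size sum different from 12 (not only a sum greater than 12) is treated as infeasible, and the penalty value and the `simulate` callback are placeholders.

```python
PENALTY = 1.0e9  # placeholder for the "large positive value" assigned to infeasible strings

def decode(chromosome, n_sites, months=12):
    """Split a chromosome into window size, within-year block sizes and PAR(1) parameters."""
    window = int(chromosome[0])
    raw_blocks = [int(b) for b in chromosome[1:1 + months]]
    blocks = [b for b in raw_blocks if b > 0]          # zeros are assumed to be dummy slots
    phi_flat = [float(p) for p in chromosome[1 + months:]]
    if len(phi_flat) != months * n_sites:
        raise ValueError("expected 12 PAR(1) parameters per site")
    if sum(blocks) != months:
        return None                                    # infeasible block sizes
    phi = [phi_flat[s * months:(s + 1) * months] for s in range(n_sites)]
    return {"window": window, "blocks": blocks, "phi": phi}

def fitness(chromosome, n_sites, simulate):
    """Return both objective values, or the penalty if the chromosome is infeasible."""
    model = decode(chromosome, n_sites)
    if model is None:
        return (PENALTY, PENALTY)                      # the simulator is never called
    return simulate(model)
```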
To evaluate the fitness functions based on the reservoir storage statistics, the generated
chromosomes from NSGA-II (each chromosome represents a multi-site HMABB model) are sent
to the synthetic simulation module. Once the synthetic replicates are generated, the simulation
module computes the required statistics (summary statistics, distribution related statistics,
correlations, storage capacity required at the specified demand levels) and sends the same to the
NSGA-II module to evaluate the fitness functions and the constraints formulated. Based on the
fitness function values evaluated, the solutions are then sorted according to the fast elitist-based
non-dominated approach (Deb et al., 2002) to identify the different levels of non-dominated
fronts. The generation/reproduction step, based on tournament selection, picks only the best among
the existing population. The cross-over and the mutation operations are performed to introduce
variability among the generations. To handle both the discrete and the continuous variable spaces,
a uniform cross-over operator is adopted. The crowded comparison operator enables the diversity
preservation and the elitism operator helps in significantly speeding up the search process and
preserving the good non-dominated solutions. For further details on the NSGA-II approach and
the genetic operators used, the readers are referred to Deb et al. (2002).
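The identification of the successive non-dominated fronts for two (or more) minimization objectives can be illustrated with the short sketch below. It peels off one front at a time by direct pairwise comparison; it is not the fast bookkeeping procedure of Deb et al. (2002), although it yields the same fronts, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if solution a is no worse than b in all objectives and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated_fronts(objectives):
    """Group solution indices into successive non-dominated fronts (minimization)."""
    remaining = set(range(len(objectives)))
    fronts = []
    while remaining:
        front = sorted(
            i for i in remaining
            if not any(dominates(objectives[j], objectives[i])
                       for j in remaining if j != i)
        )
        fronts.append(front)
        remaining -= set(front)
    return fronts
```

The first list returned contains the indices of the current non-dominated (rank 1) solutions; penalized infeasible chromosomes end up in the last fronts.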
In this research work, the single site multi-season HMABB model proposed by Srinivas and
Srinivasan (2006) has been extended to multi-site multi-season HMABB (MHMABB) model.
The modeling steps involved in the synthetic generation of streamflows using the proposed
multi-site multi-season HMABB model are presented in section 2.2.1. The simulation based
MHMABB models are herein referred to as SMHMABB models. The SMHMABB model building
is divided into two stages. In stage 1, the parametric model parameters for each site are obtained
using the method of moments, followed by stage 2, in which the parameters of the nonparametric
models (i.e., the contemporaneous block size and window size) are selected by numerous trials
based on the overall performance of the model. It is to be mentioned that in this study, only equal
within-year block sizes (1, 2, 3, 4, and 6) and the window sizes 3, 5, 7, 9, 11 and 13 are tried.
Thus, if equal within-year block sizes are used, the combinations of the parametric and the
nonparametric components result in 30 hybrid models (5 block sizes × 6 window sizes). It is to be noted that
if unequal within-year block sizes are to be used, then the total number of hybrid models will be
quite large and the manual inspection and selection will be extremely tedious. In fact, the
parameter space of the MHMABB model is huge, and the same is under-explored in case of the
simulation based MHMABB (SMHMABB) model, since the selection of the model parameters
for both the parametric and the non-parametric components of the SMHMABB model is
obtained independently and the residual space explored by the non-parametric model is limited.
The drawbacks of the simulation based hybrid models can be summarized as: (i) Joint parameter
space exploration is not done; (ii) conditioning of variables for the reproduction of statistics at
the aggregated level is not possible; (iii) the manual effort involved in inspection and selection of
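The candidate set examined manually in the simulation based approach can be listed explicitly. The sketch below simply enumerates the equal within-year block sizes and the window sizes stated above; the variable names and the dictionary layout are hypothetical.

```python
from itertools import product

block_sizes = [1, 2, 3, 4, 6]          # equal within-year block sizes (months)
window_sizes = [3, 5, 7, 9, 11, 13]    # window sizes tried

# One candidate per (block size, window size) pair; each year is split into 12 // b equal blocks.
candidates = [
    {"blocks": [b] * (12 // b), "window": w}
    for b, w in product(block_sizes, window_sizes)
]
```

Each of these candidates shares the same independently fitted PAR(1) parameters, which is why the joint parameter space is not explored in this approach.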
The efficacy of the AMHMABB model obtained from the proposed S-O framework in modeling
measured at four streamflow stations located on the Upper Colorado River basin. The monthly
naturalized streamflows at the following four streamflow gauging stations for the 102-year
considered for the multi-site multi-season streamflow modeling application: Colorado River near
Cisco, Utah (site 1); Green River at Green River, Utah (site 2); San Juan River near Bluff, Utah
(site 3); and Colorado River at Lees Ferry, Arizona (site 4). The locations of the stations are
presented in Table 1 and Fig. 2. These streamflow data sets have been chosen for the study
because they exhibit complex dependence, and also bimodality in a few months. Also, these
benchmark data sets have been used by Prairie et al. (2007) and Salas and Lee (2010) for multi-
Figure 2: Location of Streamflow Stations - Colorado River Basin (source: Google Maps)
Table 1: Location of the selected stations for the multi-site multi-season flow modeling
The efficacy of the AMHMABB model is shown through: (i) a comparison with the selected
simulation based hybrid model (SMHMABB), in order to bring out the advantages of the
proposed model in terms of model performance due to the automation achieved by the S-O
framework and the preservation of inter-annual dependence; (ii) a comparison with the multi-site
parametric disaggregation model (MDM) fitted using SAMS2007 (Sveinsson et al., 2007), a
state-of-the-art stochastic streamflow modeling package; and (iii) a split-sample validation test to
assess the performance of the proposed AMHMABB model in capturing the statistics of the multi-
The performance comparisons are based on the ability of the models to preserve the following
statistics: (i) summary statistics (mean, standard deviation and skewness coefficient) at within-
year (monthly) and aggregated annual time scales at each site; (ii) marginal distribution of
monthly flows at each site; (iii) lag-1 autocorrelation at aggregated annual level at each site; (iv)
monthly serial correlations at each site; (v) serial state-dependent correlations (Sharma et al.,
1997) of monthly flows at each site (representing nonlinear dependence); (vi) lag-zero site-to-site
correlations at the monthly level; (vii) minimum and maximum monthly flows at each site; and
(viii) the multi-site deficit run characteristics (Yevjevich, 1972; Haltiner and Salas, 1988)
expressed in terms of (a) maximum deficit run sum; (b) maximum deficit run length; (c) mean
For the AMHMABB and the SMHMABB, the details of the models considered for the selection
and the selected model for comparison are discussed in the following paragraphs.
AMHMABB model: Since this model is to be obtained from the S-O framework based on the
multi-objective evolutionary search using NSGA-II (Deb et al., 2002), a sensitivity analysis is
performed for the application example considered in this study. The sensitivity analysis on
probability and random seed) has been carried out with an intention to obtain the non-dominated
Pareto-optimal solutions. The MOGA parameters adopted based on the sensitivity analysis are as
follows: population size = 100; number of generations = 300; probability of cross-over = 0.6;
mutation probability = 0.001; random seed = 0.3. The non-dominated front obtained for the
application example using the evolutionary search based technique NSGA-II (Deb et al., 2002)
is presented in Fig. 3. In Fig. 3, the solutions A and C represent the AMHMABB models
corresponding to the two extremes of the non-dominated front, one with the "minimum
∑|R-bias(MARS)|" and the other with the "minimum ∑R-RMSE(MARS)", respectively; the solution B
represents the AMHMABB model that corresponds to a typical compromise solution between
the two extremes. The compromise solution is the one located closest to the origin on
the Pareto front presented in Fig. 3. It is to be noted from Fig. 3 that the Pareto front has a
narrow range, resulting in practically very close solutions, which is plausibly due to the
inter-annual dependence constraint introduced into the framework. Hence, in this study, only the
AMHMABB-A solution is used for the comparisons and the same will be hereafter referred to as
AMHMABB.
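The choice of a compromise solution as the point on the non-dominated front closest to the origin can be sketched as follows. The function name is hypothetical, the Euclidean distance is assumed, and no rescaling of the two objectives is applied, which may or may not match the procedure used to locate solution B.

```python
import math

def compromise_index(front):
    """Index of the front member with the smallest Euclidean distance to the origin."""
    return min(range(len(front)), key=lambda i: math.hypot(front[i][0], front[i][1]))
```

If the two objectives differ strongly in magnitude, normalizing each objective by its range over the front before computing the distance would be a reasonable variation.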
SMHMABB model: While the parametric component of the SMHMABB model is restricted to
PAR(1) at all the sites for partial pre-whitening, the non-parametric components of the
SMHMABB model are picked from one of the combinations resulting from: (i) equal sized
within-year contemporaneous blocks of 1, 2, 3, 4 and 6 months and (ii) window sizes of 3, 5, 7, 9, 11 and 13.
It is to be noted that the PAR(1) model parameters are estimated independently at each station
using the method of moments and the within-year block sizes adopted for resampling the
residuals are equal and contemporaneous, since the unequal block sizes result in a large number
of possible hybrid models, which will be too cumbersome to evaluate manually. The above
combinations of parametric and non-parametric components result in 30 hybrid models. For the
purpose of comparison, the most competent model is chosen based on the reproduction of all the
temporal as well as the spatial statistics and the preservation of the deficit run characteristics.
The SMHMABB model selected herein for the Colorado river basin has the PAR(1) model at
each site as the parametric component and the contemporaneous within-year block size of 4
months and the window size of 9 as the nonparametric model components used for resampling
Table 2: Parameters for the selected multi-site HMABB models SMHMABB and
A comparison of model performance between the selected AMHMABB model and the selected
SMHMABB model is presented in the next few paragraphs. Table 2 summarizes the parameters
of the selected SMHMABB and AMHMABB models for Colorado River Basin, from which it
can be observed that the parameters of the AMHMABB model (both the parametric component
and the non-parametric component) are quite different from those of the SMHMABB model for
the streamflows at all the sites. This is because, in case of the SMHMABB model, the periodic
parameters (parametric component) are obtained at each site (independently) by fitting a PAR(1)
model using the method of moments (SAMS 2007). Following this, the multi-site residuals are
contemporaneously resampled using a multi-site HMABB with a set of equal within-year block
sizes and a pre-selected window size. Subsequently, the post-blackening operation is performed.
Thus, there are multiple steps and these steps have to be performed sequentially. In contrast, in case
of the AMHMABB model, the parameters of the parametric component are simultaneously
objective evolutionary search technique, NSGA-II, efficiently guided by objective functions that
are based on multi-site critical deficit run sum preservation and the constraint on preservation of
inter-annual dependence. Moreover, in case of the proposed framework, a huge parameter space
is available for the search (unlike the simulation based SMHMABB model). As mentioned
earlier, the AMHMABB model yielding the minimum ∑|R-bias(MARS)| from the Pareto-optimal
For both the AMHMABB and the SMHMABB models, the reproduction of summary statistics
and the preservation of the serial correlations at monthly level are presented in Figs. 4 and 5
respectively. For brevity, the results are shown for only two sites of the Colorado River basin,
since a similar trend of results is observed at the other two sites. The reproduction of the
summary statistics and the lag-1 autocorrelation of the aggregated annual flows are presented in
Table 3. The summary statistics of the flows are well reproduced by both the models at the
monthly level (Fig. 4) and at the aggregated annual level (Table 3). However, the standard
deviation at the annual level is deflated by the SMHMABB model. It is seen from Fig. 5 that the
monthly serial correlations at all the 4 lags are well preserved by the AMHMABB model,
whereas, the SMHMABB model shows considerable bias in preserving the lag-2, lag-3 and lag-4
correlations, and this bias is found to increase with the order of the lag. The lag-1 autocorrelation
at the aggregated annual level is well preserved at all the sites by the AMHMABB model (Table
3), due to the inter-annual dependence constraint introduced into the framework. On the other
hand, the SMHMABB model (Table 3) does not preserve the lag-1 autocorrelation at the annual
level at any of the four sites, since the simulation based hybrid model is not conditioned to
preserve the same. The lag-zero site-to-site correlations (Fig. 6) are well reproduced by both the
models, due to the residual resampling using contemporaneous within-year blocks. Although the
state-dependent correlations (Fig. 7) are well reproduced at all the four sites by both the models,
the SMHMABB model exhibits relatively more bias in a few months in comparison with the
AMHMABB model.
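The two dependence measures compared above, the monthly serial correlations at a given lag and the lag-1 autocorrelation of the aggregated annual flows, can be estimated as in the sketch below. The function names and the (years, 12) array layout are assumptions, and the plain Pearson estimator is used; the estimators behind the reported figures and tables may differ in detail.

```python
import numpy as np

def monthly_serial_correlation(flows, lag):
    """Correlation between the flows of month m and of the month `lag` steps earlier, for each m."""
    flows = np.asarray(flows, dtype=float)       # shape: (years, 12)
    months = flows.shape[1]
    flat = flows.ravel()                         # chronological monthly series
    result = np.empty(months)
    for m in range(months):
        idx = np.arange(m, flat.size, months)
        idx = idx[idx - lag >= 0]                # drop pairs reaching before the record
        result[m] = np.corrcoef(flat[idx], flat[idx - lag])[0, 1]
    return result

def annual_lag1_autocorrelation(flows):
    """Lag-1 correlation of the annual totals obtained by summing the 12 monthly flows."""
    annual = np.asarray(flows, dtype=float).sum(axis=1)
    return float(np.corrcoef(annual[1:], annual[:-1])[0, 1])
```

Computing both functions on the historical record and on each synthetic replicate gives the quantities from which the bias at the monthly and the annual levels can be assessed.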
SMHMABB and AMHMABB models (values in parentheses denote the standard deviation
between AMHMABB and SMHMABB Models: a) site 1 to site 3; b) site 1 to site 4; c) site 2
to site 4.
4.2.2. Preservation of Marginal Distributions and minimum and maximum monthly flows
Typical results of preservation of the marginal distributions of monthly streamflows at site 1 for
July and December months are presented in Figs. 8 and 9 respectively and the same at site 2 for
August month is presented in Fig. 10. For brevity, the results are presented and discussed only
for flows of a few typical months that exhibit peakedness and/or bimodality. In general, it is
observed that the AMHMABB model is able to mimic the distribution characteristics of
historical flows very well (especially the non-normal features such as peakedness and bi-
modality), when compared with the SMHMABB model. It is also observed that at all the four
sites, both the models show limited smoothing as well as extrapolation beyond the extremes
(minimum and maximum flows). The preservation of the minimum and the maximum flows at
the key site 4 is presented only for AMHMABB in Fig. 11, since the behavior of SMHMABB is
quite similar. Neither hybrid model preserves the minimum and the maximum flows
effectively, since only a very limited number of flows are generated beyond the historical extremes. In
Figure 8: Preservation of the marginal Distribution of the July month streamflows at site 1-
Mm3/month)
Figure 10: Preservation of the marginal Distribution of the August month streamflows at
Mm3/month)
Figure 11: Preservation of a) minimum flows and b) maximum flows at site 4 - Model:
The results of preservation of the multi-site deficit run characteristics are presented in Fig. 12 for
the selected SMHMABB model and the selected AMHMABB model. From Fig. 12, it is
observed that the selected AMHMABB model clearly outperforms the SMHMABB model in
preserving the number of runs, the critical and the mean deficit run characteristics at all the
truncation levels. Also, a good and consistent percent of exceedance of the various deficit run
characteristics (compared to their historical flow counterparts) is noted in case of the generated
streamflows from the AMHMABB model, which is not shown here for brevity. It is to be noted
that the selected AMHMABB model is able to preserve the critical run sum accurately owing to
the objective functions adopted (that are explicitly related to ∑|R-bias(MARS)| and
∑R-RMSE(MARS), over various truncation levels). On the other hand, the SMHMABB model
shows high bias at the lower and/or the higher truncation levels. Herein, it is to be mentioned that
although no objective functions/constraints are introduced into the S-O framework with regard to
preserving the other deficit run characteristics (such as number of runs, maximum run length,
mean run sum and mean run length), these are well preserved by the selected AMHMABB
model, when compared with the simulation based SMHMABB model (Fig. 12).
The automated multi-site hybrid model AMHMABB proposed in this study scores over the
simulation based hybrid model SMHMABB, plausibly due to the more effective combination of
the parametric and the non-parametric components. This combination results from the
exploration of the huge parameter space of HMABB, enabled by the objective function that
minimizes the aggregated errors in the preservation of the multi-site critical drought magnitude
over a wide range of threshold levels, subject to the constraint for the preservation of the
inter-annual dependence of the simulated streamflows at the various sites. The provision for unequal within-year block sizes
in the structure of the AMHMABB model is expected to offer a better representation of the
short-term persistence due to the recession of the seasonal ground water flows in the sub-basins
considered and the pronounced seasonality due to seasonal storage in snow packs. Moreover, the
dependence of the streamflows at the various sites considered, is expected to represent the over-
year response time of deep ground water runoff more accurately (Claps and Murrone, 20??). In
general, the better preservation of the streamflow drought durations at the various sites exhibited
by AMHMABB is indicative of the better representation of the storage and the response times of
the different catchments considered in this study. The observation that both the multisite drought
durations and the drought severities (magnitudes) at various threshold levels are better preserved
by the AMHMABB model indicates that the non-linear dynamics behind the propagation of the
The following paragraphs bring out the performance comparison between the selected
AMHMABB model obtained from the proposed S-O framework and the multi-site parametric
models arising out of the combinations resulting from: i) aggregated annual model at the key
site; ii) the available schemes for the spatial and the temporal disaggregations; and iii) the
sequence of disaggregation adopted, are fitted using SAMS 2007. The best model based on the
preservation of spatial and temporal statistics as well as deficit run characteristics, is selected.
The selected MDM model adopts AR(1) for generation of the aggregated annual flows at the key
site, Lees Ferry on the Colorado River (Table 1, Fig. 2), and the Valencia and Schaake model for spatial
disaggregation of the generated annual streamflows at the key site, followed by temporal
disaggregation using Lane’s model. For brevity, only the results for the key site (site 4) are
presented in this section, since a similar trend of performance is observed for the other three sites
as well. For both AMHMABB and MDM, the reproduction of the summary statistics,
distributions at the key site (site 4) and multi-site drought characteristics are presented in Figs
13-17, respectively.
Fig 13: Reproduction of Summary Statistics for Colorado River Basin at Site 4 - A
comparison between AMHMABB model and Disaggregation model (Flow units -
Mm3/month)
Fig 14: Preservation of Serial Correlations for Colorado River Basin at Site 4 - A
comparison between AMHMABB model and Disaggregation model
Fig 15: Preservation of State-dependent Correlations for Colorado River Basin at Site 4 - A
comparison between AMHMABB model and Disaggregation model
Fig 16: Preservation of Marginal Distribution of March and June month flows at Site 4 for
Colorado River Basin - A comparison between Disaggregation model (MDM) and
AMHMABB models (Flow units - Mm3/month)
Fig 17: Preservation of multi-site drought characteristics (a) Number of runs; (b)
Maximum Run Length; (c) Maximum Run Sum for Colorado River Basin - A comparison
between Disaggregation model (MDM) and AMHMABB models
It is observed from Figure 13 that the monthly mean and standard deviations of flows at Lees
Ferry (Site 4) are well reproduced by both the models, although the MDM model exhibits some
bias in a few months in terms of preserving the skewness coefficient. A detailed performance
comparison of the preservation of monthly serial correlations has been done in this study, but for
brevity, the results are presented only for the lag-1 and lag-2 (lower lags) and the lag-3 and lag-4
(higher lags) monthly serial correlations in Figure 14. In case of the AMHMABB model, it is observed that both
the lower lag (lag-1 and lag-2) and the higher lag (lag-3 and lag-4) correlations are well preserved
for Site 4. In case of MDM, it is observed that the lower lag serial correlations are reasonably
well preserved, while the higher lag serial correlations are not preserved by the disaggregation
model. This is because the selected MDM is not designed to preserve the serial correlations
beyond lag-1 at the seasonal level. The results for the preservation of the lag-1 state-dependent
correlations for Site 4 are presented in Figure 15. It is observed from this figure that, in
general, the AMHMABB model reproduces the monthly lag-1 state-dependent
correlations very well. On the other hand, the disaggregation model fails to
preserve the same. Being a linear parametric model, the MDM is not expected to preserve the
state-dependent correlations that are indicative of the non-linear dependence present in the data
Typical results of preservation of the marginal distribution of flows of the Colorado River Basin
for the AMHMABB model and the multi-site disaggregation model (MDM) are presented in
Figure 16. For brevity, the results are presented only for the March and June month flows. It is
observed that the AMHMABB model is able to mimic the distribution characteristics of
historical flows very well, when compared to MDM. However, being a parametric model, MDM
observed that both the models show only some limited extrapolation near minimum and/or
maximum flows. It is to be noted that the selected AMHMABB model is found to preserve the
statistical characteristics of multi-site multi-season streamflows better than the selected multi-site
disaggregation model (MDM), although the objective functions used in the proposed S-O
framework for the AMHMABB model are based on the preservation of the multi-site critical
Figure 17 for both AMHMABB and MDM. From Figure 17, it is observed that the selected
AMHMABB model is able to preserve the deficit run characteristics better compared to the
MDM. It is to be noted that although there are no objective functions/constraints used to achieve
the preservation of the other deficit run characteristics (such as number of runs and maximum
run length), these characteristics are also well preserved by the AMHMABB model when
compared to the MDM. Overall, it is observed that the selected AMHMABB model shows better
performance in simulating the historical streamflows of the Colorado river basin, when
compared with the selected disaggregation model (MDM). The better performance of the
AMHMABB model in comparison with the MDM may be attributed to the better preservation of
the marginal distributions, the higher lag serial correlations, skewness coefficient of aggregated
Split sample validation is conducted with a view to ensure that the AMHMABB model obtained
from the simulation-optimization framework is able to capture the repeatable statistical structure
present in the historical streamflow data. In other words, this kind of validation will endorse the
adaptability of the proposed model for the possible streamflow sequences that may occur in the
uncertain future. The split sample validation is carried out in two phases, namely, calibration and
validation, using the 102-year multisite monthly streamflows measured at the four streamflow
stations located on the Upper Colorado River basin (Table 1, Fig. 2). In the calibration phase, the
first 60 years of the streamflow data are employed in obtaining the parameters of the parametric
and the non-parametric components using the S-O framework. The AMHMABB model obtained
in the calibration stage is then used to model the validation data set (remaining 42 years of
historical streamflows). The stochastic model tested is considered acceptable for practical use, if
it can provide a very good simulation of the measured streamflows at the calibration as well as
the validation phases by way of reproducing the basic statistical characteristics of the streamflow
data as well as preserving both the critical and the mean deficit run characteristics obtained from
the historical flow data, at various truncation levels, with minimum errors. For brevity, only the
results for the Site 4 are presented in this section since similar statistical performance is
Fig 18: Reproduction of Summary Statistics for Colorado River Basin (Calibration) at Site 4 -
Model: AMHMABB
Fig 19: Reproduction of Summary Statistics for Colorado River Basin (Validation) at Site 4 -
Model: AMHMABB
Fig 20: Preservation of Serial Correlations for Colorado River Basin (Calibration) at Site 4 -
Model: AMHMABB
Fig 21: Preservation of Serial Correlations for Colorado River Basin (Validation) at Site 4 -
Model: AMHMABB
Fig 22: Preservation of Marginal Distribution of the February and April month flows for
Colorado River Basin (Calibration and Validation) at Site 4 - Model: AMHMABB
Fig 23: Preservation of multi-site drought characteristics Number of runs and Maximum Run
Sum for Colorado River Basin (Calibration and Validation)- Model: AMHMABB
The results of the reproduction of the summary statistics, the preservation of the dependence
structure of the historical streamflows and the preservation of marginal distributions for the
calibration and validation data sets are presented in Figs. 18-22. It is observed from Figures 18
and 19 that the mean and the standard deviation are well reproduced for all the months in case of
both calibration and validation. On the other hand, although the skewness is well preserved in the
calibration dataset, it is slightly deflated during three of the high skewness months (November,
December and January) in case of validation. In the calibration stage, the AMHMABB model is
able to preserve the lower lag (lag-1 and lag-2) as well as the higher lag (lag-3 and lag-4) serial
correlations (Fig. 20) well, although a small bias is noted in a few months in case of the higher lag
cross-year serial correlations. In case of validation, some bias is observed in the higher
lag serial correlations (lag-3 and lag-4) in a few months (Fig. 21). It may be observed from
Figure 22 that AMHMABB is able to mimic the distribution characteristics of the monthly
historical flows in both calibration and validation data sets. However, it may be noted that in
both calibration and validation stages, the AMHMABB model exhibits only very limited
For brevity, only the results of the preservation of number of runs, maximum (critical) run length
and the maximum (critical deficit) run sum for both calibration and validation datasets are
presented in Figure 23. It can be observed that the AMHMABB model is able to preserve the
maximum run sum well at all the truncations considered in calibration as well as validation. This
is expected in case of calibration data set, since the proposed model uses an objective function
that is explicitly related to preserving the critical deficit run sum characteristics. However, it can
be observed that in validation also, the maximum run sum is preserved at the various truncation
levels specified; moreover, the number of runs is well preserved in both the
calibration and the validation stages (Fig. 23), although it is not explicitly specified in the
objective functions or constraints of the framework. It is to be noted that the critical run length is
slightly underestimated at both the calibration and the validation stages (Fig. 23). Thus, the
split-sample validation performed in this study brings out the ability of the proposed hybrid
multi-season streamflow model (AMHMABB) to predict the statistical as well as the multi-site
critical deficit run characteristics of the multi-site streamflows likely to occur in future. It
also shows that the hybrid model parameters obtained through evolutionary search using the
maximum run sum based objective functions and the constraint related to the preservation of the
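The deficit run characteristics referred to above (number of runs, maximum run length and maximum run sum at a given truncation level) can be computed from a single flow series as in the following minimal sketch. The function name and the returned dictionary keys are hypothetical, the truncation level is passed in directly as a flow value, and the multi-site aggregation used in the paper is not reproduced here.

```python
import numpy as np

def deficit_runs(flows, threshold):
    """Run statistics of the periods in which flow stays below `threshold`."""
    flows = np.asarray(flows, dtype=float)
    lengths, sums = [], []
    run_len, run_sum = 0, 0.0
    for q in flows:
        if q < threshold:
            # Still inside a deficit run: extend it.
            run_len += 1
            run_sum += threshold - q
        elif run_len:
            # Flow recovered: close the current run.
            lengths.append(run_len)
            sums.append(run_sum)
            run_len, run_sum = 0, 0.0
    if run_len:  # a run that is still open at the end of the record
        lengths.append(run_len)
        sums.append(run_sum)
    return {
        "number_of_runs": len(lengths),
        "max_run_length": max(lengths, default=0),
        "max_run_sum": max(sums, default=0.0),
        "mean_run_length": float(np.mean(lengths)) if lengths else 0.0,
        "mean_run_sum": float(np.mean(sums)) if sums else 0.0,
    }
```

For the series 10, 4, 3, 12, 6, 9, 2, 1, 5, 11 and a truncation level of 8, the sketch finds three runs (lengths 2, 1 and 3) with deficit sums 9, 2 and 16, so the maximum run length is 3 and the maximum run sum is 16.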
The simulation-optimization (S-O) framework developed for modeling the single-site multi-
season streamflows by Srivastav and Srinivasan (2011) is extended to modeling the multi-site
proposed by Srinivas and Srinivasan (2006) is extended to the multi-site streamflow simulation
model and the same is also used as the simulation module within the proposed S-O framework
The multi-objective optimization model formulated is the driver and the multi-site, multi-season
hybrid matched block bootstrap model is the simulation engine within this framework. In
addition to the constraints on the hybrid model parameter space, some specific constraints
aggregated annual flows) are introduced explicitly into the modeling framework with a view to
arrive at a hybrid stochastic model with an improved performance in terms of preserving the
statistics at more than one level and consequently, preserving the deficit run (drought)
characteristics accurately. A robust and efficient evolutionary search based technique, namely,
non-dominated sorting based genetic algorithm (NSGA-II) (Deb et al., 2002) is employed as the
solution technique for the multi-objective optimization within the S-O framework. The use of the
deficit sum related objective functions and the constraints imposed explicitly into the simulation-
optimization framework apparently enable the evolutionary search to effectively explore the
wide parameter space of the multi-site, multi-season hybrid model HMABB. The efficacy of the
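NSGA-II ranks candidate solutions by non-domination. The sketch below shows only that dominance test for objectives that are all minimised; it is not the full algorithm (no crowding distance, selection, crossover or mutation), and the function name and the objective values in the usage note are hypothetical.

```python
import numpy as np

def non_dominated(objectives):
    """Boolean mask of the rows that no other row dominates (minimisation)."""
    F = np.asarray(objectives, dtype=float)
    mask = np.ones(F.shape[0], dtype=bool)
    for i in range(F.shape[0]):
        # Row j dominates row i if it is no worse in every objective
        # and strictly better in at least one.
        dominates_i = np.all(F <= F[i], axis=1) & np.any(F < F[i], axis=1)
        mask[i] = not dominates_i.any()
    return mask
```

For the made-up objective vectors (1, 5), (2, 3), (3, 4) and (4, 1), only (3, 4) is dominated (by (2, 3)); the remaining three points form the first non-dominated front.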
The proposed hybrid model AMHMABB preserves the temporal statistics (summary statistics,
marginal distributions at each site at monthly and aggregated annual levels) as well as the spatial
statistics (site-to-site lag-zero correlations) very well. Also, the other deficit run characteristics
namely, the number of runs, the maximum run length, the mean run sum and the mean run length
Overall, the AMHMABB model (obtained from the proposed S-O framework) is able to show
better streamflow modeling performance when compared with the simulation based SMHMABB
model (which is an extension of the single-site HMABB model proposed by Srinivas and
SMHMABB model, is plausibly due to the significant role played by: (i) the objective functions
related to the preservation of deficit run sum, which drive (direct) the search effectively; (ii)
the huge hybrid model parameter space available for the search and (iii) the constraint on the
preservation of the inter-annual dependence. Further, the split-sample validation results indicate
that the AMHMABB model is able to perform well in both calibration and validation phases,
indicating that the parameters derived from the proposed S-O framework are robust in predicting
the statistical and the critical deficit characteristics of the multi-site multi-season streamflows
likely to occur in the uncertain future. Moreover, the proposed AMHMABB model is found to
perform better in preserving the statistical as well as the critical deficit run
characteristics of the observed flows, when compared with the multi-site parametric
1. The model does not generate values that are quite different from the observed flows and
hence only limited smoothing is achieved. One way of alleviating this issue is to use
4. In place of the linear parametric model, non-linear regression based models or data-
bootstrap methods such as multisite maximum entropy bootstrap (Srivastav et al., 2014)
can be tried.
Acknowledgments
The authors wish to thank the Indian Institute of Technology Madras, Chennai, India for the
continuous support and facilities offered to carry out this research work. The help rendered by
Prof. V.V. Srinivas, Indian Institute of Science, Bengaluru, India through sharing some of the
computer codes for verifying and validating the stochastic model is gratefully acknowledged.
The constructive suggestions and the insightful comments of the anonymous reviewers, the
associate editor and the Editor, Andreas Bardossy, were helpful in improving the quality of the
manuscript. The first author wishes to thank VIT University for providing the required
References
Azadivar, F., Tompkins, G., 1999. Simulation optimization with qualitative variables and
Back, T., Schwefel, H. P., 1993. An overview of evolutionary algorithms for parameter
Carlstein, E., Do, K. A., Hall, P., Hesterberg, T., Kunsch, H., 1998. Matched-block bootstrap for
Chen, B. S., Lee, B. K., Peng, S. C., 2002. Maximum likelihood parameter estimation of
FARIMA processes using the genetic algorithm in the frequency domain. IEEE Transactions on
Chen, L., Singh, V. P., Guo, S. L., Zhou, J. Z., Zhang, J. H., 2015. Copula-based method for multisite
monthly and daily streamflow simulation. Journal of Hydrology 528, 369-384.
models by time series aggregation, Stochastic and Statistical Methods in Hydrology and
Cortez, P., Rocha, M., Neves, J., 2004. Evolving time series forecasting ARMA models. Journal
Davison, A. C., Hinkley, D. V. (Eds.), 1997. Bootstrap methods and their application. Cambridge
Deb, K., Pratap, A., Agrawal, S., Meyarivan, T., 2002. A fast and elitist multi-objective genetic
Efstratiadis, A., Dialynas, Y.G., Kozanis, S., Koutsoyiannis, D., 2014. A multivariate
stochastic model for the generation of synthetic time series at multiple time scales reproducing
Fiering, J. D., 1964. Multivariate technique for synthetic hydrology. Journal of Hydraulic
Grantz, K., Rajagopalan, B., Clark, M., Zagona, E., 2006. A technique for incorporating large-
Grygier, J. C., Stedinger, J. R., 1988. Condensed disaggregation procedures and conservation
Haltiner, J. P., Salas, J. D., 1988. Development and testing of a multivariate seasonal ARMA
Harms, A. A., Campbell, T. H., 1967. An extension to the Thomas-Fiering model for the
Hipel, K. W., McLeod, A. I., 1994. Time series modeling of water resources and environmental
Ilich, N., 2014. An effective three-step algorithm for multi-site generation of weekly
Ilich, N., Despotovic, J., 2008. A simple method for effective multi-site generation of stochastic
Izzeldin, M., Murphy, A., 2000. Bootstrapping the small sample critical values of the rescaled
Koutsoyiannis, D., 1999. A nonlinear disaggregation model with a reduced parameter set for
Lall, U., 1995. Recent advances in nonparametric function estimation. Reviews of Geophysics,
1093-1102.
Lall, U., Sharma, A., 1996. A nearest neighbour bootstrap for resampling hydrologic time series.
Colorado.
Langousis, 2006. A stochastic methodology for generation of seasonal time series reproducing
Lee, T., Salas, J. D., Prairie, J., 2010. An enhanced nonparametric streamflow disaggregation
doi:10.1029/2009WR007761.
Li, C., Singh, V. P., 2014. A Multi-model Regression-sampling Algorithm for generating rich
Van Loon, A., Laaha, G., 2015. Hydrological drought severity explained by climate and catchment
Marković, Đ., Plavšić, J., Ilich, N., Ilić, S., 2015. Non-parametric Stochastic Generation of
doi:10.1007/s11269-015-1090-z
Matalas, N. C., 1967. Mathematical assessment of synthetic hydrology. Water Resources
Mejia, J. M., Rousselle, J., 1976. Disaggregation models in hydrology revisited. Water
Minerva, T., Poli, I., 2001. Building ARMA models with genetic algorithms, lecture notes in
Moon, Y. I., Lall, U., 1994. Kernel function estimator for flood frequency analysis. Water
Oliveira, G. C., Kelman, J., Pereira, M. V. F., Stedinger, J. R., 1988. A representation of spatial
correlations in large stochastic seasonal streamflow models. Water Resources Research 24 (5),
781-785.
Ong, C. S., Huang, J. J., Tzeng, G. H., 2005. Model identification of ARIMA family using
Peng, P., Chen, Q., 2003. Improved genetic algorithm and application to ARMA modeling. SICE
Pereira, M. V. F., Oliveira, G. C., Costa, C. C. G., Kelman, J., 1984. Stochastic streamflow
Pierreval, H., Paris, J. L., 2000. Distributed evolutionary algorithms for simulation optimization.
Prairie, J., Rajagopalan, B., Fulp, T. J., Zagona, E. A., 2006. Modified k-NN model for stochastic
Rajagopalan, B., Lall, U., 1999. A k-nearest-neighbour simulator for daily precipitation and
Rasmussen, P. F., Salas, J. D., Fagherazzi, L., Rassam, J.-C., Bobee, B., 1996. Estimation and
Rolf, S., Sprave, J., Urfer, W., 1997. Model identification and parameter estimation of ARMA
Salas, J. D. (Ed.), 1993. Analysis and modeling of hydrologic time series, in Handbook of
Salas, J. D., Delleur, J. W., Yevjevich, V., Lane, W. (Eds.), 1980. Applied Modeling of
Salas, J. D., Guillermo, Q., III, T., Bartolini, P., 1985. Approaches to multivariate modeling of
Santos, E. G., Salas, J. D., 1992. Stepwise disaggregation scheme for synthetic hydrology.
Silverman, B. W., 1986. Density estimation for statistics and data analysis: monograph on
Singhrattna, N., Rajagopalan, B., Clark, M., Kumar, K. K., 2005. Forecasting Thailand summer
Srinivas, V. V., Srinivasan, K., 2000. Post-blackening approach for modeling dependent annual
Srinivas, V. V., Srinivasan, K., 2001. A hybrid stochastic model for multiseason streamflow
Srinivas, V.V., Srinivasan, K., 2005. Hybrid moving block bootstrap for stochastic simulation of
Srinivas, V. V., Srinivasan, K., 2006. Hybrid matched-block bootstrap for stochastic simulation
Srivastav, R.K., Simonovic, S.P., 2014. An analytical procedure for multi-site, multiseason
for multi-season hybrid stochastic models, Journal of Hydrology 404 (3-4), 209-225.
Stedinger, J. R., Pei, D. (Eds.), 1982. An annual-monthly streamflow model for incorporating
Stedinger, J. R., Pei, D., Cohn, T. A., 1985. A condensed disaggregation model for incorporating
parameter uncertainty into monthly reservoir simulation. Water Resources Research 21 (5), 665-
675.
Stedinger, J. R., Vogel, R.M., 1984. Disaggregation procedures for generating serially correlated
model for synthetic generation of annual stream-flows, Hydrol. Process., 22 (12), pp. 1831–1845
Tarboton, D. G., Sharma, A., Lall, U., 1998. Disaggregation procedures for stochastic hydrology
Tekin, E., Ihsan, S., 2004. Simulation optimization: A comprehensive review on theory and
Ünes, F., Demirci, M., Kişi, Ö., 2015. Prediction of Millers Ferry Dam reservoir level in USA
using Artificial Neural Network. Periodica Polytechnica Civil Engineering 59 (3), 309-318.
Voss, M. S., Feng, X., 2002. ARMA model selection using particle swarm optimization and AIC
Yates, D., Gangopadhyay, S., Rajagopalan, B., Strzepek, K., 2003. A technique for generating
regional climate scenarios using a nearest neighbor bootstrap. Water Resources Research 39 (7),
1199.
Zelenhasic, E., Salvai, A., 1987. A Method of Streamflow Drought Analysis, Water
[Figure: Pareto front with the selected solutions A, B and C marked; axis label: R-RMSE in MARS]
[Figure: Monthly summary statistics by month (Oct to Sep), panels (a) and (b); recoverable panel titles: Standard Deviation, Skewness]
[Figure: Lag-1 to lag-4 serial correlations by month (Oct to Sep) at Site 2 and (b) Site 4; series: Historical, AMHMABB, SMHMABB]
[Figure: Lag-0 cross-correlations by month (Oct to Sep) for (a) Site 1 to Site 3, (b) Site 1 to Site 4 and (c) Site 2 to Site 4; series: Hist, SMHMABB, AMHMABB]
[Figure: State-dependent correlations by month (Oct to Sep), panels (a) and (b); recoverable panel titles: Above and Backward, Above and Forward, Below and Backward, Below and Forward Correlations; series: Historical, AMHMABB, SMHMABB]
Fig 13: Reproduction of Summary Statistics for Colorado River Basin at Site 4 - A comparison
between AMHMABB model and Disaggregation model (Flow units - Mm3/month)
Fig 14: Preservation of Serial Correlations for Colorado River Basin at Site 4 - A comparison
between AMHMABB model and Disaggregation model
Fig 15: Preservation of State-dependent Correlations for Colorado River Basin at Site 4 - A
comparison between AMHMABB model and Disaggregation model
Fig 16: Preservation of Marginal Distribution of March and June month flows at Site 4 for
Colorado River Basin - A comparison between Disaggregation model (MDM) and AMHMABB
models (Flow units - Mm3/month)
Fig 17: Preservation of multi-site drought characteristics (a) Number of runs; (b) Maximum Run
Length; (c) Maximum Run Sum for Colorado River Basin - A comparison between
Disaggregation model (MDM) and AMHMABB models
Split-Sample Validation
[Figure: panels plotted against Threshold for the calibration and validation data sets; recoverable axis titles: Number of Runs, Maximum Run Length; series: Historical, AMHMABB-Calibration, AMHMABB-Validation]
Fig 23: Preservation of multi-site drought characteristics Maximum Run Sum, Number of runs
and Maximum Run Length for Colorado River Basin (Calibration and Validation) - Model:
AMHMABB
Table 1: Location of the selected stations for the multi-site multi-season flow modeling
SMHMABB (non-parametric component: MABB model, window size 9, block size 4,4,4)
Parametric component, parameters by month:
Site    Model   Oct     Nov     Dec     Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep
Site 1  PAR(1)  0.6247  0.7778  0.8548  0.7492  0.6710  0.5147  0.4539  0.5107  0.6178  0.8515  0.8421  0.6764
Site 2  PAR(1)  0.6515  0.8395  0.7528  0.5319  0.4871  0.3135  0.4968  0.6179  0.5903  0.7946  0.8152  0.7117
Site 3  PAR(1)  0.3054  0.7107  0.7641  0.5903  0.5503  0.5681  0.6455  0.7505  0.7589  0.8379  0.4850  0.5154
Site 4  PAR(1)  0.5104  0.7558  0.8249  0.6546  0.5492  0.4679  0.4507  0.5917  0.6252  0.8365  0.7860  0.6437

AMHMABB (non-parametric component: MABB model, window size 5, block size 6,6)
Parametric component, parameters by month:
Site    Model   Oct     Nov     Dec     Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep
Site 1  PAR(1)  0.9279  0.9500  0.9426  0.9500  0.9389  0.2865  0.6330  0.4708  0.3049  0.5556  0.0800  0.0505
Site 2  PAR(1)  0.9500  0.9500  0.9500  0.9316  0.9058  0.0395  0.7546  0.9389  0.1243  0.9095  0.4081  0.4671
Site 3  PAR(1)  0.1132  0.3823  0.2607  0.5851  0.1833  0.5224  0.4782  0.4671  0.0948  0.2127  0.1464  0.3897
Site 4  PAR(1)  0.9389  0.9463  0.9389  0.9389  0.9353  0.3270  0.7141  0.8357  0.9242  0.9095  0.7952  0.8468
are the first-order periodic (monthly) autoregressive parameters of the hybrid models
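To illustrate what a first-order periodic autoregressive parameter set of this kind does, the sketch below runs a PAR(1) recursion on standardized values using the Site 1 SMHMABB parameters listed above. Several things here are assumptions made for illustration only: that the parameters apply to standardized monthly values, that the innovations are Gaussian (in a hybrid model the residuals are resampled by the bootstrap rather than drawn from a normal distribution), and the function name `simulate_par1`.

```python
import numpy as np

# Site 1 PAR(1) parameters of the SMHMABB model, Oct..Sep (from the table above).
PHI_SITE1_SMHMABB = np.array([0.6247, 0.7778, 0.8548, 0.7492, 0.6710, 0.5147,
                              0.4539, 0.5107, 0.6178, 0.8515, 0.8421, 0.6764])

def simulate_par1(phi, n_years, rng):
    """Standardized PAR(1) series: z_t = phi_m * z_{t-1} + sqrt(1 - phi_m^2) * e_t.

    Gaussian innovations are an illustrative assumption, not the hybrid model.
    """
    phi = np.asarray(phi, dtype=float)
    n_months = phi.size
    z = np.empty(n_years * n_months)
    prev = rng.standard_normal()  # start from the stationary N(0, 1) state
    for t in range(z.size):
        m = t % n_months  # month index within the water year
        prev = phi[m] * prev + np.sqrt(1.0 - phi[m] ** 2) * rng.standard_normal()
        z[t] = prev
    return z.reshape(n_years, n_months)
```

Under these assumptions each month keeps unit variance, and the lag-1 correlation between month m and the preceding month equals the parameter for month m, which is why parameters near 0.95 imply a series dominated by its own recent past.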
Table 3: Reproduction of the Aggregated Annual Flow Statistics - Comparison between
SMHMABB and AMHMABB models (values in parentheses denote the standard deviation over
300 replicates)
Highlight Points: