
Accepted Manuscript

Research papers

Simulation-Optimization Framework for Multi-Site Multi-Season Hybrid Stochastic Streamflow Modeling

Roshan Srivastav, K. Srinivasan, K.P. Sudheer

PII: S0022-1694(16)30579-0
DOI: http://dx.doi.org/10.1016/j.jhydrol.2016.09.025
Reference: HYDROL 21521

To appear in: Journal of Hydrology

Received Date: 23 January 2016


Revised Date: 12 August 2016
Accepted Date: 8 September 2016

Please cite this article as: Srivastav, R., Srinivasan, K., Sudheer, K.P., Simulation-Optimization Framework for Multi-Site Multi-Season Hybrid Stochastic Streamflow Modeling, Journal of Hydrology (2016), doi: http://dx.doi.org/10.1016/j.jhydrol.2016.09.025

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers
we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and
review of the resulting proof before it is published in its final form. Please note that during the production process
errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Simulation-Optimization Framework for Multi-Site Multi-Season Hybrid Stochastic Streamflow Modeling

Roshan Srivastav (a, b), K. Srinivasan (c), K.P. Sudheer (c)

(a) Associate Professor, School of Civil and Chemical Engineering, VIT University, Vellore, Tamilnadu, India 632014

(b) Former PhD Scholar, EWRE Division, Dept. of Civil Engineering, IIT Madras, Chennai, Tamilnadu, India 600036

(c) Professor, EWRE Division, Dept. of Civil Engineering, IIT Madras, Chennai, Tamilnadu, India 600036

Abstract

A simulation-optimization (S-O) framework is developed for the hybrid stochastic modeling of multi-site multi-season streamflows. Within this framework, the formulated multi-objective optimization model is the driver and the multi-site, multi-season hybrid matched block bootstrap model (MHMABB) is the simulation engine. The multi-site multi-season simulation model is an extension of the existing single-site multi-season simulation model. A robust and efficient evolutionary search based technique, the non-dominated sorting based genetic algorithm (NSGA-II), is employed to solve the multi-objective optimization within the S-O framework. The objective functions relate to the preservation of the multi-site critical deficit run sum. The constraints concern the hybrid model parameter space and the preservation of certain statistics (such as the inter-annual dependence and/or the skewness of aggregated annual flows). The efficacy of the proposed S-O framework is demonstrated through a case example from the Colorado River basin. The proposed multi-site multi-season model AMHMABB, whose parameters are obtained from the proposed S-O framework, preserves both the temporal and the spatial statistics of the historical flows. The AMHMABB model also preserves the other multi-site deficit run characteristics well, namely the number of runs, the maximum run length, the mean run sum and the mean run length. Overall, the proposed AMHMABB model shows better streamflow modeling performance than the simulation-based SMHMABB model. This is plausibly owing to the significant role played by: (i) the objective functions related to the preservation of the multi-site critical deficit run sum; (ii) the large hybrid model parameter space available for the evolutionary search; and (iii) the constraint on the preservation of the inter-annual dependence. Split-sample validation results indicate that the AMHMABB model is able to predict the characteristics of the multi-site multi-season streamflows under an uncertain future. The AMHMABB model is also found to perform better than the linear multi-site disaggregation model (MDM) in preserving both the statistical characteristics and the multi-site critical deficit run characteristics of the observed flows. However, a major drawback of hybrid models persists in the AMHMABB model as well. It cannot synthetically generate a sufficient number of flows beyond the observed extremes, nor values that are quite different from the observed flows.

Key words: Stochastic streamflow models; Simulation-Optimization; NSGA-II; Evolutionary Algorithms; Hybrid matched block bootstrap.


1. Introduction

Starting with Fiering (1964) and Matalas (1967), there have been a number of attempts in hydrology to model multi-site/multi-variate streamflows. These belong to one of two basic types: (i) parametric and (ii) non-parametric models. Salas et al. (1980), Salas (1993) and McLeod and Hipel (1994) present detailed reviews of the parametric multivariate/multi-site time series models used in hydrology. The various non-parametric models in use are reviewed by Lall (1995), Lall and Sharma (1996), Srinivas and Srinivasan (2005) and Salas and Lee (2010).

The parametric models may be classified as: (i) periodic vector AR/ARMA models; (ii) contemporaneous AR/ARMA models; and (iii) disaggregation models. The PAR/PARMA models must estimate a large number of parameters jointly to account for the periodic space-time dependence, especially at shorter time scales. These estimates must come from historical samples of limited record length. Moreover, the parameter estimates may be unstable and may lead to poor reproduction of some of the important statistics. This motivated the development of a simplified set of models known as contemporaneous AR/ARMA models (CAR/CARMA). In these models, the dependence structure of concurrent streamflows at the various stations is preserved through model decoupling (Stedinger et al., 1985; Salas et al., 1985). However, the complex structure of some of the individual site models could impede the exact preservation of the spatial cross-correlations of flows (Rasmussen et al., 1996).

The need to preserve the statistical properties at more than one level led to the development of disaggregation models in the hydrologic literature (Harms and Campbell, 1967; Valencia and Schaake, 1973; Mejia and Rousselle, 1976). Preserving a wide range of statistical relationships across multiple time and space scales requires disaggregation models to estimate a large number of parameters accurately. This may not be feasible with the limited hydrologic data available. Hence, staged disaggregation models (Lane, 1982; Stedinger and Vogel, 1984; Grygier and Stedinger, 1988; Santos and Salas, 1992) and condensed disaggregation models (Lane, 1982; Stedinger and Pei, 1982; Pereira et al., 1984; Oliveira et al., 1988; Stedinger et al., 1985; Grygier and Stedinger, 1988) were developed to reduce the number of parameters and make the models computationally more amenable. Moreover, Grygier and Stedinger (1988) suggested empirical adjustment procedures to restore the summability of the disaggregated flows to the aggregate flows, especially when normalizing transformations were applied to the flows. The traditional linear parametric models of the AR/ARMA (Box-Jenkins) type, including the multi-site parametric disaggregation models, can provide only a linear control system representation of watershed processes. In contrast, the various physical components of streamflow, such as snowmelt runoff, soil water retention and soil drainage, are dynamic, non-linear processes. These models may also fail to capture non-stationary trends arising from the underlying dynamics of the physical processes.

Koutsoyiannis (1999) developed a parsimonious nonlinear multi-variate dynamic disaggregation model (DDM) that follows a two-step approach for the simulation of hydrologic time series. Koutsoyiannis (2000) then proposed a generalized mathematical framework for stochastic simulation and forecasting problems in hydrology. It models stochastic processes with short- or long-term memory by implementing a generalized autocovariance function within a generalized moving average generating scheme. The DDM and its further developments (Koutsoyiannis, 2000, 2001) were reported to reproduce long-term dependence and were validated for practical water resources use, but the computational complexity involved was high. Langousis (2006) proposed an approach that deals directly with the hydrologic data at the seasonal time scale. It still preserves the seasonal and annual statistics and the over-year scaling behavior without resorting to disaggregation techniques. However, it is reported to be quite complex owing to several steps of nonlinear multi-variate optimization (Langousis, 2006). Recently, Efstratiadis et al. (2014) presented a multi-variate parametric stochastic modeling framework that preserves the important statistical characteristics of the data at multiple sites and at daily, monthly and annual time scales. This framework also involves a number of computational complexities. These concern multi-variate parameter estimation and the adjustments and refinements required to reduce the biases in the simulations. Sharma and O'Neill (2002) brought out the limitations of parametric stochastic disaggregation models in preserving complex spatial and temporal dependence structures and in reproducing non-standard marginal distributions. More recently, Chen et al. (2015) proposed copula-based multisite stochastic simulation models, in which the spatial and temporal dependencies are modeled by combining bivariate copulas and conditional probability distributions. The main advantages of this method are that (i) the model parameters can be easily estimated and (ii) the computational time is low.

On the other hand, non-parametric models can represent the non-linear dynamics of the physical watershed processes more accurately, by effectively modeling the complex dependence structure present in the streamflow data. They can also mimic the bi-modality present in the marginal distributions in certain months, which may be caused by different runoff generating mechanisms. A distinctive feature of non-parametric techniques is that they reproduce the empirical structure of multi-variate datasets without recourse to assumptions about the data or the model structure. Moreover, the complexities associated with parameter estimation are avoided. Silverman (1986) discusses a wide range of non-parametric methods, while Lall (1995) reviews non-parametric techniques applied in a variety of water and environmental applications. Non-parametric methods have been applied to many hydro-climate modeling problems. These include stochastic daily weather generation (Rajagopalan and Lall, 1999; Yates et al., 2003), streamflow simulation (Lall and Sharma, 1996; Sharma et al., 1997; Prairie et al., 2006), streamflow forecasting (Grantz et al., 2006; Singhrattna et al., 2005) and flood frequency estimation (Moon and Lall, 1994). Non-parametric techniques often used in hydrology include the moving block bootstrap (MBB) (Vogel and Shallcross, 1996) and the k-nearest neighbor (k-NN) bootstrap (Lall and Sharma, 1996) with its variations and improvements (Prairie et al., 2007; Lee et al., 2010; Salas and Lee, 2010). Others are kernel-based methods (Sharma et al., 1997; Tarboton et al., 1998) and the matched block bootstrap (MABB) (Srinivas and Srinivasan, 2005b). Salas and Lee (2010) have reviewed the non-parametric models used in streamflow modeling, clearly bringing out the limitations of each model.

Prairie et al. (2006) proposed a modified k-NN approach that enables the simulation of values not seen in the historical record. Li and Singh (2014) recently improved this approach by implementing a multi-model simulation scheme. Salas and Lee (2010) employed the k-nearest neighbor resampling algorithm with gamma kernel perturbation to generate seasonal data conditioned on annual data. Although these models perform well in simulating multi-season streamflows, they apply only to single-site data. Prairie et al. (2007) presented a parsimonious non-parametric disaggregation model for the space-time simulation of streamflows at the river basin level. It extends the single-site temporal disaggregation scheme of Tarboton et al. (1998) by replacing the tedious kernel-based methods with the k-NN approach. This method captures the distributional characteristics and the spatial dependencies well. However, Lee et al. (2010) pointed out that it underestimates critical drought characteristics and generates repetitious data patterns. Lee et al. (2010) proposed a spatio-temporal disaggregation model that first generates the higher level variable (e.g., annual flow data) with any parametric or non-parametric model. It then generates the lower level sequence (e.g., seasonal flow data) by k-nearest neighbor resampling, such that its sum is close to the generated higher level flow. Genetic algorithm based mixing is also implemented to achieve variety in the generated data. This multi-site multi-season non-parametric disaggregation model is reported to yield better simulations than that of Prairie et al. (2007). More recently, Srivastav and Simonovic (2014) developed a simple and computationally less demanding procedure to model multi-site, multi-season streamflows. It builds on the maximum entropy bootstrap (MEB) approach proposed by Vinod (2006) for economic time series, using an orthogonal transformation with the MEB to capture the spatial dependence present in the multi-site collinear data. Ilich and Despotvic (2008) and Ilich (2014) developed a three-step non-parametric algorithm for the multi-site generation of hydrologic series. It first generates random variables that reproduce any arbitrary marginal distribution. It then reorders and permutes the generated data so that the serial correlations, the cross-correlations, the annual level autocorrelations and the correlations between the end of the previous year and the beginning of the current year are preserved. Markovic et al. (2015) then introduced two modifications to this algorithm: one to model the high skew and outliers present in the data, and one to obtain a number of extreme dry years in the simulated series. In recent times, data-driven models (Ahmed and Sharma, 2007; Sudheer et al., 2008; Ünes et al., 2015) have also been used in the stochastic hydrology literature to model hydrologic data. However, these prediction models seem to be limited to single-site modeling, and they cannot generate data outside the observed range.

Srinivas and Srinivasan (2000, 2001) introduced hybrid stochastic streamflow models based on the post-blackening approach proposed by Davison and Hinkley (1997). In this approach, a parsimonious linear parametric model first partially pre-whitens the observed streamflows. The extracted residuals are then resampled with the moving block bootstrap (MBB) to generate innovations, which are post-blackened to synthesize stochastic replicates of the observed flows. Srinivas and Srinivasan (2005) extended their single-site multi-season HMBB model (Srinivas and Srinivasan, 2001) to a multi-site multi-season model. Srinivas and Srinivasan (2006) then proposed the hybrid matched block bootstrap (HMABB) for modeling single-site multi-season streamflows, using the rank matching idea of Carlstein et al. (1998) to resample the residuals. Compared with the low-order linear parametric models and the HMBB, the HMABB model was shown to simulate multi-season streamflows with complex dependence structure better. The HMABB model also yields sufficient variability of the streamflow characteristics because it uses smaller within-year block sizes. However, the HMABB appears to share the following limitations of the HMBB models: (i) poor preservation of the statistics at the aggregated time scale, which affects the preservation of the critical drought characteristics at higher truncation levels; (ii) limited value added in terms of smoothing and extrapolation, since the generated flows lie close to the observed flow values; and (iii) tedious identification of the appropriate hybrid model.

Srivastav and Srinivasan (2011) further improved the single-site multi-season HMABB model. They automated the selection of the appropriate HMABB model through a simulation-optimization framework, and introduced into the framework a constraint for preserving the streamflow statistics at the aggregated (annual) level. This resulted in better preservation of the storage and drought characteristics.

The current study extends the simulation-optimization (S-O) framework developed for single-site multi-season streamflows by Srivastav et al. (2011) to multi-site multi-season streamflows. A further contribution is the extension of the single-site multi-season simulation model HMABB (Srinivas and Srinivasan, 2006) to a multi-site multi-season simulation model (MHMABB), which serves as the simulation module within the proposed S-O framework. A robust and efficient evolutionary search based technique, the non-dominated sorting based genetic algorithm (NSGA-II) (Deb et al., 2002), is employed to solve the multi-objective optimization within the S-O framework. Within this framework, the formulated multi-objective optimization model is the driver and the MHMABB model is the simulation engine. Incorporating relevant water-use related objective functions explicitly into the framework directs the evolutionary search across the wide parameter space of the multi-site, multi-season hybrid model MHMABB. The search is subject to the necessary constraints on the hybrid model parameter space and seeks the appropriate hybrid model, described by a combination of parametric and non-parametric components. Beyond these parameter space constraints, specific constraints on the preservation of certain statistics (such as the inter-annual dependence or the skewness of aggregated annual flows) are introduced explicitly into the modeling framework. The aim is a hybrid stochastic model that better preserves the statistics at more than one level and, consequently, preserves the multi-site deficit run (drought) characteristics accurately. The efficacy of the proposed S-O framework in simulating multi-site multi-season streamflows is shown through a case example from the Colorado River basin.
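The constraints on the preservation of aggregated statistics can be illustrated with a small check. The sketch below aggregates seasonal flows to annual totals. It then tests whether a synthetic series reproduces the historical lag-1 autocorrelation (inter-annual dependence) and skewness of the annual flows within fixed tolerances. This is a minimal illustration under stated assumptions. The function names, the moment estimators, the tolerance values and the use of a single (for example, site-aggregated) series are all hypothetical, not the formulation used in this study.

```python
import numpy as np

def annual_totals(seasonal: np.ndarray) -> np.ndarray:
    """Sum an (N years x omega seasons) array of flows to N annual totals."""
    return seasonal.sum(axis=1)

def lag1_autocorr(x: np.ndarray) -> float:
    """Moment estimator of the lag-1 autocorrelation coefficient."""
    d = x - x.mean()
    return float(np.sum(d[:-1] * d[1:]) / np.sum(d * d))

def skewness(x: np.ndarray) -> float:
    """Moment (biased) estimator of the skewness coefficient."""
    d = x - x.mean()
    return float(np.mean(d ** 3) / np.mean(d ** 2) ** 1.5)

def annual_constraints_ok(hist, sim, tol_r1=0.1, tol_skew=0.25):
    """Hypothetical constraint check: the simulated annual lag-1 autocorrelation
    and skewness must lie within absolute tolerances of the historical values."""
    ha, sa = annual_totals(hist), annual_totals(sim)
    ok_r1 = abs(lag1_autocorr(sa) - lag1_autocorr(ha)) <= tol_r1
    ok_skew = abs(skewness(sa) - skewness(ha)) <= tol_skew
    return bool(ok_r1 and ok_skew)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hist = rng.gamma(2.0, 50.0, size=(40, 12))   # made-up monthly flows, 40 years
    sim = rng.gamma(2.0, 50.0, size=(40, 12))    # stand-in for a synthetic replicate
    print(annual_constraints_ok(hist, sim))
```

In an S-O setting, a candidate hybrid model whose replicates fail such a check would be treated as infeasible, or penalized, during the evolutionary search.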

2. Simulation-Optimization Framework for Multi-site Multi-season Streamflow Modeling

Simulation-optimization modeling can be defined as the process of finding the best input variable values from among all possibilities without explicitly evaluating each possibility. An optimization strategy uses the output of a simulation model as feedback on the progress of the search for the optimal solution, and this feedback in turn guides further input to the simulation model (Carson and Maria, 1997). Tekin and Ihsan (2004) present a comprehensive review of the theory and applications of simulation-optimization modeling. The S-O methodology has been employed beneficially in a number of studies in water resources planning and management, as documented by Nicklow et al. (2010).

In the last few decades, there has been increasing interest in using Evolutionary Algorithms (EAs) in simulation-optimization problems, mainly because they do not require restrictive assumptions or prior knowledge about the shape of the response surface (Back and Schwefel, 1993). EAs are heuristic search methods that implement ideas from the evolution process. Traditional methods work on a single solution, whereas EAs work on a population of solutions in which poor solutions become extinct and good solutions are likely to reach the optimum (survival of the fittest). When the response surface is high-dimensional, discontinuous and non-differentiable, traditional methods may often fail to find the optimal solution. Evolutionary algorithms, by contrast, can be applied successfully to such problems (Azadivar and Tompkins, 1999; Pierreval and Paris, 2000). In general, an EA for simulation-optimization can be described as follows: (i) generate a population of solutions; (ii) evaluate these solutions through a simulation model; (iii) perform selection, apply genetic operators to produce new offspring (solutions) and insert them into the population; and (iv) repeat until some stopping criterion is reached. Genetic Algorithms (GAs) are the most popular EAs in the literature (Goldberg, 1989). In general, each point in the solution space is represented by a string of values for the decision variables. Appropriate crossover and mutation operators reduce the probability of being trapped in a local optimum, and the elitism property carries competent solutions over through successive generations.
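The four generic steps listed above can be sketched as a small real-coded GA in which a toy function stands in for the simulation model. Everything in the sketch is an illustrative assumption: binary tournament selection, arithmetic crossover, Gaussian mutation clipped to [0, 1], single-solution elitism, and a fixed number of generations as the stopping criterion. It is not the NSGA-II configuration used in this study.

```python
import random

def simulate(x):
    """Toy stand-in for the simulation model (assumption): returns an error to be
    minimized. In an S-O framework this would be computed from synthetic
    replicates generated with the candidate parameter values."""
    return sum((xi - 0.5) ** 2 for xi in x)

def tournament(pop, fit, rng):
    """Binary tournament selection: keep the fitter of two random solutions."""
    i, j = rng.sample(range(len(pop)), 2)
    return pop[i] if fit[i] <= fit[j] else pop[j]

def evolve(n_var=3, pop_size=20, n_gen=50, p_mut=0.2, seed=1):
    rng = random.Random(seed)
    # (i) generate an initial population of solutions in [0, 1]^n_var
    pop = [[rng.random() for _ in range(n_var)] for _ in range(pop_size)]
    history = []
    for _ in range(n_gen):
        # (ii) evaluate every solution through the simulation model
        fit = [simulate(ind) for ind in pop]
        best = min(range(pop_size), key=lambda i: fit[i])
        history.append(fit[best])
        # elitism: the best solution is carried over unchanged
        offspring = [pop[best][:]]
        # (iii) selection, crossover and mutation produce the new solutions
        while len(offspring) < pop_size:
            p, q = tournament(pop, fit, rng), tournament(pop, fit, rng)
            w = rng.random()
            child = [w * a + (1.0 - w) * b for a, b in zip(p, q)]
            child = [min(1.0, max(0.0, c + rng.gauss(0.0, 0.1)))
                     if rng.random() < p_mut else c for c in child]
            offspring.append(child)
        pop = offspring
    # (iv) stopping criterion: a fixed number of generations
    fit = [simulate(ind) for ind in pop]
    best = min(range(pop_size), key=lambda i: fit[i])
    return pop[best], fit[best], history

if __name__ == "__main__":
    best_x, best_f, history = evolve()
    print(best_x, best_f)
```

Because of elitism, the best fitness recorded in `history` never worsens from one generation to the next.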

2.1. Application of GA in Time Series Modeling

Rolf et al. (1997) stated that combining traditional ARMA modeling knowledge with evolutionary algorithms should yield a tool able to automate the three-step process of time series modeling (identification, estimation and diagnostic checking). Following this, a few studies in the last decade (Cortez et al., 2004; Voss and Feng, 2002; Minerva and Poli, 2001; Peng and Chen, 2003; Ong et al., 2005; Chen et al., 2002) have employed evolutionary search algorithms to automate the three-step Box-Jenkins approach to ARMA modeling. These studies bring out the efficacy of evolutionary techniques in the model identification and parameter estimation of Box-Jenkins type models (AR, ARMA, ARIMA, SARIMA, FARIMA). For model identification, the GA framework uses fitness functions such as the AIC and BIC. For parameter estimation, it uses some form of statistical performance criterion, such as minimization of the sum of squared errors or maximization of the likelihood function. For non-parametric models (such as k-NN and kernel-based models), the fitness function can be the minimization of the generalized cross-validation score. However, no such statistical criteria are available for identifying and estimating the more complex multi-site, multi-season hybrid models. Hence, this study adopts water-use (reservoir storage/drought) related criteria, described in the following section, as the objective functions in the proposed multi-objective GA (MOGA) based framework. Incidentally, models selected with these criteria can also be expected to preserve the basic statistical characteristics (such as summary statistics and marginal distributions). This approach has already been applied successfully to single-site, multi-season streamflow modeling by Srivastav and Srinivasan (2011).
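As an example of the statistical fitness functions mentioned above for Box-Jenkins models, the sketch below scores candidate AR(p) orders with the AIC. Here the AIC is taken as n ln(σ̂²) + 2p, where σ̂² is the mean squared residual of an ordinary least-squares fit conditional on the first p values. The least-squares fit, this form of the AIC and the exhaustive search over a few orders (standing in for a GA) are simplifying assumptions. As the text notes, no comparable criterion exists for the hybrid models.

```python
import numpy as np

def ar_aic(x, p):
    """AIC of an AR(p) model fitted by least squares to the mean-removed series."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    y = x[p:]
    if p == 0:
        resid = y
    else:
        # design matrix of lagged values: column j holds x[t - j - 1]
        X = np.column_stack([x[p - j - 1:len(x) - j - 1] for j in range(p)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ coef
    sigma2 = np.mean(resid ** 2)
    return len(y) * np.log(sigma2) + 2 * p

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    e = rng.standard_normal(2000)
    x = np.zeros(2000)
    for t in range(1, 2000):          # simulate an AR(1) process with phi = 0.7
        x[t] = 0.7 * x[t - 1] + e[t]
    best_p = min(range(5), key=lambda p: ar_aic(x, p))
    print("order selected by AIC:", best_p)
```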

2.2. Simulation-Optimization Modeling Framework

Fig. 1 shows the proposed simulation-optimization (S-O) modeling framework for multi-site multi-season streamflow modeling. The multi-objective optimization model acts as the driver. Embedded in it as the simulator is the multi-site multi-season hybrid matched block bootstrap model (MHMABB) developed in this study. The MHMABB model extends the single-site HMABB model proposed by Srinivas and Srinivasan (2006).

Figure 1: Simulation-Optimization Framework for Stochastic Modeling of Multi-site Multi-season Streamflows

As discussed earlier, the primary aim of the S-O modeling framework is to enhance the performance of hybrid stochastic models in simulating streamflows for water resources planning. Its secondary aim is to minimize the drudgery, judgment and subjectivity involved in selecting the most appropriate hybrid stochastic model. The framework has three special features to achieve these aims. First, critical water-use related objective functions drive the framework. Second, a powerful multi-objective evolutionary search tool (NSGA-II) (Deb et al., 2002) explores the large hybrid model parameter space and automatically obtains a set of competent hybrid stochastic models. Third, a constraint enables the preservation of the inter-annual dependence, which may help preserve the critical water-use related statistics at higher truncation levels.
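NSGA-II yields a set of competent, mutually non-dominated hybrid models rather than a single best one. Its central step is therefore the fast non-dominated sorting of Deb et al. (2002). The sketch below implements only that step, for objectives that are all minimized. Crowding distance, selection and the genetic operators of the full algorithm are omitted, and the objective values in the usage example are made up.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all objectives minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def fast_non_dominated_sort(objs):
    """Split solutions into Pareto fronts; each front is a list of indices."""
    n = len(objs)
    dominated = [[] for _ in range(n)]   # solutions that p dominates
    count = [0] * n                      # number of solutions dominating p
    fronts = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated[p].append(q)
            elif dominates(objs[q], objs[p]):
                count[p] += 1
        if count[p] == 0:                # nobody dominates p: first front
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated[p]:
                count[q] -= 1            # remove p's domination of q
                if count[q] == 0:
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]                   # drop the trailing empty front

if __name__ == "__main__":
    # e.g., two error measures for six candidate hybrid models (made-up values)
    objs = [(1, 5), (2, 3), (3, 1), (2, 4), (4, 4), (5, 5)]
    print(fast_non_dominated_sort(objs))  # [[0, 1, 2], [3], [4], [5]]
```

The first front corresponds to the set of competent hybrid models from which a final model would be chosen.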

2.2.1. Multi-site Multi-season HMABB as the Simulator

In this study, the single-site multi-season hybrid matched block bootstrap (HMABB) proposed by Srinivas and Srinivasan (2006) is extended to a multi-site multi-season hybrid matched block bootstrap (MHMABB). The hybrid model blends a parametric component (the low-order PAR(1) model at each site) with a non-parametric component (the multi-site multi-season matched block bootstrap). The extended simulation algorithm is presented below.

Proposed Algorithm for MHMABB

Let the time series of historical streamflows be denoted by the vector $\mathbf{X} = \{x_{v,\tau}^{k}\}$. Here the superscript $k$ is the site index ($k = 1, \ldots, n_k$), $v$ is the year index ($v = 1, \ldots, N$) and $\tau$ is the index of the season (period) within the year ($\tau = 1, \ldots, \omega$). $n_k$ is the number of sites, $N$ is the number of years of historical record and $\omega$ is the number of periods within the year. The modeling steps are as follows:

1. Standardize the elements of the historical streamflows, i.e., of the vector $\mathbf{X}$, using

$$z_{v,\tau}^{k} = \frac{x_{v,\tau}^{k} - \mu_{\tau}^{k}}{\sigma_{\tau}^{k}} \tag{1}$$

where $\mu_{\tau}^{k}$ and $\sigma_{\tau}^{k}$ are the mean and the standard deviation, respectively, of the observed streamflows in period $\tau$ at the $k$th site. Note that the historical streamflows are not transformed to remove skewness.

2. Partially pre-whiten the standardized historical streamflows, $z_{v,\tau}^{k}$, using a parsimonious periodic model PAR(1), and extract the residuals, $\varepsilon_{v,\tau}^{k}$, at each site $k$, using

$$\varepsilon_{v,\tau}^{k} = z_{v,\tau}^{k} - \phi_{\tau}^{k}\, z_{v,\tau-1}^{k} \tag{2}$$

where $\phi_{\tau}^{k}$ is the first-order periodic autoregressive parameter for period $\tau$ at the $k$th site (with $z_{v,0}^{k} \equiv z_{v-1,\omega}^{k}$). Partial pre-whitening with a parsimonious PAR(1) structure at each site leaves the rest of the dependence to the proposed non-parametric component, the multi-site multi-season MABB. The MABB can effectively capture the weak linear dependence structure and the non-linear dependence structure present in the multi-site multi-season residuals.

3. Obtain one set of replicates of the simulated innovations, $\varepsilon_{v,\tau}^{*k}$, at each site $k$ by contemporaneous resampling of the multi-site, within-year, non-overlapping blocks of residuals, using the proposed multi-site rank-matched block bootstrap (MABB) method. The key steps in the resampling algorithm are as follows:

(a) For each site $k$, prepare $n$ non-overlapping within-year blocks (such as $B_{v,1}^{k}, \ldots, B_{v,n}^{k}$) from the residuals $\varepsilon_{v,\tau}^{k}$, with lengths $L_1, \ldots, L_n$ that sum to $\omega$, i.e., $\sum_{i=1}^{n} L_i = \omega$. The block lengths are the same at all the sites. This allows contemporaneous blocks of residuals to be resampled, so that the site-to-site cross-correlations (dependence across the sites) are captured. Herein, $B_{v,i}^{k}$ denotes the $i$th within-year block for year $v$ of the record at site $k$. Let $e_{v,i}^{k}$ denote the end element of $B_{v,i}^{k}$. Form the sets $E_{i}^{k} = \{e_{v,i}^{k} : v = 1, \ldots, N\}$, $i = 1, \ldots, n$.

(b) To select the within-year blocks contemporaneously, combine the end elements of block $i$ across all the sites, using an appropriate strategy, into a fictitious contemporaneous end element. This research adopts the strategy based on the Euclidean distance (ED), presented here. Form the sets $E_{i} = \{e_{v,i} : v = 1, \ldots, N\}$, where the contemporaneous end element $e_{v,i}$ is obtained using

$$e_{v,i} = \sqrt{\sum_{k=1}^{n_k} \left(e_{v,i}^{k}\right)^{2}} \tag{3}$$

(c) Arrange the elements of $E_{i}$ in ascending (or descending) order of magnitude and assign ranks. Let $R_{v,i}$ denote the contemporaneous rank of $e_{v,i}$, where $v = 1, \ldots, N$ and $i = 1, \ldots, n$. Initialize the algorithm by randomly selecting, contemporaneously, one of the $N$ first within-year blocks. This becomes the current contemporaneous within-year block for all the sites. The resampling then proceeds as follows:

i. Identify the rank of the current contemporaneous within-year block. Denote it by $R$.

ii. Select all the contemporaneous end elements whose ranks fall within a bandwidth $w$ $(= 2m+1)$, i.e., from $R - m$ to $R + m$, where the window parameter $m$ is a small positive integer. These are the nearest neighbors of the current contemporaneous end element (which has rank $R$). Randomly select one of them by generating a uniform random integer $U$ between $R - m$ and $R + m$.

iii. Take the contemporaneous within-year block that follows the selected block (the block whose end element was selected in (ii)) and append it to the current within-year block. This block is appended at all the $n_k$ sites.

iv. The newly appended contemporaneous within-year block becomes the new current contemporaneous within-year block at all the sites.

v. Repeat steps (i) to (iv) until one replicate of simulated innovations ($\varepsilon_{v,\tau}^{*k}$) of the desired length is obtained.

4. Post-blacken the resampled innovation series to obtain the standardized synthetic streamflows

$$z_{v,\tau}^{*k} = \phi_{\tau}^{k}\, z_{v,\tau-1}^{*k} + \varepsilon_{v,\tau}^{*k} \tag{4}$$

5. Inverse standardize, $x_{v,\tau}^{*k} = \mu_{\tau}^{k} + \sigma_{\tau}^{k}\, z_{v,\tau}^{*k}$, to obtain the synthetic streamflow series $x_{v,\tau}^{*k}$.

Note that, for $n_k = 1$, this algorithm reduces to the single-site multi-season hybrid matched block bootstrap model (HMABB) proposed by Srinivas and Srinivasan (2006).
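The parametric part of the algorithm can be sketched with NumPy on an array of flows indexed as (site k, year v, season τ). This covers steps 1, 2, 4 and 5: standardization, partial pre-whitening with PAR(1), post-blackening and inverse standardization. Three choices are illustrative assumptions: the moment-based PAR(1) estimate (the sample correlation between consecutive seasons), taking the standardized value before the record as zero, and starting post-blackening from zero. Feeding the historical residuals back unchanged reproduces the historical flows exactly, which gives a convenient check. In the model itself, the residuals would first be replaced by the resampled innovations of step 3.

```python
import numpy as np

def standardize(x):
    """Step 1: x has shape (n_sites, N, omega); standardize per site and season."""
    mu = x.mean(axis=1, keepdims=True)            # mu_tau^k
    sigma = x.std(axis=1, ddof=1, keepdims=True)  # sigma_tau^k
    return (x - mu) / sigma, mu, sigma

def lagged(z):
    """Value one period earlier at each site, running across year boundaries
    (season 1 of year v follows season omega of year v-1); 0 before the record."""
    flat = z.reshape(z.shape[0], -1)
    prev = np.concatenate([np.zeros((z.shape[0], 1)), flat[:, :-1]], axis=1)
    return prev.reshape(z.shape)

def fit_par1(z):
    """Hypothetical moment estimate of phi_tau^k: correlation of z_{v,tau} with
    z_{v,tau-1}, using years 2..N so every season has a true predecessor."""
    zp = lagged(z)
    n_sites, _, omega = z.shape
    phi = np.empty((n_sites, omega))
    for k in range(n_sites):
        for t in range(omega):
            phi[k, t] = np.corrcoef(z[k, 1:, t], zp[k, 1:, t])[0, 1]
    return phi

def residuals(z, phi):
    """Step 2: eps_{v,tau} = z_{v,tau} - phi_tau * z_{v,tau-1} at every site."""
    return z - phi[:, None, :] * lagged(z)

def post_blacken(eps_star, phi):
    """Step 4: z*_{v,tau} = phi_tau * z*_{v,tau-1} + eps*_{v,tau}, starting from 0."""
    n_sites, n_years, omega = eps_star.shape
    z_star = np.zeros_like(eps_star)
    prev = np.zeros(n_sites)
    for v in range(n_years):
        for t in range(omega):
            z_star[:, v, t] = phi[:, t] * prev + eps_star[:, v, t]
            prev = z_star[:, v, t].copy()
    return z_star

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    x = rng.gamma(2.0, 30.0, size=(3, 25, 12))   # made-up flows: 3 sites, 25 years
    z, mu, sigma = standardize(x)
    phi = fit_par1(z)
    eps = residuals(z, phi)
    # step 5 (inverse standardization); with eps unchanged this recovers x
    x_star = mu + sigma * post_blacken(eps, phi)
    print(np.allclose(x_star, x))
```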

Short contemporaneous within-year blocks ensure a reasonable amount of variability in the synthetic replicates generated at the various sites. The contemporaneous resampling of the within-year blocks preserves the site-to-site dependence. The rank-matching window, in turn, ensures that each appended block is the successor of one of the nearest neighbors of the current within-year block.
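Step 3, the multi-site rank-matched resampling of within-year blocks, can be sketched as follows. The sketch assumes residuals arranged as (site, year, season), a user-supplied list of within-year block lengths and a window parameter m. Following step 3(b), the end elements are combined across sites by their Euclidean norm. Two boundary conventions are illustrative assumptions, not part of the algorithm as stated. First, the rank window R − m to R + m is clipped at the lowest and highest ranks. Second, when the current block is the last one in the year, the final historical year is excluded from the candidates because it has no following year. Every appended block is copied from the same historical year at all sites, and this is what carries the cross-site dependence.

```python
import numpy as np

def mabb_resample(eps, block_lengths, m, n_years, rng):
    """Multi-site rank-matched block bootstrap of residuals (sketch).

    eps: residuals, shape (n_sites, N, omega); block_lengths: L_1..L_n summing
    to omega; m: window parameter (bandwidth w = 2m + 1).
    Returns simulated innovations with shape (n_sites, n_years, omega).
    """
    n_sites, N, omega = eps.shape
    if sum(block_lengths) != omega or N < 2:
        raise ValueError("block lengths must sum to omega and N must be >= 2")
    n = len(block_lengths)
    ends = np.cumsum(block_lengths)                  # exclusive end of block i
    starts = ends - np.asarray(block_lengths)
    e = eps[:, :, ends - 1]                          # end elements, (n_sites, N, n)
    ed = np.sqrt((e ** 2).sum(axis=0))               # Euclidean combination, (N, n)
    order = np.argsort(ed, axis=0)                   # order[r, i]: year with rank r
    rank = np.argsort(order, axis=0)                 # rank[v, i]: rank of year v
    v, i = int(rng.integers(N)), 0                   # random first within-year block
    pieces = [eps[:, v, starts[i]:ends[i]]]
    total = block_lengths[0]
    while total < n_years * omega:
        R = int(rank[v, i])
        lo, hi = max(0, R - m), min(N - 1, R + m)    # clipped window (assumption)
        cand = [int(order[r, i]) for r in range(lo, hi + 1)]
        if i == n - 1:                               # successor lies in the next year
            cand = [c for c in cand if c < N - 1] or list(range(N - 1))
        vs = cand[int(rng.integers(len(cand)))]      # uniform choice within window
        v, i = (vs + 1, 0) if i == n - 1 else (vs, i + 1)
        pieces.append(eps[:, v, starts[i]:ends[i]])  # same block at every site
        total += block_lengths[i]
    sim = np.concatenate(pieces, axis=1)[:, :n_years * omega]
    return sim.reshape(n_sites, n_years, omega)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    eps = rng.standard_normal((2, 30, 12))           # made-up residuals: 2 sites
    sim = mabb_resample(eps, [3, 3, 3, 3], m=2, n_years=50, rng=rng)
    print(sim.shape)                                 # (2, 50, 12)
```

Blocks are always appended in within-year order, so each synthetic value keeps its season index. The block lengths and window used here are arbitrary; choosing them is exactly the kind of decision the S-O framework is meant to automate.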


2.2.2. Multi-Objective Optimization Model

Conventional wisdom in the stochastic modeling of streamflows holds that a model which preserves the summary statistics, marginal distributions and dependence structure of the historical streamflows is likely to preserve water-use characteristics as well. These include the storage capacity and the critical drought characteristics. However, there is no formal proof of this. Nor is there a general functional relationship linking the accuracy with which water-use characteristics are preserved to the accuracy with which the basic statistical characteristics of streamflows are reproduced, or to the stochastic model parameters. Moreover, for hybrid models there are no statistical criteria (such as the AIC and BIC) for selecting the model parameters. Manually exploring the large parameter space of the multi-site multi-season hybrid model (MHMABB) through many simulations to find the best hybrid model would involve drudgery and subjectivity. It is therefore pragmatic to formulate a simulation-optimization (S-O) modeling framework that explicitly links objective functions, based on how accurately the water-use characteristics (such as the critical drought characteristics) are preserved, to the hybrid model parameter space.

All extreme (or critical) streamflow droughts involve large deficits. On the other hand, a long drought duration does not necessarily signify an extreme (or critical) drought if the corresponding deficit volume during the drought event is not large. Likewise, a low mean discharge does not necessarily indicate an extreme drought if its duration is short. The variation of drought duration is governed primarily by climate, while the deficit volume is more closely related to catchment characteristics. According to Zelenhasic and Salvai (1987) and Zelenhasic (1997), the stochastic process of streamflow droughts can be described by nine descriptive parameters, of which the critical drought deficit volume is the most informative. Hence, the critical drought deficit volume may be considered the single pivotal characteristic that effectively represents the process of critical streamflow droughts. Consequently, preserving the critical drought deficit volumes estimated from the historical streamflows at various pre-specified truncation levels is vital for effective stochastic simulation of streamflows.

The streamflow drought characteristics are often described using the theory of runs (Yevjevich, 1967). Specifically, negative runs of a streamflow sequence with respect to a specified truncation level represent deficit conditions. A number of stochastic models preserve the deficit run (drought) characteristics at either lower or higher truncation levels, but not both. However, a good synthetic streamflow model is expected to preserve the run characteristics with minimum bias and root mean square error (over all the truncation levels considered) relative to the corresponding estimates from the historical streamflows, while ensuring sufficient variability to account for future uncertainty. Quite often, reducing the bias of an estimate increases its variance, and vice versa. If only the R-RMSE related objective function is used, the hybrid stochastic model identified may have the minimum ∑R-RMSE(MARS) but a high ∑|R-Bias(MARS)|, which is undesirable. Hence, in this research work, the following two objectives are employed in the simulation-optimization framework proposed for multi-site multi-season hybrid stochastic streamflow modeling: (i) minimize the sum of the absolute relative bias in preserving the multi-site critical deficit run sum over all truncation levels considered; and (ii) minimize the sum of the relative RMSE in preserving the multi-site critical deficit run sum over all truncation levels considered.
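Deficit runs with respect to a truncation level can be extracted in a single pass over a flow series. The sketch below is a minimal illustration assuming a single monthly series (for example, flows aggregated over the sites) and truncation levels from 50% to 95% of the mean monthly flow (MMF) in 5% steps; the names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DeficitRunStats:
    n_runs: int
    max_run_sum: float      # critical (maximum) deficit run sum
    max_run_length: int     # longest deficit run, in periods
    mean_run_sum: float
    mean_run_length: float


def truncation_levels(mmf: float) -> List[float]:
    """Truncation levels from 50% to 95% of the mean monthly flow in 5% steps."""
    return [mmf * p / 100.0 for p in range(50, 100, 5)]


def deficit_runs(flows: Sequence[float], level: float) -> DeficitRunStats:
    """Negative runs: maximal stretches of consecutive periods with flow below `level`."""
    sums: List[float] = []
    lengths: List[int] = []
    run_sum, run_len = 0.0, 0
    for q in flows:
        if q < level:
            run_sum += level - q  # accumulate the deficit of this period
            run_len += 1
        elif run_len > 0:         # a run has just ended
            sums.append(run_sum)
            lengths.append(run_len)
            run_sum, run_len = 0.0, 0
    if run_len > 0:               # close a run that lasts to the end of the series
        sums.append(run_sum)
        lengths.append(run_len)
    if not sums:
        return DeficitRunStats(0, 0.0, 0, 0.0, 0.0)
    return DeficitRunStats(len(sums), max(sums), max(lengths),
                           sum(sums) / len(sums), sum(lengths) / len(lengths))


print(deficit_runs([5.0, 2.0, 3.0, 6.0, 1.0, 7.0], 4.0))
# DeficitRunStats(n_runs=2, max_run_sum=3.0, max_run_length=2, mean_run_sum=3.0, mean_run_length=1.5)
```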


2.2.3. Mathematical Formulation

Objective Functions

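Based on a detailed exploration of different plausible water-use related objective functions, the following two objective functions are proposed within the framework: (i) minimize the aggregated relative bias and (ii) minimize the aggregated relative RMSE in the preservation of the maximum multi-site deficit run sum (MARS) over truncation levels varying from 50% to 95% of the historical mean monthly flow (MMF) at intervals of 5% MMF (i.e., $n_t = 10$ truncation levels).

$$\text{Minimize } f_1 = \sum_{i=1}^{n_t} \left|\text{R-Bias}(\text{MARS}_i)\right| \qquad (5)$$

$$\text{Minimize } f_2 = \sum_{i=1}^{n_t} \text{R-RMSE}(\text{MARS}_i) \qquad (6)$$

in which

$$\text{R-Bias}(\text{MARS}_i) = \frac{E[\widehat{\text{MARS}}_i] - \text{MARS}^{h}_i}{\text{MARS}^{h}_i} \qquad (7)$$

$$\text{R-RMSE}(\text{MARS}_i) = \frac{\sqrt{\left(E[\widehat{\text{MARS}}_i] - \text{MARS}^{h}_i\right)^2 + \text{var}[\widehat{\text{MARS}}_i]}}{\text{MARS}^{h}_i} \qquad (8)$$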

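where $\text{MARS}^{h}_i$ is the MARS estimated from the historical streamflows at the $i$th truncation level. The maximum run sum is expressed as $\text{MARS} = \max(ds_1, ds_2, \ldots, ds_{n_r})$, where the multi-site run sum for a specified truncation level and run is defined as $ds_j = \sum_{k=1}^{n_k} ds_{j,k}$, wherein $j$ denotes the run number, $k$ the site number, $ds_{j,k}$ the deficit sum of run $j$ at site $k$, $n_k$ the total number of sites modeled and $n_r$ the total number of runs. In eq. 7, $E[\widehat{\text{MARS}}_i]$ is the mean value of MARS corresponding to the $i$th truncation level, estimated over the $N_r$ synthetically generated replicates:

$$E[\widehat{\text{MARS}}_i] = \frac{1}{N_r}\sum_{r=1}^{N_r} \widehat{\text{MARS}}_{i,r} \qquad (9)$$

In eq. 8, $\text{var}[\widehat{\text{MARS}}_i]$ is the variance of MARS at the $i$th truncation level estimated over the $N_r$ synthetically generated replicates:

$$\text{var}[\widehat{\text{MARS}}_i] = \frac{1}{N_r}\sum_{r=1}^{N_r}\left(\widehat{\text{MARS}}_{i,r} - E[\widehat{\text{MARS}}_i]\right)^2 \qquad (10)$$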

Other water-use objective functions may be used in place of the two objective functions given in eqs. 5 and 6.
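The two objectives of eqs. (5) to (10) can be evaluated directly from the MARS values of the synthetic replicates. In the sketch below, `sim_mars[i][r]` is assumed to hold the MARS of replicate r at truncation level i and `hist_mars[i]` the historical value; the function names are hypothetical. The population variance (divisor N_r) is used, so the R-RMSE equals the root mean square deviation of the replicate MARS values from the historical MARS, divided by the historical MARS.

```python
import math
from typing import Sequence, Tuple


def r_bias(sim: Sequence[float], hist: float) -> float:
    """Relative bias of the replicate mean, eq. (7)."""
    mean = sum(sim) / len(sim)
    return (mean - hist) / hist


def r_rmse(sim: Sequence[float], hist: float) -> float:
    """Relative RMSE from squared bias plus variance, eqs. (8)-(10)."""
    n = len(sim)
    mean = sum(sim) / n
    var = sum((x - mean) ** 2 for x in sim) / n
    return math.sqrt((mean - hist) ** 2 + var) / hist


def objectives(sim_mars: Sequence[Sequence[float]],
               hist_mars: Sequence[float]) -> Tuple[float, float]:
    """Return (f1, f2) of eqs. (5)-(6); both are to be minimized."""
    f1 = sum(abs(r_bias(s, h)) for s, h in zip(sim_mars, hist_mars))
    f2 = sum(r_rmse(s, h) for s, h in zip(sim_mars, hist_mars))
    return f1, f2
```

For example, replicates `[9.0, 11.0]` against a historical MARS of 10 give zero bias but an R-RMSE of 0.1, which is why minimizing f1 alone cannot guard against excessive spread.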

Constraints

Constraints on Model Parameters: Certain constraints are developed within the proposed S-O

framework to describe the model parameter space.

Parametric Component: In the simulation-optimization multi-season hybrid model proposed in this study, partial pre-whitening is done using a parsimonious parametric model, namely, the periodic autoregressive model of order 1 (PAR(1)). The parameter space of the parametric component of the multi-season hybrid model is therefore defined by the range of values taken by the periodic autoregressive parameter of order 1, $\phi^{k}_{1,\tau}$. For stationarity, the roots of the characteristic equation must lie within the unit circle. However, in most practical situations of stochastic modeling of the hydrologic random process "streamflow", physical considerations suggest that the lag-1 serial correlation coefficient ($\rho_1$) should be positive, which means that $\phi^{k}_{1,\tau} > 0$ (Hipel and McLeod, 1994). Accordingly, the following constraint on the first-order PAR parameter has been introduced into the simulation-optimization framework:

$$0 < \phi^{k}_{1,\tau} < 1, \qquad \tau = 1, \ldots, 12;\; k = 1, \ldots, n_k \qquad (11)$$

where $\phi^{k}_{1,\tau}$ refers to the periodic autoregressive parameter of order 1 for month $\tau$ at site $k$.
Non-Parametric Component: In the proposed framework, the multi-site multi-season MABB model is used as the non-parametric component. The conditional resampling is done contemporaneously on non-overlapping within-year blocks formed from the residuals at each site. The parameters of the multi-site multi-season MABB model are (i) the non-overlapping within-year block sizes and (ii) the bandwidth. For the within-year blocks, there exists a large number of possible combinations of non-overlapping block sizes. However, the sum of all the within-year block sizes should equal the total number of periods within a year (ω = 12 for monthly flows), i.e.,

$$L_1 + L_2 + \cdots + L_n = \omega \qquad (12)$$

where $L_b$ is the size of the $b$th within-year block and $n$ is the number of blocks.

Further, regarding the selection of the bandwidth (w), it is observed from various trials that adopting a large w increases the bias in the preservation of the historical dependence structure and in the prediction of storage capacities at different demand levels, while adopting a small w reduces the variety of replicates in synthetic generation (Srinivas and Srinivasan, 2006). Hence, based on the authors' experience in modeling periodic streamflows of various rivers using multi-season HMABB hybrid models, the bandwidth is restricted to fall between 3 and 13:

$$3 \le w \le 13 \qquad (13)$$

Constraints on Statistical Characteristics: In this research work, the issue of preserving the inter-annual dependence is addressed by introducing an explicit constraint that ensures the preservation of dependence at the aggregated annual level. This is done through a constraint on the R-bias in preserving the lag-1 correlation of flows at the aggregated annual level, which is usually effective in modeling the inter-annual dependence. This is expected to enable the preservation of various statistics at the aggregated annual level and, as a result, enhance the preservation of storage capacity at higher demand levels. In general, the modeler can explicitly introduce any appropriate constraints into the S-O framework, depending on the statistics to be preserved at the monthly and/or the annual level:

$$\left|\text{R-Bias}\left(S^{k}_{\tau}\right)\right| \le \delta^{k}_{S,\tau}; \qquad \left|\text{R-Bias}\left(A^{k}\right)\right| \le \delta^{k}_{A}; \qquad \left|\text{R-Bias}\left(\rho^{kl}\right)\right| \le \delta_{\rho} \qquad (14)$$

where $S^{k}_{\tau}$ denotes the basic periodic statistical characteristic(s) at site $k$ (such as the mean, standard deviation and skewness of month $\tau$) and $\delta^{k}_{S,\tau}$ is the allowable upper limit of the relative bias (specified by the modeler) for each month $\tau$, for each statistical characteristic considered at each site $k$. In eq. (14), $A^{k}$ represents the basic aggregated annual statistical characteristic at site $k$ (such as the mean, standard deviation, skewness and autocorrelation), while $\delta^{k}_{A}$ is the allowable upper limit of the relative bias at site $k$ (specified by the modeler) for each statistical characteristic ($A$) considered at the aggregated annual level. In addition, $\rho^{kl}$ represents the site-to-site correlations between sites $k$ and $l$, and $\delta_{\rho}$ denotes the allowable upper limit of the relative bias (specified by the modeler) for the site-to-site correlations.
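As an illustration of how the inter-annual dependence constraint might be checked at one site, the sketch below aggregates monthly flows to annual totals, computes the lag-1 autocorrelation of the historical and of each synthetic annual series, and returns the amount by which the absolute relative bias of the replicate-averaged coefficient exceeds a modeler-specified limit (zero when the constraint holds). Averaging over replicates, the moment estimator of the lag-1 autocorrelation and all function names are assumptions of this sketch.

```python
from typing import List, Sequence


def annual_totals(monthly: Sequence[float], omega: int = 12) -> List[float]:
    """Aggregate a seasonal series (omega seasons per year) to annual totals."""
    if len(monthly) % omega:
        raise ValueError("series length must be a multiple of omega")
    return [sum(monthly[i:i + omega]) for i in range(0, len(monthly), omega)]


def lag1_autocorr(x: Sequence[float]) -> float:
    """Moment estimator of the lag-1 autocorrelation coefficient."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(n - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den


def annual_lag1_violation(hist_monthly: Sequence[float],
                          sim_replicates: Sequence[Sequence[float]],
                          max_rel_bias: float,
                          omega: int = 12) -> float:
    """Excess of |R-bias| of the annual lag-1 autocorrelation over its allowed limit."""
    r_hist = lag1_autocorr(annual_totals(hist_monthly, omega))
    if r_hist == 0.0:
        raise ValueError("relative bias undefined for zero historical lag-1 correlation")
    r_sim = sum(lag1_autocorr(annual_totals(s, omega))
                for s in sim_replicates) / len(sim_replicates)
    rel_bias = abs((r_sim - r_hist) / r_hist)
    return max(0.0, rel_bias - max_rel_bias)
```

A non-zero return value could then be handled by a penalty or by constrained domination in the optimizer; which mechanism is preferable is a design choice not fixed by this sketch.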

The parameter space of the multi-site multi-season hybrid streamflow model (MHMABB) consists of two components, i.e., the parametric component at each site and the non-parametric component, and it is very large. The parameters of the parametric component can take combinations of real values within the unit circle across multiple sites and multiple seasons (12 for monthly modeling). The non-parametric component, the matched block bootstrap, contains bl within-year blocks and m window sizes. The size of each block can take any integer value between 1 and 12, such that the sum of all the within-year blocks equals 12, and a reasonable range of bandwidth is from 3 to 13. Thus, the total number of HMABB models possible when the parametric and non-parametric components are considered together is very large.
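The size of the non-parametric part of the search space can be made concrete with a short count. If the within-year blocks are contiguous and ordered within the year, each admissible set of block sizes is an ordered composition of 12, of which there are 2^11 = 2048; combined with the integer bandwidths 3 to 13 allowed by eq. (13), this gives 2048 × 11 = 22,528 non-parametric configurations, before the continuous PAR(1) parameters are considered. Treating every integer bandwidth in that range as admissible is an assumption of this sketch.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_compositions(total: int) -> int:
    """Number of ordered ways to split `total` seasons into contiguous blocks of size >= 1."""
    if total == 0:
        return 1
    # choose the size s of the first block, then split the remaining seasons
    return sum(count_compositions(total - s) for s in range(1, total + 1))


OMEGA = 12
bandwidths = range(3, 14)  # integer bandwidths 3..13 (all assumed admissible)
n_block_sets = count_compositions(OMEGA)
n_nonparametric = n_block_sets * len(bandwidths)
print(n_block_sets, n_nonparametric)  # 2048 22528
```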

2.2.4. Solution Technique - NSGA-II

There is no known explicit functional relationship between the accuracy of preservation of the critical drought characteristics and the reproduction of the basic statistical characteristics of streamflows and/or the stochastic model parameters, especially for the complex hybrid stochastic model, HMABB, considered in this study. Hence, traditional optimization techniques cannot be employed to find the optimal hybrid model. Moreover, the hybrid parameter space is too large and complex to be explored using simulations alone. In such problems, multi-objective evolutionary algorithms (MOEAs) are known to be appropriate, since the objective functions can be evaluated explicitly by interacting with the simulation model. Their inherent ability to handle complex problems, including features such as discontinuities, multi-modality, disjoint feasible spaces and noisy functions, also makes MOEAs suitable for complex real-world problems (Fonseca and Fleming, 1995). In addition, these algorithms are efficient and can obtain a number of non-dominated solutions from a random initial population in a single run (Deb et al., 2002). Moreover, discrete and continuous variables can be handled simultaneously, as required by the hybrid parameter space (the block sizes and window size being discrete and the parameters of the PAR(1) model being continuous). In the proposed simulation-optimization framework (Fig. 1), the multi-objective evolutionary technique Non-dominated Sorting Genetic Algorithm II (NSGA-II) developed by Deb et al. (2002) is adopted. Although the number of alternative hybrid models to be searched is very large, the NSGA-II based genetic search used in this research work, being an efficient, robust and elitist non-dominated search approach, converges to near Pareto-optimal solutions within a reasonable number of evaluations.
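The core ranking step of NSGA-II, fast non-dominated sorting (Deb et al., 2002), can be sketched as below for minimization objectives such as the pair (∑|R-Bias(MARS)|, ∑R-RMSE(MARS)). This is a plain illustrative implementation of the published procedure, not the code used in this study.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is no worse than b in every objective and strictly better in one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def fast_non_dominated_sort(objs: Sequence[Sequence[float]]) -> List[List[int]]:
    """Partition solution indices into successive non-dominated fronts."""
    n = len(objs)
    dominated_set: List[List[int]] = [[] for _ in range(n)]  # S_p: solutions p dominates
    n_dominating = [0] * n                                   # n_p: solutions dominating p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if p == q:
                continue
            if dominates(objs[p], objs[q]):
                dominated_set[p].append(q)
            elif dominates(objs[q], objs[p]):
                n_dominating[p] += 1
        if n_dominating[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt: List[int] = []
        for p in fronts[i]:
            for q in dominated_set[p]:
                n_dominating[q] -= 1
                if n_dominating[q] == 0:  # all dominators already assigned to earlier fronts
                    nxt.append(q)
        i += 1
        fronts.append(nxt)
    return fronts[:-1]  # drop the trailing empty front


print(fast_non_dominated_sort([(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)]))  # [[0, 1, 2], [3], [4]]
```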

The decision vector consists of both discrete and continuous variables, represented within the NSGA-II string as a chromosome. All the variables are coded as binary strings to represent both the parametric component (such as ϕ1, ϕ2, ..., ϕ12 of the PAR(1) model, defined in a continuous space) and the non-parametric model parameters (the window parameter and the within-year block sizes of the MABB model, defined in a discrete space). The first discrete variable in the chromosome represents the contemporaneous window parameter for the multi-site multi-season HMABB model. The next twelve discrete variables represent the within-year block sizes for a maximum of 12 blocks. The within-year block sizes are selected such that their sum equals 12 (the total number of months in a year). If the number of within-year blocks is less than 12, the remaining variables (out of 12) are set as dummy variables. Moreover, if the sum of the within-year block sizes in any chromosome of a given population exceeds 12, that chromosome is not passed through the hybrid model simulator; instead, a large positive value is assigned to its fitness function in order to eliminate that string. The next 12k continuous variables in the chromosome represent the parametric component (PAR(1)) for the k sites of the multi-site multi-season HMABB model.
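The chromosome layout described above can be sketched as follows. The sketch works on an already-decoded gene vector `[window, 12 block slots, 12k PAR(1) parameters]` and shows, separately, how a binary substring could be mapped to a real value. Using zeros for the dummy block slots, rejecting any block sum other than 12, the penalty value, and the simulator callback `evaluate` are all hypothetical choices for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

OMEGA = 12       # seasons (months) per year
PENALTY = 1.0e9  # hypothetical "large positive value" for infeasible strings


@dataclass
class HybridModel:
    window: int             # contemporaneous window (bandwidth) parameter
    blocks: List[int]       # within-year block sizes, in calendar order
    phi: List[List[float]]  # phi[k][tau]: PAR(1) parameter of site k, season tau


def decode_real(bits: str, lo: float, hi: float) -> float:
    """Map a binary substring linearly onto the closed interval [lo, hi]."""
    return lo + int(bits, 2) * (hi - lo) / (2 ** len(bits) - 1)


def decode(genes: Sequence[float], n_sites: int) -> Optional[HybridModel]:
    """Split a decoded gene vector; return None if the block sizes are infeasible."""
    if len(genes) != 1 + OMEGA + OMEGA * n_sites:
        raise ValueError("unexpected chromosome length")
    window = int(genes[0])
    blocks = [int(g) for g in genes[1:1 + OMEGA] if int(g) > 0]  # zeros act as dummy slots
    if sum(blocks) != OMEGA:
        return None
    start = 1 + OMEGA
    phi = [list(genes[start + k * OMEGA: start + (k + 1) * OMEGA]) for k in range(n_sites)]
    return HybridModel(window, blocks, phi)


def penalized_objectives(genes: Sequence[float], n_sites: int,
                         evaluate: Callable[[HybridModel], Tuple[float, float]]
                         ) -> Tuple[float, float]:
    """Skip the simulator for infeasible strings and assign the penalty instead."""
    model = decode(genes, n_sites)
    if model is None:
        return (PENALTY, PENALTY)
    return evaluate(model)
```

Because `decode_real` includes both interval end points, a strict bound such as 0 < ϕ < 1 would require a slightly narrower coding range.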


2.2.5. Functioning of the Simulation-Optimization Framework

To evaluate the fitness functions based on the reservoir storage statistics, the chromosomes generated by NSGA-II (each representing a multi-site HMABB model) are sent to the synthetic simulation module. Once the synthetic replicates are generated, the simulation module computes the required statistics (summary statistics, distribution-related statistics, correlations, and the storage capacity required at the specified demand levels) and sends them to the NSGA-II module to evaluate the fitness functions and the constraints formulated. Based on the fitness function values, the solutions are then sorted using the fast elitist non-dominated sorting approach (Deb et al., 2002) to identify the different levels of non-dominated fronts. Reproduction based on tournament selection picks only the better solutions from the existing population. The crossover and mutation operations introduce variability across generations. To handle both the discrete and the continuous variable spaces, a uniform crossover operator is adopted. The crowded comparison operator preserves diversity, while the elitism operator significantly speeds up the search and preserves good non-dominated solutions. For further details on the NSGA-II approach and the genetic operators used, readers are referred to Deb et al. (2002).
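The diversity-preserving part of the procedure can be sketched with the crowding distance and crowded-comparison operator described by Deb et al. (2002), together with a binary tournament that uses them. This is an illustrative implementation, not the code used in the study; `ranks[i]` is assumed to be the front index of solution i and `dists[i]` its crowding distance.

```python
import math
import random
from typing import List, Sequence


def crowding_distance(front: Sequence[Sequence[float]]) -> List[float]:
    """Crowding distance of each point within one non-dominated front."""
    n = len(front)
    dist = [0.0] * n
    if n == 0:
        return dist
    for j in range(len(front[0])):
        order = sorted(range(n), key=lambda i: front[i][j])
        f_min, f_max = front[order[0]][j], front[order[-1]][j]
        dist[order[0]] = dist[order[-1]] = math.inf  # boundary points are always kept
        if f_max == f_min:
            continue
        for r in range(1, n - 1):
            gap = front[order[r + 1]][j] - front[order[r - 1]][j]
            dist[order[r]] += gap / (f_max - f_min)  # normalized side length of the cuboid
    return dist


def crowded_better(rank_a: int, dist_a: float, rank_b: int, dist_b: float) -> bool:
    """Crowded comparison: lower front rank wins; ties go to the less crowded point."""
    return rank_a < rank_b or (rank_a == rank_b and dist_a > dist_b)


def binary_tournament(ranks: Sequence[int], dists: Sequence[float],
                      rng: random.Random) -> int:
    """Pick two distinct solutions at random and return the index of the winner."""
    i, j = rng.sample(range(len(ranks)), 2)
    return i if crowded_better(ranks[i], dists[i], ranks[j], dists[j]) else j
```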

3. Simulation based Multi-site HMABB (SMHMABB) Model

In this research work, the single-site multi-season HMABB model proposed by Srinivas and Srinivasan (2006) has been extended to a multi-site multi-season HMABB (MHMABB) model. The modeling steps involved in the synthetic generation of streamflows using the proposed multi-site multi-season HMABB model are presented in section 2.2.1. The simulation based MHMABB models are herein referred to as SMHMABB models. SMHMABB model building is divided into two stages. In stage 1, the parametric model parameters for each site are obtained using the method of moments; in stage 2, the parameters of the non-parametric model (i.e., the contemporaneous block size and window size) are selected through numerous trials based on the overall performance of the model. In this study, only equal within-year block sizes (1, 2, 3, 4 and 6) and the window sizes 3, 5, 7, 9, 11 and 13 are tried. Thus, with equal within-year block sizes, the combinations of the parametric and non-parametric components result in 30 (5 × 6) hybrid models. If unequal within-year block sizes were used, the total number of hybrid models would be very large, and manual inspection and selection would be extremely tedious. In fact, the parameter space of the MHMABB model is huge, and it is under-explored by the simulation based MHMABB (SMHMABB) model, since the parameters of the parametric and the non-parametric components of the SMHMABB model are obtained independently and the residual space explored by the non-parametric model is limited. The drawbacks of the simulation based hybrid models can be summarized as: (i) joint exploration of the parameter space is not done; (ii) conditioning of variables for the reproduction of statistics at the aggregated level is not possible; and (iii) the manual effort involved in inspecting and selecting the alternative hybrid models is enormous.
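The 30 SMHMABB candidates can be enumerated explicitly. Equal within-year blocks must divide the 12 months evenly, and the block sizes listed in the text (1, 2, 3, 4 and 6) are the divisors of 12 excluding a single 12-month block; they combine with the six odd window sizes from 3 to 13. The function name is hypothetical, and excluding the 12-month block simply mirrors the list in the text.

```python
from itertools import product
from typing import List, Tuple

OMEGA = 12


def smhmabb_grid(omega: int = OMEGA) -> List[Tuple[int, int]]:
    """All (equal block size, window size) pairs tried for the SMHMABB model."""
    equal_blocks = [b for b in range(1, omega) if omega % b == 0]  # 1, 2, 3, 4, 6
    windows = range(3, 14, 2)                                      # 3, 5, 7, 9, 11, 13
    return list(product(equal_blocks, windows))


grid = smhmabb_grid()
print(len(grid))  # 30
```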

4. Application to Upper Colorado River Basin

The efficacy of the AMHMABB model obtained from the proposed S-O framework in modeling multi-site multi-season streamflows is evaluated by applying it to the monthly streamflows measured at four streamflow stations in the Upper Colorado River basin. The monthly naturalized streamflows at the following four gauging stations for the 102-year period from 1906 to 2007 (source: http://www.usbr.gov/lc/region/g4000/NaturalFlow/index.html) are considered for the multi-site multi-season streamflow modeling application: Colorado River near Cisco, Utah (site 1); Green River at Green River, Utah (site 2); San Juan River near Bluff, Utah (site 3); and Colorado River at Lees Ferry, Arizona (site 4). The locations of the stations are presented in Table 1 and Fig. 2. These streamflow data sets were chosen for the study because they exhibit complex dependence and also bimodality in a few months. In addition, these benchmark data sets have been used by Prairie et al. (2007) and Salas and Lee (2010) for multi-site multi-season modeling of streamflows.

Figure 2: Location of Streamflow Stations - Colorado River Basin (source: Google Maps)

Table 1: Location of the selected stations for the multi-site multi-season flow modeling

The efficacy of the AMHMABB model is demonstrated through: (i) a comparison with the selected simulation based hybrid model (SMHMABB), to bring out the advantages of the proposed model in terms of performance due to the automation achieved by the S-O framework and the preservation of inter-annual dependence; (ii) a comparison with the multi-site parametric disaggregation model (MDM) fitted using SAMS2007 (Sveinsson et al., 2007), a state-of-the-art stochastic streamflow modeling package; and (iii) a split-sample validation test to assess the performance of the proposed AMHMABB model in capturing the statistics of multi-site streamflows that may occur in the uncertain future.

The performance comparisons are based on the ability of the models to preserve the following statistics: (i) summary statistics (mean, standard deviation and skewness coefficient) at the within-year (monthly) and aggregated annual time scales at each site; (ii) the marginal distribution of monthly flows at each site; (iii) the lag-1 autocorrelation at the aggregated annual level at each site; (iv) monthly serial correlations at each site; (v) serial state-dependent correlations (Sharma et al., 1997) of monthly flows at each site (representing nonlinear dependence); (vi) lag-zero site-to-site correlations at the monthly level; (vii) minimum and maximum monthly flows at each site; and (viii) the multi-site deficit run characteristics (Yevjevich, 1972; Haltiner and Salas, 1988), expressed in terms of (a) maximum deficit run sum, (b) maximum deficit run length, (c) mean deficit run sum and (d) mean deficit run length.

4.1. Models Considered and Selection of Model

For the AMHMABB and the SMHMABB, the details of the models considered for the selection

and the selected model for comparison are discussed in the following paragraphs.

AMHMABB model: Since this model is obtained from the S-O framework using multi-objective evolutionary search with NSGA-II (Deb et al., 2002), a sensitivity analysis is performed for the application example considered in this study. The sensitivity analysis on the MOGA parameters (population size, number of generations, crossover probability, mutation probability and random seed) was carried out with the aim of obtaining the non-dominated Pareto-optimal solutions. The MOGA parameters adopted based on the sensitivity analysis are: population size = 100; number of generations = 300; crossover probability = 0.6; mutation probability = 0.001; random seed = 0.3. The non-dominated front obtained for the application example using NSGA-II is presented in Fig. 3. In Fig. 3, solutions A and C represent the AMHMABB models at the two extremes of the non-dominated front, with the "minimum ∑|R-bias(MARS)|" and the "minimum ∑R-RMSE(MARS)", respectively; solution B represents a typical compromise AMHMABB model between the two extremes. The compromise solution is the one located closest to the origin on the Pareto front presented in Fig. 3. It is noted from Fig. 3 that the Pareto front has a narrow range, resulting in practically very close solutions, which is plausibly due to the inter-annual dependence constraint introduced into the framework. Hence, in this study, only the AMHMABB-A solution is used for the comparisons, and it is hereafter referred to as AMHMABB.
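Selecting the compromise solution B as the front point closest to the origin amounts to a minimum-distance search over the objective pairs. The sketch below uses the raw objective values and a hypothetical function name; whether the objectives were normalized before computing the distance is not stated, so that choice is left open here.

```python
import math
from typing import Sequence


def closest_to_origin(front: Sequence[Sequence[float]]) -> int:
    """Index of the Pareto-front point with the smallest Euclidean distance to the origin."""
    return min(range(len(front)), key=lambda i: math.hypot(*front[i]))


# (sum |R-Bias(MARS)|, sum R-RMSE(MARS)) pairs for three illustrative front points
print(closest_to_origin([(0.1, 0.9), (0.3, 0.3), (0.9, 0.1)]))  # 1
```

If the two objectives differ greatly in magnitude, rescaling each to [0, 1] before the distance calculation would keep one objective from dominating the choice.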

SMHMABB model: While the parametric component of the SMHMABB model is restricted to PAR(1) at all sites for partial pre-whitening, the non-parametric components of the SMHMABB model are picked from one of the combinations resulting from: (i) equal-sized within-year contemporaneous blocks of 1, 2, 3, 4 or 6 months and (ii) window sizes of 3, 5, 7, 9, 11 or 13. Note that the PAR(1) model parameters are estimated independently at each station using the method of moments, and the within-year block sizes adopted for resampling the residuals are equal and contemporaneous, since unequal block sizes result in a large number of possible hybrid models, which would be too cumbersome to evaluate manually. The above combinations of parametric and non-parametric components result in 30 hybrid models. For the purpose of comparison, the most competent model is chosen based on the reproduction of all the temporal and spatial statistics and the preservation of the deficit run characteristics. The SMHMABB model selected here for the Colorado River basin has the PAR(1) model at each site as the parametric component, and a contemporaneous within-year block size of 4 months and a window size of 9 as the non-parametric components used for resampling the residuals (Table 2).

Table 2: Parameters for the selected multi-site HMABB models SMHMABB and

AMHMABB-Colorado River Basin

Figure 3: Pareto-front between R-Bias (MARS) and R-RMSE (MARS)

4.2. Comparison with SMHMABB Model

A comparison of model performance between the selected AMHMABB model and the selected SMHMABB model is presented in the next few paragraphs. Table 2 summarizes the parameters of the selected SMHMABB and AMHMABB models for the Colorado River basin, from which it can be observed that the parameters of the AMHMABB model (both the parametric and the non-parametric components) are quite different from those of the SMHMABB model for the streamflows at all the sites. This is because, for the SMHMABB model, the periodic parameters (parametric component) are obtained independently at each site by fitting a PAR(1) model using the method of moments (SAMS 2007). Following this, the multi-site residuals are contemporaneously resampled using a multi-site HMABB with a set of equal within-year block sizes and a pre-selected window size. Subsequently, the post-blackening operation is performed. Thus, there are multiple steps, and these steps have to be performed sequentially. In contrast, for the AMHMABB model, the parameters of the parametric component are obtained simultaneously with those of the contemporaneous non-parametric component using the multi-objective evolutionary search technique NSGA-II, efficiently guided by objective functions based on the preservation of the multi-site critical deficit run sum and by the constraint on the preservation of inter-annual dependence. Moreover, in the proposed framework, a huge parameter space is available for the search (unlike the simulation based SMHMABB model). As mentioned earlier, the AMHMABB model yielding the minimum ∑|R-bias(MARS)| on the Pareto-optimal front is used for the comparison with the SMHMABB model.

4.2.1. Reproduction of Summary Statistics and Dependence Structure

For both the AMHMABB and the SMHMABB models, the reproduction of the summary statistics and the preservation of the serial correlations at the monthly level are presented in Figs. 4 and 5, respectively. For brevity, the results are shown for only two sites of the Colorado River basin, since a similar trend is observed at the other two sites. The reproduction of the summary statistics and the lag-1 autocorrelation of the aggregated annual flows is presented in Table 3. The summary statistics of the flows are well reproduced by both models at the monthly level (Fig. 4) and at the aggregated annual level (Table 3). However, the standard deviation at the annual level is deflated by the SMHMABB model. It is seen from Fig. 5 that the monthly serial correlations at all four lags are well preserved by the AMHMABB model, whereas the SMHMABB model shows considerable bias in preserving the lag-2, lag-3 and lag-4 correlations, and this bias increases with the lag. The lag-1 autocorrelation at the aggregated annual level is well preserved at all the sites by the AMHMABB model (Table 3), owing to the inter-annual dependence constraint introduced into the framework. On the other hand, the SMHMABB model (Table 3) does not preserve the lag-1 autocorrelation at the annual level at any of the four sites, since the simulation based hybrid model is not conditioned to preserve it. The lag-zero site-to-site correlations (Fig. 6) are well reproduced by both models, owing to the residual resampling using contemporaneous within-year blocks. Although the state-dependent correlations (Fig. 7) are well reproduced at all four sites by both models, the SMHMABB model exhibits relatively more bias in a few months than the AMHMABB model.

Table 3: Reproduction of the Aggregated Annual Flow Statistics - Comparison between

SMHMABB and AMHMABB models (values in parentheses denote the standard deviation

over 300 replicates)

Figure 4: Reproduction of Summary Statistics - A Comparison between AMHMABB and

SMHMABB Models (flow units: Mm3/month): a) at site 2; b) at site 4.

Figure 5: Preservation of Serial Correlations - A Comparison between AMHMABB and

SMHMABB Models: a) at site 2; b) at site 4.

Figure 6: Preservation of Lag-zero site-to-site monthly correlations - A Comparison

between AMHMABB and SMHMABB Models: a) site 1 to site 3; b) site 1 to site 4; c) site 2

to site 4.

Figure 7: Preservation of state-dependent correlations - A Comparison between

AMHMABB and SMHMABB Models: a) at site 2; b) at site 4.

4.2.2. Preservation of Marginal Distributions and Minimum and Maximum Monthly Flows

Typical results of the preservation of the marginal distributions of monthly streamflows at site 1 for July and December are presented in Figs. 8 and 9, respectively, and those at site 2 for August are presented in Fig. 10. For brevity, the results are presented and discussed only for the flows of a few typical months that exhibit peakedness and/or bimodality. In general, it is observed that the AMHMABB model mimics the distributional characteristics of the historical flows (especially non-normal features such as peakedness and bimodality) better than the SMHMABB model. It is also observed that at all four sites, both models show limited smoothing as well as limited extrapolation beyond the extremes (minimum and maximum flows). The preservation of the minimum and maximum flows at the key site 4 is presented only for AMHMABB in Fig. 11, since the behavior of SMHMABB is quite similar. Neither hybrid model preserves the minimum and maximum flows effectively, since very few flows are generated beyond the historical extremes. In summary, the automated model obtained from the simulation-optimization framework (AMHMABB) outperforms the simulation based model SMHMABB.

Figure 8: Preservation of the marginal Distribution of the July month streamflows at site 1-

A Comparison between AMHMABB and SMHMABB Models (flow units: Mm3/month)

Figure 9: Preservation of the marginal Distribution of the December month streamflows at

site 1 - A Comparison between AMHMABB and SMHMABB Models (flow units:

Mm3/month)

Figure 10: Preservation of the marginal Distribution of the August month streamflows at

site 2 - A Comparison between AMHMABB and SMHMABB Models (flow units:

Mm3/month)

Figure 11: Preservation of a) minimum flows and b) maximum flows at site 4 - Model:

AMHMABB (flow units: Mm3/month)

4.2.3. Preservation of Multi-site Deficit Run Characteristics

The results of the preservation of the multi-site deficit run characteristics are presented in Fig. 12 for the selected SMHMABB and AMHMABB models. From Fig. 12, it is observed that the selected AMHMABB model clearly outperforms the SMHMABB model in preserving the number of runs and the critical and mean deficit run characteristics at all the truncation levels. A good and consistent percentage of exceedance of the various deficit run characteristics (relative to their historical counterparts) is also noted for the streamflows generated by the AMHMABB model; this is not shown here for brevity. Note that the selected AMHMABB model preserves the critical run sum accurately owing to the objective functions adopted (which are explicitly related to ∑|R-bias(MARS)| and ∑R-RMSE(MARS) over the various truncation levels). On the other hand, the SMHMABB model shows high bias at lower and/or higher truncation levels. It should also be mentioned that, although no objective functions or constraints are introduced into the S-O framework for preserving the other deficit run characteristics (such as the number of runs, maximum run length, mean run sum and mean run length), these are well preserved by the selected AMHMABB model compared with the simulation based SMHMABB model (Fig. 12).

Figure 12: Preservation of historical drought characteristics - A comparison between

SMHMABB and AMHMABB models

The automated multi-site hybrid model AMHMABB proposed in this study outperforms the simulation-based hybrid model SMHMABB, plausibly because the parametric and the non-parametric components are combined more effectively. This combination results from the exploration of the large parameter space of HMABB, driven by an objective function that minimizes the aggregated errors in preserving the multi-site critical drought magnitude over a wide range of threshold levels, subject to a constraint on preserving the inter-annual dependence of the simulated streamflows at the various sites. The provision for unequal within-year block sizes in the structure of the AMHMABB model is expected to offer a better representation of the short-term persistence arising from the recession of seasonal groundwater flows in the sub-basins considered, and of the pronounced seasonality arising from seasonal storage in snowpacks. Moreover, the constraint on the conditional preservation of the inter-annual (long-term) dependence of the streamflows at the various sites is expected to represent the over-year response time of deep groundwater runoff more accurately (Claps and Murrone, 1994). In general, the better preservation of the streamflow drought durations at the various sites by AMHMABB indicates a better representation of the storage and the response times of the different catchments considered in this study. The observation that both the multi-site drought durations and the drought severities (magnitudes) at various threshold levels are better preserved by the AMHMABB model suggests that the non-linear dynamics behind the propagation of streamflow droughts are better captured (Van Loon and Laaha, 2015).
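To make the idea of unequal within-year blocks concrete, the following minimal sketch resamples residuals in calendar-fixed blocks of unequal length, copying each block for all sites from the same randomly chosen historical year so that the lag-zero cross-site dependence within a block is retained. It is a simplified moving-block style resampler written for illustration only: the matching step of the matched block bootstrap (Carlstein et al., 1998), which conditions the choice of the next block on the end of the previous one, is deliberately omitted, and the array layout, function name and block sizes are assumptions.

```python
import numpy as np


def resample_residuals(resid, block_sizes, n_years, rng):
    """Resample residuals of shape (years, months, sites) in within-year blocks.

    block_sizes: lengths of consecutive within-year blocks (may be unequal);
    they must sum to the number of months. Each block is copied, for all
    sites at once, from one randomly selected historical year.
    """
    resid = np.asarray(resid, dtype=float)
    n_hist, n_months, n_sites = resid.shape
    if sum(block_sizes) != n_months:
        raise ValueError("block sizes must sum to the number of months")
    starts = np.cumsum([0] + list(block_sizes[:-1]))
    out = np.empty((n_years, n_months, n_sites))
    for y in range(n_years):
        for start, size in zip(starts, block_sizes):
            src = rng.integers(n_hist)          # random historical year for this block
            out[y, start:start + size, :] = resid[src, start:start + size, :]
    return out
```

For example, `resample_residuals(resid, (3, 5, 4), 100, np.random.default_rng(1))` would split each year into blocks of 3, 5 and 4 months; the block sizes here are arbitrary.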

4.3 Comparison with Multi-site Parametric Disaggregation Model (MDM)

The following paragraphs compare the performance of the selected AMHMABB model obtained from the proposed S-O framework with that of the multi-site parametric disaggregation model (MDM). Twenty-eight different multi-site parametric disaggregation models, arising from the combinations of: i) the aggregated annual model at the key site; ii) the available schemes for the spatial and the temporal disaggregation; and iii) the sequence of disaggregation adopted, are fitted using SAMS 2007. The best model, based on the preservation of the spatial and temporal statistics as well as the deficit run characteristics, is selected. The selected MDM adopts an AR(1) model for generating the aggregated annual flows at the key site, Lees Ferry on the Colorado River (Table 1, Fig. 2), the Valencia and Schaake model for the spatial disaggregation of the annual streamflows generated at the key site, and Lane's model for the subsequent temporal disaggregation. For brevity, only the results for the key site (site 4) are presented in this section, since a similar performance trend is observed for the other three sites. For both AMHMABB and MDM, the reproduction of the summary statistics, the preservation of the serial correlations, the preservation of the state-dependent correlations, the marginal distributions at the key site (site 4) and the multi-site drought characteristics are presented in Figs. 13-17, respectively.

Fig 13: Reproduction of Summary Statistics for Colorado River Basin at Site 4 - A
comparison between AMHMABB model and Disaggregation model (Flow units -
Mm³/month)

Fig 14: Preservation of Serial Correlations for Colorado River Basin at Site 4 - A
comparison between AMHMABB model and Disaggregation model

Fig 15: Preservation of State-dependent Correlations for Colorado River Basin at Site 4 - A
comparison between AMHMABB model and Disaggregation model

Fig 16: Preservation of Marginal Distribution of March and June month flows at Site 4 for
Colorado River Basin - A comparison between Disaggregation model (MDM) and
AMHMABB models (Flow units - Mm³/month)

Fig 17: Preservation of multi-site drought characteristics (a) Number of runs; (b)
Maximum Run Length; (c) Maximum Run Sum for Colorado River Basin - A comparison
between Disaggregation model (MDM) and AMHMABB models

It is observed from Figure 13 that the monthly means and standard deviations of the flows at Lees Ferry (Site 4) are well reproduced by both models, although the MDM exhibits some bias in the skewness coefficient in a few months. A detailed comparison of the preservation of the monthly serial correlations has been carried out in this study; for brevity, however, only the lag-1 and lag-2 (lower lags) and the lag-3 and lag-4 (higher lags) monthly serial correlations are presented in Figure 14. For the AMHMABB model, both the lower-lag (lag-1 and lag-2) and the higher-lag (lag-3 and lag-4) correlations are well preserved at Site 4. For the MDM, the lower-lag serial correlations are reasonably well preserved, whereas the higher-lag serial correlations are not. This is because the selected MDM is not designed to preserve the serial correlations beyond lag-1 at the seasonal level. The results for the preservation of the lag-1 state-dependent correlations at Site 4 are presented in Figure 15. It is observed from this figure that the AMHMABB model generally reproduces the monthly lag-1 state-dependent correlations very well, whereas the disaggregation model fails to preserve them. Being a linear parametric model, the MDM is not expected to preserve the state-dependent correlations, which are indicative of the non-linear dependence present in the data (Sharma et al., 1997).
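The monthly (periodic) serial correlations and the lag-1 state-dependent correlations compared in Figures 14 and 15 can be computed along the following lines. This is a sketch under stated assumptions: flows are held in a (years × 12) array ordered by water year, lag-k pairs may cross the year boundary, and the state-dependent correlations are conditioned on whether the current month's flow lies above or below that month's median, paired with the previous (backward) or next (forward) month. That conditioning rule is one common reading of Sharma et al. (1997) and is an assumption here, not necessarily the exact definition used in this study; the function names are hypothetical.

```python
import numpy as np


def periodic_lag_corr(x, k):
    """Lag-k correlation for each month of a (years, months) array, allowing cross-year lags."""
    x = np.asarray(x, dtype=float)
    n_years, n_months = x.shape
    s = x.reshape(-1)                        # chronological monthly series
    r = np.empty(n_months)
    for m in range(n_months):
        idx = np.arange(m, s.size, n_months)  # all occurrences of month m
        idx = idx[idx - k >= 0]               # drop pairs that fall before the record
        r[m] = np.corrcoef(s[idx], s[idx - k])[0, 1]
    return r


def state_dependent_corr(x, above=True, forward=False):
    """Lag-1 correlation per month, using only pairs where the current month's flow
    is above (or at/below) that month's median (assumed conditioning rule)."""
    x = np.asarray(x, dtype=float)
    n_years, n_months = x.shape
    s = x.reshape(-1)
    shift = 1 if forward else -1              # next month (forward) or previous (backward)
    r = np.full(n_months, np.nan)
    for m in range(n_months):
        idx = np.arange(m, s.size, n_months)
        idx = idx[(idx + shift >= 0) & (idx + shift < s.size)]
        med = np.median(x[:, m])
        sel = idx[s[idx] > med] if above else idx[s[idx] <= med]
        if sel.size > 2:                      # need a few pairs for a correlation
            r[m] = np.corrcoef(s[sel], s[sel + shift])[0, 1]
    return r
```

Because the pairs are formed on the chronological series, the correlations for the first months of the water year automatically use the last months of the preceding year.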

Typical results for the preservation of the marginal distribution of the flows of the Colorado River Basin by the AMHMABB model and the multi-site disaggregation model (MDM) are presented in Figure 16. For brevity, the results are presented only for the March and June flows. It is observed that the AMHMABB model mimics the distribution characteristics of the historical flows very well when compared to the MDM. However, being a parametric model, the MDM offers better smoothing than AMHMABB (a hybrid model). Moreover, both models show only limited extrapolation near the minimum and/or maximum flows. It is to be noted that the selected AMHMABB model preserves the statistical characteristics of the multi-site multi-season streamflows better than the selected MDM, even though the objective functions used in the proposed S-O framework for the AMHMABB model are based on the preservation of the multi-site critical deficit run sum.
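The limited extrapolation noted above can be quantified, for a given month, by the fraction of simulated flows that fall outside the historical range, and the marginal distributions can be compared on empirical plotting positions. The sketch below is illustrative; the use of the Weibull plotting position i/(n+1) and the function names are assumptions, not necessarily what was used for Figure 16.

```python
import numpy as np


def weibull_positions(values):
    """Sorted values and Weibull non-exceedance probabilities i/(n+1)."""
    v = np.sort(np.asarray(values, dtype=float))
    p = np.arange(1, v.size + 1) / (v.size + 1)
    return v, p


def extrapolation_fractions(hist, sims):
    """Fractions of simulated flows below the historical minimum and above the maximum."""
    hist = np.asarray(hist, dtype=float)
    sims = np.asarray(sims, dtype=float)     # any shape, e.g. (realizations, years)
    below = float(np.mean(sims < hist.min()))
    above = float(np.mean(sims > hist.max()))
    return below, above
```

Values of both fractions close to zero would correspond to the behavior described above, in which the model generates few flows beyond the historical extremes.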


The results for the preservation of the multi-site critical deficit run characteristics are presented in Figure 17 for both AMHMABB and MDM. From Figure 17, it is observed that the selected AMHMABB model preserves the deficit run characteristics better than the MDM. It is to be noted that, although no objective functions or constraints are used to achieve the preservation of the other deficit run characteristics (such as the number of runs and the maximum run length), these characteristics are also better preserved by the AMHMABB model than by the MDM. Overall, the selected AMHMABB model performs better than the selected disaggregation model (MDM) in simulating the historical streamflows of the Colorado River Basin. This better performance may be attributed to the better preservation of the marginal distributions, the higher-lag serial correlations, the skewness coefficient of the aggregated annual flows and the state-dependent correlations of the monthly flows.

4.4 Split-sample Validation

Split-sample validation is conducted to ensure that the AMHMABB model obtained from the simulation-optimization framework captures the repeatable statistical structure present in the historical streamflow data. In other words, this validation tests whether the proposed model is applicable to streamflow sequences that may occur in the uncertain future. The split-sample validation is carried out in two phases, namely, calibration and validation, using the 102-year multi-site monthly streamflows measured at the four streamflow stations located in the Upper Colorado River Basin (Table 1, Fig. 2). In the calibration phase, the first 60 years of the streamflow data are used to obtain the parameters of the parametric and the non-parametric components using the S-O framework. The AMHMABB model obtained in the calibration phase is then used to model the validation data set (the remaining 42 years of historical streamflows). The stochastic model is considered acceptable for practical use if it provides a very good simulation of the measured streamflows in both the calibration and the validation phases, by reproducing the basic statistical characteristics of the streamflow data and by preserving both the critical and the mean deficit run characteristics of the historical flow data at various truncation levels with minimum errors. For brevity, only the results for Site 4 are presented in this section, since a similar statistical performance is observed at the other sites.
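The chronological split described above (102 years = 60 calibration years + 42 validation years) amounts to slicing the record by year. A minimal sketch, assuming the flows are stored as a (years, months, sites) array in chronological order; the function name is hypothetical.

```python
import numpy as np


def split_sample(flows, n_calib=60):
    """Split a chronological (years, months, sites) record into calibration and validation parts."""
    flows = np.asarray(flows)
    if not 0 < n_calib < flows.shape[0]:
        raise ValueError("n_calib must leave at least one year in each part")
    return flows[:n_calib], flows[n_calib:]
```

In this setting, the calibration part would be passed to the S-O framework to fit the model, and statistics of series simulated with the resulting parameters would then be compared with those of the validation part.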

Fig 18: Reproduction of Summary Statistics for Colorado River Basin (Calibration) at Site 4 -
Model: AMHMABB

Fig 19: Reproduction of Summary Statistics for Colorado River Basin (Validation) at Site 4 -
Model: AMHMABB

Fig 20: Preservation of Serial Correlations for Colorado River Basin (Calibration) at Site 4 -
Model: AMHMABB

Fig 21: Preservation of Serial Correlations for Colorado River Basin (Validation) at Site 4 -
Model: AMHMABB

Fig 22: Preservation of Marginal Distribution of the February and April month flows for
Colorado River Basin (Calibration and Validation) at Site 4 - Model: AMHMABB

Fig 23: Preservation of multi-site drought characteristics: Number of runs and Maximum Run
Sum for Colorado River Basin (Calibration and Validation) - Model: AMHMABB

The results of the reproduction of the summary statistics, the preservation of the dependence structure of the historical streamflows and the preservation of the marginal distributions for the calibration and validation data sets are presented in Figs. 18-22. It is observed from Figures 18 and 19 that the mean and the standard deviation are well reproduced for all months in both calibration and validation. Although the skewness is well preserved in the calibration data set, it is slightly deflated in three of the high-skewness months (November, December and January) in validation. In the calibration phase, the AMHMABB model preserves the lower-lag (lag-1 and lag-2) as well as the higher-lag (lag-3 and lag-4) serial correlations well (Fig. 20), although a small bias is noted in a few months for the higher-lag cross-year serial correlations. In validation, some bias is observed in the higher-lag serial correlations (lag-3 and lag-4) in a few months (Fig. 21). It may be observed from Figure 22 that AMHMABB mimics the distribution characteristics of the monthly historical flows in both the calibration and the validation data sets. However, in both phases, the AMHMABB model offers only very limited extrapolation beyond the historical extremes.

For brevity, only the results for the preservation of the number of runs, the maximum (critical) run length and the maximum (critical deficit) run sum for both the calibration and the validation data sets are presented in Figure 23. It can be observed that the AMHMABB model preserves the maximum run sum well at all the truncation levels considered, in both calibration and validation. This is expected for the calibration data set, since the proposed model uses objective functions that are explicitly related to preserving the critical deficit run sum. However, the maximum run sum at the various truncation levels is also preserved in validation; moreover, the number of runs is well preserved in both the calibration and the validation phases (Fig. 23), although it is not explicitly specified in the objective functions or constraints of the framework. It is to be noted that the critical run length is slightly underestimated in both the calibration and the validation phases (Fig. 23). Thus, the split-sample validation performed in this study demonstrates the ability of the proposed hybrid multi-site multi-season streamflow model (AMHMABB) to predict the statistical characteristics as well as the multi-site critical deficit run characteristics of the streamflows likely to occur in the future. It also shows that the hybrid model parameters obtained through the evolutionary search, using the maximum run sum based objective functions and the constraint on the preservation of the inter-annual dependence, do not overfit the calibration data set.

5. Summary and Conclusions

The simulation-optimization (S-O) framework developed by Srivastav et al. (2011) for modeling single-site multi-season streamflows is extended here to modeling multi-site multi-season streamflows. Moreover, the single-site streamflow simulation model HMABB proposed by Srinivas and Srinivasan (2006) is extended to a multi-site streamflow simulation model, which is used as the simulation module within the proposed S-O framework for modeling multi-site multi-season streamflows.

The multi-objective optimization model formulated is the driver, and the multi-site, multi-season hybrid matched block bootstrap model is the simulation engine within this framework. In addition to the constraints on the hybrid model parameter space, specific constraints on the preservation of certain statistics (such as the inter-annual dependence or the skewness of the aggregated annual flows) are introduced explicitly into the modeling framework, with a view to arriving at a hybrid stochastic model that preserves the statistics at more than one level and, consequently, preserves the deficit run (drought) characteristics accurately. A robust and efficient evolutionary search technique, namely, the non-dominated sorting genetic algorithm (NSGA-II) (Deb et al., 2002), is employed as the solution technique for the multi-objective optimization within the S-O framework. The deficit sum related objective functions and the constraints imposed explicitly in the simulation-optimization framework appear to enable the evolutionary search to explore the wide parameter space of the multi-site, multi-season hybrid model HMABB effectively. The efficacy of the proposed framework in simulating multi-site multi-season streamflows is demonstrated through a case study of the Colorado River Basin.
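The core idea of non-dominated sorting in NSGA-II, applied to two minimized objectives such as ∑|R-bias(MARS)| and ∑R-RMSE(MARS), can be sketched as below. This simple version repeatedly peels off the current non-dominated set; it omits the fast bookkeeping, the crowding distance and the genetic operators of the full algorithm of Deb et al. (2002), so it should be read as an illustration only. The function names and the sample points are hypothetical.

```python
import numpy as np


def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in one (minimization)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return bool(np.all(a <= b) and np.any(a < b))


def non_dominated_sort(points):
    """Return successive fronts as lists of point indices (front 0 is the Pareto front)."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts


# Illustrative (R-bias sum, R-RMSE sum) pairs for four candidate parameter sets
candidates = [(0.50, 2.10), (0.60, 2.08), (0.55, 2.20), (0.70, 2.15)]
print(non_dominated_sort(candidates))
```

Here the first two candidates form the Pareto front, since neither is better in both objectives, while the last two are each dominated by the first candidate.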

The proposed hybrid model AMHMABB preserves the temporal statistics (the summary statistics, the linear dependence structure, the state-dependent correlations (non-linear dependence) and the marginal distributions at each site, at the monthly and the aggregated annual levels) as well as the spatial statistics (site-to-site lag-zero correlations) very well. The other deficit run characteristics, namely, the number of runs, the maximum run length, the mean run sum and the mean run length, are also well preserved by the AMHMABB model.
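The site-to-site lag-zero monthly correlations referred to above can be computed month by month across years. A minimal sketch, assuming a (years, months, sites) array; the function name is hypothetical.

```python
import numpy as np


def lag0_cross_site_corr(flows, site_a, site_b):
    """Lag-zero correlation between two sites for each month, computed across years."""
    flows = np.asarray(flows, dtype=float)   # shape (years, months, sites)
    return np.array([np.corrcoef(flows[:, m, site_a], flows[:, m, site_b])[0, 1]
                     for m in range(flows.shape[1])])
```

Applying the same function to the historical record and to each simulated realization gives the month-by-month comparison of spatial dependence between any pair of sites.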

Overall, the AMHMABB model (obtained from the proposed S-O framework) shows better streamflow modeling performance than the simulation-based SMHMABB model (an extension of the single-site HMABB model proposed by Srinivas and Srinivasan, 2006). The improved performance of the proposed AMHMABB over the SMHMABB model is plausibly due to the significant roles played by: (i) the objective functions related to the preservation of the deficit run sum, which drive (direct) the search effectively; (ii) the large hybrid model parameter space available for the search; and (iii) the constraint on the preservation of the inter-annual dependence. Further, the split-sample validation results show that the AMHMABB model performs well in both the calibration and the validation phases, indicating that the parameters derived from the proposed S-O framework are robust in predicting the statistical and the critical deficit characteristics of the multi-site multi-season streamflows likely to occur in the uncertain future. Moreover, the proposed AMHMABB model is found to perform better than the multi-site parametric disaggregation model (MDM) in preserving both the statistical characteristics and the critical deficit run characteristics of the observed flows.

Limitations and Future Extensions of the AMHMABB Model

1. The model does not generate values that are very different from the observed flows, and hence only limited smoothing is achieved. One way of alleviating this issue is to apply suitable perturbation methods to the resampled multi-site residuals. The perturbations could be restricted to each within-year block or applied to the entire residual series.

2. Only a limited number of flows are generated beyond the historical extremes.

3. Although the AMHMABB model is automated and searches the multi-dimensional parameter space effectively, it is computationally demanding. Parallel computing could be used to reduce the computational time.

4. In place of the linear parametric model, non-linear regression based models or data-driven models such as artificial neural networks (ANN) can be adopted.

5. Instead of the matched block bootstrap for resampling the residuals, other bootstrap methods such as the multi-site maximum entropy bootstrap (Srivastav and Simonovic, 2014) can be tried.
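The perturbation idea in item 1 could, for instance, take the form of a smoothed bootstrap in the spirit of Silverman (1986): each resampled residual vector receives a small Gaussian perturbation whose cross-site covariance matches that of the historical residuals for the same month. The sketch below is an assumption about how such an extension might look, not part of the AMHMABB model; the function name and the smoothing factor `lam` are hypothetical, and the added noise roughly inflates the residual variance by a factor of (1 + lam²), which a practical scheme would need to correct.

```python
import numpy as np


def perturb_residuals(resampled, hist_resid, lam, rng):
    """Add cross-site-correlated Gaussian noise to resampled residuals.

    resampled:  (years, months, sites) resampled residuals
    hist_resid: (hist_years, months, sites) historical residuals
    lam:        hypothetical smoothing factor; lam = 0 leaves the input unchanged
    """
    out = np.array(resampled, dtype=float)     # copy, so the input is not modified
    n_years, n_months, n_sites = out.shape
    for m in range(n_months):
        # month-specific cross-site covariance of the historical residuals
        cov = np.atleast_2d(np.cov(hist_resid[:, m, :], rowvar=False))
        noise = rng.multivariate_normal(np.zeros(n_sites), cov, size=n_years)
        out[:, m, :] += lam * noise
    return out
```

Restricting the perturbation to particular within-year blocks, as also suggested in item 1, would only require applying the loop to the months of those blocks.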

Acknowledgments

The authors wish to thank the Indian Institute of Technology Madras, Chennai, India for the

continuous support and facilities offered to carry out this research work. The help rendered by

Prof. V.V. Srinivas, Indian Institute of Science, Bengaluru, India through sharing some of the

computer codes for verifying and validating the stochastic model is gratefully acknowledged.

The constructive suggestions and the insightful comments of the anonymous reviewers, the

associate editor and the Editor, Andreas Bardossy, were helpful in improving the quality of the

manuscript. The first author wishes to thank VIT University for providing the required

computational facilities through RGEMS.

References

Azadivar, F., Tompkins, G., 1999. Simulation optimization with qualitative variables and

structural model changes - a genetic algorithm approach. European Journal of Operational

Research 113, 169-182.

Back, T., Schwefel, H. P., 1993. An overview of evolutionary algorithms for parameter

optimization. Evolutionary Computation 1 (1), 1-24.

Carlstein, E., Do, K. A., Hall, P., Hesterberg, T., Kunsch, H., 1998. Matched-block bootstrap for

dependent data. Bernoulli 4, 305-328.


Carson, Y., Maria, A., 1997. Simulation optimization: methods and applications. Proceedings of

the 1997 Winter Simulation Conference, 118-126.

Chen, B. S., Lee, B. K., Peng, S. C., 2002. Maximum likelihood parameter estimation of

FARIMA processes using the genetic algorithm in the frequency domain. IEEE Transactions on

Signal Processing 50 (9), 2208-2220.

Chen, L., Singh, V. P., Guo, S. L., Zhou, J. Z., Zhang, J. H., 2015. Copula-based method for multisite monthly and daily streamflow simulation. Journal of Hydrology 528, 369-384.

Claps, P., Murrone, F., 1994. Optimal parameter estimation of conceptually-based streamflow models by time series aggregation. In: Stochastic and Statistical Methods in Hydrology and Environmental Engineering. Springer, Netherlands, pp. 421-434.

Cortez, P., Rocha, M., Neves, J., 2004. Evolving time series forecasting ARMA models. Journal

of Heuristics 10 (4), 415-429.

Davison, A. C., Hinkley, D. V., 1997. Bootstrap Methods and their Application. Cambridge University Press, Cambridge.

Deb, K., Pratap, A., Agrawal, S., Meyarivan, T., 2002. A fast and elitist multi-objective genetic

algorithm NSGA-II. IEEE Transactions on Evolutionary Computation 6 (2), 182-197.

Efstratiadis, A., Dialynas, Y. G., Kozanis, S., Koutsoyiannis, D., 2014. A multivariate stochastic model for the generation of synthetic time series at multiple time scales reproducing long-term persistence. Environmental Modelling & Software 62, 139-152.

Fiering, J. D., 1964. Multivariate technique for synthetic hydrology. Journal of Hydraulic

Engineering Division, ASCE 90, 43-60.


Fonseca, C. M., Fleming, P. J., 1995. An overview of evolutionary algorithms in multi-objective optimization. Evolutionary Computation 3 (1), 1-16.

Grantz, K., Rajagopalan, B., Clark, M., Zagona, E., 2006. A technique for incorporating large-

scale climate information in basin-scale ensemble streamflow forecasts. Water Resources

Research 41, W10410.

Grygier, J. C., Stedinger, J. R., 1988. Condensed disaggregation procedures and conservation

corrections for stochastic hydrology. Water Resources Research 24 (10), 1574-1584.

Haltiner, J. P., Salas, J. D., 1988. Development and testing of a multivariate seasonal ARMA

(1,1) model. Journal of Hydrology 104, 247-272.

Harms, A. A., Campbell, T. H., 1967. An extension to the Thomas-Fiering model for the

sequential generation of streamflow. Water Resources Research 3 (3), 653-661.

Hipel, K. W., McLeod, A. I., 1994. Time series modeling of water resources and environmental systems. Elsevier Science.

Ilich, N., 2014. An effective three-step algorithm for multi-site generation of weekly

stochastic hydrologic time series. Hydrol. Sci. J. 59 (1), 85-98.

Ilich, N., Despotovic, J., 2008. A simple method for effective multi-site generation of stochastic hydrologic time series. Stochastic Environmental Research and Risk Assessment 22 (2), 265-279.

Izzeldin, M., Murphy, A., 2000. Bootstrapping the small sample critical values of the rescaled range statistic. The Economic and Social Review 31 (4), 351-359.

Koutsoyiannis, D., 1999. A nonlinear disaggregation model with a reduced parameter set for simulation of hydrologic series. Water Resources Research 28 (12), 3175-3191.


Koutsoyiannis, D., 2000. A generalized mathematical framework for stochastic simulation and

forecast of hydrologic time series. Water Resources Research 36 (6), 1519-1533.

Lall, U., 1995. Recent advances in nonparametric function estimation. Reviews of Geophysics,

1093-1102.

Lall, U., Sharma, A., 1996. A nearest neighbour bootstrap for resampling hydrologic time series. Water Resources Research 32 (3), 679-693.

Lane, W. L., 1982. Corrected parameter estimates for disaggregation schemes. In: Singh, V. P. (Ed.), Statistical Analysis of Rainfall and Runoff. Water Resources Publications, Littleton, Colorado.

Langousis, A., Koutsoyiannis, D., 2006. A stochastic methodology for generation of seasonal time series reproducing over-year scaling behavior. Journal of Hydrology 322 (1-4), 138-154.

Lee, T., Salas, J. D., Prairie, J., 2010. An enhanced nonparametric streamflow disaggregation

model with genetic algorithm. Water Resources Research 46 (W08545),

doi:10.1029/2009WR007761.

Li, C. and Singh, V.P., 2014. A Multi-model Regression-sampling Algorithm for generating rich

streamflow scenarios, Water Resources Research, 50, 5958-5979.

Van Loon, A. F., Laaha, G., 2015. Hydrological drought severity explained by climate and catchment characteristics. Journal of Hydrology 526, 3-14.

Marković, Đ., Plavšić, J., Ilich, N., Ilić, S., 2015. Non-parametric stochastic generation of streamflow series at multiple locations. Water Resources Management 29, 4787. doi:10.1007/s11269-015-1090-z
Matalas, N. C., 1967. Mathematical assessment of synthetic hydrology. Water Resources Research 3 (4), 937-945.

Mejia, J. M., Rousselle, J., 1976. Disaggregation models in hydrology revisited. Water

Resources Research 13 (2), 679-693.

Minerva, T., Poli, I., 2001. Building ARMA models with genetic algorithms. Lecture Notes in Computer Science 2037, 335-342.

Moon, Y. I., Lall, U., 1994. Kernel function estimator for flood frequency analysis. Water

Resources Research 30 (11), 3095-3103.

Oliveira, G. C., Kelman, J., Pereira, M. V. F., Stedinger, J. R., 1988. A representation of spatial correlations in large stochastic seasonal streamflow models. Water Resources Research 24 (5), 781-785.

Ong, C. S., Huang, J. J., Tzeng, G. H., 2005. Model identification of ARIMA family using genetic algorithms. Applied Mathematics and Computation 164 (3), 885-912.

Peng, P., Chen, Q., 2003. Improved genetic algorithm and application to ARMA modeling. SICE

Annual Conference in Fukui August 4-6.

Pereira, M. V. F., Oliveira, G. C., Costa, C. C. G., Kelman, J., 1984. Stochastic streamflow models for hydroelectric systems. Water Resources Research 20 (3), 379-390.

Pierreval, H., Paris, J. L., 2000. Distributed evolutionary algorithms for simulation optimization.

IEEE Transaction on Systems, Man and Cybernetics 20 (11), 15-24.


Prairie, J., Rajagopalan, B., Lall, U., Fulp, T., 2007. A stochastic nonparametric technique for

space-time disaggregation of streamflows. Water Resources Research 43 (3).

Prairie, J., Rajagopalan, B., Fulp, T. J., Zagona, E. A., 2006. Modified k-NN model for stochastic streamflow simulation. Journal of Hydrologic Engineering 11 (4), 371-378.

Rajagopalan, B., Lall, U., 1999. A k-nearest-neighbour simulator for daily precipitation and other weather variables. Water Resources Research 35 (10), 3089-3101.

Rasmussen, P. F., Salas, J. D., Fagherazzi, L., Rassam, J.-C., Bobee, B., 1996. Estimation and validation of contemporaneous PARMA models for streamflow simulation. Water Resources Research 32 (10), 3151-3160.

Rolf, S., Sprave, J., Urfer, W., 1997. Model identification and parameter estimation of ARMA models by means of evolutionary algorithms. In: Proc. of IEEE/IAFE Conf. on Computational Intelligence for Financial Engineering (CIFEr'97), 237-243.

Salas, J. D., 1993. Analysis and modeling of hydrologic time series. In: Maidment, D. R. (Ed.), Handbook of Hydrology. McGraw Hill, New York.

Salas, J. D., Delleur, J. W., Yevjevich, V., Lane, W. (Eds.), 1980. Applied Modeling of

Hydrologic Time Series. Water Resources Publications, Littleton, CO, USA.

Salas, J. D., Tabios III, G. Q., Bartolini, P., 1985. Approaches to multivariate modeling of water resources time series. Water Resources Research 21 (4), 683-708.

Santos, E. G., Salas, J. D., 1992. Stepwise disaggregation scheme for synthetic hydrology.

Journal of Hydraulic Engineering 118 (5), 765-784.


Sharma, A., Tarboton, D. G., Lall, U., 1997. Streamflow simulation: A nonparametric approach.

Water Resources Research 33 (2), 291-308.

Silverman, B. W., 1986. Density Estimation for Statistics and Data Analysis. Monographs on Statistics and Applied Probability. Chapman and Hall, London.

Singhrattna, N., Rajagopalan, B., Clark, M., Kumar, K. K., 2005. Forecasting Thailand summer monsoon rainfall. International Journal of Climatology 25 (5), 649-664.

Srinivas, V. V., Srinivasan, K., 2000. Post-blackening approach for modeling dependent annual

streamflows. Journal of Hydrology 230 (1-2), 86-126.

Srinivas, V. V., Srinivasan, K., 2001. A hybrid stochastic model for multiseason streamflow simulation. Water Resources Research 37 (10), 2537-2549.

Srinivas, V. V., Srinivasan, K., 2005. Hybrid moving block bootstrap for stochastic simulation of multi-site multi-season streamflows. Journal of Hydrology 302 (1-4), 307-330.

Srinivas, V. V., Srinivasan, K., 2006. Hybrid matched-block bootstrap for stochastic simulation of multi-season streamflows. Journal of Hydrology 329 (1-2), 1-15.

Srivastav, R. K., Simonovic, S. P., 2014. An analytical procedure for multi-site, multi-season streamflow generation using maximum entropy bootstrapping. Environmental Modelling & Software 59, 59-75.

Srivastav, R.K., Srinivasan, K., Sudheer, K.P., 2011. Simulation-Optimization framework

for multi-season hybrid stochastic models, Journal of Hydrology 404 (3-4), 209-225.
Stedinger, J. R., Pei, D. (Eds.), 1982. An annual-monthly streamflow model for incorporating parameter uncertainty into reservoir simulation. Time Series Methods in Hydrosciences, Dev. Water Sci., Elsevier, New York.

Stedinger, J. R., Pei, D., Cohn, T. A., 1985. A condensed disaggregation model for incorporating

parameter uncertainty into monthly reservoir simulation. Water Resources Research 21 (5), 665-

675.

Stedinger, J. R., Vogel, R.M., 1984. Disaggregation procedures for generating serially correlated

flow vectors. Water Resources Research 20 (11), 47-56.

Sudheer, K. P., Srinivasan, K., Neelakantan, T. R., Srinivas, V. V., 2008. A non-linear data-driven model for synthetic generation of annual streamflows. Hydrological Processes 22 (12), 1831-1845.

Tarboton, D. G., Sharma, A., Lall, U., 1998. Disaggregation procedures for stochastic hydrology

based on nonparametric density estimation. Water Resources Research 34 (1), 107-119.

Tekin, E., Ihsan, S., 2004. Simulation optimization: A comprehensive review on theory and

applications. IIE Transactions 36 (11), 1067-1081.

Ünes, F., Demirci, M., Kişi, Ö., 2015. Prediction of Millers Ferry Dam reservoir level in USA using Artificial Neural Network. Periodica Polytechnica Civil Engineering 59 (3), 309-318.

Valencia, D., Schaake, J. C., 1973. Disaggregation processes in stochastic hydrology. Water Resources Research 9 (3), 580-585.

Vinod, H. D., 2006. Maximum entropy ensembles for time series inference in economics. Journal of Asian Economics 17 (6), 955-978.


Vogel, R. M., Shallcross, A. L., 1996. The moving blocks bootstrap versus parametric time series models. Water Resources Research 32 (6), 1875-1882.

Voss, M. S., Feng, X., 2002. ARMA model selection using particle swarm optimization and AIC criteria. 15th Triennial World Congress, Barcelona, Spain.

Yates, D., Gangopadhyay, S., Rajagopalan, B., Strzepek, K., 2003. A technique for generating

regional climate scenarios using a nearest neighbor bootstrap. Water Resources Research 39 (7),

1199.

Yevjevich, V. (Ed.), 1976. Stochastic Processes in Hydrology. Water Resources Publications,

Fort Collins, CO.

Zelenhasic, E., 2002. On the extreme streamflow drought analysis. Water Resources Management 16, 105-132.

Zelenhasic, E., Salvai, A., 1987. A method of streamflow drought analysis. Water Resources Research 23 (1), 156-168.


Figure 1: Simulation-Optimization Framework for Stochastic Modeling of
Multi-site Multi-season Streamflows
Figure 2: Location of Streamflow Stations - Colorado River Basin (source: Google Maps)

Figure 3: Pareto-front between R-Bias(MARS) and R-RMSE(MARS) (plot: R-Bias in MARS on the x-axis, R-RMSE in MARS on the y-axis; Pareto front with selected solutions A, B and C)


Figure 4: Reproduction of Summary Statistics - A Comparison between AMHMABB and SMHMABB Models (flow units: Mm³/month): a) at site 2; b) at site 4 (panels: monthly Mean, Standard Deviation and Skewness, Oct-Sep; series: Historical, AMHMABB, SMHMABB)
Figure 5: Preservation of Serial Correlations - A Comparison between AMHMABB and SMHMABB Models: a) at site 2; b) at site 4 (panels: monthly lag-1, lag-2, lag-3 and lag-4 correlations, Oct-Sep; series: Historical, AMHMABB, SMHMABB)

[Figure 6 plot area: Lag 0 correlation by month, Oct to Sep, for a) site 1 to site 3, b) site 1 to site 4, c) site 2 to site 4; series: Historical, SMHMABB, AMHMABB]

Figure 6: Preservation of Lag-Zero Site-to-Site Monthly Correlations - A Comparison between AMHMABB and SMHMABB Models: a) site 1 to site 3; b) site 1 to site 4; c) site 2 to site 4
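The lag-zero site-to-site correlations in Figure 6 are, for each month, the correlation between concurrent flows at two sites. A minimal sketch, assuming both sites are stored as years x 12 arrays (columns Oct to Sep) over the same water years; `lag0_cross_correlation` is a hypothetical helper and the data below are synthetic, not the Colorado River records.

```python
import numpy as np


def lag0_cross_correlation(site_a, site_b):
    """Month-wise lag-zero correlation between two sites (hypothetical helper).

    site_a, site_b: arrays of shape (n_years, 12) covering the same water years.
    Returns 12 Pearson correlations, one per month (Oct..Sep).
    """
    if site_a.shape != site_b.shape:
        raise ValueError("both sites need the same (n_years, 12) layout")
    return np.array([np.corrcoef(site_a[:, m], site_b[:, m])[0, 1]
                     for m in range(site_a.shape[1])])


# Synthetic example: site B is partly driven by site A, so correlations are positive
rng = np.random.default_rng(1)
site_a = rng.gamma(2.0, 500.0, size=(102, 12))
site_b = 0.8 * site_a + rng.gamma(2.0, 150.0, size=(102, 12))
print(np.round(lag0_cross_correlation(site_a, site_b), 3))
```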
[Figure 7 plot area: a) site 2 and b) site 4; panels of Above & Backward, Above & Forward, Below & Backward and Below & Forward correlations by month, Oct to Sep; series: Historical, AMHMABB, SMHMABB]

Figure 7: Preservation of State-Dependent Correlations - A Comparison between AMHMABB and SMHMABB Models: a) at site 2; b) at site 4
Figure 8: Preservation of the Marginal Distribution of July Streamflows at Site 1 - A Comparison between AMHMABB and SMHMABB Models (flow units: Mm3/month)
Figure 9: Preservation of the Marginal Distribution of December Streamflows at Site 1 - A Comparison between AMHMABB and SMHMABB Models (flow units: Mm3/month)
Figure 10: Preservation of the Marginal Distribution of August Streamflows at Site 2 - A Comparison between AMHMABB and SMHMABB Models (flow units: Mm3/month)
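Figures 8 to 10 compare historical and simulated marginal distributions for single months. One common way to draw such a comparison, shown here only as an assumption since the plotting convention is not stated in this excerpt, is to plot the sorted historical flows against Weibull plotting positions i/(n+1) and overlay a rank-wise percentile band built from the simulated replicates. The helper names, the 5 to 95 percent band and the synthetic gamma-distributed data are illustrative choices.

```python
import numpy as np


def weibull_plotting_positions(sample):
    """Sorted sample values and their Weibull plotting positions i/(n+1)."""
    x = np.sort(np.asarray(sample, dtype=float))
    p = np.arange(1, x.size + 1) / (x.size + 1)
    return x, p


def replicate_envelope(replicates, lower=5.0, upper=95.0):
    """Rank-wise percentile band across simulated replicates (hypothetical helper).

    replicates: array (n_replicates, n_values); each row is one simulated
    sample of the same length as the historical record for a given month.
    """
    ranked = np.sort(replicates, axis=1)  # order statistics of each replicate
    return (np.percentile(ranked, lower, axis=0),
            np.percentile(ranked, upper, axis=0))


# Synthetic example: 102 "historical" July flows and 300 simulated replicates
rng = np.random.default_rng(2)
hist = rng.gamma(2.0, 1000.0, size=102)
sims = rng.gamma(2.0, 1000.0, size=(300, 102))
x, p = weibull_plotting_positions(hist)
lo, hi = replicate_envelope(sims)
inside = np.mean((x >= lo) & (x <= hi))
print(f"fraction of historical order statistics inside the 5-95% band: {inside:.2f}")
```

Sorting each replicate before taking percentiles compares like with like: the i-th smallest simulated value is judged against the i-th smallest historical value.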
Figure 11: Preservation of a) minimum flows and b) maximum flows at site 4 -
Model: AMHMABB (flow units: Mm3/month)
Figure 12: Preservation of Deficit Run (Drought) Characteristics - A Comparison between
AMHMABB and SMHMABB Models
(MARL: maximum run length; MARS: maximum run sum; MERL: mean run length; MERS: mean run sum)
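The run statistics abbreviated in Figure 12 come from run analysis: a deficit run is a spell of consecutive periods with flow below a threshold, its length is the number of periods, and its sum is the accumulated shortfall below the threshold. The sketch below assumes a strict `<` comparison for a deficit, threshold minus flow as the deficit, and an absolute threshold supplied by the caller; how the paper sets the threshold is not defined in this excerpt, and the function names are hypothetical.

```python
def deficit_runs(flows, threshold):
    """Lengths and deficit sums of consecutive runs below `threshold`."""
    lengths, sums = [], []
    length, total = 0, 0.0
    for q in flows:
        if q < threshold:            # still in (or starting) a deficit run
            length += 1
            total += threshold - q
        elif length:                 # a run has just ended
            lengths.append(length)
            sums.append(total)
            length, total = 0, 0.0
    if length:                       # series ends inside a run
        lengths.append(length)
        sums.append(total)
    return lengths, sums


def run_statistics(flows, threshold):
    """Number of runs plus MARL, MARS, MERL and MERS as defined for Figure 12."""
    lengths, sums = deficit_runs(flows, threshold)
    if not lengths:
        return {"runs": 0, "MARL": 0, "MARS": 0.0, "MERL": 0.0, "MERS": 0.0}
    return {
        "runs": len(lengths),
        "MARL": max(lengths),
        "MARS": max(sums),
        "MERL": sum(lengths) / len(lengths),
        "MERS": sum(sums) / len(sums),
    }


print(run_statistics([5, 3, 2, 6, 1, 7], threshold=4))
# {'runs': 2, 'MARL': 2, 'MARS': 3.0, 'MERL': 1.5, 'MERS': 3.0}
```

Here the flows 3 and 2 form one run (length 2, deficit 1 + 2 = 3) and the flow 1 forms a second run (length 1, deficit 3).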
Comparison to Disaggregation Model

Figure 13: Reproduction of Summary Statistics for the Colorado River Basin at Site 4 - A Comparison between the AMHMABB Model and the Disaggregation Model (flow units: Mm3/month)
Figure 14: Preservation of Serial Correlations for the Colorado River Basin at Site 4 - A Comparison between the AMHMABB Model and the Disaggregation Model
Figure 15: Preservation of State-Dependent Correlations for the Colorado River Basin at Site 4 - A Comparison between the AMHMABB Model and the Disaggregation Model
Figure 16: Preservation of the Marginal Distribution of March and June Flows at Site 4 for the Colorado River Basin - A Comparison between the Disaggregation Model (MDM) and the AMHMABB Model (flow units: Mm3/month)
Figure 17: Preservation of Multi-Site Drought Characteristics for the Colorado River Basin: (a) Number of Runs; (b) Maximum Run Length; (c) Maximum Run Sum - A Comparison between the Disaggregation Model (MDM) and the AMHMABB Model
Split-Sample Validation

Figure 18: Reproduction of Summary Statistics for the Colorado River Basin (Calibration) at Site 4 - Model: AMHMABB
Figure 19: Reproduction of Summary Statistics for the Colorado River Basin (Validation) at Site 4 - Model: AMHMABB
Figure 20: Preservation of Serial Correlations for the Colorado River Basin (Calibration) at Site 4 - Model: AMHMABB
Figure 21: Preservation of Serial Correlations for the Colorado River Basin (Validation) at Site 4 - Model: AMHMABB
Figure 22: Preservation of the Marginal Distribution of February and April Flows for the Colorado River Basin (Calibration and Validation) at Site 4 - Model: AMHMABB
[Figure 23 plot area: Maximum Run Sum, Number of Runs and Maximum Run Length versus Threshold, for the calibration and validation periods; series: Historical, AMHMABB-Calibration, AMHMABB-Validation]

Figure 23: Preservation of Multi-Site Drought Characteristics (Maximum Run Sum, Number of Runs and Maximum Run Length) for the Colorado River Basin (Calibration and Validation) - Model: AMHMABB
Table 1: Location of the selected stations for the multi-site multi-season flow modeling

River (USGS Station Number)   Location of the Station   Record Duration   Referred as in this Study
Colorado River (9180500)      Near Cisco, Utah          1906-2007         Site 1
Green River (9315000)         Utah                      1906-2007         Site 2
San Juan River (9379500)      Bluff, Utah               1906-2007         Site 3
Colorado River (09380000)     Lees Ferry, Arizona       1906-2007         Site 4 (Key Site)
Table 2: Parameters for the selected multi-site HMABB models SMHMABB and AMHMABB - Colorado River Basin

Parametric component: PAR(1) model, monthly parameters listed Oct to Sep
Non-parametric component: MABB model, with window size and block size given for each hybrid model

Site    Model   Oct     Nov     Dec     Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep
SMHMABB (non-parametric component: MABB; window size 9; block size 4,4,4)
Site 1  PAR(1)  0.6247  0.7778  0.8548  0.7492  0.6710  0.5147  0.4539  0.5107  0.6178  0.8515  0.8421  0.6764
Site 2  PAR(1)  0.6515  0.8395  0.7528  0.5319  0.4871  0.3135  0.4968  0.6179  0.5903  0.7946  0.8152  0.7117
Site 3  PAR(1)  0.3054  0.7107  0.7641  0.5903  0.5503  0.5681  0.6455  0.7505  0.7589  0.8379  0.4850  0.5154
Site 4  PAR(1)  0.5104  0.7558  0.8249  0.6546  0.5492  0.4679  0.4507  0.5917  0.6252  0.8365  0.7860  0.6437
AMHMABB (non-parametric component: MABB; window size 5; block size 6,6)
Site 1  PAR(1)  0.9279  0.9500  0.9426  0.9500  0.9389  0.2865  0.6330  0.4708  0.3049  0.5556  0.0800  0.0505
Site 2  PAR(1)  0.9500  0.9500  0.9500  0.9316  0.9058  0.0395  0.7546  0.9389  0.1243  0.9095  0.4081  0.4671
Site 3  PAR(1)  0.1132  0.3823  0.2607  0.5851  0.1833  0.5224  0.4782  0.4671  0.0948  0.2127  0.1464  0.3897
Site 4  PAR(1)  0.9389  0.9463  0.9389  0.9389  0.9353  0.3270  0.7141  0.8357  0.9242  0.9095  0.7952  0.8468

Note: The monthly values are the first-order periodic (monthly) autoregressive parameters of the hybrid models.
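Table 2 lists the monthly PAR(1) coefficients of the parametric component of each hybrid model. In hybrid schemes of this kind, as we understand them, the parametric model removes the short-term dependence, the non-parametric bootstrap (here MABB) resamples the residuals, and the PAR(1) recursion is run forward again on the resampled residuals. The sketch below covers only the parametric half: it standardizes flows month by month (an assumption; any normalizing transform the authors apply is not shown in this excerpt), computes residuals e_t = z_t - phi_tau * z_(t-1), and rebuilds flows from a residual series. The MABB resampling step is not implemented, the convention e_0 = z_0 for the first value is an assumption, and the function names are hypothetical. Only the Site 4 SMHMABB coefficients come from Table 2; the flows are synthetic.

```python
import numpy as np


def par1_residuals(flows, phi):
    """Standardize monthly flows and strip the PAR(1) dependence (sketch).

    flows: (n_years, 12) array, columns Oct..Sep; phi: 12 monthly coefficients.
    Returns residuals e (chronological), plus the monthly means and std devs.
    """
    mu = flows.mean(axis=0)
    sd = flows.std(axis=0, ddof=1)
    z = ((flows - mu) / sd).reshape(-1)    # chronological standardized series
    month = np.arange(z.size) % 12         # month index (0 = Oct) of each value
    e = z.copy()                           # first value has no predecessor
    e[1:] = z[1:] - phi[month[1:]] * z[:-1]
    return e, mu, sd


def par1_regenerate(e, phi, mu, sd):
    """Rebuild flows from (possibly resampled) residuals via the PAR(1) recursion."""
    month = np.arange(e.size) % 12
    z = np.empty_like(e)
    z[0] = e[0]
    for t in range(1, e.size):
        z[t] = phi[month[t]] * z[t - 1] + e[t]
    return z.reshape(-1, 12) * sd + mu


# SMHMABB Site 4 coefficients from Table 2 (Oct..Sep), applied to synthetic flows
phi_site4 = np.array([0.5104, 0.7558, 0.8249, 0.6546, 0.5492, 0.4679,
                      0.4507, 0.5917, 0.6252, 0.8365, 0.7860, 0.6437])
rng = np.random.default_rng(3)
flows = rng.gamma(2.0, 800.0, size=(102, 12))
e, mu, sd = par1_residuals(flows, phi_site4)
print(np.allclose(par1_regenerate(e, phi_site4, mu, sd), flows))  # True
```

Feeding the unmodified residuals back reproduces the input exactly, which is a useful check; in a hybrid simulation the residual series passed to `par1_regenerate` would instead be a bootstrap resample.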
Table 3: Reproduction of the Aggregated Annual Flow Statistics - Comparison between SMHMABB and AMHMABB models (values in parentheses denote the standard deviation over 300 replicates)

                Mean              Standard Deviation   Skewness      Lag 1 Correlation
Site 1
  Historical    8378.9            2413.8               0.25          0.292
  SMHMABB       8377.3 (239.2)    2266.4 (148.4)       0.26 (0.2)    0.029 (0.098)
  AMHMABB       8382.7 (293.1)    2405.4 (179.6)       0.32 (0.22)   0.229 (0.098)
Site 2
  Historical    6642.9            2010.7               0.35          0.291
  SMHMABB       6645.6 (199.8)    1877.2 (125.7)       0.27 (0.19)   0.037 (0.099)
  AMHMABB       6620.3 (234.0)    1916.6 (138.9)       0.23 (0.17)   0.242 (0.095)
Site 3
  Historical    2638.2            1079.4               0.36          0.113
  SMHMABB       2632.4 (99.4)     936.8 (65.3)         0.38 (0.19)   0.007 (0.097)
  AMHMABB       2632.2 (115.8)    977.2 (59.8)         0.31 (0.19)   0.130 (0.102)
Site 4
  Historical    18504.9           5345.6               0.16          0.275
  SMHMABB       18500 (523.2)     4985.4 (314.4)       0.18 (0.18)   0.027 (0.098)
  AMHMABB       18498.1 (703.2)   5579.6 (375.1)       0.16 (0.19)   0.251 (0.095)
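Table 3 summarizes the water-year totals obtained by summing the 12 monthly flows. The sketch below computes the four statistics, assuming the bias-adjusted sample skewness and the usual lag-1 autocorrelation estimator (the paper's exact estimators are not given in this excerpt). The hypothetical `replicate_summary` shows how a replicate mean and its standard deviation over simulated replicates (the values in parentheses) could be obtained; the replicate data are synthetic.

```python
import numpy as np


def annual_statistics(flows):
    """Mean, std dev, skewness and lag-1 correlation of water-year totals.

    flows: (n_years, 12) monthly array, columns Oct..Sep.
    """
    annual = flows.sum(axis=1)                 # aggregate months to water-year totals
    n = annual.size
    mean = annual.mean()
    sd = annual.std(ddof=1)
    dev = annual - mean
    skew = n * np.sum(dev**3) / ((n - 1) * (n - 2) * sd**3)  # bias-adjusted skewness
    lag1 = np.sum(dev[:-1] * dev[1:]) / np.sum(dev**2)        # lag-1 autocorrelation
    return mean, sd, skew, lag1


def replicate_summary(replicate_flows):
    """Mean and standard deviation of each statistic over simulated replicates."""
    stats = np.array([annual_statistics(f) for f in replicate_flows])
    return stats.mean(axis=0), stats.std(axis=0, ddof=1)


# 300 synthetic replicates of 102 water years each
rng = np.random.default_rng(4)
reps = rng.gamma(2.0, 700.0, size=(300, 102, 12))
means, spreads = replicate_summary(reps)
print(np.round(means, 3), np.round(spreads, 3))
```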

Highlight Points:

- Simulation-optimization model for hybrid multi-site multi-season streamflow modeling
- Extended the single-site hybrid matched block bootstrap model to the multi-site multi-season case
- Better performance due to deficit-sum-based objective functions and an inter-annual dependence constraint
- Efficacy of the proposed model illustrated by a case example of the Colorado River Basin
