*Department of Civil Engineering and Applied Mechanics, McGill University, 817 Sherbrooke
Street West, Montreal, Quebec, Canada H3A 2K6; PH 514-398-6870; van.tv.nguyen@mcgill.ca
Downloaded from ascelibrary.org by GADJAH MADA UNIVERSITY on 03/08/14. Copyright ASCE. For personal use only; all rights reserved.
**Environment Canada, 100 Alexis-Nihon Boulevard, 3rd Floor, Saint-Laurent, Quebec, Canada
H4M 2N8; PH 514-283-3052; alain.bourque@ec.gc.ca
Abstract
This paper presents an assessment procedure for systematically evaluating the performance of
various probability models in order to identify the most suitable distribution that could provide
accurate extreme rainfall estimates. More specifically, nine popular probability distributions,
Beta-K, Beta-P, Generalized Extreme Value (GEV), Generalized Normal (GNO), Generalized
Pareto, Gumbel, Log-Pearson Type III, Pearson Type III, and Wakeby, were examined and
compared for their descriptive and predictive abilities in the estimation of annual maximum
precipitations. The suggested procedure was applied to 5-minute and 1-hour annual maximum
precipitation data from a network of 20 raingages located in the southern Quebec region in
Canada. The methods of maximum likelihood and L-moments were used to estimate the
parameters of these distributions. Results based on numerical and graphical goodness-of-fit
criteria have indicated that the Wakeby, GEV, and GNO models were the best models for
describing the distribution of annual maximum precipitations in the southern Quebec region in
Canada. Furthermore, it was found that no single distribution ranked best at every station for
both rainfall durations. In addition, a bootstrap method was used to evaluate each model's ability
to predict extreme right-tail behaviour. The results were similar to those of the goodness-of-fit
tests. The GEV distribution, however, was preferred to the Wakeby and GNO because it
requires a simpler parameter estimation method and rests on a more solid theoretical
basis for representing the distribution of extreme random variables. Therefore, among the nine
candidate distributions considered, the GEV could be recommended as the most suitable model
for describing the distribution of annual maximum precipitations in southern Quebec.
Introduction
Rainfall frequency analysis studies are necessary for the development of a “design storm”; that
is, a rainfall temporal pattern used in the design of a hydraulic structure. The objective of rainfall
frequency analyses is to estimate the amount of rainfall falling at a given point or over a given
area for a specified duration and return period. The rainfall data used for frequency analysis are
typically available in the form of annual maximum series (AMS). These series contain the largest
rainfall in each complete year of record. An alternative data format for rainfall frequency studies
is “partial duration series” (PDS) (also referred to as peaks over threshold data) which consist of
all large precipitation amounts above certain thresholds selected for different durations.
Arguments in favor of either of these techniques are well described in the literature. Due to its
an appropriate model depends mainly on the characteristics of available rainfall data at the
particular site. Therefore, in the present study, a general assessment procedure is suggested to
compare the performance of different probability distributions in order to identify the best model
that could provide the most accurate extreme rainfall estimates. More specifically, nine popular
probability distributions, Beta-K (BEK), Beta-P (BEP), Generalized Extreme Value (GEV),
Generalized Normal (GNO), Generalized Pareto (GPA), Gumbel (GUM), Log-Pearson Type III
(LP3), Pearson Type III (P3), and Wakeby (WAK), are examined and compared in terms of their
descriptive and predictive abilities to represent the distribution of annual maximum rainfalls.
Graphical and numerical comparisons were used to judge the performance of the probability
distributions according to their degree of overall fit to the data, their degree of fit on the right-
tail, the accuracy of their right-tail extrapolations, and their overall computational facility.
Results of these comparisons based on annual maximum rainfall data for 5-minute and 1-hour
durations available in the southern Quebec region in Canada have indicated that the Wakeby,
GNO and GEV are the best overall choices for representing extreme rainfall distributions. The
GEV, however, could be recommended as the most suitable distribution due to its relatively
simple parameter estimation and its sound theoretical basis for representing distribution of
extreme random variables.
Methodology
distributions.
Graphical display
Graphical techniques such as quantile-quantile (Q-Q) plots can be used to assess visually the
goodness of fit of a fitted distribution. To construct the Q-Q plot, the observed data $x_i$ are ranked
in ascending order and denoted $x_{1:n}$ to $x_{n:n}$, where $n$ is the total number of observations. In
addition, an empirical non-exceedance probability $p_{i:n}$ is computed for each $x_{i:n}$ using the
Cunnane (1978) plotting position formula, which yields approximately unbiased quantiles for a
wide range of distributions:
$$p_{i:n} = \frac{i - 0.4}{n + 0.2} \qquad (1)$$
in which $i$ is the rank associated with each observation. Furthermore, each $x_{i:n}$ is paired with
$y_{i:n}$, which is computed from the assumed theoretical cumulative distribution function (CDF),
$F(x)$, that is, $y_{i:n} = F^{-1}(p_{i:n})$. Finally, the set of quantile pairs $(x_{i:n}, y_{i:n})$ is plotted on a graph
together with the 1:1 straight line extending from the origin. If the assumed CDF is the true
distribution, all points should fall on this 1:1 straight line.
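The construction of the Q-Q pairs can be sketched in a few lines of Python. This is only an illustration: it assumes the Cunnane plotting position in its commonly quoted form (i − 0.4)/(n + 0.2), uses the Gumbel distribution because its inverse CDF has a closed form, and the sample values and Gumbel parameters are hypothetical rather than data from the study.

```python
import math

def cunnane_positions(n):
    """Empirical non-exceedance probabilities p_{i:n} = (i - 0.4) / (n + 0.2)."""
    return [(i - 0.4) / (n + 0.2) for i in range(1, n + 1)]

def gumbel_quantile(p, location, scale):
    """Inverse CDF of the Gumbel distribution, y = F^{-1}(p)."""
    return location - scale * math.log(-math.log(p))

def qq_pairs(observations, quantile_function):
    """Return the (x_{i:n}, y_{i:n}) pairs that make up a Q-Q plot."""
    ranked = sorted(observations)                 # x_{1:n} <= ... <= x_{n:n}
    probabilities = cunnane_positions(len(ranked))
    return [(x, quantile_function(p)) for x, p in zip(ranked, probabilities)]

# Hypothetical annual maxima (mm) and hypothetical Gumbel parameters.
sample = [18.3, 25.1, 21.7, 30.4, 16.9, 27.8, 22.5, 35.2, 19.6, 24.0]
pairs = qq_pairs(sample, lambda p: gumbel_quantile(p, location=21.0, scale=5.0))

for observed, fitted in pairs:
    print(f"observed {observed:5.1f} mm   fitted {fitted:5.1f} mm")
```

Plotting the pairs against the 1:1 line then gives the visual check described above; points drifting away from the line at the upper end indicate a poor right-tail fit.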
Numerical criteria
The Q-Q plots are useful for a visual assessment of the goodness of fit of a distribution to the data.
However, this assessment is subjective. Hence, numerical goodness-of-fit criteria are necessary
to provide an objective comparison of the adequacy of various distributions. In the present study,
the following four common criteria are selected:
The root mean square error (RMSE) is the square root of the sum of squared differences between
observed and computed values, divided by the number of degrees of freedom $n - m$:
$$\mathrm{RMSE} = \left[ \frac{\sum_{i=1}^{n} (x_i - y_i)^2}{n - m} \right]^{1/2} \qquad (2)$$
where $x_i$, $i = 1, \ldots, n$, are the observed values, $y_i$, $i = 1, \ldots, n$, are the values computed from
an assumed probability distribution for the same probability level, and $m$ is the number of
distribution parameters.
The relative root mean square error (RRMSE) is defined as:
The magnitude of RRMSE tends to decrease when the sample size increases (Yu et al., 1994).
The RRMSE and RMSE criteria are different in that the latter gives heavy weighting to large
values:
The correlation coefficient (CC) measures the linearity of the probability plot. It has a
range between −1 and +1. Values near ±1 suggest that the observations could have been drawn
from the fitted distribution. A positive sign indicates an upward slope and a negative sign
indicates a downward sloping curve. The CC is defined mathematically as:
$$\mathrm{CC} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\left[ \sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2 \right]^{1/2}} \qquad (5)$$
where $\bar{x}$ and $\bar{y}$ denote the average values of the observations and fitted quantiles, respectively.
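The four numerical criteria can be computed together as in the sketch below. The RMSE and CC follow Equations (2) and (5). The forms used for the RRMSE (each error divided by the corresponding observation) and for the maximum absolute error (largest absolute difference) are assumptions based on the usual definitions of these quantities, since their equations are not reproduced in this text. The function name and the input values are hypothetical.

```python
import math

def goodness_of_fit(observed, fitted, n_parameters):
    """Return RMSE, RRMSE, MAE and CC for paired observed and fitted quantiles."""
    n = len(observed)
    dof = n - n_parameters                       # n - m in Equation (2)
    errors = [x - y for x, y in zip(observed, fitted)]

    rmse = math.sqrt(sum(e * e for e in errors) / dof)
    # Assumed form: errors scaled by the size of each observation.
    rrmse = math.sqrt(sum((e / x) ** 2 for e, x in zip(errors, observed)) / dof)
    # Assumed form: largest absolute difference.
    mae = max(abs(e) for e in errors)

    mean_x = sum(observed) / n
    mean_y = sum(fitted) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, fitted))
    sxx = sum((x - mean_x) ** 2 for x in observed)
    syy = sum((y - mean_y) ** 2 for y in fitted)
    cc = sxy / math.sqrt(sxx * syy)

    return {"RMSE": rmse, "RRMSE": rrmse, "MAE": mae, "CC": cc}

# Hypothetical ranked observations and fitted quantiles (mm), 2-parameter model.
criteria = goodness_of_fit([10.0, 20.0, 30.0, 40.0], [11.0, 19.0, 31.0, 39.0], 2)
for name, value in criteria.items():
    print(f"{name:5s} = {value:.4f}")
```

Smaller RMSE, RRMSE and MAE values and a CC closer to 1 indicate a better fit, so the first three criteria are ranked in ascending order and the CC in descending order.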
In the present study, to test the proposed assessment procedure, annual maximum rainfall series
for 5-minute and 1-hour durations from a network of 20 raingages located in the southern
Quebec region in Canada were chosen. The selection of these data series was based on the
quality of the data, the adequate length of available historical rainfall records, and the
representative spatial distribution of raingages. More specifically, to ensure the quality of data,
only data from recording raingages under the supervision of the Atmospheric Environment
Service of Environment Canada were used. In addition, each selected station has more than 24
years of historical records to provide reliable estimates of rainfall quantiles for practical
application purposes (National Research Council of Canada, 1989). Furthermore, these raingages
were selected to represent different climatic conditions in the southern Quebec region. Finally,
under-estimated, or well estimated by any of the distributions. However, it was found that the
Wakeby was superior to other distributions at fitting all regions of the data sample due to its
flexibility with 5 parameters. For purposes of illustration, Figure 1 shows the graphical
comparison of the goodness of fit of all nine distributions for 1-hour extreme rainfalls at McGill
station using the Q-Q plots. From the visual inspection, all distributions seem to perform well in
this case. In particular, the Wakeby shows the best fit for all regions of the data set. Nevertheless,
it is difficult to judge the significance of the differences between distributions based on the
graphical display since these differences are quite small. A more objective assessment using
numerical comparison criteria is hence necessary.
[Figure 1: Q-Q plots for the nine distributions fitted to 1-hour annual maximum precipitation at McGill station. Nine panels; horizontal axis: Observed Precipitation (mm); vertical axis: Fitted Precipitation (mm).]
ranking result for all nine distributions for 5-minute and 1-hour rainfall durations. Tie cases are
indicated by parentheses.
Examination of the goodness-of-fit results using numerical comparisons for each of the
20 raingages reveals that no single distribution ranked consistently best at all locations and for
both rainfall durations (Tao, 2001). However, the overall rank for the 20 stations combined
shows that the Wakeby model with 5 parameters was the best for describing the distribution of
annual maximum rainfalls for both 5-minute and 1-hour durations (Table 2). The GEV and GNO
distributions also performed quite well overall. The LP3, Gumbel and GPA distributions ranked
consistently poorly as compared to the others.
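One simple way to combine station-by-station results into an overall rank, with ties kept visible, is sketched below. The scheme is an assumption made for illustration (rank each distribution per station on one criterion, sum the ranks over stations, then rank the sums); the exact ranking rules behind Table 2 are not described in this text. Station names and RMSE values are hypothetical.

```python
def rank_with_ties(scores):
    """Rank names by ascending score; equal scores share the same rank."""
    ranks = {}
    for name, value in scores.items():
        strictly_better = sum(1 for other in scores.values() if other < value)
        ranks[name] = strictly_better + 1
    return ranks

def overall_ranking(station_scores):
    """Sum per-station ranks for each distribution, then rank the sums."""
    totals = {}
    for scores in station_scores.values():
        for name, rank in rank_with_ties(scores).items():
            totals[name] = totals.get(name, 0) + rank
    return rank_with_ties(totals)

# Hypothetical RMSE values (mm) at two hypothetical stations.
rmse_by_station = {
    "Station A": {"GEV": 1.2, "GNO": 1.2, "GUM": 2.5},
    "Station B": {"GEV": 0.9, "GNO": 1.1, "GUM": 1.0},
}

print(rank_with_ties(rmse_by_station["Station A"]))   # GEV and GNO tie at this station
print(overall_ranking(rmse_by_station))
```

For a criterion where larger is better, such as the correlation coefficient, the scores can be negated before ranking.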
After assessing the overall descriptive ability of each distribution, the next step is focused
on its predictive capability since this property is important for engineering design applications. In
this study, one thousand bootstrap samples of size approximately equal to half of the actual
sample size were generated. Each candidate distribution was fitted to the bootstrap samples and
was extrapolated to estimate the right-tail quantiles corresponding to the four largest observed
precipitation amounts in the full data set. The variability in the estimation of these extrapolated
quantiles was presented in the form of modified box plots. The size of the box indicates the
robustness of each distribution's extrapolative ability. Wide boxes or long whiskers imply high
uncertainty in the estimation of these extreme values. If the observed values fall outside the box,
then the distribution fitted to the bootstrap samples has overestimated or underestimated the true
values and is therefore not recommended.
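The bootstrap steps described above can be sketched as follows. Several choices here are assumptions made for illustration only: the Gumbel distribution fitted by L-moments stands in for any of the nine candidates, the tail probabilities are taken from the Cunnane plotting positions (i − 0.4)/(n + 0.2) of the four largest observations in the full sample, and the box is taken as the interquartile range. The sample values are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329

def fit_gumbel_lmoments(values):
    """Gumbel location and scale from the first two sample L-moments."""
    x = sorted(values)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    lam1, lam2 = b0, 2.0 * b1 - b0
    scale = lam2 / math.log(2.0)
    return lam1 - EULER_GAMMA * scale, scale

def gumbel_quantile(p, location, scale):
    """Inverse CDF of the Gumbel distribution."""
    return location - scale * math.log(-math.log(p))

def bootstrap_tail(values, n_boot=1000, n_tail=4, seed=0):
    """Refit on half-size resamples and extrapolate to the largest observations."""
    rng = random.Random(seed)
    ranked = sorted(values)
    n = len(ranked)
    # Plotting positions of the n_tail largest observations in the full sample.
    tail_probs = [(i - 0.4) / (n + 0.2) for i in range(n - n_tail + 1, n + 1)]
    half = max(2, n // 2)
    estimates = [[] for _ in tail_probs]
    for _ in range(n_boot):
        resample = rng.choices(ranked, k=half)    # sampling with replacement
        location, scale = fit_gumbel_lmoments(resample)
        for k, p in enumerate(tail_probs):
            estimates[k].append(gumbel_quantile(p, location, scale))
    return ranked[-n_tail:], estimates

# Hypothetical 5-minute annual maxima (mm) for one station.
sample = [6.1, 7.4, 8.0, 8.3, 8.9, 9.2, 9.5, 9.9, 10.2, 10.4, 10.9, 11.3, 11.6,
          12.0, 12.4, 12.9, 13.3, 13.8, 14.5, 15.1, 15.9, 16.8, 17.6, 18.9, 20.3, 23.7]

observed_tail, estimates = bootstrap_tail(sample)
for observed, values in zip(observed_tail, estimates):
    q1, median, q3 = statistics.quantiles(values, n=4)
    inside = q1 <= observed <= q3
    print(f"observed {observed:5.1f}  box [{q1:5.1f}, {q3:5.1f}]  median {median:5.1f}  inside box: {inside}")
```

A narrow box that contains the observed value corresponds to the robust and unbiased extrapolation sought in the text; a narrow box that misses it reflects low sampling variation combined with bias.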
Overall, the BEK, BEP and LP3 distributions consistently gave the worst performance,
with large sampling variation and bias for both rainfall durations (Tao, 2001). In addition, while
the GUM distribution exhibited the lowest sampling variation in most cases, it tended to
overestimate or underestimate the observed values most frequently. The GEV, GNO, GPA, P3,
and WAK distributions produced satisfactory results at most stations where the box enclosed the
Figure 2: Box plots of extrapolated right-tail bootstrap data for 5-minute annual maximum
precipitations at McGill station.
In general, it was observed that no single distribution performed the best at all stations
(Tao, 2001). This could be expected because of the spatial variation of precipitation
characteristics, possibly due to the different climate conditions within the southern Quebec
region (Proulx et al., 1987). Although it is difficult to give a clear physical interpretation of the
regional variation of the estimated distribution parameters in terms of the climate variability in the
study region, the proposed assessment procedure identified the WAK, GEV and GNO models as
the top-ranked distributions at a large number of stations for both 5-minute and 1-hour rainfall
durations. It is easier, however, to recognize the distributions that perform poorly than to identify
the single best one. Other criteria should thus be considered in the choice of an appropriate
distribution for a given area. For instance, in terms of computational simplicity, the GEV
parameters are the simplest to estimate among the three best models identified above. In addition,
the GEV model rests on a more solid theoretical foundation than the GNO and WAK distributions
because it was derived from the statistical theory of extreme random variables. Therefore, the
GEV could be considered the most suitable distribution for describing annual maximum rainfalls
in southern Quebec.
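To illustrate why GEV parameter estimation is regarded as comparatively simple, the sketch below fits the GEV by the method of L-moments using the closed-form approximation for the shape parameter published by Hosking and co-workers. It is not the authors' implementation. The parameterization is Hosking's, in which a negative shape parameter corresponds to a heavy upper tail; the function names and sample values are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649015329

def sample_l_moments(values):
    """Return (lambda1, lambda2, tau3) from unbiased probability-weighted moments."""
    x = sorted(values)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(1, n)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * x[i] for i in range(2, n)) / (n * (n - 1) * (n - 2))
    lam1 = b0
    lam2 = 2.0 * b1 - b0
    lam3 = 6.0 * b2 - 6.0 * b1 + b0
    return lam1, lam2, lam3 / lam2

def fit_gev_lmoments(values):
    """Return GEV (location, scale, shape) in Hosking's parameterization."""
    lam1, lam2, tau3 = sample_l_moments(values)
    c = 2.0 / (3.0 + tau3) - math.log(2.0) / math.log(3.0)
    kappa = 7.8590 * c + 2.9554 * c * c          # approximation for the shape
    if abs(kappa) < 1e-6:                        # Gumbel limit of the GEV
        alpha = lam2 / math.log(2.0)
        return lam1 - EULER_GAMMA * alpha, alpha, 0.0
    gam = math.gamma(1.0 + kappa)
    alpha = lam2 * kappa / ((1.0 - 2.0 ** (-kappa)) * gam)
    xi = lam1 - alpha * (1.0 - gam) / kappa
    return xi, alpha, kappa

def gev_quantile(p, xi, alpha, kappa):
    """Inverse CDF of the GEV for non-exceedance probability p."""
    if kappa == 0.0:
        return xi - alpha * math.log(-math.log(p))
    return xi + alpha / kappa * (1.0 - (-math.log(p)) ** kappa)

# Hypothetical 1-hour annual maxima (mm).
sample = [18.3, 25.1, 21.7, 30.4, 16.9, 27.8, 22.5, 35.2, 19.6, 24.0, 41.8, 20.2]
xi, alpha, kappa = fit_gev_lmoments(sample)
print(f"location = {xi:.2f}, scale = {alpha:.2f}, shape = {kappa:.3f}")
print(f"100-year estimate = {gev_quantile(1.0 - 1.0 / 100.0, xi, alpha, kappa):.1f} mm")
```

The whole fit reduces to three sample L-moments and a handful of closed-form expressions, with no iterative optimization, which is the kind of computational simplicity the comparison above refers to.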
The following general conclusions can be drawn from the present study:
a) A general procedure was proposed for evaluating systematically the performance of various
probability distributions in order to find the most suitable model for representing the
distribution of annual extreme rainfalls. The suggested procedure was based on a number
of graphical and numerical criteria to assess the descriptive and predictive abilities of each
distribution. More specifically, for evaluation of the goodness of fit of a model to the data,
one can rely on the visual inspection of the quantile-quantile plots as well as on the results
of four numerical assessment criteria, including the root mean square error, the relative root
mean square error, the maximum absolute error, and the correlation coefficient. For model
prediction capability, the assessment was supported by the bootstrap sampling
computations and the visual display of the results using the box plots.
b) Following a review of various probability distributions available in the literature, nine
popular models were selected for evaluation in this study. These models include the BEK,
BEP, GEV, GNO, GPA, GUM, LP3, P3, and WAK distributions. Based on the annual
extreme precipitation series available in southern Quebec, the proposed assessment
procedure has been successfully used to identify the best probability distributions that
could provide accurate maximum precipitation estimates. In particular, it was found that,
among the nine popular distributions considered, the WAK, GEV, and GNO provided the
best performance for different rainfall durations and for a number of locations in the region.
However, for practical application purposes, the GEV was preferable to the GNO and
WAK due to its more solid theoretical basis and relatively simpler parameter estimation
method. Therefore, the GEV could be considered as the most suitable model for
representing the distribution of annual maximum precipitations for southern Quebec.
References
Chow, V.T. (1964). Handbook of Applied Hydrology, McGraw-Hill, New York, NY, USA.
Cunnane, C. (1978). Unbiased Plotting Positions-A Review, Journal of Hydrology, 37: 205-222.
Efron, B. (1993). An Introduction to the Bootstrap, Chapman and Hall, New York, NY, USA.
Hosking, J.R.M. (1990). L-Moments: Analysis and Estimation of Distributions Using Linear
Combinations of Order Statistics, Journal of the Royal Statistical Society, Series B, 52: 105-124.
Hosking, J.R.M. and Wallis, J.R. (1997). Regional Frequency Analysis: An Approach Based on
L-Moments, Cambridge University Press, Cambridge, United Kingdom.
Kite, G.W. (1977). Frequency and Risk Analyses in Hydrology, Water Resources Publications,
Fort Collins, Colorado, USA.
Mielke, P.W. Jr. and Johnson, E.S. (1974). Some Generalized Beta Distributions of the Second
Kind Having Desirable Application Features in Hydrology and Meteorology, Water
Resources Research, 10: 223-226.
National Research Council of Canada (1989). Hydrology of floods in Canada: a guide to
planning and design, Ottawa, 245 pp.
18.66.
Tao, D.Q. (2001). Statistical Modelling of Extreme Precipitations for Southern Quebec, Master
of Engineering Project Report, Department of Civil Engineering and Applied Mechanics,
McGill University, Montreal, Quebec, Canada.
Vogel, R.M. (1995). Recent Advances and Themes in Hydrology, Reviews of Geophysics,
Supplement B, 33: 933-936.
Wilks, D.S. (1993). Comparison of Three-Parameter Probability Distributions for Representing
Annual Extreme and Partial Duration Precipitation Series, Water Resources Research, 29:
3543-3549.
Yu, F.X., Naghavi, B., Singh, V.P., and Wang, G.T. (1994). MMO: An Improved Estimator for
Log-Pearson Type III Distribution, Stochastic Hydrology and Hydraulics, 8: 219-231.