
Environmental Modelling & Software 22 (2007) 464–475

www.elsevier.com/locate/envsoft

Assessment of surface water quality using multivariate statistical techniques: A case study of the Fuji river basin, Japan
S. Shrestha*, F. Kazama
Department of Ecosocial System Engineering, Interdisciplinary Graduate School of Medicine and Engineering,
University of Yamanashi, 4-3-11, Takeda, Kofu, Yamanashi 400-8511, Japan
Received 4 August 2005; received in revised form 20 December 2005; accepted 1 February 2006
Available online 22 March 2006

Abstract
Multivariate statistical techniques, such as cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant
analysis (DA), were applied for the evaluation of temporal/spatial variations and the interpretation of a large complex water quality data set of
the Fuji river basin, generated during 8 years (1995–2002) of monitoring of 12 parameters at 13 different sites (14,976 observations). Hierarchical
cluster analysis grouped 13 sampling sites into three clusters, i.e., relatively less polluted (LP), medium polluted (MP) and highly polluted (HP)
sites, based on the similarity of water quality characteristics. Factor analysis/principal component analysis, applied to the data sets of the three
different groups obtained from cluster analysis, resulted in five, five and three latent factors explaining 73.18, 77.61 and 65.39% of the total
variance in water quality data sets of LP, MP and HP areas, respectively. The varifactors obtained from factor analysis indicate that the parameters responsible for water quality variations are mainly related to discharge and temperature (natural), organic pollution (point source: domestic
wastewater) in relatively less polluted areas; organic pollution (point source: domestic wastewater) and nutrients (non-point sources: agriculture
and orchard plantations) in medium polluted areas; and organic pollution and nutrients (point sources: domestic wastewater, wastewater treatment plants and industries) in highly polluted areas in the basin. Discriminant analysis gave the best results for both spatial and temporal analysis. It provided an important data reduction as it uses only six parameters (discharge, temperature, dissolved oxygen, biochemical oxygen
demand, electrical conductivity and nitrate nitrogen), affording more than 85% correct assignations in temporal analysis, and seven parameters
(discharge, temperature, biochemical oxygen demand, pH, electrical conductivity, nitrate nitrogen and ammoniacal nitrogen), affording more than
81% correct assignations in spatial analysis, of three different sampling sites of the basin. Therefore, DA allowed a reduction in the dimensionality of the large data set, delineating a few indicator parameters responsible for large variations in water quality. Thus, this study illustrates the
usefulness of multivariate statistical techniques for analysis and interpretation of complex data sets, and in water quality assessment, identification of pollution sources/factors and understanding temporal/spatial variations in water quality for effective river water quality management.
© 2006 Elsevier Ltd. All rights reserved.
Keywords: Fuji river basin; Water quality; Cluster analysis; Principal component analysis; Factor analysis; Discriminant analysis

1. Introduction
A river is a system comprising both the main course and the tributaries, carrying the one-way flow of a significant load of matter in dissolved and particulate phases from both natural and anthropogenic sources. The quality of a river at any point reflects several major influences, including the lithology of the basin, atmospheric inputs, climatic conditions and anthropogenic inputs (Bricker and Jones, 1995). On the other hand, rivers play a major role in assimilating or transporting municipal and industrial wastewater and runoff from agricultural land. Municipal and industrial wastewater discharge constitutes a constant polluting source, whereas surface runoff is a seasonal phenomenon, largely affected by climate within the basin (Singh et al., 2004). Seasonal variations in precipitation, surface runoff, interflow, groundwater flow and pumped in and outflows have a strong effect on river discharge and, subsequently, on the concentration of pollutants in river water (Vega et al., 1998). Therefore, the effective, long-term management of rivers requires a fundamental understanding of hydro-morphological, chemical and biological characteristics. However, due to spatial and temporal variations in water quality (which are often difficult to interpret), a monitoring program, providing a representative and reliable estimation of the quality of surface waters, is necessary (Dixon and Chiswell, 1996).
The application of different multivariate statistical techniques, such as cluster analysis (CA), principal component analysis (PCA), factor analysis (FA) and discriminant analysis (DA), helps in the interpretation of complex data matrices to better understand the water quality and ecological status of the studied systems, allows the identification of possible factors/sources that influence water systems and offers a valuable tool for reliable management of water resources as well as rapid solution to pollution problems (Vega et al., 1998; Lee et al., 2001; Adams et al., 2001; Wunderlin et al., 2001; Reghunath et al., 2002; Simeonova et al., 2003; Simeonov et al., 2004). Multivariate statistical techniques have been applied to characterize and evaluate surface and freshwater quality, and they are useful in verifying temporal and spatial variations caused by natural and anthropogenic factors linked to seasonality (Helena et al., 2000; Singh et al., 2004, 2005).
In the present study, a large data matrix, obtained during an 8-year (1995–2002) monitoring program, is subjected to different multivariate statistical techniques to extract information about the similarities or dissimilarities between sampling sites, identification of water quality variables responsible for spatial and temporal variations in river water quality, the hidden factors explaining the structure of the database, and the influence of possible sources (natural and anthropogenic) on the water quality parameters of the Fuji river basin.

* Corresponding author. Tel./fax: +81 55 220 8193.
E-mail address: sangam@ccn.yamanashi.ac.jp (S. Shrestha).
1364-8152/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.envsoft.2006.02.001

2. Methods
2.1. Study area
The Fuji river basin study area, drained by the Fuji River, is located in the central part of Japan (Fig. 1). The basin area is 3570 km² and the main stream length is 128 km. The Fuji River is located to the west of Mount Fuji, drawing a curve along the mountain, and is one of three prominent rapid watercourses in Japan. The river originates, as the Kamanashi River, from Mount Komagatake in the north of the Southern Alps, and as the Fuefuki River from the north of Yamanashi Prefecture. These two rivers flow together in the south of the Kofu basin as the Fuji River and, subsequently, flow into the Pacific Ocean at Suruga Bay. The average flow of the Kamanashi River at Funayamabashi is ~10 m³/s, the Fuefuki River at Torinkyo is ~20 m³/s and the flow of the Fuji River at Fujibashi is ~72 m³/s. These rivers drain the major rural, agricultural, urban and industrial areas of Yamanashi Prefecture and discharge into Suruga Bay. The river, during its course of ~128 km, receives a pollution load from both point and non-point sources. The Fuji River is a major source for agricultural and industrial activities located in downstream areas. The geological features of the basin are very complex and fragile. This is because a massive dislocation, called the Itoi River–Shizuoka Tectonic Line, runs under it in a north/south direction; many other dislocations also run through and across the area. As a result, there are many collapsed areas, and collapsed rocks and sands, transported by river water, accumulate in gentle flow areas. More than 75% of the basin area is covered by forest. Forest land is located mostly in mountainous areas, whereas agriculture and grassland areas are sparsely distributed throughout the basin. Orchard plantations and urban areas are mostly situated along the water bodies. The basin lies in an inland region and, therefore, has extreme variations in temperature between summer and winter. Summers are hot and humid, and winters are cold, with average temperatures of 26 and 3 °C, respectively. Annual rainfall in the Kofu basin is as little as 1200 mm, but in the middle and lower reaches it may be as high as 3000 mm. The whole basin receives a mean annual precipitation of ~2100 mm.

Fig. 1. Map of study area and surface water quality monitoring stations (listed 1–13) in the Fuji river basin. Sampling stations: 1. Funayamabashi; 2. Singenbashi; 3. Sangunnishibashi; 4. Sakurabashi; 5. Kikkobashi; 6. Omogawabashi; 7. Hikawabashi; 8. Ukaibashi; 9. Futagawabashi; 10. Torinkyo; 11. Sangunhigashibashi; 12. Fujibashi; 13. Nanbubashi.

The Ministry of Land, Infrastructure and Transport (MLIT) of Japan operates and maintains stream-flow gauging stations, and the Environment Division of Yamanashi Prefecture (EDYP) has been collecting various water quality parameters from 50 water quality monitoring stations. However, in the present study, data from only 13 stations were selected under the river water quality monitoring network, which covers a wide range of catchments and surface water types (rivers, streams, and tributaries).

2.2. Monitored parameters and analytical methods


The data sets of 13 water quality monitoring stations, comprising 12 water quality parameters monitored monthly over 8 years (1995–2002), were obtained from the Environment Division of Yamanashi Prefecture (EDYP). Although there are more than 50 water quality parameters available, only 12 parameters were selected due to their continuity in measurement at all selected water quality monitoring stations. The selected water quality parameters include discharge, water temperature, dissolved oxygen, 5-day biochemical oxygen demand, chemical oxygen demand (manganese), pH, total suspended solids, electrical conductivity, total coliforms, nitrate nitrogen, ammoniacal nitrogen and inorganic dissolved phosphorus. The water quality parameters, their units and methods of analysis are summarized in Table 1. The EDYP has sampled, preserved and analyzed all the water quality parameters as per Japan Industrial Standards (EDYP, 2004). The basic statistics of the monthly measured, 8-year data set on river water quality are summarized in Table 2.

2.3. Data treatment and multivariate statistical methods


The Kolmogorov–Smirnov (K–S) statistics were used to test the goodness-of-fit of the data to log-normal distribution. According to the K–S test, all the variables are log-normally distributed with 95% or higher confidence. Similarly, to examine the suitability of the data for principal component analysis/factor analysis, the Kaiser–Meyer–Olkin (KMO) and Bartlett's tests were performed. KMO is a measure of sampling adequacy that indicates the proportion of variance which is common variance, i.e., which might be caused by underlying factors. A high value (close to 1) generally indicates that principal component/factor analysis may be useful, which is the case in this study: KMO = 0.87. Bartlett's test of sphericity indicates whether the correlation matrix is an identity matrix, which would indicate that variables are unrelated. The significance level, which is 0 in this study (less than 0.05), indicates that there are significant relationships among variables.
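As a rough illustration of the sphericity check described above, the following Python sketch computes Bartlett's statistic, chi-square = -[(n - 1) - (2p + 5)/6] ln|R| with p(p - 1)/2 degrees of freedom, from a correlation matrix R of p variables measured on n samples. It is not the STATISTICA procedure used in this study: the function names, the 3 x 3 example matrix and the sample size are hypothetical, and the KMO measure and the p-value lookup are deliberately left out.

```python
from math import log


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular matrix
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_samples):
    """Return (chi-square statistic, degrees of freedom) for correlation matrix corr."""
    p = len(corr)
    chi_square = -((n_samples - 1) - (2 * p + 5) / 6) * log(determinant(corr))
    dof = p * (p - 1) // 2
    return chi_square, dof


# Hypothetical correlation matrix of three parameters and a made-up sample size
example_corr = [
    [1.0, 0.6, 0.3],
    [0.6, 1.0, 0.5],
    [0.3, 0.5, 1.0],
]
chi_square, dof = bartlett_sphericity(example_corr, n_samples=96)
print(chi_square, dof)
```

An identity correlation matrix has determinant 1, so the statistic is zero; the more strongly the variables are correlated, the smaller the determinant and the larger the statistic.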
Spearman rank-order correlations (Spearman R coefficient) were used to study the correlation structure between variables to account for non-normal distribution of water quality parameters (Wunderlin et al., 2001). In this study, temporal variations of river water quality parameters were first evaluated through a season–parameter correlation matrix, using Spearman non-parametric correlation coefficients (Spearman's R). The water quality parameters were grouped into four seasons: spring (March–May), summer (June–August), autumn (September–November) and winter (December–February), and each assigned a numerical value in the data file (spring = 1; summer = 2; autumn = 3 and winter = 4), which, as a variable corresponding to the season, was correlated (pair by pair) with all the measured parameters.
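The season–parameter correlation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in this study: Spearman's R is computed as the Pearson correlation of average ranks (ties share the mean of their positions, which matters because the season code takes only four values), the function names are hypothetical, and the dissolved oxygen readings are made up.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # sorted positions start..end are tied
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_r(x, y):
    """Spearman's R: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)


# Season codes as in the text (spring = 1 ... winter = 4); the readings are hypothetical
season = [1, 1, 2, 2, 3, 3, 4, 4]
do_mg_l = [10.1, 10.4, 8.2, 8.0, 9.5, 9.9, 12.3, 12.8]
print(round(spearman_r(season, do_mg_l), 3))
```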
River water quality data sets were subjected to four multivariate techniques:
cluster analysis (CA), principal component analysis (PCA), factor analysis (FA)
and discriminant analysis (DA) (Wunderlin et al., 2001; Simeonov et al., 2003;
Singh et al., 2004, 2005). DA was applied to raw data, whereas PCA, FA and CA
were applied to experimental data, standardized through z-scale transformation
to avoid misclassifications arising from the different orders of magnitude of both
numerical values and variance of the parameters analyzed (Liu et al., 2003; Simeonov et al., 2003). All mathematical and statistical computations were made
using Microsoft Office Excel 2003 and STATISTICA 6.
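The z-scale transformation mentioned above can be illustrated with a short Python sketch. It assumes the usual definition, z = (x - mean)/S.D., with the sample standard deviation; whether the sample or the population form was used in this study is not stated, and the function name and readings below are hypothetical.

```python
from statistics import mean, stdev


def z_scale(column):
    """Standardize one parameter: subtract its mean, divide by its sample S.D."""
    m, s = mean(column), stdev(column)
    return [(v - m) / s for v in column]


# Hypothetical readings: EC (uS/cm) and PO4-P (mg/l) differ by orders of magnitude
ec = [143.0, 155.0, 168.0, 62.0, 89.0]
po4 = [0.03, 0.04, 0.09, 0.01, 0.06]

ec_z, po4_z = z_scale(ec), z_scale(po4)
print([round(v, 2) for v in ec_z])
print([round(v, 2) for v in po4_z])
```

After the transformation every parameter has zero mean and unit standard deviation, so a parameter measured in hundreds of units no longer dominates distance-based or variance-based techniques such as CA and PCA.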

2.3.1. Cluster analysis


Cluster analysis is a group of multivariate techniques whose primary purpose is to assemble objects based on the characteristics they possess. Cluster
analysis classifies objects, so that each object is similar to the others in the
cluster with respect to a predetermined selection criterion. The resulting clusters of objects should then exhibit high internal (within-cluster) homogeneity
and high external (between-cluster) heterogeneity. Hierarchical agglomerative
clustering is the most common approach, which provides intuitive similarity
relationships between any one sample and the entire data set, and is typically
illustrated by a dendrogram (tree diagram) (McKenna, 2003). The dendrogram
provides a visual summary of the clustering processes, presenting a picture of
the groups and their proximity, with a dramatic reduction in dimensionality of
the original data. The Euclidean distance usually gives the similarity between
two samples and a distance can be represented by the difference between analytical values from the samples (Otto, 1998). In this study, hierarchical agglomerative CA was performed on the normalized data set by means of the
Ward's method, using squared Euclidean distances as a measure of similarity.
Ward's method uses an analysis of variance approach to evaluate the distances between clusters in an attempt to minimize the sum of squares (SS) of
any two clusters that can be formed at each step. The spatial variability of water quality in the whole river basin was determined from CA, using the linkage
distance, reported as Dlink/Dmax, which represents the quotient between the
linkage distances for a particular case divided by the maximal linkage distance. The quotient is then multiplied by 100 as a way to standardize the linkage distance represented on the y-axis (Wunderlin et al., 2001; Simeonov et al.,
2003; Singh et al., 2004, 2005).
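The clustering procedure described above can be sketched in plain Python. This is a simplified illustration, not the STATISTICA implementation used in this study: the linkage distance is taken here as the increase in within-cluster sum of squares caused by a merge (size-weighted squared Euclidean distance between centroids), the function names are hypothetical, and the six two-dimensional "sites" are made up. The threshold of 60 on (Dlink/Dmax) × 100 is used only as an example cut-off.

```python
def centroid(points):
    """Mean position of a list of points."""
    return [sum(coord) / len(points) for coord in zip(*points)]


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca, cb = centroid(a), centroid(b)
    sq_dist = sum((u - v) ** 2 for u, v in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * sq_dist


def ward_linkage(points):
    """Return the merge history as (members_a, members_b, linkage_distance)."""
    clusters = {i: [i] for i in range(len(points))}
    history = []
    while len(clusters) > 1:
        keys = list(clusters)
        best = None
        for x in range(len(keys)):
            for y in range(x + 1, len(keys)):
                a, b = keys[x], keys[y]
                cost = ward_cost([points[i] for i in clusters[a]],
                                 [points[i] for i in clusters[b]])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        cost, a, b = best
        history.append((clusters[a], clusters[b], cost))
        clusters[a] = clusters[a] + clusters.pop(b)
    return history


def cut_tree(points, history, threshold_percent):
    """Keep only merges whose (Dlink/Dmax) * 100 does not exceed the threshold."""
    d_max = max(cost for _, _, cost in history)
    groups = [[i] for i in range(len(points))]
    for members_a, members_b, cost in history:
        if cost / d_max * 100 <= threshold_percent:
            merged = members_a + members_b
            groups = [g for g in groups if g[0] not in merged] + [merged]
    return sorted(sorted(g) for g in groups)


# Hypothetical standardized data for six sites (two made-up parameters each)
sites = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 9), (5, 10)]
history = ward_linkage(sites)
groups = cut_tree(sites, history, 60)
print(groups)
```

Because the merge cost in Ward's method never decreases from one step to the next, keeping every merge below the threshold is equivalent to cutting the dendrogram at that height.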

Table 1
Water quality parameters, units and analytical methods used during 1995–2002 for surface waters of the Fuji river basin
Parameters | Abbreviations | Units | Analytical methods
Discharge | Q | m³/s | Current meter
Temperature | WT | °C | Mercury thermometer
Dissolved oxygen | DO | mg l⁻¹ | Winkler azide method
Biochemical oxygen demand | BOD | mg l⁻¹ | Winkler azide method
Chemical oxygen demand (Mn) | CODMn | mg l⁻¹ | Potassium permanganate
pH | pH | pH unit | pH-meter
Total suspended solids | TSS | mg l⁻¹ | Dried at 103–105 °C
Electrical conductivity | EC | µS cm⁻¹ | Electrometric
Total coliforms | TC | MPN/100 ml | Multiple tube method
Nitrate nitrogen | NO3-N | mg l⁻¹ | Ion chromatographic
Ammoniacal nitrogen | NH4-N | mg l⁻¹ | Phenate
Inorganic dissolved phosphorus | PO4-P | mg l⁻¹ | Ascorbic acid

Table 2
Range, mean and S.D. of water quality parameters at different locations of the Fuji river basin during 1995–2002
Parameters | Statistic | Station 1 | Station 2 | Station 3 | Station 4 | Station 5 | Station 6 | Station 7 | Station 8 | Station 9 | Station 10 | Station 11 | Station 12 | Station 13
Q (m³/s) | Range | 0.72–22.2 | 1.7–32.1 | 1.7–32.1 | 0.01–7.0 | 2.8–27.0 | 0.62–11.3 | 0.5–10.4 | 3.0–116.1 | 0–36 | 8.6–57.9 | 17.9–36.4 | 0.95–77.1 | 3.2–85.8
Q (m³/s) | Mean | 10.21 | 13.35 | 13.28 | 1.26 | 7.49 | 2.65 | 2.72 | 12.09 | 2.15 | 20.43 | 24.78 | 41.16 | 23.03
Q (m³/s) | S.D. | 5.27 | 10.8 | 10.7 | 1.55 | 5.06 | 2.47 | 2.57 | 17.57 | 5.04 | 10.62 | 5.93 | 15.91 | 22.27
WT (°C) | Range | 0.8–27.0 | 1.1–29 | 1.8–32.8 | 0.5–29.4 | 3.6–27.8 | 3–31.5 | 2.0–33.2 | 3.2–29.0 | 3.2–34.0 | 4–31.0 | 5.5–28.5 | 1–29.5 | 0.6–27.5
WT (°C) | Mean | 13.35 | 14.52 | 15.76 | 12.83 | 13.72 | 15.71 | 14.25 | 15.89 | 17.53 | 17.47 | 16.72 | 16.45 | 15.82
WT (°C) | S.D. | 6.90 | 7.40 | 7.70 | 6.90 | 6.00 | 7.30 | 6.90 | 6.70 | 8.20 | 6.70 | 5.30 | 6.70 | 6.10
DO (mg l⁻¹) | Range | 7.8–13.9 | 7.8–13.6 | 7.4–14.2 | 7.4–15.0 | 7–14.0 | 7–14.0 | 7–13.4 | 6.8–14.0 | 7–14.0 | 5.2–10.4 | 5.5–10.8 | 6–12.6 | 7.2–13.9
DO (mg l⁻¹) | Mean | 10.24 | 10.27 | 10.58 | 10.45 | 10.45 | 9.70 | 10.06 | 10.16 | 10.45 | 7.98 | 8.09 | 8.83 | 9.69
DO (mg l⁻¹) | S.D. | 1.50 | 1.40 | 1.50 | 1.70 | 1.40 | 1.60 | 1.60 | 1.50 | 1.40 | 1.20 | 1.10 | 1.20 | 1.30
BOD (mg l⁻¹) | Range | 0–4.4 | 0.2–7.6 | 0.1–4.3 | 0.3–4.4 | 0.1–8.1 | 0.6–10.5 | 0.4–17.7 | 0.5–8.8 | 0.5–10.0 | 0.6–8.7 | 0.8–5.6 | 0.1–101.0 | 0.2–3.2
BOD (mg l⁻¹) | Mean | 0.82 | 1.04 | 1.11 | 0.74 | 0.88 | 2.73 | 1.27 | 1.64 | 2.67 | 3.14 | 2.52 | 2.54 | 0.82
BOD (mg l⁻¹) | S.D. | 0.40 | 0.70 | 0.50 | 0.30 | 0.70 | 1.50 | 1.60 | 1.00 | 1.50 | 1.40 | 0.90 | 7.20 | 0.40
CODMn (mg l⁻¹) | Range | 1.1–19.0 | 1.1–17.0 | 0.8–8.2 | 0.8–4.8 | 0.7–7.6 | 2.3–21.0 | 0.00–22.0 | 1.6–17.0 | 1.9–9.1 | 2.5–8.7 | 2.1–7.2 | 2.1–8.8 | 0.5–11.0
CODMn (mg l⁻¹) | Mean | 2.02 | 2.36 | 2.35 | 2.10 | 1.98 | 4.43 | 2.55 | 3.07 | 4.56 | 4.33 | 3.98 | 3.75 | 2.03
CODMn (mg l⁻¹) | S.D. | 1.50 | 1.20 | 0.80 | 0.50 | 0.80 | 2.20 | 2.20 | 1.50 | 1.30 | 1.00 | 0.90 | 1.00 | 1.10
pH | Range | 6.9–9.0 | 7.4–9.3 | 7.6–10.0 | 7.1–8.3 | 7.1–9.3 | 7.3–9.2 | 7.1–9.2 | 7.3–9.5 | 7–9.0 | 7.1–8.0 | 7.3–8.2 | 7.3–8.6 | 7–9.4
pH | Mean | 7.95 | 8.17 | 8.30 | 7.51 | 7.78 | 7.92 | 7.81 | 8.03 | 7.79 | 7.43 | 7.57 | 7.65 | 8.06
pH | S.D. | 0.30 | 0.40 | 0.50 | 0.10 | 0.30 | 0.30 | 0.20 | 0.40 | 0.40 | 0.10 | 0.10 | 0.10 | 0.30
TSS (mg l⁻¹) | Range | 1–153 | 1–170.0 | 1–190.0 | 0–39.0 | 1–57.0 | 1–348.0 | 0.6–77.0 | 1–120.0 | 1–34.0 | 3–74.0 | 0.00–49.0 | 3–111.0 | 1–734.0
TSS (mg l⁻¹) | Mean | 7.71 | 9.91 | 12.74 | 2.77 | 4.79 | 17.81 | 5.01 | 8.93 | 7.62 | 14.30 | 14.59 | 16.53 | 17.93
TSS (mg l⁻¹) | S.D. | 15.10 | 16.50 | 21.50 | 4.00 | 7.00 | 34.50 | 7.50 | 12.00 | 5.50 | 9.80 | 7.90 | 14.30 | 66.50
NO3-N (mg l⁻¹) | Range | 0.3–0.9 | 0.21–1.24 | 0.27–1.66 | 0.13–1.10 | 0.59–1.14 | 1.4–2.6 | 0.75–1.89 | 0.84–2.37 | 0.01–1.8 | 1.2–2.56 | 0.01–0.82 | 0.21–2.09 | 0.38–2.0
NO3-N (mg l⁻¹) | Mean | 0.69 | 0.76 | 0.94 | 0.36 | 0.84 | 1.95 | 1.30 | 1.52 | 0.90 | 1.95 | 1.22 | 1.59 | 1.25
NO3-N (mg l⁻¹) | S.D. | 0.10 | 0.20 | 0.30 | 0.21 | 0.16 | 0.30 | 0.20 | 0.30 | 0.40 | 0.30 | 0.60 | 0.30 | 0.30
NH4-N (mg l⁻¹) | Range | 0.01–0.2 | 0.01–0.20 | 0.01–0.20 | 0.01–0.60 | 0.01–0.10 | 0.01–104.0 | 0.01–0.30 | 0.01–0.60 | 0.0–1.30 | 0.02–3.0 | 0.1–0.82 | 0.01–0.80 | 0.01–0.20
NH4-N (mg l⁻¹) | Mean | 0.04 | 0.04 | 0.08 | 0.05 | 0.02 | 8.81 | 0.03 | 0.07 | 0.25 | 0.84 | 0.36 | 0.27 | 0.03
NH4-N (mg l⁻¹) | S.D. | 0.04 | 0.04 | 0.06 | 0.10 | 0.02 | 28.80 | 0.05 | 0.10 | 0.32 | 0.60 | 0.20 | 0.20 | 0.04
EC (µS cm⁻¹) | Range | 101–185 | 18–208.0 | 11.6–287.0 | 39–109.0 | 57–128.0 | 126–186.0 | 109–216.0 | 95–179.0 | 103–314.0 | 165.0–333.0 | 180–305.0 | 152–278.0 | 14.7–258.0
EC (µS cm⁻¹) | Mean | 143.28 | 154.81 | 167.51 | 61.67 | 88.98 | 146.71 | 147.86 | 138.25 | 203.79 | 246.75 | 257.63 | 217.17 | 200.84
EC (µS cm⁻¹) | S.D. | 16.90 | 28.40 | 53.60 | 12.30 | 17.10 | 12.40 | 23.70 | 19.40 | 49.80 | 37.70 | 25.50 | 26.30 | 59.50
PO4-P (mg l⁻¹) and TC (MPN/100 ml): the station columns of these two rows cannot be matched with certainty; the (range, mean, S.D.) triplets are listed in the order in which they appear.
PO4-P (mg l⁻¹): 0.01–0.08, 0.03, 0.01; 0.02–0.14, 0.04, 0.02; 0.03–0.33, 0.09, 0.05; 0.0–0.10, 0.01, 0.01; 0.01–0.11, 0.06, 0.02; 0.00–0.16, 0.03, 0.02; 0.01–0.11, 0.04, 0.02; 0.01–0.45, 0.08, 0.05; 0.01–0.41, 0.13, 0.07; 0.06–0.22, 0.12, 0.03; 0.02–0.53, 0.10, 0.05; 0.01–0.20, 0.05, 0.01; 0.00–0.04, 0.01, 0.01
TC (MPN/100 ml): 79–9.2E+04, 7.6E+03, 1.4E+04; 130–1.6E+05, 1.1E+04, 2.0E+04; 23–9.2E+04, 8.1E+03, 1.6E+04; 7600–7.9E+05, 1.3E+05, 2.4E+05; 490–1.4E+06, 4.7E+04, 1.7E+05; 1700–1.4E+06, 5.3E+04 (S.D. not given); 54–2.4E+05, 3.9E+04, 5.7E+04; 2300–2.4E+05, 4.7E+04, 4.6E+04; 3300–7.9E+05, 3.9E+04, 9.8E+04; 33–2.4E+05, 1.2E+04, 2.9E+04; 0–3.5E+04, 4.5E+03, 6.3E+03; 200–1.6E+05, 1.3E+04, 2.3E+04; 1700–4.9E+05, 3.2E+04, 5.9E+04
Stations: 1 Funayamabashi; 2 Singenbashi; 3 Sangunnishibashi; 4 Sakurabashi; 5 Kikkobashi; 6 Omogawabashi; 7 Hikawabashi; 8 Ukaibashi; 9 Futagawabashi; 10 Torinkyo; 11 Sangunhigashibashi; 12 Fujibashi; 13 Nanbubashi.


2.3.2. Principal component analysis/factor analysis

PCA is designed to transform the original variables into new, uncorrelated variables (axes), called the principal components, which are linear combinations of the original variables. The new axes lie along the directions of maximum variance. PCA provides an objective way of finding indices of this type so that the variation in the data can be accounted for as concisely as possible (Sarbu and Pop, 2005). PCA provides information on the most meaningful parameters, which describe the whole data set, affording data reduction with minimum loss of original information (Helena et al., 2000). The principal component (PC) can be expressed as:

$$z_{ij} = a_{i1}x_{1j} + a_{i2}x_{2j} + a_{i3}x_{3j} + \cdots + a_{im}x_{mj} \tag{1}$$

where z is the component score, a is the component loading, x is the measured value of the variable, i is the component number, j is the sample number and m is the total number of variables.
FA follows PCA. The main purpose of FA is to reduce the contribution of less significant variables to simplify even more of the data structure coming from PCA. This purpose can be achieved by rotating the axis defined by PCA, according to well established rules, and constructing new variables, also called varifactors (VF). A PC is a linear combination of observable water quality variables, whereas a VF can include unobservable, hypothetical, latent variables (Vega et al., 1998; Helena et al., 2000). PCA of the normalized variables was performed to extract significant PCs and to further reduce the contribution of variables with minor significance; these PCs were subjected to varimax rotation (raw) generating VFs (Brumelis et al., 2000; Singh et al., 2004, 2005; Love et al., 2004; Abdul-Wahab et al., 2005). As a result, a small number of factors will usually account for approximately the same amount of information as does the much larger set of original observations. The FA can be expressed as:

$$z_{ji} = a_{f1}f_{1i} + a_{f2}f_{2i} + a_{f3}f_{3i} + \cdots + a_{fm}f_{mi} + e_{fi} \tag{2}$$

where z is the measured variable, a is the factor loading, f is the factor score, e is the residual term accounting for errors or other sources of variation, i is the sample number and m is the total number of factors.

2.3.3. Discriminant analysis

Discriminant analysis (DA) is used to classify cases into categorical-dependent values, usually a dichotomy. If discriminant analysis is effective for a set of data, the classification table of correct and incorrect estimates will yield a high correct percentage. In DA, multiple quantitative attributes are used to discriminate between two or more naturally occurring groups. In contrast to CA, DA provides statistical classification of samples and it is performed with prior knowledge of membership of objects to a particular group or cluster. Furthermore, DA helps in grouping samples sharing common properties. The DA technique operates on raw data and constructs a discriminant function for each group (Johnson and Wichern, 1992; Wunderlin et al., 2001; Singh et al., 2004, 2005), as in the equation below:

$$f(G_i) = k_i + \sum_{j=1}^{n} w_{ij}\,p_{ij} \tag{3}$$

where i is the number of groups (G), k_i is the constant inherent to each group, n is the number of parameters used to classify a set of data into a given group, and w_j is the weight coefficient assigned by DA to a given selected parameter (p_j). The weight coefficient maximizes the distance between the means of the criterion (dependent) variable. The classification table, also called a confusion, assignment or prediction matrix or table, is used to assess the performance of DA. This is simply a table in which the rows are the observed categories of the dependent variable and the columns are the predicted categories of the dependent variable. When prediction is perfect, all cases will lie on the diagonal. The percentage of cases on the diagonal is the percentage of correct classifications.
In this study, four groups for temporal (four seasons) and three groups for spatial (three sampling regions) evaluations have been selected, and n is the number of analytical parameters used to assign a measurement from a monitoring site to a group (season or monitoring area). DA was performed on each raw data matrix using standard, forward stepwise and backward stepwise modes in constructing discriminant functions to evaluate both the spatial and temporal variations in river water quality of the basin. The site (spatial) and the season (temporal) were the grouping (dependent) variables, whereas all the measured parameters constituted the independent variables.

3. Results and discussion

3.1. Spatial similarity and site grouping

Cluster analysis was used to detect the similarity groups between the sampling sites. It yielded a dendrogram (Fig. 2), grouping all 13 sampling sites of the basin into three statistically significant clusters at (Dlink/Dmax) × 100 < 60. Since we used hierarchical agglomerative cluster analysis, the number of clusters was also decided by practicality of the results as there is ample information (e.g. landuse, location of wastewater treatment plants etc.) available on the study sites. Cluster 1 (Funayamabashi (1), Singenbashi (2), Sangunnishibashi (3), Sakurabashi (4) and Nanbubashi (13)) corresponds to

Fig. 2. Dendrogram showing clustering of sampling sites according to surface water quality characteristics of the Fuji river basin. Axis: (Dlink/Dmax) × 100, from 0 to 120. Cluster 1: Funayamabashi (1), Sangunnishibashi (3), Nanbubashi (13), Singenbashi (2), Sakurabashi (4). Cluster 2: Futagawabashi (9), Sangunhigashibashi (11), Fujibashi (12), Kikkobashi (5), Torinkyo (10). Cluster 3: Omogawabashi (6), Hikawabashi (7), Ukaibashi (8).

87.67
91.15
86.64
94.30

10.301
9.705
11.875

Discriminant function coefficient for winter, spring, summer and autumn seasons corresponds to wij as defined in Eq. (3).
a

0.251
1.321
6.604
4.978
4.105
41.854
0.029
0.034
0.000
10.436
31.312
6.114
206.94

0.272
0.245
4.997
4.330
4.106
41.902
0.045
0.060
0.000
8.318
30.014
23.103
198.68

0.214
0.767
4.051
4.806
4.163
40.679
0.009
0.062
0.000
6.638
27.968
23.192
196.14

0.267
0.066
5.092
4.892
3.852
40.656
0.011
0.053
0.000
9.057
28.548
22.752
192.44

0.251
1.321
6.604
4.978
4.105
41.854
0.029
0.034
0.000
10.436
31.312
6.114
206.94

0.272
0.245
4.997
4.330
4.106
41.902
0.045
0.060
0.000
8.318
30.014
23.103
198.68

0.214
0.767
4.051
4.806
4.163
40.679
0.009
0.062
0.000
6.638
27.968
23.192
196.14

0.267
0.066
5.092
4.892
3.852
40.656
0.011
0.053
0.000
9.057
28.548
22.752
192.44

7.943

0.020
0.015
0.017

0.010

0.206
3.243
11.189
0.078
0.274
2.291
12.273
0.398
0.252
1.229
13.699
0.123

0.259
2.392
12.202
0.348


Q
WT
DO
BOD
CODMn
pH
TSS
EC
TC
NO3-N
NH4-N
PO4-P
Constant

Summer
coefficienta
Spring
coefficienta
Winter
coefficienta
Winter
coefficienta
Spring
coefficienta

Summer
coefficienta

Autumn
coefficienta
Winter
coefficienta

Spring
coefficienta

Summer
coefficienta

Autumn
coefficienta

Backward stepwise mode


Forward stepwise mode
Standard mode

Parameters

3.2. Temporal and spatial variations in river water quality

Temporal variations in river water quality parameters (Table 1) were evaluated through a season–parameter correlation matrix, which shows that all the measured parameters (12 in number) were found to be significantly (p < 0.01) correlated with the season, except COD(Mn), TSS, EC and PO4-P (abbreviations are explained in Table 1). Among these, temperature exhibited the highest correlation coefficient (Spearman's R of 0.70), followed by DO (Spearman's R of 0.58). The season-correlated parameters can be taken as representing the major source of temporal variations in water quality. Wide seasonal variations in temperature and river discharge can be attributed to the high seasonality in various water quality parameters. The non-significant correlation of COD(Mn), TSS, EC and PO4-P with season indicates the contribution of anthropogenic sources in the catchment areas.
Temporal variations in water quality were further evaluated through DA. Temporal DA was performed on raw data after dividing the whole data set into four seasonal groups (spring, summer, autumn and winter). Discriminant functions (DFs) and classification matrices (CMs) obtained from the standard, forward stepwise and backward stepwise modes of DA are shown in Tables 3 and 4. In forward stepwise mode, variables are included step-by-step beginning with the more significant until no significant changes are obtained, whereas, in

Table 3
Classification functions (Eq. (3)) for discriminant analysis of temporal variations in water quality of the Fuji river basin

relatively less polluted (LP) sites. In cluster 1, four stations, Sakurabashi, Singenbashi, Sangunnishibashi and Funayamabashi, are situated at the upstream sites and Nanbubashi is situated at the most downstream site of the river. The inclusion of the most downstream sampling location, Nanbubashi, in the cluster 1 group suggests the self-purification and assimilative capacity of the river. Cluster 2 (Kikkobashi (5), Futagawabashi (9), Torinkyo (10), Sangunhigashibashi (11) and Fujibashi (12)) corresponds to highly polluted (HP) sites. These stations receive pollution mostly from domestic wastewater, wastewater treatment plants and industrial effluents located in city areas (Kofu, Yamanashi and Isawa). The sampling station Kikkobashi, however, located upstream of the Fuefuki River, receives pollution mostly from domestic wastewater. Cluster 3 (Omogawabashi (6), Hikawabashi (7) and Ukaibashi (8)) corresponds to moderately polluted (MP) sites, and these stations receive pollution from non-point sources, i.e., mostly from agricultural and orchard plantation activities. While analysis was performed from samples taken only during a low-flow period, it can be said that there is a groundwater contribution in pollution-loading into the water bodies of this area. The results indicate that the CA technique is useful in offering reliable classification of surface waters in the whole region and will make it possible to design a future spatial sampling strategy in an optimal manner, which can reduce the number of sampling stations and associated costs. There are other reports (Simeonov et al., 2003; Singh et al., 2004, 2005; Kim et al., 2005) where a similar approach has successfully been applied to water quality programs.

Autumn
coefficienta


Table 4
Classification matrix for discriminant analysis of temporal variation in water quality of the Fuji river basin
Monitoring seasons | % Correct | Season assigned by DA: Winter | Spring | Summer | Autumn
Standard DA mode
Winter | 99.5 | 183 | 1 | 0 | 0
Spring | 59.4 | 6 | 57 | 4 | 29
Summer | 92.0 | 0 | 1 | 172 | 14
Autumn | 71.4 | 5 | 11 | 12 | 70
Total | 85.3 | 194 | 70 | 188 | 113
Forward stepwise DA mode
Winter | 99.5 | 183 | 1 | 0 | 0
Spring | 59.4 | 6 | 57 | 4 | 29
Summer | 92.0 | 0 | 1 | 172 | 14
Autumn | 71.4 | 5 | 11 | 12 | 70
Total | 85.3 | 194 | 70 | 188 | 113
Backward stepwise DA mode
Winter | 98.4 | 182 | 3 | 0 | 0
Spring | 63.9 | 8 | 62 | 6 | 21
Summer | 90.4 | 0 | 5 | 169 | 13
Autumn | 71.4 | 5 | 12 | 11 | 70
Total | 85.2 | 195 | 82 | 186 | 104

backward stepwise mode, variables are removed step-by-step beginning with the less significant until no significant changes
are obtained. The standard DA mode- and forward stepwise
DA mode-constructed DFs, including 12 parameters, are
shown in Table 3. The coefficients for the total coliform bacteria group were zero. Both the standard and forward stepwise
mode DFs using 12 discriminant variables, respectively,
yielded the corresponding CMs assigning 85% of the cases
correctly (Tables 3 and 4). However, in backward stepwise
mode, DA gave CMs with 85% correct assignations using
only six discriminant parameters (Tables 3 and 4) with little
difference in match for each season compared with the forward stepwise mode. Thus, the temporal DA results suggest
that discharge, temperature, dissolved oxygen, biochemical
oxygen demand, electrical conductivity and nitrate nitrogen
are the most significant parameters to discriminate between
the four seasons, which means that these six parameters
account for most of the expected temporal variations in the
river water quality (Table 3).
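The percentages in a classification matrix follow directly from its counts. As a minimal sketch, the snippet below recomputes the "% Correct" column for the backward stepwise mode from the counts reported in Table 4; the function name and data layout are illustrative choices, not part of the original analysis.

```python
# Backward stepwise counts from Table 4: rows are the monitored seasons,
# columns are the seasons assigned by DA (Winter, Spring, Summer, Autumn).
seasons = ["Winter", "Spring", "Summer", "Autumn"]
matrix = [
    [182, 3, 0, 0],
    [8, 62, 6, 21],
    [0, 5, 169, 13],
    [5, 12, 11, 70],
]

def percent_correct(cm):
    """Return per-class and overall percentages of correctly assigned cases."""
    per_class = [100.0 * row[i] / sum(row) for i, row in enumerate(cm)]
    total = sum(sum(row) for row in cm)
    overall = 100.0 * sum(cm[i][i] for i in range(len(cm))) / total
    return per_class, overall

per_class, overall = percent_correct(matrix)
for name, pct in zip(seasons, per_class):
    print(f"{name}: {pct:.1f}%")
print(f"Total: {overall:.1f}%")
```

The diagonal holds the correctly assigned cases, so spring, with many cases assigned to autumn, has the lowest percentage.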
Box and whisker plots of the parameters selected by DA, showing seasonal trends, are given in Fig. 3. The average discharge (Fig. 3a) is higher in spring and autumn than in winter and summer. In the study period, this might have been due to the frequent typhoons in the basin during these seasons. The average temperature (Fig. 3b) is highest in summer and lowest in winter. A clear inverse relationship between temperature and dissolved oxygen (Fig. 3c) is observed, which is attributed to the seasonality effect. This inverse relationship is a natural process: warmer water can hold less dissolved oxygen and therefore becomes saturated more readily. The average biochemical oxygen demand (Fig. 3d) is higher in winter and spring than in summer and autumn. The average electrical conductivity (Fig. 3e) follows an inverse pattern with discharge, which shows the dilution effect. The average nitrate nitrogen concentration (Fig. 3f), however, decreases from winter to summer and then increases in autumn. Similar temporal variations in nitrate nitrogen concentration were also reported by Fukasawa (2005).
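The quantities drawn in such plots (mean, SD and SE per season) can be computed as in the following sketch. The readings are hypothetical and are not taken from the monitoring data set; SE is taken here as the sample SD divided by the square root of the number of observations.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical seasonal water temperature readings in degrees Celsius.
readings = {
    "Winter": [5.0, 6.0, 7.0, 6.0],
    "Summer": [22.0, 24.0, 26.0, 24.0],
}

def summarize(values):
    """Return (mean, sample SD, standard error of the mean)."""
    m = mean(values)
    sd = stdev(values)
    se = sd / sqrt(len(values))
    return m, sd, se

summary = {season: summarize(vals) for season, vals in readings.items()}
for season, (m, sd, se) in summary.items():
    print(f"{season}: mean={m:.2f} SD={sd:.2f} SE={se:.2f}")
```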
Spatial DA was performed with the same raw data set comprising 12 parameters after grouping the sites into three major classes (LP, MP and HP) as obtained through CA. The site cluster was the grouping (dependent) variable, while all the measured parameters constituted the independent variables. Discriminant functions (DFs) and classification matrices (CMs), obtained from the standard, forward stepwise and backward stepwise modes of DA, are shown in Tables 5 and 6. As in temporal DA, the DFs constructed in the standard and forward stepwise modes include 12 parameters (Table 5), and the coefficients for the coliform bacteria group are again zero. Both the standard and forward stepwise mode DFs, using 12 discriminant parameters, yielded CMs assigning more than 83% of the cases correctly (Tables 5 and 6), whereas the backward stepwise mode DA gave CMs with >81% correct assignations using only seven discriminant parameters (Tables 5 and 6). Backward stepwise DA shows that discharge, temperature, biochemical oxygen demand, pH, electrical conductivity, nitrate nitrogen and ammoniacal nitrogen are the discriminating parameters in space.
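How a set of classification functions assigns a sample to a region can be sketched as follows, assuming the usual linear form of a classification function (a group constant plus a weighted sum of the parameter values). The constants and weights below are hypothetical and deliberately do not reproduce the coefficients of Table 5; only two parameters are used to keep the example short.

```python
# Hypothetical constants (k_i) and weights (w_ij) for two parameters.
# These values are illustrative only and are NOT the coefficients of Table 5.
FUNCTIONS = {
    "LP": {"constant": -10.0, "weights": {"BOD": 1.0, "EC": 0.05}},
    "MP": {"constant": -14.0, "weights": {"BOD": 1.5, "EC": 0.06}},
    "HP": {"constant": -25.0, "weights": {"BOD": 3.0, "EC": 0.08}},
}

def score(sample, fn):
    """Evaluate f(G_i) = k_i + sum_j w_ij * p_j for one group."""
    return fn["constant"] + sum(w * sample[p] for p, w in fn["weights"].items())

def classify(sample):
    """Assign the sample to the group with the highest function value."""
    scores = {group: score(sample, fn) for group, fn in FUNCTIONS.items()}
    return max(scores, key=scores.get), scores

group, scores = classify({"BOD": 5.0, "EC": 250.0})
print(group, scores)
```

Counting, for each known region, how many samples are assigned to each group in this way would give a classification matrix of the kind reported in Table 6.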
Box and whisker plots of the discriminating parameters identified by spatial DA (backward stepwise mode) were constructed to evaluate different patterns associated with spatial variations in river water quality (Fig. 4). The average discharge (Fig. 4a) is highest in HP sites, as they receive discharge from the major tributaries (Ara, Kamanashi and Fuefuki Rivers) as well as domestic wastewater, wastewater treatment plant and industrial effluents from city areas. The average discharge is lowest in MP sites, as they do not receive any discharge from tributaries and their upstream contributing area is smaller than that of the HP sites. The average discharge of LP sites lies between those of the MP and HP sites. Among the stations in the LP sites, Nanbubashi is located furthest downstream, and its discharge is influenced by water withdrawal for irrigation and industrial uses in the upper areas of the river. The river water temperature (Fig. 4b) is highest in HP sites, as they receive domestic wastewater, wastewater treatment plant and industrial effluents from city areas, which increase water temperature. The trends for BOD (Fig. 4c), pH (Fig. 4d), EC (Fig. 4e) and NH4-N (Fig. 4g) suggest a high load of dissolved organic matter in the HP sites, added by domestic wastewater, wastewater treatment plants and industrial effluents located upstream of the monitoring stations. This results in anaerobic conditions in the river, which in turn lead to the formation of ammonia and organic acids. Hydrolysis of these acidic materials causes a decrease in pH at these sites. The highest average concentration of nitrate nitrogen (Fig. 4f) is observed in the MP sites. This can be attributed to the use of nitrogenous fertilizers in orchard and agricultural areas. The
study conducted by Fukasawa (2005) also supports the idea that orchard- and agricultural-related activities are the source of nitrate nitrogen in these areas.

Fig. 3. Temporal variations: (a) discharge, (b) temperature, (c) DO, (d) BOD, (e) EC and (f) NO3-N in surface water quality of the Fuji river basin. [Plots of mean, SE and SD by season (winter, spring, summer, autumn); axes: Q (m3 s-1), WT (°C), DO (mg l-1), BOD (mg l-1), EC (µS cm-1) and NO3-N (mg l-1).]

3.3. Data structure determination and source identification
Principal component analysis/factor analysis was performed on the normalized data sets (12 variables) separately for the three different regions, viz., LP, MP and HP, as delineated by CA, to compare the compositional pattern between analyzed water samples and identify the factors influencing each one. The input data matrices (variables × cases) for PCA/FA were [12 × 218] for LP and HP and [12 × 127] for MP sites. PCA of the three data sets yielded five PCs for the LP and MP sites and three PCs for the HP sites with Eigenvalues >1, explaining 73.18, 77.61 and 65.39% of the total variance in the respective water quality data sets. An Eigenvalue gives a measure of the significance of the factor: the factors with the highest Eigenvalues are the most significant. Eigenvalues of 1.0 or greater are considered significant (Kim and Mueller, 1987). Equal numbers of VFs were obtained for the three regions through FA performed on the PCs. The corresponding VFs, variable loadings and explained variance are presented in Table 7. Liu et al. (2003) classified the factor loadings as strong, moderate and weak, corresponding to absolute loading values of >0.75, 0.75–0.50 and 0.50–0.30, respectively.
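The two rules stated above, retention of components with Eigenvalues of 1.0 or greater and the loading classes of Liu et al. (2003), can be written down as a short sketch. The label for loadings below 0.30 and the last two Eigenvalues in the example are assumptions made here for illustration; the source does not name that range or report discarded Eigenvalues.

```python
def classify_loading(loading):
    """Classify a factor loading by its absolute value (Liu et al., 2003 thresholds)."""
    a = abs(loading)
    if a > 0.75:
        return "strong"
    if a >= 0.50:
        return "moderate"
    if a >= 0.30:
        return "weak"
    return "below weak"  # assumed label; the source does not name this range

def retained(eigenvalues, threshold=1.0):
    """Keep components whose Eigenvalue is at or above the threshold."""
    return [ev for ev in eigenvalues if ev >= threshold]

# Example loadings, including a negative one: the sign does not affect the class.
examples = [0.889, -0.885, 0.661, 0.449, 0.127]
labels = [classify_loading(x) for x in examples]

# First three Eigenvalues as reported for HP sites; 0.95 and 0.60 are hypothetical.
kept = retained([3.94, 2.11, 1.80, 0.95, 0.60])
print(labels, kept)
```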
For the data set pertaining to LP sites, among the five VFs, VF1, explaining 22.23% of the total variance, has a strong positive loading on discharge. VF2, explaining 17.09% of the total variance, has a strong positive loading on temperature and a strong negative loading on dissolved oxygen. VF1 and VF2 represent the seasonal impact of discharge and temperature.


Table 5
Classification functions (Eq. (3)) for discriminant analysis of spatial variations of water quality in the Fuji river basin

Parameters   Standard mode                Forward stepwise mode        Backward stepwise mode (a)
             LP        MP        HP       LP        MP        HP       LP        MP        HP
Q            0.120     0.057     0.146    0.120     0.057     0.146    0.102     0.030     0.111
WT           0.143     0.022     0.033    0.143     0.022     0.033    0.513     0.374     0.406
DO           3.594     3.521     3.991    3.594     3.521     3.991
BOD          5.145     4.292     4.807    5.145     4.292     4.807    0.710     0.308     0.215
CODMn        3.183     3.569     3.577    3.183     3.569     3.577
pH           50.579    50.436    46.929   50.579    50.436    46.929   51.748    51.393    48.712
TSS          0.050     0.022     0.026    0.050     0.022     0.026
EC           0.094     0.114     0.086    0.094     0.114     0.086    0.106     0.127     0.094
TC           0.000     0.000     0.000    0.000     0.000     0.000
NO3-N        10.339    16.997    11.667   10.339    16.997    11.667   8.431     14.821    9.663
NH4-N        23.916    20.225    24.336   23.916    20.225    24.336   15.155    10.881    15.913
PO4-P        50.648    64.076    44.699   50.648    64.076    44.699
Constant     222.46    229.15    205.47   222.46    229.15    205.47   203.16    208.44    185.31

(a) Coefficients for different monitoring regions correspond to wij as defined in Eq. (3).

Table 6
Classification matrix for discriminant analysis of spatial variations of water quality in the Fuji river basin

Monitoring regions   % Correct   Regions assigned by DA
                                 LP (a)   MP (b)   HP (c)

Standard DA mode
LP (a)               90.6        202      5        16
MP (b)               84.9        13       107      6
HP (c)               75.9        47       5        164
Total                83.7        262      117      186

Forward stepwise DA mode
LP (a)               90.6        202      5        16
MP (b)               84.9        13       107      6
HP (c)               75.9        47       5        164
Total                83.7        262      117      186

Backward stepwise DA mode
LP (a)               89.7        200      5        18
MP (b)               83.3        14       105      7
HP (c)               72.5        54       6        158
Total                81.7        268      116      183

The inverse relationship between temperature and dissolved oxygen is a natural process: warmer water can hold less dissolved oxygen and therefore becomes saturated more readily. VF3, explaining 13.92% of the total variance, has strong positive loadings on suspended solids and chemical oxygen demand. This factor reflects erosion from upland areas during rainfall events, and the positive correlation with COD indicates the loading of partially decayed organic matter from forested areas. VF4, explaining 11.32% of the total variance, has a strong positive loading on ammoniacal nitrogen and moderate loadings on inorganic dissolved phosphorus and biochemical oxygen demand. This factor represents organic pollution from domestic waste. VF5, explaining the lowest variance (8.62%), has a strong positive loading on pH and a moderate positive loading on electrical conductivity, and represents the physicochemical source of variability.

Notes to Table 6: (a) LP includes stations 1, 2, 3, 4 and 13; (b) MP includes stations 6, 7 and 8; (c) HP includes stations 5, 9, 10, 11 and 12.

For the data set representing the MP sites, among the five significant VFs, VF1, explaining about 24.98% of the total variance, has strong positive loadings on nitrate nitrogen and biochemical oxygen demand. This factor represents the contribution of non-point source pollution from orchard and agricultural areas. In these areas, farmers use nitrogenous fertilizers, which undergo nitrification, and the rivers receive nitrate nitrogen via groundwater leaching. This interpretation is also supported by the studies of Kazama and Yoneyama (2002) and Fukasawa (2005). VF2, explaining about 20.20% of the total variance, has a strong positive loading on temperature, a strong negative loading on dissolved oxygen and a moderate positive loading on ammoniacal nitrogen. This factor can be attributed to seasonal changes. VF3, explaining about 13.51% of the total variance, has strong positive loadings on pH and inorganic dissolved phosphorus. VF4, explaining 9.60% of the total variance, has strong positive loadings on total suspended solids and chemical oxygen demand. This factor represents the erosion of soil and associated organic matter during cultivation. VF5 (9.31%) has a strong positive loading on discharge and a strong negative loading on electrical conductivity. This factor represents the dilution effect.
Lastly, for the data set pertaining to water quality in HP sites, among the three VFs, VF1, explaining 32.83% of the total variance, has strong positive loadings on biochemical oxygen demand, chemical oxygen demand, electrical conductivity, ammoniacal nitrogen and inorganic dissolved phosphorus. This organic factor can be interpreted as representing influences from point sources, such as discharges from wastewater treatment plants, domestic wastewater and industrial effluents. VF2, explaining 17.59% of the total variance, has a strong positive loading on discharge, a strong negative loading on pH, a moderate negative loading on dissolved oxygen and a moderate positive loading on nitrate nitrogen. The strong negative loading on pH and the moderate negative loading on DO are due to anaerobic conditions in the river caused by the high load of dissolved organic matter, which results in the formation of ammonia and organic acids and thus a decrease in pH.

Fig. 4. Spatial variations: (a) discharge, (b) temperature, (c) BOD, (d) pH, (e) EC, (f) NO3-N and (g) NH4-N in surface water quality of the Fuji river basin. [Plots of mean, SE and SD by monitoring region (LP, MP, HP); axes: Q (m3 s-1), WT (°C), BOD (mg l-1), pH, EC (µS cm-1), NO3-N (mg l-1) and NH4-N (mg l-1).]

VF3, explaining 14.97% of the total variance, has a strong positive loading on temperature and a moderate negative loading on dissolved oxygen. This factor represents the seasonal effect of temperature.
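The link between an Eigenvalue and the percentage of variance it explains can be checked with a short calculation, assuming the PCA was run on standardized variables so that the total variance equals the number of variables (12). The sketch below uses the HP-site Eigenvalues and percentages reported in Table 7; small differences arise because the published Eigenvalues are rounded to two decimals.

```python
from itertools import accumulate

N_VARIABLES = 12                  # total variance of 12 standardized variables
eigenvalues = [3.94, 2.11, 1.80]  # HP sites, as reported in Table 7
reported = [32.83, 17.59, 14.97]  # % total variance, as reported in Table 7

# Each component explains eigenvalue / (number of variables) of the variance.
percent = [100.0 * ev / N_VARIABLES for ev in eigenvalues]
cumulative = list(accumulate(percent))

for ev, pct, cum in zip(eigenvalues, percent, cumulative):
    print(f"eigenvalue {ev:.2f}: {pct:.2f}% (cumulative {cum:.2f}%)")
```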

Table 7
Loadings of experimental variables (12) on significant principal components for (a) LP sites, (b) MP sites and (c) HP sites data sets

Variables                VF1       VF2       VF3       VF4       VF5

LP sites (five significant principal components)
Q                        0.700     0.127     0.429     0.144     0.109
WT                       0.037     0.889     0.034     0.073     0.344
DO                       0.004    −0.885     0.075     0.124     0.057
BOD                      0.071     0.227     0.417     0.519     0.405
CODMn                    0.127     0.191     0.728     0.092     0.103
pH                       0.112     0.237     0.005     0.056     0.850
TSS                      0.212     0.012     0.810     0.040     0.148
EC                       0.549     0.054     0.164     0.048     0.658
TC                       0.394     0.511     0.038     0.173     0.405
NO3-N                    0.437     0.121     0.007     0.053     0.247
NH4-N                    0.057     0.161     0.010     0.854     0.096
PO4-P                    0.449     0.146     0.078     0.555     0.326
Eigenvalue               2.67      2.05      1.67      1.36      1.03
% Total variance         22.23     17.09     13.92     11.32     8.62
Cumulative % variance    22.23     39.33     53.24     64.56     73.18

MP sites (five significant principal components)
Q                        0.083     0.046     0.060     0.035     0.831
WT                       0.055     0.938     0.167     0.022     0.033
DO                       0.055    −0.908     0.027     0.175     0.008
BOD                      0.865     0.193     0.138     0.290     0.069
CODMn                    0.565     0.004     0.243     0.739     0.098
pH                       0.064     0.159     0.791     0.209     0.162
TSS                      0.021     0.143     0.134     0.882     0.030
EC                       0.001     0.143     0.268     0.009    −0.734
TC                       0.424     0.289     0.232     0.150     0.010
NO3-N                    0.822     0.267     0.263     0.449     0.264
NH4-N                    0.231     0.661     0.180     0.106     0.003
PO4-P                    0.066     0.059     0.799     0.226     0.325
Eigenvalue               3.00      2.42      1.62      1.15      1.12
% Total variance         24.98     20.20     13.51     9.60      9.31
Cumulative % variance    24.98     45.18     58.69     68.30     77.61

HP sites (three significant principal components)
Q                        0.053     0.742     0.209
WT                       0.055     0.038     0.894
DO                       0.004    −0.660    −0.566
BOD                      0.917     0.068     0.055
CODMn                    0.863     0.075     0.253
pH                       0.060    −0.705     0.344
TSS                      0.235     0.493     0.340
EC                       0.804     0.209     0.171
TC                       0.070     0.069     0.539
NO3-N                    0.365     0.671     0.283
NH4-N                    0.769     0.226     0.301
PO4-P                    0.734     0.231     0.011
Eigenvalue               3.94      2.11      1.80
% Total variance         32.83     17.59     14.97
Cumulative % variance    32.83     50.42     65.39

Bold and italic values indicate strong and moderate loadings, respectively.

4. Conclusions
In this case study, different multivariate statistical techniques were used to evaluate spatial and temporal variations in surface water quality of the Fuji river basin. Hierarchical cluster analysis grouped 13 sampling sites into three clusters of similar water quality characteristics. Based on the information obtained, it is possible to design an optimal future sampling strategy, which could reduce the number of sampling stations and associated costs. Although factor analysis/principal component analysis did not result in a significant data reduction, it helped extract and identify the factors/sources responsible for variations in river water quality in the three groups of sampling sites. Varifactors obtained from factor analysis indicate that the parameters responsible for water quality variations are mainly related to discharge and temperature (natural) and organic pollution (point source: domestic wastewater) in relatively less polluted areas; organic pollution (point source: domestic wastewater) and nutrients (non-point sources: agriculture and orchard plantations) in medium polluted areas; and organic pollution and nutrients (point sources: domestic wastewater, wastewater treatment plants and industries) in highly polluted areas in the basin. Discriminant analysis gave the best results both spatially and temporally. It yielded an important data reduction, as it used only six parameters (discharge, temperature, dissolved oxygen, biochemical oxygen demand, electrical conductivity and nitrate nitrogen) affording more than 85% correct assignations in temporal analysis, and seven parameters (discharge, temperature, biochemical oxygen demand, pH, electrical conductivity, nitrate nitrogen and ammoniacal nitrogen) affording more than 81% correct assignations in spatial analysis. Therefore, DA allowed a reduction in the dimensionality of the large data set, delineating a few indicator parameters responsible for large variations in water quality. Thus, this study illustrates the usefulness of multivariate statistical techniques for the analysis and interpretation of complex data sets, and for water quality assessment, the identification of pollution sources/factors and the understanding of temporal/spatial variations in water quality for effective river water quality management.
Acknowledgements
The authors sincerely thank Yuki Hiraga for her help in the
database development and the Fuji Xerox Setsutaro Kobayashi
Memorial Fund for providing funding support. We would also
like to acknowledge the help and support provided by the 21st
Century Center of Excellence (COE), Integrated River Basin
Management in Asian Monsoon Region, University of
Yamanashi.
References
Abdul-Wahab, S.A., Bakheit, C.S., Al-Alawi, S.M., 2005. Principal component and multiple regression analysis in modelling of ground-level ozone and factors affecting its concentrations. Environmental Modelling & Software 20 (10), 1263–1271.
Adams, S., Titus, R., Pietesen, K., Tredoux, G., Harris, C., 2001. Hydrochemical characteristic of aquifers near Sutherland in the Western Karoo, South Africa. Journal of Hydrology 241, 91–103.
Bricker, O.P., Jones, B.F., 1995. Main factors affecting the composition of natural waters. In: Salbu, B., Steinnes, E. (Eds.), Trace Elements in Natural Waters. CRC Press, Boca Raton, FL, pp. 1–5.
Brumelis, G., Lapina, L., Nikodemus, O., Tabors, G., 2000. Use of an artificial model of monitoring data to aid interpretation of principal component analysis. Environmental Modelling & Software 15 (8), 755–763.
Dixon, W., Chiswell, B., 1996. Review of aquatic monitoring program design. Water Research 30, 1935–1948.
EDYP, 2004. Result of Water Quality Measurement: Public and Ground Water. Atmospheric Water Quality Control Section, Forest and Environment Division of Yamanashi Prefecture.
Fukasawa, E., 2005. Determination of origin of nitrate nitrogen in Fuefuki river using stable isotope method. Bachelor thesis, Department of Ecosocial System Engineering, University of Yamanashi, Japan (in Japanese).
Helena, B., Pardo, R., Vega, M., Barrado, E., Fernandez, J.M., Fernandez, L., 2000. Temporal evolution of groundwater composition in an alluvial aquifer (Pisuerga river, Spain) by principal component analysis. Water Research 34, 807–816.
Johnson, R.A., Wichern, D.W., 1992. Applied Multivariate Statistical Analysis. Prentice-Hall, Englewood Cliffs, NJ.
Kazama, F., Yoneyama, M., 2002. Nitrogen generation in the Yamanashi prefecture and its effects on the groundwater pollution. International Environmental Science 15 (4), 293–298 (in Japanese).
Kim, J.H., Kim, R.H., Lee, J., Cheong, T.J., Yum, B.W., Chang, H.W., 2005. Multivariate statistical analysis to identify the major factors governing groundwater quality in the coastal area of Kimje, South Korea. Hydrological Processes 19, 1261–1276.
Kim, J.-O., Mueller, C.W., 1987. Introduction to factor analysis: what it is and how to do it. Quantitative Applications in the Social Sciences Series. Sage University Press, Newbury Park.
Lee, J.Y., Cheon, J.Y., Lee, K.K., Lee, S.Y., Lee, M.H., 2001. Statistical evaluation of geochemical parameter distribution in a ground water system contaminated with petroleum hydrocarbons. Journal of Environmental Quality 30, 1548–1563.
Liu, C.W., Lin, K.H., Kuo, Y.M., 2003. Application of factor analysis in the assessment of groundwater quality in a Blackfoot disease area in Taiwan. Science of the Total Environment 313, 77–89.
Love, D., Hallbauer, D., Amos, A., Hranova, R., 2004. Factor analysis as a tool in groundwater quality management: two southern African case studies. Physics and Chemistry of the Earth 29, 1135–1143.
McKenna Jr., J.E., 2003. An enhanced cluster analysis program with bootstrap significance testing for ecological community analysis. Environmental Modelling & Software 18 (3), 205–220.
Otto, M., 1998. Multivariate methods. In: Kellner, R., Mermet, J.M., Otto, M., Widmer, H.M. (Eds.), Analytical Chemistry. Wiley-VCH, Weinheim.
Reghunath, R., Murthy, T.R.S., Raghavan, B.R., 2002. The utility of multivariate statistical techniques in hydrogeochemical studies: an example from Karnataka, India. Water Research 36, 2437–2442.
Sarbu, C., Pop, H.F., 2005. Principal component analysis versus fuzzy principal component analysis. A case study: the quality of Danube water (1985–1996). Talanta 65, 1215–1220.
Simeonov, V., Stratis, J.A., Samara, C., Zachariadis, G., Voutsa, D., Anthemidis, A., Sofoniou, M., Kouimtzis, T., 2003. Assessment of the surface water quality in Northern Greece. Water Research 37, 4119–4124.
Simeonova, P., Simeonov, V., Andreev, G., 2003. Environmetric analysis of the Struma River water quality. Central European Journal of Chemistry 2, 121–126.
Simeonov, V., Simeonova, P., Tsitouridou, R., 2004. Chemometric quality assessment of surface waters: two case studies. Chemical and Engineering Ecology 11 (6), 449–469.
Singh, K.P., Malik, A., Mohan, D., Sinha, S., 2004. Multivariate statistical techniques for the evaluation of spatial and temporal variations in water quality of Gomti River (India): a case study. Water Research 38, 3980–3992.
Singh, K.P., Malik, A., Sinha, S., 2005. Water quality assessment and apportionment of pollution sources of Gomti river (India) using multivariate statistical techniques: a case study. Analytica Chimica Acta 538, 355–374.
Vega, M., Pardo, R., Barrado, E., Deban, L., 1998. Assessment of seasonal and polluting effects on the quality of river water by exploratory data analysis. Water Research 32, 3581–3592.
Wunderlin, D.A., Diaz, M.P., Ame, M.V., Pesce, S.F., Hued, A.C., Bistoni, M.A., 2001. Pattern recognition techniques for the evaluation of spatial and temporal variations in water quality. A case study: Suquia river basin (Cordoba, Argentina). Water Research 35, 2881–2894.
