Surface water quality assessment using factor analysis

In this study, factor analysis is applied to surface water quality data sets obtained from the Buyuk Menderes River Basin, Turkey, during two different hydrological periods. Results show that the factors governing water quality differed between the two seasons and among locations. During low-flow conditions, water quality was strongly affected by agricultural uses, whereas in high-flow periods the main pollution source shifted to urban land uses. The major water pollution threats in the basin were therefore urban and agricultural land uses, which are defined as nonpoint sources. This technique is expected to assist decision makers in identifying priorities for improving water quality that has deteriorated due to various land uses.


Introduction
Water quality monitoring has one of the highest priorities in environmental protection policy (Simeonov et al., 2002). The main objective is to control and minimise the incidence of pollutant-oriented problems, and to provide water of appropriate quality to serve various purposes such as drinking water supply, irrigation water, etc.
The quality of water is identified in terms of its physical, chemical and biological parameters (Sargaonkar and Deshpande, 2003). The particular problem in the case of water quality monitoring is the complexity associated with analysing the large number of measured variables (Saffran, 2001). The data sets contain rich information about the behaviour of the water resources. The classification, modelling and interpretation of monitoring data are the most important steps in the assessment of water quality.
Multivariate statistical methods including factor analysis have been used successfully in hydrochemistry for many years. Surface water, groundwater quality assessment and environmental research employing multi-component techniques are well described in the literature (Praus, 2005). Multivariate statistical approaches allow deriving hidden information from the data set about the possible influences of the environment on water quality (Spanos et al., 2003).
Factor analysis attempts to explain the correlations between the observations in terms of the underlying factors, which are not directly observable (Yu et al., 2003). There are three stages in factor analysis (Gupta et al., 2005):
• For all the variables a correlation matrix is generated
• Factors are extracted from the correlation matrix based on the correlation coefficients of the variables
• To maximise the relationship between some of the factors and variables, the factors are rotated.
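The idea of explaining correlations through unobservable factors is commonly written as the orthogonal common factor model. The notation below is an assumption introduced here for illustration and does not come from the study: z_i is a standardised water quality variable, F_k a common factor, a_{ik} the loading of variable i on factor k, u_i the part of the variable unique to it, and Ψ the diagonal matrix of unique variances.

```latex
\begin{align}
  z_i &= \sum_{k=1}^{m} a_{ik} F_k + u_i, \qquad i = 1, \dots, p, \quad m < p \\
  \mathbf{R} &= \mathbf{A}\mathbf{A}^{\mathsf{T}} + \boldsymbol{\Psi},
  \qquad \mathbf{A} = (a_{ik})
\end{align}
```

Under this model, with uncorrelated, unit-variance factors, the correlation matrix R of the p variables is reproduced by the m columns of loadings, which is why a few factors can summarise many measured parameters.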
A first step is the determination of the parameter correlation matrix. It is used to account for the degree of mutually shared variability between individual pairs of water quality variables.
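As a minimal sketch of this first step, the Pearson correlation matrix can be computed directly from a table with one row per monitoring station and one column per parameter. The function name and the sample values are hypothetical and are not data from the study; the study itself used SPSS.

```python
import math

def correlation_matrix(rows):
    """Pearson correlation matrix for samples given as one row per station."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    # Centre every column on its mean
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    # Sum of squared deviations per column
    ss = [sum(c[j] ** 2 for c in centred) for j in range(p)]
    corr = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(p):
            cross = sum(c[a] * c[b] for c in centred)
            corr[a][b] = cross / math.sqrt(ss[a] * ss[b])
    return corr

# Hypothetical values for three parameters at five stations (illustrative only)
samples = [
    [410.0, 262.0, 18.0],
    [520.0, 333.0, 25.0],
    [615.0, 394.0, 22.0],
    [700.0, 448.0, 35.0],
    [860.0, 550.0, 31.0],
]
R = correlation_matrix(samples)
```

Each entry R[a][b] lies between -1 and 1 and measures the variability shared by parameters a and b; the diagonal is 1 by construction.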
Then, eigenvalues and factor loadings for the correlation matrix are determined. Each eigenvalue corresponds to an eigenfactor, which identifies a group of variables that are highly correlated with one another. Factors with lower eigenvalues may contribute little to the explanatory ability of the data, so only the first few factors are needed to account for much of the parameter variability. Once the correlation matrix and eigenvalues are obtained, factor loadings are used to measure the correlation between the variables and factors. Factor rotation is used to facilitate interpretation by providing a simpler factor structure (Zeng and Rasmussen, 2005).
This study evaluated the possibility that a smaller group of water quality parameters/locations might provide sufficient information for water quality assessment. Factor analysis was applied to a surface water quality data set collected from the Buyuk Menderes Basin, Turkey, using the Statistical Package for the Social Sciences (SPSS 10.0 for Windows). Water quality monitoring was conducted at 21 stations in the study area during low- and high-flow periods. The selected parameters for the estimation of surface water quality characteristics were: electrical conductivity (EC), total dissolved solids (TDS), sodium (Na⁺), potassium (K⁺), calcium (Ca²⁺), magnesium (Mg²⁺), sulphate (SO₄²⁻), nitrate-nitrogen (NO₃-N), Kjeldahl nitrogen (Kjeldahl-N), biochemical oxygen demand (BOD₅) and chemical oxygen demand (COD). COD measurements were performed using the potassium dichromate method.
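The extraction stage described in this section (eigenvalues, share of variance and factor loadings) can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the study: it uses principal-component style extraction by power iteration with deflation, whereas the study reports Centroid extraction in SPSS. The function names and the 3 × 3 correlation matrix are hypothetical.

```python
import math

def dominant_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and unit eigenvector of a symmetric
    positive semi-definite matrix, found by power iteration."""
    p = len(matrix)
    vec = [float(i + 1) for i in range(p)]  # arbitrary non-zero start vector
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in product))
        if norm == 0.0:
            break
        vec = [x / norm for x in product]
        eigenvalue = norm
    return eigenvalue, vec

def extract_factors(corr, n_factors):
    """Return eigenvalues and loadings (one list per factor)."""
    p = len(corr)
    work = [row[:] for row in corr]
    eigenvalues, loadings = [], []
    for _ in range(n_factors):
        lam, vec = dominant_eigenpair(work)
        eigenvalues.append(lam)
        # Loading = eigenvector element scaled by sqrt(eigenvalue)
        loadings.append([x * math.sqrt(lam) for x in vec])
        # Deflation: remove the extracted factor before finding the next one
        for i in range(p):
            for j in range(p):
                work[i][j] -= lam * vec[i] * vec[j]
    return eigenvalues, loadings

# Hypothetical correlation matrix of three variables (illustrative only)
corr = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.3],
    [0.2, 0.3, 1.0],
]
eigenvalues, loadings = extract_factors(corr, 3)
# For standardised variables the total variance equals the number of variables
percent_variance = [100.0 * lam / len(corr) for lam in eigenvalues]
```

Because the eigenvalues of a correlation matrix sum to the number of variables, each eigenvalue divided by that number gives the share of total variance attributed to the factor, which is how a "per cent of variance" column is usually obtained.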

Study area
The Buyuk Menderes River Basin is located in Western Anatolia and covers Uşak, Aydın and Denizli Provinces with a total land area of about 25 000 km² (Fig. 1). The basin is endowed with one of the most fertile soils in the country and the economy of the region is heavily dependent on agricultural production. In addition, rapid industrialisation and population growth over the past few decades have created additional stress on the environmental conditions in the region (Boyacioglu et al., 2004). The population of the basin was about 2 500 000 as of the year 2000, living in more than 320 municipalities and settlements, 65% of which have proper sewage systems with only about 12% of them treating their wastewater prior to discharge (State Institute of Statistics, 2005). In this regard, the study area has been subject to increasing rates of pollution originating mainly from anthropogenic activities.
The pollution sources of Buyuk Menderes River can be organised into three groups:
• Point discharges
• Non-point source contributions
• Other sources
Point discharges originate from either domestic or industrial polluters. While some of these discharges are made to the river after proper treatment, in many cases no treatment is applied prior to the discharge. The basic sources of non-point source pollution in the basin include the diffused transport of contaminants to river channels originating from agricultural practices. In addition, there are also other sources of pollution that degrade the quality of surface waters in Buyuk Menderes River including transport of eroded land, leachates from mining activities and solid waste disposal sites (Boyacioglu et al., 2004).

Assessment of water quality

Low-flow period
As was mentioned above, one of the most fertile soils in the country is found in the Buyuk Menderes Basin. In this region, the economy is heavily dependent on agricultural production and also industrial activities, which are concentrated in the Aydın and Denizli Provinces. The climate of the region is typically Mediterranean: hot and dry in summer and temperate and rainy in winter. Hydrological conditions of the river during the summer and winter periods are therefore quite different. Thus, assessment of the water quality separately for summer (low-flow) and winter (high-flow) periods will assist in understanding the main pollutants and their sources, and in determining priorities to improve water quality in the two different hydrological periods.
Firstly, factor analysis was applied to data sets obtained during the low-flow period (between June and August). Descriptive statistics of the data set are presented in Table 1.
The correlation matrix of variables was generated and factors extracted by the Centroid method, rotated by Varimax rotation (Ahmed et al., 2005). Calculated eigenvalues, per cent total variance, factor loadings and cumulative variance are given in Table 2. The factor analysis generated three significant factors which explained 85.9% of the variance in data sets. The following factors were indicated considering the hydrochemical aspects of the water:
• Factor 1: Ca²⁺, Mg²⁺ and SO₄²⁻
• Factor 2: Na⁺ and K⁺
• Factor 3: COD, BOD₅, Kjeldahl-N and NO₃-N
Factor 1 (F1), marked by Ca²⁺, Mg²⁺ and SO₄²⁻, explained 38.2% of the variance. Na⁺ and K⁺ were correlated with Factor 2 (F2), and COD, BOD₅, Kjeldahl-N and NO₃-N with Factor 3 (F3). F1 had high positive loadings on Ca²⁺, Mg²⁺ and SO₄²⁻, which were 0.93, 0.91 and 0.90, respectively.
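To illustrate what the Varimax rotation does to a loading matrix, the sketch below implements raw varimax by pairwise planar rotations (Kaiser's procedure) in plain Python. It is an assumption-laden illustration rather than the SPSS routine used in the study: Kaiser row normalisation is omitted, and the function names and loading values are hypothetical. The example starts from a simple two-factor structure that has been deliberately rotated by 30 degrees and lets varimax recover it.

```python
import math

def varimax_criterion(loadings):
    """Sum over factors of the variance of the squared loadings."""
    n, k = len(loadings), len(loadings[0])
    total = 0.0
    for j in range(k):
        sq = [row[j] ** 2 for row in loadings]
        mean = sum(sq) / n
        total += sum((s - mean) ** 2 for s in sq) / n
    return total

def varimax(loadings, max_sweeps=50, tol=1e-10):
    """Raw varimax: rotate each pair of factors by the angle that
    maximises the criterion, and sweep until the angles vanish."""
    a = [row[:] for row in loadings]
    n, k = len(a), len(a[0])
    for _ in range(max_sweeps):
        largest = 0.0
        for p in range(k - 1):
            for q in range(p + 1, k):
                u = [row[p] ** 2 - row[q] ** 2 for row in a]
                v = [2.0 * row[p] * row[q] for row in a]
                sum_u, sum_v = sum(u), sum(v)
                c_term = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                d_term = 2.0 * sum(ui * vi for ui, vi in zip(u, v))
                num = d_term - 2.0 * sum_u * sum_v / n
                den = c_term - (sum_u ** 2 - sum_v ** 2) / n
                phi = 0.25 * math.atan2(num, den)
                cos_phi, sin_phi = math.cos(phi), math.sin(phi)
                for row in a:
                    x, y = row[p], row[q]
                    row[p] = x * cos_phi + y * sin_phi
                    row[q] = -x * sin_phi + y * cos_phi
                largest = max(largest, abs(phi))
        if largest < tol:
            break
    return a

# Hypothetical simple structure: two variables per factor (illustrative only)
simple = [[0.9, 0.0], [0.8, 0.0], [0.0, 0.7], [0.0, 0.9]]
t = math.radians(30.0)
# Blur the structure by rotating both factor axes through 30 degrees
unrotated = [[x * math.cos(t) - y * math.sin(t),
              x * math.sin(t) + y * math.cos(t)] for x, y in simple]
rotated = varimax(unrotated)
```

After rotation each variable loads almost entirely on a single factor, while the communality of each variable (the sum of its squared loadings) is unchanged. That concentration of loadings is what makes labels such as 'agricultural use' or 'urban land use' easier to assign.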
Urbanisation influences the water cycle through changes in flow and water quality. Urban land use (Na⁺, K⁺, Cl⁻) may be differentiated from other land uses such as agriculture (Ca²⁺, Mg²⁺), through the use of biogeochemical fingerprints (Lindeman, 2004). Salts that are commonly found in subsurface drainage water include sulphates, chlorides, carbonates, and bicarbonates of calcium and magnesium. Tail water also may contain these salts, but generally in much lower concentrations than in drainage water (Jacobsen and Basinal, 2004). Based on the results of the factor analysis and typical sources of water pollutants, it is concluded that F1 can be denoted as the 'agricultural use' factor with presence of Ca²⁺ and Mg²⁺. As was mentioned above, these parameters are mainly found in agricultural drainage water. F2 is strongly correlated with Na⁺ and K⁺, and is assigned as the 'urban land-use' factor. Factor loadings were 0.94 and 0.98. COD, BOD₅ and Kjeldahl-N are included in F3 and are indicators of organic pollution in water, so F3 represents the 'organic pollution' factor.
In summary, three factors representing three different processes are:
• Urban land-use factor
• Agricultural use factor
• Organic pollution factor.
The negative factor loading of NO₃-N indicates an inverse relationship between this parameter and F3. COD, BOD₅ and Kjeldahl-N, which were correlated with F3, decreased with increasing NO₃-N concentration, which was caused by the nitrification process in water.
Therefore, the water quality of the Buyuk Menderes River during the low-flow period was mainly controlled by agricultural pollutant sources. The loading plot of factor scores is shown in Fig. 2. Considering the location of the monitoring stations, given in Fig. 1, and the distribution of factor scores, it is concluded that:
• Factor 1: Low factor scores of F1 (agricultural use factor) were observed in the west of the basin. The middle and eastern parts, where high values were monitored, faced pollution risks originating from agricultural uses.
• Factor 2: High factor scores (urban land-use factor) were obtained in the north-west and also in the regions where population density is relatively high (especially in the centre of the provinces and their surroundings).
• Factor 3: F3 (organic pollution factor) scores were distributed in the basin almost uniformly. Depending on the presence of infrastructure and wastewater treatment efficiency, the highest and lowest scores were observed even at stations located next to each other. So, the settlements having no treatment plants increased the organic pollution risk.

High-flow period
The high-flow period may have positive effects through dilution of surface water by rain and stormwater. On the other hand, runoff increases pollutant concentrations and thereby decreases water quality. To assess the water quality of the Buyuk Menderes River under high-flow conditions, factor analysis was applied to data sets obtained from 21 monitoring stations between November and January. Descriptive statistics of the data are presented in Table 3.
Results of factor analysis including factor-loading matrix, eigenvalues and total and cumulative variance values are given in Table 4.
It is suggested that F1 represents the urban land-use characteristics shown by the presence of K⁺ and Na⁺. This factor explained 37.33% of the variance. F2 is strongly correlated with Ca²⁺ and Mg²⁺, which mainly originate from agricultural uses. F3 was marked by BOD₅, COD and Kjeldahl-N. Thus, urban land use was the major pollution source in this hydrological period.
For each section, factor scores are shown in Fig. 3. Considering the distribution of factor scores and the locations of the monitoring stations, it is concluded that:
• Factor 1: High factor scores of F1 (urban land-use factor) were observed in the north-west part, downstream of the basin.
• Factor 2: Relatively high values of the agricultural use factor (F2) were obtained in the middle of the basin, where agriculture is the most important economic activity. Low scores were monitored in the west part.
• Factor 3: Low and high scores of the organic pollution factor (F3) were distributed throughout the basin, because F3 depends on point pollution sources and is affected by the infrastructure (sewage network and treatment plants) of the settlements.

Conclusions
The factors indicative of water quality in different hydrological periods and locations differed in the Buyuk Menderes Basin. Under high-flow conditions pollutants mainly originated from urban land use, and 37.3% of total variance was explained by the urban land-use factor. On the other hand, water quality was controlled by agricultural pollutant sources during the low-flow period. Although the agricultural use factor explained 38.2% of the variance, for the urban land-use factor it was only 28.5% under dry weather conditions. So, the major pollutant source changed from urban land uses to agricultural uses during the low-flow period. The main reason for this change was the negative effect of runoff on surface water quality, because the storage ability and buffering capacity for rain or stormwater in urban areas, which are covered by roads and buildings, had been drastically weakened. Thus, the major pollution threats in low- and high-flow periods were urban and agricultural land uses, which are defined as nonpoint pollution sources. Therefore priority should be given to minimisation of these sources to improve water quality in the basin.
This study shows that factor analysis is a useful method that could assist decision makers in determining the extent of pollution via practical pollution indicators.It could also provide a crude guideline for selecting the priorities of possible preventative measures in the proper management of the surface water resources of the basin (Boyacioglu et al., 2004).

Figure 2. The loading plot of factor scores in the low-flow period

Figure 3. The loading plot of factor scores during the high-flow period