Location and characterisation of pollution sites by principal component analysis of trace contaminants in a slightly polluted seasonal river: a case study of the Arenales River (Salta, Argentina)

Principal component analysis (PCA) was used to deduce the common origin of trace contaminants in a slightly contaminated, strongly seasonal river of low-average discharge, aiming to ascertain the type of the pollution. Splitting of data into categories according to specific conductance was essential to reach conclusions. Dry-season data allowed the pinpointing of polluting sites by means of the biplots resulting from the representation of the scores on the components. Concentrations corresponding to the wet seasons yielded no useful results probably due to the high percentage of data below detection limits for 2 of the 6 variables. The Arenales River in North-West Argentina was monitored by means of 19 sampling campaigns between 2003 and 2005 comprising two hydrological cycles, at seven locations along a 25 km section of the river course across the city of Salta. Pollution of the river was not severe, overall mean values in μg/l being: As 1.2; B 490; Cu 4; Fe 92; Pb 13; Zn 83. Simple correlation analysis revealed no significant correlation between these elements. The high positive loadings of variables B and As concentrations on the first principal component and the biplots indicate that their main common point sources are boron mineral deposits still existing in the urban area. Interpretation of the biplots shows that Cu, Fe and Zn contamination also originated at point sources, the contribution of the sewage treatment plant being negligible.


Introduction
River water quality monitoring is mandatory in present-day society, especially for rivers affected by urban effluents. The registration of their physico-chemical characteristics and of the concentration of their main components, as well as of any trace elements present, is recommended to establish the level of contamination, the efficiency of the wastewater treatment facility when it exists, and the degree of recovery of the river water quality.
Even if the river water is not used as a source of drinking water, pollution with microbes and organic and inorganic substances can pose a health hazard to water biota and to humans as well. When the river water is used for irrigation, even low concentrations of certain elements like B, Cu, Fe and Zn can produce drastic effects on the yields because, though needed for the normal development of plants, they can be poisonous when present at certain concentration levels. Abundant literature can be found on the determination of the concentration levels of numerous inorganic substances in river water, from the Mississippi (Meybeck et al., 1989) through the Danube (Marjanovic et al., 1985) and to the Huanghe (Zhang et al., 1993), since the first half of the 20th century. In some instances the mineral contents reflect the main composition of the river basin (De Villiers, 2005), but when abnormally high concentrations are found, the origin of the pollution is identified so as to be in a position to take adequate protective action for the environment (Schmitz et al., 1994).
Data produced by the monitoring of river water contamination with heavy metals and other elements in trace level concentration are generally used to assess water quality according to concentration limits established by environmental protection authorities. But the sources of the different elements are not always clear, the hydrological system constituting a complicated background. Statistical analysis is essential for proper interpretation of these time- and location-dependent data, to establish correlations not only among the data, but between these and the geological and climatic parameters as well, so as to characterise the water system. When concentrations of a relatively high number of elements are recorded, factorial analysis is used to reduce the number of variables and to highlight any relationships among them.
Principal component analysis (PCA) is proving to be a valuable tool to establish the hydrochemistry of rivers, as shown in extensive studies carried out by Simeonov et al. (2003) in Greece; to typify pollution sources of surface waters (Wunderlin et al., 2001); or coastal seawaters (Morales et al., 1999); to study the temporal variation of groundwater (Helena et al., 2000); and for establishing water quality (Praus, 2005).
Trace element concentrations in a highly seasonal river at 7 sampling sites during 2 hydrological cycles were analysed to assess the nature and the origin of the pollution. However, the high background noise provided by the low concentrations involved (less than 1 mg/ℓ, except for B), the sporadic nature of the pollution at most of the sites, and the ample variations of the river flow hindered the simple statistical analysis by correlation coefficients. Even PCA did not yield a clear picture until the data were separated into two groups, not according to rainfall records, but to specific conductivity. Data corresponding to the dry seasons defined in this way proved to be useful to establish correlations among the polluting elements, and to point out the probable origin of the contamination along the river course. 'QUIMIO', easy-to-use software created for statistical calculations in chemical analyses (Cela, 1994), was used to process the data.

The monitoring area
The city of Salta, with 500 000 inhabitants, lies at the northern extreme of the Lerma Valley, limited by the Andean cordillera in the West and the Mojotoro hills in the East, in north-west Argentina (Fig. 1).
The Valley is characterised by its proximity to the Tropic of Capricorn and by its elevation of about 1 100 m a.s.l. The climate is, in consequence, of a semi-desert nature, with a short rainy season in summer. The water flow of the Arenales River, which varies quite significantly naturally through a hydrological cycle (8.8 to 284 m³/s), has been so drastically altered upstream by its intensive use for irrigation that by the end of the dry season there is practically no surface flow when it enters the western outskirts of the city. The flow gradually increases when the river course changes due south, by inflows from surfacing aquifers at the eastern border of the Valley, and by the effluents of the city's wastewater treatment plant. Consequently the water quality declines significantly between April and November, B being the main contaminant. This anthropogenic trace component has been introduced to the Lerma Valley from the Andean plateau as raw material used by boron mineral-processing plants, and a decade ago constituted a very serious contamination of the water system (Lomniczi et al., 1997 and 1999), including the reservoir General Belgrano of 3 130 hm³, fed mainly by the Arenales River.
Following the introduction of strict environmental protection laws the concentration of B has declined significantly (from a median of up to 3.90 mg/ℓ between 1991 and 1994 to 0.38 mg/ℓ in 2005), but it still frequently presents high values (over the 0.5 mg/ℓ maximum limit for irrigation water), the origin of which must be ascertained. Fish malformation was reported by news media in the river and the reservoir, so heavy metal concentration also had to be monitored. According to the only previous study of the river, sampled on 3 occasions at 7 sites of a 75 km section (including 2 sampling sites 50 km upstream from Salta city) in 1998 to 1999, Cu, Pb and Zn concentrations increased in the urban environment to values considered toxic to water biota (Musso, 2003). The two most probable sites of contamination seemed to be the industrial park and the inflow from the sewage-processing plant. More frequent monitoring of the concentrations of these, as well as of other elements, at an increased number of sites in the urban environment was needed to discern the origin of the pollution.
Sampling sites (Fig. 1) were selected according to suspected inflow of different kinds of effluents. Site No 1 lies on the Arias River, the only tributary of the Arenales River in the area under study, without suspected raw sewage discharges. This site could represent uncontaminated conditions, as according to former studies the natural content of the monitored trace elements is very similar to that of the nearest unpolluted site of the Arenales River, 50 km upstream of the city of Salta (Musso, 2003). Site No 7 lies 12 km south of the city, so that the quality of the river water here would provide a measure of the degree of its self-recovery from the urban pollution with trace elements.

Experimental
Water samples were collected between September 2003 and March 2005 on 19 occasions, in 5 ℓ high-density polyethylene containers, previously washed with AR quality HNO3 and rinsed afterwards with water distilled over glass, and delivered to the laboratory on the day they were collected. They were filtered through glass-fibre filter paper (Whatman 934-AH or similar) and preserved with HNO3, according to Standard Methods (1992). Total concentrations of Cu, Fe, Pb and Zn, in samples belonging to the 2003 sampling campaigns, were determined by Flame Atomic Absorption Spectrometry (FAAS) after a ten-fold preconcentration by extraction with methyl-isobutylketone and ammonium pyrrolidine-dithiocarbamate at pH 3.5 (regulated with sodium citrate/citric acid buffer solution). For the 2004 and 2005 sampling campaigns expanded instrumental response by FAAS to Cu, Fe, and Zn was applied to the original samples, once this method was proved to be able to yield repeatability (%RSD) of the same order by performing 12 instead of 5 absorbance readings. At the same time Pb determination was shifted to electrothermal graphite furnace AAS (ETAAS). FAAS preceded by continuous-flow hydride generation with sodium borohydride was used as analytical technique for the determination of As, while B concentration was determined by molecular absorption spectroscopy with azomethine-H, the detection limit being 0.02 mg B/ℓ, as established in previous studies (Lomniczi et al., 1995). Table 1 shows typical numerical values of the detection limit (DL) expressed in µg/ℓ, calculated as 3 times the standard deviation of the concentration of reagent blanks. Repeatability of the analyses was assessed as percentage relative standard deviation (%RSD) of duplicates of samples systematically prepared with each batch of determination. Typical values of %RSD are listed in Table 1. Analyses of standard solutions as well as of blanks taken to the sampling sites in collector vessels of the type used for sampling assured that contamination during transport, as well as loss by adsorption on the sampling vessels, were negligible for all trace elements considered. Data were handled according to analytical quality assurance procedures (Bartram et al., 1996). Accuracy was controlled by spiking several samples of every batch under analysis, mean recovery for spiked samples being 83 to 117%. Validation requirements of trace-metal analyses are satisfied by external evaluation through national inter-laboratory comparative exercises practiced every 2 years since 1996.
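The quality-control figures described above (DL as 3 times the standard deviation of blanks, %RSD of replicates, and spike recovery) can be illustrated with a minimal sketch. The function names and all numerical values below are hypothetical and serve only as an illustration; they are not the measured data of this study.

```python
import statistics


def detection_limit(blank_concs):
    """DL taken as 3 times the sample standard deviation of blank concentrations."""
    return 3.0 * statistics.stdev(blank_concs)


def percent_rsd(replicates):
    """Percentage relative standard deviation of replicate determinations."""
    return 100.0 * statistics.stdev(replicates) / statistics.mean(replicates)


def spike_recovery(spiked, unspiked, added):
    """Percentage recovery of a known amount added to a sample."""
    return 100.0 * (spiked - unspiked) / added


# Hypothetical values in µg/l, for illustration only
blanks = [0.8, 1.2, 1.0, 0.9, 1.1]
duplicates = [82.0, 84.0]

dl = detection_limit(blanks)
rsd = percent_rsd(duplicates)
rec = spike_recovery(spiked=130.0, unspiked=83.0, added=50.0)
print(f"DL = {dl:.2f} µg/l, %RSD = {rsd:.1f}, recovery = {rec:.0f}%")
```

A recovery computed in this way would be compared with the 83 to 117% range reported for the spiked samples.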

Results and discussion
General mean concentrations of Cu, Pb and Zn in the Arenales River are high compared to maximum levels tolerated by National Environmental Law 24.051 for water biota protection, while that of B is very near the 500 µg/ℓ maximum recommended limit for irrigation water, confirming the persistence of the pollution previously detected (Musso, 2003) (Table 2). Variation of the concentrations along the river course through the city should point out the sites of clandestine sporadic inflow of urban waste, the efficiency of the local wastewater treatment system in eliminating them from the sewage, and the capacity for self-recovery of the river regarding these contaminants.
All data distributions were tested for normality, concentrations below detection limits considered as zero. The cumulative frequency distributions were compared with the Gaussian cumulative distributions corresponding to the average and standard deviation values of the data. Correlation coefficients between these were significant in every instance, the t value calculated according to Fisher (Sánchez de Peña, 1997) several times higher than the tabulated value of 2, so that r² values could be considered statistically significant at 0.95 confidence level. Consequently frequency distributions of all trace elements can be considered normal.
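A minimal sketch of such a normality check is given below. It assumes that the t value is obtained from the usual expression t = r·sqrt(n − 2)/sqrt(1 − r²) for a correlation coefficient r over n points, and that the cumulative frequencies are assigned with (i − 0.5)/n plotting positions; both are assumptions, since the text does not specify them. The data are hypothetical.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic of a correlation coefficient r over n points (assumed formula)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


def normality_check(values):
    """Correlate the empirical cumulative frequencies with the Gaussian CDF."""
    data = sorted(values)
    n = len(data)
    # Plotting positions (i - 0.5)/n for i = 1..n: an assumption of this sketch
    empirical = [(i + 0.5) / n for i in range(n)]
    model = statistics.NormalDist(statistics.mean(data), statistics.stdev(data))
    gaussian = [model.cdf(v) for v in data]
    r = pearson_r(empirical, gaussian)
    return r, t_from_r(r, n)


# Hypothetical concentrations, for illustration only
values = [2.1, 3.4, 3.9, 4.2, 4.8, 5.0, 5.3, 5.9, 6.6, 7.8]
r, t = normality_check(values)
print(f"r = {r:.3f}, t = {t:.1f}")
```

A t value well above the tabulated value of about 2 would then be read as support for treating the distribution as normal.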
Preliminary examination of the data by means of correlation plots could not detect any correlation between the trace elements and climatologic or hydrologic data, nor between the elements, not even in the case of the As-B couple. Arsenic generally accompanies B as a trace component in the Andean minerals, so that a definite correlation could be expected to exist between the concentrations of these two elements. In spite of this, the Pearson correlation coefficient is very low (r² = 0.31). A similar behaviour of trace element concentrations has been observed by Helena et al. (2000). But PCA of their time- and site-dependent concentrations should provide information regarding the type and location of the still existing pollution sources, while Cu, Fe, Pb and Zn could be introduced by unauthorised sewage dumping into draining canals as well as by leaching of contaminated soil by rain, or by airborne particles. Nevertheless, PCA applied to 720 data revealed no clear associations between the contaminants nor did it show any grouping of the sampling sites.
While the area is affected by a sharp separation between dry and rainy seasons, which evidently has a strong influence on the concentrations, the splitting of data into two groups proved to be difficult. The best parameter for establishing limits between data corresponding to different seasons should be the river flow, but data were not available for all sampling points. Surface velocity values could not be correlated with the existing flow data, so it was felt that they did not represent the flow fluctuations. When concentration data of each trace element were distributed into two groups according to rainfall data for the sampling day, no statistically meaningful correlation could be obtained. This can be due to the fact that precipitation data are registered at a location 10 km to the west of the monitored river section, so that it frequently does not coincide with the actual situation in the city. Accumulated rainfall data for the 5 d (or 10 d) preceding sampling proved to be only slightly better, Spearman correlation coefficients for Cu and B with rainfall being the only ones to give higher t values than the tabulated one for a 0.995 confidence level. Plots of mean specific conductivity at the 6 sampling sites on the Arenales River vs. sampling date were finally taken into account to define time limits between seasons (Fig. 2).
Specific conductivity data distributed into two categories according to this criterion show the notoriously different behaviour of the water system according to the two seasons (Fig. 3), which could explain the impossibility of finding any correlations when applying PCA to the complete set of data. In the dry seasons conductivity suddenly increases at Site 3, an indication of the existence of illegal sewage discharge. In the wet seasons, on the other hand, conductivity remains low, the sudden rise corresponding to sampling Site 7, 12 km downstream from the urban environment, a sign of inorganic, diffuse type of pollution. Concentration data distributed following this parameter produced Spearman's correlation coefficients with t values higher than the tabulated one for 0.995 confidence level for all elements except Pb. Concentration ranges for dry and wet seasons defined according to this criterion are presented in Tables 3 and 4.
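The following sketch illustrates the kind of calculation involved: samples are assigned to a season and a Spearman rank correlation coefficient is computed within one season. It is an illustration under stated assumptions, not the procedure actually followed: a single conductivity threshold stands in for the time limits read from the conductivity plots, the helper names are hypothetical, and the numbers are invented.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman coefficient computed as the Pearson coefficient of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


def split_by_conductivity(samples, threshold):
    """Hypothetical season split: high conductivity -> dry, low -> wet."""
    dry = [s for s in samples if s["cond"] >= threshold]
    wet = [s for s in samples if s["cond"] < threshold]
    return dry, wet


# Invented values (conductivity in µS/cm, B in mg/l), for illustration only
samples = [
    {"cond": 620, "B": 0.71}, {"cond": 580, "B": 0.55},
    {"cond": 540, "B": 0.60}, {"cond": 510, "B": 0.48},
    {"cond": 300, "B": 0.20}, {"cond": 260, "B": 0.25},
    {"cond": 240, "B": 0.12}, {"cond": 210, "B": 0.15},
]
dry, wet = split_by_conductivity(samples, threshold=400)
rho_dry = spearman_rho([s["cond"] for s in dry], [s["B"] for s in dry])
print(f"dry-season samples: {len(dry)}, Spearman rho = {rho_dry:.2f}")
```

The significance of each coefficient would then be judged from its t value against the tabulated one, as described in the text.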

B is the only element with mean concentration roughly following mean specific conductivity when represented vs. sampling campaign as well as vs. sampling sites, presenting maximum values at sites 2, 4 and 5.
The 360 data for each season were normalised and then subjected to PCA. The covariance matrix for the dry seasons shows a correlation between B and As concentrations (with a coefficient of 0.51). According to the matrix for the wet seasons, correlation was only observed between Cu and Pb concentrations. For each season, the first 3 eigenvalues explained more than 70% of the system's variance, being very similar in magnitude.
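The steps of normalisation, covariance matrix and explained variance can be sketched as follows. The sketch assumes that "normalised" means autoscaling (zero mean, unit standard deviation per variable), in which case the covariance matrix coincides with the correlation matrix. Eigenvalues are obtained here by power iteration with deflation, chosen only to keep the example free of external libraries; it is not the algorithm of the software used in the study. The data table is invented.

```python
import math


def autoscale(rows):
    """Centre each column and divide it by its sample standard deviation."""
    n, p = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in rows]


def covariance_matrix(rows):
    """Sample covariance matrix of the columns."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
             for j in range(p)] for i in range(p)]


def leading_eigenvalues(matrix, k, iterations=500):
    """Largest k eigenvalues of a symmetric matrix (power iteration, deflation)."""
    a = [row[:] for row in matrix]
    p = len(a)
    eigenvalues = []
    for _component in range(k):
        v = [float(i + 1) for i in range(p)]  # asymmetric starting vector
        for _step in range(iterations):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        av = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
        vv = sum(x * x for x in v)
        lam = sum(v[i] * av[i] for i in range(p)) / vv  # Rayleigh quotient
        eigenvalues.append(lam)
        for i in range(p):  # remove the component just found
            for j in range(p):
                a[i][j] -= lam * v[i] * v[j] / vv
    return eigenvalues


# Invented table (rows: objects; columns: B, As, Cu), for illustration only
data = [
    [0.71, 1.9, 3.0],
    [0.55, 1.4, 5.0],
    [0.60, 1.6, 2.0],
    [0.30, 0.9, 6.0],
    [0.25, 1.1, 4.0],
]
scaled = autoscale(data)
corr = covariance_matrix(scaled)
eigenvalues = leading_eigenvalues(corr, k=2)
explained = sum(eigenvalues) / sum(corr[i][i] for i in range(len(corr)))
print(f"fraction of variance explained by the first 2 components: {explained:.2f}")
```

With 6 variables and 3 retained components, the same ratio would correspond to the proportion of variance quoted in the text.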
The first 3 PCs for both seasons combine all 6 variables, so none of them can be discarded. To achieve a sharper separation of the variables, as well as a better clustering of the objects, the first 3 PCs were subjected to rotation. The software 'QUIMIO' offers the Oblimin method for rotation, a method with the possibility of orthogonal or oblique rotation, according to the value selected for the γ parameter. With γ = 0 oblique rotations are performed by iterative calculations (constituting the Quartimin method) without the requirement of non-correlation of the rotated components.
Composition of the rotated PCs can be observed in Fig. 5. Separation of the variables is quite clear in the dry seasons, coupling As with B in the first PC and separating these two from the heavier elements. The 60 objects (sets of the concentrations of the 6 elements corresponding to a given sampling site and date), when weighted by the inverse of the square root of their communality, form definite clusters on the biplots (Fig. 6).
In dry weather, when there is no possibility of leaching or draining from diffuse contamination sources, the clustering of the sampling dates according to sampling points, with high scores on a rotated principal component, can be considered as a sign of the existence of point sources of contamination especially rich in the trace elements with high loadings on the component. The origin of these contamination sources could be illegal sewage discharges as well as the existence of contaminant-containing sediments at the bottom of the river.
On the Mahalanobis biplots (Fig. 6 a and c) all the objects of Site 1 are clustered at the quarter opposite the vectors of all trace-element concentrations so that this site can be considered uncontaminated by the 6 elements.
Most of the objects corresponding to Site 2 are found along R1 in Fig. 6 (a) and (b), an indication that the site is contaminated mainly with B and As, that is to say, with boron minerals. The only possible source of contamination here is Canal A located upstream, into which a boron-processing plant used to flush its effluents before it was closed down.
Site 3 presents most of its objects arranged at negative values of R1, so that there is no B and/or As contamination at this site. As these elements are highly soluble, and Site 2 was found to be polluted with them, a drainage entrance between the 2 sites must exist which produces their dilution. The rise of the specific conductivity at Site 3 (Fig. 3a) points to sewage of domiciliary origin.
Sites 4 and 5 have their objects aligned along positive values of R1 in Fig. 6 (a) and (b), and at high values of R2 in Fig. 6 (c). This means a new inflow of B and As, along with Zn, Cu and Fe. Probable sources of this contamination are Canals B and C. As the objects of Site 5 do not score higher on the rotated axes than those of Site 4, there is no evidence of raw effluents of the Industrial Park entering the river.
Objects belonging to Site 6 are scattered at high as well as at low values of R2, but they score low on R1: occasionally Cu and Zn concentrations are high at this site, but B and As are not important here as contaminants.This means that the effluents of the city sewage treatment plant most of the time dilute the polluting elements existing at Site 5, and do not contain B (or As), as can be expected from predominantly household effluents.
Most of the objects of Site 7 score low on R1, and on R2 as well, but not always on R3.This means that the river seems to have recovered from the B and As contamination, but occasionally still scores high in the quarter containing Cu and Zn.The rise of specific conductivity at this site, during both seasons (Fig. 3), indicates the concurrence of pollution of the point as well as of the diffuse type.
The loadings of the concentrations of the trace elements on the first three rotated components found for the wet seasons are different from those for the dry seasons (Fig. 5) but no clear conclusions could be derived from the biplots, the objects showing no clustering. This can be ascribed to the unfavourable relation between concentration variation and noise, due to the remarkable dilution produced by the heavy rains, and also to the correlations existing between the original components which persist on the rotated ones. Lower DLs for the elements are needed to derive conclusions from the data system.

Conclusions
PCA of trace element concentrations was used to provide answers not only to the location of pollution sources in the Arenales River, but also to their type. Accurate splitting of data into sets of wet and dry seasons, before applying this statistical method, was essential to their interpretation because of the characteristics of the river: low-level contamination, highly seasonal flow and several suspected intermittent pollution sources. While the entire set of data neither led to separation of the variables nor to any noticeable grouping of the objects, once split into two groups according to the cyclic variation of specific conductance, conclusions could be drawn from those corresponding to the dry seasons. Data belonging to the wet seasons behaved in the same way as the entire data population, confirming that the low concentration of the pollutants compared to the DLs of the analytical techniques employed caused the failure of PCA to yield meaningful results. Data of the dry seasons, on the contrary, were found to be useful to characterise the pollution of the river with trace elements. Two of the variables, B and As concentrations, were coupled on the first rotated component. This association allowed their common origin to be ascertained as boron minerals. The biplots also confirmed the previously unsuspected existence of two other point-type sources of contamination contributing mainly Cu, Fe and Zn to the river water. The distribution of the objects on the biplots also led to the conclusion that the municipal sewage treatment plant effluent does not affect the quality of the river water. No conclusive evidence as to the recovery of the river from urban pollution at the last sampling site could be derived from the biplots.

Acknowledgements
Funding for this research was provided by the Consejo de Investigación de la Universidad Nacional de Salta (Argentina). The authors want to thank Dr José Ávila Blas for controlling the statistical treatment of data.

Figure 1 Map of the sampling sites on the Arenales River and the city of Salta. Numbers increase according to the river flow direction.

Figure 3 Variation of specific conductivity along the downstream sampling line with data separated into two seasons: a) dry seasons b) wet seasons. Box sizes give the average values of two consecutive seasons of the same type and whiskers represent the general average with the average standard deviation of the data in the two seasons.
Available on website http://www.wrc.org.za ISSN 0378-4738 = Water SA Vol. 33 No. 4 July 2007 ISSN 1816-7950 = Water SA (on-line)
Mean concentration of As follows the behaviour of B concentration (maximum values at sites 2, 4 and 5), but it shows a sudden rise at the end of the 2003 dry season which spoils their simple linear correlation (Fig. 4).

Figure 4 Mean B and As concentrations (in mg/ℓ and µg/ℓ respectively), and mean specific conductivity (in µS/cm) vs. sampling campaign (a) and sampling site (b)
Figure 6 Mahalanobis biplot representation of the scores of the objects corresponding to the dry seasons on the rotated principal components. Loadings of trace element concentrations on the components are represented by vectors. Groups of sampling sites are highlighted in black frames.

Table 2 Overall mean concentration of trace elements in the Arenales River and maximum acceptable values recommended by Argentine National Environmental Law Nº 24.051