ANN-based sediment yield models for Vamsadhara

The widely accepted feed-forward error back-propagation artificial neural network approach, supported by batch- and pattern-learning, was used to develop daily, weekly, ten-daily and monthly sediment yield models for the Vamsadhara River basin of India. The fast gradient descent technique, improved with a variable learning rate (α) and a momentum term (β), was used for optimisation. Two criteria were adopted to terminate learning during the updating of weights: a pre-decided high number of iterations, and generalisation of the model through cross-validation. In all cases of model formulation, each variable was normalised by the maximum value of its own series. The pattern-learned models were found superior to the batch-learned models. A high number of iterations reduced the value of the objective function but led to over-learning, which is reflected by an increase in performance in calibration and a decrease in performance in cross-validation. The generalised pattern-learned models for the different time scales were compared with linear transfer function models; the pattern-learned models generalised through cross-validation were superior in general, except at the weekly time scale for the study area.


Introduction
Since the 1930s, numerous linear and non-linear hydrological models have been developed to simulate and forecast various hydrological processes and variables. These models have continuously been improved by introducing new tools to simulate the processes more effectively. Hydrological models can broadly be classified into regression models, stochastic models, conceptual or parametric models, and system models.
Regression models are either regression or correlation based: they relate the input(s) and output(s) of a process through a linear or non-linear relationship whose constants are estimated from data. A few widely accepted hydrological models in this category are the USLE, MUSLE and Elwell models. Stochastic models normally extract the statistical properties of a time series and propagate these properties during prediction. Such models normally require a long time series, and their extrapolation properties are poor. Some of the widely used stochastic models in hydrological studies are the auto-regressive, auto-regressive moving average, auto-regressive integrated moving average and seasonal auto-regressive integrated moving average models. Conceptual models are designed to approximate, within their structure, the general internal physical sub-processes. They usually incorporate simplified forms of physical laws, or a series of physical laws, to represent the transformation of input to output. These representations may be linear or non-linear, time variant or time invariant, lumped or distributed, causal or non-causal, and dynamic or memoryless. Among the most widely used conceptual models in hydrology are the Sacramento Soil Moisture Accounting (SAC-SMA) model of the U.S. National Weather Service, the HEC models of the U.S. Army Corps of Engineers, the Stanford Watershed Model and the Système Hydrologique Européen (SHE) model.
An artificial neural network (ANN) is a relatively new soft computing technique composed of densely interconnected processing nodes, which can extract and store information from a few patterns (data) through learning during training. The ANN architecture parallels the way information is processed in neuro-computing (Vemuri, 1992). The model is easy to develop, yields satisfactory results when applied to complex systems that are poorly defined or only implicitly understood, and is more tolerant of variable, incomplete or ambiguous input data. Hydrologic applications of ANN include the modelling of the daily rainfall-runoff-sediment yield process and the snow-rainfall process, assessment of a stream's ecological and hydrological responses to climate change, rainfall-runoff forecasting, groundwater quality prediction and groundwater remediation. ASCE (2000a;b), Jagadeesh (2000), Tokar (2000) and Rajurkar (2002), among others, provided a good overview of the application of ANN to rainfall-runoff simulation and prediction. Imrie (2000) improved generalisation by adding a guidance system to the cascade-correlation learning architecture, and improved extrapolation properties using an activation function. Wilby (2003) was able to interpret the internal behaviour of an ANN-based rainfall-runoff model. To this end, all nodes other than the hidden nodes were deleted and the result was compared with the state variables and internal fluxes. Danh (1999) and Elshorbagy (2000) proposed feed-forward error back-propagation artificial neural network (BPANN) models for runoff forecasting using a fixed stopping criterion and independent variables, respectively, and compared their performance with the available conceptual models. The works of Thirumalaiah (2000), Xu (2002), Birikundavyi (2002), Shivakumar (2002), Cigizoglu (2003) and Xiong (2002), among others, are notable for real-time forecasting of runoff. Other ANN applications include the derivation of the unit hydrograph (Lange, 1998). Only a few studies (Tayfur, 2002; Nagy, 2002; Cigizoglu, 2004) have focused on ANN-based event sediment yield modelling and sediment concentration.
Fast gradient descent is probably the most widely used supervised algorithm for optimising the error function of a multilayer feed-forward artificial neural network, and it is therefore adopted in the present study. In model building, the convergence of error is normally linked to a pre-decided tolerance value, such as a minimum error, a minimum error gradient or a high number of iterations. Using a pre-decided tolerance value to stop convergence may cause the model to under- or over-learn (Fu, 1996). Cross-validation improves the generalisation of the network and obviates under- or over-learning; however, it requires intensive computation and demands a large data set and computer memory. The objective of this paper is therefore to develop generalised batch- and pattern-learned BPANN-based sediment yield models, using both a high number of iterations and cross-validation as criteria to terminate the process of learning. These models are simultaneously subjected to network pruning to achieve parsimony, and are compared with linear transfer function (LTF) models using daily, weekly, ten-daily and monthly data of the Vamsadhara River basin in India.

Linear transfer function model
Linear transfer functions (LTFs) are time-dependent regression models with simple mathematics, requiring minimum input and little computation, and yielding results of the desired accuracy (Johnston, 1972). For a lumped linear system, two or more time-dependent observations are linked through Eq. (1), where:
S_t is a dependent observation
R_t and Q_t are independent observations
p, q and r are the time responses
a_j, b_j and c_j are parameters associated with the j-th variable.
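A general form consistent with these definitions can be written as below. It is offered as a sketch rather than as the exact expression used in the study: the lag at which rainfall and runoff first enter (here t − j + 1, so that the current value is included) and the additive error term e_t are assumptions.

```latex
S_t = \sum_{j=1}^{p} a_j \, S_{t-j}
    + \sum_{j=1}^{q} b_j \, R_{t-j+1}
    + \sum_{j=1}^{r} c_j \, Q_{t-j+1}
    + e_t
```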
The least squares method can be used to solve the set of t linear equations for the parameters. Representing [a_1, a_2, …, a_p, b_1, b_2, …, b_q, c_1, c_2, …, c_r]^T as Ĥ, the variance of the time response, Var(Ĥ), is given by Eq. (2) (Johnston, 1972), where A is the input matrix and σ² is the variance of the error term (e_t), expressed by Eq. (3). Var(Ĥ) supports the parsimonious selection of time responses by comparing each parameter with its respective standard error (Eq. 4). For a parsimonious selection, the initially selected time response is increased one at a time, and if the resulting Ĥ is less than the respective standard error, √Var(Ĥ), the time response is decreased. The time response finally obtained represents the number of successive past events of the variable that affect the output (Johnston, 1972).
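The least-squares step and the standard-error check can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names (`build_rows`, `invert`, `fit_ltf`), the lag convention for rainfall and runoff, and the use of the ordinary least-squares estimates Var(Ĥ) = σ²(AᵀA)⁻¹ with σ² taken as the residual sum of squares divided by the degrees of freedom are all assumptions.

```python
def build_rows(S, R, Q, p, q, r):
    """Build the input matrix A and the target vector for an LTF model.

    Assumed lag convention: S enters with lags 1..p, while R and Q enter
    with lags 0..q-1 and 0..r-1 (the current value is included).
    """
    start = max(p, q - 1, r - 1, 0)
    rows, targets = [], []
    for t in range(start, len(S)):
        row = [S[t - j] for j in range(1, p + 1)]
        row += [R[t - j + 1] for j in range(1, q + 1)]
        row += [Q[t - j + 1] for j in range(1, r + 1)]
        rows.append(row)
        targets.append(S[t])
    return rows, targets


def invert(M):
    """Gauss-Jordan inverse (partial pivoting) of a small square matrix."""
    n = len(M)
    aug = [list(M[i]) + [1.0 if i == j else 0.0 for j in range(n)]
           for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("input matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for i in range(n):
            if i != col:
                f = aug[i][col]
                aug[i] = [vi - f * vc for vi, vc in zip(aug[i], aug[col])]
    return [row[n:] for row in aug]


def fit_ltf(rows, targets):
    """Return least-squares parameters H and their standard errors."""
    n, m = len(rows), len(rows[0])
    ata = [[sum(row[i] * row[j] for row in rows) for j in range(m)]
           for i in range(m)]
    aty = [sum(row[i] * y for row, y in zip(rows, targets)) for i in range(m)]
    ata_inv = invert(ata)
    h = [sum(ata_inv[i][j] * aty[j] for j in range(m)) for i in range(m)]
    resid = [y - sum(hi * xi for hi, xi in zip(h, row))
             for row, y in zip(rows, targets)]
    sigma2 = sum(e * e for e in resid) / (n - m)   # variance of error term
    se = [max(0.0, sigma2 * ata_inv[i][i]) ** 0.5 for i in range(m)]
    return h, se
```

A parameter whose absolute value is smaller than its standard error is a candidate for removal, which corresponds to reducing the time response of that variable by one and re-fitting.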

Artificial neural network
In a multilayer back-propagation artificial neural network (BPANN), the nodes of the input layer receive the input data, process them and pass the output to the nodes of the subsequent hidden layer(s), and from the last hidden layer to the output layer. The nodes in the input, hidden and output layers are indexed by j, i and k, respectively. A particular structure with two input nodes, four hidden nodes and one output node is shown in Fig. 1.

Figure 1 Structure (2, 4, 1) and notations of a multilayer BPANN, showing the nodes in the input layer (j), hidden layer (i) and output layer (k) and the connecting weights between them

In a feed-forward BPANN scheme, the nodes of the input layer receive the normalised data set (input). The weighted sum corresponding to each node of the next layer is calculated and passed to the next layer, usually through a sigmoid activation function. The error (E) calculated at the output is propagated back to the hidden layer(s) and finally to the input layer by updating the weights of interconnection. The error (E) is defined by Eq. (5), where:
d(k) is the observed output at the k-th node of the output layer
O(k) is the estimated output at the k-th node of the output layer.
The weights are updated in every iteration using Eq. (6). The speed of convergence is normally increased by introducing a momentum term β, which carries over the effect of the previous weight change (Eq. 7). The change in weights (ΔW) in the direction of the negative gradient is given by Eq. (8), where α is the learning rate, such that 0 < α < 1, which governs the rate of change of weights.
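In standard back-propagation notation these four relations are commonly written as below. The block is a sketch under assumptions: the factor 1/2 in the error, the iteration index n and the exact placement of the momentum term follow the usual textbook convention and may differ in detail from the expressions used in the study.

```latex
\begin{aligned}
E &= \tfrac{1}{2} \sum_{k} \bigl( d_{(k)} - O_{(k)} \bigr)^{2} \\
W(n+1) &= W(n) + \Delta W(n) \\
W(n+1) &= W(n) + \Delta W(n) + \beta \, \Delta W(n-1) \\
\Delta W(n) &= -\alpha \, \frac{\partial E}{\partial W(n)}, \qquad 0 < \alpha < 1
\end{aligned}
```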
Model parsimony can be achieved through the Akaike information criterion (AIC) (Akaike, 1974), the Bayesian information criterion (BIC) (Rissanen, 1978) or by pruning of the network (Karnin, 1990). The AIC and BIC criteria use the root mean square error (RMSE) statistic, penalised for the number of free parameters, to decide the number of free parameters (Xu, 2002). For pruning, Karnin (1990) suggested using the sensitivity of the error, Se(ij), with respect to the weight W(ij) to eliminate the respective weight without excessive calculation. Se(ij) is defined by Eq. (9), and the sensitivity of the error with respect to the weight finally reduces to Eq. (10), where i and f indicate the initial and final values of the weights. Since the weight update is available for each iteration during learning, the summation of the squared changes in weight, ΔW(ij)², is the only extra computation required to estimate the sensitivity of the error function. A low sensitivity suggests that the respective weight is insensitive, and pruning of the corresponding node is recommended. The generalisation of the model can be checked using the available statistical evaluation criteria: root mean square error (RMSE), correlation coefficient (CC) and coefficient of efficiency (CE) (Nash and Sutcliffe, 1970). In cross-validation, the model is trained on the training data set and, at every iteration, simultaneously verified on another data set using the statistical evaluation criteria. Training is continued as long as the performance evaluation criteria show improvement in both training and verification. The method thus monitors the generalised performance and stops the process of learning when there is no further improvement in training and in the first verification. Since the performance of the developed model is checked at every iteration, the level of accuracy is not fixed at the beginning of model formulation.
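The three evaluation criteria can be computed as in the sketch below. The formulas are the usual definitions of RMSE, the Pearson correlation coefficient and the Nash-Sutcliffe coefficient of efficiency; expressing CE as a fraction rather than a percentage, and the function names, are assumptions of this example.

```python
import math


def rmse(obs, est):
    """Root mean square error between observed and estimated values."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def correlation_coefficient(obs, est):
    """Pearson correlation coefficient (CC)."""
    mo = sum(obs) / len(obs)
    me = sum(est) / len(est)
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_e = sum((e - me) ** 2 for e in est)
    return cov / math.sqrt(var_o * var_e)


def coefficient_of_efficiency(obs, est):
    """Nash-Sutcliffe coefficient of efficiency (CE), as a fraction."""
    mo = sum(obs) / len(obs)
    residual = sum((o - e) ** 2 for o, e in zip(obs, est))
    spread = sum((o - mo) ** 2 for o in obs)
    return 1.0 - residual / spread
```

A CE of 1 indicates a perfect fit, while a CE of 0 means the model is no better than predicting the observed mean.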

Study area and data
The selected Vamsadhara River basin up to Kashinagar (area = 7 820 km²) (Fig. 2) is located between the Mahanadi and Godavari River basins of India. The basin is narrow and highly undulating. Daily rainfall (mm), runoff (m³/s) and suspended sediment yield (kg/s) data for the active period (June 1 to October 31) of the years 1984 to 1989 and 1992 to 1995 were available. The weighted average rainfall was computed from the rainfall values of rain-gauge stations 1 to 6 (Fig. 2) with Thiessen weights of 0.2640, 0.1835, 0.2696, 0.1096, 0.1509 and 0.0224, respectively.
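The weighted average rainfall for one day follows directly from these weights, as in the short sketch below. The weights are those quoted above; the function name and the example station readings in the usage note are hypothetical.

```python
# Thiessen weights for rain-gauge stations 1 to 6, as quoted in the text.
THIESSEN_WEIGHTS = [0.2640, 0.1835, 0.2696, 0.1096, 0.1509, 0.0224]


def weighted_rainfall(station_rain_mm):
    """Thiessen-weighted average rainfall (mm) from six station values."""
    if len(station_rain_mm) != len(THIESSEN_WEIGHTS):
        raise ValueError("expected one rainfall value per station")
    return sum(w * r for w, r in zip(THIESSEN_WEIGHTS, station_rain_mm))
```

For example, `weighted_rainfall([10, 0, 0, 0, 0, 0])` returns about 2.64 mm, the contribution of station 1 alone.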
The collected daily rainfall (mm), runoff (m³/s) and sediment yield (kg/s) data of the active period for all the years were subjected to pre-analysis and used to formulate the databases for the other time units, i.e. weekly, ten-daily and monthly. The yearly weighted average rainfall, runoff, sediment yield, runoff-rainfall ratios and sediment-runoff ratios (sediment concentration) for the active monsoon period are shown in Table 1.
It can be seen that the runoff-rainfall ratio varied between 0.10 and 0.23 in the first four years and between 0.20 and 0.43 in the subsequent six years. This indicates that the catchment behaviour with respect to infiltration and other losses has changed, and that runoff corresponding to rainfall increased in the years 1988 to 1995. The weighted average rainfall and runoff showed no specific trend and no correlation between them.
The sediment concentration was found to decrease with time (Table 1). This change indicates that the catchment is improving with time in respect of soil conservation. The main reason for this could be the regular Jhum cultivation practices adopted by the tribes of these areas. On the other hand, it also shows that the data are not exactly from one homogeneous population.
Since the sediment concentration decreases rapidly over the years, a sediment model developed with the data of 1984 to 1987 and verified on the data of 1988 to 1989 and 1992 to 1995 may over-estimate the sediment yield for the verification years.
The daily data of the active monsoon period (June 1 to October 31) of the water years were used to determine the weekly, ten-daily and monthly data. The weekly data consisted of weekly summations starting from June 1; the last six days of October were not used in the weekly data. The ten-daily data were the summations of days 1 to 10, 11 to 20 and 21 to the last day (30 or 31) of each month. The monthly data were the summations of the data of all the days of the month. Thus, for each year, the data for the four time bases (daily, weekly, ten-daily and monthly) comprise 153 days, 21 weeks, 15 ten-daily periods and 5 months, respectively, and were used for the development of the respective simulation and forecasting models.
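The aggregation rules above can be sketched as follows for one active period of 153 daily values. The function names are hypothetical; the month lengths are those of June to October.

```python
# Days in June, July, August, September and October (153 in total).
MONTH_LENGTHS = [30, 31, 31, 30, 31]


def weekly_totals(daily):
    """Sum complete 7-day blocks from June 1; leftover days are dropped."""
    n_weeks = len(daily) // 7
    return [sum(daily[7 * w:7 * w + 7]) for w in range(n_weeks)]


def ten_daily_totals(daily):
    """Sum days 1-10, 11-20 and 21 to month end for each month."""
    totals, start = [], 0
    for days in MONTH_LENGTHS:
        month = daily[start:start + days]
        totals += [sum(month[:10]), sum(month[10:20]), sum(month[20:])]
        start += days
    return totals


def monthly_totals(daily):
    """Sum all days of each month."""
    totals, start = [], 0
    for days in MONTH_LENGTHS:
        totals.append(sum(daily[start:start + days]))
        start += days
    return totals
```

With 153 daily values these functions return 21 weekly, 15 ten-daily and 5 monthly totals, matching the counts stated in the text.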

Model development
The daily, weekly, ten-daily and monthly rainfall-runoff-sediment yield (RQS) models based on the LTF and multilayer BPANN approaches described above were developed using the data of 1984 to 1987, and generalised/verified on the data of 1988 to 1989 and 1992 to 1995, as follows.

LTF models
Linear transfer function (LTF) models were developed with an initial time response of two time units for each input variable. The time response of the respective variable was increased by one if the estimated Ĥ was greater than the respective standard error, √Var(Ĥ); otherwise the time response was decreased. The model was re-developed after each change in time response, and the finally chosen LTF models satisfying the criterion were reported through Eqs. (11) to (14). Here, it can be observed that the sediment yield from the daily model depends on its successive past sediment yield value, whereas for the weekly, ten-daily and monthly models it is independent of past values. The performance evaluation results of all the developed simulation models are given in Table 2. It can be seen that the model performance in calibration generally improves with an increase in the time scale, but this does not hold in verification. The performance in the second verification period (1992 to 1995) is better than in the first verification period (1988 to 1989).

ANN models
In the BPANN models, the same input variables were used as those obtained while developing the LTF models. All the input-output pairs of the data set were first normalised by the maximum value of the respective series, reducing each variable to the range 0 to 1 and avoiding any saturation effect that may arise from the use of the sigmoid activation function. In both batch- and pattern-learning, the initial values of the interconnecting weights were randomly selected between -0.5 and +0.5 (Dawson and Wilby, 1998), one of the recommended ranges of (-1.0, +1.0), (-0.5, +0.5) or (-0.1, +0.1) (Lorrai and Sechi, 1995; Dawson and Wilby, 1998). The sigmoid activation function was used in model development. The learning rate (α) and momentum term (β) were both initially set to 0.5 and were decreased in successive iterations. All the interconnecting weights were updated using the error of the input-output pairs. In batch-learning, the weights were adjusted only after processing all data sets for error, and the data set indicating the highest error was used for learning and updating of weights. This type of learning is fast, but requires more computation because it is governed by the data set indicating the highest error (Minns and Hall, 1996). Pattern-learning is governed by the error of each data set, and the interconnecting weights were adjusted as each data set was processed. Processing in pattern-learning was slow because the weights were improved continuously for each data set.
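The difference between the two learning modes can be illustrated with the small pure-Python sketch below. It is not the authors' implementation and rests on several labelled assumptions: bias terms are added to each layer, α and β are held constant at 0.5 rather than decreased, and the batch mode uses the conventional mean gradient over all patterns instead of the highest-error rule described above. All function names are hypothetical.

```python
import math
import random


def normalise(series):
    """Scale a positive series by its own maximum so values lie in 0..1."""
    peak = max(series)
    return [v / peak for v in series]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(w_hid, w_out, x):
    xb = x + [1.0]                                  # inputs plus bias (assumed)
    hb = [sigmoid(sum(w * v for w, v in zip(ws, xb))) for ws in w_hid] + [1.0]
    out = sigmoid(sum(w * v for w, v in zip(w_out, hb)))
    return xb, hb, out


def gradients(w_hid, w_out, x, d):
    """Gradient of E = 0.5 * (d - O)^2 for a single pattern."""
    xb, hb, out = forward(w_hid, w_out, x)
    delta_out = (out - d) * out * (1.0 - out)
    g_out = [delta_out * h for h in hb]
    g_hid = [[delta_out * w_out[i] * hb[i] * (1.0 - hb[i]) * v for v in xb]
             for i in range(len(w_hid))]
    return g_hid, g_out


def mse(w_hid, w_out, X, D):
    return sum((forward(w_hid, w_out, x)[2] - d) ** 2
               for x, d in zip(X, D)) / len(X)


def train(X, D, mode, n_hidden=4, epochs=500, alpha=0.5, beta=0.5, seed=1):
    rng = random.Random(seed)
    n_in = len(X[0])
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
             for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    prev_hid = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
    prev_out = [0.0] * (n_hidden + 1)

    def apply(g_hid, g_out):
        # Gradient step with momentum: dW(n) = -alpha * grad + beta * dW(n-1)
        for i in range(n_hidden):
            for j in range(n_in + 1):
                prev_hid[i][j] = -alpha * g_hid[i][j] + beta * prev_hid[i][j]
                w_hid[i][j] += prev_hid[i][j]
        for i in range(n_hidden + 1):
            prev_out[i] = -alpha * g_out[i] + beta * prev_out[i]
            w_out[i] += prev_out[i]

    for _ in range(epochs):
        if mode == "pattern":                       # update after every pattern
            for x, d in zip(X, D):
                apply(*gradients(w_hid, w_out, x, d))
        elif mode == "batch":                       # one update per full pass
            sum_hid = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
            sum_out = [0.0] * (n_hidden + 1)
            for x, d in zip(X, D):
                g_hid, g_out = gradients(w_hid, w_out, x, d)
                for i in range(n_hidden):
                    for j in range(n_in + 1):
                        sum_hid[i][j] += g_hid[i][j] / len(X)
                for i in range(n_hidden + 1):
                    sum_out[i] += g_out[i] / len(X)
            apply(sum_hid, sum_out)
        else:
            raise ValueError("mode must be 'pattern' or 'batch'")
    return w_hid, w_out
```

In this sketch, pattern mode performs one weight update per input-output pair, whereas batch mode performs a single update per pass through the data, which is why pattern-learning does more weight updates for the same number of passes.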
The number of input nodes in the input layer was taken equal to the number of input variables. Since no guideline is yet available on the number of hidden nodes in the hidden layer(s) (Vemuri, 1992), these were initially taken equal to twice the number of input nodes (Hipel et al., 1994) and increased one at a time, considering the improvement in generalisation and the pruning criteria described above. Corresponding to the single output, only one node was taken in the output layer. Thus, a three-layer network structure with a varying number of hidden nodes in the hidden layer was tried, and the finally selected ANN structures, along with the performance of the developed daily, weekly, ten-daily and monthly models for both pattern- and batch-learning, are listed in Table 3.
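The pruning criterion can be sketched as below, using the form of the sensitivity estimate usually attributed to Karnin (1990): the sum of squared weight changes, scaled by the final weight and divided by the learning rate times the net change in the weight. The constant learning rate, the function names and the dictionary layout of the training record are assumptions of this example.

```python
def karnin_sensitivity(weight_changes, w_initial, w_final, alpha):
    """Estimated sensitivity of the error to one weight.

    weight_changes: the change applied to this weight at each iteration.
    alpha: learning rate, assumed constant here for simplicity.
    """
    if w_final == w_initial:
        return 0.0                      # weight never moved; treat as insensitive
    sum_sq = sum(dw * dw for dw in weight_changes)
    return sum_sq * w_final / (alpha * (w_final - w_initial))


def rank_for_pruning(records, alpha):
    """Return weight names ordered from least to most sensitive.

    records maps a weight name to (weight_changes, w_initial, w_final).
    """
    scores = {name: karnin_sensitivity(ch, wi, wf, alpha)
              for name, (ch, wi, wf) in records.items()}
    return sorted(scores, key=scores.get)
```

Weights at the front of the returned list are the first candidates for removal; the network would then be re-trained and its generalisation checked again.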
Table 3 shows the comparative performance of the pattern- and batch-learned daily, weekly, ten-daily and monthly BPANN models for two approaches to generalisation:
• With the maximum number of iterations restricted to 5 000
• Using cross-validation (least errors in both calibration and cross-validation).
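The two stopping rules can be contrasted with the sketch below, which operates on a record of errors per iteration. It is a simplified illustration: the `patience` parameter (how many iterations without improvement are tolerated) and the function names are assumptions, and only the cross-validation error is monitored.

```python
def stop_by_iterations(cal_errors, max_iterations=5000):
    """Index of the last iteration run under a fixed iteration limit."""
    return min(max_iterations, len(cal_errors)) - 1


def stop_by_cross_validation(val_errors, patience=10):
    """Index of the iteration with the best cross-validation error.

    Learning stops once the cross-validation error has failed to improve
    for `patience` consecutive iterations.
    """
    best_iter, best_err = 0, val_errors[0]
    for n, err in enumerate(val_errors):
        if err < best_err:
            best_iter, best_err = n, err
        elif n - best_iter >= patience:
            break
    return best_iter
```

Under the fixed limit, training continues even after the cross-validation error starts to rise, which is the over-learning behaviour discussed in the text; the second rule returns the weights from the iteration at which generalisation was best.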
The batch-learned models, for both cases of generalisation, yield a relatively low value of the objective function compared with the corresponding pattern-learned models, but perform poorly in both cross-validation and verification. This suggests avoiding fast optimisation methods in model generalisation with such data. Furthermore, in calibration, the performance of the pattern-learned BPANN models generalised with cross-validation is generally superior to that of the models generalised using a high number of iterations. Best-fit models are therefore preferred to ANN models requiring high numbers of iterations, because the latter over-learn and memorise the data, and perform poorly in cross-validation and in verification (French et al., 1992; Hus et al., 1995). The generalised pattern-learned sediment yield models are superior in both calibration and verification based on the RMSE, CC and CE criteria (Table 3) and are therefore preferred to the models based on batch-learning and the LTF concept. As with the LTF models, the performance of the best pattern-learned BPANN model generally improves with an increase in time scale in calibration only.

Conclusion
Based on the selected performance evaluation criteria, viz. RMSE, CC and CE, the developed BPANN and LTF sediment yield models for the Vamsadhara River basin show an improvement with increase in time scale in model calibration. The pattern-learned BPANN models perform better than the batch-learned models, even though the batch-learned models converge to lower values of the objective function. Moreover, the pattern-learned BPANN models generalised with cross-validation perform better than both those generalised with a high number of iterations and the LTF models. This study suggests that fast and high convergence is not essential in the development of a generalised model.

Figure 2 Index map of Vamsadhara River basin showing hydrological details

TABLE 2 Performance evaluation of LTF sediment yield models for Vamsadhara River basin for different time scales
ISSN 0378-4738 = Water SA Vol. 31 No. 1 January 2005. Available on website http://www.wrc.org.za