Analysis and Modelling of Flood Risk Assessment Using Information Diffusion and Artificial Neural Network

Floods are a serious hazard to life and property. The traditional probability statistical method is acceptable for analysing flood risk but requires a large sample of hydrological data. This paper puts forward a composite method based on an artificial neural network (ANN) and the information diffusion method (IDM) for flood analysis. Information diffusion theory helps to extract as much useful information as possible from the sample and thus improves the accuracy of system recognition. Meanwhile, an artificial neural network model, the back-propagation (BP) neural network, is used to map the multi-dimensional space of a disaster situation to a one-dimensional disaster space and to enable resolution of the grade of flood disaster loss. These techniques all contribute to a reasonable prediction of natural disaster risk. As an example, the method is applied to a flood risk analysis in China, and the risks of different flood grades are determined. Our model yielded very good results and suggests that the methodology is effective and practical, with the potential to be used to forecast flood risk for flood risk management. It is also hoped that by conducting such analyses lessons can be learned so that the impact of natural disasters such as floods can be mitigated in the future.


INTRODUCTION
Natural disasters are increasing alarmingly worldwide. Flooding is a common natural disaster which often causes loss of property and human life. Recent flooding disasters have shown the vulnerability of both developed and developing countries to such events. In China, flood disasters occur frequently, and about two-thirds of the country's area is facing the threat of different types and degrees of floods, as a result of both natural and unnatural influences, such as social and economic factors (Chen, 2010). Natural disasters present a great challenge to society today. Flood risk assessment for an area is important for flood disaster managers to be able to implement a compensation and disaster-reduction plan. As severe floods occur frequently, flood risk assessment and management play an important role in guiding governments in making timely and effective decisions for disaster rescue and relief.
Risk management for the operation of an existing flood protection system is the sum of actions for a rational approach to flood disaster mitigation. Its purpose is the control of flood disasters, in the sense of being prepared for a flood, and acting to minimise its impact. It includes the process of risk analysis, which forms the basis for decisions on maintaining and improving the system.
Risk analysis is a challenging task at present. Assessing flood risk is difficult because of the lack of objective measures of acceptable risk, scarcity of data, and an abundance of unknown probability distributions. Flood risk analysis methods progressed from the direct integral method, Monte Carlo method, and mean first-order second-moment method, to the advanced first-order second-moment method, second-order method and JC method (design point in the first-order second-moment method). Theories and methods of flood risk analysis have been established based on the work of several authors, e.g. Ang and Tang (1984), Ashkar and Rousselle (1981), Diaz-Granados et al. (1984), Kuczera (1982), Stedinger and Taylor (1982), Todorovic and Rousselle (1971), Todorovic and Zelenhasic (1970), Wood and Rodríguez-Iturbe (1975). Recently, many risk analysis approaches have been based on using linguistic assessments instead of numerical values. Using fuzzy sets theory (Zadeh, 1965), data may be defined in vague, linguistic terms, such as low probability, serious impact, or high risk. These terms cannot be defined meaningfully with a precise single value, but fuzzy sets theory provides the means by which these terms may be formally defined in mathematical logic.
In traditional flood risk analysis, the probability statistics method is usually used to estimate the exceedance probability of hydrological variables, because it is based on mature theory and is easy to apply. However, because the method does not consider fuzzy uncertainty, there are problems with the feasibility and reliability of its outcomes. Especially in the case of small samples, results based on classical statistical methods are usually unreliable. It is also rather difficult to collect long sequences of flood data, so the sample is usually small. Information diffusion theory can be applied to extract as much useful underlying information as possible from the sample, thus improving the accuracy of system recognition (Huang, 2002; Palm, 2007).
Information diffusion is a fuzzy mathematical set-value method for samples, which optimises the use of the fuzzy information in a sample in order to offset the information deficiency. To map the multi-dimensional space of a disaster situation nonlinearly onto a one-dimensional disaster space, to test the grade criteria for flood disaster loss, and to resolve the non-uniformity of evaluation results expressed as disaster loss indexes, an artificial neural network model, the back-propagation (BP) neural network, is suggested for evaluating the degree of flood disaster, where the disaster's degree of loss is a continuous real number.
In this study we propose a composite method, based on information diffusion and artificial neural networks, to establish a flood risk assessment model that can be applied with a small number of measured samples. This method is then successfully applied to flood risk analysis in China. The principle of the modelling framework is briefly described in the next section. This is a new attempt at applying information diffusion theory and artificial neural networks to flood risk analysis. Computations based on this flood risk model can yield an estimated flood damage value that is relatively accurate. The model exhibits fairly stable results, even when using a small set of sample data. This also indicates that information diffusion technology is highly capable of extracting useful information and can therefore improve system recognition accuracy. This method can also be easily applied and understood, as illustrated by the example given.

MATERIALS AND METHODS
The essence of flood disaster risk analysis is to estimate the probability distribution of an index. Because the data usually available are incomplete, the application of traditional statistical methods cannot guarantee high precision in the results; fuzzy mathematical treatments are therefore necessary for a small sample size. This paper uses artificial neural network techniques to obtain continuous degree index values for the samples; the degree values for observed samples are then turned into fuzzy sets by the information diffusion method, to finally produce the risk values. This method is tested in a case study, showing that it is superior to traditional statistical models and offers a means to improve on the results of traditional estimation.

Basis of artificial neural network methods
Artificial neural networks (ANN) are massive parallel interconnected networks of simple (usually adaptive) nodes which are intended to interact with objects of the real world in the same way that biological nervous systems do (Simon, 2009). ANN was proposed based on modern biology research concerning human brain tissue, and can be used to simulate neural activity in the human brain (Markopoulos et al., 2008). ANN has the topological structures of information processing, distributed in parallel. The mapping of input and output estimation responses is obtained via combinations of non-linear functions (Srivaree-Ratana et al., 2002).
In terms of their structures, neural networks can be divided into two types: feedforward networks and recurrent networks. In a feedforward network, the neurons are generally grouped into layers. Signals flow from the input layer through to the output layer via unidirectional connections, the neurons being connected from one layer to the next, but not within the same layer. The multi-layer perceptron (MLP) is perhaps the best-known type of feedforward network. A typical MLP has an input layer, an output layer, and a hidden layer. Neurons in the input layer only act as buffers for distributing the input signals $x_i$ to neurons in the hidden layer. Each neuron j in the hidden layer sums up its input signals $x_i$ after weighting them with the strengths of the respective connections $w_{ji}$ from the input layer and computes its output $y_j$ as a function f of the sum:

$$y_j = f\left(\sum_i w_{ji} x_i\right) \tag{1}$$

in which f can be a simple threshold function or a sigmoidal, hyperbolic tangent or radial basis function. The output of neurons in the output layer is computed similarly. The back-propagation (BP) algorithm, a gradient descent algorithm, is the most commonly adopted MLP training algorithm. It gives the change $\Delta w_{ji}$ in the weight of a connection between neurons i and j as follows:

$$\Delta w_{ji} = \eta \, \delta_j \, x_i \tag{2}$$

where $\eta$ is a parameter called the learning rate and $\delta_j$ is a factor depending on whether neuron j is an output neuron or a hidden neuron.

For output neurons:

$$\delta_j = \frac{\partial f}{\partial \mathrm{net}_j} \left( y_j^{(t)} - y_j \right) \tag{3}$$

and for hidden neurons:

$$\delta_j = \frac{\partial f}{\partial \mathrm{net}_j} \sum_q w_{qj} \, \delta_q \tag{4}$$

In Eq. (3), $\mathrm{net}_j$ is the total weighted sum of input signals to neuron j and $y_j^{(t)}$ is the target output for neuron j. In Eq. (4), the sum runs over the neurons q of the layer that follows neuron j. The neurons of each layer only affect the state of the neurons in the next layer. If the expected output signals cannot be obtained in the output layer, the weight values of each layer of neurons must be modified. The output error signals are propagated backward from the output layer. With repeated propagation, the error is eventually reduced to within an acceptable range. After the neural network's training procedure is complete, the forecast information can be analysed with the trained weight values and thresholds.
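The update rules in Eqs. (1)-(4) can be illustrated with a minimal pure-Python sketch of one hidden layer trained by back-propagation. It assumes a sigmoid activation (so that the derivative of f is y(1 - y)), omits thresholds (biases) for brevity, and assumes targets scaled into (0, 1); the function names `forward`, `bp_step` and `init_weights` are hypothetical and are not taken from the paper.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, w_out):
    # Eq. (1) for the hidden layer, then again for the output layer.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    out = [sigmoid(sum(w * hj for w, hj in zip(row, hidden))) for row in w_out]
    return hidden, out


def bp_step(x, target, w_hidden, w_out, eta=0.5):
    """One in-place gradient-descent update; returns the squared error before it."""
    hidden, out = forward(x, w_hidden, w_out)
    # Eq. (3): for a sigmoid, df/dnet = y * (1 - y).
    d_out = [y * (1.0 - y) * (t - y) for y, t in zip(out, target)]
    # Eq. (4): propagate the output deltas back through the current weights.
    d_hid = [
        hj * (1.0 - hj) * sum(w_out[q][j] * d_out[q] for q in range(len(w_out)))
        for j, hj in enumerate(hidden)
    ]
    # Eq. (2): delta_w_ji = eta * delta_j * (i-th input of neuron j).
    for q, row in enumerate(w_out):
        for j in range(len(row)):
            row[j] += eta * d_out[q] * hidden[j]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += eta * d_hid[j] * x[i]
    return sum((t - y) ** 2 for y, t in zip(out, target))


def init_weights(n_in, n_hidden, n_out, seed=0):
    # Small random initial weights; the range is an illustrative choice.
    rng = random.Random(seed)
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    return w_hidden, w_out
```

Calling `bp_step` repeatedly on the training samples drives the squared error down; a production implementation would add thresholds and a stopping criterion on the global error.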

Information diffusion method
Information diffusion is a fuzzy mathematical set-value method for samples, which optimises the use of the fuzzy information in a sample in order to offset the information deficiency (Huang and Shi, 2002). The method can turn an observed sample into a fuzzy set, that is, turn a single point sample into a set-value sample. The simplest model of information diffusion is the normal diffusion model.

Information diffusion:
Let X be a set of samples, and V be a subset of the universe. A mapping $\mu: X \times V \to [0, 1]$ is called a kind of information diffusion of X on V if it satisfies the following 3 conditions (Huang and Shi, 2002):
• It is decreasing.
• For every $x \in X$, let $v^*$ be the observed value of x, which satisfies $\mu(x, v^*) = \sup_{v \in V} \mu(x, v)$.
• $\mu(x, v)$ is conservative: for every $x \in X$, its integral value on the universe is 1: $\int_V \mu(x, v)\,dv = 1$.
In particular, if the random variable's domain is discrete, say $U = \{u_1, u_2, \ldots, u_m\}$, the conservation condition is $\sum_{j=1}^{m} \mu(x, u_j) = 1$.

Let $X = \{x_1, x_2, \ldots, x_n\}$ be a sample, and $U = \{u_1, u_2, \ldots, u_m\}$ be the discrete universe of X. $x_i$ and $u_j$ are called the sample point and the monitoring point, respectively. We diffuse the information carried by $x_i$ to $u_j$ at gain $f_i(u_j)$ by using the normal information diffusion shown in Eq. (5):

$$f_i(u_j) = \frac{1}{h\sqrt{2\pi}} \exp\left[-\frac{(x_i - u_j)^2}{2h^2}\right] \tag{5}$$

where h is called the normal diffusion coefficient, calculated by Eq. (6) (Huang and Shi, 2002; Huang, 2005):

$$h = \ldots \tag{6}$$

Let

$$C_i = \sum_{j=1}^{m} f_i(u_j) \tag{7}$$

We obtain a normalised information distribution on U determined by $x_i$, shown in Eq. (8):

$$\mu_{x_i}(u_j) = \frac{f_i(u_j)}{C_i} \tag{8}$$

For each monitoring point $u_j$, summing all normalised information, we obtain the information gain at $u_j$ that came from the given sample X. The information gain is shown in Eq. (9):

$$q(u_j) = \sum_{i=1}^{n} \mu_{x_i}(u_j) \tag{9}$$

$q(u_j)$ means that, with the information diffusion technique, we infer that there are $q(u_j)$ sample points, in terms of statistical averaging, at the monitoring point $u_j$. $q(u_j)$ is generally not an integer, but it is never less than zero. Let

$$Q = \sum_{j=1}^{m} q(u_j) \tag{10}$$

where Q is the sum of the sample sizes $q(u_j)$ over all monitoring points.

Theoretically, Q = n, but due to numerical calculation error there is a slight difference between Q and n. Therefore, we employ Eq. (11) to estimate the frequency value of a sample falling at $u_j$:

$$p(u_j) = \frac{q(u_j)}{Q} \tag{11}$$

The frequency value can be taken as the estimate of its probability. The probability of exceeding $u_j$ is calculated by:

$$P(u_j) = \sum_{k=j}^{m} p(u_k) \tag{12}$$

where $P(u_j)$ is the required risk estimation value.
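The chain from normal diffusion to exceedance probability can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the diffusion coefficient h is passed in as a parameter rather than computed, the function name `diffusion_risk` is hypothetical, and the sample values in the usage lines are invented for demonstration only.

```python
import math


def diffusion_risk(samples, points, h):
    """Return (p, P): frequency and exceedance probability per monitoring point."""
    norm = h * math.sqrt(2.0 * math.pi)
    q = [0.0] * len(points)
    for x in samples:
        # Normal diffusion of one sample point over all monitoring points.
        f = [math.exp(-((x - u) ** 2) / (2.0 * h * h)) / norm for u in points]
        c = sum(f)  # normalising constant C_i for this sample point
        for j, fj in enumerate(f):
            q[j] += fj / c  # accumulate normalised information at u_j
    total = sum(q)  # Q, equal to n up to rounding error
    p = [qj / total for qj in q]  # frequency estimate at each u_j
    exceed = [sum(p[j:]) for j in range(len(points))]  # probability of exceeding u_j
    return p, exceed


# Hypothetical degree values and coefficient, for illustration only.
demo_samples = [0.4, 1.1, 1.3, 2.2, 3.6]
demo_points = [round(0.1 * k, 1) for k in range(41)]
demo_p, demo_P = diffusion_risk(demo_samples, demo_points, h=0.3)
```

Because each sample's diffused information is normalised to sum to 1 before accumulation, the frequencies `demo_p` sum to 1 and the exceedance curve `demo_P` starts at 1 and never increases.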

Flood disaster risk assessment
According to the above theory, we can calculate the probabilities of each degree of flood disaster in China, based on the historical data from 1950 to 2009 collected by the Ministry of Water Resources of the People's Republic of China (see Table 1).

We select the set of 60 records as the large sample, and then 30 records are randomly chosen to form a small sample in order to provide a comparison of the results obtained using the method for a small sample. Damage area, inundated area, dead population, and collapsed houses were chosen as the disaster indicators for flood risk analysis. By frequency analysis the disaster level is classified into 4 levels: small, medium, large and extreme (see Table 2).
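Drawing the small sample can be sketched as follows. This is only an illustration: the helper name `draw_small_sample` and the fixed seed are assumptions (the paper does not state how the random choice was made), and the records are represented here simply by their years.

```python
import random


def draw_small_sample(records, size=30, seed=None):
    """Randomly choose `size` distinct records without replacement."""
    rng = random.Random(seed)
    return rng.sample(records, size)


# One record per year from 1950 to 2009 gives the 60-record large sample.
large_sample = list(range(1950, 2010))
small_sample = draw_small_sample(large_sample, size=30, seed=1)
```

Fixing the seed makes the comparison between large and small samples repeatable.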
In order to map the multi-dimensional space of the disaster situation onto a one-dimensional disaster situation, a relationship between the disaster degree and the disaster indexes is needed. Because it is impossible to describe this relationship using an explicit function, we adopt the 'simulation' and 'memory' capabilities of neural networks in flood degree evaluation. Through training and learning, a neural network can simulate and record the relationship between the input and output variables of the complex 'function' without the use of any mathematical model.
We used damage area, inundated area, dead population, and collapsed houses as input variables and the disaster grading value as the output variable, and thus set the number of nodes in the input layer to 4 and in the output layer to 1. According to Kolmogorov's theorem (Hecht-Nielsen, 1987), the number of nodes in the hidden layer is at least 2n + 1, where n is the number of nodes in the input layer. Since n = 4, the number of nodes in the hidden layer is at least 9. Considering accuracy, we set the number of nodes in the hidden layer to 10. Thus, we obtained the topology structure (4, 10, 1) of the neural network for flood degree forecasting.
The four flood grades are small, medium, large and extreme flood, with degree values in the intervals [0,1], [1,2], [2,3] and [3,4], respectively; we use the disaster grading standard boundary values (Table 2) as 5 four-dimensional training samples for training and learning in the BP neural network. The initial weights and biases of the BP model are randomly assigned before the commencement of training. With 100 000 cycles of training and learning on the training samples, the global error target of the network is set to $E = 10^{-6}$. The learning rate and impulse parameter of the network are changed adaptively, and the function trainlm is used for fast training.
The calculated output values are compared with the expected values; the mean square error is $5.498909 \times 10^{-8}$, indicating a good fit. Thus the BP neural network has completed the training procedure, and we can use it, with the modified weighting coefficients and thresholds, to forecast the disaster degrees of all the samples. The flood degree estimations are listed in Table 3.
Based on the BP neural network, the disaster degree values of the 60 samples can be calculated (see Table 3); these form the sample points set $X = \{x_1, x_2, \ldots, x_n\}$. We selected the set of 60 records as the large sample, whose degree values range from 0 to 4. Then 30 records are randomly chosen to form a small sample to compare with the large sample.
The universe of discourse, namely the monitoring points set, is taken as $U = \{u_1, u_2, u_3, \ldots, u_{41}\} = \{0, 0.1, 0.2, \ldots, 4.0\}$. The normalised information distribution of each $x_i$, that is, $\mu_{x_i}(u_j)$, can be obtained according to Eqs. (5)-(8); then, based on Eqs. (9)-(12), the disaster risk estimation, namely the probability risk value, is calculated. The relationship between the recurrence interval N (years) and the probability p can be expressed as N = 1/p. The exceedance probability curve of flood disaster degree value is shown in Fig. 1, with a comparison to that obtained by the traditional statistical method.
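The monitoring points set can be generated programmatically. The sketch below builds the 41-point grid from 0 to 4.0 in steps of 0.1 and looks up the exceedance probability at the monitoring point closest to a given degree value; the helper names `monitoring_points` and `exceedance_at` are hypothetical.

```python
def monitoring_points(start=0.0, stop=4.0, step=0.1):
    """Evenly spaced monitoring points from start to stop inclusive."""
    count = int(round((stop - start) / step)) + 1
    # Rounding removes floating-point drift such as 3.5000000000000004.
    return [round(start + k * step, 10) for k in range(count)]


def exceedance_at(points, exceedance, level):
    """Return the exceedance value at the monitoring point closest to `level`."""
    j = min(range(len(points)), key=lambda k: abs(points[k] - level))
    return exceedance[j]


grid = monitoring_points()
```

With this grid, the degree value 3.5 corresponds to the monitoring point at index 35 (counting from 0).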

RESULTS AND DISCUSSION
By the information diffusion method (IDM), we obtain the exceedance probabilities for the different disaster degree values shown in Fig. 1. In Fig. 1, the results show that, using the BP-IDM model, the flood risk decreases smoothly with increasing disaster degree value. The curve of the BP-IDM model is smoother and more accurate than that obtained using the traditional statistical method.
Thirty records were randomly chosen to form a small sample, and analysed in the same way for comparison with results from the large sample. These results are compared in Figs. 2 and 3.
Figure 2 provides a comparison of the two curves for estimated risk using the BP-IDM model with the small sample and the large sample. The two curves match well, which indicates that the result is largely unchanged when the sample size changes; the method's results are thus stable and not greatly affected by the size of the sample. If the analysis results for a very large sample are used as the standard, the BP-IDM method results are considered to be closer to the standard than the results of the statistical method, as shown by some experiments (Huang and Shi, 2002).
In Fig. 3, we compare the two curves for the estimated risk with a small sample and a large sample by frequency statistics. The mean error between the results for a large vs. small sample by frequency statistics is 0.0428, which is larger than that obtained for the BP-IDM model.
Figures 2 and 3 indicate that the results for the small sample analysed by the BP-IDM model are satisfactory. The results reflect the fact that the risk of flood decreases smoothly with the increase in disaster degree value, and that the BP-IDM model gives more satisfactory results than the statistical method for practical problems. The results obtained with the BP-IDM model are closer to the standard. Table 4 presents a comparison of the mean errors between the results with large vs. small samples using the BP-IDM model and traditional statistics. Table 4 also shows that the mean error given by the BP-IDM model is much smaller than that given by the statistical method.
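One way to quantify how closely the large-sample and small-sample curves agree is sketched below. The paper does not define its mean error, so this sketch assumes it is the mean absolute difference between the two exceedance curves evaluated at the same monitoring points; the function name `mean_abs_error` and the toy curves are hypothetical.

```python
def mean_abs_error(curve_a, curve_b):
    """Mean absolute difference between two curves on the same monitoring points."""
    if len(curve_a) != len(curve_b):
        raise ValueError("curves must be evaluated at the same monitoring points")
    return sum(abs(a - b) for a, b in zip(curve_a, curve_b)) / len(curve_a)


# Toy exceedance curves, for illustration only.
toy_large = [1.0, 0.5, 0.0]
toy_small = [1.0, 0.4, 0.1]
toy_error = mean_abs_error(toy_large, toy_small)
```

Under this assumed definition, a smaller value means the small-sample estimate tracks the large-sample estimate more closely.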

Flood disaster risk evaluation for China
The following 4 categories of flood degree are used (Chen, 2009):
• If 1.0 ≤ H ≤ 1.5, then flood degree is small (1st grade).
The result in Fig. 2 also illustrates the risk estimation, i.e., the exceedance probability of the disaster degree value. From this information, we know the risk estimation is 0.0356 when the disaster indicator is 3.5. In other words, floods exceeding the 3.5 degree value (extreme floods) occur on average once every 28.0899 years. Similarly, the probability of floods exceeding 2.5 degrees (large floods) is 0.2011, which means that floods exceeding that intensity occur on average once every 4.9727 years. These findings indicate the serious situation relating to floods in China. The frequency and recurrence interval of the floods of the four grades are shown in Table 5.

We hope that further technological developments in flood control and many new effective methods of flood risk analysis can be used to enhance prediction accuracy. By conducting such analyses, lessons can be learned so that the impact of natural disasters, such as floods in China, can be prevented or mitigated in the future.
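The recurrence intervals follow directly from N = 1/p. The short sketch below recomputes them from the exceedance probabilities reported in Table 5; the function name `recurrence_interval` and the dictionary layout are illustrative choices.

```python
def recurrence_interval(p):
    """Recurrence interval in years for an annual exceedance probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    return 1.0 / p


# Exceedance probabilities as reported in Table 5.
table5_probability = {
    "small": 0.7258,
    "medium": 0.5955,
    "large": 0.2011,
    "extreme": 0.0356,
}
table5_interval = {
    grade: recurrence_interval(p) for grade, p in table5_probability.items()
}
```

The recomputed values agree with the recurrence intervals listed in Table 5 to four decimal places.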

Limitations
Simulation experiments by Huang reveal that the superiority of the information diffusion method depends on whether we lack knowledge of the population and whether the size of the given sample is small (Huang and Shi, 2002). In the experiments, the given sample is considered fuzzy due to its small size, so some benefit can be obtained from the information diffusion method. The work efficiency of the information diffusion method is about 35% higher than that of the histogram estimate. That is, if no knowledge is available about the population from which the given sample is drawn, and if the sample size is small, the histogram estimate needs about 35% more observations to guarantee an estimation as good as the one given by the fuzzy method.
However, if we have enough knowledge about the population to confirm a distributional assumption, the statistical object with respect to a given sample is clearer. Likewise, if the size of a given sample is large, there is an abundance of statistical information in the sample. In these cases, it is unnecessary to replace the statistics with the information diffusion method, as little benefit can be obtained from this.

CONCLUSIONS
Disaster risk analysis is a complex multi-criteria problem crucial to the success of strategic decision making in disasters. In China floods occur frequently and cause significant property losses and casualties. Flood risk analysis of an area is important for flood disaster managers to be able to implement a compensation and disaster-reduction plan. But results from the use of traditional statistics for flood risk analysis are frequently inaccurate, especially in the case of small samples. In the present study, a comprehensive fuzzy BP-IDM method for flood disaster risk assessment is developed. This method provides an enhanced implementation of the information diffusion process which better corresponds to the actual situation.
In fact, disaster risk, as a natural or societal phenomenon, is neither precise nor certain. In the current paper, we use a fuzzy method of flood risk assessment based on a BP neural network and the information diffusion technique to improve probability estimation, and test this method using an example. The proposed method can be generalised as an integration of techniques and proved stable and reliable in the test. In view of the theoretical system of flood risk assessment developed thus far and the fact that observed time-series of flood disasters are quite short or even unavailable, the method adopted in this paper is an effective and practical one.

Figure 3 Comparison of the risks by traditional statistics with small sample and large sample

TABLE 5 Flood disaster risk evaluation for China

Disaster level                 Small flood   Medium flood   Large flood   Extreme flood
Exceedance probability risk    0.7258        0.5955         0.2011        0.0356
Recurrence interval (years)    1.3778        1.6793         4.9727        28.0899