Application of geographic weighted regression to establish flood-damage functions reflecting spatial variation

Flood-damage functions are necessary for comprehensive flood-risk management. This study attempts to establish a residential flood-damage function by interviewing residents living in a region where flood disasters occur frequently. The Keelung River basin, near the Taipei metropolitan area in Taiwan, was selected as the study area. Flood damage is related to flood depth, which is the most commonly considered factor in previously published work. Ordinary least squares (OLS) regression was used initially to construct the flood-damage function. Analytical results indicate that flood depth is a significant variable, but the spatial pattern of the residuals shows spatial autocorrelation. The Geographically Weighted Regression (GWR) model was then applied to modify the traditional regression model, which cannot capture spatial variations, and to reduce the problem of spatial autocorrelation. The R-square value was found to increase from 0.15 to 0.24, and the spatial autocorrelation in the residuals was no longer evident. A modified OLS model with a dummy variable to capture the spatial autocorrelation pattern was also proposed for future applications. In conclusion, residential flood damage is determined by flood depth and zone, and the GWR model not only captures the spatial variations of the affecting factors, but also helps to identify an independent variable with which to modify the traditional regression model.


Introduction
Floods are major disasters worldwide that cause serious damage to agriculture, fisheries, housing and infrastructure, and severely affect socio-economic activities. Risk management plays a very important role in mitigating the impacts of flood disasters. A complete flood-risk management and mitigation framework comprises a hydrological module for channel discharge calculation, an economic module for damage estimation, and a risk analysis process (Grigg, 1985). Studies on hydrology and hydraulics have received far more attention than those on flood-damage assessment (Chang, 2000). This study focuses on establishing residential flood-damage functions for flood-loss estimation, which is considered to be one of the most important aspects of regional flood-risk management (Grigg, 1985).
Flood-damage functions are traditionally estimated by an empirical flood depth-damage curve (Smith, 1994). These curves can be constructed through damage investigations after the disaster (FIA, 1970; Grigg and Helweg, 1975; Smith, 1994; Lekuthai and Vongvisessomjai, 2001; Su et al., 2005; Thieken et al., 2005), or by synthesis (Smith, 1994; Chang, 2000; Chang and Su, 2001; Kang, Su, and Chang, 2005). Although the two methods construct the curve differently, both assume that flood depth is the only factor in the flood-damage function. Nevertheless, flood depth alone may not be sufficient for a household flood-damage function. McBean et al. (1988) pointed out that many factors besides flood depth could affect flood damage, such as the time of year of flooding, the velocity and sediment load of floodwaters, the duration of flooding and the warning time, and therefore recommended that the flood-damage function be adjusted. Yang et al. (2005) also noted that meteorological, physiographic and human factors such as rainfall, terrain and flood-prevention measures could influence actual flood damage. Hence, the relationships between various factors and flood damage are now widely examined. The most commonly considered factor is the type of building (Grigg, 1974; FEMA, 1977; McBean et al., 1988; Smith, 1994; Taiwan Water Resource Agency, 1997; Chang, 2000; Kang, Su, and Chang, 2005; Thieken et al., 2005; Baro-suarez et al., 2007). Other factors include floor area and family income (McBean et al., 1988; Lekuthai and Vongvisessomjai, 2001), flood-warning system (Wind et al., 1999; David, 2000; Du Plessis, 2002), flood-warning lead time (Penning-Rowsell et al., 2000; Thieken et al., 2005), experience of flooding (McPherson, 1977; McBean et al., 1988; Wind, 1999; Krasovskaia, 2001), preparation before the disaster (Penning-Rowsell et al., 2000), duration of flooding (McBean et al., 1988; Torterotot, Kauark-leite and Roche, 1992; Hubert, Deutsch, and Desbordes, 1996; Lekuthai and Vongvisessomjai, 2001; Thieken et al., 2005; Baro-suarez et al., 2007), velocity of floodwaters (CH2M Hill, 1974; Black, 1975; Smith, 1994; Beck et al., 2002), persons per household (McBean et al., 1988; Shaw, Huang and Ho, 2005) and the location of the household (Chang, 2000; Shaw, Huang and Ho, 2005). Since flood damage is affected by many factors, multiple regression models incorporating such factors have also been proposed (Shaw, Huang and Ho, 2005). Although this approach can incorporate more factors as predictors and improve the statistical significance of the fitted model, it also makes it harder to collect the predictor data needed to predict future damage. Global multiple regression methods were used in most of these studies, and the regression coefficients were assumed constant across the study region (Platt, 2004). In other words, spatial variation was not considered, so the residuals of the global model may exhibit spatial autocorrelation (Fotheringham, Brunsdon and Charlton, 2002; Zhang, Gove and Heath, 2004; 2005; Kupfer and Farris, 2007). Thus, the aim of this study is to establish the flood-damage function for one household using the smallest possible number of independent variables, while also considering spatial variation and solving the problem of spatial autocorrelation in the residuals.

Method
The first step is to determine the factors affecting flood damage. Many flood-damage factors exist, as described above, but the characteristics of flood damage vary among regions. Shaw et al. (2005) incorporated flood depth, inundation time, building and structure types, the number of floors, presence of a basement, floor area, persons per household and region in their study, and found flood depth to be the major factor in the flood-damage function. Some other studies even show that, without considering other factors, flood depth alone is still appropriate for estimating flood damage (Grigg, 1996). Based on the information presented in previously published work, flood depth was chosen as the principal factor for assessing flood damage.
Ordinary least squares (OLS) global regression was used initially to establish the flood-damage function in this study. After the model had been confirmed through all the necessary statistical tests, Moran's I statistic (Fotheringham et al., 2002) was used to examine whether the residuals were spatially autocorrelated. If spatial autocorrelation among the residuals was present, the Geographically Weighted Regression (GWR) model was applied to solve the problem.

Global regression model
First, a global regression model, formulated using OLS regression, was adopted in this study to establish the flood-damage function. Since flood damage increases with flood depth, the following S-curve model was constructed:

$$ y = e^{\beta_0 + \beta_1 / x}\,\varepsilon \tag{1} $$

where:
$y$ is the flood damage (NT dollar)
$x$ is the flood depth (cm)
$\beta_0$, $\beta_1$ are the regression coefficients
$\varepsilon$ is the residual

Taking the natural logarithm of Eq. (1) gives:

$$ \ln y = \beta_0 + \beta_1 \frac{1}{x} + \varepsilon_2 \tag{2} $$

where $\varepsilon_2 = \ln \varepsilon$ is the residual. Through this transformation, $\beta_0$ and $\beta_1$ can be estimated by a simple linear regression model. A basic assumption in fitting such a model is that the observations are independent of one another.
A second assumption is that the structure of the model remains constant over the study area. That is, the estimated parameters have no local variations. The established model was subjected to all the necessary statistical tests, including coefficient significance, model goodness of fit and examination of the residual pattern.
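The log-linear form of the S-curve means the two coefficients can be obtained with an ordinary linear least-squares solve on ln(y) against 1/x. The following is a minimal sketch of that step, not the authors' implementation; the function name `fit_s_curve` and the synthetic depth and damage values are assumptions made for illustration only.

```python
import numpy as np

def fit_s_curve(depth_cm, damage):
    """Fit ln(y) = b0 + b1 * (1/x) by ordinary least squares.

    Returns the coefficients (b0, b1), the residuals on the log scale
    and the coefficient of determination R^2.
    """
    x = np.asarray(depth_cm, dtype=float)
    z = np.log(np.asarray(damage, dtype=float))      # dependent variable ln(y)
    X = np.column_stack([np.ones_like(x), 1.0 / x])  # design matrix [1, 1/x]
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    resid = z - X @ beta
    centred = z - z.mean()
    r2 = 1.0 - (resid @ resid) / (centred @ centred)
    return beta, resid, r2

# Synthetic, noise-free data (illustrative values, not survey data).
depth = np.array([10.0, 20.0, 50.0, 100.0, 200.0])
damage = np.exp(10.0 - 30.0 / depth)
beta, resid, r2 = fit_s_curve(depth, damage)
```

With real survey data the residuals returned here are what would be passed on to the spatial autocorrelation test.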

Residual spatial autocorrelation test
After the regression model had been confirmed with all the necessary statistical tests, Moran's I test was used to detect any spatial autocorrelation among the residuals. According to Bailey and Gatrell (1995), Moran's index can be expressed as:

$$ I = \frac{n}{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(y_i - \bar{y})(y_j - \bar{y})}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \tag{3} $$

where:
$n$ is the number of points or cells
$y_i$ is the value at point $i$
$\bar{y}$ is the mean of attribute $y$
$w_{ij}$ is the spatial proximity of points $i$ and $j$

The inverse of the distance between points $i$ and $j$ is often used to represent the spatial proximity, so $w_{ij}$ can be defined as $1/d_{ij}$, where $d_{ij}$ is the distance between points $i$ and $j$. This assumes that the attribute values of points follow the first law of geography.
With the inverse of the distance, smaller weights are given to points that are farther apart and larger ones to points that are closer together. The expected value of Moran's I when there is no spatial pattern in the data set is:

$$ E(I) = -\frac{1}{n-1} \tag{4} $$

When the resulting Moran's I value is larger than the expected value, it indicates positive spatial autocorrelation, where similar values cluster together. When the index value is below the expected value, it indicates negative spatial autocorrelation, where similar values are more dispersed. Under this assumption, the variance of I, $\mathrm{Var}(I)$, is given by Eq. (5). The distribution of I is asymptotically normal under the assumption of random distribution. The standardised Z score can be calculated as:

$$ Z(I) = \frac{I - E(I)}{S_{E(I)}} \tag{6} $$

where:

$$ S_{E(I)} = \sqrt{\mathrm{Var}(I)} \tag{7} $$

The null hypothesis is that the residuals are randomly distributed in space. If −1.96 < Z(I) < 1.96, the null hypothesis cannot be rejected at the 5% significance level, and we may conclude that the residual pattern is not significantly different from a random pattern. Otherwise, the residual pattern is clustered when Z(I) > 1.96 and dispersed when Z(I) < −1.96.
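The test described above can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the procedure used in the study: the function name `morans_i` is hypothetical, weights are plain inverse distances with a zero diagonal, and the variance uses the standard normality-assumption form (with S0, S1 and S2 as defined in the comments), which may differ from the variance expression the study applied.

```python
import numpy as np

def morans_i(values, coords):
    """Moran's I with inverse-distance weights and a normal-approximation Z score."""
    y = np.asarray(values, dtype=float)
    xy = np.asarray(coords, dtype=float)
    n = y.size
    # Pairwise Euclidean distances between all points.
    d = np.sqrt(((xy[:, None, :] - xy[None, :, :]) ** 2).sum(axis=2))
    w = np.zeros_like(d)
    off_diag = ~np.eye(n, dtype=bool)
    w[off_diag] = 1.0 / d[off_diag]        # assumes no two points coincide
    dev = y - y.mean()
    s0 = w.sum()
    i_obs = (n / s0) * (dev @ w @ dev) / (dev @ dev)
    e_i = -1.0 / (n - 1)                   # expected value under no spatial pattern
    # Variance under the normality assumption (assumed form, see lead-in).
    s1 = 0.5 * ((w + w.T) ** 2).sum()
    s2 = ((w.sum(axis=0) + w.sum(axis=1)) ** 2).sum()
    var_i = (n**2 * s1 - n * s2 + 3.0 * s0**2) / ((n**2 - 1) * s0**2) - e_i**2
    z = (i_obs - e_i) / np.sqrt(var_i)
    return i_obs, e_i, z

# Illustrative data: a 5 x 5 grid whose values rise from west to east,
# so neighbouring points carry similar values.
gx, gy = np.meshgrid(np.arange(5.0), np.arange(5.0))
grid_coords = np.column_stack([gx.ravel(), gy.ravel()])
grid_values = gx.ravel()
i_obs, e_i, z = morans_i(grid_values, grid_coords)
```

For the gradient example the index lies above its expected value, which is the direction associated with clustering of similar values; whether it is significant depends on the Z score and the chosen threshold.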

GWR model
If the residuals are spatially autocorrelated, GWR can be used to modify the OLS regression to solve the problem (Brunsdon et al., 1996; Fotheringham et al., 1998; 2000; 2002; Platt, 2004; Zhang et al., 2004; 2005; Kupfer and Farris, 2007). If the spatially varying characteristics of flood damage are taken into account, Eq. (2) can be modified as:

$$ \ln y_i = \beta_0(u_i, v_i) + \beta_1(u_i, v_i)\,\frac{1}{x_i} + \varepsilon_i \tag{8} $$

where:
$y_i$ is the flood damage at point $i$
$x_i$ is the flood depth at point $i$
$(u_i, v_i)$ are the coordinates of the $i$th point in space

In a simple linear regression model, a single set of parameters is estimated by OLS for the relationship between the independent and dependent variables, and the relationship is assumed to be constant across the study area. It can be estimated as follows:

$$ \hat{\beta} = (X^{T}X)^{-1}X^{T}Y \tag{9} $$

The GWR model recognises that spatial variations in relationships might exist, so the estimate in GWR becomes:

$$ \hat{\beta}(u_i, v_i) = \left(X^{T}W(u_i, v_i)\,X\right)^{-1}X^{T}W(u_i, v_i)\,Y \tag{10} $$

where:
$X$ is the matrix of observed values of the independent variable, with one row per observation; estimating both $\beta_0$ and $\beta_1$ requires each row to take the form $(1,\ 1/x_i)$, giving an $n \times 2$ matrix
$Y$ is the $n \times 1$ vector of observed values of the dependent variable, $\ln y_i$
$\beta$ is the $n \times 2$ matrix of regression coefficients, whose $i$th row is $(\beta_0(u_i, v_i),\ \beta_1(u_i, v_i))$
$W(u_i, v_i)$ is an $n \times n$ matrix whose off-diagonal elements are zero and whose diagonal elements denote the geographical weighting of the observed data for point $i$. That is:

$$ W(u_i, v_i) = \mathrm{diag}(w_{i1}, w_{i2}, \ldots, w_{in}) $$

The weight $w_{ij}$ of each observation depends on $d_{ij}$ and $h$, where:
$d_{ij}$ is the Euclidean distance between observations $i$ and $j$
$h$ is the constant bandwidth

The bandwidth $h$ may be either supplied by the user or estimated using a technique such as cross-validation. The GWR parameter estimates are then plotted on a map to determine whether they exhibit a significant spatial pattern. GWR analysis can not only correct the spatial autocorrelation in the residuals from OLS regression, but also take into account the spatial variation of flood-damage characteristics.
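The local estimation step can be illustrated as a weighted least-squares solve repeated at every sample point. The sketch below is a hedged illustration rather than the study's implementation: the function name `gwr_fit` is hypothetical, and a Gaussian kernel `exp(-0.5 * (d / h)**2)` is assumed for the weights because the text does not fix the kernel form. The coordinates, depths and bandwidth are invented for the example.

```python
import numpy as np

def gwr_fit(coords, X, z, bandwidth):
    """Estimate one coefficient vector per sample point by local weighted least squares.

    coords    : (n, 2) point coordinates (u, v)
    X         : (n, k) design matrix, e.g. columns [1, 1/x]
    z         : (n,) dependent variable, e.g. ln(damage)
    bandwidth : fixed kernel bandwidth h (Gaussian kernel assumed)
    """
    xy = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    z = np.asarray(z, dtype=float)
    n, k = X.shape
    betas = np.empty((n, k))
    for i in range(n):
        d = np.sqrt(((xy - xy[i]) ** 2).sum(axis=1))   # distances from point i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)        # diagonal of W(u_i, v_i)
        xtw = X.T * w                                  # X^T W without building W
        betas[i] = np.linalg.solve(xtw @ X, xtw @ z)   # (X^T W X)^-1 X^T W z
    return betas

# Illustrative data: ten points along a line with spatially constant coefficients.
coords = np.column_stack([np.arange(10.0), np.zeros(10)])
depth = np.array([10.0, 40.0, 20.0, 80.0, 30.0, 120.0, 15.0, 60.0, 25.0, 100.0])
X = np.column_stack([np.ones_like(depth), 1.0 / depth])
z = 10.0 - 30.0 / depth
betas = gwr_fit(coords, X, z, bandwidth=3.0)
```

Because the example data are generated with the same coefficients everywhere, every row of `betas` matches the global fit. With data whose relationship changes across space, the rows would differ, and mapping them is what reveals the spatial variation.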

Data collection and study area
To establish the flood-damage function for one household in a residential area, the Keelung River basin near the Taipei metropolitan area in Taiwan, where flood disasters occur frequently, was selected as the study area. Field survey data on the flood damage caused by Typhoon Nari in 2001 were collected. The investigated areas are shown in Fig. 1 and include Xizhi City and the townships of Qidu, Nangang, Neihu, SongShan, Sinyi and Da-an. The flood-damage surveys covered basic household information (building characteristics such as the number of floors and floor area, persons per household, income level, etc.), flood depth and inundation time, level of damage (damage to household furniture, interior decorations, vehicles, etc.) and risk-perception factors (experience of flooding, risk information, fear of the risk, willingness to take the risk, and the influence of mass media). A total of 302 completed questionnaires were collected. All data were geocoded for spatial analysis and mapping.

Global regression model
The regression result for Eq. (2) is shown in Table 1. The coefficient of determination R² is 0.15 and the estimates for both parameters are significantly different from zero at the 0.05 significance level. The residual plot is shown in Fig. 2. The residuals appear to fluctuate randomly around zero, indicating a good fit for a linear model. The residuals were then mapped, as shown in Fig. 3, to determine whether any spatial autocorrelation exists. An obvious clustering pattern was observed in the figure. Moran's I test was then employed to test for spatial autocorrelation; the resulting Moran's I was 0.6118 with Z(I) = 4.936 > 1.96. This implies that the residuals were significantly spatially autocorrelated, which violates the assumptions of linear regression. Therefore, the GWR described above was applied to modify the model.

GWR model
The application of the GWR model increased R² from 0.15 in the OLS regression to 0.26, demonstrating that GWR provides better explanatory power than OLS. As shown in Fig. 4, the histogram of intercept estimates displays three obvious groups. Figure 5 depicts the spatial distribution of these three groups. The intercept term in Eq. (2) can be interpreted as the basic or fixed flood damage due to cleaning and restoration. There is a significant clustered pattern, indicating that basic flood damage increases gradually from the west to the northeast corner of the study region.
Figure 6 also shows that there are two groupings of the estimates for the parameter of inverse flood depth in the GWR model. Figure 7 shows that the high-value group is located in the central and western parts, and the low-value group in the northeast corner. These parameter estimates indicate the change in flood damage with flood depth, and increase gradually from northeast to west in the study area.
The residuals of the GWR were then mapped, as shown in Fig. 8, to examine whether any pattern or spatial autocorrelation remained.

Modified global regression model
Although the GWR cured the spatial autocorrelation problem in the residuals, the model is of little use in terms of future applications. The GWR generates regression coefficients for each sample point. These estimates are valid only for those specific locations and cannot be used for estimation at locations other than the sample sites. Therefore, the GWR model results were examined more closely in this study to develop further knowledge for later use in modifying the traditional OLS regression model.
Since the estimates for both parameters show grouping patterns in Figs. 4 and 6, these estimates were summarised in Table 3. All the sample points can be categorised into three groups, as shown in Table 3. On visual inspection, the map in Fig. 9 shows strong spatial clustering. The original OLS model was then modified according to the grouping result by adding two dummy variables, GP1 and GP2. The dummy variable GP1 is 1 for data in Zone 1 and 0 otherwise. The dummy variable GP2 is 1 for data in Zone 2 and 0 otherwise.
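The original OLS regression model was then modified as follows:

$$ \ln y = \beta_0 + \beta_1 \frac{1}{x} + \beta_2\,GP1 + \beta_3\,GP2 + \beta_4\,GP1 \cdot \frac{1}{x} + \beta_5\,GP2 \cdot \frac{1}{x} + e \tag{12} $$

where:
$y$ is the flood damage
$x$ is the flood depth
GP1 is 1 when the sample is in Zone 1 and 0 otherwise
GP2 is 1 when the sample is in Zone 2 and 0 otherwise
$\beta_0$, $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$, $\beta_5$ are the regression coefficients
$e$ is the residual.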
Stepwise regression was adopted to determine the main variables. The results revealed that only 1/x and GP1 were significant, so the model could be modified as:

$$ \ln y = \beta_0 + \beta_1 \frac{1}{x} + \beta_2\,GP1 + e \tag{13} $$

Table 4 shows the results of the modified OLS regression model. The regression estimates were all statistically significant at the 5% significance level. The coefficient of determination R² also increased from 0.15 (OLS) to 0.26 (modified OLS), similar to that of the GWR model.
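A model with inverse depth and a single zone dummy can be fitted with the same linear least-squares machinery as the global model, with one extra column in the design matrix. The sketch below illustrates this under stated assumptions: the function names `fit_zone_model` and `predict_damage`, the coefficient values and the sample data are all hypothetical and are not taken from Table 4.

```python
import numpy as np

def fit_zone_model(depth_cm, damage, in_zone1):
    """Fit ln(y) = b0 + b1 * (1/x) + b2 * GP1 by ordinary least squares."""
    x = np.asarray(depth_cm, dtype=float)
    gp1 = np.asarray(in_zone1, dtype=float)          # 1 in Zone 1, 0 otherwise
    X = np.column_stack([np.ones_like(x), 1.0 / x, gp1])
    z = np.log(np.asarray(damage, dtype=float))
    beta, *_ = np.linalg.lstsq(X, z, rcond=None)
    return beta

def predict_damage(beta, depth_cm, in_zone1):
    """Predicted damage for a household at a given depth and zone membership."""
    return np.exp(beta[0] + beta[1] / depth_cm + beta[2] * in_zone1)

# Synthetic, noise-free data generated from assumed coefficients.
true_beta = np.array([11.0, -30.0, -1.0])
depth = np.array([10.0, 25.0, 50.0, 100.0, 15.0, 40.0, 80.0, 150.0])
zone1 = np.array([1, 1, 1, 1, 0, 0, 0, 0])
damage = np.exp(true_beta[0] + true_beta[1] / depth + true_beta[2] * zone1)

beta_hat = fit_zone_model(depth, damage, zone1)
inside = predict_damage(beta_hat, 100.0, 1)
outside = predict_damage(beta_hat, 100.0, 0)
```

In a model of this form the predicted damage approaches exp(b0 + b2 * GP1) as depth grows, because 1/x tends to zero, so a negative dummy coefficient lowers the whole curve for households inside Zone 1.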
To test whether the residuals exhibit spatial autocorrelation, the residuals of the modified OLS model were mapped, as shown in Fig. 10, and Moran's I was computed. Moran's I is 0.0313 with Z(I) = 0.231 < 1.96, indicating a random pattern in the spatial distribution of the residuals. The modified OLS model has therefore corrected the spatial autocorrelation of the residuals in the original OLS model. The resulting flood-damage functions from the OLS and modified OLS models are shown in Fig. 11. The damage functions share the same pattern and trend for both models. Houses located outside Zone 1 would suffer greater flood damage than those in Zone 1 when flooding occurs. From the figure, the maximum flood damage per household is NT$50 000 for OLS, and approximately NT$26 000 for Zone 1 and NT$80 000 for areas other than Zone 1 in the modified OLS model. The modified OLS model gives better results than the global OLS model by distinguishing the differences in flood-damage characteristics between areas.

Conclusions
Although flood-damage curves are commonly used for flood-risk assessments, most of the currently used flood-damage curves fail to capture spatial variations in regional flood damage. This paper proposed an approach that not only uses the smallest number of explanatory variables to establish the flood-damage function for a single household, but also solves the problem of traditional regression models overlooking the spatial variations in flood-loss characteristics. The introduction of the GWR model improved the coefficient of determination from 0.15 in the original OLS to 0.26. The GWR model corrects the spatial autocorrelation in the residuals, but it also has some drawbacks. It produces a different set of estimates for the regression parameters at each sample point, which makes it difficult to estimate the flood loss at locations other than the sample points. A modified OLS model was therefore proposed in this study by introducing dummy variables that differentiate regions with different flood-loss characteristics. This modified OLS model not only corrects the spatial autocorrelation in the residuals but can also be used for future applications in regional flood-damage assessments.

Figure 1 Geographic distribution of study area in Taiwan

Figure 8 Residuals from GWR model

Figure 10 Residual spatial distribution from modified regression model

TABLE 2 Results of Monte Carlo test for spatial non-stationarity a (n = 302)
Column: P-value
a Tests whether the regression coefficients change over space in a way that is unlikely to occur at random
*** significant at the 0.1% level
** significant at the 1% level
* significant at the 5% level

TABLE 4 Results of modified global a regression model (n = 302)
Columns: Parameter, Estimate, Std estimate, Std Err, T, P-value
a The average regression result of the entire study area