Hydraulic head and groundwater 111Cd content interpolations using empirical Bayesian kriging (EBK) and geo-adaptive neuro-fuzzy inference system (geo-ANFIS)

In this study, hydraulic head and 111Cd interpolations based on the geo-adaptive neuro-fuzzy inference system (Geo-ANFIS) and empirical Bayesian kriging (EBK) were performed for the alluvium unit of Karabağlar Polje in Muğla, Turkey. Hydraulic head measurements and 111Cd analyses were done for 42 water wells during a snapshot campaign in April 2013. The main objective of this study was to compare GeoANFIS and EBK to interpolate hydraulic head and 111Cd content of groundwater. Both models were applied on the same case study: alluvium of Karabağlar Polje, which covers an area of 25 km2 in Muğla basin, in the southwest of Turkey. The ANFIS method (called ANFISXY) uses two reduced centred pre-processed inputs, which are cartesian coordinates (XY). Geo-ANFIS is tested on a 100-random-data subset of 8 data among 42, with the remaining data used to train and validate the models. ANFISXY and EBK were then used to interpolate hydraulic head and heavy metal distribution, on a 50 m2 grid covering the study area for ANFISXY, while a 100 m 2 grid was used for EBK. Both EBKand ANFISXY-simulated hydraulic head and 111Cd distributions exhibit realistic patterns, with RMSE < 9 m and RMSE < 8 μg/L, respectively. In conclusion, EBK can be considered as a better interpolation method than ANFISXY for both parameters.


INTRODUCTION
Earth scientists (hydrologists, geologists, biogeochemists, etc.) are interested in understanding the behaviour of hydrosystems (Kurtulus et al., 2011;Flipo et al., 2012).Usually they first do experiments/observations in the field at specific locations and then try to distribute these observations/measurements both in space and time using modelling techniques that are based on abstractions.As part of the hydrosystem, aquifer systems play a decisive role in its behaviour and act as a reservoir.Metals and metalloids in these kinds of reservoirs might be assessed as pollutants according to their abundance.In the case of using polluted water stored in reservoirs for domestic, industrial or irrigational purposes, harmful side effects will most likely be encountered.Recent studies have shown that groundwater heavy metal contamination often cannot be detected, especially in cities (Huang et al., 2014).Pollutant sources could be artificial or natural.Industrial, agricultural and domestic waste might be the sources if they are continuous and the concentration of waste is high enough to pollute the water.Geological units which are in contact with groundwater could also be the source as they dissolve with water.The accumulable-stable characteristic and toxicity of heavy metals in groundwater make them very important pollutants worldwide (Okbah et al., 2014).In order to avoid negative effects and to ensure environmental sustainability, hydraulic head and pollutants must be examined together.
The state of an aquifer unit is characterized by the piezometric head or hydraulic head, measured as the water level in piezometers.The mapping of these point data is useful for many environmental applications, such as water resources management during high flow conditions.Estimations of hydraulic head distribution are frequently used to determine the capture zone of pumping wells.Hydraulic head maps are also important tools for earth dam monitoring (Rivest et al., 2008).They are also used to initialize distributed models, which are critical tools nowadays for managing water resources at the basin scale (Perkins and Sophocleous, 1999;Billen et al., 2007;Flipo et al., 2007Flipo et al., , 2012Flipo et al., , 2014)).As reported in Flipo et al. (2012) many inverse methodologies in hydrogeology use hydraulic head maps as a pre-requisite (24 publications among 45).The mapping of hydraulic heads requires synchronous measurements, usually achieved with synchronous snapshot campaigns.Synchronous snapshot campaigns are feasible for relatively small aquifer units (~100 km 2 ), such as the Orgeval basin (Kurtulus et al., 2011;Kurtulus and Flipo, 2012;Mouhri et al., 2013).The larger the aquifer unit, the longer the measurement campaign, which can last several years for regional aquifer systems (>100 000 km 2 ) and therefore introduce uncertainties in the final mapping result (Tóth, 2002).
A lack in the literature of comparison of ANFIS and EBK for interpolation of hydraulic head and a metal parameter provided the main motivation of this study.EBK and Geo-ANFIS were tested and compared with each other in order to determine their performance and practicality for hydraulic head and 111 Cd interpolation.

EXPERIMENTAL SITE AND DATA
The 25 km 2 study area was located in Karabağlar Polje, near the Muğla city centre, in the southwest of Turkey (Fig. 1).Elevation ranges from 606 m to 717 m.Average annual air temperature is 14.9°C, and mean annual precipitation is 1 222 mm.The study area is covered by a Quaternary alluvium unit with a thickness of 80-100 m (Fig. 1) (Atalay, 1980).Intensive agricultural activities and operations such as a drinking water treatment facility, lime quarry, sand and gravel quarry, industrial zone and swimming pool are situated in the area.Nevertheless, Kurtuluş and Sağır (2017) indicated that there is no are available, but what is needed are groundwater surfaces based on these measurements.Robust interpolation methods are needed to interpolate hydraulic head point measurements.Many such methods have been discussed in the literature (Kurtulus and Flipo, 2012).
On the other hand, hydrologists have started to incorporate fuzzy logic and artificial neural network(ANN) concepts in their methodologies, with more than 500 papers published on this topic between 1999and 2007(Maier et al., 2010)), and especially for rainfall-discharge transformation The basic tool used for kriging is the semi-variogram γ (Eq.1), defined as half the expectancy of deviation between values of samples separated by a distance h.In this case it produces the spatial variability of the variable Z(x): where: E[V] defines the mathematical average of the coordinates of the vector V.If Z''(x) is the kriged value at location x, Z(x i ) is the known value at location x 1, λ i is the weight associated with the data, μ is the Lagrange multiplier and y(x i x j ) is the value of variogram corresponding to a vector with origin in x i , and extremity in x j , the general equation of Kriging estimator is: In order to achieve unbiased estimations in kriging and to minimize the variance of estimates the following set of equations should be solved simultaneously (Chauvet, 1999): groundwater pollution.The Jurassic Yılanlı formation, which is composed of dolomite, dolomitic limestone and limestone, underlies the Quaternary alluvium deposit (Kurttaş, 1997).
Based on field observations, the Yılanlı formation has the most advanced karst features in the area and several dolines were also spotted where surface water flows and infiltrates the karst system during high-flow periods.A hydrogeological conceptual model of the study site and surroundings was created and a hydraulic connection between the area and Gökova springs (south of the study site) was revealed (Açıkel, 2012).The other geological unit, the Tertiary-Miocene aged Köprüçay formation, consists of conglomerate and limestone present to the south of the site.
The depths of the studied water wells ranged between 8 and 20 m and they penetrate only the Quaternary alluvium deposit.During a snapshot campaign in April 2013, hydraulic head measurement and water sampling were performed for 79 water wells.Heavy metal analyses for several elements were done.Based on the preliminary statistical characteristics (e.g.normal distribution) of elements and the absence of certain elements at certain wells, 111 Cd was chosen as the most suitable element to interpolate and for testing different interpolation methods; 38 data for 111 Cd were available for use.In order to get an approximately similar spatial distribution to the 111 Cd data, 42 data-points for hydraulic head were selected and analysed.Measured hydraulic head values were plotted against surface elevation with the objective of determining correlation (Fig. 2).High correlation between these parameters was expected for the alluvium aquifer, and for all of the measured 79 hydraulic head values, a highly correlation with 111 Cd was found (R 2 = 0.93).When the higher elevation values are excluded, there is poor correlation (R 2 = 0.24).According to these correlations, the studied aquifer is driven by the alluvium at the full scale.But in the areas at lower altitude where the dolines are situated, groundwater flow in the alluvium unit is affected by karst structures.

Kriging
Geostatistics aims at providing quantitative descriptions of natural variables distribution in space and time (Matheron, 1978;Journel, 1986;Chilès and Delfiner, 1999).Initially developed to address ore reserve evaluation issues in mining (Isaaks and Srivastava, 1989), it is now commonly applied Rule 2: If X∊A 2 and Y∊B 2 then: p 1 , q 1 , r 1 , p 2 , q 2 , r 2 are defined in the first layer of the Geo-ANFIS (Fig. 3).
Each node i of Layer 2 calculates the firing strength w 1 of the i th rule via multiplication: Node i in Layer 3 calculates the ratio of the i th rule's firing strength to the total amount of all firing strengths: Node i in Layer 4 calculates the contribution (weight) of the i th rule toward the overall output via multiplication: Finally, Layer 5 is made on a single node that computes the overall output as the summation of the contribution from each rule: ANFIS uses a hybrid learning algorithm that combines the back-propagation gradient descent and least squares method to create a fuzzy inference system whose membership functions are iteratively adjusted according to a given set of input and output data (Jang, 1993).For each iteration, the backpropagation method involves minimization of an objective function using the steepest gradient descent approach in which the network weights and biases are adjusted by moving a small step in the direction of a negative gradient.The iterations are repeated until a convergence criterion or a specified number of iterations is achieved.It has the advantage of allowing the extraction of fuzzy rules from numerical data and adaptively constructs a rule base.

Empirical Bayesian kriging
Empirical Bayesian kriging (EBK) first appeared in the literature several years ago (Pilz and Spöck, 2007;Pilz et al., 2012).EBK is a geostatistical interpolation method that automates the difficult aspects of building a valid kriging model.Other kriging methods require one to manually adjust model parameters, but EBK automatically calculates these parameters through a process of sub-setting and simulations (Chiles and Delfiner, 1999).The EBK method can handle moderately non-stationary input data estimates and then uses many semi-variogram models rather than a single semi-variogram.EBK accounts for the error introduced by estimating the underlying semi-variogram through repeated simulations (Finzgar et al., 2014).

Geo-ANFIS
The adaptive neuro-fuzzy inference system (ANFIS) (Firat and Gungor, 2007;Jang, 1993Jang, , 1995Jang, , 1996;;Pratihar, 2007;Takagi and Sugeno, 1985;Wang et al., 2009) is a modelling technique which assumes that input and output data are ill-defined with uncertainty that cannot be exactly assessed with probability theory based on a two-valued logic.It uses fuzzy set theory first proposed by Zadeh (1965).A fuzzy set is a set of elements with an imprecise (vague) boundary (Pratihar, 2007).A fuzzy set does not have a crisp boundary.That is, the transition from 'belonging to the set' to 'not belonging to the set' is gradual and is characterized by membership functions.A fuzzy set A(x) is then represented by a pair of two things -the first one is the constituent elements x and their associated membership values μ A (x) that is their degree of belongingness: where: X is the universal set consisting of all possible elements.The membership function μ A ranges between 0 and 1.If the value of the membership function is restricted to either 0 and 1, the fuzzy set is then reduced to a classical crisp set with a known boundary.As stated by Jang (1995), the fuzziness does not come from the randomness of the constituent members of the sets, but from the uncertain and imprecise nature of the abstract thoughts and concepts.
In ANFIS the relationship between input and output is expressed in the form of If-Then rules.ANFIS used for the present work is based on the Sugeno fuzzy model (Takagi and M. Sugeno, 1985) which formalizes a systematic approach to generating fuzzy rules from an input-output dataset.A typical fuzzy rule in a Sugeno fuzzy model has the format: If x∊A and y∊B then: where: A and B are fuzzy sets in the antecedent and f (x, y) is a crisp function in the consequent.Usually f is a polynomial function.
The architecture of the Geo-ANFIS is composed of 5 layers (Fig. 3).Each layer has a specific function.The first layer generates a membership grade of a linguistic label, which means that it defines the parameter of the membership function.For instance, consider a first-order Sugeno fuzzy inference system which contains 2 rules:

Implementation of EBK
Exploratory data analysis, automatic variogram fitting and kriging were performed using ArcGis 10.2 software.The EBK method is based on 3 main steps: Firstly, a semi-variogram model is estimated from the observed data set.Secondly, a new value is simulated at each of the observed data locations by using the semi-variogram estimated in the previous step.
Thirdly, a new semi-variogram model is estimated from the newly simulated data from the second step.By using Bayes' rule, a weight for this semi-variogram model is calculated which shows how likely it is that the observed data can be generated from the semi-variogram.The second and third steps are repeated.This process creates a spectrum of semivariograms (Pilz and Spöck, 2007).New parameters are also needed for EBK, such as subset size which defines the number of points in each subset, overlap factor which specifies the degree of overlap between subsets and number of simulation which specifies the number of semi-variograms that will be simulated for each subset.

Implementation of Geo-ANFIS
The neuro-fuzzy model was developed using the ANFIS procedures of MATLAB (Demuth and Beale, 2000).In this study, a code is written in Matlab 2012b for ANFIS using appropriate functions to calculate the best performance of the methods.Before using the model to interpolate unknown outputs (hydraulic head and 111 Cd), its actual predictive performance must be tested by comparing outputs estimated by calibrated models with known outputs.At each phase (training, validation and test), Geo-ANFIS performance is measured by the determination of the coefficient of goodness-of-fit (R 2 ) and the root mean square error (RMSE).
where: E, Z * and Z are previously defined.
Input data are XY coordinates for the ANFIS XY .The data are pre-processed by elimination of unrealistic values to obtain a more stable dataset.Predictions of hydraulic head and 111 Cd are the Geo-ANFIS output.
The selection of appropriate input parameters is a complex task.At first step; numbers of training, validation and test data are decided by order: 60%, 20% and 20%.Assignment of data points to training, validation and test subsets is realized by random selection ability of ANFIS.Triangular (TriMF), Gaussian (GaussMF), Generalized bell (GbellMF), Splinebased (PiMF), Trapezoidal (TrapMF) and their different types of curves (named as 2, 3, 4 and 5) were used as membership functions in Geo-ANFIS.Random simulation number was decided as 100 which provides 100 different data assignments to training, validation and test subsets for each type of membership function curve.For ANFIS XY simulations, the number of rules is set to 3 for each input.

SELECTION OF INTERPOLATION MODELS EBK process
X-Y coordinates, hydraulic head and 111 Cd values were used as input to EBK.For the semivariogram cloud creation of hydraulic head; subset size, overlap factor, number of simulations, maximum neighbours, minimum neighbours and radius (m) are determined by order: 20, 2, 100, 15, 10 and 1 500.For the semivariogram cloud creation of 111 Cd; subset size, overlap factor, number of simulations, maximum neighbours, minimum neighbours and radius (m) are determined by order: 20, 2, 100, 15, 10 and 1500.

Geo-ANFIS model selection
The Geo-ANFIS model selection is based on available data.Using these datasets at each phase (training, validation and test), the Geo-ANFIS performance is measured by the coefficient of goodness-of-fit (R 2 ) and root mean square error (RMSE).ANFIS XY is run up to 2 000 iterations with 100 random data simulations for 4 types of each membership function.100 results for each type of membership function are analysed automatically to select the best ones.RMSE and R 2 values of training, validation and test subsets for the chosen types of membership functions for hydraulic head and 111 Cd are given in Table 1.

INTERPOLATIONS OF HYDRAULIC HEAD AND 111 CD
Hydraulic head and 111 Cd interpolation maps using EBK on a 100 m square grid are given in Fig. 4 2).Also, the difference maps between EBK and Geo-ANFIS predictions for hydraulic head and 111 Cd are given in Fig. 8 and Fig. 9.

CONCLUSION
In this study, two interpolation methods were tested to estimate hydraulic head and 111 Cd distribution over the Karabağlar alluvium aquifer.Geo-ANFIS was used with 2 inputs as X-Y cartesian coordinates to interpolate hydraulic head and 111 Cd.Both hydraulic head and 111 Cd distribution results show that EBK performs considerably better than ANFIS XY .Further, the prediction map generated by the Geo-ANFIS method doesn't represent smoothed results, as compared to all other interpolation methods.
For hydraulic head, Geo-ANFIS concluded with a 8.38 m RMSE value, which is significantly high when compared to the observed data range of 19.32 m.Also, for 111 Cd, Geo-ANFIS produced a 0.34 µg/L RMSE value while the observed data range is 1.21 µg/L.Geo-ANFIS performed better for 111 Cd than hydraulic head.In this regard, using X-Y cartesian coordinates and elevation for ANFIS (ANFIS XYZ ) may perform better, especially for interpolation of hydraulic head.However, the EBK RMSE value is 0.86 m for hydraulic head and 0.03 µg/L for 111 Cd.Hence, EBK gave better results for 111 Cd as well as hydraulic head.
In conclusion, EBK can be considered as better interpolation method than ANFIS XY to interpolate hydraulic head and 111 Cd in groundwater.However, Geo-ANFIS proved its applicability as an alternative method to interpolate hydraulic head and metal concentration ( 111 Cd).

Figure 1
Figure 1 Geological map of study area and location of wells (Geological information taken from General Directorate of Mineral Research and Exploration of Republic of Turkey)

Figure 2
Figure 2 Plot of measured hydraulic head vs. surface elevation

Figure 3
Figure 3 Geo-ANFIS architecture for 3 inputs x, y.Layer 1: generates membership grades.Layer 2: Fuzzy rules.Layer 3: Calculates weights or rules named firing strengths.Layer 4: Product of the normalized firing strengths.Layer 5. Fuzzy results transformed into a traditional output by summation.
and Fig. 5. Hydraulic head and 111 Cd distributions are calculated on a 50 m square grid for Geo-ANFIS and the maps are given in Fig. 6 and Fig. 7. Observed hydraulic head and 111 Cd values are directly used as input both in Geo-ANFIS and EBK.The EBK model produced less dispersed values for hydraulic head and 111 Cd with standard deviation of 3.28 m and 0.42 µg/L while Geo-ANFIS's are 4.51 m and 0.74 µg/L, respectively.RMSE between observed values and EBK prediction for hydraulic head and 111 Cd are

Figure 4 Figure 5 Figure 7
Figure 4 EBK interpolation, standard error, cross validation, error graph and semi-variogram cloud of hydraulic head

TABLE 2 General descriptive statistics of Geo-ANFIS and EBK predictions
Geo-ANFIS and EBK predictions are assessed together based on RMSE, mean absolute error (MAE) and descriptive statistics given in Table2.Performance of EBK is slightly better than Geo-ANFIS for both parameters interpolated.RMSE and MAE of EBK hydraulic head prediction values are 0.86 m and 0.66 m, whereas these values are 8.38 m and 7.54 m for Geo-ANFIS prediction.For EBK 111 Cd prediction, RMSE and MAE values are 0.03 µg/L and 0.02 µg/L.Besides the statistical values of hydraulic head prediction, map patterns were taken into consideration.