Optimising water distribution systems using a weighted penalty in a genetic algorithm

Genetic algorithms (GAs) have become the preferred water system design optimisation technique for many researchers and practitioners. The main reason for using GAs is their ability to deal with nonlinear complex optimisation problems. The optimal decision in terms of designing, expansion/extending, addition or rehabilitation of water supply systems has to review possible options and select a cost-effective and efficient solution. This paper presents a new approach in determining a penalty value depending on the degree of failure, of the set pressure criteria, and the importance of the link supplying a specific node. Further modifications are also made in the cross-over and mutation procedures to ensure an increase in algorithm convergence. EPANET, a widely used water distribution network simulation model, is used in conjunction with the proposed newly developed GA for the optimisation of water distribution systems. The developed GA procedure has been incorporated in a software package called GANEO, which can be used to design new networks, analyse existing networks and prioritise improvements on existing networks. The developed GA has been tested on several international benchmark problems and has proved to be very efficient and robust. The EPANET hydraulic modelling software as well as the developed GANEO software, which performs the optimisation of the water distribution network, is freeware. The software provides a tool for consulting engineers to optimise the design or rehabilitation of a water distribution network.


Introduction
As a vital part of water supply systems, water distribution networks represent one of the largest infrastructure assets of industrial society.Simulation of hydraulic behaviour within a pressurised, looped pipe network is a complex task, which effectively means solving a system of non-linear equations.The South African objective to provide 'water for all' makes it essential that the limited capital has to be employed in such a way to provide the maximum benefit.Optimising a relatively small system will require numerous repetitive calculations.The discrete nature of the network optimisation problem (pipe diameters) and the size of the solution space make it virtually impossible to apply any of the conventional optimisation techniques to find the global optimum.
The development of hydraulic models in the last two decades improved the ability to simulate hydraulic behaviour of large water distribution networks (Rossman, 2000).Most optimisation techniques are applicable when continuous variables are evaluated.GAs are applicable when a large solution space has to be searched, consisting of discreet variables, and it is now accepted by most experts that the GA is the best technique for network optimisation (Van Vuuren et al., 2005).
According to Michalewicz (1994), GAs can basically be described as artificial evolution search methods based on the theories of natural selection and mechanisms of population genetics.GAs apply the principle of survival of the fittest in a mathematical sense.In the evolutionary process, all species develop in such a way to improve their chances of survival and quality of living.It therefore means that all species are striving to a certain optimum, being physical, behavioural or otherwise.In this paper a traditional GA is developed which incorporates a unique penalty structure and tailored cross-over and mutation procedures.

Problem formulation
The difficulty of optimising water distribution systems is mainly due to the discrete nature of the variables and the size of the solution space.Many optimisation techniques can only be applied to problems, which have continuous variables, unlike the pipe diameter variables in the network optimisation problem.The size of the solution space (the total number of possible solutions to the problem) for the network optimisation problem can be calculated as the number of possible discrete pipe diameters to the power of the number of pipes in the network.
A major problem for water supply authorities is to select those components in a network that should be changed, increased or replaced to ensure a sustainable service to the consumers at the lowest cost.Water supply authorities are also faced with problems other than that of network design such as network calibration, operation and reliability.For each type of optimisation, the main objective function, possible variables and the main constraints are summarised in Table 1.
A water supply distribution system consists of a complex network of interconnected pipes, service reservoirs and pumps that deliver water from the treatment plant to a consumer.The distribution of water through the network is governed by complex, non-linear, non-convex and discontinuous hydraulic equations (Keedwell and Khu, 2005).

538
Two equations, which are used to determine if a network is hydraulically balanced, are the continuity and energy equations (Eqs.( 1) and (2) respectively). (1) The continuity equation is applied to each node with q i the flow rate (in and out of the node) and n the number of pipes joined at the node. (2) The energy equation is applied to each loop in the network with h i the head loss in each pipe and m the number of pipes in the loop.The head loss is the sum of the local head losses and the friction head losses.The friction head loss can be calculated using the Darcy-Weisbach (Eq.( 3)) or Hazen-Williams (Eq.( 4)) empirical equations or something similar.The network can be hydraulically balanced utilising Eqs.(1) to (4) and methods such as the Hardy-Cross and nodal methods.
The aim of designing a new network is to obtain a system that will meet the demand at each node at a required minimum pressure.Similar for the rehabilitation or improvement of a network, the optimum system is achieved when the system components are identified, to replace or rehabilitate, which will provide the level of service required.

Genetic algorithm
Genetic algorithms (GAs) have been developed (Holland, 1975) to assist in searching through complex solution spaces for the optimum solution.GAs have been applied as search techniques for various engineering problems such as, structural design optimisation, water distribution network evaluation, pump scheduling, hydrological runoff predictions and resource utilisation.According to Michalewicz (1994), GAs can basically be described as artificial evolution search methods based on the theories of natural selection and mechanisms of population genetics.GAs emulate nature's optimisation technique of evolution, based on: • Survival and reproduction of the fittest members of the population • The maintenance of a population with diverse members • The inheritance of genetic information from parents • The occasional mutation of genes.
A GA evolves optimal solutions by sampling from all the possible solutions.The best of these solutions are then combined, using the genetic operators of cross-over and mutation, to form new solutions.The identification of these best solutions is done based on a set objective function.This process continues until some termination condition is fulfilled.A flow diagram of the basic GA process is given in Fig. 1.
The objective function in the optimisation of a water distribution system is usually the minimisation of total cost.The total actual cost is a combination of the capital costs and operating and maintenance costs.In this paper only the capital cost of the pipes (supply, lay and jointing) are considered and hence it can be generally expressed as: (5) The pipe costs per unit length usually vary nonlinearly with its diameter and in Eq. ( 5) it is assumed that it can be expressed as a single term for all diameters, where f cost (D) is the cost of the pipes, L j and D j are the lengths and diameters of the j th pipe and K and n are constants that will depend on local conditions (Vairamoorthy and Ali, 2000).System configuration; number of simultaneous events

Genetic algorithm optimisation model
GAs can be used to optimise various different parameters in water distribution systems.Various types of optimisation can be identified for water distribution systems as shown in Table 1.In many cases the types and subsequent objectives of optimisations are in conflict with each other.For example, attempts to minimise operational cost will generally place the system in a more vulnerable state and less able to handle abnormalities such as pipe bursts, thus reducing the level of service (Jowitt et al., 1988).In such cases it is necessary to strike a balance between the objectives.This can be done by either defining the balance before hand (for example using an objective function equally weighed for cost and level of service) or by using multi-objective optimisation.In this paper only a single objective, in this case minimising cost, will be set for a water distribution system (new or upgrading/rehabilitation). The genetic algorithm process with its newly developed penalty function is summarised in Table 2.

Benchmarking the model
The developed genetic algorithm optimisation model was tested against benchmark problems in order to establish its functionality and efficiency.Software was developed (called GANEO) and

Create the initial population
Make a random selection of pipe diameters, from a selected list of available pipes, for the pipe network to create a string (possible network solution).Repeat this process to generate the entire population of network solutions.

Hydraulic analysis
Perform a hydraulic analysis on each of the population's strings (using hydraulic modelling package such as EPANET) to determine the pressure and supply at each node in the network as well as the flow rate in each pipe.

Fitness of each string (solution)
If a node does not meet the minimum pressure requirement the pipes supplying that node are penalised.If nodes have negative pressures the pipes supplying these nodes are penalised extensively to emphasise the poor results thereof.The cost of a pipe that results in a node not meeting the minimum pressure requirement will be calculated by modifying Eq. ( 5) as follows (Eq.( 6): The aim of the weighted penalty cost structure as defined above is to increase the penalty on a system, the greater the pressure deficiency is and to add some proportional distribution of the importance of the supply pipe based on the flows in the pipes, to the cost.The more water a specific pipe supplies to the node the greater the importance of that pipe.The higher the user-specified penalty factor (PF) is the higher the cost component will be.The pressure penalties are subject to an if-thenelse statement, which means that if the pressures fall within the specified boundaries, no penalties would be applied.The total cost of the network is the total cost of all the individual pipes (including penalties).A similar approach is followed in case a velocity criterion is not met.

Reproduction and cross-over (pairing)
In this proposed model, 75% of the top ranked solutions of the generation is retained and the worst solutions (25%) are discarded.A new set of strings (offspring) is generated from the remaining strings/solutions based on probabilities of their fitness values.Through a random process, or the spin of the roulette wheel, the new strings for the new generation are created.Thereafter a single point cross-over where the genes of the strings are transferred between parents is performed.The selection of parents for cross-over and determining the position of cross-over in each of the pairs is again a random process (although the developed software allows for other cross-over procedures).

Mutation
To force the solution to include gene strings from the total solution space and to steer away from the local optimum a mutation is performed with a probability equal to the mutation rate.Each gene (pipe) of each string (network solution) has in other words the probability of mutating and being replaced with a randomly selected gene from the available gene pool.

Termination
Following the selection, cross-over and mutation operators and introduction of the new child organisms into the population, the process is repeated until an appropriate termination condition is met.The simplest technique is to use a fixed number of generations or alternatively when complete convergence has occurred or no improvement in the fitness value of the best chromosome has occurred in some fixed number of the previous generations.

Fitness of each string (solution)
If a node does not meet the minimum pressure requirement the pipes supplying that node are penalised.If nodes have negative pressures the pipes supplying these nodes are penalised extensively to emphasize the poor results thereof.The cost of a pipe that results in a node not meeting the minimum pressure requirement will be calculated by modifying Eq. ( 5) as follows (Eq.( 6 If a node does not meet the minimum pressure requirement the pipes supplying that node are penalised.If nodes have negative pressures the pipes supplying these nodes are penalised extensively to emphasize the poor results thereof.The cost of a pipe that results in a node not meeting the minimum pressure requirement will be calculated by modifying Eq. ( 5) as follows (Eq.( 6 All three benchmark problems were analysed utilising a Pentium computer wit a 3.2 GHz Intel processor, 1024 MB RAM, using Microsoft Windows XP Professional as operating system.The GANEO software utilised, an example of an improvement to a distribution network (Example 1) and two examples of designing of new networks (Examples 2 and 3) are described in the following paragraphs.

GANEO software
The developed GA optimisation model as described above required the development of a software program to perform the computations.The developed program called GANEO requires six easy steps to optimise a water distribution system: Step 1: Using EPANET, create a working network model and export the water distribution system into the correct format for importing in GANEO (*.inp file).
Step 2: Create a new project in GANEO and import the EPANET model (see Fig. 2).
Step 3: Create a pipe selection file in GANEO from which pipes (genes) will be selected for population of the network (strings).
Step 6: Evaluate the alternative options and export the result back to EPANET.

Example 1: New York tunnels
The New York water supply system has been studied by a number of researchers in the pipe network optimisation field.The aim of the various studies was to determine the most economical design to improve the existing system of tunnels that constituted the main water distribution system.Figure 4 is a general layout of the system indicating the pipes (Table 3), nodes (Table 4) and single supply reservoir.The existing system was found to be inadequate due to ageing and an increase in demands, in terms of the pressure requirements.

542
The method utilised to improve the system was to lay parallel pipes between certain nodes.There are 15 available commercial diameters (Table 5), which could be used as well as a socalled 'do nothing' option for each of the 21 pipes in the system.The cost per unit length as shown in Table 5 is similar to other researchers' defined cost functions (see Eq. ( 5) with L = 1 m, K = 0.06537, n = 1.24 and D measured in mm).
A Hazen-Williams friction factor of C H = 100 is assumed for both the old tunnels and the new pipes.It has been indicated that the conversion from imperial units to metric units could result in small differences and subsequent changes in optimum solutions (Savic and Walters, 1997).
The only system constraint that this network has is that the minimum head at each node should be as indicated in Table 4.Although the system is fairly unsophisticated and small in comparison to the internal distribution system there are 1.93 x E25 or 16 21 possible solutions for this system.It is thus impossible to analyse every single network improvement alternative.
The optimum obtained with the proposed GA procedure as used in the GANEO program was $M38.65.The number of iterations (generations) it took to obtain this optimum was 684.The convergence to the optimum is shown in Fig. 5.As can be seen the initial reduction in total cost of the system occurs fast after which this improvement process slows down.
The nodes with its pressure closest to the minimum required are Nodes 16, 17 and 19 with pressure elevations of 79.26 m, 83.16 m and 77.73 m respectively.The five best solutions, based on randomly selected initial seed values, and their costs are shown in Table 6.The following GA parameters were used: number of generations = 1 000, population = 100, penalty factor = 5 -6 and mutation rate was set at 3%.(Eq.5) with K = 0.008593, n = 1.5 and D measured in millimetre.
The Hanoi network has been optimised by other researchers as shown in Table 11.Fujiwara and Khang (1990) did not use discrete values for the pipe diameters and for this reason direct comparison with their solution is not possible.
The following GA parameters were used: number of generations = 2 000, population = 100, penalty factor = 2 and mutation rate was set at 3%.The optimum obtained with the GANEO program was $M6.110.The number of iterations (generations) it took to obtain this optimum was 495 and it was achieved after 324 s.
According to Savic and Walters (1997) the solution of Fujiwara and Khang ( 1990) is based on a continuous cost function solution since their method could not handle discontinuous objective functions directly.Furthermore when the Fujiwara and Khang (1990) solution is reanalysed with the range values of ω = 10.5088 and ω =10.9031 it does not meet the minimum 30 m pressure requirement with pressures as low as 10.31 m and 7.69 m respectively.
As can be seen in Fig. 7 the initial reduction in total cost of the system occurs rapidly after which the improvement process slows down.The ground elevation of all the nodes is 0.0 m.The pipes that could be utilised in the design of the system are shown in Table 10.The head constraint for this system is 30 m, i.e. the pressure everywhere in the system should be greater than 30 m.The costs as indicated in Table 10 were determined with the cost function

545
Vairavamoorthy and Ali obtained the optimum solution after approximately 25 min.Liong and Atiquzzaman analysis took approximately 11 min whilst the authors obtained the optimum after generation 495 and 6.4 min.The minimum pressure in the system was 30.045 m at node 32.
As detailed in Table 2 a new weighted penalty function is proposed by the authors.Wu and Walski (2005) provide a comparison of various constraint-handling techniques and this is reproduced in Table 12 (including the author's results).
As can be seen in Table 12 (next page) the proposed GA produced satisfactory results compared to others.Furthermore when the optimum Wu and Walski solution, Table 11, is reanalysed with ω = 10.667 as used in EPANET it does not meet the minimum pressure requirement of 30 m. Direct comparison is thus difficult but the solutions obtained by the authors are competitive.

Example 3: Two-loop network
The two-loop network was first studied by Alperovits and Shamir (1977) and many others thereafter (Keedwell and Khu, 2005).The two-loop network consists of eight pipes, which are fed from a single fixed head reservoir to supply the demands as shown in Fig. 8 (next page).
The pipes in the network are all 1 000 m long and the only system constraint is the minimum pressure requirement for nodes 2 to 7 defined as 30 m.The available pipe diameters and costs that could be used in the design of the system are shown in Table 13.

546
The objective of the system was to determine the required pipe diameters that would yield the least total cost whilst still supplying the demand and adhering to the system constraint of minimum pressure at each node.Although the system seems extremely simple there are still 8 14 possible combinations of pipes.
Savic and Walters (1997) found two solutions consistently at 419 000 and 420 000 cost units (depending on ω) which satisfied the demand and pressure requirements.According to Keedwell and Khu (2005) the algorithm was ran for 500 generations with a population size of 50, i.e. 25 000 network simulations.Using a cellular automata (CA), genetic algorithm (GA) combination approach Keedwell and Khu (2005) also analysed the two-loop network.The Cellular Automation for Network Design Algorithm combined with a GA (called by the authors CANDA-GA) can be described as seeding the GA with CA solutions, i.e. providing a better initial population to start with.The optimisation results of Savic and Walters (1997), Keedwell and Khu (2005) and the authors are presented in Table 14.
As discussed by Savic and Walters (1997) other researchers also provided solutions in this range such as Kessler and Shamir (1989) with 402 352 and Eiger et al. (1994) with 417 500 but these solutions did not obtain the minimum pressure requirement of 30 m as every node.Some other researchers such as Goulter et al. (1986) with a cost of 435 015 and Alperovits and Shamir (1977) with a cost of 497 525 obtained feasible solutions but utilised split pipe solutions.Keedwell and Khu (2005) reported that both algorithms (GA and CANDA-GA) fitness converged after 3 000 generations.If this is compared with what was obtained with GANEO it can be clearly seen that an optimum or near optimum is very quickly obtained.The results furthermore show an improvement of the average total cost of the five runs (different initial seed values) that were performed.The worst result obtained with GANEO was 9 000 cost units better than that obtained by Keedwell and Khu (2005) with the standard GA and 2 000 cost units better than that obtained with the CANDA-GA.

547
The optimum solution of 419 000 cost units is obtained if the pipes as listed in Table 16 are used resulting in a minimum pressure in the system at node 6 being 30.44 m.These results were obtained with the following GA parameter: number of generations = 1 000, population = 100, penalty factor = 1.5 and mutation rate was set at 10%.time limit or when no change in fitness occurs for a set number of generations • Setting of population size -four to one hundred • Penalty factors -the penalty factor can be set for nodes not meeting the minimum pressure requirement as well as a penalty factor for velocities which are greater than a specified value • Pipe/link fixing -when an existing network is analysed for rehabilitation purposes GANEO allows the fixing, restricting the changing or adding, of certain pipes.These pipes will thus be kept as is and won't be upgraded or improved although these will be included in the hydraulic analysis of the system.The reason for fixing a pipe is for instance when it is too costly or difficult to change/improve or if it is part of the existing system that must simply be analysed as part of the new extension of the network.

Conclusions
The developed genetic algorithm optimisation model was tested on three benchmark networks and it has been shown to produce good results in a limited number of generations (in relation to other GA-based methods).The weighted penalty function produced satisfactory final results and showed faster initial convergence.The developed GANEO program can be used in the design and analysis of a new network as well as providing suggestions on how to improve an existing network (adding additional pipes or replacing existing pipes).The EPANET software used for the hydraulic analysis of the systems is well-accepted, well-tested analysis software used in various other hydraulic analysis packages.The EPANET hydraulic modelling software as well as the developed GANEO software, which performs the optimisation of the water distribution network, is freeware.The software provides a tool for consulting engineers to optimise the  Savic and Walters (1997) each run took 10 min CPU time on a PC 486/DX 2 50 and one of these solutions were always identified when 10 runs were performed.

#
The average cost of a number of different runs.The optimum cost of 419 000 was also obtained, see Table 15.

Features of GANEO
The developed GA was implemented in the software package GANEO to test its ability, and some of the features that makes the optimisation process more powerful are: • Initial seed number -setting of the initial seed number for the pseudo random generator (m) D = internal diameter (m) V = velocity (m/s) g = gravitational acceleration (m/s²) λ = Darcy Weisbach friction factor C = Hazen-Williams friction factor ω = numerical conversion constant (in this paper ω = 10.667)

Figure 1
Figure 1Basic genetic algorithm process PC Hmin-j = Cost of pipe 'j' with added penalty cost due to minimum pressure constraint C P = Cost of pipe per unit length, which is KD j L P = Length of the installed pipe b = Penalty factor (b = 5 if P calculated < 0) PF = User specified penalty factor (0.5 to 10) P calculated = Calculated pressure at node P required = Minimum residual pressure required at node website http://www.wrc.org.zaISSN 0378-4738 = Water SA Vol.34 No. 5 October 2008 ISSN 1816-7950 = Water SA (on-line) 540 used to test common networks for which many optimisations have been performed; these include traditional and heuristic methods (Savic and Walters, 1997).Three systems were tested: New York Tunnels (Example 1), Hanoi (Example 2) and Twoloop network (Example 3) detailed below.In all three examples the Hazen-Williams equation (Eq.(4) will be utilised to determine the friction loss in a pipe link between two nodes.Previous researchers have investigated these systems extensively and obtained numerous solutions that met the defined fitness function of minimum cost based on the constraint of required pressure and demand at every node.Due to different interpretations of the Hazen-Williams equation different researchers obtained different solutions (Savic and Walters, 1997) which made direct comparison not always possible.According to Savic and Walters (1997) the numerical conversion constant (ω), see Eq.(4), varied from 10.5088 to 10.9031.The consequence of this variation of the ω values used is that systems designed with ω = 10.5088calculate a lesser friction head loss when compared to ω = 10.9031.This result in solutions meeting the set pressure criteria when analysed with the lower boundary of ω but when reanalysed with the upper boundary failing and thus providing an unfeasible solution.The value of ω as used in the EPANET software, which was used in the hydraulic analyses, is 10.667 (similar toDandy et al. (1996), Montesinos et al.  (1999), Wu and Simpson (2002) andKeedwell and Khu (2005)).

Figure 2
Figure 2 GANEO input screen Figure 3 GANEO results screen

Figure 5
Figure 5Convergence to optimum solution(New York Tunnels)

Figure 7
Figure 7Convergence to optimum solution (Hanoi water distribution network)

TABLE 6 Optimisation results of the New York tunnel system Initial seed Total cost ($M) Optimum obtained during generation Time to reach optimum solution (sec)
Savic and Walters (1997) Walters (1997), GAs are stochastic-search techniques and the solution found will not always be

TABLE 4 Node data of the existing system/network Node Minimum required pressure level (m)* Demand (m³/s)
* The ground elevation is 0.0 m.

TABLE 10 Available pipes from which the pipe selection will be made
A Hazen-Williams friction factor of C H = 130 is assumed for all new pipes

TABLE 14 Optimisation results of the two-loop network
The solution obtained will provide an unfeasible solution if ω = 10.9031,since the pressure at node 3 drops to29.97 m (Coley, 2003)** According to *

•
Selecting of cross-over method -single point, double point or uniform • Selecting the mutation procedure and value -random or Min-Max • Setting the termination criteria -number of generations, Available on website http://www.wrc.org.zaISSN 0378-4738 = Water SA Vol.34 No. 5 October 2008 ISSN 1816-7950 = Water SA (on-line) 548 design or rehabilitation of a water distribution network.Further research is envisaged to include local search procedures once near optimum solutions are obtained.