Evaluation of models generated via hybrid evolutionary algorithms for the prediction of Microcystis concentrations in the Vaal Dam, South Africa

Cyanobacteria are responsible for many problems in drinking water treatment works (DWTW) because of their ability to produce cyanotoxins that can potentially have an adverse effect on consumer health. Therefore, the monitoring of cyanobacteria in source waters entering DWTW has become an essential part of drinking water treatment management. Managers of DWTW rely heavily on results from physical, chemical and biological water quality analyses of grab samples for their management decisions. However, results of water quality analyses may be delayed by 3 h to 14 days depending on a multitude of factors such as sampling, distance and accessibility to the laboratory, laboratory sample turnaround times and the specific methods used in analyses. Therefore, the benefit to managers and production chemists of being able to forecast future events of high cyanobacterial cell concentrations in the source water is evident. During this study, physical, chemical and biological water quality data from samples taken from 2000 to 2009 in the Vaal Dam, supplying South Africa’s largest bulk drinking water treatment facility, were used to develop models for the prediction of the cyanobacterium Microcystis sp. in the source water (real-time prediction together with 7, 14 and 21 days in advance). Water quality data from the Vaal Dam from 2010–2012 were used to test these models. The model showing the most promising results for incorporation into a ‘Cyanobacterial Incident Management Protocol’ is the one predicting Microcystis sp. 7 days in advance. This model showed a squared correlation coefficient (R²) of 0.90 when tested with the testing dataset (chosen by bootstrapping from the 2000–2009 input dataset) and an R² of 0.53 when tested with the 3-year ‘unseen’ dataset from 2010–2012.


INTRODUCTION
Algae and cyanobacteria occur naturally in source waters worldwide. However, certain species are known to form harmful blooms (Harding and Paxton, 2001), which can cause extensive problems in the drinking water treatment industry (Knappe et al., 2004; Meriluoto and Codd, 2005; Zoschke et al., 2011). Cyanobacteria (especially Microcystis sp.) are responsible for many water treatment problems due to their ability to produce organic compounds. These organic compounds include the cyanotoxin microcystin (Conradie and Barnard, 2012), which can have an adverse effect on consumer health, as well as taste and odour compounds (like geosmin and 2-methylisoborneol) that decrease consumer confidence in drinking water (Zoschke et al., 2011). Therefore, the monitoring of cyanobacteria in source waters entering drinking water treatment works (DWTW) has become an essential part of drinking water treatment management (Swanepoel et al., 2008).
Recently, Cyanobacterial Incident Management Protocols (Du Preez and Van Baalen, 2006; Du Preez et al., 2007) and Water Safety Plans (Bartram et al., 2009) have been used to manage incidents such as high cyanobacteria concentrations in source water destined for drinking water purification. In order to fully utilise these management tools (protocols and safety plans), managers and production chemists of DWTW rely heavily on results of physical, chemical and biological water quality analyses for their water treatment and management decisions. However, results of water quality analyses can be delayed from 3 h to 7 days or longer, depending on factors such as sampling, distance and accessibility to laboratories, laboratory sample turnaround times, and the specific methods used in the analysis (Swanepoel et al., 2008). Therefore, the application value of models that are able to predict the cyanobacteria concentration in source waters, a few days or weeks in advance, is evident. Such models will enable managers and production chemists of DWTW to prepare for a cyanobacteria-related incident before it occurs.
Previous studies have demonstrated that highly complex ecological time-series data can be successfully probed to develop rule sets as prediction tools by using hybrid evolutionary algorithms (HEAs) (Talib et al., 2007; Chan et al., 2007; Recknagel et al., 2008; Van Ginkel, 2008; Welk et al., 2008; Recknagel et al., 2013; Recknagel et al., 2014). Ecological data are particularly prone to observational and/or measurement noise, and ecological interactions are inherently complex and nonlinear. In a previous study by Van Ginkel et al. (2010), different ecological informatics modelling techniques were compared. The rule set discovered by hybrid evolutionary algorithms (HEA) proved to be highly applicable to the hypertrophic reservoirs of South Africa. During the current study, physical, chemical and biological water quality data from samples collected from 2000 to 2009 in the Vaal Dam were used to develop models for the prediction of Microcystis sp. in the source water. The aim of this study was to evaluate the suitability of Microcystis sp. prediction models in the Vaal Dam (real-time, 7, 14 and 21 days in advance), for application to a large bulk drinking water treatment facility and possible incorporation into its 'Cyanobacterial Incident Management Protocol' (Du Preez and Van Baalen, 2006). This will enable the DWTW to initiate preventative measures for dealing with source water containing high concentrations of Microcystis sp. cells before it even reaches the plant.

Study site
The Vaal Dam (Fig. 1) is approximately 150 km south of Johannesburg, South Africa. The catchment area of the dam is approximately 38 500 km², and the dam wall is 63.4 m high above the lowest foundation (DWA, 2013b). The Lesotho Highlands Water Project pumps water into the system in order to supply water to the industrial hub of Gauteng. This water is transported from Lesotho via the Liebenbergsvlei and Wilge Rivers (LHDP, 2013). The Vaal Dam is classified as mesotrophic according to the classification system used by the South African Department of Water Affairs (DWA), in which mean total phosphate (0.077 mg/L), mean chlorophyll-a concentration (14.8 µg/L) and the percentage of time during which chlorophyll-a is >30 µg/L (17%) are taken into account (DWA, 2013a).
From the Vaal Dam, a 20 km long canal supplies water to Stations 3 and 4 at the Zuikerbosch DWTW, South Africa's largest bulk drinking water treatment facility (Fig. 1). This facility can produce approx. 3 000 ML of drinking water per day (depending on demand). Samples for analyses are collected at the dam wall (coordinates: X: 28.12059553; Y: −26.88444867); the lake behind the dam wall has a surface area of about 320 km² and is 47 m deep at full capacity (DWA, 2013b). Results from physical, chemical, and biological analyses done by Rand Water's Analytical Services Laboratory on water samples from the Vaal Dam supplying the Zuikerbosch DWTW, for the period 2000 to 2012, were used in this study (Fig. 1).

Physical, chemical and biological analyses of water
Sampling and laboratory analyses of samples from the Vaal Dam took place once a month. All chemical and biological analyses were carried out according to SANAS-accredited standard methods (APHA, 2013), where SANAS is the South African National Accreditation System, an ILAC affiliate.
The Microcystis sp. counts were performed according to the phytoplankton identification and enumeration method described by Swanepoel et al. (2008). During sample preparation, the gas vacuoles of cyanobacteria were pressure-deflated using a specially designed mechanical hammer that exerts a pressure of 49.5 kPa on the sample (Walsby, 1971, 1994), which is approximately the pressure needed to collapse the gas vacuoles of cyanobacteria. The sample was then homogenised at 13 000 r/min for ±15 s, after which 3 mL of sample was pipetted into a sedimentation chamber. The sedimentation chambers were then centrifuged for 10 min at 3 500 r/min to allow phytoplankton cells to settle to the bottom. After settling, all phytoplankton cells were identified and enumerated with an inverted light microscope, using the technique described by Lund et al. (1958) and adapted for Rand Water by Swanepoel et al. (2008). One of the eyepieces of the microscope contains a Whipple grid to delineate the counting area (called a 'field').
The glass bottoms of the sedimentation chambers were examined in 'fields' covering most parts of the sedimentation chamber, while counting all algal cells inside the grid or 'field'. The original sub-sample volume that was transferred to the sedimentation chamber, the area of the sedimentation chamber, the area of a 'field' as well as the number of 'fields' counted, were used to calculate the concentration of individual phytoplankton genera as cells per millilitre (cells/mL).
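As a worked illustration of this calculation, the sketch below scales the cells counted in the examined fields up to the whole chamber floor and then divides by the settled sub-sample volume. It assumes that cells settle evenly across the chamber floor; the function name, the parameter names and the example numbers are hypothetical and are not taken from the Rand Water method.

```python
def cells_per_ml(cells_counted, fields_counted, field_area_mm2,
                 chamber_area_mm2, subsample_volume_ml):
    """Estimate cells/mL from a partial count of a sedimentation chamber.

    Assumes (not stated in the source) that cells are spread evenly over
    the chamber floor, so the counted fields are representative.
    """
    inputs = (fields_counted, field_area_mm2, chamber_area_mm2,
              subsample_volume_ml)
    if min(inputs) <= 0:
        raise ValueError("fields, areas and volume must all be positive")
    counted_area_mm2 = fields_counted * field_area_mm2
    # Factor that scales the counted area up to the whole chamber floor.
    scale_to_chamber = chamber_area_mm2 / counted_area_mm2
    return cells_counted * scale_to_chamber / subsample_volume_ml


# Hypothetical example values, chosen only to make the arithmetic easy.
example = cells_per_ml(
    cells_counted=150,
    fields_counted=20,
    field_area_mm2=0.25,
    chamber_area_mm2=500.0,
    subsample_volume_ml=3.0,
)
print(f"{example:.0f} cells/mL")
```

With these made-up numbers the 20 fields cover 5 mm² of a 500 mm² floor, so the count is scaled by 100 and divided by 3 mL, giving 5 000 cells/mL.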

Statistical analyses
Principal component analysis (PCA) was carried out on the input dataset used for the model development in order to characterise the water in the dam according to the relationships between variables. All physical, chemical and biological variables were used as concentrations, but were centred and standardised to compensate for unit differences in the PCA. The cyanobacteria concentration was the only variable transformed to the natural log of the concentrations, to reduce the large variability in the cyanobacteria counts. The computer package CANOCO, Version 4.5 (Ter Braak, 1988) was used to perform the PCA. Ordinations were interpreted using the following rationale: parameters are (i) positively correlated with each other if their arrows subtend a small angle, (ii) not correlated if their arrows are at 90° to each other, and (iii) negatively correlated if their arrows point in opposite directions (180°); (iv) parameters with the longest arrow relative to an axis have the greatest influence on that axis.
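The pre-processing described above can be sketched as follows. This is only an illustration of the natural-log transform and of centring and standardising; it does not reproduce CANOCO. The use of the sample standard deviation, the optional `offset` parameter and the example values are assumptions.

```python
import math
import statistics


def ln_transform(counts, offset=0.0):
    """Natural-log transform of cell counts.

    `offset` is a hypothetical parameter: the source does not say how zero
    counts were handled, so by default none is added and zeros raise an error.
    """
    return [math.log(c + offset) for c in counts]


def standardise(values):
    """Centre to mean 0 and scale to unit standard deviation (z-scores).

    Uses the sample standard deviation; CANOCO's exact convention is an
    assumption here.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        raise ValueError("cannot standardise a constant variable")
    return [(v - mean) / sd for v in values]


# Hypothetical monthly values for two variables with very different units.
temperature_c = [12.0, 18.0, 24.0]
cyano_cells_per_ml = [100.0, 1000.0, 10000.0]

z_temperature = standardise(temperature_c)
z_ln_cyano = standardise(ln_transform(cyano_cells_per_ml))
```

In this toy case both variables end up on the same dimensionless scale even though the raw cyanobacteria counts span two orders of magnitude, which is the reason for transforming and standardising before the ordination.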
Squared correlation coefficients (R²) and root mean square errors (RMSE) of the models were determined with XLSTAT, Version 2009.4.06. The models were tested with (i) 25% of the data from the original database (2000–2009) that was used for training the models (chosen by bootstrapping and called the 'testing database') and (ii) 3 years of 'unseen data' (data from 2010–2012 that were not used in training the models).
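For readers who want to reproduce these two statistics outside XLSTAT, a minimal sketch is given below. It assumes that the reported R² is the squared Pearson correlation between measured and predicted values; the function names and the example data are hypothetical.

```python
import math


def rmse(measured, predicted):
    """Root mean square error, in the units of the data (e.g. cells/mL)."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("need two non-empty series of equal length")
    squared_errors = [(m - p) ** 2 for m, p in zip(measured, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(measured, predicted):
    """Squared Pearson correlation between measured and predicted values."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two series of equal length with >= 2 points")
    n = len(measured)
    mean_m = sum(measured) / n
    mean_p = sum(predicted) / n
    cov = sum((m - mean_m) * (p - mean_p) for m, p in zip(measured, predicted))
    var_m = sum((m - mean_m) ** 2 for m in measured)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_m == 0 or var_p == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov * cov / (var_m * var_p)


# Hypothetical series: the prediction tracks every event but doubles it.
measured = [100.0, 200.0, 300.0]
predicted = [200.0, 400.0, 600.0]
r2_value = r_squared(measured, predicted)
rmse_value = rmse(measured, predicted)
```

In this toy case R² is 1 because the timing of the events is matched perfectly, while the RMSE is large because the magnitude is over-estimated. The two statistics therefore answer different questions and are best read together.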

Hybrid evolutionary algorithms (HEAs)
Evolutionary algorithms (EAs) are adaptive methods used in search of suitable representations of models, which recognise patterns in data sets. EAs mimic the processes of biological evolution, natural selection and genetic variation based on the principle of 'survival of the fittest' (Welk et al., 2008, from Cao et al., 2006). EAs have been designed to discover predictive rule sets in complex ecological time-series data by applying genetic programming for the optimisation of the rule structures (IF x, THEN y, ELSE z) and genetic algorithms for the optimisation of parameters of the rule sets (Cao et al., 2006; Recknagel et al., 2008). For this study a Hybrid EA (HEA) designed for rule discovery in water-quality time-series was applied (Cao et al., 2006). Hybridisation was used in order to improve the performance of the evolutionary algorithm and to improve the quality of the solutions obtained by the algorithm (Grosan and Abraham, 2007). Improvement of the models was achieved by structure optimisation using genetic programming as well as parameter optimisation by using genetic algorithms (Welk et al., 2008).
The HEA was applied for short-time forecasting of Microcystis sp. concentrations in the Vaal Dam using physical data (turbidity, water temperature and Secchi disk depth) and chemical data (conductivity, pH, dissolved oxygen, PO₄³⁻, NO₃⁻, NH₄⁺, Si, Fe²⁺, Mn²⁺, chemical oxygen demand), as well as biological data (chlorophyll-a concentration and initial cyanobacterium inoculum). Table 1 displays the descriptive statistical values of the input data used for the model development.
Because samples were only taken once a month, and prediction time necessitated date ranges of 7, 14 and 21 days, the data were linearly interpolated to have corresponding results for all variables at a frequency of 360 days per year. For the development and first stage of testing of the models, 75% of the dataset was used to train the models and 25% of the dataset was used to test the models. Boot-strapping (i.e. random selection) was used to determine which 75% of the dataset was to be used as the training dataset and which 25% as the testing dataset. Boot-strapping also implies that a different 75% portion of the data will be used during training for all the different models. Fifty different models were developed for each set of 'x input variables = 1 output variable' chosen beforehand. From the 50 models, the best model relating the measured data and the predicted data was chosen based on the following criteria: (i) root mean square error (RMSE), (ii) the squared correlation coefficient (R²-value) and (iii) visual comparison between the predicted and measured data, according to Chan et al. (2007) and Bennet et al. (2013). For applications of the HEAs an initial population of 100 and a maximum number of generations and repetitive runs of 80 were chosen, because the database was relatively large and 80 repetitive runs could take anything from 24 h to 72 h to complete. The rule-sets were discovered and optimised using a large-scale parallel computational device and relevant software developed in the Ecoinformatics and Watershed Ecology Laboratory at the University of Adelaide, Australia.
Sensitivity analyses were carried out for the best performing predictive rule sets as follows: The minimum, maximum and median of all input variables used to develop the model were determined. A linear range of all the variables used in each model (either in the THEN or the ELSE branch) was constructed ranging from the minimum (at 0%) to the maximum (at 100%) in increments of 5%. To determine the sensitivity of the model towards a specific variable, the model was tested by substituting all variables with the median thereof, except for the variable being tested. The tested variable was substituted with the range of values from 0% to 100% on the x-axis and the result from the model on the y-axis. The curve with the steepest slope (either positive or negative) was identified as the variable towards which the model showed the greatest sensitivity. This implies that small changes in a variable towards which the model shows a high sensitivity will have a bigger influence on the result of the model when compared to a variable towards which the model shows a low sensitivity.

Characterisation of Vaal Dam water
A PCA was performed on the same dataset used to develop the models (Vaal Dam monthly collected physical, chemical and biological data) [...] Table 2 and the results are represented in Fig. 2.
From the results in Table 2 and Fig. 2, it is evident that the first axis, which accounts for 22% of the variation, mostly explains the variation in the nutrients (NO₂⁻, NO₃⁻, NH₄⁺ and PO₄³⁻) as well as chemical oxygen demand (COD), Mn²⁺ and water temperature (Temp). The second axis, which accounts for an additional 18% of the variation, mostly explains the variation in turbidity (Turb), Secchi disk depth (Sec), Fe²⁺, Si, pH, electrical conductivity (Cond), chlorophyll-a (Chla) and cyanobacteria (LnCyano).
Nutrients (NO₂⁻, NO₃⁻, NH₄⁺ and PO₄³⁻) together with Mn²⁺ are higher during the colder winter months, since the arrows representing them lie in the opposite direction to the arrow representing temperature. High turbidity associates closely with high pH, high Si, high Fe²⁺ and high chlorophyll-a (Chla), while high cyanobacteria (LnCyano) concentrations associate with low conductivity (Cond), low dissolved oxygen (DO) and low Secchi disk depth (Sec). The arrow representing chlorophyll-a subtends a ±90° angle with water temperature (Temp), indicating that high chlorophyll-a concentrations do not only occur during summer or winter, but vary throughout the year in the Vaal Dam. One can therefore deduce that the chlorophyll-a level is not solely caused by the presence of cyanobacteria, but by other phytoplankton as well. Chlorophyll-a shows a positive correlation with pH, indicating that pH increases during periods when algal blooms occur, most probably due to the consumption of CO₂ during photosynthesis.

Real-time Microcystis sp. prediction
For the best model developed for real-time Microcystis sp. prediction, the IF criterion of the model (Fig. 3) is determined by the Fe²⁺ concentration. The THEN branch of the model (Fig. 3a) represents the low-range rule set and shows the greatest sensitivity towards the initial cyanobacteria inoculum. The ELSE branch of the model (Fig. 3b) represents the high-range rule set and also shows the greatest sensitivity towards the initial cyanobacteria inoculum. The other variables used in the model (conductivity and Mn²⁺ in the THEN branch and pH, DO and chlorophyll-a in the ELSE branch) display very little influence on the predicted Microcystis sp. concentration.
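To make the IF-THEN-ELSE structure of such a rule set concrete, the sketch below shows one way a rule with a low-range and a high-range branch could be represented and evaluated. It is not the model discovered by the HEA: the threshold, the direction of the IF test, the coefficients and the variable names are all hypothetical placeholders, since the fitted equations are only given in the figures.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Sample = Dict[str, float]


@dataclass(frozen=True)
class RuleModel:
    """A single IF-THEN-ELSE rule: the condition picks which branch predicts."""
    condition: Callable[[Sample], bool]
    then_branch: Callable[[Sample], float]
    else_branch: Callable[[Sample], float]

    def predict(self, sample: Sample) -> float:
        branch = self.then_branch if self.condition(sample) else self.else_branch
        return branch(sample)


# Hypothetical rule, for illustration only (values are NOT from the paper).
toy_model = RuleModel(
    condition=lambda s: s["fe"] < 0.2,                              # IF criterion
    then_branch=lambda s: 1.0 * s["inoculum"] + 2.0 * s["cond"],    # low range
    else_branch=lambda s: 1.5 * s["inoculum"] + 100.0 * s["ph"],    # high range
)

low_range = toy_model.predict(
    {"fe": 0.1, "inoculum": 1000.0, "cond": 20.0, "ph": 8.0})
high_range = toy_model.predict(
    {"fe": 0.5, "inoculum": 1000.0, "cond": 20.0, "ph": 8.0})
```

Keeping the condition and the two branches as separate callables mirrors the way the HEA is described as optimising the rule structure and the rule parameters separately.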
The comparison between the measured Microcystis sp. concentration and that resulting from the model predicting real-time Microcystis sp., when using the 25% boot-strapped testing dataset (Fig. 4a), shows an R²-value of 0.95 and a root mean square error (RMSE) of 4 262.2 cells/mL. When the model was tested with 3 years of 'unseen data', the correlation showed an R²-value of 0.97 and an RMSE of 4 766.6 cells/mL (Fig. 4b), indicating that the event prediction of increased Microcystis sp. concentration together with the magnitude of the event displayed a significant correlation.

Microcystis sp. prediction 7 days in advance
For the best model developed for the prediction 7 days in advance, the IF criterion of the model is determined by a combination of conductivity, PO₄³⁻ and pH (Fig. 5). The THEN branch of the model (Fig. 5a) represents the high-range rule set and shows the greatest sensitivity towards the initial cyanobacteria inoculum. The ELSE branch of the model (Fig. 5b) represents the low-range rule set and also shows the greatest sensitivity towards the initial cyanobacteria inoculum. The other variables in this model (namely conductivity and DO), in comparison to the initial cyanobacteria concentration, display very little influence on the predicted Microcystis sp. concentration.
The comparison between the measured Microcystis sp. concentration and that resulting from the model predicting Microcystis sp. 7 days in advance, when using the 25% bootstrapped testing dataset (Fig. 6a), shows an R²-value of 0.90 and an RMSE of 3 135.7 cells/mL. When the model was tested with 3 years of 'unseen data' (Fig. 6b), the model showed an R²-value of 0.53 and an RMSE of 44 559 cells/mL. The event prediction of increased Microcystis sp. concentration showed a significant correlation; however, the model appears to over-estimate the Microcystis sp. concentration somewhat.

Microcystis sp. prediction 14 days in advance
The IF criterion of the best model developed for the prediction 14 days in advance is determined by the chemical oxygen demand (COD) (Fig. 7). The THEN branch of the model (Fig. 7a) represents the high-range rule set and shows the greatest sensitivity towards the initial cyanobacteria inoculum. The ELSE branch of the model (Fig. 7b) represents the low-range rule set and also shows the greatest sensitivity towards the initial cyanobacteria inoculum. The other variables in this model, namely dissolved oxygen (DO) and NH₄⁺ in the THEN branch and DO and water temperature (Temp) in the ELSE branch, display very little influence on the predicted Microcystis sp. concentration.
The comparison between the measured Microcystis sp. concentration and the results from the model predicting Microcystis sp. 14 days in advance, when using the 25% bootstrapped testing dataset (Fig. 8a), shows an R²-value of 0.79 and an RMSE of 4 493.7 cells/mL. When the model was tested with 3 years of 'unseen data' (Fig. 8b), the model showed an R²-value of 0.39 and an RMSE of 48 129.6 cells/mL. The event prediction of increased Microcystis sp. concentration together with the magnitude of the event showed a significant correlation.

Microcystis sp. prediction 21 days in advance
The IF criterion of the best model developed for the prediction 21 days in advance is determined by a combination of nutrient (PO₄³⁻ and NH₄⁺) concentrations (Fig. 9). The THEN branch of the model (Fig. 9a) represents the low-range rule set and shows the greatest sensitivity towards the initial cyanobacteria inoculum. The ELSE branch of the model (Fig. 9b) represents the high-range rule set and shows the greatest sensitivity towards the initial cyanobacteria inoculum as well as the Si concentration, particularly during the lower 10% of the input range. The other variables in this model (dissolved oxygen (DO), Si, chlorophyll-a, and NO₃⁻ in the THEN branch, and turbidity in the ELSE branch) display very little influence on the predicted Microcystis sp. concentration.
The comparison between the measured Microcystis sp. concentration and the results from the model predicting Microcystis sp. 21 days in advance, when using the 25% bootstrapped testing dataset (Fig. 10a), shows an R²-value of 0.74 and an RMSE of 4 993.6 cells/mL. When the model was tested with 3 years of 'unseen data' (Fig. 10b), the model showed an RMSE of 18 493.9 cells/mL and an R²-value of 0.25, which is not a good correlation. Neither the event prediction nor the magnitude of the increased Microcystis sp. concentration demonstrated a significant correlation when the model was tested with the 3 years of 'unseen data'.

Comparison between models
The frequencies of the different input variables used in the models to predict Microcystis sp. concentrations are displayed in Table 3, ranging from the most frequently included variable to that least frequently included in the models. The frequency distribution table (Table 3) indicates that the initial cyanobacteria concentration and the dissolved oxygen concentration were the variables most frequently used in the models to predict Microcystis sp. concentrations. Turbidity and nutrients (NH₄⁺, NO₃⁻ and PO₄³⁻), as well as Fe²⁺ and chlorophyll-a concentration, were used in 50% of the models. The rest of the variables (water temperature, conductivity, pH, Si, Mn²⁺ and COD) were only used in 25% of the models.
Table 4 summarises the statistical and visual comparisons between the models tested with (i) the 25% of the original dataset chosen as testing dataset by bootstrapping and (ii) 3 years of 'unseen data' from follow-up years that were not used in the model development.
The squared correlation coefficients (R²) decrease with increasing prediction time, and overall the models correlated less well with the measured data when tested with the 'unseen data' than when tested with the 25% bootstrapped testing dataset.

DISCUSSION
Approximately 40% of the variation in the physical, chemical and biological data in the Vaal Dam from 2000–2009 could be explained by the first two axes of the principal component analysis (Fig. 2) performed on the dataset used to develop the models. The PCA was performed in order to determine which of the variables would influence cyanobacteria and would most probably be important as input variables for model development.
The negative correlation of nutrients (NO₂⁻, NO₃⁻, NH₄⁺ and PO₄³⁻) with temperature (Temp) might be due to large cyanobacteria blooms in summer utilising and depleting the nutrients, causing nutrient concentrations to be higher during winter, when cyanobacteria or other phytoplankton concentrations are lower in the Vaal Dam. Chlorophyll-a and pH correlate positively, indicating that increasing photosynthesis will inevitably increase the pH as CO₂ is removed from the aquatic environment. The negative correlation between chlorophyll-a and dissolved oxygen is most probably due to temperature, since the highest levels of DO were observed during winter, although it may also be due to aerobic bacterial activity when large numbers of phytoplankton cells are decomposed during and after blooms. The fact that the arrow representing chlorophyll-a subtends a ±90° angle with the arrow representing temperature indicates that high chlorophyll-a concentrations are not limited to a specific season (either high or low temperatures) but can vary throughout the year. The chemical and biological results from the Vaal Dam indicate that it is in a mesotrophic state, but chlorophyll-a concentrations as high as 194 µg/L have been detected in the Vaal Dam during this period (Table 1).
The variables used by the HEA to predict the Microcystis sp. concentration in the Vaal Dam were: the initial cyanobacteria inoculum, dissolved oxygen (DO), turbidity (Turb), NH₄⁺, PO₄³⁻, NO₃⁻, Fe²⁺, chlorophyll-a (Chla), water temperature (Temp), conductivity (Cond), Secchi disk depth (Secchi), pH, Si, Mn²⁺, and chemical oxygen demand (in order of most to least frequently incorporated into the models; Table 3). The importance of these variables was also evident in the PCA, which indicated that the nutrients, especially PO₄³⁻, NO₂⁻, NH₄⁺ and NO₃⁻, could explain the variation in the Microcystis sp. concentration and indirectly that of DO, due to photosynthesis and temperature. Initial cyanobacteria inoculum and dissolved oxygen were incorporated in all of the models predicting Microcystis sp. concentration, with the nutrient concentrations (NH₄⁺, PO₄³⁻ or NO₃⁻) used separately or in combination in 50% of the models. It should be noted, however, that at least one of the nutrients (either NH₄⁺, PO₄³⁻ or NO₃⁻) is incorporated in all Microcystis sp. models, except the model predicting the real-time concentration (Fig. 3). The reason for this may be that the real-time Microcystis sp. concentration cannot be influenced by a change in the nutrient concentration on that day, while the future occurrence of Microcystis sp. will inevitably be influenced by the nutrient concentration in the water.
The sensitivity analyses of the models predicting the Microcystis sp. concentration indicate that the greatest sensitivity is towards the initial cyanobacteria inoculum (Figs 3a and 3b, Figs 5a and 5b, Figs 7a and 7b, as well as Figs 9a and 9b). The initial cyanobacteria inoculum will have a large influence on the Microcystis sp. concentration, provided that the total cyanobacteria inoculum mostly comprises Microcystis sp. cells (as is usually the case in the Vaal Dam). The 21 days in advance Microcystis sp. model also shows sensitivity towards the Si concentration, particularly during the first 10% of the silica input range. Silica might represent a secondary effect on Microcystis sp. concentrations, since Si is mostly utilised by diatoms in winter (Wetzel, 2001), when cyanobacteria like Microcystis sp. are not abundant in the Vaal Dam.
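The sensitivity procedure behind these curves (each variable swept from its minimum to its maximum in 5% steps while all other variables are held at their medians) can be sketched as follows. The model, the data and the function names are hypothetical. Ranking variables by the largest absolute slope between adjacent points is one assumed way of quantifying 'steepest slope'; the source does not state how the slopes were compared.

```python
import statistics


def sensitivity_curve(model, observations, variable, step_pct=5):
    """Sweep one variable from min (0%) to max (100%), others at the median."""
    medians = {name: statistics.median(vals)
               for name, vals in observations.items()}
    low, high = min(observations[variable]), max(observations[variable])
    curve = []
    for pct in range(0, 101, step_pct):
        sample = dict(medians)
        sample[variable] = low + (high - low) * pct / 100
        curve.append((pct, model(sample)))
    return curve


def steepest_slope(curve):
    """Largest absolute change in model output per percent of input range."""
    return max(abs(y2 - y1) / (x2 - x1)
               for (x1, y1), (x2, y2) in zip(curve, curve[1:]))


def toy_model(sample):
    # Hypothetical stand-in for one branch of a discovered rule set.
    return 2.0 * sample["inoculum"] + 0.1 * sample["do"]


# Hypothetical observations used to derive min, max and median per variable.
observations = {
    "inoculum": [0.0, 50.0, 100.0],
    "do": [5.0, 7.0, 9.0],
}
slopes = {name: steepest_slope(sensitivity_curve(toy_model, observations, name))
          for name in observations}
most_sensitive = max(slopes, key=slopes.get)
```

In this toy case the inoculum curve is far steeper than the dissolved oxygen curve, which is the same qualitative pattern reported for the Vaal Dam models.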
The models predicting the occurrence of Microcystis sp. (Figs 4a, 6a, 8a and 10a) show relatively good squared correlation coefficients (R²-values range from 0.95 at real-time prediction to 0.74 at 21-days prediction) when tested with the 25% bootstrapped testing dataset from 2000–2009. Although the testing with the 3-year 'unseen data' (Figs 4b, 6b, 8b and 10b) did not show squared correlation coefficients as high as when tested with the 25% boot-strapped results (R²-values range from 0.97 at real-time prediction to 0.25 at 21-days prediction), it was still regarded as a significant correlation (with the exception of the model for 21 days in advance). Overall, the squared correlation coefficients decrease and the RMSE increases with increasing prediction times (Table 4), displaying the increase in uncertainty over longer prediction periods. The visual inspection of the models was essential in determining the suitability of the model for further application (Bennet et al., 2013).
Currently the Zuikerbosch DWTW managers and production chemists are solely reliant on laboratory analyses of cyanobacteria cell counts, which (depending on sampling, distance and accessibility to the laboratory, laboratory sample turnaround time and various other factors) may delay results for up to a week or even longer (Swanepoel et al., 2008). By the time the results become available, consumers might already have been exposed to cyanotoxins in their drinking water. With the prediction models, the managers and production specialists at DWTW can anticipate the occurrence of Microcystis sp. in the source water and start preparations before it happens. The models that would most probably have the greatest value when incorporated into the 'Cyanobacteria Incident Management Protocol' of the Zuikerbosch DWTW (Du Preez and Van Baalen, 2006) are the models predicting Microcystis sp. 7 days in advance. A 7-day advance warning gives the plant sufficient time to prepare for incidents of high cyanobacteria concentrations and their related metabolites (e.g. microcystin) in the source water.

CONCLUSIONS
The most important variables for predicting Microcystis sp. in the Vaal Dam were shown to be initial cyanobacteria inoculum and dissolved oxygen, as they occur in 100% of the models. Initial cyanobacteria inoculum will determine how many cells are available for further bloom development. Dissolved oxygen is probably included due to its significant negative correlation with cyanobacteria, which usually bloom at higher temperatures. Nutrients (either PO₄³⁻, NH₄⁺ or NO₃⁻) are also important in predicting Microcystis sp. concentrations in advance (7–21 days).
The models that most probably would have the greatest value when applied at the Zuikerbosch DWTW are the models predicting Microcystis sp. 7 days in advance, since those were the most accurate. Seven days is sufficient time to prepare for treatment of source water containing cyanobacteria and their related metabolites.
It is evident that these predictive models will contribute significantly to anticipating and managing high Microcystis sp. concentrations in the source water supplying the Zuikerbosch DWTW. These models might also have application value for recreational water users: managers of large and small water-sport events could use such models to predict the Microcystis sp. concentration in the water whenever recreational events are planned.