Development of an automated desktop procedure for defining macro-reaches for river longitudinal profiles

This paper presents an automated desktop procedure for delineating river longitudinal profiles into macro-reaches for use in Ecological Reserve assessments and to aid freshwater ecosystem conservation planning. The procedure was developed for use where there are limited data and/or where a repeatable, statistically defensible regional or national assessment is required. The delineation of longitudinal profiles into macro-reaches between 'controls' or 'break points' such as exposed resistant rock formations, knick points, or significant changes in lithology provides the initial coarse filter for further assessment of lower levels of organisation, channel type for example. The division is necessary, as research has demonstrated that not all macro-reaches respond in the same way to disturbance or stress, nor do they have the same biotic assemblages. Four statistical methods (Von Neumann Ratio, mean square error, CUSUM plots or unweighted values and the Worsley Likelihood Ratio Test (WLRT)) were used to define macro-reach breaks for four South African rivers (Crocodile, Olifants, Mhlathuze and Seekoei Rivers) and were compared to previously defined macro-reach delineations based on expert-driven approaches. Results indicate that the CUSUM and WLRT approaches most closely match the macro-reach break points as defined by the expert-driven approach. An automated desktop procedure was developed for computing statistically defensible, multiple change points along profiles using an adaptation of the WLRT method. The adapted approach does not require a priori knowledge of the break points, as is the case in other applications of the WLRT. It is concluded that the adapted WLRT approach can be used with a reasonable degree of certainty where there are insufficient data and/or where a regional or national assessment is required that is repeatable and statistically defensible. Where possible, however, there is no substitute for primary data collection, field work and a detailed expert-driven approach.

Water SA Vol.32 (3) 2006: pp.395-402


Introduction
The South African National Water Act (No. 36 of 1998) transformed water resource management in South Africa. The pre-1998 apartheid-based legislation gave way to legislation that seeks to achieve a balance between protection and utilisation of the nation's water resources for the benefit of all. This progressive legislation stresses the twin themes of sustainability and equity and seeks to 'legislate for sustainability' at a national level. These themes are echoed in parallel legislation, the South African National Environmental Management: Biodiversity Act (No. 10 of 2004), which seeks, amongst other things, to ensure that aquatic biodiversity is conserved (Roux et al., 2005). To help meet these legislative requirements, a number of enabling tools, processes and mechanisms, some in their infancy, have been developed (e.g. DWAF, 1999; Brown and Joubert, 2003; King et al., 2003; DWAF, 2004; 2005; Nel et al., 2005). As these tools were developed to meet the needs of state departments mandated to allocate water (Department of Water Affairs and Forestry (DWAF)) and to conserve the environment (Department of Environmental Affairs and Tourism (DEAT)), it is important that their founding concepts, assumptions and logic trains are transparent and defensible and that the tools are practical and implementable.
It is to this end that a method was developed to undertake repeatable and unbiased assessments where an expert-driven approach is not feasible. This paper presents a repeatable, statistically defensible technique for dividing river longitudinal profiles into units that can be used as part of the Ecological Reserve determination process (DWAF, 1999). The technique can also be used to aid the process of determining regional and national spatial biodiversity plans for freshwater ecosystem conservation (Nel et al., 2005).

River classification systems
There are numerous river classification systems (see Berman, 2002 for a recent comprehensive review), most of which recognise the biophysical complexity of river systems across space and time. Coping with this complexity in a conceptual model represents a significant challenge (Thorp et al., 2005). Nevertheless, most conceptual models recognise that hierarchical classification is a valuable means to organise, interpret and understand complex systems such as fluvial landscapes (Berman, 2002; Poole, 2002). Both structure-based (e.g. Jensen et al., 2001; Higgins et al., 2004) and process-based (e.g. Montgomery, 1999; Church, 2002) hierarchical classification systems divide the river (usually the longitudinal profile) or catchment into similar reaches, zones or patches. These are variously called macro-reaches (e.g. van Niekerk et al., 1995; Moon et al., 1997), reaches (Rosgen, 1996; Fox et al., 1996; Rowntree and Wadeson, 1997; Rice and Church, 2001), zones (Harrison, 1965; Noble and Hemens, 1978; Western et al., 1997; Rowntree and Wadeson, 2000; Thoms and Parsons, 2003), functional process zones (Metsi Consultants, 1999) and hydrogeomorphic patches (Thorp et al., 2005). The assumption is made that within these spatially-defined units (for the purposes of this paper the term macro-reach will be used) there is sufficient uniformity in terms of physical form, process and response that the unit can be managed and interpreted in a consistent manner (Montgomery and Buffington, 1998).
The delineation of macro-reaches is, however, an expert-driven process, and while this is desirable, where there is limited information, or where a regional- or national-scale approach is required (cf. Stein et al., 2002; Nel et al., 2005), a repeatable, statistically defensible desktop approach is necessary. This paper presents an automated desktop procedure for delineating river longitudinal profiles into macro-reaches. This tool was developed as part of a project that seeks to develop policy and planning tool(s) for the systematic conservation planning of freshwater ecosystem biodiversity in South Africa (Nel et al., 2005). The approach can also be used as part of a suite of tools for determining EcoStatus (cf. Kleynhans et al., 2005a) within an Ecoregion context (cf. Kleynhans et al., 2005b). It should be noted, however, that while this approach presents a statistically defensible method for delineating break points along a profile, these may not be ecologically significant.

Division of river longitudinal profiles
River longitudinal profiles are idealised as logarithmic curves from source to mouth. An idealised profile occurs where an equilibrium condition is attained between the processes of erosion, transport and deposition along the profile, mainly in response to elevation. Idealised profiles can be defined for different types of rivers, or sections of river. For example, Rice and Church (2001) point out that exponential or quadratic functions best describe longitudinal profiles of aggrading alluvial systems unaffected by significant lateral inputs of water or sediment. This ideal, however, is seldom evident except over short sections of river. Divergence from the idealised curve nevertheless provides useful clues as to the evolutionary pathway of the fluvial system (Rădoane et al., 2003).
In South Africa, the division of river longitudinal profiles into macro-reaches forms part of the Ecological Reserve assessment process (Rowntree, 2000). Most rivers cross a variety of geological strata that also have intrusions of more resistant rocks, dolerite for example (cf. Tooth et al., 2002; 2004). Further, over time, tectonics, river capture, climate change and changes in base level alter the equilibrium level toward which the profiles tend (Sinha and Parker, 1996). This results in profiles that are far from the 'ideal' and are consequently irregular along their length. This requires the overall profile to be divided into shorter macro-reaches that extend between 'controls' or 'break points' such as exposed resistant rock formations, knick points, or significant changes in lithology. The physical characteristics (template) of the macro-reach constrain form and process at lower levels of organisation, channel type for example (cf. Dollar et al., 2006). Thus, nested within a macro-reach might be a single channel type, 'anabranching' (cf. Tooth and McCarthy, 2004) for example, or a combination of channel types, such as alternating sequences of 'braided' and 'anabranching' channel types. This is of significance, as evidence has shown that not all macro-reaches (and/or channel types) respond in the same way to disturbance or stress (e.g. Rountree and Rogers, 2004; Parsons et al., 2005a; b), nor do they have the same biotic assemblages (Van Coller et al., 2000).
In South Africa, macro-reach boundaries are commonly defined on the basis of major breaks in valley slope (cf. Rowntree, 2000; Heritage et al., 2000), valley form, potential sediment yield, the position of major tributaries in relation to the main stem, major lithological and structural changes, and where possible, an analysis of channel type from video helicopter surveys. There are good theoretical (Lacey, 1930; Blench, 1952; Chang, 1988) and empirical grounds (Van Niekerk et al., 1995; Rowntree and Wadeson, 1999; Tooth et al., 2002; Dollar and Rowntree, 2003; Rădoane et al., 2003; Tooth and McCarthy, 2004; Tooth et al., 2004) for utilising changes in average longitudinal slope around a point as a useful means to discriminate between macro-reaches, especially where no other data are available. Evidence from Ecological Reserve studies over the past 10 years has demonstrated that there is usually a good correlation between macro-reaches and lower level descriptors such as channel type (Dollar and Bijker, 2002; Dollar, 2003). However, where there are no data, or where macro-reach delineation is required for large areas, there are currently no statistically defensible desktop techniques for delineating macro-reach boundaries.
The following section explores some of the available methods for identifying change points in river longitudinal profile data. These were tested against four river longitudinal profiles (the Mhlathuze River in northern KwaZulu-Natal, the Crocodile River in Mpumalanga, the Olifants River in Mpumalanga and the Seekoei River in the Northern Cape) that had previously been sub-divided using survey data, 1:50 000 topographical maps, 1:250 000 geological maps, aerial photographs and helicopter video footage as part of Ecological Reserve assessments.
The assumption is made that if a statistical method can identify similar change points, then the method can be applied with reasonable certainty to rivers for which expert-driven assessments have not been performed.

Statistical methods to find change points in river longitudinal profile data series
Various statistical methods were evaluated to determine which were suitable for defining significant changes along river longitudinal profiles. Four possible methods are described here.

Von Neumann Ratio
The Von Neumann Ratio, N, is the ratio of the cumulative squared differences between successive data points to the cumulative squared difference from the mean (Eq. 1). The position of the breakpoint on the river longitudinal profile was taken as the position where N was at a minimum, provided the calculated value was less than the test statistic given in Owen (1962).
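As a minimal sketch of the ratio described above, the function below computes N for a series using the standard form N = Σ(Y_i − Y_{i+1})² / Σ(Y_i − Ȳ)², which is assumed here to correspond to Eq. 1. The function name is illustrative only, and the way N was evaluated at each candidate position along the profile is not specified in the text, so it is not reproduced.

```python
def von_neumann_ratio(values):
    """Von Neumann ratio: sum of squared successive differences divided by
    the sum of squared deviations from the mean (standard form, assumed)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    denominator = sum((y - mean) ** 2 for y in values)
    if denominator == 0:
        raise ValueError("series is constant; the ratio is undefined")
    numerator = sum((values[i] - values[i + 1]) ** 2 for i in range(n - 1))
    return numerator / denominator
```

For a series without trend or shift the ratio is expected to lie near 2; a series containing a step in the mean gives a markedly lower value, which is why low values of N flag a possible change.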

Mean square error (MSE)
If k is defined as the position where a shift in the mean occurs, then the mean square error is defined after Taylor (2000) (Eq. 2).

The data are split at k and the mean square error is calculated. The position of k in the series that gives the lowest value of the estimate is the last data point before the change. As such, there is no level of significance against which to test this statistic to determine whether the break point is significant. However, Hinkley and Schechtman (1987) suggest that a bootstrapping technique can be used to determine the level of significance. The bootstrapping technique is also applicable to the CUSUM (see below), although the bootstrap method is not described in detail here. For this method, a demonstration version of a computer program was used (Change Point Analyzer by Taylor, 2000).
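The search over k can be sketched as follows. The sketch assumes that the estimate is the sum of squared deviations of each of the two segments about its own mean, which is the form commonly attributed to Taylor (2000); the function name is hypothetical, and the bootstrap significance test is not included.

```python
def mse_change_point(values):
    """Return (k, mse), where k is the 1-based position of the last data
    point before the most likely shift in mean."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")

    def squared_error(segment):
        # Sum of squared deviations of a segment about its own mean.
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    best_k, best_mse = None, float("inf")
    for k in range(1, n):  # k points before the split, n - k after it
        mse = squared_error(values[:k]) + squared_error(values[k:])
        if mse < best_mse:
            best_k, best_mse = k, mse
    return best_k, best_mse
```

On its own the minimum always exists, even for a homogeneous series, which is why a separate significance test such as the bootstrap is needed.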

CUSUM plots or unweighted values
One of the commonly used methods to determine deviations from homogeneity in a data series is the CUSUM plot. Consecutive values are added to reach a cumulative profile for a set of values. The changes are visually assessed and the data are subdivided at positions of distinct change. For example, Brizga et al. (1993) used such a plot to delineate flood- and drought-dominated regimes (FDRs and DDRs, respectively) for Australian streamflow records. It can also be used to define positions of significant changes in slope of river longitudinal profiles, i.e. break points for macro-reaches (Eq. 3). Rather than simply adding subsequent values, cumulative deviations from the mean may be considered. The test statistic used to indicate a change in the level of the mean is Q. High values of Q reflect a change in mean. Buishand (1982) presents a table of critical Q test statistics based on sequences of random Gaussian numbers. In order for a change to be significant, the test statistic must exceed the critical values. The critical values decline with increasing values of n, and decreasing level of significance. The position of the change point is also known because it corresponds to the data point at which the Q statistic is at its maximum (Eq. 7).
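A minimal sketch of the Q statistic is given below. It assumes the definitions in Buishand (1982): the adjusted partial sum S_k* is the running sum of (Y_i − Ȳ), the rescaled value S_k** is S_k* divided by the sample standard deviation D_Y (computed with divisor n), and Q is the maximum of |S_k**|. The function name is illustrative, and the scaling Q/√n is returned on the assumption that the tabulated critical values are expressed in that form.

```python
import math


def buishand_q(values):
    """Return (k, q, q_scaled): k is the 1-based position where |S_k**|
    peaks, q is that peak and q_scaled is q / sqrt(n)."""
    n = len(values)
    mean = sum(values) / n
    d_y = math.sqrt(sum((y - mean) ** 2 for y in values) / n)
    if d_y == 0:
        raise ValueError("series is constant; Q is undefined")
    partial_sum = 0.0  # S_k*, cumulative deviation from the mean
    best_k, q = 0, 0.0
    for k, y in enumerate(values, start=1):
        partial_sum += y - mean
        rescaled = abs(partial_sum / d_y)  # |S_k**|
        if rescaled > q:
            best_k, q = k, rescaled
    return best_k, q, q / math.sqrt(n)
```

Because Q uses the absolute value, the sign convention chosen for the deviations does not affect the detected position.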

The Worsley Likelihood Ratio Test
The Worsley Likelihood Ratio Test (WLRT) finds the most likely position of a change in mean in a data set. The method calculates a sum of deviations from the mean and weights them according to their position in the series. The partial sums are rescaled and adjusted by dividing through by the sample's standard deviation (Buishand, 1982). The advantage of the WLRT method is that it can determine the position of the change point, whereas a Student t-test can only test whether the hypothesis of a change point is true if the position of the change is known. Worsley (1979) derived a method for determining the most likely position for a change in mean of a data set. In the preceding method, which uses S_k**, the rescaled adjusted partial sum of deviations, there is no weighting to account for the position of the data point within the set. In the method derived by Worsley, a weighting factor is applied that depends on the position within the data set. Points at the start and end of the data set receive the most weighting.
The derivation presented below is after Buishand (1982) since it is easier to apply than the rigorous proof presented in Worsley (1979) (Eq. 8), where Z_k** is the weighted rescaled adjusted partial sum, obtained by dividing Z_k* by the sample standard deviation, for k = 1, 2, …, n. The statistic V is then taken as the maximum absolute value of Z_k**. There is a unique relationship between V and the test statistic W derived by Worsley (1979). If only the position of the change point is required, then it is unnecessary to calculate W, but if the level of significance is also required, then W is computed. Critical values for the test statistic W are presented in Worsley (1979). As with the critical values for Q, those for W decrease as the number of data points in the set grows and increase with the required level of significance.
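The calculation can be sketched as below, assuming the forms given in Buishand (1982): Z_k* = S_k* / √(k(n − k)), Z_k** = Z_k* / D_Y, V = max |Z_k**| over k = 1, …, n − 1, and W = √(n − 2) · V / √(1 − V²). The function name is hypothetical and the tabulated critical values are not reproduced.

```python
import math


def worsley_statistics(values):
    """Return (k, v, w): k points precede the most likely change in mean,
    v is max |Z_k**| and w is Worsley's statistic derived from v."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values")
    mean = sum(values) / n
    d_y = math.sqrt(sum((y - mean) ** 2 for y in values) / n)
    if d_y == 0:
        raise ValueError("series is constant; the statistic is undefined")
    partial_sum = 0.0  # S_k*
    best_k, v = 0, 0.0
    for k in range(1, n):  # k = n is excluded: the weight is undefined there
        partial_sum += values[k - 1] - mean
        weighted = abs(partial_sum) / (math.sqrt(k * (n - k)) * d_y)
        if weighted > v:
            best_k, v = k, weighted
    if v >= 1.0:  # perfect step with no within-segment scatter
        return best_k, v, math.inf
    w = math.sqrt(n - 2) * v / math.sqrt(1.0 - v * v)
    return best_k, v, w
```

With these definitions W at the selected k equals the absolute two-sample Student t statistic for the two segments, which is consistent with the comparison to the t-test made above.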

Application of test statistics to data sets with more than one change point
Analysis of longitudinal profiles or CUSUM plots of deviations from the mean for most rivers shows that more than one change point is common and that the profile should be sub-divided into multiple macro-reaches. However, the number of macro-reaches is not constant and is specific to the river profile under investigation. Stephens (1994) presented a method of determining multiple change points in data sets. A problem with the method is that the number of change points must be known in order to write the simultaneous equations to be solved; this, however, is a circular argument. A similar approach was adopted by Brizga et al. (1993), who applied the WLRT to all successive periods of 10 years in the flow record for three rivers in Victoria, Australia. For a value to be considered a change point, it needed to be the maximum Z_k* value for more than half the iterations at that sample length. Breakpoints that divided the data into periods of less than 10 years were discarded. This, however, is also a circular approach, since prior information was used to bring about the desired result.
It is instructive to note that the aforementioned test statistics were derived to determine the position and level of significance of a single change in mean in a series of data. The method proposed in this paper, however, takes the value with the maximum S_k** or Z_k** in the data series as the first change point. The set is then split at this point and the process repeated on the two subsets (Fig. 1). The splitting of the sample set at change points continues until the test statistics are below the critical values (except for the Von Neumann Ratio, where values need to remain below the test statistic) presented by Owen (1962), Worsley (1979) and Buishand (1982), or the number of data points between change points is less than three (the methods fail at spacing of three or less). Although the critical values are given for selected values only, values were linearly interpolated for the remainder.
The advantage of applying the methods in this manner is that no prior knowledge is required of the positions of breakpoints along the river longitudinal profile. Further, no iteration is needed to find the best possible fit of the number of macro-reaches. In order to make the number of breakpoints manageable for an automated assessment, a rule was inserted into the coding to ensure that a breakpoint could not occur within 5 km of one of a higher rank. In practice, it is possible to have macro-reaches as short as a few hundred metres, but for coarse-scale resolution this is impractical and very difficult to verify. However, for the purposes of this paper, the rule was not applied.
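The successive splitting described above can be sketched as a recursive routine. This is an illustration under stated assumptions rather than the authors' code: the statistic follows the Buishand (1982) form of the WLRT, `critical_w` is a hypothetical callable standing in for the tabulated (and interpolated) critical values, and segments of three or fewer points are not tested.

```python
import math


def wlrt(values):
    """Return (k, W) for one segment; k points precede the likely change."""
    n = len(values)
    mean = sum(values) / n
    d_y = math.sqrt(sum((y - mean) ** 2 for y in values) / n)
    if d_y == 0:
        return 0, 0.0  # constant segment: nothing to detect
    partial_sum, best_k, v = 0.0, 0, 0.0
    for k in range(1, n):
        partial_sum += values[k - 1] - mean
        weighted = abs(partial_sum) / (math.sqrt(k * (n - k)) * d_y)
        if weighted > v:
            best_k, v = k, weighted
    if v >= 1.0:
        return best_k, math.inf
    return best_k, math.sqrt(n - 2) * v / math.sqrt(1.0 - v * v)


def split_profile(values, critical_w, offset=0, min_points=3):
    """Return sorted change positions (count of points before each break)."""
    n = len(values)
    if n <= min_points:  # too few points for the statistic to be reliable
        return []
    k, w = wlrt(values)
    if k == 0 or w <= critical_w(n):  # not significant: stop splitting here
        return []
    left = split_profile(values[:k], critical_w, offset, min_points)
    right = split_profile(values[k:], critical_w, offset + k, min_points)
    return left + [offset + k] + right
```

A call such as `split_profile(slopes, lambda n: 4.0)` uses a constant threshold purely as a placeholder; in practice the threshold would depend on the segment length and the chosen level of significance.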
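The spacing rule mentioned above could be applied as a post-processing step along the following lines. The data layout is an assumption made for illustration: each breakpoint is a `(distance_km, rank)` pair in which rank 1 is the highest-level change, and the function name is hypothetical.

```python
def enforce_min_spacing(breakpoints, min_gap_km=5.0):
    """Drop any breakpoint lying within min_gap_km of a retained breakpoint
    of higher rank (lower rank number). Returns pairs sorted by distance."""
    kept = []
    # Visit the highest-ranked breakpoints first so that they take priority.
    for distance, rank in sorted(breakpoints, key=lambda item: item[1]):
        if all(abs(distance - other) >= min_gap_km for other, _ in kept):
            kept.append((distance, rank))
    return sorted(kept)
```

For example, `enforce_min_spacing([(40.0, 1), (43.0, 2), (120.0, 2), (121.0, 3)])` retains the breaks at 40 km and 120 km and discards the two lower-ranked breaks that lie within 5 km of them.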

Application of methods to four selected longitudinal profiles of South African rivers
Results of the application of the methods to the Crocodile and Seekoei Rivers are discussed in detail; additional information is presented for the Olifants and Mhlathuze Rivers.

Crocodile River
The Crocodile River flows eastward for ~350 km from its source (~2 200 m amsl) on the Mpumalanga Highlands, through the Great Escarpment, onto the Southeastern Coastal Hinterland, then onto the Lowveld before flowing into Mozambique. The longitudinal profile of the Crocodile River was divided into macro-reaches for the Crocodile Ecological Reserve assessment study (Dollar and Bijker, 2002). Five macro-reaches were defined for the river (Table 1). Fig. 2 shows the profile with the macro-reaches defined by horizontal lines; Fig. 3 shows the cumulative deviation from the average slope or CUSUM plot.
The four statistical techniques described previously were applied to the same longitudinal profile to assess whether an automated desktop method could be applied with similar confidence to the expert-driven approach. An advantage of using statistical techniques rather than the expert-driven approach is that the decision-criteria remain constant and without bias. The automated desktop technique was applied to the slope between sampling points, to the cumulative slope, and to the deviation from the average slope (CUSUM).
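The three input series named above can be derived from a profile as in the sketch below. It assumes the profile is held as parallel lists of downstream distance and elevation in metres, and that the average slope is the unweighted mean of the segment slopes; both are assumptions, as are the function names.

```python
def slope_series(distances_m, elevations_m):
    """Slope (drop / run) between each pair of successive sampling points."""
    if len(distances_m) != len(elevations_m):
        raise ValueError("distance and elevation lists differ in length")
    slopes = []
    for i in range(len(distances_m) - 1):
        run = distances_m[i + 1] - distances_m[i]
        if run <= 0:
            raise ValueError("distances must increase downstream")
        drop = elevations_m[i] - elevations_m[i + 1]
        slopes.append(drop / run)
    return slopes


def cumulative(values):
    """Running total of a series (the simple cumulative profile)."""
    total, out = 0.0, []
    for value in values:
        total += value
        out.append(total)
    return out


def cusum_of_deviations(values):
    """Running total of deviations from the series mean (CUSUM)."""
    mean = sum(values) / len(values)
    return cumulative([value - mean for value in values])
```

By construction the CUSUM series returns to zero at the last point, so sustained excursions away from zero along the profile mark stretches that are steeper or flatter than average.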

Comparison of statistical methods for determining break points
Applying the various statistical methods is relatively simple, but requires numerous iterations of the calculations. The process for each was therefore automated to determine statistically valid change points according to the schematic representation in Fig. 1.
The Von Neumann method defined 24 significant changes in average slope along the Crocodile River profile (level of significance = 0.001). This method is considered too sensitive to changes in river slope to be of use in assessments of break points between macro-reaches. A very stringent level of significance (0.001) was used, and yet many changes were predicted.
The MSE method was applied using a demonstration version of a computer program. The coding for the bootstrapping method was included and was used to define significant changes. Five change points were determined. Three of the changes did not relate to those determined by the expert-driven approach. The other two were near the top and bottom of the waterfall, respectively. However, neither was at a position on the profile where a clear change in slope occurred at the waterfall; both were a short distance away.
The CUSUM and WLRT methods calculated five identical break points. The initial change point (Change 1 in Fig. 4) does not correspond to any break in macro-reaches determined by the expert-driven approach. It defines the base of a waterfall which was included in macro-reach 3 in the expert-driven approach (Fig. 3). It does, however, define a very significant change in slope on the river, which is why it was found as the highest level change point. The next levels of breaks or change points (Changes 2 and 3) agree with those from the expert-driven approach. At the next level, Change 2.1 was calculated at the end of a short section of river with a very steep slope. It should be noted, however, that this change point was recognised as being significant in the expert-driven approach, but the macro-reach was considered too short and was therefore incorporated into macro-reach 1.
It should be noted that concave slopes are usually considered as a single macro-reach when using an expert-driven approach. However, using the CUSUM or WLRT methods, the most statistically valid change in concave slopes sometimes occurred where the slopes changed from steep to flatter slopes, even though the transition was gradual. This was evident, for example, in assessing the profile of the Mhlathuze River. For the Crocodile River, of the four methods described, the application of the CUSUM and WLRT methods on the slope of the longitudinal profile generated change point values most closely related to those defined by the expert-driven approach. This was also true for the Olifants and Mhlathuze Rivers.
For the Crocodile River, only one of the break points identified by the expert-driven approach was found to be statistically insignificant using the CUSUM and WLRT methods (Table 2). However, both the CUSUM and WLRT methods identified an additional breakpoint not identified by the expert-driven approach. For the Olifants River (Dollar, 2003), the CUSUM and WLRT methods identified one change point correctly, missed three expected points, but identified one not defined by the expert-driven system (Table 2). For the Mhlathuze River (Dollar, 2002), both methods picked up all three changes from the expert-driven system, but also two extra points (on concave slopes where the slope changed) (Table 2). Using the methods on the Seekoei River (Dollar, 2005), both methods identified only one change point since the profile was very flat near the source of the river (Table 2). Although the CUSUM and WLRT methods did not define all of the same change points as the expert-driven approach, they identified 50%, 25% and 100% of the expert-defined changes for the Crocodile, Olifants and Mhlathuze Rivers, respectively.
The data used in this investigation were generated from 1:50 000 topographical maps at the positions of the 20 m contour intervals. As a result, where the general slope is low, the distances between recorded points are large. The WLRT method accuracy was improved when points were added by linear interpolation throughout the data set (three points were interpolated between each measured value). This illustrated that a higher density of data allowed more efficient identification of break points. Further, the methods were only applied to the valley slope, whereas the expert-driven approach considers valley form, potential sediment yield, the position of major tributaries in relation to the main stem, major lithological and structural changes, and an analysis of channel type. A perfect fit between the two determinations is therefore unlikely.
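The linear interpolation step described above (three points added between each pair of measured values) can be sketched as follows. Equal spacing of the added points and the function name are assumptions made for illustration.

```python
def densify(distances, elevations, extra=3):
    """Insert `extra` equally spaced, linearly interpolated points between
    each pair of measured points. Returns (distances, elevations)."""
    if len(distances) != len(elevations):
        raise ValueError("distance and elevation lists differ in length")
    out_d, out_z = [], []
    for i in range(len(distances) - 1):
        for j in range(extra + 1):  # j = 0 reproduces the measured point
            t = j / (extra + 1)
            out_d.append(distances[i] + t * (distances[i + 1] - distances[i]))
            out_z.append(elevations[i] + t * (elevations[i + 1] - elevations[i]))
    out_d.append(distances[-1])  # close the profile with the last measured point
    out_z.append(elevations[-1])
    return out_d, out_z
```

A profile of m measured points becomes one of 4(m − 1) + 1 points with the default setting, which evens out the point spacing on low-gradient sections where contour crossings are far apart.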

Comparison of WLRT and CUSUM methods
In order to determine which of the two methods (CUSUM and WLRT) best replicated the macro-reaches defined by the expert-driven approach, both methods were utilised to define macro-reaches for the Crocodile, Olifants, Mhlathuze and Seekoei Rivers using data from a 20 m × 20 m digital terrain model (DTM). The data were extracted from the DTM utilising a method developed by Moolman et al. (2002). Profiles derived from the DTM had more data points than those obtained from the 1:50 000 topographical maps. The overall heights and distances also varied slightly between the two profiles. To simplify comparison of the positions of change points, both profiles were normalised (after Blight, 1994), as shown in the example presented in Fig. 5 for the Crocodile River.
As shown in Fig. 5, there is general agreement in the normalised profiles. Both the CUSUM and WLRT methods detected the same change point positions on the longitudinal profile. Two of the change points found by the statistical methods agree with those of the expert-driven approach; these are at normalised distances of 0.08 and 0.61. The expert-driven approach defined a break between macro-reaches at the top of the waterfall (normalised distance = 0.12) while the CUSUM and WLRT defined a change at the base of the waterfall (normalised distance = 0.14). The CUSUM and WLRT methods did not, however, pick up the change point at a normalised distance of 0.51. The methods also calculated two changes not defined by the expert-driven approach. These were found near the end of the profile.
When repeating the procedure on the longitudinal profile of the Olifants River, the WLRT method was able to locate one more of the expert-defined change points than the CUSUM method. However, there were still two change points from the expert-driven approach that were not identified as significant by the CUSUM method. An additional change point was calculated by both methods at a change in slope on a section of concave profile, although the positions were not in the same location.
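The exact normalisation of Blight (1994) is not given in the text. As an assumed stand-in, the sketch below rescales a series to the range 0 to 1 by its own minimum and maximum, which would express distance as a fraction of total profile length and elevation as a fraction of total relief. The function name is hypothetical.

```python
def normalise(values):
    """Rescale a series linearly so that its minimum maps to 0 and its
    maximum maps to 1 (assumed form of the profile normalisation)."""
    low, high = min(values), max(values)
    if high == low:
        raise ValueError("series has no range to normalise")
    return [(value - low) / (high - low) for value in values]
```

Applied to both the map-derived and the DTM-derived profiles, a rescaling of this kind allows change point positions to be compared even though the two sources give slightly different total lengths and heights.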
The CUSUM method calculated two more change points than the WLRT method for the Mhlathuze River, but both were unrelated to breaks between macro-reaches observed in the expert analysis.
Both methods identified one of the changes identified by the expert-driven process for the Seekoei River, albeit different ones. The CUSUM method identified an additional four breakpoints that were not found during the expert-driven approach.
Evidence from the river longitudinal profiles therefore suggests that macro-reach breaks identified by the adapted WLRT method applied to the DTM-derived data most closely resemble the change point positions determined by the expert-driven approach. In some cases the CUSUM method generated more change points than the WLRT method, and in other cases fewer. On the profiles where more change points were defined by the CUSUM method, the estimated locations for breaks between macro-reaches did not relate to change points derived from the expert-driven approach.

Conclusions
The delineation of river longitudinal profiles into macro-reaches forms an important part of the Ecological Reserve assessment process, and can be used to aid freshwater ecosystem conservation planning. Further, the deviation of the profile from an 'idealised' profile shape provides important clues as to the evolutionary history of the fluvial system. Evidence presented in this paper has demonstrated that where no other data are available and/or where large numbers of river longitudinal profiles need to be rapidly assessed, the adapted WLRT method is the most reliable of the assessed methods. The value of the adapted WLRT method is that multiple change points can be determined without a priori knowledge of the number of change points and/or their likely spacing. The method therefore provides a simple, statistically defensible and repeatable tool for delineating macro-reaches from longitudinal profile data. This does not, however, obviate the need for field evidence, primary data collection and a detailed expert-driven approach where resources allow. Ground-truthing of the macro-reaches is therefore required where the determination goes beyond a desktop level.
If the values of S_k* fluctuate around zero, then the data are fairly homogeneous. With S_k* defined as the running sum of the deviations (Y_i − mean), as in Buishand (1982), Y_i values greater than the mean cause S_k* to rise, whereas Y_i values smaller than the mean cause it to fall. The S_k* values can be further rescaled and adjusted to a mean of zero. This is achieved by dividing the cumulative deviations from the mean, S_k*, by the sample standard deviation, D_Y.

Figure 2 Longitudinal profile of the Crocodile River

Figure 4 Macro-reaches defined by change points determined using the WLRT to split data successively (∆ is used to denote positions of macro-reach breaks from Fig. 1)

Figure 5 Macro-reaches defined by change points for the 1:50 000 topographical map data and the DTM-derived data