Exploring the utility of the additional WorldView-2 bands and support vector machines in mapping land use/land cover in a fragmented ecosystem, South Africa

Land use/land cover (LULC) classification is a key research field in environmental applications of remote sensing on the earth’s surface. The advent of new high resolution multispectral sensors with unique bands has provided an opportunity to map the spatial distribution of detailed LULC classes over a large fragmented area. The objectives of the present study were: (1) to map LULC classes using multispectral WorldView-2 (WV-2) data and support vector machines (SVM) in a fragmented ecosystem; and (2) to compare the accuracy of three WV-2 spectral data sets in distinguishing amongst various LULC classes in a fragmented ecosystem. The WV-2 image was spectrally resized to its four standard bands (SB: blue, green, red and near infrared-1) and four strategically located bands (AB: coastal blue, yellow, red edge and near infrared-2). The full WV-2 image (eight bands: 8B) together with the SB and AB subsets were used to classify LULC using SVM. Overall classification accuracies of 78.0% (total disagreement = 22.0%) for 8B, 51.0% (total disagreement = 49.0%) for SB, and 64.0% (total disagreement = 36.0%) for AB were achieved. There were significant differences between the performance of all WV-2 subset pair comparisons (8B versus SB, 8B versus AB and SB versus AB) as demonstrated by the results of McNemar’s test (Z score ≥ 1.96). This study concludes that WV-2 multispectral data and the SVM classifier have the potential to map LULC classes in a fragmented ecosystem. The study also offers relatively accurate information that is important for the indigenous forest managers in KwaZulu-Natal, South Africa for making informed decisions regarding conservation and management of LULC patterns. South African Journal of Geomatics, Vol. 4, No. 4, November 2015 415


Introduction
Land use/land cover (LULC) is a fundamental variable that influences and links with many parts of human and physical systems and is a vital data component for many aspects of environmental change (Foody, 2002, Otukei and Blaschke, 2010). Changes in LULC have significant effects on basic ecosystem processes including biogeochemical cycling and land degradation (Penner, 1994, Foley et al., 2005, Otukei and Blaschke, 2010). Similarly, LULC classification plays an important role in environmental monitoring and management (Otukei and Blaschke, 2010). Despite this important role, LULC mapping still faces a complex challenge owing to the ambiguous classes used (Cingolani et al., 2004, Otukei and Blaschke, 2010).
Additionally, fragmented ecosystems in many parts of Africa are characterized by the removal and clearing of forest for pasture, agriculture and settlements, leading to vegetation species loss (van Wyk et al., 1996, Cho et al., 2013). In most cases, indigenous forests are fragmented into patches of various sizes and shapes surrounded by a matrix of different LULC classes (Benitez-Malvido, 1998, Cho et al., 2013). In this context, information relating to the dynamics, distribution and productivity of LULC is not only beneficial for economic security (Eldeen, 2005, van Wyk, 2008), but is also needed for the inventory, management and monitoring of fragmented ecosystems (Cingolani et al., 2004, Pignatti et al., 2009, Cho et al., 2013). In order to meet the management and monitoring requirements of fragmented ecosystems, more specific information from LULC surveys and inventories is needed. However, it is difficult to produce LULC maps using traditional field survey approaches.
Traditional approaches are complex and require intensive fieldwork, which is costly and time-consuming and often lacks the necessary geometric accuracy, particularly in highly heterogeneous and fragmented ecosystems. In this regard, remote sensing is a particularly useful tool, as it has successfully been used for tree species and land cover classification (Clark et al., 2005, Larsen, 2007). Moreover, multispectral sensors such as Landsat and SPOT cover large areas of the earth's surface at repeated time intervals, making remote sensing a suitable alternative to traditional approaches for LULC mapping. Recently, the development of high spatial resolution multispectral sensors such as IKONOS has brought unique opportunities for classifying and monitoring LULC (Pu and Landry, 2012). Multispectral data either have high spatial resolution but offer only a few bands such as blue, green, red, and near infrared (NIR), or they offer relatively more bands but with lower spatial resolution. Low spatial resolution multispectral sensors might not accurately map LULC classes in a heterogeneous and fragmented ecosystem (Foody, 2002, Cho et al., 2012). Multiple objects within a pixel in such a case can lead to spectral confusion and poor distinction amongst discrete and continuous cover types, resulting in ambiguous LULC classes (Cingolani et al., 2004). On the other hand, multispectral data of very fine spatial resolution may not capture the intra-class variability accurately when coarse LULC classes are mapped. As far as spectral resolution is concerned, hyperspectral data can overcome the limitations of multispectral data by providing many contiguous wavebands (Vaiphasa et al., 2007) for more accurate and reliable LULC maps (Pal, 2006, Petropoulos et al., 2012). However, the use of hyperspectral data has its own limitations in terms of cost, availability, processing, and high dimensionality (Vaiphasa et al., 2007, Dalponte et al., 2009).
Recently, high spatial resolution multispectral sensors such as RapidEye, WorldView-2 (WV-2) and Sentinel-2 were designed with a few strategically located bands (additional bands) to overcome the spectral limitations of other high spatial resolution multispectral sensors that carry only conventional bands (standard bands), such as QuickBird. The potential of WV-2 data, for instance, has been demonstrated in a number of diverse studies that include, among others, predicting and mapping forest structural parameters (Ozdemir and Karnieli, 2011), urban land cover mapping (Zhou et al., 2012), and discriminating commercial forest species (Peerbhay et al., 2014). These studies discussed the utility of the eight available spectral bands of WV-2 imagery for mapping and predicting a feature of interest and concluded that WV-2 data considerably improved the classification and prediction accuracies compared to conventional sensors. For instance, Pu and Landry (2012) explored the use of WV-2 and IKONOS data sets for mapping tree species and found that the bands available in WV-2 significantly increased the classification accuracy compared to the bands available in the IKONOS sensor. However, these studies provide little information on the performance of WV-2 spectral subsets for mapping vegetation and LULC types with advanced classification methods.
Various classification methods have been implemented in order to map vegetation species and LULC classes using WV-2 data. These methods include discriminant analysis (Pu and Landry, 2012), decision trees (Heumann, 2011), and maximum likelihood and minimum distance to the mean classifiers (Cho et al., 2011, McCarthy and Halls, 2014). All these classifiers have used supervised classification methods with conventional multispectral data. Amongst these classifiers, maximum likelihood and minimum distance to the mean have been the most widely used (Kavzoglu and Mather, 2003, Otukei and Blaschke, 2010). The two classifiers can generate acceptable accuracy, are simple, and are available in most image processing packages (Zhang et al., 2007, Cho et al., 2011). However, all these classifiers have their own limitations that are related particularly to distributional assumptions and to mapping areas with limited training samples (Kavzoglu and Mather, 2003, Cho et al., 2012).
To tackle these problems, more powerful classification methods have been used for mapping LULC (Lu and Weng, 2007). These classification methods include, for example, neural networks, random forest, support vector machines (SVM) and decision trees (Civco, 1993, Heumann, 2011, Pal, 2003, Pal, 2006). Amongst these methods, attention has been paid to the use of SVM due to its superior image handling capability (Vapnik, 1998). Numerous studies have used the SVM classifier with multispectral imagery for LULC mapping (Kavzoglu and Colkesen, 2009, Otukei and Blaschke, 2010, Petropoulos et al., 2012, Adam et al., 2014). These researchers found that the classifier performed relatively better than, or similarly to, other classifiers when multispectral and hyperspectral data were used. The utility of the strategically located additional WV-2 bands for improving the accuracy of LULC maps in a fragmented ecosystem still needs to be explored. To the best of our knowledge, there is a lack of information on how the SVM classifier performs in delineating LULC classes in a fragmented ecosystem using different WV-2 spectral subsets. Therefore, the objectives of the present study were: (1) to map different LULC classes using WV-2 data and the SVM classifier in a fragmented ecosystem; and (2) to compare the accuracy of three WV-2 spectral subsets in distinguishing amongst various LULC classes in a very fragmented ecosystem.

Study area
The study area is located in Dukuduku forest, in the northern part of KwaZulu-Natal at the entrance to the iSimangaliso Wetland Park, South Africa (28°25'S, 32°17'E) (Figure 1). The area covers approximately 19887 hectares. The subtropical climate dominating the study area has warm moist summers and mild dry winters. The mean daily maximum temperatures are 26°C in January and 21°C in July, while mean daily minimum temperatures are 19°C in January and 9°C in July (Von Maltitz et al., 2003). The rainy season falls between November and March with a mean annual rainfall of 1250 mm (Von Maltitz et al., 2003). The Dukuduku forest is undergoing fragmentation as a result of indigenous forest clearance for agriculture, plantation forestry, and settlements (Cho et al., 2012, Cho et al., 2013). Some other anthropogenic activities in the area have also led to a fragmented ecosystem. The area is covered by various natural indigenous vegetation species of different age groups and other forms of LULC classes. These include sugarcane farms and commercial plantation forests occurring on the artificially drained floodplain to the south of the area and grassland to the north (van Wyk et al., 1996, Cho et al., 2012). Some of the natural vegetation has been removed and cleared for other land uses (van Wyk et al., 1996, Cho et al., 2012, Cho et al., 2013). Therefore, the study area comprises a wide range of land use and fragmented forest classes and thus is appropriate to meet the objectives of this study.
The WV-2 image comprises eight spectral bands: coastal blue (400-450 nm), blue (450-510 nm), green (510-580 nm), yellow (585-625 nm), red (630-690 nm), red edge (705-745 nm), NIR-1 (770-895 nm), and NIR-2 (860-1040 nm). The image was atmospherically corrected and transformed to canopy reflectance using the Quick Atmospheric Correction (QUAC) extension in Environment for Visualizing Images (ENVI 4.7) software (ENVI, 2006). The image was then referenced to the Universal Transverse Mercator (UTM zone 36 South) projection using the WGS-84 geodetic datum. The acquired image was geometrically corrected by DigitalGlobe. After the geometric and atmospheric correction, the WV-2 image was spectrally resized to four standard bands (blue, green, red and NIR-1: SB) and four strategically located new bands (coastal blue, yellow, red edge and NIR-2: AB). These subsets together with all eight bands (8B) of WV-2 were compared for mapping LULC classes using the SVM supervised classifier.
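The spectral resizing step can be illustrated with a minimal sketch. It assumes that the eight bands are stored in order of increasing wavelength, as in the band list above; the names `WV2_BANDS`, `SUBSETS` and `spectral_subset` and the example reflectance values are hypothetical and are not taken from the ENVI workflow used in the study.

```python
# Assumed band order: increasing wavelength, as listed in the text.
WV2_BANDS = [
    "coastal blue", "blue", "green", "yellow",
    "red", "red edge", "NIR-1", "NIR-2",
]

# The three spectral data sets compared in the study.
SUBSETS = {
    "8B": WV2_BANDS,
    "SB": ["blue", "green", "red", "NIR-1"],
    "AB": ["coastal blue", "yellow", "red edge", "NIR-2"],
}


def spectral_subset(pixel, subset):
    """Return the values of one 8-band pixel restricted to a named subset."""
    if len(pixel) != len(WV2_BANDS):
        raise ValueError("expected one value per WV-2 band")
    return [pixel[WV2_BANDS.index(name)] for name in SUBSETS[subset]]


# Hypothetical reflectance vector for a single vegetated pixel.
pixel = [0.03, 0.04, 0.06, 0.05, 0.04, 0.20, 0.35, 0.38]
sb_pixel = spectral_subset(pixel, "SB")
ab_pixel = spectral_subset(pixel, "AB")
```

Applying the same selection to every pixel yields the SB and AB images, while 8B leaves the pixel unchanged.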

Field data collection
The field campaign was carried out on 7 December 2013, within a week of the WV-2 imagery acquisition. This was done in order to collect ground reference data of eight LULC classes, namely: dune forest (DF), indigenous forest (IF), fragmented forest (FF), Eucalyptus spp. (EP), Pinus spp. (PN), mature sugarcane (MS), young sugarcane (YS), and grassland (GL), using a handheld Leica GS20 GPS (global positioning system) with sub-meter accuracy. During the field visit, a total of 75 sample data points were collected for each class. The ground reference data were collected using a random sampling protocol to adequately sample LULC classes based on their representative sizes within the study area. The reference data were then divided randomly into training (70%) and test (30%) datasets using Hawth's Analysis tool in ArcGIS 9.3. The SVM classifier was trained on 70% of a randomly selected holdout sample and final accuracy was assessed using the remaining 30% of the samples.
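A 70/30 random split of this kind can be sketched as follows. The sketch assumes that the split is made within each class and that the training share is rounded up, which gives 53 training and 22 test points from 75 points per class. The function name, the seed and the point identifiers are hypothetical; the study itself used Hawth's Analysis tool.

```python
import math
import random


def stratified_split(samples_by_class, train_percent=70, seed=42):
    """Randomly split each class into training and test samples."""
    rng = random.Random(seed)
    train, test = {}, {}
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)
        # Round the training share up (assumption): 75 points -> 53 training.
        n_train = math.ceil(len(shuffled) * train_percent / 100)
        train[label] = shuffled[:n_train]
        test[label] = shuffled[n_train:]
    return train, test


# Eight LULC classes with 75 hypothetical point identifiers each.
CLASSES = ["DF", "IF", "FF", "EP", "PN", "MS", "YS", "GL"]
points = {label: [f"{label}-{i:02d}" for i in range(75)] for label in CLASSES}
train, test = stratified_split(points)
```

Fixing the seed makes the split reproducible, which matters when the same test samples are later reused to compare classifications.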

Statistical analysis
The effectiveness of SVM classifier to map LULC classes was investigated in this study.
The classifier was trained on 70% (n = 53) of a randomly selected holdout sample and final accuracy assessments were evaluated using the remaining 30% (n = 22) of the dataset. Once the training positions and classes were allocated, classification signatures were created for the eight LULC classes in the study area. After assessing and adjusting the signatures, the SVM supervised classification method was employed to classify the WV-2 image. SVM parameters were optimized and then input into the ENVI software to map the classes on the WV-2 image. The e1071 library version 2.15.2 in the R statistical package (R Development Core, 2012) was employed for SVM parameter optimization.

Support vector machines (SVM) classifier
SVM was originally introduced by Cortes and Vapnik (1995) as a binary classifier.
However, real remote sensing problems usually include identification of multiple classes.
Amendments are made to the simple SVM binary classifier so that it runs as a multi-class classifier (Burges, 1998). SVM is a distribution-free algorithm that requires few training data points without encountering over-fitting problems due to the use of kernel functions (Cortes and Vapnik, 1995). In the present study, the WV-2 subsets (8B, SB and AB) were used to define the feature space of SVM. A radial basis function (RBF) was utilized as the SVM kernel. Interested readers are referred to, among others, Cortes and Vapnik (1995), Mountrakis et al. (2011) and Abdel-Rahman et al. (2014) for detailed information about the concept and the theoretical and mathematical formulation of the SVM algorithm.
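For orientation, the RBF kernel in its commonly used form, K(x, y) = exp(-gamma * ||x - y||^2), can be sketched as below. The parameter name `gamma` is an assumption borrowed from common SVM software, and the two reflectance vectors are hypothetical; this only illustrates the kernel, not the SVM training itself.

```python
import math


def rbf_kernel(x, y, gamma):
    """RBF kernel value between two spectral vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of bands")
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


# Hypothetical four-band reflectance vectors for two pixels.
forest_pixel = [0.04, 0.06, 0.04, 0.35]
grass_pixel = [0.05, 0.08, 0.07, 0.30]
similarity = rbf_kernel(forest_pixel, grass_pixel, gamma=0.1)
```

The kernel equals 1 for identical pixels and decays towards 0 as spectral distance grows; a larger `gamma` makes the decay faster and the decision boundary more local.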

Accuracy assessment
A confusion matrix was constructed to compare the true class with the class assigned by SVM and to calculate the overall accuracy (OA), producer's accuracy (PA) and user's accuracy (UA). OA has the advantage of being directly interpretable as the percentage (%) of test samples that are correctly classified. PA refers to the probability that a reference sample of a certain class is correctly classified, while UA represents the likelihood that a sample assigned to a specific class by the classifier actually belongs to that class. In addition, two parameters were calculated from the cross-tabulation matrix to evaluate the reliability of the SVM classifier. These are quantity disagreement (QD) and allocation disagreement (AD), which were developed by Pontius and Millones (2011). The quantity disagreement is the amount of difference between the test data and the predicted data that is due to a mismatch in the proportions of the classes, while the allocation disagreement is the amount of difference that is due to a less than optimal match in the spatial allocation of the classes in comparison to the test data. Based on the accuracy metrics achieved for each WV-2 data set, a statistical analysis can be performed to test whether there is any significant difference between the classification results of the three WV-2 spectral subsets. Hence, McNemar's test was performed to test whether there were any significant differences amongst the confusion matrices of the three WV-2 spectral datasets.
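The accuracy measures described above can be computed from a confusion matrix as in the following sketch. It assumes that rows hold the classes assigned by the classifier and columns hold the reference classes, and that sample counts are used directly as proportions (Pontius and Millones (2011) define the measures on estimated population proportions). The function name and the two-class example matrix are hypothetical.

```python
def accuracy_metrics(matrix):
    """OA, PA, UA, QD and AD from a square confusion matrix.

    matrix[i][j] is the number of test samples assigned to class i
    whose reference class is j (rows = predicted, columns = reference).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(matrix[g]) for g in range(n)]
    col_totals = [sum(matrix[i][g] for i in range(n)) for g in range(n)]
    diagonal = [matrix[g][g] for g in range(n)]

    overall = sum(diagonal) / total
    producers = [diagonal[g] / col_totals[g] for g in range(n)]
    users = [diagonal[g] / row_totals[g] for g in range(n)]
    # Quantity disagreement: mismatch between class totals.
    quantity = sum(abs(row_totals[g] - col_totals[g]) for g in range(n)) / (2 * total)
    # Allocation disagreement: paired commission and omission errors.
    allocation = sum(
        2 * min(row_totals[g] - diagonal[g], col_totals[g] - diagonal[g])
        for g in range(n)
    ) / (2 * total)
    return {
        "OA": overall, "PA": producers, "UA": users,
        "QD": quantity, "AD": allocation,
    }


# Hypothetical two-class example with 100 test samples.
example = accuracy_metrics([[40, 10], [5, 45]])
```

In this formulation QD and AD sum to the total disagreement, which equals 1 minus OA.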
McNemar's test is a nonparametric test based upon a standardized normal test statistic calculated from the error matrices of the SVM classifier as follows (Foody, 2004, Leeuw et al., 2006):
$$Z = \frac{f_{12} - f_{21}}{\sqrt{f_{12} + f_{21}}}$$
where $f_{12}$ denotes the number of samples that are misclassified on the first confusion matrix but correctly classified on the second confusion matrix, and $f_{21}$ denotes the number of samples that are misclassified on the second confusion matrix but correctly classified on the first confusion matrix. A difference in accuracy between the confusion matrices of different WV-2 spectral subsets is statistically significant (p ≤ 0.05) if the absolute Z value is more than 1.96 (Foody, 2004, Leeuw et al., 2006).
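A minimal sketch of this test is given below, assuming the commonly used form Z = (f12 - f21) / sqrt(f12 + f21) and paired predictions for the same test samples. The function names and the example counts are hypothetical.

```python
import math


def mcnemar_counts(reference, predicted_1, predicted_2):
    """Count the discordant test samples of two classifications."""
    f12 = f21 = 0
    for truth, p1, p2 in zip(reference, predicted_1, predicted_2):
        if p1 != truth and p2 == truth:
            f12 += 1  # wrong in the first map, right in the second
        elif p1 == truth and p2 != truth:
            f21 += 1  # right in the first map, wrong in the second
    return f12, f21


def mcnemar_z(f12, f21):
    """Standardized normal test statistic of McNemar's test."""
    if f12 + f21 == 0:
        return 0.0  # the two classifications never disagree
    return (f12 - f21) / math.sqrt(f12 + f21)


def is_significant(z, critical=1.96):
    """Two-sided test at the 95% confidence level."""
    return abs(z) > critical


# Hypothetical discordant counts for two spectral subsets.
z = mcnemar_z(30, 10)
```

Only the samples on which the two classifications disagree enter the statistic; samples that both classify correctly or both misclassify carry no information about the difference.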

Optimization of support vector machines
The results of the grid search and 10-fold cross validation method indicated optimal values of λ and C for SVM, respectively, of 0.1 and 10 for 8B, 0.1 and 1000 for SB, and 0.1 and 100 for AB. When these optimal values were input into the SVM classifier, minimum classification errors of 32.0%, 35.3%, and 36.3% for 8B, SB and AB, respectively, were obtained. The optimal values of λ and C were input into the SVM classification algorithm to map different LULC classes in the study area using the three WV-2 spectral data sets.
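The grid search with k-fold cross validation can be sketched generically as below. To keep the example self-contained, a simple RBF-weighted voting classifier stands in for the SVM; it is not an SVM and has only a kernel-width parameter, here called `gamma` (an assumption; the text denotes the two tuned SVM parameters λ and C). All function names and the toy data are hypothetical, and the study itself used the e1071 library in R.

```python
import itertools
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_error(features, labels, fit_predict, params, folds):
    """Mean misclassification rate over the held-out folds."""
    errors = []
    for fold in folds:
        held_out = set(fold)
        train_idx = [i for i in range(len(labels)) if i not in held_out]
        predicted = fit_predict(
            [features[i] for i in train_idx],
            [labels[i] for i in train_idx],
            [features[i] for i in fold],
            **params,
        )
        wrong = sum(p != labels[i] for p, i in zip(predicted, fold))
        errors.append(wrong / len(fold))
    return sum(errors) / len(errors)


def grid_search(features, labels, fit_predict, grid, k=10, seed=0):
    """Return (best parameters, lowest cross-validated error)."""
    folds = k_fold_indices(len(labels), k, seed)
    best = None
    for values in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        error = cross_val_error(features, labels, fit_predict, params, folds)
        if best is None or error < best[1]:
            best = (params, error)
    return best


def rbf_vote_fit_predict(train_x, train_y, test_x, gamma):
    """Stand-in classifier (NOT an SVM): RBF-weighted vote of training points."""
    predictions = []
    for x in test_x:
        scores = {}
        for tx, ty in zip(train_x, train_y):
            sq = sum((a - b) ** 2 for a, b in zip(x, tx))
            scores[ty] = scores.get(ty, 0.0) + math.exp(-gamma * sq)
        predictions.append(max(scores, key=scores.get))
    return predictions


# Toy two-class data: 20 points per class in two well separated clusters.
features = [(0.01 * i, 0.0) for i in range(20)] + [(1.0 + 0.01 * i, 1.0) for i in range(20)]
labels = ["A"] * 20 + ["B"] * 20
best_params, best_error = grid_search(
    features, labels, rbf_vote_fit_predict, {"gamma": [0.1, 1.0, 10.0]}, k=10
)
```

With a real SVM, `fit_predict` would wrap the SVM training and prediction calls and the grid would hold both tuned parameters, so that every parameter pair is scored by its cross-validated error.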

Accuracy assessment
Figure 2 shows the LULC maps obtained using the SVM classifier. The main visual difference between the maps is that a relatively homogeneous map was produced when 8B was used as compared with the other spectral subsets. The maps also show that the Dukuduku indigenous forest was mainly surrounded by grassland and commercial forest plantation, while the grassland in the north-eastern part of the study area was fragmented. Most of the sugarcane farms in the study area were at a mature growth stage. Furthermore, the overall accuracy for mapping LULC classes was 78.0% (total disagreement = 22.0%), 51.0% (total disagreement = 49.0%) and 64.0% (total disagreement = 36.0%) using 8B, SB and AB, respectively (Tables 1, 2 and 3). The SVM classifier obtained quantity disagreement values of 5.0%, 13.0% and 14.0% for 8B, SB and AB, respectively (Tables 1, 2 and 3). The tables also show relatively high allocation disagreement values of 17.0%, 36.0% and 22.0%. Generally, all LULC classes achieved over 70% producer's and user's accuracies, with the exception of FF (PA = 63% and UA = 68%) and IF (PA = 68% and UA = 59%), when the 8B subset was used (Figure 3). Our results indicate that the values of UA were less than 50% for the dune forest, grassland, and mature sugarcane classes on the SB subset (Figure 4) and for grassland and mature sugarcane on the AB subset (Figure 5).
According to McNemar's test, there was a significant difference (Z ≥ 1.96) at the 95% confidence level amongst the confusion matrices of the SVM classifier using the WV-2 8B, SB and AB subsets (Table 4). Table 5 shows the area under each LULC class obtained from the WV-2 8B, SB and AB subsets using the SVM classification algorithm. The differing areas obtained from the WV-2 subsets also confirm the dissimilar performance of the SVM classification algorithm. The study area is dominated by indigenous forest, commercial plantation and grassland.
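The reported figures can be cross-checked with a short sketch: total disagreement should equal the sum of quantity and allocation disagreement, and also 100% minus the overall accuracy. The values are those quoted in the text; the dictionary layout and function name are hypothetical.

```python
# Percentages quoted in the text for the three spectral data sets.
REPORTED = {
    "8B": {"overall": 78.0, "quantity": 5.0, "allocation": 17.0},
    "SB": {"overall": 51.0, "quantity": 13.0, "allocation": 36.0},
    "AB": {"overall": 64.0, "quantity": 14.0, "allocation": 22.0},
}


def total_disagreement(entry):
    """Total disagreement (%) as quantity plus allocation disagreement."""
    return entry["quantity"] + entry["allocation"]


for name, entry in REPORTED.items():
    # Print both ways of deriving the total disagreement for comparison.
    print(name, total_disagreement(entry), 100.0 - entry["overall"])
```

For all three data sets the two derivations agree with the total disagreement values quoted in the text.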
Figure 3. Producer's accuracy (%) and user's accuracy (%) of the studied eight LULC classes using all eight bands subset (8B) and support vector machines classifier for the 30% test data sets

Discussion
The main finding of the present study was that WV-2 8B significantly outperformed the SB and AB subsets in mapping LULC classes in a fragmented ecosystem. Our finding is consistent with other studies (Elsharkawy et al., 2012, Pu and Landry, 2012, Peerbhay et al., 2014) that demonstrated the utility of WV-2 8B in mapping LULC classes in intact landscapes.
There are two reasons that may have led to the high accuracy. Firstly, the LULC classes in Dukuduku forest consist of vegetation, and the WV-2 strategically located bands are effective in differentiating vegetated surfaces (Marchisio et al., 2010, Yang, 2011, Alsubaie, 2012). Secondly, the SVM classification algorithm is useful for LULC mapping of fragmented ecosystems because SVM reduces classification error on test data points without prior assumptions about their distribution (Mountrakis et al., 2011, Ghosh and Joshi, 2014).
Moreover, SVM is known as a versatile classifier that constructs models based on a small number of data points from different classes (Cortes and Vapnik, 1995) by maximizing the margin between the support vectors and the hyperplane. The classification error is therefore significantly minimized.
In the present study, we used a non-linear kernel function to perform SVM classification. A non-linear kernel is an efficient way to solve inseparability problems that may be found in the LULC classes. The relatively good performance of the SVM classifier obtained in this study is consistent with the findings of Huang et al. (2002), Kavzoglu and Colkesen (2009) and Petropoulos et al. (2012), who utilized kernel functions with SVM for classifying remotely-sensed data and concluded that the classifier leads to improved classification accuracy.
A number of authors have found that SVM was the best classification technique for mapping LULC using high spatial resolution imagery such as WV-2 (Pal, 2006, Chen, 2011, Pu and Landry, 2012). To our knowledge, WV-2 subsets (8B, SB and AB) have never been compared for mapping LULC classes in areas with small sample datasets such as the fragmented ecosystem in the Dukuduku area of KwaZulu-Natal, South Africa. Our study showed that the SVM classifier was unable to fully deal with the high spectral variation inherent in some LULC classes like mature sugarcane and grassland, which obtained relatively lower UA and PA (see Figures 4 and 5). This is a common problem when classifying heterogeneous landscapes using high spatial resolution imagery (such as the WV-2 image) based on per-pixel classification techniques (Lu and Weng, 2007).
Although the eight LULC classes could be separated to some extent using only the four standard bands (OA = 51.0%), the use of the WV-2 strategically located bands (coastal blue, yellow, red edge and NIR-2) led to a considerable improvement in the classification accuracy. That is expected when an advanced machine learning algorithm is used with WV-2 data. The additional wavebands are expected to provide an increase of up to 30% in classification accuracy (Zhou et al., 2012).
The low UA for the GL and MS classes indicates that pixels classified as GL and MS may not actually represent these classes on the ground. That is expected since a physiologically mature sugarcane crop could be similar to densely vegetated grassland as a result of confounding factors such as weeds and abiotic stressors (Abdel-Rahman et al., 2013), and the two classes could hence have similar spectral characteristics.
In summary, our findings are promising for accurate mapping of LULC in fragmented areas as they demonstrate the possibility of mapping LULC classes using WV-2 data and the SVM classifier. Moreover, the relatively accurate classification result obtained in this study provides reliable information on LULC classes in the Dukuduku area. That could be used for the design of management plans and policies as a basis for assessing and monitoring natural resources, ecological fragmentation and ecosystem function. This information is therefore critical in the management of one of the most valuable landscapes in South Africa. Furthermore, mapping sugarcane ages provides useful information for the Southern African sugar industry for making informed decisions with regard to sugarcane harvesting and milling. Further research is needed to widen the use of WV-2 imagery in identifying the rare forest species within the indigenous forest in the north-western part of the study area.
We have mapped eight coarse LULC classes using remotely-sensed data of fine pixel size. The intra-class variability could have exceeded the fine pixel size of the WV-2 image.
Hence, remotely-sensed data of medium spatial resolution (e.g., 10 or 20 m) could yield relatively better classification results. Furthermore, we did not consider the scale of the fragmentation, which should match the image pixel size to accurately map the LULC classes, since our study area is a fragmented landscape (Cho et al., 2013). Future studies should look at the scale of the fragmentation using FRAGSTATS metrics extracted from multi-temporal satellite data. The scale of fragmentation should then be matched with an appropriate image resolution to derive more accurate LULC classification maps.

Conclusions
The present study shows a successful application of multispectral WV-2 data and the machine learning SVM classifier in mapping eight LULC classes in a fragmented ecosystem.
The results show that WV-2 8B significantly outperforms both the SB and AB subsets in mapping LULC classes, achieving an overall accuracy of 78%. In turn, the WV-2 AB subset yielded significantly higher classification accuracy than the SB subset. The results further demonstrate that the classification error for mature sugarcane and grassland was relatively higher. Our study provides LULC maps that could be used as essential information for decision-making regarding land management and policy making strategies in the fragmented Dukuduku area. It is recommended that further studies look at identifying threatened (rare) tree species within the indigenous Dukuduku forest using a sub-pixel classification algorithm.

Acknowledgments
We would like to thank the University of KwaZulu-Natal, South Africa and the University of Khartoum, Sudan for funding the research. Our appreciation also extends to the R development core team for their open source packages for the statistical analysis. Our gratitude further extends to Inkanyamba Development Trust and Manukelana Arts and Indigenous Nursery for facilitating the field data collection, and to Dr. Samuel Adelabu for his helpful comments and assistance during data analysis.

Figure 1. Location of study area in KwaZulu-Natal Province (KZN) of South Africa

Figure 2. Land use/cover classification maps obtained using support vector machines classifier: (a) all eight WorldView-2 bands, (b) four standard WorldView-2 bands and (c) four additional WorldView-2 bands

Figure 4. Producer's accuracy (%) and user's accuracy (%) of the studied eight LULC classes using standard bands subset (SB) and support vector machines classifier for the 30% test data sets

Figure 5.

Table 5. Area of each LULC class in the study area obtained from all WorldView-2 eight bands, four WorldView-2 standard bands and four WorldView-2 additional bands subsets based on support vector machines classification algorithm