Evaluating Pixel- vs. Segmentation-based Classifiers with Height Differentiation on SPOT 6 Imagery for Urban Land Cover Mapping

The identification, extraction, classification and mapping of detailed yet reliable Land Use or Land Cover (LULC) data play an increasingly important role in informed decision-making, whether in urban planning and civil engineering, intensive agriculture, or the natural and environmental sciences. One way of extracting LULC information is through algorithms that classify multispectral satellite images according to the required standard and user legend. The meaningful classification of heterogeneous urban and city landscapes, however, remains challenging and is performed using semi-automated pixel-based, object-based, or hybrid classification workflows. Prevailing remote sensing technologies now enable professionals to integrate multidimensional data from various sources to improve the quality of LULC classification, which removes the dependency on (multi)spectral information alone. This study sought to explore how successfully a single-acquisition pansharpened SPOT 6 image can be deconstructed into primary and secondary LULC classes. This was achieved by comparing pixel-based and segmentation-based classifiers over Soshanguve Township in South Africa. The study further assessed the effect of integrating LiDAR-derived 3D land surface data into both classification processes. A supervised Maximum Likelihood classifier was executed for the pixel-based routine, while the ERDAS IMAGINE Objective tool was used for the segmentation-based approach. A total of nine LULC classes were successfully identified from the classification. The results showed that the segmentation-based approach outperformed the pixel-based approach, and that integrating height information increased the overall accuracies of the segmentation-based and pixel-based classifications from 67.5 % to 78.8 % and from 57.5 % to 73.8 %, respectively.


Introduction
The extraction of LULC information has over the years become important for various Earth observation applications in fields such as urban and town planning, transport and civil engineering (Bhaskaran et al., 2010). One way of extracting useful LULC information is through the classification of digital imagery. With the recent developments in Remote Sensing (RS) technology, satellite images now provide finer spatial resolutions which allow for the possibility of more detailed mapping of the urban landscapes (Jabari & Zhang, 2013). The classification of remotely sensed imagery is normally performed using object- or pixel-based approaches, or a hybrid of the two classification strategies.
Traditional pixel-based classifiers have been widely used for classifying optical satellite imagery into meaningful LULC classes. With this approach, LULC classification is performed by assigning pixels to classes using either supervised or unsupervised classifiers (Campbell & Wynne, 2011). As high to very high spatial resolution imagery becomes more available, there is a need for new and improved classification routines that can classify this high resolution data (Warner et al., 2009). As ground resolutions approach the sub-metre level, classification algorithms that use a single analysis are often unable to extract the desired urban LULC information from this high resolution data (Visual Learning Systems, 2002), even after proper image pre-processing (such as atmospheric correction and orthorectification). This is because, at high spatial resolution, the pixel-based approach may classify neighbouring pixels into different land cover classes based on their spectra even though these pixels belong to the same land cover; hence the shift towards object-based image analysis (Blaschke & Strobl, 2001; Djenaliev & Hellwich, 2014). Unlike the pixel-based approach, which classifies pixels strictly according to their spectral information, the object-based approach uses both spatial and spectral information to segment and then classify image features into meaningful objects (Xiaoxia et al., 2004). In the object-based approach, homogeneous groups of pixels are delineated into meaningful objects based on the object's texture, shape, size and other useful information obtained from the imagery (Blaschke, 2010; Djenaliev & Hellwich, 2014). From the resulting segments, homogeneous image objects are extracted based on the local contrast. These homogeneous objects are then classified using traditional classification approaches such as nearest neighbour, or using knowledge-based approaches and fuzzy classification logic (Civco et al., 2002).
Recent urban studies in Earth Observation (EO) show a shift from coarser spatial resolution imagery, such as LANDSAT data, to high and even very high resolution imagery obtained by, for instance, the SPOT 6/7, WorldView and Pleiades series of instruments. This shift is largely because high resolution imagery offers an ideal opportunity for detailed LULC classification in the urban context. However, the cost of acquiring this high resolution data is often high, especially when working with larger geographic regions (Djenaliev & Hellwich, 2014). Even though there have been improvements in the spatial resolution of multispectral RS data, the imagery alone is still not sufficient to automatically classify heterogeneous urban landscapes at a city block level. Studies in the literature are now moving towards data integration in order to improve the accuracy of urban LULC classification (Chavez et al., 1991; Pohl & Van Genderen, 1998). Data integration refers to the integration of information from various sources or sensors, such as the integration of LiDAR and optical data (Zhang, 2010). Vegetation-penetrating LiDAR provides more accurate position and height information (structure) about objects on the surface of the Earth but lacks direct information about other vital attributes such as colour and geometrical shape. High spatial resolution imagery, in this case, offers more detailed information about an object's attributes such as shape, texture and spectral information (Syed et al., 2005). Thus, integrating different datasets is promising for quality LULC extraction, as demonstrated by Awrangjeb et al. (2010), who used LiDAR data and multispectral imagery for the automatic detection of residential buildings. The results from their study showed that the integration of the two remote sensing datasets allowed for the successful detection of urban residential buildings.

1.1 Study motivation
Urban land cover mapping from remotely sensed data is important because it gives sound knowledge of the different land covers that exist on the surface of the Earth. In turn, it would assist Government in creating, updating, implementing laws and policies regarding current and future uses of land. With the continuous advancements in RS technology, users now have access to high spatial and spectral resolution data which allows for detailed land cover mapping in complex urban areas.
Mapping complex urban land cover requires advanced methods that seek to produce a more accurate result. One way of doing so is to move away from the use of a single source of RS data towards the integration of data from different sensors. An example is the study by Chen et al. (2009), in which QuickBird and LiDAR data were integrated for hierarchical object-oriented classification of urban land cover. The per-pixel classification using just the QuickBird optical imagery achieved an accuracy of 69.12 %, whereas the integration of the LiDAR and QuickBird datasets achieved an improved accuracy of 89.40 %. This clearly demonstrated that the integration of height and optical information does increase the classification accuracy. This study used a subset of a single SPOT 6 acquisition scene over a densely populated urban area to compare the traditional pixel-based approach with the object/segmentation-based classification approach. The study further assessed the effect of integrating classified height information, derived from a LiDAR point cloud, into the preceding two classification routines.

Study area
The study was carried out in Soshanguve Township, whose history dates back to 1947. Previously this township was designated for migrants and got its name from the languages spoken in the area (i.e. Sotho, Shangan, Nguni, and Venda). Situated about 45 km north of South Africa's capital city, Pretoria, in the Gauteng Province, the study area of almost 53 km² falls within the City of Tshwane Metropolitan Municipality (Figure 1), which covers a total area of approximately 6 298 km² and has an estimated population of 2 921 488. Soshanguve itself has a total area of 126.77 km² and an estimated population of 403 162 (Statistics South Africa, 2011). The region has a humid subtropical climate with long, hot rainy summers and short cool to cold winters. According to the 2014 South African LULC dataset (GEOTERRAIMAGE, 2014) obtainable from the Department of Environmental Affairs (DEA) and National Geo-spatial Information (NGI), the study area consists of approximately 60 % built-up (buildings and transport); 30 % urban vegetation (grasses, shrubs/bushes and trees) and 15 % natural vegetation (open woodland).

3.1 Remotely sensed data
The SPOT 6 scene was obtained from the South African National Space Agency (SANSA) and used as primary input data for the study. The image was acquired in June 2014 and consisted of the Red, Green, Blue (RGB) and Near Infrared (NIR) bands (ground resolution of 6 m), and an additional panchromatic band (ground resolution of 1.5 m). Although the NIR band was available, this study used the RGB composite because it gave a better distinction between the different LULC classes than any other band combination when not relying upon widely-used vegetation indices. The preparation of the SPOT 6 subset was done in ArcGIS 10.x, where the multispectral and panchromatic bands were fused to create a pansharpened 1.5 m RGB image. Height information was obtained from a 2 m normalised Digital Surface Model (nDSM) that was constructed from LiDAR data, and was used to assess whether the integration of height metrics into the classification process could improve the overall classification results. Colour ortho-photos with a 10 cm spatial resolution, acquired simultaneously with the LiDAR data (September 2013), were used to verify the allocated points.

Classification
Traditional pixel-based classification is the most commonly used technique for LULC extraction (Foody et al., 1992; Paola & Schowengerdt, 1995; Breytenbach et al., 2013). For the purposes of this study, a supervised Maximum Likelihood (ML) classifier was used to classify the subset.
In the segmentation step of the segmentation-based approach, the pixel probability layer created from the 'Raster Pixel Processor' (RPP) node was used to segment image features into objects, based on the specified thresholds. The optimum threshold for minimum value difference was 28, with a variation factor of 3.5 in this case. The probability and size filters were used in the 'Raster Object Operators' (ROO) node to filter pixel objects. A minimum probability of ten percent and a size filter of two or more pixels were found to be the optimal thresholds for the purposes of this study. The raster from the ROO node was then vectorised using Polygon Trace on the 'Raster to Vector Conversion' node. This raster contained pixels that were grouped into raster objects with associated probability values. The vectorised output was then labelled into the various LULC classes in the 'Vector Cleanup Operators' (VCO) node. Finally, the labelled vector file was converted back into raster format, resulting in a labelled segmentation-based classified output. This final classified product was then further exploited to produce the segmentation-based classification with height integration output.
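For the pixel-based routine, the supervised Maximum Likelihood rule can be illustrated with a minimal sketch. It assumes Gaussian class signatures and equal prior probabilities, and it is not the implementation of the software used in the study. The function names and the RGB training values below are hypothetical and chosen only for illustration.

```python
import numpy as np

def train_ml(samples):
    """Estimate a mean vector and covariance matrix per class.

    samples: dict mapping class name -> array of shape (n_pixels, n_bands).
    """
    stats = {}
    for name, pixels in samples.items():
        pixels = np.asarray(pixels, dtype=float)
        mean = pixels.mean(axis=0)
        cov = np.cov(pixels, rowvar=False)
        # Store the inverse covariance and the log-determinant for reuse.
        stats[name] = (mean, np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return stats

def classify_ml(pixels, stats):
    """Assign each pixel to the class with the highest Gaussian log-likelihood."""
    pixels = np.asarray(pixels, dtype=float)
    names = list(stats)
    scores = []
    for name in names:
        mean, inv_cov, log_det = stats[name]
        diff = pixels - mean
        # Squared Mahalanobis distance of every pixel to the class mean.
        maha = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
        # Log-likelihood up to a constant shared by all classes (equal priors assumed).
        scores.append(-0.5 * (log_det + maha))
    best = np.argmax(np.stack(scores), axis=0)
    return [names[i] for i in best]

# Hypothetical RGB training pixels for two classes (not data from the study).
training = {
    "vegetation": [[40, 90, 35], [45, 95, 40], [38, 85, 30], [50, 100, 42], [42, 88, 37]],
    "bare_soil": [[150, 120, 90], [160, 125, 95], [155, 130, 85], [145, 118, 92], [158, 122, 88]],
}
signatures = train_ml(training)
labels = classify_ml([[44, 92, 36], [152, 124, 91]], signatures)
```

Each training class needs more pixels than bands, otherwise the covariance matrix cannot be inverted; this is one practical reason why a larger number of training sites gives more reliable signatures.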

Height Integration
The second classification approach for the two classifiers was the 'with height integration' method, which followed a post-classification approach. In this classification, the products of both the pixel-based and segmentation-based classifications were integrated with height information from the co-registered 2 m nDSM. The height integration procedure was carried out in a GIS, where each class from the previously classified imagery was used to extract the height values that corresponded to it.
For instance, to obtain height values for the buildings class, the buildings land cover from the classification was used as a mask to extract only the height values that intersect with this LULC class. This was done using the Extract by Mask tool in ArcGIS. The extracted height values were then categorised into the classes presented in Table 2, where any building pixel that had a height value of 0.5 m or less was classified as paved surface, and any pixel with a value greater than 0.5 m was classified as a building.

Table 2. Height classification
Height (m)    Class
0-0.5         Grasses/short vegetation, Bare soil and Paved surfaces
0.5-2.5       Medium shrub/bush, Buildings
2.5-5         Medium bush/tree, Buildings
>5.0          Tall trees

The same was applied for the paved surfaces class in order to separate buildings from paved surfaces. From the bare soil class, any pixel that had a height of 0.5 m or below was considered to be bare soil and any pixel with a height value greater than 0.5 m was classified as buildings. The urban vegetation class was also separated according to the described height classes. The integration of height information allowed for the separation of the urban vegetation class, which resulted in an extra class, making it nine LULC classes compared to the eight found in the classification without height integration for both classifiers.
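The post-classification height rules described above can be sketched as a small reclassification function. This is an illustrative sketch, not the ArcGIS workflow used in the study. The class names are hypothetical labels, the bin boundaries at 2.5 m and 5 m are assumed to belong to the lower bin, and leaving all other classes unchanged is also an assumption.

```python
def integrate_height(lulc, height_m):
    """Refine one classified pixel using its nDSM height in metres."""
    if lulc in ("buildings", "paved_surfaces"):
        # Low pixels are treated as paved surface, raised pixels as buildings.
        return "paved_surfaces" if height_m <= 0.5 else "buildings"
    if lulc == "bare_soil":
        # Raised "bare soil" pixels are treated as buildings.
        return "bare_soil" if height_m <= 0.5 else "buildings"
    if lulc == "urban_vegetation":
        # Vegetation is split by the height bins of Table 2
        # (upper bound of each bin assumed inclusive).
        if height_m <= 0.5:
            return "grasses_short_vegetation"
        if height_m <= 2.5:
            return "medium_shrub_bush"
        if height_m <= 5.0:
            return "medium_bush_tree"
        return "tall_trees"
    # Assumption: any other class is left unchanged.
    return lulc

def integrate_height_raster(lulc_raster, ndsm_raster):
    """Apply the rule cell by cell to two co-registered rasters (nested lists)."""
    return [
        [integrate_height(c, h) for c, h in zip(class_row, height_row)]
        for class_row, height_row in zip(lulc_raster, ndsm_raster)
    ]

# Hypothetical 2 x 2 example rasters.
classified = [["buildings", "bare_soil"], ["urban_vegetation", "paved_surfaces"]]
heights = [[0.3, 3.0], [6.2, 2.8]]
refined = integrate_height_raster(classified, heights)
```

Because every rule starts from the class assigned by the initial classification, any error in that classification carries through to the height-refined product.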

Accuracy
An accuracy assessment was carried out to determine how accurately the classified products represented the actual land cover on the ground. From the classified images, stratified random sampling was employed to generate well-distributed validation points proportionate to each LULC class represented. Comparable to the ground truth verification method, where random GPS points are visited in-field to verify the classification accuracy, this validation method attributed each point from the classified images and manually verified each against the true land cover at that location as interpreted on the digital 10 cm resolution colour ortho-photos. A total of 80 points were generated using the Create Random Points tool in ArcGIS. The number of verification points per LULC class for classifications with and without height information is shown in Table 3. These points were allocated to each class according to the percentage area covered by that particular class. For instance, the waterbodies class would have fewer points compared to the grasses/short vegetation class because of their different area coverages. For the classification with the integration of height information, the waterbodies class was not included in the verification because waterbodies are commonly artificially flattened on elevation models; therefore the nDSM will often give inconsistent height values for water. Furthermore, since the core focus of this study was to map heterogeneous urban land cover, the open woodland class was also not included in the 'with height' verification because it is a natural vegetation class which made no contribution to achieving the main objective of this study. At the end of the verification process, four error matrices were prepared to summarise the classification accuracy.
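The area-proportional allocation of the 80 verification points can be sketched as follows. The class areas are hypothetical values in arbitrary units, and the largest-remainder rounding is an assumption made so that the counts sum to exactly 80; the study does not state how fractional points were rounded.

```python
def allocate_points(class_areas, total_points=80):
    """Split total_points across classes in proportion to their area."""
    total_area = sum(class_areas.values())
    quotas = {c: total_points * a / total_area for c, a in class_areas.items()}
    # Start from the whole-number part of each quota.
    counts = {c: int(q) for c, q in quotas.items()}
    leftover = total_points - sum(counts.values())
    # Give the remaining points to the classes with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1
    return counts

# Hypothetical class areas (not values from the study).
areas = {
    "grasses_short_vegetation": 40,
    "buildings": 30,
    "paved_surfaces": 20,
    "bare_soil": 7,
    "waterbodies": 3,
}
points = allocate_points(areas)
```

With these illustrative areas the small waterbodies class receives far fewer points than the grasses/short vegetation class, which mirrors the behaviour described in the text.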

Results and Discussion
Two classifications were carried out for the pixel-based classification approach; one without height and one with the integration of height information (Figure 3A and 3B, respectively). From the graphical representation of the ML classification without height integration (Figure 3A), it can be observed that the classifier was able to distinguish between the eight LULC classes. Although these are more defined in the classification with height integration (Figure 3B), the high spatial resolution alone was still capable of separating the desired LULC classes. In Figure 3B, however, there is a clear distinction between buildings and paved surfaces. The integration of height also allowed for the separation of different vegetation heights. The spectral complexities of the urban landscape often limit the ability of the pixel-based method to separate LULC classes (Townshend et al., 2000): part of the signal that is assumed to be coming from a given pixel may, in fact, be coming from surrounding terrain pixels, and this is often overlooked in pixel-based classification. This was the case in this study, where roofs in the informal settlements of Soshanguve were classified as roads due to the similarities in the spectral signatures of the two urban LULC classes.
Two classifications were also carried out for the segmentation-based approach (Figure 4A and 4B). This classifier used both spatial and spectral information to distinguish between various LULC classes. Although careful attention was paid to the training of pixels, the issue of pixel confusion due to the spatial resolution of the imagery could not be totally eliminated in the early stages of pixel training. This can be seen in the results of the classification without height integration (Figure 4A), where many of the paved surfaces in the study area were classified as buildings. However, the integration of height information allowed for a better separation between these LULC classes. Overall, the segmentation-based approach with height integration produced a more meaningful graphical representation of the overall LULC class distribution across the study area (Figure 4B). In this classification, the buildings and roads were visually better defined than in the classification without height integration.
Four error matrices were created to determine the accuracy of the four classification routines (Tables 4, 5, 6 and 7). The overall results indicated that the segmentation-based classification approach outperformed the pixel-based approach in the identification of primary and secondary LULC classes without height integration: the pixel-based approach had an overall accuracy of 57.5 % (Table 4) and the segmentation-based approach had an overall accuracy of 67.5 % (Table 5). These results correspond to those of Myint et al. (2011), who compared the extraction of urban land cover using per-pixel and object-oriented methods of classification. They reported that the object-based classifier obtained an accuracy of 90.40 %, whereas the per-pixel method using ML had an overall accuracy of 67.60 %. Integrating height information in both classifiers allowed for an improved overall accuracy, particularly in separating buildings from paved surfaces that had similar spectral signatures.
Furthermore, the confusion between bare soil and clay rooftops was minimised by adding height metrics to separate these two LULC classes. In the vegetated areas, the integration of height data allowed for separation between short, medium and tall vegetation in the urban vegetation superclass. The integration of height information significantly improved the overall accuracies of both classifiers, with pixel-based increasing to 73.8 % (Table 6) and segmentation-based increasing to 78.8 % (Table 7). Furthermore, it can be observed that adding the vertical dimension significantly improved the identification of existing buildings. The Buildings class had a producer's accuracy of 84.2 % and a user's accuracy of 94.1 %. This result indicated that 84.2 % of the reference sites that are buildings were identified as such, and that 94.1 % of the sites classified as buildings are indeed buildings, according to what could be interpreted as reality on the reference material. Although the overall accuracies of the two classifiers were satisfactory, they were below the minimum standard of 85 % stipulated by the United States Geological Survey (USGS) general classification scheme (Anderson et al., 1976). The pixel-based classification without the integration of height data had an overall Kappa of 0.49 (Table 4), which means that there was a fair agreement between the classification and the verification data.
The segmentation-based classification without height, the pixel-based classification with height and the segmentation-based classification with height had Kappa values of 0.62 (Table 5), 0.68 (Table 6) and 0.74 (Table 7), respectively. These values are between 0.61 and 0.80, which means that there was a substantial agreement between the classification and verification data according to Cohen (1960). From the results obtained, a total of eight LULC classes were obtained from the single SPOT 6 subset. One additional class was obtained from the integration of height data, which led to nine primary and secondary classes being recorded in those two cases.
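The measures reported above (overall accuracy, producer's and user's accuracy, and Kappa) can all be derived from an error matrix, as in the minimal sketch below. The 3 x 3 matrix is hypothetical and is not one of Tables 4 to 7. Its first class is chosen so that 16 of 19 reference sites and 16 of 17 classified sites agree, which reproduces the 84.2 % and 94.1 % quoted for the Buildings class; all other counts are invented. Rows are assumed to hold the classified labels and columns the reference labels.

```python
def accuracy_metrics(matrix):
    """Compute accuracy measures from a square error matrix.

    matrix[i][j] = number of points classified as class i whose reference class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]  # points per classified class
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # points per reference class
    correct = sum(matrix[i][i] for i in range(n))

    overall = correct / total
    # Producer's accuracy: share of reference sites of a class that were classified correctly.
    producers = [matrix[j][j] / col_totals[j] for j in range(n)]
    # User's accuracy: share of sites classified as a class that truly belong to it.
    users = [matrix[i][i] / row_totals[i] for i in range(n)]
    # Cohen's Kappa: agreement corrected for the agreement expected by chance.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, producers, users, kappa

# Hypothetical error matrix (rows: classified, columns: reference).
example = [
    [16, 1, 0],
    [3, 20, 2],
    [0, 4, 14],
]
overall, producers, users, kappa = accuracy_metrics(example)
```

For the first class of this example, the producer's accuracy is 16/19 and the user's accuracy is 16/17, which shows why the two measures answer different questions: the former is computed against the reference totals and the latter against the classified totals.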

Conclusion
The aim of this study was to use a comparison of pixel-based versus segmentation-based classifiers to classify imagery into urban vegetation (grasses, shrubs/bushes and trees), bare soil, waterbodies and built-up (buildings and paved surfaces) LULC classes over Soshanguve Township, Tshwane. Maximum Likelihood classification was used for the pixel-based classification and object-based feature extraction was used for the segmentation-based classification. Height data obtained from a qualified nDSM was then integrated into the two classifications to assess whether it would improve the classification accuracy. The overall results showed that the segmentation-based classifier outperformed the pixel-based supervised ML classification, and that integrating height information significantly improved the classification results for both classifiers. Height data also played a significant role in separating the normally confused classes, such as exposed soils and certain building rooftops, which have very similar spectra (low separability).
The classification results obtained from this single multispectral SPOT 6 image were satisfactory and can be used further in time series analysis and change detection, which would also take full advantage of the high temporal resolution offered by modern satellite constellations. For future studies, it is suggested that integrating height earlier in the workflow could prove advantageous. In that way, the differentiation of height for the various LULC classes does not depend on the initial classification; in this study, the accuracy of the post-classification height integration method depended solely on how accurate the initial classification was. Therefore, it is recommended that further studies use classification algorithms that will classify the height data with high confidence. Furthermore, future studies can investigate the inclusion of all four bands (red, green, blue and NIR) in the classification to establish whether this could further improve the classification results, as achieved with proven (or new) vegetation and soil indices on their own merits in the past.

Figure 1. Location of the study area in the City of Tshwane, Gauteng Province

Two classifications were executed for the pixel-based approach; one with and one without the integration of height information. For the classification without height integration, a maximum of ten training sites was selected to represent each LULC class. A total of 80 training sites were selected for this classification. According to Congalton (1991), the larger the number of training sites, the more accurate the spectral signature of each LULC class becomes. The signatures obtained from these training sites were then used in the supervised ML classification. This classifier systematically divided the study area into eight classes based on the spectral signatures of the selected training sites. The classification output was then used to obtain the second classification: ML classification with height integration.
The semi-automated feature extraction tool from ERDAS IMAGINE Objective (2015 version) was used for the segmentation-based classification approach. Similar to the pixel-based approach, two classifications were executed for this classifier; one with and one without the integration of height information. A feature model consisting of seven sequenced process nodes (Figure 2) formed the basis for LULC extraction without height integration in the segmentation-based method. 'Raster Pixel Processor' (RPP) was the first node of the feature model. Just like in the ML classification, training sites had to be defined for the eight LULC classes in this step. The trained pixels were then used in the pixel-based 'Single Feature Probability' (SFP) classification to create a pixel probability layer in which each pixel value represented the probability of that pixel being part of an object of interest.

Figure 2. Segmentation-based classification steps in ERDAS IMAGINE Objective

Figure 3. Pixel-based classification results; A) without height, B) with height
Pixel-based and segmentation-based classifiers were used in the study to classify the imagery into bare soil; urban vegetation (grasses, shrubs/bushes and trees); natural vegetation (open woodland); waterbodies; and built-up areas (buildings and paved surfaces), as listed and described in Table 1. Although the superclass 'Urban Vegetation' was gazetted as such in the 2016 South African Land Cover Classes and Definitions document, for the purpose of this study this class has been subdivided into three subclasses (grasses, trees and shrubs/bushes). Height classes were then integrated into the classification in order to assess their effect on the classification products.

Table 3. Number of verification points per LULC class

Table 4. Classification error matrix for pixel-based without height

Table 5. Classification error matrix for segmentation-based without height

Table 6. Classification error matrix for pixel-based with height

Table 7. Classification error matrix for segmentation-based with height