Building extraction for 3D city modelling using airborne laser scanning data and high-resolution aerial photo

Light detection and ranging (LiDAR) technology has become a standard tool for three-dimensional mapping because it offers rapid data acquisition with an unprecedented level of accuracy. This study presents an approach to accurately extract and model buildings in three-dimensional space from airborne laser scanning data acquired over Universiti Putra Malaysia in 2015. First, the point cloud was classified into ground and non-ground xyz points. The ground points were used to generate a digital terrain model (DTM), while a digital surface model (DSM) was produced from the entire point cloud. From the DSM and DTM, we obtained a normalised DSM (nDSM) representing the height of features above the terrain surface. Thereafter, the DSM, DTM, nDSM, laser intensity image and orthophoto were combined into a single data file by layer stacking. The integrated data were then segmented into image objects using object-based image analysis (OBIA), and the resulting image objects were classified into four land cover classes: building, road, vegetation and water body. Assessment of the classification produced an overall accuracy of 94.02% and a Kappa coefficient of 0.88. The building footprints extracted from the building class were then further processed to generate a 3D model. The model provides a 3D visual perception of the spatial pattern of the buildings, which is useful for simulating disaster scenarios for emergency management.


Introduction
From visualization to goal-oriented functional solutions, the need for three-dimensional (3D) building geometry has continued to grow over the last three decades. As a result, 3D city modelling has been a subject of research interest to the geographic information system (GIS) and remote sensing community for a range of applications such as urban planning, 3D cadastre, utilities and telecommunication facility management, architecture, safety and marketing, using different approaches and data sources (Biljecki et al. 2015). The complexity of environmental challenges in the face of increasing rural-urban migration, and its consequences for urban development, climate change and land use, demands proper planning through 3D map updating. 3D models enable identification of high-risk urban zones by adding physical parameters to the topography, socioeconomic variables and hazard indicators used for the initial assessment of an emergency situation. Spatial information on buildings can be obtained from several sources, including land surveying and airborne and space-borne platforms (Cheng et al. 2011; Sampath & Shan 2007); however, the accuracy varies accordingly, which explains why building extraction has remained a subject of intensive research over the years. The traditional land surveying method of detailing building footprints is time and labour intensive (Nagai et al. 2008). Satellite imagery provides an excellent source from which building footprints can be derived over wide coverage; however, small to medium scale geospatial enterprises may find the cost of high-resolution satellite imagery prohibitive for their projects. Several studies have been conducted to automatically extract buildings from satellite images; nevertheless, the limitation of the data to two-dimensional (2D) space hampers the ability to use them for 3D modelling (Lee et al. 2003; Shufelt & Mckeown 1993; Zhang et al. 2006). Advances in photogrammetric engineering and software development enable the generation of elevation data of terrain features from aerial photos taken with aircraft or unmanned aerial vehicles (UAVs). The photogrammetric method has the benefits of medium to large area coverage, flexibility in terms of time and weather, and fast processing, but the 3D data generated are of low vertical accuracy (Mitchell & Macnabb 2010).
Today, LiDAR has become a standard geospatial data source for accurate 3D modelling. Laser scanners (airborne and terrestrial) provide precise xyz points that represent the 3D geometry of the surface imaged. In addition to the xyz points, the scanning laser sensor records grey-level intensity images that show the strength of the returned laser pulse reflected from the object (Liang et al. 2016). LiDAR and its derivatives such as the digital terrain model (DTM), digital surface model (DSM) and normalised DSM (nDSM) have been widely used in 3D application domains (Turker & Koc-San 2015; Yu et al. 2010; Rottensteiner & Jansa 2002). Similarly, several algorithms have been used to aid the extraction of building footprints from laser scanning data (Zhang et al. 2006; Yan et al. 2015; Gilani et al. 2016). One of the challenges with using point data alone for building extraction is point density (Sampath & Shan 2007). Point data is itself a discrete representation that does not offer a consistent depiction of building edges. The effect of surrounding noise on building footprints worsens where tall trees form a canopy that extends into the roof area. To solve this problem, focus has shifted to improving the accuracy of building extraction by combining high-resolution 2D imagery and laser scanning data (Rottensteiner & Jansa 2002; Tomljenovic et al. 2016).
Deciding the best method to extract buildings from the diverse datasets available is a difficult task. Buildings, especially at the rooftop, are heterogeneous in terms of geometry, material types, colours, chemical properties, and even climatic setting. This makes it difficult to have a single approach applicable to all situations. Moreover, point density plays a major role when using laser scanning data for building extraction. Low point density produces irregular boundaries for linear features, but the major issue with integrating high-density points and images is that it leads to confusion between classes during the classification process, even with object-based classification methods (Blaschke 2010; Blaschke et al. 2014). In this study, we therefore improved building edge extraction by integrating four airborne laser scanning (ALS) derived datasets and a high-resolution orthophoto using object-based image analysis (OBIA) techniques, and created accurate 3D models of buildings for visualization and spatial planning.

Study area and data
This study was conducted over the Faculty of Engineering, Universiti Putra Malaysia. Geographically, the faculty is located in the north of the campus, between Latitude 3° 00' 11.95" N and 3° 00' 32.72" N and Longitude 101° 43' 06.41" E and 101° 43' 23.64" E (Figure 1). The study area was selected because it is accessible and represents a typical dense urban landscape with a mix of low and high-rise buildings, sparse vegetation and open water (lake). The LiDAR data was collected in 2015 by Ground Data Solution Bhd over Universiti Putra Malaysia using a Riegl scanner aboard an EC-120 helicopter flown at an average altitude of 600 m above the terrain surface. The acquired point cloud has an average point density of 6 points per square metre, with a vertical accuracy of 15 cm on non-vegetated terrain and a horizontal accuracy of 25 cm.
In addition to the xyz data, the scanner also records the intensity of all the pulses of light that bounce off the target and stores them as a grey-scale image. Hence, intensity images are made of pixels representing the energy of the laser pulses returning to the system (Hinks et al. 2015).
Concurrently, RGB colour images of the scanned area were acquired using a Canon EOS 5D Mark III camera with a focal length of 35 mm mounted on the aircraft. The camera has a horizontal and vertical resolution of 72 dpi each and an exposure time of 1/2500 s.

Methodology
Three primary datasets (xyz point cloud, laser intensity image and aerial photo) acquired over the study area, together with their derivatives, were used in this study. The initial task involved generating the surface and terrain models from the ALS data and deriving other datasets from the intensity image and aerial photo. Subsequently, all the primary data and their derivatives were combined into a single image file, each representing a layer, for classification. Lastly, the buildings were extracted and modelled in 3D space. The overall methodological workflow is shown in Figure 2.
Figure 2. Overall data processing workflow

Data processing
Data processing started with sub-setting the point cloud belonging to the selected site, which was subsequently filtered using a curvature filter, a slope-of-the-slope analysis (ESRI 2016), to obtain the terrain points. Laser returns are recorded from any target struck, ground and non-ground alike. To derive the required digital elevation model, only the returns from bare earth are needed, and hence the data must be separated by filtering (Zhao et al. 2008). The two sets of points (terrain and surface points) were further processed to generate the DTM and DSM respectively using the inverse distance weighting (IDW) interpolator. The DTM represents the digital model of the bare earth's surface, while the DSM depicts elevation information of the land cover including the terrain and surface features (Bater & Coops 2009; Yu et al. 2010). Subtracting the former from the latter produces the nDSM (Figure 3b), which represents the height of features above the ground surface (Yu et al. 2010). The nDSM is expressed mathematically as:
$$\mathrm{nDSM} = \mathrm{DSM} - \mathrm{DTM} \qquad (1)$$
These derived datasets (Figure 3) and the aerial photo (Figure 4) were layer stacked into a single image file in which each dataset represents an image band for further analysis.
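The interpolation, subtraction and layer-stacking steps described above can be illustrated with a minimal, pure-Python sketch. It is not the GIS workflow used in this study: the function names (`idw`, `grid`, `ndsm`, `layer_stack`), the IDW power of 2, the use of all samples without a search radius, and the toy coordinates are assumptions made only for illustration.

```python
import math

def idw(points, x, y, power=2.0):
    """Inverse distance weighted elevation at (x, y) from (px, py, pz) samples.
    Assumption: every sample contributes (no search radius), power defaults to 2."""
    num = den = 0.0
    for px, py, pz in points:
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return pz  # query node coincides with a sample point
        w = 1.0 / d ** power
        num += w * pz
        den += w
    return num / den

def grid(points, xs, ys):
    """Interpolate samples onto the grid nodes given by xs (columns) and ys (rows)."""
    return [[idw(points, x, y) for x in xs] for y in ys]

def ndsm(dsm, dtm):
    """Equation (1): nDSM = DSM - DTM, evaluated cell by cell."""
    return [[s - t for s, t in zip(row_s, row_t)] for row_s, row_t in zip(dsm, dtm)]

def layer_stack(*layers):
    """Combine co-registered rasters so that each cell holds one value per band."""
    rows, cols = len(layers[0]), len(layers[0][0])
    return [[tuple(layer[r][c] for layer in layers) for c in range(cols)]
            for r in range(rows)]

# Toy data (hypothetical): four ground returns at 40 m and one roof return at 50 m.
ground = [(0, 0, 40.0), (2, 0, 40.0), (0, 2, 40.0), (2, 2, 40.0)]
all_returns = ground + [(1, 1, 50.0)]
xs = ys = [0, 1, 2]

dtm = grid(ground, xs, ys)        # terrain model from ground points only
dsm = grid(all_returns, xs, ys)   # surface model from the entire point cloud
heights = ndsm(dsm, dtm)          # height above terrain
stacked = layer_stack(dsm, dtm, heights)
```

In this toy grid the centre cell carries the roof return, so its nDSM value is the roof height above the interpolated terrain, while cells that coincide with ground returns have an nDSM of zero.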

Classification process
There is consensus among the remote sensing community that land cover information can be extracted with better accuracy using data from multiple sources. This concept is called data fusion (Gilani et al. 2016; Li et al. 2013; Awrangjeb, Mohammad Zhangb & Clive 2013; Awrangjeb et al. 2010; Hermosilla et al. 2011; Blaschke 2010; Blaschke 2013). A recent study by Gibril et al. (2016) highlights that layer stacking data from different sources for the classification process preserves the spatial and spectral information in the individual bands and therefore increases the accuracy of the features extracted with object-based image analysis (OBIA). The literature now contains many reports of the efficiency of OBIA, which has earned wide acceptance in the field of remote sensing as a preferred technique for accurate object recognition, scene classification and information retrieval (Blaschke 2010). As opposed to the pixel-based approach, which utilizes only the spectral information in each pixel for information extraction, OBIA uses information from a set of similar pixels assumed to belong to the same object by exploiting properties that include colour, size, texture, shape and contextual information (Demers et al. 2015).
The primary goal is to detect all buildings with the minimum number of segments possible. The stacked image file was therefore input into the segmentation process as the basis for feature extraction. Ideally, since the aim is to extract buildings, each segment should correspond to a building rooftop. However, due to differences in the composition of the roof surface material, the roof of a single building may be divided into several segments. Obtaining optimised segments therefore depends on the correct selection of the algorithm and segmentation parameter values. This process was executed in ENVI 5.3 using the edge segment algorithm (Mavrantza & Argialas 2008) with a scale level of 52, a merge level of 97 and a texture kernel of 3, employing the full lambda schedule. The edge segment algorithm generally detects objects with distinct boundaries using Sobel edge detection (Mavrantza & Argialas 2008). The scale and merge levels determine the size and shape of the segments. In the case of over-segmentation, the full lambda schedule merges small segments with larger ones. The segmentation process partitions the image into unclassified image objects; thus, classification is required to extract the features of interest.
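To make the Sobel step concrete, the sketch below computes the Sobel gradient magnitude of a small raster in pure Python. It only illustrates the edge operator itself; it does not reproduce ENVI's segmentation, scale and merge levels, or the full lambda schedule. The function name `sobel_magnitude`, the zeroed border cells and the toy height grid are assumptions made for illustration.

```python
def sobel_magnitude(img):
    """Gradient magnitude of a 2D grid (list of rows) using the 3x3 Sobel kernels.
    Assumption: border cells are left at 0.0 rather than padded."""
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0.0
            for i in range(3):
                for j in range(3):
                    v = img[r + i - 1][c + j - 1]
                    gx += kx[i][j] * v
                    gy += ky[i][j] * v
            out[r][c] = (gx * gx + gy * gy) ** 0.5
    return out

# Toy height band (hypothetical): ground at 0 m on the left, a 10 m roof on the right.
band = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_magnitude(band)
```

The response is non-zero only in the two columns that straddle the height step, which is the kind of distinct boundary an edge-based segmentation relies on; flat ground and flat roof cells give zero.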
The image was classified into four classes: buildings, roads, vegetation, and water bodies. The classes were defined using ground truth information collected prior to the data processing. Sample segments used as ground truth data (reference data) for training and accuracy assessment were selected with the aid of the high-resolution orthophotos and guided by general knowledge of the site. The reference data was divided into two parts, 70% for image classification and 30% for quality evaluation. Care was taken to ensure that the training samples selected for any particular class varied across the representative objects of the class in order to capture the different attributes of that class. This was particularly important for the vegetation class, since both trees and low vegetation are classified as one class. After selection of the training dataset, the segmented image was classified using a support vector machine (Haitao et al. 2007). The support vector machine (SVM) is a classification technique based on Vapnik-Chervonenkis dimension theory and the Structural Risk Minimization (SRM) rule. Several researchers have reported that SVM performs as well as or better than competing methods (Turker & Koc-San 2015; Haitao et al. 2007; Christopher 1998). SVM separates classes with a hyperplane that maximises the margin between the respective classes, and this can be performed for non-linear and high-dimensional problems (Haitao et al. 2007). Classification accuracy was evaluated using a confusion matrix (Lee et al. 2003). The confusion matrix compares the ground truth data with the classified results to determine the errors of omission and commission, which are summarised as percentages alongside the overall accuracy.
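A 70/30 division of reference samples can be sketched as follows. The text does not state whether the split was made per class, so the per-class (stratified) split here is an assumption, as are the function name `stratified_split`, the fixed random seed and the made-up segment identifiers.

```python
import random

def stratified_split(samples, train_fraction=0.7, seed=0):
    """Split (segment_id, class_label) pairs into training and evaluation sets.
    Assumption: the split is applied within each class so every class keeps
    roughly the same 70/30 proportion."""
    rng = random.Random(seed)
    by_class = {}
    for seg_id, label in samples:
        by_class.setdefault(label, []).append((seg_id, label))
    train, evaluation = [], []
    for label in sorted(by_class):
        items = by_class[label]
        rng.shuffle(items)
        k = round(train_fraction * len(items))
        train.extend(items[:k])
        evaluation.extend(items[k:])
    return train, evaluation

# Hypothetical reference segments: ten per class, identifiers are illustrative only.
classes = ["building", "road", "vegetation", "water"]
reference = [(f"{name}_{i}", name) for name in classes for i in range(10)]
train_set, eval_set = stratified_split(reference)
```

With ten reference segments per class, each class contributes seven segments to training and three to evaluation.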
For improved feature extraction, the classification result was subjected to post-classification editing to homogenize small, irrelevant and unclassified regions. This was achieved using three post-processing algorithms: sieving, clumping, and aggregation (Tomljenovic et al. 2016). Sieving locates isolated classified pixels using blob grouping and labels them as unclassified. Clumping was therefore necessary to merge the resulting unclassified pixels into the surrounding classified areas using morphological operators. Finally, aggregation merges small class regions into larger adjacent ones, effectively cleaning up the classification result.
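The sieving step can be illustrated with a small connected-component sketch in pure Python. This is not ENVI's implementation: the function name `sieve`, the 4-connected neighbourhood, the `min_size` threshold and the class codes in the toy grid are all assumptions chosen for illustration.

```python
def sieve(classes, min_size, unclassified=0):
    """Relabel 4-connected regions of one class smaller than min_size as unclassified."""
    rows, cols = len(classes), len(classes[0])
    out = [row[:] for row in classes]
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0]:
                continue
            label = classes[r0][c0]
            seen[r0][c0] = True
            region, stack = [], [(r0, c0)]
            while stack:  # flood fill over neighbours that share the same class
                r, c = stack.pop()
                region.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc] and classes[nr][nc] == label):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            if label != unclassified and len(region) < min_size:
                for r, c in region:
                    out[r][c] = unclassified
    return out

# Toy classified grid (hypothetical codes): 2 = road, 1 = building.
# The single building cell mimics a car picked up as a building on a road.
classified = [
    [2, 2, 2, 2],
    [2, 1, 2, 2],
    [2, 2, 2, 2],
]
sieved = sieve(classified, min_size=2)
```

After sieving, the isolated cell is unclassified and the road region is unchanged; a subsequent clumping or aggregation step would then absorb the unclassified cell into its surroundings.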

Building extraction and 3D modelling
One of the advantages of OBIA is that the output is vector data that can easily be transferred to any GIS software for further analysis (Haque et al. 2016). The classified image objects were exported to ArcGIS 10.2 to extract building footprints. Using a simple attribute query, the building class was separated as a single vector layer, followed by a clean-up operation in preparation for 3D reconstruction. Before any editing was done, the percentage of detection was evaluated using completeness and correctness analysis (Xiao et al. 2012). One of the main tasks was to isolate buildings that have different heights but are represented by a single feature. These polygons were manually split into separate entities based on the nDSM so that each building section carries a single height, taken as the average height value within its polygon. Cars mistakenly classified as buildings because of the height consideration were also manually deleted. Furthermore, an edge smoothing operation was carried out to straighten jagged edges caused by tall trees covering some roof sections. Once this was completed, the heights associated with the polygons were automatically determined from the nDSM using the average height of the polygon area. The height values were subsequently used to generate the building block model (Idrees et al. 2013) that gives the desired 3D visualization of the buildings.
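Deriving one height per footprint from the nDSM amounts to a zonal mean, which the sketch below illustrates on rasterised footprints. It is a simplified stand-in for the GIS operation used in this study; the function names, the raster representation of the polygons and the toy values are assumptions. The completeness and correctness helpers use the commonly cited definitions, TP / (TP + FN) and TP / (TP + FP), which are assumed here to match the analysis referenced above.

```python
def mean_height_per_building(building_ids, ndsm, background=0):
    """Average nDSM value inside each footprint.
    building_ids: grid of integer footprint ids (background = no building)."""
    sums, counts = {}, {}
    for id_row, h_row in zip(building_ids, ndsm):
        for bid, h in zip(id_row, h_row):
            if bid == background:
                continue
            sums[bid] = sums.get(bid, 0.0) + h
            counts[bid] = counts.get(bid, 0) + 1
    return {bid: sums[bid] / counts[bid] for bid in sums}

def completeness(tp, fn):
    """Share of reference buildings that were detected."""
    return tp / (tp + fn)

def correctness(tp, fp):
    """Share of detected buildings that are real buildings."""
    return tp / (tp + fp)

# Toy rasters (hypothetical): two footprints and their heights above ground in metres.
footprints = [
    [1, 1, 0],
    [1, 1, 2],
    [0, 2, 2],
]
height_band = [
    [10.0, 12.0, 0.2],
    [11.0, 11.0, 6.0],
    [0.1, 6.0, 6.0],
]
block_heights = mean_height_per_building(footprints, height_band)
```

Each footprint receives a single extrusion height, which is what a block model needs; variation inside a footprint, such as a pitched roof, is averaged away.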

Classification result
The classification process targeted four land cover classes: buildings, vegetation, water bodies and roads (Figure 5). The vegetation class has the highest coverage, representing about 54.97% of the entire study area, while the building and road classes, considered impervious surfaces, together constitute 40.14%. The lake within the faculty occupies 4.18% of the land area. It can be observed that the usual noisy appearance associated with pixel-based classification results is not present. OBIA allows extraction of features in their natural setting with discrete boundaries for the respective classes. The vector output eases building extraction for 3D reconstruction. In Figure 5, isolated features, particularly within the paved surfaces, can be seen misclassified as building. These are cars wrongly identified as objects above the surface based on the height components of the DEM and nDSM. In addition, the colours of those vehicles produce surface reflectance that is confused with the spectral reflectance of roof materials. This was corrected using the post-classification tools mentioned earlier. Moreover, the classification result shows no shadow effect because the imaging angle is near vertical and the resolution of the image is high (Zhou et al. 2009).
For the accuracy assessment, 234 points widely spread among the classes were selected in the image. More pixels were chosen within the building class, as it is the main focus of this study. The confusion matrix (or error matrix) depicts the degree of agreement between the classified image and the ground truth data (reference). The diagonal cells show the number of correctly classified pixels, while the off-diagonal cells show the errors, that is, the number of pixels not matching their land cover classes (Green & Congalton 2004). For example, from the detailed analysis of the individual errors (Table 1), it can be observed that 14 of the 165 pixels classified as building do not actually belong to the building class but to objects such as vehicles on the road, whose height is similar to that of low building structures, or to buildings whose rooftops have spectral characteristics similar to vegetation. No misclassification occurred in the other classes. A measure of the individual land cover class performance (Table 2) indicates that 100% class accuracy was obtained in all classes except the building class, which produced 91.52%. This result further demonstrates the advantage of the fusion process. Quantitative evaluation of the classification process yields an overall accuracy of 94.02% and a Kappa coefficient of 0.88 (Table 2). These accuracy indicators are good enough for the map to be used in decision making.
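The accuracy figures quoted above follow from simple ratios, and the sketch below shows how overall accuracy and the Kappa coefficient are computed from a confusion matrix. The function names are assumptions, and the 2 x 2 matrix is an illustrative toy, not Table 1; only the counts 234, 165 and 14 are taken from the text.

```python
def overall_accuracy(cm):
    """Correctly classified samples (diagonal) divided by all samples."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

def kappa(cm):
    """Cohen's Kappa: agreement beyond what row and column totals give by chance."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    observed = overall_accuracy(cm)
    expected = sum(
        sum(cm[i]) * sum(cm[r][i] for r in range(n)) for i in range(n)
    ) / total ** 2
    return (observed - expected) / (1 - expected)

# Figures reported in the text: 234 check points, 14 of them wrongly labelled
# as building, and 165 points classified as building in total.
reported_overall = round(100 * (234 - 14) / 234, 2)    # 94.02
reported_building = round(100 * (165 - 14) / 165, 2)   # 91.52

# Illustrative 2 x 2 matrix (hypothetical counts, NOT Table 1).
toy = [[45, 5],
       [10, 40]]
toy_accuracy = overall_accuracy(toy)   # 0.85
toy_kappa = kappa(toy)                 # approximately 0.70
```

The two reported percentages are reproduced exactly from the stated counts. The reported Kappa of 0.88 cannot be recomputed here because it depends on the full per-class counts in Table 1.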

Building footprints extraction and 3D modelling
According to Tomljenovic et al. (2016), a regular building outline is difficult to obtain from ALS data alone, particularly with a low-density point cloud. In this study, the use of additional information in the classification process improved the accuracy of building detection. However, some building outlines were still jagged and irregular (Figure 5), primarily due to incoherent point samples along the building edges. Outline irregularities are much more pronounced in areas where tree canopies cover building roofs. Nevertheless, adopting a combination of post-classification enhancement procedures (Cheng et al. 2011; Hermosilla et al. 2011; Tomljenovic et al. 2016) resulted in better footprints of the individual buildings (Figure 6). Sampath and Shan (2007) posit that the regularity of building boundaries is proportional to point spacing, with a precision of 18% to 21%. This study reveals that building outline horizontality increases slightly with the use of auxiliary data. Conversely, the curvilinearity along edges usually associated with building footprint extraction from LiDAR point data alone is significantly reduced (Sampath & Shan 2007).

Conclusions
Laser scanning is a state-of-the-art technology that provides precise xyz coordinates of the imaged surface for a variety of 3D applications. This study combines laser scanning derived datasets and an aerial photo to extract and model buildings in 3D geometry. Integration of different datasets allows extraction of building footprints with a high level of precision. Nevertheless, some manual editing is required to achieve better accuracy with respect to building edges. The use of intensity data as additional information is valuable, though it also introduces some noise along border lines, especially where two different land cover classes with varying elevation share a boundary. Overall, the extracted buildings are complete and of good enough quality to generate a 3D model.
The building block model represents the actual building heights but does not depict bevel-shaped roof facades. Nevertheless, the outcome demonstrates a reliable 3D model for visualization, development planning and disaster scenario modelling to aid emergency preparedness and management. The shortfall of this study is the limitation of the currently available free GIS software for complete 3D modelling that reflects the true roof facades and wall structure. Future study will explore interoperability with third-party packages for precise modelling of rooftops in their correct 3D representation, for 3D city and disaster modelling applications.

Figure 1. Location of Universiti Putra Malaysia in Peninsular Malaysia (right) and the Faculty of Engineering on a Google Earth image (left)

Figure 4. Land use / land cover classification map

Figure 6. Extracted building footprint superimposed on the orthophoto

Figure 7. 3D model of the buildings with vertical exaggeration of 1.25. Average water level of the lake is 40.6 m

Table 1. Comparison between ground truth and classified image (confusion matrix)