A multi-temporal phenology based classification approach for Crop Monitoring in Kenya

The SBAM (Satellite Based Agricultural Monitoring) project, funded by the Italian Space Agency aims at: developing a validated satellite imagery based method for estimating and updating the agricultural areas in the region of Central-Africa; implementing an automated process chain capable of providing periodical agricultural land cover maps of the area of interest and, possibly, an estimate of the crop yield. The project aims at filling the gap existing in the availability of high spatial resolution maps of the agricultural areas of Kenya. A high spatial resolution land cover map of Central-Eastern Africa including Kenya was compiled in the year 2000 in the framework of the Africover project using Landsat images acquired, mostly, in 1995. We investigated the use of phenological information in supporting the use of remotely sensed images for crop classification and monitoring based on Landsat 8 and, in the near future, Sentinel 2 imagery. Phenological information on crop condition was collected using time series of NDVI (Normalized Difference Vegetation Index) based on Landsat 8 images. Kenyan countryside is mainly characterized by a high number of fragmented small and medium size farmlands that dramatically increase the difficulty in classification; 30 m spatial resolution images are not enough for a proper classification of such areas. So, a pan-sharpening FIHS (Fast Intensity Hue Saturation) technique was implemented to increase image resolution from 30 m to 15 m. Ground test sites were selected, searching for agricultural vegetated areas from which phenological information was extracted. Therefore, the classification of agricultural areas is based on crop phenology, vegetation index behaviour retrieved from a time series of satellite images and on AEZ (Agro Ecological Zones) information made available by FAO (FAO, 1996) for the area of interest. This paper presents the results of the proposed classification procedure in comparison with land cover maps produced in the past years by other projects. The results refer to the Nakuru County and they were validated using field campaigns data. It showed a satisfactory overall accuracy of 92.66 % which is a significant improvement with respect to previous land cover maps.


Introduction
Climate change is caused by both natural processes and anthropogenic activities.These activities include industrialization, deforestation7reforestation activities, agriculture and energy production, waste management, urbanization and urban sprawl, change of land use, soil erosion, drought and desertification, and water degradation.The knowledge about the territory plays a key role in the management of territory and in the planning of many human activities.The relevance of land monitoring and mapping, that leads to quantifying changes in land cover, is widely recognized as a key element in the study of global changes (Hibbard et al. 2010;Herold 2011).Climate change will have a substantial impact on future food and crop production (Easterling et al., 2007), in turn affecting land use.Agriculture is the most important sector of many developing countries but long-term productivity is threatened by increasingly intensive soil use for cultivation and livestock production.
Agricultural land cover maps are critical for monitoring the current conditions and long-term changes of crop and pasture lands.
The UN (United Nations) FAO (Food and Agriculture Organization) Africover project (Di Gregorio, 2005) which was specifically dedicated to crop classification, generated the most recent and detailed land cover map for Central-Eastern African countries in the year 2000s.Since then, population growth and major policy changes have caused the increase in use of land for agropastoralism and systematic expansion of cropland area (Eckert et al., 2017, Kennedy et al., 2018).On the other hand, certain provinces of the region have been deeply affected by the loss of agriculture due to climatic change and human activities (Negussie et al., 2011, Roba & Oba, 2013, Nguru & Rono, 2013, Mganga et al., 2015, Balogh et al., 2016).Moreover the burning of grass and bush savannah results in loss of vegetation cover, that leads to increase of soil erosion and soil desiccation that can later be worsened by drought and the declining of groundwater levels (Malabuyoc et al., 1985).The need of updating the land cover maps of cropped areas arises from all these considerations.
One of the objectives of the SBAM project is to update the crop areas map of Kenya to better understand how much effective the land use change could be in causing productivity loss and food security alarm.Automated, low-cost remote sensing tools are well suited for continuously monitoring crop growth and providing farmers with timely information about crop.Vegetation Indices (VI) derived from satellite images are well correlated with the parameters that define crops status (Hatfield & Prueger, 2010, Panda et al., 2010), and for nearly four decades the Normalized Difference Vegetation Index (NDVI) has remained one of the most consistently and widely used across a wide variety of sampling spaceborne sensors since that provide us with plurality of information and historical archives.NDVI data at high temporal frequency have been widely used to track seasonal phenology of green-up and senescence over a large variety of ecosystems from space using NOAA's Advanced Very-High Resolution Radiometer (AVHRR) (Justice and Hiernaux, 1983;Justice and Townshead, 1985;Justice and Holben, 1986;Vrieling and de Beurs, 2011) and NASA's Moderate Resolution Imaging Spectrometer (MODIS) (Fisher, 2007;Hmimina, 2011).Thus, earth observation (EO) data are recognized as very important for monitoring the crops health status and productivity, and capable of providing the necessary information to food security and early warning systems.
The SBAM (Satellite Based Agricultural Monitoring) project, funded by the Italian Space Agency, is one of the initiatives resulting from the new Italian-Kenyan cooperation protocol.The project, proposed and managed by the Aerospace Engineering School of the University of Roma 'La Sapienza', has four main objectives: a) produce an updated map of the agricultural areas for Kenya based on Landsat8/OLI (Operational Land Imager) and Sentinel-2/MSI (Multi-Spectral Imager) imagery; b) develop an automatic monitoring system able to classify agricultural areas and detect land use changes; c) develop and deliver to the Kenyan partner of the project a system capable to download and process automatically Landsat8, Sentinel2 and MODIS images by providing standard products (vegetation indices, statistics, temporal analysis, etc.); d) provide a tool for assessing the agricultural changes in terms of stability and crop yield, and study the feasibility of a tool capable to forecast crop yields.
The first updated map of one of the main agricultural areas of Kenya was produced in November 2016.The classification process was carried out by implementing a decision tree (DT) algorithm on a pixel basis (Friedl et al., 1997, Fayyad et al., 1992, Li et al., 2015, Villa et al., 2015).The classification takes into account crop phenological characteristics, crop calendar and Agro-Ecological Zones (AEZ) defined by FAO.The procedure is described in Section 2. The agricultural areas classification scheme has been calibrated and validated through field campaigns carried out in August 2015 and October 2016 in the Kenyan Great Rift Valley with the support of University of Nairobi students and local authorities (Sugar Research Centre).The computed map has a 15-meter spatial resolution.Such a resolution is based on a pan-sharpening procedure which exploit the pan-chromatic channel of Landsat-8 (L8) imagery.The complete HW and SW system to download automatically freely available images (Landsat-8, Sentinel-2, Terra/MODIS and MSG/SEVIRI) was delivered to the University of Nairobi in March 2017 as per the SBAM project proposal.

Study area
The SBAM project focuses on developing crops classification maps for the whole Kenya.
However, the agricultural activities are restricted to a limited part of the country, in particular to the Kenya's Great Rift Valley.The study area currently under analysis is the Nakuru County located within the Western Highlands.Nakuru county covers most part of the Rift Valley and includes bordering escapements and plateaus.Livestock and agriculture lay by far the most important part in the economy of the county.The climate is cold and wet with a mean temperature of 10-15° C and annual average rainfall of about 1200-1400 mm.The area mainly falls in the agro-ecological zones UH (Upper Highlands zone, annual mean temperature, 10-15 C, occasional night frost ) and LH (Low Highlands zone, annual mean temperature, 15-18 C, annual mean minimum 8-11 C, normally no frost), and is an excellent county for sheep farming and to a lesser extent also good for dairy.It is mainly a Wheat/Maize Zone with the exception of the UH-2 zones because maize is affected by cold weather and frosts.We focus our attention on the classification of the agricultural species within the AEZ LH of the county.

Data
A total of 13 OLI Landsat-8 scenes (Path169/Row 60, listed in Table 1) level 1 (Fig. 1, right), terrain corrected and characterized by a cloud cover contamination lower than 30%, covering a time period from January to December 2015 were collected (L8 data courtesy of the U.S. Geological Survey, GloVis, 2005).
An automated pre-processing tool in IDL (Interactive Data Language) image analysis software, was used to extract the metadata, georeference each L8 OLI image and convert image Digital Numbers (DN) into radiance and Top of Atmosphere (TOA) reflectance values.The acquired scenes were mosaicked to create maps at a local and regional scale.The images were stacked together to organize a multi-temporal package.In this way, we were able to create OLI multi-temporal image mosaic at a 30-m spatial resolution.

Methods
The use of Landsat imagery for mapping the vegetation at species level within a heterogeneous environment could be a challenging task.For example, a vegetation classification at a species level in the Amanos Mountains (Southern Turkey), used Landsat images combined with the environmental variables and forest management maps to develop vegetation maps at a local scale (Domaç and Süzen, 2006).Another example is based on combined information of high resolution multi-spectral satellite imagery (ASTER and Landsat) and environmental factors (such as elevation, slope, aspect, topographic position, distance to the nearest stream, relative moisture index, and soils) derived from various geospatial layers in Harlan County, Kentucky, for Hemlock classification (Kong et al., 2008).
However, recent studies (e.g.Pena-Barragan, 2011;Zhong et al., 2012;Son et al., 2014)   The pre-processing and classification process workflow is shown in Fig. 2. Immediately after the pre-processing step, and just before the NDVI computation, an FHIS pan-sharpening technique was applied to the data.Then a multi-temporal layer was computed in order to retrieve the NDVI historical projection (which is the phenological profile), on a pixel basis, over the area of interest.Once the phenological profile is obtained we can extract information to properly characterize the phenological signature.
At the same time, decisional rules have been created to drive the classification process and assign a semantic characterization to each single phenological signature.Those rules were made to recognize the vegetation development through the season.Crop timing was specifically characterized for each single agro-ecological zone, assessing the agricultural potential and possibilities within the area of interest.Climatic, topographic data and soil type information are critical for agro-ecological zoning application, as well as the characterization of the typical vegetation for area under analysis.AEZ data needs to be coupled with crop calendars and it is known that, for the same crop species, can significantly differs from one agro-ecological zone to another.The agro-ecological zoning for the entire Rift Valley province was imported from the FAO Farm Management Handbook of Kenya Vol II, Part B ( Jaetzold & Schmidt, 1983).Reference profiles for the main crop species were imported from the FAO crop calendar (available at http://www.fao.org/agriculture/seed/cropcalendar/welcome.do) for each single agro-ecological zone.
The FAO crop calendar was updated using the knowledge of local farmers and experts from KALRO (Kenyan Agricultural and Livestock Research Organization), during the field campaigns carried out in August 2015 and October 2016.During the field campaigns ground truth data were collected to be used during the validation process.Validation sites for maize and wheat (both first and second season) was collected in the Nakuru county for the year 2015 and were used in validation process carried out by using a confusion matrix.

Pan-sharpening
A pan-sharpening technique was implemented to increase image spatial resolution from 30 m to 15 m; such techniques are well documented in literature for the ETM+ Landsat 7 (Molina et al., 2005) imagery, but the resolution enhancement of Landsat-8 OLI imagery intended for vegetation studies is quite challenging due to the lack of the near infrared band (NIR) within the OLI panchromatic channel.To overcome this issue the FIHS pan-sharpening technique was applied.
The spatial enhancement was obtained by using the equation developed by Johnson in 2014: where MShigh is the pan-sharpened multi-spectral band, MSlow is the original multispectral band resampled at the panchromatic image resolution using the Nearest Neighborhood method (in order to reduce the arbitrary assumptions and preserve as much as possible the original image information content) and I is an intensity image derived from the multi-spectral bands (Blue, Green and Red channels).The new image was intended to be a simulated high resolution panchromatic image (based on the low resolution multi-spectral image).Spatial resolution details were derived by subtracting the simulated panchromatic image from the original one then those details were directly added to the low resolution bands, in particular, Red and NIR (Johnson, 2014).
The FIHS is an additive technique.This means that introducing the new high-resolution Red/NIR bands into the NDVI computational process, will not alter or erase the parameters from which the resolution enhancement depends (as it could happen in case of multiplicative techniques).Moreover, the resolution enhancement strongly depends on the difference of Red-NIR bands reflectance and it increases with increase in difference value.Therefore, the FIHS algorithm was particularly suitable to be applied on vegetated areas where the reflectance difference between the Red and NIR bands is higher.

Phenological information extraction
Phenology is the study of periodical plants and animals cycle and their influential factors such as seasonal and inter-annual variations in climate as well as habitat factors like elevation (Demar & Rutishauser, 2011).Phenological data is little affected by cloud contamination and very useful in discriminating between different crop species, crop rotation cycles and biomass seasonal trends.The NDVI time series is stacked into a multi-layer pseudo image in which each single band relates to an NDVI specific date (Fig. 3).Ground reference sites have been selected to be representative of the country main crop types.This paved the way to information extraction: the crop seasonal evolution, the crop rotation schedule and the soil preservation technique were retrieved and identified for each test site.Starting from the land use category identification, the semantic classes were defined .For each vegetated species the following parameters were retrieved: a) the cultivation development characteristic, a temporal parameter that is related with phenological amplitude (Fig. 3); b) the maturation and health status peak, which were related with phenological intensity (Fig. 3); c) the onset of the vegetation development that defines the calendar date in which the start of season (SOS) takes place, and the onset of the senescence period, or the end of season (EOS), that is related to the calendar date for which the seasonal development will end (Fig. 3).d) both the growth and senescence rate, which are the slop of the phenological curve immediately before and after the SOS and EOS dates respectively, were taken into account.
All these parameters lead to the setup of decisional rules implemented within a decision tree algorithm capable of recognizing and grouping together similar crop species and then comparing the annual developmental cycle with the historical one, which could assess the crop reaction to stress conditions.

Classification
Traditional supervised and unsupervised classification approaches clearly show their limits when it comes to the crop species discrimination problem (Lu & Weng, 2007, Yichun et al., 2008).
Ultimately This is particularly true because of the high variability of the study area in terms of ground cover, which is directly related to the highly fragmented nature of the Kenyan landscape.The single date approach to the classification problem is very likely to result in multiple misclassification errors since the crop type discrimination only relies on vegetation spectral differences.Forest areas and evergreen cultivation are both characterized by high NDVI values and often represents another source of errors.
Agricultural areas show no vegetal cover during both crop senescence and soil preparation periods: but those periods also should be taken into account by the classification methodology.The coexistence of the same crop type at different developmental stage within a limited area, as well as inter-cropping practice, crop rotation and soil management techniques, they are all factors capable of introducing variability into the ground cover scenario; all of this contribute for the classification task to be more complex (Pena-Berragan et al. 2011).
The introduction of agricultural timing information into the classification process through multitemporal approach can support the crop discrimination process at species level.Geographic and climatic information can be introduced through the Agro-Ecological zoning in order to: 1) reduce the area involved into the classification process, and 2) calibrate the classification algorithm in accordance with the agricultural potential of each AEZ (Domac et al., 2006).A phenology-based classification process needs a priori definition of the semantic categories.This means that a previous study of the area of interest is necessary to make the classification properly run.
Decision trees were proposed for land cover classification in Pal and Mather (2003) due to their simplicity, interpretability, and fast computational model.
The decision tree was composed of a root node, which was related to the whole data set, a set of internal nodes, the splits, and a set of terminal nodes, the leaves.Algorithms for building a decision tree use the training data to split the original dataset into non-overlapping regions, which correspond to the terminal nodes (or leaves) of the tree.Each region was described by a set of rules, and these rules were used to assign a new observation to a particular region: thus, we could impose decisional rules besides the classification analysis based on a previous knowledge of the sample sites phenological characteristic within each single AEZ.It is known that decision trees show several advantages with respect to the more traditional classification algorithm: they can easily handle numeric and categorical inputs whatever is their distribution; they are not parametric and they can manage nonlinear relations between features and classes (Friedl and Brodley, 1997, Fayad et al., 1992, Li et al., 2015, Villa et al.).The whole data set was split into smaller groups by a binary recursively partitioning where the output of each step was a labelled class or another node and where each node is characterized by a phenology based decisional rule (Koziol & Wozniak, 2009).
Nakuru County was divided into several zones considering their different agro-ecological characteristics.Once a reference phenology was known, it was possible to select the salient features that significantly described the associated semantic classes.
A specific Multi-variate Decision Tree (MDT) implementing linear discriminant functions after the architecture strategy showed in Fig. 4, was developed for each AEZ.Tree models were built from training data for which the response values were known, and these models were subsequently used to classify response values for new data.The binary splitting tree (in which each parent node was split into two child nodes) construction process, was imposed by the topology of the events to be classified.
A Sequential Forward Selection (SFS) was performed: the SFS is a bottom up search method that starts with zero features and tries to add the feature that that will cause the largest increase of some partition-merit criterion (Brodlet & Utgoff, 1995).User driven decisional rules were generated from training sites through the analysis of crops temporal development within the related environmental scenario by applying the SFS strategy.At the very beginning of each MDT, a test was settled searching for vegetated areas.This first node opens the way to a series of exclusive tests, in order to: 1) recognize the phenological signature, if present, of the pixel under study and, 2) compare the pixel phenological signature just detected with a set of reference phenologies sampled from relevant crops.
Each rule is based on NDVI reference values (extracted from the training sites) to be reached within a specific period of the year.At the same time, the overall temporal crop development as well as the full maturation peak and the period over which this parameter is stable before the insurgence of the vegetation senescence, were considered to characterize each crop species.Another branch of the decision tree was in charge of revealing, on a pixel basis, phenological profiles related to an evergreen ground cover (associated with forested area, tree plantation, tea fields or isolated trees), or with vegetated areas that show a phenological development that cannot be related to agricultural areas (shrub land and pasture land tagged as 'No Agriculture' areas).A preliminary classification aiming to remove not-vegetated areas (that is, water basins, urban areas, etc.) was not necessary since each pixel that does not show a proper NDVI value (or a series) was automatically masked out.

Results
The classification tree algorithm, based on phenology information retrieved from a 12-month multi-temporal series of Landsat8 OLI images (2015), was used to characterize LULC in the Low Highlands (AEZ LH) in Nakuru County, Kenya.The result is shown in Fig. 5, where the final classification map was compared with the one provided in 2000 by the Africover project.Both first and second season's wheat, maize and evergreen areas (forest, tree plantation, tea fields or isolated tree) were classified; residual unclassified areas were characterized by a phenological profile that clearly identified them as agricultural class, at the same time, this significantly differed from the three major crop classes that we are taking into account into this study and for which validation sites on the ground are available.It is worth to note the significance of this classification details in comparison with others.In Fig.
5 two details taken from the same area were highlighted for both the classification products.The Africover project output map (Fig. 5a) shows a thematic classification in which each single polygon refers to a labeled code that give us information about the nature and the percentage extension of all the possible land cover (agricultural and not) that appears within the polygon boundaries.All these information do not actually provide the effective extension or the exact position and boundaries of the crop fields .The Africover shows a coarse classification at a field level (minimum polygons size of about 35 ha), not very useful in precision farming (sub-hectare scale) and agricultural monitoring activities.The updated classification map based on MDT procedure (Fig. 5b) shows crop fields boundaries, cultivations total extent and exact location.(Fig. 6) All the classified areas fall into Africover polygons for which a congruent land use tag has been indicated.
The reduction of misclassification errors that, for the case study, mainly occurs between classes characterized by higher NDVI values, leads to a significant improvement of the classification performance.As a matter of fact, if less than 20% of the Africover polygon area was covered by agricultural fields, this class would have been neglected as per Di Gregorio & Jansen, 2005.Therefore, in the case of a polygon covering an area of 35 ha and agricultural field of 7 ha would be neglected (omission error).In the present classification approach, the error (omission or commission) involves the single pixel corresponding to 0.022 ha.Ground truth data collected during the field campaign was used to validate the classification algorithm and the related confusion matrix is shown in Tab. 2. The high overall accuracy (92.66%) achieved demonstrated the high potential in distinguishing between major crop species with a relatively small multi-temporal set of images.The limited numbers of ground truth data can result in overestimating the goodness of the results achieved; nonetheless the classification algorithm has proved to be effective in discriminating different crop types, even if characterized by a very similar temporal development, as in the case of first season wheat and maize.
However, there is room for improvements: new decision rules could be implemented with the aim to widened the temporal window through which the crop calendar scheme is applied.The experience gained so far makes it possible to concentrate on land cover layers that might contain a specific land use class, not searching anymore a specific crop profile but proceeding step by step to identify major crop categories (cereals, vegetables and so on) and later easing the analysis and reducing processing time.

Discussion and conclusions
The SBAM (Satellite-Based Agricultural Monitoring) project focused on developing a validated satellite based method for estimating and updating the agricultural areas in the Central-Africa regions.
In this paper, a cropland classification study incorporating two different data, i.e. remotely sensed and ancillary data was presented.It investigated and analyzed the applicability of pixel-based classification approach for the particular study area of the Nakuru region in central Kenya, characterized by the very small size of crop fields, where usually agricultural is practiced across the country.The solution presented is based on the utilization of freely available data, i. e. Landsat-8 time series data in combination with Agro-Ecological maps that helped to adapt the classification procedure to the agricultural potential of the study area.Temporal resolution of Landsat-8 enabled the capturing of the misaligned phenological development of selected crop types.The classification method adopted in the study exploited spectral and temporal information to identify vegetated/agricultural areas and distinguish crop types by using their unique phenological characteristics.Image analysis based on Landsat-8 imagery, through MDT method has facilitated the achievement of an accurate crop fields classification in the study area.The method can be promoted for monitoring crops and land use.Results confirmed the advantage given by the use of pan-sharpened 15 m Landsat-8 data both in resolution and classification accuracy.The decision tree classification scheme showed promising results: decision tree classification based on NDVI multi-temporal dataset thresholding is rapid, flexible and computationally efficient.Using rule-based approach, it is possible to classify large areas can be classified in short period of time and the process is less sensible to errors in training data, on the other hand a large number of training data is required to reach high accuracy.
The rule-based approach is known for being intuitive and interpretable.Rules and conditions were derived from crops parameters and environmental descriptors; then, the rules were directly applied to the phenological based input dataset, which is representative of crops overall development and current conditions.
The improvement obtained by the suggested approach in the classification of the agricultural area, with respect to maps computed in the framework of previous projects (Africover, Globcover), has been shown.The achieved overall accuracy is considerably satisfying in the context of agricultural management applications.Expert-based tuning and assessment is required to calibrate the procedure and to widen the study area to country level.These preliminary results were encouraging and demonstrated the potential of phenological parameters obtained from time series of pan-sharpened OLI derived NDVI for agricultural land use classification.Further analysis are needed to 1) apply this approach in large areas and 2) to test different time span and different vegetation indices, 3) to combine Sentinel 2 data to generate consistent time series.

Figure 1 .
Figure 1.(on the left) Kenyan agricultural areas (colored in blue) and Nakuru county (highlighted in red); (on the right) Landsat8 tiles covering the agricultural areas of Kenya (blue tiles), Landsat8 tile covering the Nakuru county (red tile).
have shown that for an accurate crops classification both spectral (e.g.NDVI) and phenological information is needed.A further improvement in the discrimination of different crop species could be obtained by introducing a spatial resolution enhancement technique.The FIHS (Fast Intensity Hue Saturation) pan-sharpening technique (Johnson, 2014, Luciani et al., 2016) was used to increase Landsat 8 imagery resolution, from 30m to 15m, to overcome the limitation imposed by the relatively low resolution for the specific study objectives (crop classification and monitoring) in an area characterized by the small sizes of the agricultural fields.

Figure 2 .
Figure 2. Pre-processing and Classification process workflow
, supervised classifications generally attains greater accuracies because they are trained on 'user knowledge'.At the same time supervised classifiers are based entirely on the data they have seen during training: unseen data could result in hidden pattern.This limitation is particularly evident in the context of supervised crop classification.Phenological data can potentially help the classifiers perform better outside the available training dataset (Conţiu, & Groza, 2016).

Figure 4 .
Figure 4. Decision tree strategy (on the left), example of the DT applied to multi-temporal NDVI dataset thresholding in cropland classification (on the right).

Figure 5 .Figure 6 .
Figure 5. Nakuru county, AEZ LH -Comparison between the Africover classification map (5a; upper left) and a decision tree classification map for the year 2015 (5b; upper right); the same sample area has been expanded to highlight the different classification detail.

Table 1 .
List of Landsat-8 images used to obtain the results described in the paper

Table 2 .
Classification validation results