Comparative Assessment of Homogeneity Differences in Multi-Temporal NDVI Strata and the Currently Used Agricultural Area Frames in Rwanda

This study compared two methods used for agricultural statistics generation in Rwanda. The first method is area frame sampling, which is also the currently used method in Rwandan seasonal agricultural surveys; while the second method is the application of remote sensing technique using multi-temporal Normalised Difference Vegetation Index (NDVI) classes to stratify land into homogenous agriculture land classes. The analysis of the methodological flow of Rwanda area frames and the estimated homogeneity in the resulting frames was mainly based on literature review. For the delineation of homogeneous NDVI classes, the study used 10 years data from Moderate Resolution Imaging Spectroradiometer (MODIS) sensor (2004 – 2014). The NDVI data were classified using ISODATA clustering technique, and the focus was put on agriculture-dominated classes, obtained through the intersection with 2010 national land use and land cover data. Analysis of Variance (ANOVA) and Fisher’s Least Significant Difference (LSD) statistical methods were applied to investigate significant differences between and within NDVI classes and the currently used Rwanda strata in terms of area coverage of four (4) dominant crops in Rwanda – banana, maize, cassava, and beans. The results of the analysis revealed homogeneity of 85% within NDVI classes, and 69% within the current Rwanda strata, at p = 0.05. The NDVI classes were also used to improve the Rwanda strata, and the homogeneity has increased by 5%; reaching 74% after NDVI-based reclassification.


Introduction
Continuous growth of the world's population imposes countries to produce more food from limited land resources to ensure that "all people at all times have physical, social and economic access to sufficient, safe and nutritious food which meets their dietary needs and food preferences for an active and healthy life" (Committee on World Food Security, 2009). Agricultural statistics include all information contributing to agricultural development and food security in general; and are key to achieve food security goals (Boyko and Hill, 2009). In addition, agricultural statistics are required to support planning processes of a country; provide information for public policy analysis, debate and advice; observe agricultural sector performance in the country; monitor and evaluate the impact of policies and programmes; and enlighten decision-making processes (Kiregyera et al., 2007).
Moreover, agricultural statistics support the monitoring, evaluation, improvement, and strengthening of early warning systems regarding country's food security (FAO, 2016).
Finally, agricultural statistics contribute to ideas development and decision making of local people, private sector, and importantly, local farmers about their agricultural businesses. Maningas, Villagonzalo, and Macaraig (2004) indicated that agricultural information empowers farmers through having control over their resources and decision-making processes. Nevertheless, usefulness and high quality of the agricultural information depend on three key factors: accuracy -conforming exactly or almost exactly to the ground truth; reliability -showing all details about the reality they represent and errors they contain; and timeliness -availability on time of need (Cotter et al., 2010;Wigton and Bormann, 1978).
The generally used methods to generate agricultural statistics include census, registers, administrative data, and sampling frames (area frame sampling, list frame sampling and multiple frame sampling) (Benedetti et al., 2010;Eurostat, 2015;Kennel, 2008;Väisänen, 2009). However, remote sensing has recently come into use for land stratification to collect and generate agricultural statistics (Boryan et al., Hunt, 2014). This study has focused on two of the above-mentioned methods namely Area Frame Sampling, as currently applied in Rwandan Seasonal Agricultural Surveys (SAS), and Remote Sensing with the use of Normalized Difference Vegetation Index (NDVI) data.
The area frame sampling method is firstly based on construction of frames -sets of all agricultural holdings in an area of interest based on their homogeneity, using visual interpretation method (Wigton and Bormann, 1978). Secondly, the segments -homogeneous land units enumerable ideally in one day, are drawn from the frames. Finally, representative sample segments are randomly selected for the survey (Kennel, 2008). The method is popular in agricultural surveys because it is believed to be cheaper, reliable, save time, and suitable in case of a large population (Menza et al., 2008). It is used by renowned organizations such as Food and Agriculture Organization of the United Nations (FAO), National Agricultural Statistics Service (NASS) of the United States Department of Agriculture (USDA) among others (Cotter et al., 2010). Although the method is claimed to be accurate with the ability to select representative sample segments for agricultural surveys, it might be an ideal situation different from reality, given the degree of heterogeneity existing in nature (Turner and Gardner, 1994). Remote sensing is also currently used for land stratification to generate strata for agricultural surveys (Carfagna and Gallego, 2005).
The aim of this research was to apply multi-temporal NDVI data from MODIS, to stratify land in Rwanda and generate agricultural statistics for the four (4) main crops in Rwandan agricultural season A (September -February of the following year) for 2015 -2016 agricultural year, and make a comparative assessment of homogeneity differences in both currently used area frames and the NDVI strata.

Study area
The study area for the present research was not based on administrative boundaries. The NDVI stratification determined the representative areas for field data collection. The study area was composed of four NDVI Classes: NDVI class 24, NDVI class 54, NDVI class 70 and NDVI class 82 ( Figure 1). Classes were spatially distributed into 7 administrative districts. Administratively, NDVI class 24 was mainly located in Bugesera district (Eastern Province); class 54 in Nyaruguru and Nyamagabe districts (Southern province); class 70 in Nyabihu district (Western Province) and Musanze district (Northern Province); and class 84 had a part in Musanze and Burera districts (Northern province), and another part in Nyagatare district (Eastern Province) as presented in Figure 1 below.

Figure 1. Selected Sample NDVI Classes for the Study Area
Class 24 covered an area of about 6,485 ha; class 54 covered 57,031 ha; class 70 covered 19,583 ha; and class 82 covered an area of about 20,769 ha, making a total area of 103,868 ha which is equal to 9.3% of entire cropland area in Rwanda, as of 2010.
Agro-ecologically, as illustrated by Figure 2, class 24 was majorly situated in Eastern Savannah of Rwanda: a region with an average altitude of 1,400 m above mean sea level (AMSL), 850 mm of average annual rainfall, mean annual temperature of around 21 o C, with old infertile soils with a variable texture. Class 54 was majorly located in both Congo-Nile divide (area with an average altitude of about 2,100 m AMSL, 1,600 mm of mean annual rainfall, mean annual temperature of around 19 o C, with soil dominated by humic soils at medium altitude), and central plateaus (area with average altitude of about 1,700 m AMSL, 1,200 mm of average annual rainfall, mean annual temperature of around 17 o C, with soil dominated by ultisols derived from volcanic materials). Class 70 was located majorly in volcanic summits and high plains (area with an average altitude of about 2,200 m AMSL, 1,500 mm of average annual rainfall, mean annual temperature of about 17 o C, with soil dominated by ultisols derived from volcanic materials), and Congo-Nile divide. Finally, class 82 was majorly located in both volcanic summits and high plains and Eastern Savannah.

NDVI data
Different NDVI products such as those from MODIS, AVHRR, MERIS sensors, etc., have been used with success for various studies; either for natural phenomena studies or agro-ecological studies (Barbosa et al., 2006;Benedetti and Rossini, 1993;Bolton and Friedl, 2013;Funk and Budde, 2009).
The rationale for selecting NDVI was that it is one of the radiometric measures of the amount of greenness on the land according to the photosynthetic activity present on that land (NASA, 2015;Tucker and Sellers, 1986). This attribute justifies the ability of NDVI to group together areas from which vegetation cover appears similar in terms of density and content, to maximise the homogeneity within the same stratum.
This study applied NDVI data from MODIS sensor onboard Terra satellite with 250 m spatial resolution (MOD13Q1 products). The main reasons were -quick revisit time, a reasonable spatial resolution given the extent of the study area, and wide swath covering a large area. Data are produced as 16 days maximum value composites to reduce effects of clouds, directional reflectance and offnadir viewing effects, sun angle and shadow effects, and aerosol and water vapour effects (Holben, 1986;King et al., 2003). In order to ensure continuous and long-term homogeneity consideration in sites within the study area, the data's temporal coverage considered in the present study is 10 years (2004 -2014).

NDVI data pre-processing and processing
In Where Rescaled NDVI refers to the obtained NDVI values after the rescaling process (ranging between 0 and 255, and later, normalised to the normal NDVI range from -1 to 1). Original NDVI refers to the original NDVI values as in originally acquired MOD13Q1 images (ranging between -3,000 and 10,000). Obtained images were stacked and the layer stack was cleaned and smoothed using Savitzky -Golay smoothing and differentiation filter, in order to fit the data points optimally to a polynomial in the least square logic, to eliminate noise in the data and fit them with optimal data (Luo et al., 2005).

Land stratification based on MODIS -NDVI and representative sampling
The unsupervised classification was performed on the 221 MOD13Q1 images in order to determine similar classes. These images were following the general phenological characteristics of crops in Rwanda: where ploughing, planting and germination occur in September, growing of most of the crops in October and November, and the maturity stage and harvesting in December and January, for Season A. Seasons B, goes with ploughing, planting and germination in February, growing of most of the crops happen in March and April, and the maturity stage and harvesting happen in May and June. This can be observed later in Figure 6. The classification applied the Iterative Self-Organizing Data Analysis Technique (ISODATA) (Ball and Hall, 1965;in Rescaled NDVI = Original NDVI * 0.02125 + 43 Memarsadeghi., 2007). Minimum number of classes taken was 10 and the maximum was 100, with 50 as maximum number of iterations, and 1.0 as convergence threshold (Ali et al., 2012). Separability analysis using average and minimum separability was used to find optimal number of classes, which was found to be 95 for the country ( Figure 5). The classes were overlaid with recent Rwanda land use data (2010 data), and it was revealed that only 24 classes contain more than 50% of agriculture. In the end, the sampling was drawn from the 24 agricultural NDVI classes.

Random-representative clustered sampling
The size of a sample area was equal to the pixel size of the used MODIS NDVI data; which was originally 231.66 * 231.66 (53,666 m 2 ~ 5.37 ha) after the data acquisition. After data conversion and projection to match with Rwandan land use data, the final pixel size became 231.92 * 230.37 m ~5.34 ha. Similarly, conversions and geometric transformations resulted into a pixel shift of 1.69 m to the right (for the upper corner of NDVI data pixels), and 2.51 m to the right too (for the lower corner).
However, the shift could not meaningfully affect the analysis, given the size of the pixel and aim of the study, considering the avoided possibility of half a pixel shift.
The available fieldwork time prompted the selection of 12 sample sites per sample NDVI class.
Nonetheless, one site was excluded after visiting the area and realized that there was no vegetation cover due to the recent conversion from agriculture into industrial zone construction (Figure 4a and 4b). Henceforth, the final number of sample areas was 47. Sample areas were selected randomly given that they fulfil three conditions. First, being located in the innermost part of a class, to avoid influences from outside. Second, being located close to the road, for accessibility. Finally, the spatial distribution of sample sites had to follow a pattern of four (4) sample sites (first cluster) in the North or West of a class, second cluster in the Centre, and last cluster in the South or East of the class ( Figure 2).

Current area frame sampling in Rwanda
The assessment of the current area frame sampling method, as applied in Rwanda, was carried out mainly based on literature review. The used documents explain not only basic principles, procedures and rationale for using the technique in an agricultural survey (Cotter et al., 2010;Wigton and Bormann, 1978) but also how the technique was specifically applied in 2012Rwanda Seasonal Agricultural Surveys (NISR, 2013, 2015c, 2015b, 2015a. Moreover, the used documents clarify that area frames for the surveys in Rwanda are still constructed using 2008 aerial photographs, by the support of visual interpretation technique, to delineate "homogeneous" strata (area frames) based on crops intensity. From the defined area frames, segments are delineated by the use of physical features such as roads, paths, rivers, etc.

Data collection
Land cover types and their area coverage were collected in the field for two months (October -November 2015). They were collected using a printed field map with the 2008 high-resolution aerial photographs, and the field observation (with recording of geographical coordinates using GPS), and unstructured interviews with local farmers found in their agricultural field during the fieldwork. The field data were entered into computer system through digitisation process to get final interpreted data as shown in the following maps (Figure 3a and 3b). After data entry, the area of each land cover type was calculated and presented in a table for further analysis. Table 1 is an example of land cover area data from the sample site 5411 in Figure 3. pairwise comparison, to identify specific differences between the classes. Thirdly, one-way ANOVA was carried out for clusters within the same NDVI class to evaluate the level of possible significant differences between clusters within the same class, to ensure that every single NDVI class is homogeneous (if no significant differences between its clusters), or otherwise heterogeneous.
Fourthly, only in case of significant differences within a class, Fisher's LSD was applied to identify specific differences among clusters pairs of the class, leading to the quantification of the level of homogeneity and heterogeneity of the class.
However, for Rwandan Strata, most of the sample sites (40 out of 47) were located in Rwandan stratum 1. Therefore, the analysis was limited to the Rwandan stratum 1, and two first stages of analysis, given the fact that there were not enough data to make statistical analysis for the rest of the strata.

Land stratification with the currently used area frame sampling in Rwanda
Currently, area frames for Rwanda seasonal agricultural surveys are designed using the 2008 aerial photographs. By the use of visual interpretation technique, interpreters consider areas with similar crops' intensity and delineate them as homogeneous strata.

Rwanda strata data types and level of up-to-date
The currently used area frame sampling in Rwanda uses the 2008 aerial photographs for the area frames design. The photographs have been in use for seasonal agricultural surveys (SAS) since 2012 till up to now (2020). The use of outdated spatial data for agricultural surveys may result in biased sample strata selection which may lead to erroneous data representativeness and error propagation into overall national agricultural aggregates. This is supported by the argument of Roser (2015), that agriculture is among the most rapidly changing land uses, consequently, it is necessary to rely on upto-date data for agricultural surveys.

Rwanda strata data's spatial and temporal richness
Aerial photographs used to define the currently used area frames in Rwanda are of high spatial resolution (0.25 m * 0.25 m) (MINIRENA and RNRA, 2009). However, the photographs are onetime (2008) data in terms of temporal coverage. Therefore, they have a weakness in encountering spatial-temporal land cover and land-use changes that took place after the aerial survey.

Rwanda strata and sampling process with consideration of natural phenomena
Land stratification for the currently used area frames in Rwanda is done through visual interpretation technique, which might introduce errors. In fact, the stratification depends on personal perception and professional experience of the interpreter (Baks et al., 2013). This is not only increasing the level of bias in stratification and sampling design but also it might ignore important natural phenomena, such as possible abrupt changes in the natural system given the rapid changes in agricultural land cover. This is exemplified by the area in figure 4b where agricultural activities were cleared 2 months prior to the fieldwork. The land clearance was for the favour of new industrial zone construction. However, these changes do not appear in the 2008 aerial photograph (Figure 4a) and yet, the area had been classified as agriculture for different seasonal agricultural surveys since 2012. Rwanda. Nevertheless, within this period, many changes occur and they might introduce errors in the subsequent surveys.

Land stratification using multi-temporal NDVI data
Separability analysis of NDVI classes using average and minimum separability values of the classification results from 10 to 100, revealed that an optimal number of classes to distinguish heterogeneous areas in Rwanda from 2004 to 2014 is 95 ( Figure 5). The 95 classes were overlaid with the 2010 Rwanda land use data, and 24 classes were found to be dominated by agriculture (≥ 50%). For 10 years, a class had had similar behaviour in terms of vegetation intensity, but with some changes in some classes. For representative sampling and time available for the research, four (4) sample NDVI classes were purposively selected given that all classes from those with low vegetation intensity to those with high vegetation intensity throughout agricultural seasons are represented ( Figure 6).

NDVI data types and level of up-to-date
NDVI classes are defined based on the amount of greenness present in a specific area over time (NASA, 2015). This indicates that, no matter how the vegetation cover of an area is rapidly changing, if it changes with a similar pattern throughout a year and repeatedly over time, the area should be considered as the same class. This makes it an important tool in agricultural surveys, which deals with rapidly changing land use. Furthermore, most of the NDVI data are regularly up-to-date. For instance, MODIS data used by the present study are available every two days since 1999 (NASA, 2002). Other data include NOAA-HVRR data which are available every day since 1981 (Tucker, 2009), but with coarser spatial resolution, thus suitable for surveys over a big sized land such as of a continental scale. Additionally, most of the multi-temporal NDVI data are open and free of charge.

NDVI data's spatial and temporal richness
NDVI data applied by the present study are not very rich in the spatial dimension, as they aggregate a 250 m 2 into one unit with the same reflectance value "a pixel" (Liew, 2001). However, they are very rich in the temporal dimension. Moreover, spatial richness was claimed back through integrating the data with recent land-use data (RNRA, 2010) with a high spatial richness to allow the knowledge of each land cover contained into a single pixel area and its coverage (in ha and %).

NDVI stratification process and sampling with consideration of natural phenomena
NDVI data consider events occurring in nature according to similarities in areas' vegetation photosynthetic activity (Tucker and Sellers, 1986). Nonetheless, the area recently converted for industrial zone construction (Figure 4a and 4b) was classified as agriculture because the present study did not include NDVI image data of 2015; a year when the changes took place. However, if fieldwork would take place the following year (2016), the 2015 image data would have been included, for the recent changes to be considered as well for the new stratification. Hence, representativeness is improved by the use of up-to-date data. Additionally, the land classification process is done with the use of the computer system after the user's instructions, through unsupervised classification, which reduces the bias from the interpreter and promotes replicability of the process.

NDVI strata time frame validity period
NDVI-based land stratification should be updated every agricultural year, thanks to the regular availability of up-to-date data. The regular data updates ensure that recent spatial-temporal heterogeneity is incorporated into new land stratification for the subsequent agricultural surveys. This would regularly improve homogeneity within the strata of new surveys.

Land cover types and area coverage in Rwandan strata
The land cover data collected from the field per sample NDVI class were overlaid with the currently used area frames. It was found that among the 47 total sample sites, 40 were fully located in Rwanda stratum 1 "Intensive hillside cropland"; 1 sample site fully in Rwanda stratum 5 "cities and towns", 2 sample sites fully in Rwanda stratum 9 "forest", 2 sample sites in Rwanda stratum 10 "tea plantation", 1 sample site partly in Rwanda stratum 1 and partly in Rwanda stratum 9, and 1 sample site partly in Rwanda stratum 1 and partly in Rwanda stratum 5 (Figure 7). In the end, only the data for sample sites in Rwanda stratum 1 were enough to make further statistical analyses.
The land cover types identified include main crops of agricultural season A (banana, maize, beans, and cassava), and other different crops and cover types. However, looking at the presence and intensity of crops and land cover types, the cover types found in Rwandan Strata do not present similar patterns in intensity, as it would be expected for cover types from the same stratum which is assumed "homogeneous". In addition, there are crops and cover types which were dominantly found in some sample sites but were not found in other sites from the same stratum. This might imply that the considered "same" stratum might be presenting a too low level of homogeneity to be considered entirely homogeneous.

Figure 7: Land Cover Types in Rwanda Strata
Note: Sample sites in frame were excluded from further analysis due to data unrepresentativeness

Land cover and area coverage in NDVI classes
There has been identified a remarkable patterns of land cover intensity according to their respective NDVI classes and clusters. Looking at dominant land cover types, for instance, cassava dominated class 24, but it was hardly present in other classes. The grass was abundant in class 24 and 54, but argued that forest segregation might be difficult to achieve in different areas of Rwanda using coarse resolution NDVI data, as the focus of this study was mainly on crops, it would be gladly recommended for future similar studies to try the relatively high (or moderate) resolution data such as Landsat OLI and/or Sentinel 2 NDVI data, for the forest cover differentiation in Rwandan heterogeneous landscapes.
Furthermore, some of the crops and other land cover types such as onions, carrots, sisal, cabbage, sugarcane, wheat, tea, coffee, groundnuts, peas, eggplants, tree tomatoes, taro, and soybean were not frequent and sometimes found only in one sample site, hence, these very dominated crops can also better be studied with high-resolution NDVI data, given their limited spatial coverage in most of the Rwandan landscapes.

Statistical analysis of variance of crops cover area in Rwandan strata
The following Table 3 presents area size of main season A crops cover per sample site, as collected from the field. Results of ANOVA showed that there were 75% of significant differences (heterogeneity), and 25% of non-significant differences (homogeneity) between clusters within Rwanda stratum 1 in terms of the main crops at 95% confidence level (p was 4e-4< α (0.05) in terms of Banana cover; and 0.07 > α (0.05) in terms of Maize cover; and 2e-4 < α (0.05) in terms of Beans cover; and 1e-16 < α (0.05) in terms of Cassava cover). However, the pairwise comparison using Fisher's LSD indicated that, overall, Rwanda stratum 1 was 31% heterogeneous (specific significant differences) and 69% homogeneous (specific non-significant differences) at 95% confidence level.

Statistical analysis of variance of crops cover area in NDVI classes
The ANOVA results revealed that NDVI classes were significantly different in NDVI classes ((pvalue was 0.02 < α (0.05) in terms of Banana; 0.04 < α (0.05) in terms of Maize; 2.02e-6 < α (0.05) in terms of Beans; and 2.25e-13 < α (0.05) in terms of Cassava)), at 95% confidence level. Therefore, the classes were significantly separable. To investigate specific significant differences within the classes, Fisher's LSD revealed that among total pairs of clusters within NDVI classes; there were 15% significant differences (heterogeneous), and 85% no significant differences (homogeneous), at 95% confidence level. All in all, NDVI classes presented higher homogeneity in terms of the four main crops, compared to the currently used Rwanda Stratum 1.

Use of NDVI stratification to improve homogeneity within Rwanda stratum 1
To analyse the possibility to improve currently used Rwanda strata, the misclassified sample sites within Rwanda strata were correctly classified according to the ground truth collected in accordance with NDVI classification. This resulted in 4 sample sites being reclassified as part of Rwanda stratum 1. ANOVA results for the reclassified Rwanda stratum 1 showed that, then, significant differences were lowered to 26%, and non-significant differences increased to 74%, at 95% confidence level.

Conclusion
The present study carried out a comparative assessment of homogeneity differences in both currently used area frame sampling in Rwandan Seasonal Agricultural Surveys, and multi-temporal NDVI stratification. The land use and land cover data collected from the field were used to assess homogeneity differences in both Rwanda stratum 1 and NDVI strata, in terms of Rwanda season A main crops (banana, maize, beans, and cassava). However, the assessment of other aspects of current Rwanda strata such as techniques used to stratify land, data types, etc., was mainly based on literature review.
NDVI stratification resulted in strata different in both temporal behaviour and spatial distribution.
Furthermore, sample NDVI classes were found to be significantly different in terms of Rwandan agricultural season A main crops at 95% confidence level, which confirmed their separability for effective representativeness in an agricultural survey.
Land stratification using NDVI and the currently used area frame sampling were found to have differences and similarities, but this study found NDVI stratification to be more reliable for improved homogeneity within classes while delineating homogeneous strata for agricultural survey. The most important difference lied in the homogeneity encountered in classes. NDVI classes were found to encounter a higher level of homogeneity compared to the current Rwanda strata, in terms of the main crops at 95% confidence level. ANOVA and Fisher's LSD results revealed 85% and 69% of homogeneity within NDVI classes and Rwanda stratum 1 respectively. Moreover, NDVI stratification was used to improve the homogeneity within the Rwandan Stratum 1, and this resulted in a 5% homogeneity improvement for the latter, reaching 74%.
Comparing aspects such as level of updating the data used for stratification, types and availability of data, and time of land stratification validity, it was found that land stratification for agricultural surveys using NDVI data has more potentials for accurate agricultural statistics and reduction of error propagation, compared to the currently used area frame sampling method.

Acknowledgements
The authors acknowledge the priceless effort and the contribution of various people and organizations for the success of this research. We are deeply thankful to the former Netherlands authorising and facilitating field data collection. Finally, authors would like to acknowledge the great effort of the anonymous reviewers who took their time and provided very insightful, thoughtful and instructive comments which surely helped to leverage the quality of this research output.

References
Ali