Detecting nutrient deficiencies in Eucalyptus grandis trees using hyperspectral remote sensing and random forest

Nutrient deficiencies in commercial forest trees often lead to stunted growth and reduced chances of field survival, resulting in a loss of time and productivity, and in trees that become more susceptible to a host of infections. While conventional foliar analytical methods provide accurate results, they are neither time- nor cost-effective in a high productivity environment. This study tests the capability of remote sensing to rapidly detect macronutrient and micronutrient deficiencies in juvenile trees. We acquired full-waveform hyperspectral data (350nm-2500nm) from 135 young trees planted in individual pots in a controlled forestry nursery environment. We quantified nitrogen (N), phosphorus (P), potassium (K), calcium (Ca), magnesium (Mg), sodium (Na), manganese (Mn), iron (Fe), copper (Cu), zinc (Zn), and boron (B) in a young, commercially planted forest variety. We identified the most critical wavebands for detecting nutrient deficiencies using built-in random forest (RF) measures of variable importance. The robustness of the RF algorithm significantly reduced the influence of noise in the dataset whilst producing promising results for certain macronutrients such as P and N (R² of 0.95 and 0.89, respectively) and micronutrients such as Mn and Cu (R² of 0.90 and 0.86, respectively). We identified the red-edge, near-infrared (NIR), visible and short-wave infrared-2 (SWIR-2) regions of the electromagnetic spectrum as the most effective regions for detecting macronutrients and micronutrients in this study. We recommend testing strategic portions of the electromagnetic spectrum, for example with portable near-infrared technology, to reduce noise and enable faster computing time.

Micronutrient deficiencies often result in stunted growth, chlorosis, reduced protein content and weak stem production, and can cause early maturity in some plants (Silva and Uchida, 2000).
Traditionally, scientists obtain foliar nutrient information using destructive sampling methods such as wet chemistry analysis, which involve periodic ground-based surveys and tedious laboratory work that is costly and time-consuming (Pullanagari et al., 2016). However, researchers have made little progress using indirect spectral methods (Oliveira et al., 2017). Hyperspectral remote sensing offers a rapid, non-destructive and effective approach for detecting key nutrient levels in forest trees.
Hyperspectral data can provide benefits in high productivity environments, such as forest nurseries with younger plants. The detection of foliar nutrients relies on specific absorption features within the electromagnetic spectrum. Earlier research explains the physiological link between foliar nutrient content and remote sensing (e.g., Dixit and Ram, 1985; Curran, 1989; Elvidge, 1990). More specifically, hyperspectral sensors (350nm-2500nm) capture detailed spectral information; however, they are often sensitive to spectral noise, which degrades the quality of the acquired data and, in turn, the performance of classification approaches (Agjee et al., 2018).
For example, Oliveira et al. (2017) successfully estimated the N content of 25-month Eucalyptus trees and compared a wide range of vegetation indices (VIs). The authors obtained the best R² values (0.97) using the inflection point position (IPP), normalized difference red-edge (reNDVI) and modified red-edge normalized difference vegetation index (mNDI) in the 400-900nm range. A later study by Oliveira and Santana (2020) estimated the full range of macronutrients and micronutrients (N, P, K, S, Ca, Mg, Mn, B, Zn, Cu, and Fe) in Eucalyptus clones using the VIS-NIR region (400-900nm) and PLSR. The authors evaluated all nutrient predictions using the coefficient of determination of cross-validation (R²CV); the lowest estimate was for Mg (R²CV = 0.22) and the highest for N (R²CV = 0.95). The authors found that PLSR and variable selection methods increased the accuracy of nutrient concentration estimates and suggested that future studies use wavelength ranges above 900nm. Osco et al. (2020) tested several machine learning algorithms, namely k-nearest neighbour (kNN), lasso regression, ridge regression, support vector machine (SVM), artificial neural network (ANN), decision tree (DT), and random forest (RF), using a proximal hyperspectral sensor (380nm-1020nm) to predict nutrient content in a Valencia-orange orchard. The authors obtained high R² values (above 0.73) for all algorithms and found that RF was the most suitable algorithm.
In summary, many studies have predicted nutrient concentrations using hyperspectral data and a vast array of computational algorithms (Ferwerda et al., 2005; Abdel-Rahman et al., 2017; Wang et al., 2018; Oliveira and Santana, 2020). However, few studies have predicted macronutrient and micronutrient deficiencies using a full-waveform hyperspectral proximal sensor and RF. Therefore, this study aimed to predict macronutrients and micronutrients in Eucalyptus hybrid trees using full-waveform hyperspectral data (350nm-2500nm) and the RF algorithm. Furthermore, to our knowledge, no studies have identified the most critical wavebands for detecting nutrient deficiencies in younger trees within a nursery setting. The outcomes of this study will promote the use of remote sensing scanning systems for rapid diagnosis of macronutrient and micronutrient deficiencies in high productivity commercial forestry environments.
Commercial forestry industries commonly grow Eucalyptus trees for their fast growth and superior wood properties (Myburg et al., 2014). Hence, more than 100 countries across six continents (>20 million ha) grow Eucalyptus trees as a timber resource (Myburg et al., 2014). The parent species of the hybrid Eucalyptus grandis x Eucalyptus urophylla used in this study are native to the region from Newcastle, New South Wales, to Bundaberg, Queensland, and to Timor in the Indonesian Archipelago, respectively (Pinto et al., 2014; Pajares, 2015). The leaves of the hybrid Eucalyptus grandis x Eucalyptus urophylla were lanceolate, with a dark green adaxial side and a slightly paler abaxial side.

Experimental design
We conducted a pot trial experiment to develop more explicit diagnostic indicators and measures of changes in soil nutrient status. We planted 135 hybrid Eucalyptus grandis x Eucalyptus urophylla seeds, obtained from a commercial plantation seed orchard in the KwaZulu-Natal Midlands, South Africa, in June 2014. To minimize the effect of a microclimate, we randomly arranged the pots in the designated nursery environment. The pots were kept under natural sunlight, beneath an open-sided plastic cover that excluded rainfall. The soil used in this study was Inanda soil, the predominant soil type in the Midlands, South Africa (Mucina and Rutherford, 2006). The soil texture was silty clay (56% sand and silt; 44% clay). Distilled water was added automatically via drip irrigation to maintain optimal soil moisture conditions. Drippers had an output rate of 2.2L per hour and were programmed to water twice a day for three minutes. Fertilizer was added once per week over four weeks, and the plants were then left to acclimatize for another four weeks (Table 2). The leaf material sampled in this study was at the sapling (juvenile) stage of growth, with leaves 6cm to 10cm long and 2cm to 3cm wide. The saplings grew to a height of 30cm to 60cm with a canopy width of approximately 30cm to 40cm. The saplings had an elongated rooting structure. The hybrid Eucalyptus has an extensive tap-root system that anchors the trees and horizontal roots that keep the trees upright when planted in the field (Dye, 1996). The experiment continued until each nutrient reached its depletion threshold, determined through foliar and growth diagnosis (Table 1). We used the depletion thresholds suggested by Reuter and Robinson (1997). This strategy enabled calibration of extractable soil nutrient levels with tree growth and foliar nutrient diagnostics, improving the interpretation of laboratory soil data (Table 2).
All samples were dried, weighed and analysed for physico-chemical properties, and the results were expressed as leaf concentration (% dry weight or mg g⁻¹) (Knox et al., 2011).

Spectral measurements
Spectral reflectance measurements were taken on 2 February 2017, synchronously with foliar sampling, using a handheld Analytical Spectral Devices (ASD) FieldSpec® 3 spectrometer. The ASD measures at a sampling interval of 1.4nm for 350nm-1000nm and 2nm for 1000nm-2500nm. Reflectance measurements were taken within 1m above the pot, using the fibre optic cable set at a 25 degree field of view (FOV) and pointed at the nadir position. We calibrated the sensor every ten minutes using a white reference panel coated with barium sulphate of known reflectivity (Spectralon, Labsphere, Inc., Sutton, New Hampshire). We acquired ten measurements of each plant (per pot) to derive the representative reflectance spectrum for each pot, for a total of 135 pots.
The ASD instrument operator was positioned as far away as possible from the area under observation to reduce interference caused by anthropogenic shadow and reflection (Figure 2). We placed each pot in direct sunlight on a cloudless day between 10:00h and 14:00h Central African Time.
Furthermore, pots were placed on a stable black platform in an open area to minimize bidirectional reflectance distribution function (BRDF) effects and background scattering. After averaging the spectra, all spectra were converted from radiance to reflectance using ViewSpec Pro software (ASD Inc., Boulder, Colorado, version 6.0.11). All spectra were radiometrically and atmospherically corrected to reduce noise using the Natural Environment Research Council (NERC) field spectroscopy templates (NERC, Undated).
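The averaging and radiance-to-reflectance steps described above can be sketched as follows. This is a minimal illustration, not the ViewSpec Pro procedure: it assumes reflectance is obtained as the ratio of target radiance to white-panel radiance, scaled by the panel's known reflectivity. The function names, the three-band spectra and the panel reflectivity of 0.99 are all hypothetical.

```python
from statistics import fmean


def mean_spectrum(replicates):
    """Average replicate spectra band by band (one list of band values per scan)."""
    return [fmean(band) for band in zip(*replicates)]


def to_reflectance(target, white_reference, panel_reflectivity=1.0):
    """Ratio target radiance to white-panel radiance, scaled by panel reflectivity."""
    return [panel_reflectivity * t / w for t, w in zip(target, white_reference)]


# Hypothetical radiance values: three replicate scans of one pot, three bands each
# (the study used ten scans per pot across 350nm-2500nm).
scans = [[0.10, 0.40, 0.30], [0.12, 0.42, 0.28], [0.11, 0.38, 0.32]]
white = [1.00, 1.00, 1.00]  # hypothetical white-panel radiance per band

pot_radiance = mean_spectrum(scans)
pot_reflectance = to_reflectance(pot_radiance, white, panel_reflectivity=0.99)
```

Averaging before the ratio keeps one representative spectrum per pot, which matches the order of operations stated in the text.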

Reference data t-test
In this study, we measured foliar chemistry before and after nutrient depletion. We used the nutrient depletion thresholds defined in Reuter and Robinson (1997). We used the foliar chemistry results to test for a significant difference in the means (x̄) of foliar macronutrient and micronutrient content at high and low levels, using a paired t-test for nutrient deficient plants (Equation 1). The paired t-test provides a measure of significance (p-value) based on the difference in the x̄ of two groups of data (Ruxton, 2006). The p-value is the probability of obtaining a result at least as extreme as the one observed, assuming the null hypothesis is correct (Meng, 1994). If the p-value is ≤ 0.05, the two datasets are significantly different; if the p-value is > 0.05, the result is insignificant, and hence the trees did not deplete the targeted nutrient. We calculated the t-test using the following formula:
$$t = \frac{m}{s / \sqrt{n}} \qquad (1)$$
where t is the paired t-test statistic, m and s are the mean and the standard deviation of the differences between paired samples, respectively, and n is the sample size.
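As a hedged illustration of the paired t statistic described above (the mean of the paired differences divided by its standard error), the sketch below uses only the Python standard library. The foliar N values are invented for ten pots and are not data from this study. Because the standard library has no t-distribution, the sketch compares the statistic against the two-sided 5% critical value for 9 degrees of freedom (2.262) rather than computing a p-value.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return the paired t statistic and its degrees of freedom."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    m = mean(diffs)   # mean of the paired differences
    s = stdev(diffs)  # sample standard deviation of the differences
    return m / (s / sqrt(n)), n - 1


# Hypothetical foliar N (%) for 10 pots, before and after depletion
before = [2.1, 2.3, 2.0, 2.4, 2.2, 2.5, 2.1, 2.3, 2.2, 2.4]
after = [1.5, 1.8, 1.6, 1.7, 1.6, 1.9, 1.4, 1.8, 1.7, 1.6]

t_stat, df = paired_t(before, after)

T_CRIT_DF9 = 2.262  # two-sided 5% critical value of Student's t, 9 degrees of freedom
significant = abs(t_stat) > T_CRIT_DF9
```

A statistic beyond the critical value corresponds to p ≤ 0.05, which is the decision rule used in the text.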

Random forest
This study used RF regression, which is based on classification and regression trees (CARTs), to predict foliar nutrient concentrations (Breiman, 2001). We implemented the RF ensemble using the "randomForest" package in R statistical software (R Development Core Team). Compared to several other machine learning algorithms, the RF algorithm delivers the most consistent results, especially when using high dimensional hyperspectral data for predicting foliar nutrient data (Amirruddin et al., 2020). The RF algorithm produces decision trees by drawing subsets of training samples with replacement, a method known as "bagging". With bagging, the same sample can be selected several times while other samples remain unselected. During the training process, approximately two-thirds of the samples from the training set are used as "in-bag" samples, while the remaining one-third, the "out-of-bag" samples, are used as an internal cross-validation technique to test the RF model's performance (Belgiu and Drăguţ, 2016). The error produced through this cross-validation technique is known as the out-of-bag (OOB) error. The RF algorithm splits each node using a user-defined parameter called mtry, whilst each decision tree is produced independently without pruning (Breiman, 2001). Another user-defined parameter is ntree, which sets the number of trees grown in the forest; the algorithm creates trees that have high variance and low bias (Breiman, 2001). In classification, the final class is taken as the arithmetic mean of the class assignment probabilities across all trees. New unlabelled data are evaluated against all the decision trees, each tree votes for class membership, and the class that receives the most votes is selected (Belgiu and Drăguţ, 2016). In regression, as used here, the prediction is the mean of the individual tree outputs.
We parametrised the model using the mtry and ntree parameters. Classification accuracy is more sensitive to mtry than to ntree. The R package sets the default ntree value at 500, by which point errors have generally stabilised as more trees are grown (Belgiu and Drăguţ, 2016). The ntree value can be optimised to find the best detection accuracies and lowest error rates (Mutanga et al., 2012). After numerous runs through the RF algorithm, an ntree value of 750 provided more accurate results with the data used in this study. We set mtry and ntree to 46 and 750, respectively.
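The bagging step described above can be illustrated with a short simulation. This is a sketch of bootstrap sampling only, not of the "randomForest" package itself. It assumes a training set of 94 samples (70% of 135 pots) and 750 trees, and the function name is hypothetical. It shows that sampling with replacement leaves roughly one-third of the samples out-of-bag for each tree (in expectation about 37%).

```python
import random


def oob_fraction(n_samples, rng):
    """Draw one bootstrap sample with replacement; return the out-of-bag share."""
    in_bag = {rng.randrange(n_samples) for _ in range(n_samples)}
    return 1 - len(in_bag) / n_samples


rng = random.Random(42)  # fixed seed so the sketch is reproducible
n_train, n_trees = 94, 750  # assumed training-set size and number of trees

# Average out-of-bag share over all trees in the (simulated) forest
mean_oob = sum(oob_fraction(n_train, rng) for _ in range(n_trees)) / n_trees
```

Because each tree sees a different bootstrap sample, every training sample is out-of-bag for a sizeable share of the trees, which is what makes the OOB error usable as an internal validation estimate.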

Variable importance
Determining the most critical variables is essential for improving model optimisation, simplification and robustness, especially when dealing with high dimensional data such as those in this study (Liaw and Wiener, 2002). RF has three measures of variable importance. The first measure is based on the number of times a candidate variable is selected in the model. The second measure is based on the decrease in Gini impurity when a variable is chosen to split a node, as proposed by Breiman (2017).
Finally, the third measure is based on randomly permuting the values of a variable and recording the resulting change in prediction error (Breiman, 2001). In this study, we selected the third measure (permutation of variables), and we used the percentage increase in mean squared error (MSE) to determine variable importance. Variable importance was determined for each nutrient to generate a coherent account of the relevant variables used in the prediction models for nutrient deficiency.
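The idea behind the permutation measure can be sketched as follows. This is a simplified illustration, not the exact computation in the "randomForest" package, which works on out-of-bag samples per tree. Here a hypothetical stand-in model replaces the fitted forest, the two-"waveband" dataset is synthetic, and importance is reported as the plain percentage increase in MSE after shuffling one column.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, rng):
    """Percentage increase in MSE after shuffling each column of X in turn."""
    base = mse(y, [predict(row) for row in X])
    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # break the link between column j and the response
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        permuted = mse(y, [predict(row) for row in X_perm])
        scores.append(100 * (permuted - base) / base)
    return scores


rng = random.Random(0)
# Synthetic data: two "wavebands", only the first carries signal
X = [[rng.random(), rng.random()] for _ in range(50)]
y = [2.0 * row[0] + rng.gauss(0, 0.05) for row in X]


def predict(row):
    """Hypothetical stand-in for a fitted model; it ignores the second band."""
    return 2.0 * row[0]


importance = permutation_importance(predict, X, y, rng)
```

A band the model relies on shows a large increase in error when shuffled, while a band the model ignores shows none, which is how the most critical wavebands are ranked.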

Accuracy assessment
We split the final dataset into training (70%) and test (30%) sets (Breiman, 2017).
Table 3 summarises descriptive statistics for E. grandis x E. urophylla foliar macronutrients and micronutrients at low and high levels. For each nutrient, we took a random subset (10 pots) of the repeatedly measured pots before and after inducing deficiency to test whether nutrient deficiency had occurred. The paired t-test revealed a significant difference (p ≤ 0.05) in x̄, with an average p-value of 0.0067.
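The 70/30 split described in this section, together with the R² and RMSE measures reported in the Discussion, can be sketched as below. This is an illustrative outline using only the standard library. The function names and the seed are assumptions, and the pot identifiers stand in for the real spectra and foliar chemistry.

```python
import random
from math import sqrt


def train_test_split(samples, train_share=0.7, seed=1):
    """Shuffle a copy of the samples and cut it into training and test sets."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - residual SS / total SS."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean squared error, in the units of the response variable."""
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


pots = list(range(135))  # hypothetical pot identifiers
train, test = train_test_split(pots)
```

With 135 pots this yields 94 training and 41 test samples; R² and RMSE would then be computed on the test set only, so that the reported accuracy reflects unseen pots.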

Discussion
In commercial forestry, the supply of nutrients from root to shoot is vital for plant growth and forest productivity. Quantifying nutrient deficient trees in a compartment remains unworkable and could prove challenging when dealing with many younger plants in the nursery. Hence, new methods are needed that can be applied early, either at the nursery before planting or out in the field, to provide rapid detection (Quentin et al., 2017). The early detection of potential nutrient depletion at the nursery level could guide the optimisation of future forest management practices and lead to a more robust approach to nutrient measurement and assessment (Garcia et al., 2018). A recent study by Acevedo et al. (2020) suggests that nutrient loading at the nursery level will improve seedling nutritional status, morphological attributes, and the growth of new roots. The authors suggest that modelling growth responses is needed to further improve the understanding of physiological processes.
This study successfully determined foliar macronutrients in E. grandis x E. urophylla at low and high nutrient concentration levels using hyperspectral data and RF. The prediction results were high for all macronutrients, which is vital for improving remote sensing efficacy, particularly in deficient samples. The prediction results (R²) for the most growth-limiting nutrients (N, P, K) in this study are consistent with the findings of previous studies (Adams et al., 2000; Axelsson et al., 2013; Özyiğit and Bilgen, 2013), which reported similar R² accuracies in foliar wheat and grass samples. The use of the RF algorithm significantly improved detection accuracy when compared to previous studies that used VIs and partial least squares (PLS) to detect N, P, K and Na in wheat samples (Pimstein et al., 2011; Mahajan et al., 2014; Oliveira et al., 2017). For example, P was predicted considerably better than in previous studies (Özyiğit and Bilgen, 2013; Mahajan et al., 2014), with R² values above 0.90, while previous studies produced R² values below 0.50. The RF algorithm allows its user-defined parameters (mtry and ntree) to be fine-tuned, providing more robust results than the generalised VIs used in previous studies. RMSE values remained low; however, the RMSE for N was considerably higher than for the other nutrients, although still acceptable. Our findings confirm that foliar micronutrients can be detected in E. grandis x E. urophylla at both deficient and non-deficient levels using hyperspectral data and RF. While many studies generally predict the macronutrients N, P and K, this study predicted a wide range of micronutrients. Furthermore, this study predicted the macronutrients Ca, Mg and Na at low and high concentrations. This study could predict micronutrients better, with prediction accuracies ranging from 0.66 to 0.90. While other studies used VIs (Adams et al., 2000; Özyiğit and Bilgen, 2013; Oliveira et al., 2017), this study achieved higher accuracies when using the RF algorithm.
These results validate and promote the robustness and effectiveness of the RF ensemble to discriminate each micronutrient, especially when using high dimensional data.
An important step was to determine which waveband regions correlate with the deficient nutrients.
To our knowledge, our study is the first to examine the critical wavebands for detecting macronutrient and micronutrient deficiencies in foliar tree material. Hence, we could not directly compare this study's results to previous studies, which mainly examined N deficiency in heterogeneous environments (Blackmer et al., 1996;Goel et al., 2003). However, we could correlate our results with other corresponding regions of the electromagnetic spectrum associated with general reflectance markers in foliar material. For example, this study found that most macronutrient deficiencies correlate with wavebands in the VIS (P, Ca, Mg) and NIR (N, Na) regions. Blackmer et al. (1996) found similar correlations in the VIS region when examining N deficiency in corn using a portable spectroradiometer (350-1100nm). Their study could not examine the NIR edge and SWIR regions in the absence of the latest instrumentation, which was essential for determining N deficiency in our study.
Furthermore, according to Goel et al. (2003), wavebands 498nm and 671nm in the VIS region correlated with N stress in foliar corn material. Similarly, in this study, we found that waveband 675nm was important, which is close to the wavebands in their study (Figure 5). However, in this study, higher correlations were found for N deficiencies in the red-edge and NIR regions. Like N, most micronutrient deficiencies (Fe, Zn, B) correlated with the NIR region, in line with Liew et al. (2008). Mn and Cu deficiencies were more closely related to the SWIR-2 region, which is a new finding. During modelling, we prioritised obtaining low RMSE values over high R² values, which is crucial for generating more robust models for each nutrient. Lower RMSE values improve the technology's efficacy and provide confidence to the user (forester, nursery manager or technical staff), particularly during the system's implementation in commercial forestry nurseries. The RF algorithm provided variable importance measures, which are essential for identifying the most critical wavebands for prediction amongst highly dimensional data. Deriving reference paired t-test results formed an essential component of this study for distinguishing between nutrient deficient and non-deficient trees. The results from the reference paired t-test showed a significant difference (p ≤ 0.05) between deficient and non-deficient samples for all macronutrients and micronutrients.
This study provides a framework for proactive decision making about the nutrient health status of a tree. Nurseries could use this method for quality control and risk assessment purposes. Rapid spectroscopy is cost-effective, time-efficient and requires fewer resources for the chemical processing of samples. Furthermore, future studies should upscale this assessment to a live standing compartment. This study will help foresters, land managers, and commercial timber industries rapidly assess each tree's health status within a compartment. Upscaling to a satellite would be beneficial; however, problems of resolution accuracy may hinder successful detection.

Summary and Conclusion
To our knowledge, this is the first study to explore remote sensing of deficiencies in a full range of tree macronutrients (N, P, K, Ca, Mg, Na) and micronutrients (Fe, Mn, Cu, Zn, B) using full-waveform hyperspectral data (350nm-2500nm) and the RF algorithm. From this study, we conclude that:
1. The study successfully predicted N, P, K, Ca, Mg, Na, Fe, Mn, Cu, Zn, and B in E. grandis x E. urophylla using hyperspectral data and RF analysis.
2. Variable importance results identified the most important wavebands for detecting nutrient deficiencies in E. grandis x E. urophylla.
3. The results improve the efficacy of using remote sensing methods for nutrient analysis in a high productivity forestry nursery environment.
We recommend using this study as a framework for rapid plant nutrient analysis in commercial forestry nurseries. Future research could upscale the results from this study from nursery to field level using remote sensing imagery.