
Soil Moisture Prediction Overview
IDM Solutions partnered with a small farm in Iowa to extend the farm's technology and AI capabilities as it moves toward a seamless smart irrigation system for precision farming. IDM Solutions created a gradient boosted decision tree model that predicts the percent soil moisture at any given latitude/longitude point from daily weather and Sentinel-2A band data, with a mean absolute error (MAE) of 2.45%.
Seamless Integration of Soil Moisture Prediction for the Client
IDM Solutions gave the client the ability to identify regions of farmland experiencing drought due to weather or improper irrigation, which was previously unachievable for them. This capability moved the entire farm ecosystem forward in precision irrigation, AI technology adoption, crop yield and profitability, labor requirements, and resource utilization.
Why is Soil Moisture Important to Farmers
Soil moisture is one of the largest driving factors on a farm: it directly affects crop yield and soil nutrient levels, influences soil temperature, plays a role in pest control, and impacts soil structure. In addition, farmers must know the soil moisture when sowing their fields, because different crops require different soil moisture levels for optimal growth. Currently, most farms still rely on visual inspection or on soil moisture sensors placed in the field, which quickly becomes a tremendous undertaking in infrastructure, materials, cost, and labor, especially for larger farms.
With rapid developments in advanced analytics, artificial intelligence, unmanned systems, and the Internet of Things, technology adoption within the agriculture ecosystem is accelerating. Farms are expanding their technology footprint and replacing manual irrigation systems with smart irrigation systems. Markets and Markets projects that the smart irrigation market will grow from USD 1.2 billion in 2021 to USD 2.3 billion by 2026, a compound annual growth rate (CAGR) of 14.9% over that period. This growth is driven not only by agriculture but also by smart cities, golf courses, and turf fields, whose profitability depends on ideal soil conditions and yield.
Why are Smart Irrigation Systems Important
The water level in soil is a critical factor for plant and crop growth. If the water level is too low, plants become water-stressed and the nutrients they need cannot travel through the plant. If the water level is too high, nutrients can be washed away, roots can rot, and the plant cannot get the oxygen it needs to survive. Optimal irrigation scheduling is therefore crucial for maximizing crop yield.
Informed irrigation systems with near-real-time updates allow farmers to adjust irrigation schedules optimally and autonomously. However, this requires careful placement of soil moisture sensors throughout the farm. These sensors cannot be placed haphazardly, because different terrain, crop types, weather and climate conditions, and crop developmental stages require different irrigation schedules.
Why use Meteorological Data and Multispectral Remotely Sensed Satellite Imagery
Unlike sensors placed in situ, weather data and multispectral satellite imagery are readily available and easy to obtain. Because the fields do not need to be instrumented with sensors, there are no associated material, infrastructure, or labor costs. A typical soil moisture sensor has an accuracy of roughly 3%, can transmit data wirelessly up to around 2,000 ft, and measures roughly a 1-liter volume of soil. Adding more sensors to a field yields more accurate soil moisture estimates, but the number of sensors required grows with the size of the farm, and so do the material, infrastructure, resource, and labor costs of the system. Farmers must also be careful when instrumenting their farm and analyzing the data, as different sensors provide different data types.
Weather data, by contrast, requires no additional tools to analyze and is available from television or the internet. Infrared (IR) bands from satellite imagery, along with derived vegetation indices, specifically the normalized difference vegetation index (NDVI), provide rich information to the farmer over a wide area of land. NDVI simply measures plant greenness and is computed from near-infrared reflectance (together with red reflectance), which lies outside the visible-light spectrum. As a result, it can detect crop stress due to pests, drought, disease, and flooding at earlier stages than a farmer or soil sensor can.
Enabling an ML Approach to Predict Percent Soil Moisture
Working with a small farm in Iowa, IDM Solutions created a soil moisture prediction framework using five years of daily historical data. Table 1 lists the recorded daily weather variables used in the feature space and their units of measure.
Table 1: Weather variables used in the feature space with corresponding units
Sentinel-2A satellite imagery was acquired from Google Earth Engine. Sentinel-2A is a multispectral satellite with 13 bands in the visible, near-infrared, and short-wave infrared parts of the spectrum. Data were acquired at a spatial resolution of 10 m every 5 days. To align the imagery with the daily weather data, a linear regression was fit to each band to upsample it to daily values. Table 2 lists the Sentinel-2A bands with their central wavelengths and bandwidths.
Table 2: Bands from Sentinel-2A satellite with associated central wavelength and bandwidth
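The per-band upsampling from 5-day acquisitions to a daily grid can be sketched as follows. The text says a linear regression was fit to each band but does not give the details; this sketch assumes a piecewise linear fit, i.e. a straight line between each pair of neighboring acquisitions, implemented with `numpy.interp`. The function name `upsample_to_daily` and the sample values are hypothetical.

```python
import numpy as np


def upsample_to_daily(obs_days, band_values, n_days):
    """Piecewise-linearly interpolate sparse band observations onto a daily grid.

    obs_days    : increasing day indices (e.g. 0, 5, 10, ...) of satellite acquisitions
    band_values : array of shape (n_obs, n_bands), one column per Sentinel-2A band
    n_days      : length of the daily weather record to align with
    """
    obs_days = np.asarray(obs_days, dtype=float)
    band_values = np.asarray(band_values, dtype=float)
    daily_days = np.arange(n_days, dtype=float)
    # np.interp handles one 1-D series at a time, so each band is interpolated separately.
    # Days outside the observed range are held at the nearest observed value.
    columns = [np.interp(daily_days, obs_days, band_values[:, b])
               for b in range(band_values.shape[1])]
    return np.column_stack(columns)


# Two hypothetical bands observed every 5 days over an 11-day window.
obs = [0, 5, 10]
vals = [[0.10, 0.30], [0.20, 0.40], [0.30, 0.20]]
daily = upsample_to_daily(obs, vals, 11)
print(daily.shape)  # (11, 2)
```

A single global regression line per band would flatten seasonal variation, which is why this sketch fits lines locally between consecutive acquisitions.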
The normalized difference vegetation index (NDVI), which quantifies green vegetation by normalizing green-leaf scattering at near-infrared (NIR) wavelengths against chlorophyll absorption at red wavelengths, was calculated from Sentinel-2A band 8 (NIR) and band 4 (red). Combining the weather and satellite features yields 26 features over a total of 2460 daily values.
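With band 8 as NIR and band 4 as red, NDVI is (B8 − B4) / (B8 + B4), ranging from −1 to 1. A minimal NumPy sketch follows; the function name `ndvi` and the example reflectances are illustrative. In Google Earth Engine the same quantity can be computed server-side with `image.normalizedDifference(['B8', 'B4'])`.

```python
import numpy as np


def ndvi(b8, b4):
    """NDVI = (NIR - Red) / (NIR + Red) from Sentinel-2 band 8 (NIR) and band 4 (red)."""
    nir = np.asarray(b8, dtype=float)
    red = np.asarray(b4, dtype=float)
    denom = nir + red
    # Suppress 0/0 warnings on pixels with no signal and mark them as NaN instead.
    with np.errstate(divide="ignore", invalid="ignore"):
        out = (nir - red) / denom
    return np.where(denom == 0, np.nan, out)


print(ndvi([0.45, 0.30], [0.05, 0.10]))  # ≈ [0.8, 0.5]
```

Because NDVI is a ratio, it is unchanged by a common scale factor, so integer-scaled reflectances give the same result as unscaled ones.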
For model development, 20% of the sampled days (492 instances) were held out as the test set. A validation set was then drawn from 20% of the remaining 80% (394 of 1968 instances), leaving 1574 instances for training. The gradient boosted decision tree was built in Python using the GradientBoostingRegressor class from the scikit-learn library. Before training, the features and corresponding ground-truth values (% soil moisture) were normalized and randomly shuffled to ensure that the training and test sets were not biased.
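The two-stage shuffled split can be sketched with scikit-learn's `train_test_split`, which reproduces the instance counts reported above for 2460 days. The feature matrix and targets here are random placeholders with the stated dimensions (26 features), not the client's data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Placeholder data with the dimensions given in the text: 2460 days x 26 features.
X = rng.normal(size=(2460, 26))
y = rng.uniform(0.0, 50.0, size=2460)  # stand-in for % soil moisture

# Stage 1: hold out 20% of all days as the test set, shuffling first.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=0)
# Stage 2: hold out 20% of the remaining days as the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.2, shuffle=True, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 1574 394 492
```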
A grid search was used to find the hyperparameter values that minimize the mean absolute error (MAE) between the predicted and ground-truth percent soil moisture. The tuned hyperparameters were the learning rate, maximum tree depth, number of estimators, minimum number of samples per leaf, minimum number of samples per split, and subsample ratio. The search space for each hyperparameter was defined as
learning rate: {0.09, 0.1, 0.2}
maximum depth: {4, 5, 6}
number of estimators: {1800, 2000, 2200, 2400}
subsample: {0.5, 0.6, 0.7}
minimum samples per leaf: {1, 2, 3}
minimum samples per split: {2, 3, 4}.
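The grid search over the search space listed above can be sketched as an exhaustive loop that scores each candidate by MAE on the held-out validation set. The keys are the corresponding `GradientBoostingRegressor` argument names; the helper `grid_search_mae` is hypothetical, and the fixed `random_state` is an assumption added for reproducibility. Using `GridSearchCV` with `scoring="neg_mean_absolute_error"` would be a cross-validated alternative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import ParameterGrid

# Search space from the text, keyed by scikit-learn parameter names.
PARAM_GRID = {
    "learning_rate": [0.09, 0.1, 0.2],
    "max_depth": [4, 5, 6],
    "n_estimators": [1800, 2000, 2200, 2400],
    "subsample": [0.5, 0.6, 0.7],
    "min_samples_leaf": [1, 2, 3],
    "min_samples_split": [2, 3, 4],
}


def grid_search_mae(X_train, y_train, X_val, y_val, grid=PARAM_GRID, seed=0):
    """Return (best_params, best_mae): the candidate with the lowest validation MAE."""
    best_params, best_mae = None, np.inf
    for params in ParameterGrid(grid):
        model = GradientBoostingRegressor(random_state=seed, **params)
        model.fit(X_train, y_train)
        mae = mean_absolute_error(y_val, model.predict(X_val))
        if mae < best_mae:
            best_params, best_mae = params, mae
    return best_params, best_mae


print(len(ParameterGrid(PARAM_GRID)))  # 972 candidate models
```

With 972 candidates of up to 2400 trees each, the full search is computationally heavy; trying it on a reduced grid first is a practical check.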
The optimal parameters were found to be
learning rate: 0.09
maximum depth: 6
number of estimators: 2400
subsample: 0.7
minimum samples per leaf: 3
minimum samples per split: 4.
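The optimal parameters listed above can be plugged into a final model that is scored by MAE on the test set. This is a sketch: the helper `fit_and_score` is hypothetical, and we assume the reported 2.45% MAE is a test-set figure. If the target was normalized as described, predictions should be mapped back to % soil moisture before computing the MAE so that the result is comparable.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

# Optimal hyperparameters reported in the text, as scikit-learn arguments.
BEST_PARAMS = dict(
    learning_rate=0.09,
    max_depth=6,
    n_estimators=2400,
    subsample=0.7,
    min_samples_leaf=3,
    min_samples_split=4,
)


def fit_and_score(X_train, y_train, X_test, y_test, params=BEST_PARAMS, seed=0):
    """Fit the final model and return it together with its test-set MAE."""
    model = GradientBoostingRegressor(random_state=seed, **params)
    model.fit(X_train, y_train)
    # MAE = mean(|measured - predicted|), in the units of y.
    mae = mean_absolute_error(y_test, model.predict(X_test))
    return model, mae
```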
The results of the prediction framework are shown in Fig. 1, where the measured percent soil moisture values are plotted as black markers with grey 3% error bars (matching the nominal measurement error of soil moisture sensors) and the predicted percent soil moisture as a red line. The model captured the measured percent soil moisture with an MAE of 2.45%.
Figure 1: Observed % soil moisture values (black circles) with 3% error bars (grey line) and predicted % soil moisture from the gradient boosted decision tree (red line)
Figure 2 compares the predicted and measured values (blue circles). A linear regression line fit to the points in the least-squares sense has an R² value of 0.86, which exceeds the typical R² values of around 0.7 reported in the literature.
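The R² of a least-squares best-fit line through the predicted-versus-measured points can be computed as 1 − SS_res / SS_tot. The helper `linear_fit_r2` and the four sample points are illustrative only.

```python
import numpy as np


def linear_fit_r2(measured, predicted):
    """Fit predicted ≈ slope * measured + intercept by least squares; return (slope, intercept, R^2)."""
    x = np.asarray(measured, dtype=float)
    y = np.asarray(predicted, dtype=float)
    slope, intercept = np.polyfit(x, y, deg=1)  # coefficients, highest power first
    fitted = slope * x + intercept
    ss_res = np.sum((y - fitted) ** 2)    # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)  # total sum of squares
    return slope, intercept, 1.0 - ss_res / ss_tot


slope, intercept, r2 = linear_fit_r2([10, 20, 30, 40], [12, 19, 31, 38])
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r2:.3f}")  # slope=0.90, intercept=2.50, R^2=0.988
```

For a simple linear fit this R² equals the squared Pearson correlation. It is not the same as scikit-learn's `r2_score(measured, predicted)`, which scores deviation from the 1:1 line instead of from the fitted line.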
IDM Solutions also provided the client with the relative importance of each feature. Figure 3 shows the relative feature importance: month is the most informative feature and maximum wind speed is the least informative.
Figure 2: Predicted % soil moisture vs. measured % soil moisture (blue circles) with a linear regression best-fit line (R² = 0.86)
Figure 3: Relative feature importance for all features in the development of the gradient boosted decision tree
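A ranking like the one in Figure 3 can be read from a fitted model's `feature_importances_` attribute. The text does not say which importance measure was used, so this sketch assumes scikit-learn's impurity-based importances, which sum to 1. The helper `ranked_importances`, the toy data, and the feature names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor


def ranked_importances(model, feature_names):
    """Return (name, importance) pairs sorted from most to least informative."""
    importances = model.feature_importances_  # impurity-based importances
    order = np.argsort(importances)[::-1]
    return [(feature_names[i], float(importances[i])) for i in order]


# Toy data: only the first column drives the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=200)
model = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)

for name, imp in ranked_importances(model, ["signal", "noise_a", "noise_b"]):
    print(f"{name:10s} {imp:.3f}")
```

To express importances relative to the top feature, as a "relative importance" plot often does, divide each value by the maximum.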
Prediction Capability
Working with a small farm in Iowa, IDM Solutions created a seamless, custom ML-based framework that lets the client predict percent soil moisture using only readily available weather and satellite data. The client gained the ability to track and monitor regions of the farm for non-optimal soil moisture with little to no impact on existing infrastructure and material costs. The client can now use this framework to determine optimal sowing times for crops, identify drought-stricken or over-irrigated areas, and provide feedback to smart irrigation systems for precision farming, maximizing crop yield and profitability.