Overview

IDM Solutions partnered with a DoD-focused client to develop a custom computer vision AI model capable of detecting the boundary between water and land in overhead RGB imagery. The model has a small footprint, which allows the client to deploy it across several platforms of interest in an operational context. Because it uses only RGB images, it also reduces the need for specialized sensor suites and advanced processing algorithms. The segmentation model combines four UNET models with different backbones, with the ensemble weights determined by a metaheuristic search, and achieves a mean Intersection over Union (IoU) of 94% using only 2k RGB images.

The successful delivery of phase 1 of the project has provided the client with a baseline model for phase 2, which will focus on leveraging generative AI methods to develop synthetic data and create a more robust set of images for improved accuracy and performance in the presence of limited data.

Approach

Satellite images have become an indispensable source of information for various industries, from agriculture to urban planning to amphibious operations in the military. However, analyzing these images can be a challenging task, especially when it comes to segmenting them into meaningful regions. Further complications arise when there is limited training data for model development. Traditional segmentation methods often struggle to capture the complex and varied features present in satellite images, leading to suboptimal results.

To overcome this challenge, a new approach has emerged: using multiple segmentation algorithms and combining their outputs through ensemble learning. This approach takes advantage of the strengths of different algorithms and compensates for their weaknesses, resulting in more accurate and robust segmentations. To further optimize this process, particle swarm optimization (PSO) can be applied to find the best combination of segmentation algorithms for a given dataset.

Satellite Image Segmentation

Satellite image segmentation is a crucial task in remote sensing applications that aims to extract meaningful information from satellite images. Several image segmentation algorithms have been proposed in the literature to address this problem, each with its own strengths and weaknesses. However, no single algorithm is optimal for all types of images or applications. The choice of algorithm depends on factors such as the size and complexity of the image, the desired level of detail in the segmentation, and the computational resources available.

One approach to improve the accuracy of satellite image segmentation is to use ensemble learning, which combines multiple segmentation algorithms to produce a more robust and accurate segmentation. Ensemble learning has been shown to be effective in various machine learning applications, including image segmentation. One common ensemble learning approach is to combine the outputs of different algorithms using simple averaging or voting techniques. However, these methods may not always lead to optimal results, as they do not take into account the specific strengths and weaknesses of each algorithm.
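The two simple combination rules mentioned above can be sketched as follows. This is a minimal illustration rather than code from this effort: it assumes each model outputs a per-pixel water probability map, and the function names, the 0.5 threshold, and the toy values are hypothetical.

```python
import numpy as np

def average_ensemble(prob_maps, threshold=0.5):
    """Average the per-pixel probabilities of all models, then threshold."""
    stacked = np.stack(prob_maps, axis=0)  # shape: (n_models, H, W)
    return (stacked.mean(axis=0) >= threshold).astype(np.uint8)

def majority_vote(prob_maps, threshold=0.5):
    """Threshold each model first; a pixel is 1 if more than half the models agree."""
    votes = (np.stack(prob_maps, axis=0) >= threshold).astype(np.uint8)
    return (votes.sum(axis=0) * 2 > votes.shape[0]).astype(np.uint8)

# Toy 1 x 2 probability maps from three hypothetical models.
maps = [np.array([[0.9, 0.6]]), np.array([[0.8, 0.6]]), np.array([[0.7, 0.1]])]
avg_mask = average_ensemble(maps)   # second pixel: mean is about 0.43, so 0
vote_mask = majority_vote(maps)     # second pixel: two of three models vote 1
```

On the second pixel the two rules disagree, yet both treat every model as equally reliable.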

To overcome this limitation, optimization techniques such as particle swarm optimization (PSO) can be used to find the optimal combination of algorithms for a given image segmentation task. PSO is a metaheuristic optimization algorithm inspired by the social behavior of birds flocking or fish schooling. It has been successfully applied in various optimization problems, including feature selection and model selection in machine learning. By using PSO to optimize the combination of different segmentation algorithms, we can achieve a more accurate and robust segmentation of satellite images.

Capturing Various Size Features

The choice of backbone architecture plays a crucial role in how accurately a segmentation model detects features ranging from small to large in satellite images. VGG, Inception, EfficientNet, and ResNet are commonly used backbone architectures in the UNET framework. VGG is a simple convolutional neural network (CNN) architecture built from many stacked convolutional layers, each with a small receptive field. However, VGG has a high number of parameters and requires significant computational resources. Inception, on the other hand, applies multiple filter sizes in parallel to capture features at different scales, making it more efficient in terms of parameter count. Inception is also known for its ability to reduce overfitting, making it a good choice for datasets with limited samples.

EfficientNet is a newer architecture that is known for its superior performance in image classification tasks. It uses a compound scaling method to balance the trade-off between model complexity and accuracy. EfficientNet achieves high accuracy with fewer parameters, making it computationally efficient. ResNet, another widely used architecture, is known for its ability to train very deep networks. ResNet uses skip connections to mitigate the vanishing gradient problem, which can occur when training very deep neural networks. This allows ResNet to be deeper than other architectures without sacrificing performance.

Particle Swarm Optimization

Particle swarm optimization (PSO) is a stochastic search methodology in which n particles traverse a D-dimensional search space, mimicking the social learning of living societies. Each of the n particles represents a candidate solution to the optimization problem. The particles update their positions in the search space based on their current inertia and on two learning aspects: cognitive and social learning. For the cognitive aspect, each member of the group updates its position according to how well it has performed in the past with respect to a predefined objective function. For the social aspect, each member updates its position based on the group's best performance with respect to the same objective function. Mathematically, this can be expressed as:
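Assuming the standard inertia-weight form of PSO, since the exact variant used in this effort is not stated, the velocity and position updates for particle i at iteration t are:

```latex
\begin{aligned}
v_i^{t+1} &= w\, v_i^{t} + c_1 r_1 \left(p_i - x_i^{t}\right) + c_2 r_2 \left(g - x_i^{t}\right) \\
x_i^{t+1} &= x_i^{t} + v_i^{t+1}
\end{aligned}
```

Here x_i and v_i are the position and velocity of particle i, w is the inertia weight, p_i is the best position particle i has found so far (cognitive term), g is the best position found by the whole swarm (social term), c_1 and c_2 are the cognitive and social learning coefficients, and r_1 and r_2 are random numbers drawn uniformly from [0, 1] at each update. The symbol names are the conventional ones and are assumed here rather than taken from the original effort.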

Land/Water Segmentation

Figure 1 depicts the overall methodology used in this effort. First, the four UNET models were trained on the same set of images. Then, PSO was used to find the optimal weights for combining the four trained UNET models. Lastly, the optimally weighted ensemble of UNET models was applied to the test image dataset.

Figure 1: Land/water boundary detection model training pipeline
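The PSO weight search step of this pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this effort: the objective is taken to be the IoU of the thresholded weighted average on a single mask, the swarm size, iteration count, inertia, and learning coefficients are illustrative defaults, and the four "models" are simulated with synthetic noisy probability maps.

```python
import numpy as np

def normalize(weights):
    """Clip weights to be non-negative and scale them to sum to 1."""
    w = np.clip(weights, 0.0, None)
    total = w.sum()
    return w / total if total > 0 else np.full(len(w), 1.0 / len(w))

def ensemble_iou(weights, prob_maps, target, threshold=0.5):
    """IoU of the thresholded weighted average of the model outputs."""
    combined = np.tensordot(normalize(weights), prob_maps, axes=1)  # (H, W)
    pred = combined >= threshold
    union = np.logical_or(pred, target).sum()
    return np.logical_and(pred, target).sum() / union if union > 0 else 1.0

def pso_weights(prob_maps, target, n_particles=20, n_iters=50,
                inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Search for ensemble weights that maximize IoU; returns (weights, score)."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n_particles, prob_maps.shape[0]))  # one weight per model
    vel = np.zeros_like(pos)
    pbest = pos.copy()                                   # personal best positions
    pbest_score = np.array([ensemble_iou(p, prob_maps, target) for p in pos])
    best = pbest_score.argmax()
    gbest, gbest_score = pbest[best].copy(), pbest_score[best]
    for _ in range(n_iters):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Inertia term + cognitive term + social term.
        vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 0.0, 1.0)
        scores = np.array([ensemble_iou(p, prob_maps, target) for p in pos])
        improved = scores > pbest_score
        pbest[improved] = pos[improved]
        pbest_score[improved] = scores[improved]
        best = pbest_score.argmax()
        if pbest_score[best] > gbest_score:
            gbest, gbest_score = pbest[best].copy(), pbest_score[best]
    return normalize(gbest), float(gbest_score)

# Synthetic stand-ins for four trained models: the true mask plus Gaussian noise.
rng = np.random.default_rng(42)
target = rng.random((32, 32)) > 0.5
prob_maps = np.stack([
    np.clip(target + rng.normal(0.0, s, target.shape), 0.0, 1.0)
    for s in (0.2, 0.3, 0.4, 0.5)
])
weights, score = pso_weights(prob_maps, target)
```

In practice the objective would be averaged over a held-out set of images rather than a single mask, and the score could be replaced by whichever metric the ensemble is meant to optimize.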

A total of 2k images were used in this effort. The figure below shows two example satellite images with their corresponding masks, where water is shown in yellow and land in magenta.

Figure 2: Example satellite image with corresponding mask

The four backbones used in the UNET architecture were VGG19, InceptionV3, ResNet34, and EfficientNetB4. The images and masks were all resized to 512 x 512 pixels. Each UNET model used weights pretrained on the ImageNet image database.

Each UNET model was trained for 40 epochs using the Adam optimizer with a learning rate of 0.001 and a batch size of 8, with a sigmoid activation function. The loss function was a binary focal dice loss. Model performance was measured with an equally weighted linear combination of the IoU score and the F1 score.
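The composite metric can be sketched as follows for a single pair of binary masks. This is an illustration only: it assumes "equally weighted" means weights of 0.5 each, and the function names and the convention of scoring two empty masks as 1.0 are hypothetical choices.

```python
import numpy as np

def iou_score(pred, target):
    """Intersection over Union of two binary masks."""
    pred, target = np.asarray(pred, dtype=bool), np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    return np.logical_and(pred, target).sum() / union if union > 0 else 1.0

def f1_score(pred, target):
    """F1 (Dice) score: 2*TP / (2*TP + FP + FN)."""
    pred, target = np.asarray(pred, dtype=bool), np.asarray(target, dtype=bool)
    tp = np.logical_and(pred, target).sum()
    fp = np.logical_and(pred, ~target).sum()
    fn = np.logical_and(~pred, target).sum()
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 1.0

def composite_metric(pred, target):
    """Equally weighted combination of IoU and F1 (0.5 each, an assumption)."""
    return 0.5 * iou_score(pred, target) + 0.5 * f1_score(pred, target)

# Toy example: one true positive, one false positive, no false negatives.
pred = np.array([[1, 1, 0, 0]])
target = np.array([[1, 0, 0, 0]])
value = composite_metric(pred, target)  # IoU = 1/2, F1 = 2/3
```

For binary masks F1 is never lower than IoU, so the composite sits between the two scores.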

Two different examples are depicted in Figures 3 and 4. As both figures show, each model is able to capture the pixel-by-pixel land/water boundary. The EfficientNetB4 backbone yields the most accurate model with respect to the composite metric of mean IoU and F1 score, while the VGG19 backbone yields the least accurate model. The results are summarized in the table below.

By taking a weighted linear combination of the models' outputs using the weights shown in the table above, the ensemble of all four models achieves a mean IoU of 94%, which is 1% better than the most accurate individual model.

Figure 3: Example of land/water boundary detection on the satellite image shown in the top left image

Figure 4: Example of land/water boundary detection on the satellite image shown in the top left image