Advancements in Greenhouse Spike Detection: Leveraging Deep Learning and Attention Mechanisms for Enhanced Phenotypic Trait Analysis

Mar 19, 2024

Accurate extraction of phenotypic traits from image data is essential for cereal crop research, but spike detection in greenhouses is challenging due to the environmental and physical similarities between spikes and leaves. Recent efforts include increasing image resolution and feature dimensionality, and developing neural networks such as SpikeSegNet to improve spike detection. However, these methods struggle to accurately localise small spikes, and further advances in neural network tuning and novel detection models are needed to overcome these challenges efficiently.

In January 2024, Plant Phenomics published a research article entitled “High-throughput spike detection in greenhouse cultivated grain crops with attention mechanisms based deep learning models”.

In this study, three deep neural networks (DNNs), namely FRCNN, FRCNN-A, and Swin Transformer, were implemented and trained for spike detection in cereal crops. The networks were optimized using the SGD optimizer, and the length of training varied between the models: FRCNN required 900 to 1200 epochs, FRCNN-A 800 to 1000 epochs, and Swin Transformer 2500 to 3000 epochs. A dynamic learning rate strategy was used to improve model convergence. The trained models proved effective in detecting spikes of varying difficulty, particularly within dense leaf mass.
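The article does not say which dynamic learning rate strategy was used. As a rough illustration of the general idea, the sketch below pairs a plain SGD update with a linear warmup followed by step decay. The function names, the base rate, the warmup length, and the milestone epochs are all hypothetical values chosen for the example, not settings from the study.

```python
def learning_rate(epoch, base_lr=0.01, warmup_epochs=5,
                  milestones=(600, 900), gamma=0.1):
    """Return the learning rate for a 0-indexed epoch.

    All default values are illustrative assumptions, not the paper's settings.
    """
    if epoch < warmup_epochs:
        # Linear warmup: ramp from base_lr / warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Step decay: multiply by gamma once for every milestone already reached.
    passed = sum(1 for m in milestones if epoch >= m)
    return base_lr * gamma ** passed


def sgd_step(weights, grads, lr):
    """One plain SGD update (no momentum): w <- w - lr * g."""
    return [w - lr * g for w, g in zip(weights, grads)]


if __name__ == "__main__":
    weights = [1.0, 2.0]
    grads = [0.5, -1.0]  # placeholder gradients for the demonstration
    for epoch in (0, 4, 5, 600, 900):
        lr = learning_rate(epoch)
        print(epoch, lr, sgd_step(weights, grads, lr))
```

Lowering the rate late in training lets the optimizer take large steps early and settle more carefully later. This is one common reason to vary the rate over hundreds or thousands of epochs.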

The results showed that the Swin Transformer outperformed the other models in terms of accuracy without data transformation or augmentation. The FRCNN-A model, augmented with an attention module, showed significant improvement over the original FRCNN, highlighting the potential for further improvements in the FRCNN-A architecture. The ability of the attention module to capture the hierarchical context of regions of interest was particularly noted for its effectiveness in detecting challenging spike patterns.
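The internals of the FRCNN-A attention module are not described here. As a generic illustration of what an attention step computes, the sketch below applies scaled dot-product self-attention to a handful of region feature vectors. It omits the learned query, key, and value projections that a real module would have, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention without learned projections.

    features: list of equal-length vectors, one per image region.
    Returns one context-mixed vector per region, same shape as the input.
    """
    dim = len(features[0])
    mixed = []
    for query in features:
        # Similarity of this region to every region, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in features
        ]
        weights = softmax(scores)
        # Weighted average of all region features.
        mixed.append([
            sum(w * value[j] for w, value in zip(weights, features))
            for j in range(dim)
        ])
    return mixed


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy feature vectors
    for row in self_attention(regions):
        print(row)
```

Each output vector is a weighted blend of all regions, with more weight on regions whose features resemble the query. This is how attention lets a detector use surrounding context when a spike is partly hidden.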

Training on nine datasets from two phenotyping facilities showed that all models improved in accuracy as the original image content in the training sets increased. The Swin Transformer demonstrated the highest mean average precision (mAP) across different training sets, indicating its superior ability to extract features and detect spikes. However, the study also highlighted that while the Swin Transformer provides high accuracy, the FRCNN-A provides a more efficient and faster training alternative, especially beneficial for datasets with similar characteristics.
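For readers unfamiliar with the metric, the sketch below shows one simplified way to compute average precision (AP) for a single class on a single image. Detections are matched to ground-truth boxes by intersection over union (IoU). The 0.5 IoU threshold, the greedy matching rule, and the non-interpolated averaging are assumptions made for the example; benchmark protocols differ in these details, and the article does not state which one the study used. mAP is then the mean of such AP values, for example over classes or over several IoU thresholds.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections, ground_truth, iou_threshold=0.5):
    """Non-interpolated AP for one class.

    detections: list of (confidence, box); ground_truth: list of boxes.
    """
    if not ground_truth:
        return 0.0
    matched = set()
    hits = []
    # Visit detections from most to least confident.
    for _, box in sorted(detections, key=lambda d: d[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth box can be matched only once
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        is_hit = best_idx is not None and best_iou >= iou_threshold
        if is_hit:
            matched.add(best_idx)
        hits.append(is_hit)
    # Sum precision at every rank where a true positive occurs.
    true_positives, total = 0, 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            true_positives += 1
            total += true_positives / rank
    return total / len(ground_truth)


if __name__ == "__main__":
    truth = [(0, 0, 10, 10), (20, 20, 30, 30)]  # toy spike boxes
    found = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)),
             (0.7, (20, 20, 30, 30))]
    print(average_precision(found, truth))
```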
