Enhanced Target Detection: Fusion of SPD and CoTC3 Within YOLOv5 Framework

J Zhang, S Li, T Long - IEEE Transactions on Geoscience and Remote Sensing, 2024 - ieeexplore.ieee.org
High-definition remote sensing image recognition is of great significance and can be widely used in urban planning, land use, and other fields. Existing models focus excessively on texture and fine-grained image features during training and ignore broader contextual information in the feature map, which reduces the model's ability to recognize targets. To improve detection accuracy and robustness, we introduce an enhanced version of the YOLOv5 framework tailored for remote sensing imagery characterized by small targets and reduced resolution. The model's perceptual capacity is augmented through the inclusion of a space-to-depth (SPD) element, alongside dilated (atrous) and depth-wise separable convolutions, to better capture target attributes. Furthermore, we introduce the contextual transformer-concentrated comprehensive convolution (CoTC3) module, integrated into YOLOv5's core architecture. This module enables the model to exploit rich contextual information among neighboring keys, yielding better feature representations and thereby improving detection accuracy. The loss function is also refined to further bolster efficacy. Evaluated across three datasets encompassing 34 target categories and a total of 382,221 objects, the revised model exhibits notable gains in detection accuracy and robustness. Notably, accuracy in identifying vehicles and bridges improves by 10.1% and 11.3%, respectively, and the overall accuracy rates rise to 93.5% (up 2.3%), 88.1% (up 2.9%), and 71.2% (up 6.3%) compared with the baseline model.
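The space-to-depth (SPD) element mentioned in the abstract is a lossless rearrangement that trades spatial resolution for channel depth, so small-target detail survives downsampling. The following is a minimal numpy sketch of that rearrangement only, not the authors' implementation; the block size of 2 and the channels-last (H, W, C) layout are assumptions for illustration.

```python
import numpy as np

def space_to_depth(x: np.ndarray, block: int = 2) -> np.ndarray:
    """Rearrange non-overlapping block x block spatial patches into channels.

    (H, W, C) -> (H // block, W // block, C * block * block), losslessly.
    """
    h, w, c = x.shape
    assert h % block == 0 and w % block == 0, "H and W must be divisible by block"
    # Split each spatial axis into (blocks, within-block offset).
    x = x.reshape(h // block, block, w // block, block, c)
    # Group the two within-block axes next to the channel axis.
    x = x.transpose(0, 2, 1, 3, 4)  # (H/b, W/b, b, b, C)
    # Fold the block offsets into the channel dimension.
    return x.reshape(h // block, w // block, block * block * c)

# Example: a 4x4 image with 3 channels becomes 2x2 with 12 channels.
x = np.arange(4 * 4 * 3, dtype=np.float32).reshape(4, 4, 3)
y = space_to_depth(x, block=2)
print(y.shape)  # (2, 2, 12)
```

Because every input value is kept (only moved into channels), a subsequent stride-1 convolution can read fine-grained detail that an ordinary strided convolution or pooling layer would discard.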