Abstract
The identification of minuscule objects in remote sensing data presents a formidable challenge in computer vision, where objects may occupy a mere handful of pixels. The lack of unique shape features in such small objects hinders the effectiveness of established object detection algorithms. Remote sensing of small object detection plays an important role in areas such as environmental monitoring and estimating agricultural production. To address this challenge, in this study, we introduce YOLO-SS, an enhanced version of the YOLO algorithm tailored specifically for small object detection in remote sensing imagery. YOLO-SS incorporates an optimized backbone network, a restructured loss function and an asymmetric training sample weighting strategy. These improvements prioritize the model’s attention toward high-quality positive samples of small objects while reducing sensitivity to complex backgrounds. Evaluation on the AI-TOD dataset demonstrates YOLO-SS’s exceptional performance, achieving an AP50 score of 0.535, surpassing YOLOv6L by 13.4% and other popular object detection algorithms. Our findings offer a novel pathway for advancing small object detection capabilities in diverse remote sensing applications.





Similar content being viewed by others
Data availability
The AI-TOD dataset is publicly available. The AI-TOD dataset is ethical, and the paper is not conflict of interest. The link to the dataset is https://github.com/jwwangchn/AI-TOD. The datasets used and analyzed during the current study are available from the corresponding author or first author on reasonable request.
References
Deng Z, Sun H, Zhou S, Zhao J, Lei L, Zou H (2018) Multi-scale object detection in remote sensing imagery with convolutional neural networks. ISPRS J Photogramm Remote Sens 145:3–22
Zhang W, Wang S, Thachan S, Chen J, Qian Y (2018) Deconv r-cnn for small object detection on remote sensing images. In: IGARSS 2018-2018 IEEE International Geoscience and Remote Sensing Symposium, pp. 2483–2486 . IEEE
Yu J, Jiang Y, Wang Z, Cao Z, Huang T (2016) Unitbox: an advanced object detection network. In: proceedings of the 24th ACM International Conference on Multimedia, pp. 516–520
Ma W, Wu Y, Cen F, Wang G (2020) Mdfn: multi-scale deep feature learning network for object detection. Pattern Recognit 100:107149
Redmon J, Divvala S, Girshick R, Farhadi A (2016) You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788
Redmon J, Farhadi A (2017) Yolo9000: better, faster, stronger. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7263–7271
Redmon J, Farhadi A (2018) Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767
Bochkovskiy, A, Wang, C-Y, Liao, H-YM.: Yolov4: optimal speed and accuracy of object detection. arXiv preprint arXiv:2004.10934 (2020)
Mahendrakar T, White RT, Wilde M, Kish B, Silver I (2021) Real-time satellite component recognition with yolo-v5. In: Small Satellite Conference
Li C, Li L, Jiang H, Weng K, Geng Y, Li L, Ke Z, Li Q, Cheng M, Nie W, et al. (2022) Yolov6: a single-stage object detection framework for industrial applications. arXiv preprint arXiv:2209.02976
Wang C-Y, Bochkovskiy A, Liao H-YM (2023) Yolov7: trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7464–7475
Liu B, Wang M, Foroosh H, Tappen M, Pensky M (2015) Sparse convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 806–814
Lin T-Y, Maire M, Belongie S, Hays J, Perona P, Ramanan D, Dollár P, Zitnick CL. (2014) Microsoft coco: Common objects in context. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pp. 740–755. Springer
Chen C, Liu M-Y, Tuzel O, Xiao J (2017) R-cnn for small object detection. In: Computer Vision–ACCV 2016: 13th Asian Conference on Computer Vision, Taipei, Taiwan, November 20-24, 2016, Revised Selected Papers, Part V 13, pp. 214–230. Springer
Zhang H, Li M, Miao D, Pedrycz W, Wang Z, Jiang M (2023) Construction of a feature enhancement network for small object detection. Pattern Recognit 143:109801
Graham S, Epstein D, Rajpoot N (2019) Rota-net: rotation equivariant network for simultaneous gland and lumen segmentation in colon histology images. In: Digital Pathology: 15th European Congress, ECDP 2019, Warwick, UK, April 10–13, 2019, Proceedings 15, pp. 109–116. Springer
Kisantal M, Wojna Z, Murawski J, Naruniec J, Cho K (2019) Augmentation for small object detection. arXiv preprint arXiv:1902.07296
Kim J-H, Hwang Y (2022) Gan-based synthetic data augmentation for infrared small target detection. IEEE Trans Geosci Remote Sens 60:1–12
Wang K, Liew JH, Zou Y, Zhou D, Feng J (2019) Panet: Few-shot image semantic segmentation with prototype alignment. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9197–9206
Chen J, Mai H, Luo L, Chen X, Wu K (2021) Effective feature fusion network in bifpn for small object detection. In: 2021 IEEE International Conference on Image Processing (ICIP), pp. 699–703. IEEE
Qiao S, Chen L-C, Yuille A (2021) Detectors: detecting objects with recursive feature pyramid and switchable atrous convolution. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10213–10224
Li Y, Chen Y, Wang N, Zhang Z (2019) Scale-aware trident networks for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6054–6063
Fan D, Liu D, Chi W, Liu X, Li Y (2020) Improved ssd-based multi-scale pedestrian detection algorithm. In: advances in 3D Image and Graphics Representation, Analysis, Computing and Information Technology: algorithms and Applications, Proceedings of IC3DIT 2019, Volume 2, pp. 109–118. Springer
Singh B. Davis LS (2018) An analysis of scale invariance in object detection snip. In: proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3578–3587
Singh B. Najibi M. Davis LS (2018) Sniper: efficient multi-scale training. Advances in neural information processing systems 31
Li Y, Chen Y, Wang N, Zhang Z (2019) Scale-aware trident networks for object detection. In: proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 6054–6063
Bai X, Bi Y (2018) Derivative entropy-based contrast measure for infrared small-target detection. IEEE Trans Geosci Remote Sens 56(4):2452–2466
Huang S, Liu Y, He Y, Zhang T, Peng Z (2019) Structure-adaptive clutter suppression for infrared small target detection: chain-growth filtering. Remote Sens 12(1):47
Liang X, Zhang J, Zhuo L, Li Y, Tian Q (2019) Small object detection in unmanned aerial vehicle images using feature fusion and scaling-based single shot detector with spatial context analysis. IEEE Trans Circuits Syst Video Technol 30(6):1758–1770
Lu X, Ji J, Xing Z, Miao Q (2021) Attention and feature fusion ssd for remote sensing object detection. IEEE Trans Instrument Measure 70:1–9
Chen F, Gao C, Liu F, Zhao Y, Zhou Y, Meng D, Zuo W (2022) Local patch network with global attention for infrared small target detection. IEEE Trans Aerospace Electron Syst 58(5):3979–3991
Hong M, Li S, Yang Y, Zhu F, Zhao Q, Lu L (2021) Sspnet: scale selection pyramid network for tiny person detection from uav images. IEEE Geosci Remote Sens Lett 19:1–5
Yang T-Y, Chen Y-T, Lin Y-Y, Chuang Y-Y (2019) Fsa-net: Learning fine-grained structure aggregation for head pose estimation from a single image. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1087–1096
Du L, Wu W, Li C (2024) Super-resolution-assisted feature refined extraction for small objects in remote sensing images. In: International Conference on Multimedia Modeling, pp. 296–309. Springer
Wu J, Xu S (2021) From point to region: accurate and efficient hierarchical small object detection in low-resolution remote sensing images. Remote Sens 13(13):2620
Xu C, Wang J, Yang W, Yu H, Yu L, Xia G-S (2022) Rfla: Gaussian receptive field based label assignment for tiny object detection. In: European Conference on Computer Vision, pp. 526–543. Springer
Zheng Z, Wang P, Liu W, Li J, Ye R, Ren D (2020) Distance-iou loss: faster and better learning for bounding box regression. In: proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, pp. 12993–13000
Zhang H, Wang Y, Dayoub F, Sunderhauf N (2021) Varifocalnet: an iou-aware dense object detector. In: proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8514–8523
Wang J. Xu C. Yang W. Yu L (2021) A normalized gaussian wasserstein distance for tiny object detection. arXiv preprint arXiv:2110.13389
Wang J. Yang W. Guo H. Zhang R. Xia G-S (2021) Tiny object detection in aerial images. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 3791–3798. IEEE
Chen X, Liang C, Huang D, Real E, Wang K, Pham H, Dong X, Luong T, Hsieh C-J, Lu Y, et al. (2024) Symbolic discovery of optimization algorithms. Advances in Neural Information Processing Systems 36
Liu H-I, Tseng Y-W, Chang K-C, Wang P-J, Shuai H-H, Cheng W-H (2024) A denoising fpn with transformer r-cnn for tiny object detection. IEEE Transactions on Geoscience and Remote Sensing
Sunkara R, Luo T (2022) No more strided convolutions or pooling: A new cnn building block for low-resolution images and small objects. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 443–459. Springer
Jeong J, Park H, Kwak N (2017) Enhancement of ssd by concatenating feature maps for object detection. arXiv preprint arXiv:1705.09587
Yang Z, Liu S, Hu H, Wang L, Lin S (2019) Reppoints: Point set representation for object detection. In: proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9657–9666
Tian Z, Shen C, Chen H, He, T (2019) Fcos: Fully convolutional one-stage object detection. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9626–9635. 10.1109/ICCV.2019.00972
Zhou X, Wang D, Krähenbühl P (2019) Objects as points. arXiv preprint arXiv:1904.07850
Lin T-Y, Goyal P, Girshick R, He K, Dollár P (2017) Focal loss for dense object detection. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988
Zhang S, Chi C, Yao Y, Lei Z, Li SZ (2020) dging the gap between anchor-based and anchor-free detection via adaptive training sample selection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 9759–9768
Rezatofighi H, Tsoi N, Gwak J, Sadeghian A, Reid I, Savarese S (2019) Generalized intersection over union: A metric and a loss for bounding box regression. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 658–666
Cai Z, Vasconcelos N (018) Cascade r-cnn: Delving into high quality object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6154–6162
Qiao S, Chen L, Yuille A (2020) b16: Detecting objects with recursive feature pyramid and switchable atrous convolution. CoRR
Wang C, Yeh I, Liao H (2018) You only learn one representation: Unified network for multiple tasks. arXiv preprint arXiv:2105.04206
Author information
Authors and Affiliations
Contributions
Qiang Tang assisted with methodology, algorithm, writing and editing, Chang Shang helped with algorithm and writing, Kai Yang, Yuan Tian and Shibin Zhao carried out ablation study, and Wei Hao, Xubin Feng and Meilin Xie were responsible for financial support, laboratory equipment and experiment guidance.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tang, Q., Su, C., Tian, Y. et al. YOLO-SS: optimizing YOLO for enhanced small object detection in remote sensing imagery. J Supercomput 81, 303 (2025). https://doi.org/10.1007/s11227-024-06765-8
Accepted:
Published:
DOI: https://doi.org/10.1007/s11227-024-06765-8