Abstract
This paper addresses the challenge of efficiently detecting densely arranged, unordered items under varying illumination. Specifically, a novel convolutional neural network-based method for item detection, recognition, and classification is proposed, termed the Self-Attention and Concatenation-Based Detector (ACDet). In a benchmark pharmaceutical case study, rapid and accurate detection of pharmaceutical package contours is achieved, enabling automatic and fast verification of both the quantity and types of pharmaceuticals during distribution. At the input stage, a combined image augmentation method improves the detection model's ability to learn the appearance features of items from multiple angles. Building on the YOLOv8 model, the integrated C2F with Attention (C2F-A) computational module applies multidimensional self-attention reinforcement to the outputs of multiple gradient streams. The designed Weighted Concatenation (WConcat) module self-learns to weight and concatenate multi-level feature maps, enhancing the model's cognitive capability. Simulation experiments are conducted to determine the optimal stage at which to apply each module, and the proposed ACDet is compared with several state-of-the-art YOLO-architecture models on the benchmark Comprehensive Pharmaceutical Package Dataset (CPPD). ACDet achieves 81.0% mAP and 79.5% Smooth mAP on CPPD, outperforming the other models by an average of 5.5% to 16.6%; on public datasets, the results are 52.2% and 51.0%, respectively. The impact of applying C2F-A at different stages on performance is also tested, leading to the conclusion that the WConcat module does not require spatial attention. Finally, in zero-shot testing, the verification success rate reaches 99.91%. Our work shows that the proposed ACDet can overcome many challenges in complex object detection scenarios, enhancing robustness while maintaining a lightweight design.
The proposed model can serve as a new benchmark.
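The abstract does not give WConcat's exact formulation, but its description, learned per-branch weights applied before concatenating multi-level feature maps, resembles BiFPN-style normalized fusion. The sketch below is a minimal NumPy illustration under that assumption; the function name `wconcat`, the ReLU-plus-normalization scheme, and the toy feature shapes are all illustrative, not the authors' implementation.

```python
import numpy as np

def wconcat(features, weights, eps=1e-4):
    """Illustrative weighted concatenation of multi-level features.

    features: list of arrays shaped (C_i, H, W), already resized to a
              common spatial size.
    weights:  one learnable scalar per input branch; kept non-negative
              with a ReLU, then normalized so they sum to ~1.
    """
    w = np.maximum(weights, 0.0)           # ReLU keeps weights non-negative
    w = w / (w.sum() + eps)                # fast normalized fusion
    scaled = [wi * f for wi, f in zip(w, features)]
    return np.concatenate(scaled, axis=0)  # concatenate along channels

# Toy example: three feature levels with different channel counts
f1 = np.ones((2, 4, 4))
f2 = np.ones((3, 4, 4))
f3 = np.ones((1, 4, 4))
fused = wconcat([f1, f2, f3], np.array([1.0, 2.0, 1.0]))
print(fused.shape)  # (6, 4, 4)
```

In a trained network the scalars in `weights` would be parameters updated by backpropagation, letting the model learn how strongly each feature level should contribute before concatenation.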
Data Availability
No datasets were generated or analysed during the current study.
Abbreviations
- ACDet: Self-Attention and Concatenation-Based Detector
- C2F-A: C2F with Attention
- WConcat: Weighted Concatenation
- CPPD: Comprehensive Pharmaceutical Package Dataset
- CNNs: Convolutional Neural Networks
- VGG: Visual Geometry Group
- FPN: Feature Pyramid Network
- PAN: Path Aggregation Network
- BiFPN: Bi-directional Feature Pyramid Network
- R-CNN: Region-based Convolutional Neural Network
- SS: Selective Search
- ROI: Region of Interest
- RPN: Region Proposal Network
- SSD: Single Shot MultiBox Detector
- FCOS: Fully Convolutional One-Stage Object Detection
- OBB: Oriented Bounding Boxes
- HSV: Hue, Saturation, Value
- SOTA: State of the Art
- C2F: Coarse-to-Fine
- ELAN: Efficient Layer Aggregation Networks
- NMS: Non-Maximum Suppression
- En-CBSA: Enhanced Convolutional Block Self-Attention Mechanism
- PAFPN: Path Aggregation Feature Pyramid Network
- IoU: Intersection over Union
- DFL: Distribution Focal Loss
- BCE: Binary Cross-Entropy
Acknowledgements
We extend our heartfelt thanks to Lizhu Chen and Weisi Xie for their invaluable suggestions and guidance throughout the preparation of this article.
Funding
This work was supported by the research on the construction and application of an intelligent outpatient medical record integration platform, the Sichuan Science and Technology Program (Grant No. 24SYSX0210); the Natural Science Foundation of Xinjiang Uygur Autonomous Region (Grant No. 2023D01A63); and the research on a smart medical system (Grant No. 211282) from the UESTC-Houngfuh Joint Laboratory of Smart Logistics.
Author information
Authors and Affiliations
Contributions
Writing—original draft preparation: C.L., T.J., J.W., and L.Y.; writing—review and editing: L.Y., M.H., and F.S.; supervision: H.W. and L.G. All authors have read and agreed to the published version of the manuscript.
Corresponding author
Ethics declarations
Ethical Approval
This article does not contain any studies with human or animal subjects performed by any of the authors.
Informed Consent
Informed consent was not required as no humans or animals were involved.
Conflict of Interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, L., Yang, L., Jie, T. et al. Enhanced Self-Attention-Based Rapid CNN for Detecting Dense Objects in Varying Illumination. Cogn Comput 17, 26 (2025). https://doi.org/10.1007/s12559-024-10376-z
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s12559-024-10376-z