Abstract
This paper addresses the challenge of efficiently detecting densely arranged, unordered items under varying illumination. Specifically, a novel convolutional neural network-based method for item detection, recognition, and classification is proposed, termed the Self-Attention and Concatenation-Based Detector (ACDet). In a benchmark pharmaceutical case study, rapid and accurate detection of pharmaceutical package contours is achieved, enabling automatic and fast verification of both the quantity and types of pharmaceuticals during distribution. At the input stage, a combined image augmentation method improves the detection model's ability to learn the appearance features of items from multiple angles. Building on the YOLOv8 model, the integrated C2F with Attention (C2F-A) computational module applies multidimensional self-attention reinforcement to the outputs of multiple gradient streams. The designed Weighted Concatenation (WConcat) module self-learns to weight and concatenate multi-level feature maps, enhancing the model's cognitive capability. Simulation experiments are conducted to determine the optimal stage at which to apply each module, and the proposed ACDet is compared with several state-of-the-art YOLO-architecture models on the benchmark Comprehensive Pharmaceutical Package Dataset (CPPD). ACDet achieves 81.0% mAP and 79.5% Smooth mAP on CPPD, outperforming the other models by an average of 5.5% to 16.6%; on public datasets, the results are 52.2% and 51.0%, respectively. The impact of applying C2F-A at different stages on performance is also tested, leading to the conclusion that the WConcat module does not require spatial attention. Finally, in zero-shot testing, the verification success rate reaches 99.91%. Our work shows that the proposed ACDet can overcome many challenges in complex object detection scenarios, enhancing robustness while maintaining a lightweight design.
The proposed model can serve as a new benchmark.
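The abstract does not give WConcat's exact formulation, but its description, learned per-branch weights applied before concatenating multi-level feature maps, resembles BiFPN-style normalized fusion. The sketch below is a minimal NumPy illustration under that assumption; the function name `wconcat`, the ReLU-plus-normalization scheme, and the toy feature shapes are all illustrative, not the authors' implementation.

```python
import numpy as np

def wconcat(features, weights, eps=1e-4):
    """Illustrative weighted concatenation of multi-level features.

    features: list of arrays shaped (C_i, H, W), already resized to a
              common spatial size.
    weights:  one learnable scalar per input branch; kept non-negative
              with a ReLU, then normalized so they sum to ~1.
    """
    w = np.maximum(weights, 0.0)           # ReLU keeps weights non-negative
    w = w / (w.sum() + eps)                # fast normalized fusion
    scaled = [wi * f for wi, f in zip(w, features)]
    return np.concatenate(scaled, axis=0)  # concatenate along channels

# Toy example: three feature levels with different channel counts
f1 = np.ones((2, 4, 4))
f2 = np.ones((3, 4, 4))
f3 = np.ones((1, 4, 4))
fused = wconcat([f1, f2, f3], np.array([1.0, 2.0, 1.0]))
print(fused.shape)  # (6, 4, 4)
```

In a trained network the scalars in `weights` would be parameters updated by backpropagation, letting the model learn how strongly each feature level should contribute before concatenation.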
Data Availability
No datasets were generated or analysed during the current study.
Abbreviations
- ACDet: Self-Attention and Concatenation-Based Detector
- C2F-A: C2F with Attention
- WConcat: Weighted Concatenation
- CPPD: Comprehensive Pharmaceutical Package Dataset
- CNNs: Convolutional Neural Networks
- VGG: Visual Geometry Group
- FPN: Feature Pyramid Network
- PAN: Path Aggregation Network
- BiFPN: Bi-directional Feature Pyramid Network
- R-CNN: Region-based Convolutional Neural Network
- SS: Selective Search
- ROI: Region of Interest
- RPN: Region Proposal Network
- SSD: Single Shot MultiBox Detector
- FCOS: Fully Convolutional One-Stage Object Detection
- OBB: Oriented Bounding Boxes
- HSV: Hue, Saturation, Value
- SOTA: State of the Art
- C2F: Coarse-to-Fine
- ELAN: Efficient Layer Aggregation Networks
- NMS: Non-Maximum Suppression
- En-CBSA: Enhanced Convolutional Block Self-Attention Mechanism
- PAFPN: Path Aggregation Feature Pyramid Network
- IoU: Intersection over Union
- DFL: Distribution Focal Loss
- BCE: Binary Cross-Entropy
Acknowledgements
We extend our heartfelt thanks to Lizhu Chen and Weisi Xie for their invaluable suggestions and guidance throughout the preparation of this article.
Funding
This work was supported by the research on the construction and application of an intelligent outpatient medical record integration platform, the Sichuan Science and Technology Program (Grant No. 24SYSX0210); the Natural Science Foundation of Xinjiang Uygur Autonomous Region (Grant No. 2023D01A63); and the research on a smart medical system (Grant No. 211282) from the UESTC-Houngfuh Joint Laboratory of Smart Logistics.
Author information
Authors and Affiliations
Contributions
Writing—original draft preparation: C.L., T.J., J.W., and L.Y.; writing—review and editing: L.Y., M.H., and F.S.; supervision: H.W. and L.G. All authors have read and agreed to the published version of the manuscript.
Corresponding author
Ethics declarations
Ethical Approval
This article does not contain any studies with human or animal subjects performed by any of the authors.
Informed Consent
Informed consent was not required as no humans or animals were involved.
Conflict of Interest
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, L., Yang, L., Jie, T. et al. Enhanced Self-Attention-Based Rapid CNN for Detecting Dense Objects in Varying Illumination. Cogn Comput 17, 26 (2025). https://doi.org/10.1007/s12559-024-10376-z
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s12559-024-10376-z