
Enhanced Self-Attention-Based Rapid CNN for Detecting Dense Objects in Varying Illumination

Research article · Published in Cognitive Computation

Abstract

This paper addresses the challenge of efficiently detecting densely arranged, unordered items under varying illumination. Specifically, a novel convolutional neural network-based method, termed the Self-Attention and Concatenation-Based Detector (ACDet), is proposed for item detection, recognition, and classification. In a benchmark pharmaceutical case study, rapid and accurate detection of pharmaceutical package contours enables automatic, fast verification of both the quantity and the types of pharmaceuticals during distribution. At the input stage, a combined image augmentation method improves the detection model's ability to learn the appearance features of items from multiple angles. Building on the YOLOv8 model, a C2F with Attention (C2F-A) computational module is integrated, applying multidimensional self-attention reinforcement to the outputs of multiple gradient streams. The designed Weighted Concatenation (WConcat) module learns weights for concatenating multi-level feature maps, enhancing the model's cognitive capability. Simulation experiments were conducted to determine the optimal stage at which to apply each module, and the proposed ACDet was compared with several state-of-the-art YOLO-architecture models on the benchmark Comprehensive Pharmaceutical Package Dataset (CPPD). ACDet achieved 81.0% mAP and 79.5% Smooth mAP on CPPD, outperforming the other models by 5.5% to 16.6% on average; on public datasets, the corresponding results were 52.2% and 51.0%. The impact of applying C2F-A at different stages was also tested, leading to the conclusion that the WConcat module does not require spatial attention. Finally, in zero-shot testing, the verification success rate reached 99.91%. Our work shows that the proposed ACDet overcomes many challenges in complex object detection scenarios, enhancing robustness while maintaining a lightweight design, and can serve as a new benchmark.
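To make the abstract's two architectural ideas concrete, the following is a minimal PyTorch sketch: a learnable weighted concatenation in the spirit of WConcat (per-branch scalar weights, normalized before channel-wise concatenation, similar to BiFPN's fast normalized fusion) and a simple channel-attention stand-in for the self-attention reinforcement attributed to C2F-A. All class names, shapes, and the attention design are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention; a simple stand-in
    for the multidimensional self-attention the abstract attributes to
    C2F-A (the exact design here is an assumption)."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3))).view(b, c, 1, 1)  # global pool -> per-channel weights
        return x * w  # reweight channels

class WConcat(nn.Module):
    """Weighted concatenation: one learnable scalar per incoming feature
    map, normalized to sum to ~1 before channel-wise concatenation."""
    def __init__(self, num_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.weights = nn.Parameter(torch.ones(num_inputs))
        self.eps = eps

    def forward(self, feats):
        w = F.relu(self.weights)      # keep branch weights non-negative
        w = w / (w.sum() + self.eps)  # normalize across branches
        return torch.cat([wi * f for wi, f in zip(w, feats)], dim=1)

# Toy usage: attend over one level, then fuse two levels of equal H x W.
p3 = ChannelAttention(128)(torch.randn(1, 128, 40, 40))
p4_up = torch.randn(1, 128, 40, 40)         # e.g., an upsampled deeper level
fused = WConcat(num_inputs=2)([p3, p4_up])
print(fused.shape)                          # torch.Size([1, 256, 40, 40])
```

In ACDet itself, the attention reinforcement is described as acting on the outputs of the C2F gradient streams and the weighted fusion on multi-level neck features; the sketch above only illustrates the fusion arithmetic under those assumptions.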


Data Availability

No datasets were generated or analysed during the current study.

Abbreviations

ACDet: Self-Attention and Concatenation-Based Detector
C2F-A: C2F with Attention
WConcat: Weighted Concatenation
CPPD: Comprehensive Pharmaceutical Package Dataset
CNNs: Convolutional Neural Networks
VGG: Visual Geometry Group
FPN: Feature Pyramid Network
PAN: Path Aggregation Network
BiFPN: Bi-directional Feature Pyramid Network
R-CNN: Region-based Convolutional Neural Network
SS: Selective Search
ROI: Region of Interest
RPN: Region Proposal Network
SSD: Single Shot MultiBox Detector
FCOS: Fully Convolutional One-Stage Object Detection
OBB: Oriented Bounding Boxes
HSV: Hue, Saturation, Value
SOTA: State of the Art
C2F: Coarse-to-Fine
ELAN: Efficient Layer Aggregation Networks
NMS: Non-Maximum Suppression
En-CBSA: Enhanced Convolutional Block Self-Attention Mechanism
PAFPN: Path Aggregation Feature Pyramid Network
IoU: Intersection over Union
DFL: Distribution Focal Loss
BCE: Binary Cross-Entropy


Acknowledgements

We extend our heartfelt thanks to Lizhu Chen and Weisi Xie for their invaluable suggestions and guidance throughout the preparation of this article.

Funding

This work was supported by the Research on the Construction and Application of an Intelligent Outpatient Medical Record Integration Platform, Sichuan Science and Technology Program (Grant No. 24SYSX0210); the Natural Science Foundation of Xinjiang Uygur Autonomous Region (Grant No. 2023D01A63); and the Research on Smart Medical Systems (Grant No. 211282) from the UESTC-Houngfuh Joint Laboratory of Smart Logistics.

Author information


Contributions

Writing—original draft preparation: C.L., T.J., J.W., and L.Y.; writing—review and editing: L.Y., M.H., and F.S.; supervision: H.W. and L.G. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Gun Li.

Ethics declarations

Ethical Approval

This article does not contain any studies with human or animal subjects performed by any of the authors.

Informed Consent

Informed consent was not required as no humans or animals were involved.

Conflict of Interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, L., Yang, L., Jie, T. et al. Enhanced Self-Attention-Based Rapid CNN for Detecting Dense Objects in Varying Illumination. Cogn Comput 17, 26 (2025). https://doi.org/10.1007/s12559-024-10376-z
