Abstract
With the widespread application of deep learning, general object detectors have become increasingly common in daily life. Extensive research, however, has shown that existing detectors are vulnerable to patch-based adversarial attacks, which fool them with crafted adversarial patches. Although existing methods have made significant progress in attack success rate, they remain highly perceptible, making it easy for humans to distinguish the adversarial examples. To address this issue, in this paper we propose a novel spatial transform-based, end-to-end patch attack method, called IPAttack, to synthesize imperceptible adversarial patches. Our approach estimates a flow field \(\varvec{f}\) to construct adversarial examples rather than introducing small \(L_p\)-norm-constrained external perturbations. In addition, to improve imperceptibility while maintaining high attack performance, we propose the Object Detector Class Activation Map (OD-CAM) to extract the region of the image most salient to the detector, to which the spatial transform is applied to generate the final adversarial examples. Extensive experiments demonstrate that IPAttack generates patch-wise adversarial examples with high imperceptibility while achieving the best attack performance among existing methods.
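The flow field \(\varvec{f}\) acts by spatially displacing pixels rather than adding an \(L_p\)-bounded perturbation: each adversarial pixel is bilinearly sampled from a flow-shifted location in the clean image, as in spatially transformed attacks. Below is a minimal NumPy sketch of this warping step only; the function name `flow_warp` and the single-channel grayscale setting are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def flow_warp(img, flow):
    """Warp an HxW image with a per-pixel flow field of shape (H, W, 2).

    The output pixel at (r, c) is bilinearly sampled from the clean image
    at the displaced location (r + flow[r, c, 0], c + flow[r, c, 1]),
    clipped to the image bounds.
    """
    H, W = img.shape
    rr, cc = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Displaced (fractional) sampling coordinates.
    sr = np.clip(rr + flow[..., 0], 0, H - 1)
    sc = np.clip(cc + flow[..., 1], 0, W - 1)
    # Integer corners surrounding each sampling point.
    r0 = np.floor(sr).astype(int)
    c0 = np.floor(sc).astype(int)
    r1 = np.minimum(r0 + 1, H - 1)
    c1 = np.minimum(c0 + 1, W - 1)
    # Bilinear interpolation weights.
    wr = sr - r0
    wc = sc - c0
    top = img[r0, c0] * (1 - wc) + img[r0, c1] * wc
    bot = img[r1, c0] * (1 - wc) + img[r1, c1] * wc
    return top * (1 - wr) + bot * wr
```

In an attack of this kind, the flow field would be optimized by gradient descent against the detector's loss (typically with a smoothness penalty on \(\varvec{f}\) to keep the warp imperceptible); this sketch shows only the differentiable sampling that makes such optimization possible.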







Data Availability
The datasets used or analyzed during the current study are publicly available online via the link provided in this paper.
Funding
This work is supported in part by the National Natural Science Foundation of China under Grants 62162067 and 62101480 (Research and Application of Object Detection Based on Artificial Intelligence), in part by the Yunnan Province Expert Workstations under Grant 202305AF150078, and in part by the Yunnan Fundamental Research Projects under Grant No. 202401AT070474.
Author information
Contributions
Yongming Wen: Methodology; Writing - Original draft. Peiyuan Si: Investigation; Software; Revision. Wei Zhou: Validation; Funding acquisition. Zongheng Zhao: Data curation; Resources. Chao Yi: Supervision; Project administration. Renyang Liu: Formal analysis; Writing - Review and editing.
Ethics declarations
Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Ethical and informed consent for data used
This article does not contain any studies with human participants performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wen, Y., Si, P., Zhou, W. et al. IPAttack: imperceptible adversarial patch to attack object detectors. Appl Intell 55, 462 (2025). https://doi.org/10.1007/s10489-025-06246-2