Abstract
Information asymmetry between image and video features in image-to-video (I2V) person re-identification (Re-ID) refers to the difficulty of extracting consistent and dependable features from both image and video data so that a person can be accurately matched across the two modalities. The problem arises because images and videos have different characteristics and capture different aspects of a person. This difference in representation can produce inconsistent and unreliable features, which makes it difficult to re-identify a person across modalities. At the same time, the temporal information provided by videos can improve Re-ID accuracy, especially in crowded and cluttered environments. To address the information asymmetry problem, a generative adversarial cross-modal network-based I2V person Re-ID method (I2V-CMGAN) is proposed. It uses a generator to transfer the features learned by the video network into the image network, together with an additional loss function that improves the consistency and reliability of the features extracted from image and video data while preserving identity information. Extensive experiments demonstrate the efficacy of the proposed approach. On the MARS dataset, it outperforms state-of-the-art methods by a substantial margin, achieving rank-1, rank-5, and rank-10 accuracies of 88.9% (+2.9), 95.5% (+2.3), and 97.1% (+2.9), respectively, and a mean average precision (mAP) of 81.2% (+1.1) for I2V Re-ID. On iLIDS-VID, the method attains rank-1, rank-5, rank-10, and mAP of 64.7%, 89.3%, 92.7%, and 74.2%, respectively; on PRID2011, the corresponding values are 80.9%, 93.3%, 98.9%, and 86.8%.
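The rank-k accuracies and mAP reported above follow the standard cumulative matching characteristic (CMC) and mean average precision protocols used in Re-ID. The following is a minimal sketch of how such numbers can be computed for I2V matching, assuming that each query is a single image feature, that each gallery entry is one tracklet-level feature already pooled over its frames, and that gallery tracklets are ranked by Euclidean distance. The function name `evaluate_i2v` and the toy features are hypothetical. Dataset-specific filtering rules, such as the MARS protocol's removal of gallery tracklets that share both identity and camera with the query, are omitted.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def evaluate_i2v(
    query_feats: List[Vector],
    query_ids: List[int],
    gallery_feats: List[Vector],
    gallery_ids: List[int],
    ks: Tuple[int, ...] = (1, 5, 10),
) -> Tuple[Dict[int, float], float]:
    """Rank-k (CMC) accuracy and mAP for image queries against a video gallery.

    Hypothetical helper: each query is one image feature and each gallery
    entry is one tracklet feature (assumed already pooled over frames).
    """
    hits = {k: 0 for k in ks}
    ap_total = 0.0
    valid_queries = 0
    for q_feat, q_id in zip(query_feats, query_ids):
        # Sort gallery tracklets by Euclidean distance to the query image.
        order = sorted(
            range(len(gallery_feats)),
            key=lambda j: math.dist(q_feat, gallery_feats[j]),
        )
        matches = [gallery_ids[j] == q_id for j in order]
        if not any(matches):
            continue  # no true match in the gallery: the query is ignored
        valid_queries += 1

        # Rank-k: the first correct tracklet appears within the top k.
        first_hit = matches.index(True)
        for k in ks:
            if first_hit < k:
                hits[k] += 1

        # Average precision: mean precision at every correct position.
        correct, precisions = 0, []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                correct += 1
                precisions.append(correct / rank)
        ap_total += sum(precisions) / len(precisions)

    if valid_queries == 0:
        raise ValueError("no query has a matching identity in the gallery")
    cmc = {k: hits[k] / valid_queries for k in ks}
    return cmc, ap_total / valid_queries


if __name__ == "__main__":
    # Toy 2-D features: two image queries, four gallery tracklets.
    queries = [(0.9, 0.0), (4.0, 4.0)]
    query_ids = [1, 2]
    gallery = [(0.0, 0.1), (5.0, 5.0), (9.0, 9.0), (1.0, 0.0)]
    gallery_ids = [1, 2, 2, 3]
    cmc, mean_ap = evaluate_i2v(queries, query_ids, gallery, gallery_ids)
    print(cmc, mean_ap)  # {1: 0.5, 5: 1.0, 10: 1.0} 0.625
```

In the toy example, the first query's nearest tracklet belongs to another identity, so its true match only appears at rank 2. This halves rank-1 accuracy while leaving rank-5 and rank-10 at 100%. mAP additionally penalizes the second query's late second match, which appears at rank 4.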

Data Availability
No datasets were generated or analysed during the current study.
References
Ming Z, Zhu M, Wang X, Zhu J, Cheng J, Gao C, Yang Y, Wei X. Deep learning-based person re-identification methods: a survey and outlook of recent works. Image Vis Comput. 2022;119:104394.
Ye M, Shen J, Lin G, Xiang T, Shao L, Hoi SC. Deep learning for person re-identification: a survey and outlook. IEEE Trans Pattern Anal Mach Intell. 2021;44(6):2872–93.
Ma H, Zhang C, Zhang Y, Li Z, Wang Z, Wei C. A review on video person re-identification based on deep learning. Neurocomputing. 2024;28:128479.
Liu M, Zhang Y, Li H. Survey of cross-modal person re-identification from a mathematical perspective. Mathematics. 2023;11(3):654.
Gu X, Ma B, Chang H, Shan S, Chen X. Temporal knowledge propagation for image-to-video person re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision 2019 (pp. 9647–9656).
Shim M, Ho HI, Kim J, Wee D. READ: Reciprocal attention discriminator for image-to-video re-identification. In: European Conference on Computer Vision 2020 (pp. 335–350). Cham: Springer International Publishing.
Wu W, Liu J, Zheng K, Sun Q, Zha ZJ. Temporal complementarity-guided reinforcement learning for image-to-video person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022 (pp. 7319–7328).
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. Advances in neural information processing systems. 2014;27.
Deng W, Zheng L, Ye Q, Kang G, Yang Y, Jiao J. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018 (pp. 994–1003).
Wei L, Zhang S, Gao W, Tian Q. Person transfer GAN to bridge domain gap for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018 (pp. 79–88).
Zhu X, Jing XY, You X, Zuo W, Shan S, Zheng WS. Image to video person re-identification by learning heterogeneous dictionary pair with feature projection matrix. IEEE Trans Inf Forensics Secur. 2017;13(3):717–32.
Zhang D, Wu W, Cheng H, Zhang R, Dong Z, Cai Z. Image-to-video person re-identification with temporally memorized similarity learning. IEEE Trans Circuits Syst Video Technol. 2017;28(10):2622–32.
Wang G, Lai J, Xie X. P2SNet: Can an image match a video for person re-identification in an end-to-end way? IEEE Trans Circuits Syst Video Technol. 2017;28(10):2777–87.
Shi W, Liu H, Liu M. Image-to-video person re-identification using three-dimensional semantic appearance alignment and cross-modal interactive learning. Pattern Recogn. 2022;122:108314.
Zhu X, Jing XY, Wu F, Wang Y, Zuo W, Zheng WS. Learning heterogeneous dictionary pair with feature projection matrix for pedestrian video retrieval via single query image. In: Proceedings of the AAAI Conference on Artificial Intelligence 2017;31(1).
Dong H, Lu P, Zhong S, Liu C, Ji Y, Gong S. Person re-identification by enhanced local maximal occurrence representation and generalized similarity metric learning. Neurocomputing. 2018;307:25–37.
Zhang Z, Lan C, Zeng W, Chen Z. Densely semantically aligned person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2019 (pp. 667–676).
Liu CT, Wu CW, Wang YC, Chien SY. Spatially and temporally efficient non-local attention network for video-based person re-identification. arXiv preprint arXiv:1908.01683. 2019 Aug 5.
Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531. 2015 Mar 9.
Chen Y, Wang N, Zhang Z. DarkRank: accelerating deep metric learning via cross sample similarities transfer. In: Proceedings of the AAAI Conference on Artificial Intelligence 2018;32(1).
Chen G, Lu J, Yang M, Zhou J. Learning recurrent 3D attention for video-based person re-identification. IEEE Trans Image Process. 2020;29:6963–76.
Li D, Chen Q. Deep reinforced attention learning for quality-aware visual recognition. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVI; 2020 (pp. 493–509). Springer International Publishing.
He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2016 (pp. 770–778).
Pan H, Chen Y, He Z. Multi-granularity graph pooling for video-based person re-identification. Neural Netw. 2023;160:22–33.
Porrello A, Bergamini L, Calderara S. Robust re-identification by multiple views knowledge distillation. In: Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part X; 2020 (pp. 93–110). Springer International Publishing.
Mirza M, Osindero S. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784. 2014 Nov 6.
Ma L, Jia X, Sun Q, Schiele B, Tuytelaars T, Van Gool L. Pose guided person image generation. Advances in neural information processing systems. 2017;30.
Yan Y, Xu J, Ni B, Zhang W, Yang X. Skeleton-aided articulated motion generation. In: Proceedings of the 25th ACM International Conference on Multimedia 2017 (pp. 199–207).
Isola P, Zhu JY, Zhou T, Efros AA. Image-to-image translation with conditional adversarial networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017 (pp. 1125–1134).
Odena A, Olah C, Shlens J. Conditional image synthesis with auxiliary classifier GANs. In: International Conference on Machine Learning 2017 (pp. 2642–2651). PMLR.
Donahue J, Krähenbühl P, Darrell T. Adversarial feature learning. arXiv preprint arXiv:1605.09782. 2016 May 31.
Dumoulin V, Belghazi I, Poole B, Mastropietro O, Lamb A, Arjovsky M, Courville A. Adversarially learned inference. arXiv preprint arXiv:1606.00704. 2016 Jun 2.
Larsen AB, Sønderby SK, Larochelle H, Winther O. Autoencoding beyond pixels using a learned similarity metric. In: International Conference on Machine Learning 2016 (pp. 1558–1566). PMLR.
Zhu JY, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision 2017 (pp. 2223–2232).
Li C, Xu T, Zhu J, Zhang B. Triple generative adversarial nets. Advances in neural information processing systems. 2017;30.
Zheng Z, Zheng L, Yang Y. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. In: Proceedings of the IEEE International Conference on Computer Vision 2017 (pp. 3754–3762).
Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434. 2015 Nov 19.
Liu J, Ni B, Yan Y, Zhou P, Cheng S, Hu J. Pose transferrable person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018 (pp. 4099–4108).
Kniaz VV, Knyaz VA, Moshkantsev PV. Multimodal person re-identification in aerial imagery based on conditional adversarial networks. Int Arch Photogramm Remote Sens Spat Inf Sci. 2023;48:121–8.
Khatun A, Denman S, Sridharan S, Fookes C. Semantic consistency and identity mapping multi-component generative adversarial network for person re-identification. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision 2020 (pp. 2267–2276).
Ge Y, Li Z, Zhao H, Yin G, Yi S, Wang X. FD-GAN: pose-guided feature distilling GAN for robust person re-identification. Advances in neural information processing systems. 2018;31.
Zhao Z, Song R, Zhang Q, Duan P, Zhang Y. A framework for jointly training GAN with person re-identification model. In: Pattern Recognition. ICPR International Workshops and Challenges: Virtual Event, January 10–15, 2021, Proceedings, Part IV; 2021 (pp. 36–51). Springer International Publishing.
Zhang X, Li S, Jing XY, Ma F, Zhu C. Unsupervised domain adaption for image-to-video person re-identification. Multimed Tools Appl. 2020;79:33793–810.
Zheng Z, Yang X, Yu Z, Zheng L, Yang Y, Kautz J. Joint discriminative and generative learning for person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2019 (pp. 2138–2147).
Chen H, Wang Y, Lagadec B, Dantcheva A, Bremond F. Joint generative and contrastive learning for unsupervised person re-identification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2021 (pp. 2004–2013).
He Y, Chen L, Pan H. Pose‐guided adversarial video prediction for image‐to‐video person re‐identification. IET Image Processing. 2023 Dec 1.
Sharath S, Rangaraju HG, Leelavathi G. Person re-identification based on enhanced position-regularization generative adversarial network (EPR-GAN) using GLCM, radon transform, and DCT. Int J Intell Syst Appl Eng. 2023;11(9s):80–93.
Pan H, Pei W, Li X, He Z. Unified conditional image generation for visible-infrared person re-identification. IEEE Transactions on Information Forensics and Security. 2024 Jul 10.
Khaldi K, Nguyen VD, Mantini P, Shah S. Unsupervised person re-identification in aerial imagery. In: 2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) 2024 (pp. 260–269). IEEE.
Zheng L, Bie Z, Sun Y, Wang J, Su C, Wang S, Tian Q. MARS: A video benchmark for large-scale person re-identification. In: Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VI; 2016 (pp. 868–884). Springer International Publishing.
Hirzer M, Beleznai C, Roth PM, Bischof H. Person re-identification by descriptive and discriminative classification. In: Image Analysis: 17th Scandinavian Conference, SCIA 2011, Ystad, Sweden, May 2011, Proceedings; 2011 (pp. 91–102). Springer Berlin Heidelberg.
Wang T, Gong S, Zhu X, Wang S. Person re-identification by video ranking. In: Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part IV; 2014 (pp. 688–703). Springer International Publishing.
Zhou K, Xiang T. Torchreid: A library for deep learning person re-identification in PyTorch. arXiv preprint arXiv:1910.10093. 2019 Oct 22.
Wang X, Girshick R, Gupta A, He K. Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2018 (pp. 7794–7803).
Acknowledgements
The authors are grateful to UCOST, Dept. of Information and Science Technology, Govt. of Uttarakhand, for providing financial support to complete this work under the approved project entitled “Person and Vehicle Re-identification: An Artificial Intelligence-based Surveillance System for Smart Cities in Uttarakhand” (Grant No. UCS&T/R&D-47/23-24/25309).
Funding
The authors received financial support from UCOST, Dept. of Information and Science Technology, Govt. of Uttarakhand, to complete this work under the approved project entitled “Person and Vehicle Re-identification: An Artificial Intelligence-based Surveillance System for Smart Cities in Uttarakhand” (Grant No. UCS&T/R&D-47/23-24/25309).
Author information
Contributions
A.J.: writing (original draft). M.D.: writing (review and editing).
Ethics declarations
Consent for Publication
The authors affirm that all images of human subjects, including any faces, shown in the figures are taken from the datasets cited as [50], [51], and [52]. These datasets are publicly available and permit use for academic and scientific purposes.
Competing interests
The authors declare no competing interests.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Joshi, A., Diwakar, M. I2V-CMGAN: Generative Adversarial Cross-Modal Network-Based Image-to-Video Person Re-identification. Cogn Comput 17, 41 (2025). https://doi.org/10.1007/s12559-024-10389-8
DOI: https://doi.org/10.1007/s12559-024-10389-8