I2V-CMGAN: Generative Adversarial Cross-Modal Network-Based Image-to-Video Person Re-identification

Joshi, Aditya; Diwakar, Manoj

doi:10.1007/s12559-024-10389-8

Research
Published: 14 January 2025

Volume 17, article number 41, (2025)

Cognitive Computation

Aditya Joshi¹ & Manoj Diwakar¹,²

Abstract

In image-to-video (I2V) person re-identification (Re-ID), information asymmetry between image and video features refers to the difficulty of extracting consistent, dependable features from both modalities so that a person can be matched accurately across them. The problem arises because images and videos have different characteristics and capture different aspects of a person. This difference in representation can yield inconsistent and unreliable features, which makes it difficult to match and re-identify a person between modalities. At the same time, the temporal information provided by videos can boost re-identification accuracy, especially in crowded and cluttered environments. To address the information asymmetry problem, a generative adversarial cross-modal network-based I2V person Re-ID method (I2V-CMGAN) is proposed. It uses a generator to transfer the features learned by the video network to the image network, with an additional loss function that improves the consistency and reliability of the features extracted from both image and video data while preserving identity information. Extensive experiments show the efficacy of the proposed approach. On the MARS dataset, it outperforms state-of-the-art methods by a substantial margin, achieving rank-1 accuracy of 88.9% (+2.9), rank-5 accuracy of 95.5% (+2.3), rank-10 accuracy of 97.1% (+2.9), and mean average precision (mAP) of 81.2% (+1.1) for I2V Re-ID. On the iLIDS-VID and PRID2011 datasets, the proposed method attains rank-1, rank-5, rank-10, and mAP scores of 64.7%, 89.3%, 92.7%, and 74.2%, and of 80.9%, 93.3%, 98.9%, and 86.8%, respectively.
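
The rank-k accuracy and mAP figures quoted in the abstract are standard retrieval metrics. As a rough illustration of how such numbers are typically computed, the following is a minimal sketch, assuming a precomputed distance matrix between query images and gallery videos. The function name `evaluate_retrieval` and the toy data are hypothetical, and this is not the authors' evaluation code; in particular, it omits the same-camera filtering that standard Re-ID protocols often apply.

```python
def evaluate_retrieval(dist, query_ids, gallery_ids, ranks=(1, 5, 10)):
    """Compute rank-k accuracy (CMC) and mAP from a distance matrix.

    dist[i][j] is the distance between query image i and gallery video j
    (smaller means more similar). Hypothetical helper for illustration only.
    """
    hits_at = {k: 0 for k in ranks}
    ap_sum = 0.0
    n_valid = 0
    for row, qid in zip(dist, query_ids):
        # Sort gallery indices from nearest to farthest for this query.
        order = sorted(range(len(row)), key=row.__getitem__)
        matches = [gallery_ids[j] == qid for j in order]
        if not any(matches):
            continue  # a query with no true match cannot be scored
        n_valid += 1

        # Rank-k: the query is a hit if a correct match is in the top k.
        first_hit = matches.index(True)
        for k in ranks:
            if first_hit < k:
                hits_at[k] += 1

        # Average precision: mean of precision at each correct position.
        n_correct = 0
        precisions = []
        for pos, is_match in enumerate(matches, start=1):
            if is_match:
                n_correct += 1
                precisions.append(n_correct / pos)
        ap_sum += sum(precisions) / len(precisions)

    if n_valid == 0:
        raise ValueError("no query has a matching gallery identity")
    cmc = {k: hits_at[k] / n_valid for k in ranks}
    return cmc, ap_sum / n_valid


# Toy example (made-up numbers): 2 query images, 3 gallery videos.
toy_dist = [[0.1, 0.9, 0.5],
            [0.2, 0.8, 0.3]]
cmc, mean_ap = evaluate_retrieval(toy_dist, [1, 2], [1, 2, 1], ranks=(1, 2, 3))
print(cmc, mean_ap)
```

In the toy data, the first query finds a correct match at rank 1, while the second finds its only match at rank 3, so the rank-1 score is lower than the rank-3 score. Reported percentages such as those above correspond to these fractions multiplied by 100.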

Data Availability

No datasets were generated or analysed during the current study.

References

[1] Ming Z, Zhu M, Wang X, Zhu J, Cheng J, Gao C, Yang Y, Wei X. Deep learning-based person re-identification methods: a survey and outlook of recent works. Image Vis Comput. 2022;1(119):104394.
[2] Ye M, Shen J, Lin G, Xiang T, Shao L, Hoi SC. Deep learning for person re-identification: a survey and outlook. IEEE Trans Pattern Anal Mach Intell. 2021;44(6):2872–93.
[3] Ma H, Zhang C, Zhang Y, Li Z, Wang Z, Wei C. A review on video person re-identification based on deep learning. Neurocomputing. 2024;28:128479.
[4] Liu M, Zhang Y, Li H. Survey of cross-modal person re-identification from a mathematical perspective. Mathematics. 2023;11(3):654.
[5] Gu X, Ma B, Chang H, Shan S, Chen X. Temporal knowledge propagation for image-to-video person re-identification. In Proceedings of the IEEE/CVF international conference on computer vision 2019 (pp. 9647–9656).
[6] Shim M, Ho HI, Kim J, Wee D. READ: Reciprocal attention discriminator for image-to-video re-identification. In European Conference on Computer Vision 2020 Aug 23 (pp. 335–350). Cham: Springer International Publishing.
[7] Wu W, Liu J, Zheng K, Sun Q, Zha ZJ. Temporal complementarity-guided reinforcement learning for image-to-video person re-identification. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022 (pp. 7319–7328).
[8] Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y. Generative adversarial nets. Advances in neural information processing systems. 2014;27.
[9] Deng W, Zheng L, Ye Q, Kang G, Yang Y, Jiao J. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. In Proceedings of the IEEE conference on computer vision and pattern recognition 2018 (pp. 994–1003).
[10] Wei L, Zhang S, Gao W, Tian Q. Person transfer GAN to bridge domain gap for person re-identification. In Proceedings of the IEEE conference on computer vision and pattern recognition 2018 (pp. 79–88).
[11] Zhu X, Jing XY, You X, Zuo W, Shan S, Zheng WS. Image to video person re-identification by learning heterogeneous dictionary pair with feature projection matrix. IEEE Trans Inf Forensics Secur. 2017;13(3):717–32.
[12] Zhang D, Wu W, Cheng H, Zhang R, Dong Z, Cai Z. Image-to-video person re-identification with temporally memorized similarity learning. IEEE Trans Circuits Syst Video Technol. 2017;28(10):2622–32.
[13] Wang G, Lai J, Xie X. P2SNet: Can an image match a video for person re-identification in an end-to-end way? IEEE Trans Circuits Syst Video Technol. 2017;28(10):2777–87.
[14] Shi W, Liu H, Liu M. Image-to-video person re-identification using three-dimensional semantic appearance alignment and cross-modal interactive learning. Pattern Recogn. 2022;1(122):108314.
[15] Zhu X, Jing XY, Wu F, Wang Y, Zuo W, Zheng WS. Learning heterogeneous dictionary pair with feature projection matrix for pedestrian video retrieval via single query image. In Proceedings of the AAAI Conference on Artificial Intelligence 2017;31(1).
[16] Dong H, Lu P, Zhong S, Liu C, Ji Y, Gong S. Person re-identification by enhanced local maximal occurrence representation and generalized similarity metric learning. Neurocomputing. 2018;13(307):25–37.
[17] Zhang Z, Lan C, Zeng W, Chen Z. Densely semantically aligned person re-identification. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition 2019 (pp. 667–676).
[18] Liu CT, Wu CW, Wang YC, Chien SY. Spatially and temporally efficient non-local attention network for video-based person re-identification. arXiv preprint arXiv:1908.01683. 2019 Aug 5.
[19] Hinton G, Vinyals O, Dean J. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531. 2015 Mar 9.
[20] Chen Y, Wang N, Zhang Z. DarkRank: accelerating deep metric learning via cross sample similarities transfer. In Proceedings of the AAAI conference on artificial intelligence 2018 Apr 29;32(1).
[21] Chen G, Lu J, Yang M, Zhou J. Learning recurrent 3D attention for video-based person re-identification. IEEE Trans Image Process. 2020;25(29):6963–76.
[22] Li D, Chen Q. Deep reinforced attention learning for quality-aware visual recognition. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XVI 16 2020 (pp. 493–509). Springer International Publishing.
[23] He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition 2016 (pp. 770–778).
[24] Pan H, Chen Y, He Z. Multi-granularity graph pooling for video-based person re-identification. Neural Netw. 2023;1(160):22–33.
[25] Porrello A, Bergamini L, Calderara S. Robust re-identification by multiple views knowledge distillation. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part X 16 2020 (pp. 93–110). Springer International Publishing.
[26] Mirza M, Osindero S. Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784. 2014 Nov 6.
[27] Ma L, Jia X, Sun Q, Schiele B, Tuytelaars T, Van Gool L. Pose guided person image generation. Advances in neural information processing systems. 2017;30.
[28] Yan Y, Xu J, Ni B, Zhang W, Yang X. Skeleton-aided articulated motion generation. In Proceedings of the 25th ACM international conference on Multimedia 2017 Oct 19 (pp. 199–207).
[29] Isola P, Zhu JY, Zhou T, Efros AA. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition 2017 (pp. 1125–1134).
[30] Odena A, Olah C, Shlens J. Conditional image synthesis with auxiliary classifier GANs. In International conference on machine learning 2017 Jul 17 (pp. 2642–2651). PMLR.
[31] Donahue J, Krähenbühl P, Darrell T. Adversarial feature learning. arXiv preprint arXiv:1605.09782. 2016 May 31.
[32] Dumoulin V, Belghazi I, Poole B, Mastropietro O, Lamb A, Arjovsky M, Courville A. Adversarially learned inference. arXiv preprint arXiv:1606.00704. 2016 Jun 2.
[33] Larsen AB, Sønderby SK, Larochelle H, Winther O. Autoencoding beyond pixels using a learned similarity metric. In International conference on machine learning 2016 Jun 11 (pp. 1558–1566). PMLR.
[34] Zhu JY, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE international conference on computer vision 2017 (pp. 2223–2232).
[35] Li C, Xu T, Zhu J, Zhang B. Triple generative adversarial nets. Advances in neural information processing systems. 2017;30.
[36] Zheng Z, Zheng L, Yang Y. Unlabeled samples generated by GAN improve the person re-identification baseline in vitro. In Proceedings of the IEEE international conference on computer vision 2017 (pp. 3754–3762).
[37] Radford A, Metz L, Chintala S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434. 2015 Nov 19.
[38] Liu J, Ni B, Yan Y, Zhou P, Cheng S, Hu J. Pose transferrable person re-identification. In Proceedings of the IEEE conference on computer vision and pattern recognition 2018 (pp. 4099–4108).
[39] Kniaz VV, Knyaz VA, Moshkantsev PV. Multimodal person re-identification in aerial imagery based on conditional adversarial networks. Int Arch Photogramm Remote Sens Spat Inf Sci. 2023;12(48):121–8.
[40] Khatun A, Denman S, Sridharan S, Fookes C. Semantic consistency and identity mapping multi-component generative adversarial network for person re-identification. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision 2020 (pp. 2267–2276).
[41] Ge Y, Li Z, Zhao H, Yin G, Yi S, Wang X. FD-GAN: pose-guided feature distilling GAN for robust person re-identification. Advances in neural information processing systems. 2018;31.
[42] Zhao Z, Song R, Zhang Q, Duan P, Zhang Y. A framework for jointly training GAN with person re-identification model. In Pattern Recognition. ICPR International Workshops and Challenges: Virtual Event, January 10–15, 2021, Proceedings, Part IV 2021 (pp. 36–51). Springer International Publishing.
[43] Zhang X, Li S, Jing XY, Ma F, Zhu C. Unsupervised domain adaption for image-to-video person re-identification. Multimed Tools Appl. 2020;79:33793–810.
[44] Zheng Z, Yang X, Yu Z, Zheng L, Yang Y, Kautz J. Joint discriminative and generative learning for person re-identification. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition 2019 (pp. 2138–2147).
[45] Chen H, Wang Y, Lagadec B, Dantcheva A, Bremond F. Joint generative and contrastive learning for unsupervised person re-identification. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition 2021 (pp. 2004–2013).
[46] He Y, Chen L, Pan H. Pose-guided adversarial video prediction for image-to-video person re-identification. IET Image Processing. 2023 Dec 1.
[47] Sharath S, Rangaraju HG, Leelavathi G. Person re-identification based on enhanced position-regularization generative adversarial network (EPR-GAN) using GLCM, radon transform, and DCT. Int J Intell Syst Appl Eng. 2023;11(9s):80–93.
[48] Pan H, Pei W, Li X, He Z. Unified conditional image generation for visible-infrared person re-identification. IEEE Transactions on Information Forensics and Security. 2024 Jul 10.
[49] Khaldi K, Nguyen VD, Mantini P, Shah S. Unsupervised person re-identification in aerial imagery. In 2024 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW) 2024 Jan 1 (pp. 260–269). IEEE.
[50] Zheng L, Bie Z, Sun Y, Wang J, Su C, Wang S, Tian Q. MARS: A video benchmark for large-scale person re-identification. In Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VI 14 2016 (pp. 868–884). Springer International Publishing.
[51] Hirzer M, Beleznai C, Roth PM, Bischof H. Person re-identification by descriptive and discriminative classification. In Image Analysis: 17th Scandinavian Conference, SCIA 2011, Ystad, Sweden, May 2011. Proceedings 17 2011 (pp. 91–102). Springer Berlin Heidelberg.
[52] Wang T, Gong S, Zhu X, Wang S. Person re-identification by video ranking. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part IV 13 2014 (pp. 688–703). Springer International Publishing.
[53] Zhou K, Xiang T. Torchreid: A library for deep learning person re-identification in PyTorch. arXiv preprint arXiv:1910.10093. 2019 Oct 22.
[54] Wang X, Girshick R, Gupta A, He K. Non-local neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition 2018 (pp. 7794–7803).

Acknowledgements

The authors are grateful to UCOST, Dept. of Information and Science Technology, Govt. of Uttarakhand, for providing financial support to complete this work under the approved project entitled “Person and Vehicle Re-identification: An Artificial Intelligence-based Surveillance System for Smart Cities in Uttarakhand” (Grant No. UCS&T/R&D-47/23-24/25309).

Funding

The authors received financial support from the UCOST, Dept. of Information and Science Technology, Govt. of Uttarakhand, to complete this work under the approved project entitled “Person and Vehicle Re-identification: An Artificial Intelligence-based Surveillance System for Smart Cities in Uttarakhand” (Grant No. UCS&T/R&D-47/23–24/25309).

Author information

Authors and Affiliations

CSE Department, Graphic Era Deemed to Be University, Dehradun, Uttarakhand, India
Aditya Joshi & Manoj Diwakar
Graphic Era Hill University, Dehradun, Uttarakhand, India
Manoj Diwakar

Contributions

A.J.: writing, original draft. M.D.: writing, review and editing.

Corresponding author

Correspondence to Manoj Diwakar.

Ethics declarations

Consent for Publication

The authors affirm that the images of human research participants and potential human faces shown in the figures are taken from the datasets cited as [50], [51], and [52]. These datasets are publicly accessible and permit use for academic and scientific purposes.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Joshi, A., Diwakar, M. I2V-CMGAN: Generative Adversarial Cross-Modal Network-Based Image-to-Video Person Re-identification. Cogn Comput 17, 41 (2025). https://doi.org/10.1007/s12559-024-10389-8

Received: 29 January 2024
Accepted: 01 December 2024
Published: 14 January 2025
DOI: https://doi.org/10.1007/s12559-024-10389-8

Part of a collection: Generative AI for Cognitive Computation
