
Multimodal contrastive learning for radiology report generation

  • Original Research
  • Published:
Journal of Ambient Intelligence and Humanized Computing

Abstract

Automated radiology report generation can both lighten the workload of clinicians and improve the efficiency of disease diagnosis. However, generating radiology reports that are semantically coherent and also highly consistent with the underlying medical images is a challenging task. To meet this challenge, we propose a Multimodal Recursive model with Contrastive Learning (MRCL). The proposed MRCL method incorporates both visual and semantic features to generate the “Impression” and “Findings” sections of radiology reports through a recursive network, in which a contrastive pre-training method is proposed to improve the expressiveness of both visual and textual representations. Extensive experiments and analyses demonstrate the effectiveness of the proposed MRCL, which not only generates semantically coherent radiology reports but also outperforms state-of-the-art methods.
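The abstract describes a contrastive pre-training step that aligns visual and textual representations of paired images and reports. The sketch below illustrates the general image–text contrastive objective (a symmetric InfoNCE-style loss, in the spirit of the contrastive methods the paper builds on); the function names, embedding shapes, and temperature value are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Project embeddings onto the unit sphere so dot products are cosines."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def contrastive_loss(img_emb, txt_emb, temperature=0.1):
    """Symmetric InfoNCE-style loss over a batch of paired embeddings.

    Row i of `img_emb` and row i of `txt_emb` are a matched image/report
    pair (the positive); every other pairing in the batch is a negative.
    """
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature   # (batch, batch) cosine similarities
    labels = np.arange(len(logits))      # positives sit on the diagonal

    def cross_entropy(lg):
        lg = lg - lg.max(axis=1, keepdims=True)  # numerical stability
        log_probs = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -log_probs[labels, labels].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

Minimizing this loss pulls each image embedding toward the embedding of its own report while pushing it away from the other reports in the batch, which is what makes the learned representations more expressive for downstream generation.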


Fig. 1, Fig. 2, Fig. 3 (figures available in the full article)


Data availability

The datasets analysed during the current study are available from https://openi.nlm.nih.gov/ and https://physionet.org/content/mimic-cxr/2.0.0/ upon reasonable request.


Acknowledgements

This work is supported by the National Natural Science Foundation of China (Grant No. 62172267), the National Key R&D Program of China (Grant No. 2019YFE0190500), the Natural Science Foundation of Shanghai, China (Grant No. 20ZR1420400), the State Key Program of National Natural Science Foundation of China (Grant No. 61936001), the Shanghai Pujiang Program (Grant No. 21PJ1404200), and the Key Research Project of Zhejiang Laboratory (No. 2021PE0AC02).

Author information


Corresponding author

Correspondence to Xing Wu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Wu, X., Li, J., Wang, J. et al. Multimodal contrastive learning for radiology report generation. J Ambient Intell Human Comput 14, 11185–11194 (2023). https://doi.org/10.1007/s12652-022-04398-4



Keywords