Abstract
Gaze estimation is pivotal for human scene comprehension tasks, particularly in medical diagnostic analysis. Eye-tracking technology enables the recording of physicians' ocular movements during image interpretation, thereby elucidating their visual attention patterns and information-processing strategies. In this paper, we first formulate the context-aware gaze estimation problem in the setting of medical radiology reports. To understand the attention allocation and cognitive behavior of radiologists during image interpretation, we propose a context-aware Gaze EstiMation (GEM) network that uses eye-gaze data collected from radiologists to simulate their visual search behavior during the interpretation process. GEM consists of three components: a context-awareness module, visual behavior graph construction, and visual behavior matching. In the context-awareness module, we achieve fine-grained multimodal registration by establishing connections between medical reports and images. To simulate genuine visual search behavior more faithfully, we then introduce a visual behavior graph that captures such behavior through high-order relationships (edges) between gaze points (nodes). To preserve the authenticity of visual behavior, we devise a visual behavior matching approach that refines these high-order relationships by matching the graphs constructed from real and estimated gaze points. Extensive experiments on four publicly available datasets demonstrate that GEM outperforms existing methods and generalizes well. This work also suggests a new direction for the effective use of diverse modalities in medical image interpretation and enhances the interpretability of models in medical imaging. Code is available at https://github.com/Tiger-SN/GEM.
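To make the graph idea concrete, here is a minimal, hypothetical sketch of how gaze points could be turned into a behavior graph and softly matched; it is not the authors' implementation. The function names, the Gaussian edge weighting, the Sinkhorn-style soft assignment (a common choice for differentiable graph matching), and the MSE edge penalty are all our assumptions.

```python
import torch

def build_gaze_graph(points: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Dense adjacency over gaze points (N, 2): closer fixations get
    stronger edges via a Gaussian kernel (an illustrative choice)."""
    dist = torch.cdist(points, points)                   # (N, N) pairwise distances
    adj = torch.exp(-dist.pow(2) / (2 * sigma ** 2))     # Gaussian edge weights
    eye = torch.eye(points.shape[0], device=points.device)
    return adj * (1.0 - eye)                             # zero diagonal: no self-loops

def sinkhorn(log_affinity: torch.Tensor, n_iters: int = 20) -> torch.Tensor:
    """Alternate row/column normalisation in log space, yielding a
    near doubly stochastic soft assignment (Sinkhorn, 1964)."""
    log_p = log_affinity
    for _ in range(n_iters):
        log_p = log_p - torch.logsumexp(log_p, dim=1, keepdim=True)  # rows sum to 1
        log_p = log_p - torch.logsumexp(log_p, dim=0, keepdim=True)  # cols sum to 1
    return log_p.exp()

def behavior_matching_loss(real_pts: torch.Tensor,
                           est_pts: torch.Tensor,
                           sigma: float = 0.1,
                           tau: float = 0.05) -> torch.Tensor:
    """Soft-match the two gaze graphs, then penalise disagreement
    between their edge structures under that correspondence."""
    a_real = build_gaze_graph(real_pts, sigma)           # (N, N)
    a_est = build_gaze_graph(est_pts, sigma)             # (N, N)
    affinity = -torch.cdist(real_pts, est_pts) / tau     # closer pairs score higher
    p = sinkhorn(affinity)                               # soft node correspondence
    a_est_aligned = p @ a_est @ p.T                      # re-index estimated edges
    return torch.nn.functional.mse_loss(a_real, a_est_aligned)

# Usage: 12 recorded fixations vs. 12 estimated gaze points, in normalised xy.
real = torch.rand(12, 2)
est = torch.rand(12, 2, requires_grad=True)
loss = behavior_matching_loss(real, est)
loss.backward()   # gradients flow back to the estimated gaze points
```

Because the Sinkhorn assignment is differentiable, such a loss could in principle be minimised end to end, pulling the estimated gaze graph's edge structure toward that of the recorded fixations.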
Acknowledgement
This work was supported by the National Natural Science Foundation of China under Grants 82261138629 and 12326610; the Guangdong Basic and Applied Basic Research Foundation under Grant 2023A1515010688; the Guangdong Provincial Key Laboratory under Grant 2023B1212060076; and the Shenzhen Municipal Science and Technology Innovation Council under Grant JCYJ20220531101412030.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, S., Chen, W., Liu, J., Luo, X., Shen, L. (2024). GEM: Context-Aware Gaze EstiMation with Visual Search Behavior Matching for Chest Radiograph. In: Linguraru, M.G., et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. Lecture Notes in Computer Science, vol. 15001. Springer, Cham. https://doi.org/10.1007/978-3-031-72378-0_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72377-3
Online ISBN: 978-3-031-72378-0
eBook Packages: Computer Science, Computer Science (R0)