Abstract
Gaze estimation is pivotal for human scene comprehension tasks, particularly in medical diagnostic analysis. Eye-tracking technology enables the recording of physicians' ocular movements during image interpretation, thereby elucidating their visual attention patterns and information-processing strategies. In this paper, we first formulate the context-aware gaze estimation problem in the setting of medical radiology reports. To understand the attention allocation and cognitive behavior of radiologists during image interpretation, we propose a context-aware Gaze EstiMation (GEM) network that uses eye-gaze data collected from radiologists to simulate their visual search behavior during the interpretation process. GEM consists of three components: a context-awareness module, visual behavior graph construction, and visual behavior matching. In the context-awareness module, we achieve fine-grained multimodal registration by establishing connections between medical reports and images. To simulate genuine visual search behavior more faithfully, we then introduce a visual behavior graph that captures such behavior through high-order relationships (edges) between gaze points (nodes). To preserve the authenticity of visual behavior, we devise a visual behavior matching approach that refines these high-order relationships by matching the graphs constructed from real and estimated gaze points. Extensive experiments on four publicly available datasets demonstrate that GEM outperforms existing methods and generalizes well. This work also suggests a new direction for the effective use of diverse modalities in medical image interpretation and enhances the interpretability of models in medical imaging. Code is available at https://github.com/Tiger-SN/GEM.
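To make the graph idea concrete, here is a minimal, hypothetical sketch of how gaze points could be turned into a behavior graph and softly matched; it is not the authors' implementation. The function names, the Gaussian edge weighting, the Sinkhorn-style soft assignment (a common choice for differentiable graph matching), and the MSE edge penalty are all our assumptions.

```python
import torch

def build_gaze_graph(points: torch.Tensor, sigma: float = 0.1) -> torch.Tensor:
    """Dense adjacency over gaze points (N, 2): closer fixations get
    stronger edges via a Gaussian kernel (an illustrative choice)."""
    dist = torch.cdist(points, points)                   # (N, N) pairwise distances
    adj = torch.exp(-dist.pow(2) / (2 * sigma ** 2))     # Gaussian edge weights
    eye = torch.eye(points.shape[0], device=points.device)
    return adj * (1.0 - eye)                             # zero diagonal: no self-loops

def sinkhorn(log_affinity: torch.Tensor, n_iters: int = 20) -> torch.Tensor:
    """Alternate row/column normalisation in log space, yielding a
    near doubly stochastic soft assignment (Sinkhorn, 1964)."""
    log_p = log_affinity
    for _ in range(n_iters):
        log_p = log_p - torch.logsumexp(log_p, dim=1, keepdim=True)  # rows sum to 1
        log_p = log_p - torch.logsumexp(log_p, dim=0, keepdim=True)  # cols sum to 1
    return log_p.exp()

def behavior_matching_loss(real_pts: torch.Tensor,
                           est_pts: torch.Tensor,
                           sigma: float = 0.1,
                           tau: float = 0.05) -> torch.Tensor:
    """Soft-match the two gaze graphs, then penalise disagreement
    between their edge structures under that correspondence."""
    a_real = build_gaze_graph(real_pts, sigma)           # (N, N)
    a_est = build_gaze_graph(est_pts, sigma)             # (N, N)
    affinity = -torch.cdist(real_pts, est_pts) / tau     # closer pairs score higher
    p = sinkhorn(affinity)                               # soft node correspondence
    a_est_aligned = p @ a_est @ p.T                      # re-index estimated edges
    return torch.nn.functional.mse_loss(a_real, a_est_aligned)

# Usage: 12 recorded fixations vs. 12 estimated gaze points, in normalised xy.
real = torch.rand(12, 2)
est = torch.rand(12, 2, requires_grad=True)
loss = behavior_matching_loss(real, est)
loss.backward()   # gradients flow back to the estimated gaze points
```

Because the Sinkhorn assignment is differentiable, such a loss could in principle be minimised end to end, pulling the estimated gaze graph's edge structure toward that of the recorded fixations.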
Acknowledgement
This work was supported by the National Natural Science Foundation of China under Grants 82261138629 and 12326610; the Guangdong Basic and Applied Basic Research Foundation under Grant 2023A1515010688; the Guangdong Provincial Key Laboratory under Grant 2023B1212060076; and the Shenzhen Municipal Science and Technology Innovation Council under Grant JCYJ20220531101412030.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Liu, S., Chen, W., Liu, J., Luo, X., Shen, L. (2024). GEM: Context-Aware Gaze EstiMation with Visual Search Behavior Matching for Chest Radiograph. In: Linguraru, M.G., et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. Lecture Notes in Computer Science, vol. 15001. Springer, Cham. https://doi.org/10.1007/978-3-031-72378-0_49
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72377-3
Online ISBN: 978-3-031-72378-0
eBook Packages: Computer Science, Computer Science (R0)