Abstract
Integrating external knowledge into neural models has been extensively studied to improve the performance of pre-trained language models, especially in the biomedical domain. In this paper, we explore the contribution of graph embeddings to relation extraction (RE) tasks. Given a pair of candidate entity mentions in a text, we hypothesize that the relations between them in an external knowledge base (KB) help predict whether a relation exists in the text, even if the KB relations are different from those of the RE task. Our approach consists of computing KB graph embeddings and estimating the plausibility that a KB relation exists between the candidate entities to better predict the target relation in the text. Experiments conducted on three biomedical RE tasks show that our method outperforms the baseline model PubMedBERT and achieves comparable performance to state-of-the-art methods. Our code is available at https://github.com/Bibliome/KBPubMedBERT.
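The abstract sketches the approach at a high level: learn embeddings for KB entities and relations, then score how plausible each KB relation is for a candidate entity pair and use that score to inform the text-level relation prediction. The paper's exact scoring model is not stated in the abstract; as a hedged illustration only, the snippet below uses a TransE-style translational score (Bordes et al., 2013), under which a triple (h, r, t) is plausible when the head embedding translated by the relation embedding lands near the tail embedding. The function name and toy vectors are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def transe_plausibility(head: np.ndarray, rel: np.ndarray, tail: np.ndarray) -> float:
    """TransE-style score: -||h + r - t||. Higher (closer to 0) = more plausible."""
    return -float(np.linalg.norm(head + rel - tail))

# Toy 2-d embeddings: the relation translates the head exactly onto the tail.
h = np.array([1.0, 0.0])
r = np.array([0.0, 1.0])
t_good = np.array([1.0, 1.0])   # h + r == t_good -> maximal score 0.0
t_bad = np.array([5.0, 5.0])    # far from h + r  -> strictly lower score

print(transe_plausibility(h, r, t_good))  # 0.0
print(transe_plausibility(h, r, t_bad))   # negative
```

In a pipeline like the one described, such plausibility scores for the candidate pair would be fed to the classifier alongside the contextual representation from the language model (here PubMedBERT).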
Notes
- 1.
- 2. We use two public pre-trained models: biosyn-sapbert-bc5cdr-chemical for chemicals; biosyn-sapbert-bc2gn for genes.
- 3.
- 4. Official evaluation kits: https://codalab.lisn.upsaclay.fr/competitions/8293#participate (DrugProt); http://bibliome.jouy.inra.fr/demo/BioNLP-OST-2019-Evaluation/index.html (\(\text {BB-Rel}_p\)).
Acknowledgements
We are grateful to the Saclay-IA platform of Université Paris-Saclay for providing computing and storage resources through its Lab-IA GPU cluster.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tang, A., Deléger, L., Bossy, R., Zweigenbaum, P., Nédellec, C. (2024). Exploiting Graph Embeddings from Knowledge Bases for Neural Biomedical Relation Extraction. In: Rapp, A., Di Caro, L., Meziane, F., Sugumaran, V. (eds) Natural Language Processing and Information Systems. NLDB 2024. Lecture Notes in Computer Science, vol 14762. Springer, Cham. https://doi.org/10.1007/978-3-031-70239-6_28
DOI: https://doi.org/10.1007/978-3-031-70239-6_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70238-9
Online ISBN: 978-3-031-70239-6
eBook Packages: Computer Science (R0)