Abstract
Integrating external knowledge into neural models has been extensively studied to improve the performance of pre-trained language models, especially in the biomedical domain. In this paper, we explore the contribution of graph embeddings to relation extraction (RE) tasks. Given a pair of candidate entity mentions in a text, we hypothesize that the relations between them in an external knowledge base (KB) help predict whether a relation exists in the text, even if the KB relations are different from those of the RE task. Our approach consists of computing KB graph embeddings and estimating the plausibility that a KB relation exists between the candidate entities to better predict the target relation in the text. Experiments conducted on three biomedical RE tasks show that our method outperforms the baseline model PubMedBERT and achieves comparable performance to state-of-the-art methods. Our code is available at https://github.com/Bibliome/KBPubMedBERT.
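The abstract sketches the approach at a high level: learn embeddings for KB entities and relations, then score how plausible each KB relation is for a candidate entity pair and use that score to inform the text-level relation prediction. The paper's exact scoring model is not stated in the abstract; as a hedged illustration only, the snippet below uses a TransE-style translational score (Bordes et al., 2013), under which a triple (h, r, t) is plausible when the head embedding translated by the relation embedding lands near the tail embedding. The function name and toy vectors are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def transe_plausibility(head: np.ndarray, rel: np.ndarray, tail: np.ndarray) -> float:
    """TransE-style score: -||h + r - t||. Higher (closer to 0) = more plausible."""
    return -float(np.linalg.norm(head + rel - tail))

# Toy 2-d embeddings: the relation translates the head exactly onto the tail.
h = np.array([1.0, 0.0])
r = np.array([0.0, 1.0])
t_good = np.array([1.0, 1.0])   # h + r == t_good -> maximal score 0.0
t_bad = np.array([5.0, 5.0])    # far from h + r  -> strictly lower score

print(transe_plausibility(h, r, t_good))  # 0.0
print(transe_plausibility(h, r, t_bad))   # negative
```

In a pipeline like the one described, such plausibility scores for the candidate pair would be fed to the classifier alongside the contextual representation from the language model (here PubMedBERT).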
Notes
- 1.
- 2. We use two public pre-trained models: biosyn-sapbert-bc5cdr-chemical for chemicals; biosyn-sapbert-bc2gn for genes.
- 3.
- 4. Official evaluation kits: https://codalab.lisn.upsaclay.fr/competitions/8293#participate (DrugProt); http://bibliome.jouy.inra.fr/demo/BioNLP-OST-2019-Evaluation/index.html (\(\text {BB-Rel}_p\)).
Acknowledgements
We are grateful to the Saclay-IA platform of Université Paris-Saclay for providing computing and storage resources through its Lab-IA GPU cluster.
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tang, A., Deléger, L., Bossy, R., Zweigenbaum, P., Nédellec, C. (2024). Exploiting Graph Embeddings from Knowledge Bases for Neural Biomedical Relation Extraction. In: Rapp, A., Di Caro, L., Meziane, F., Sugumaran, V. (eds) Natural Language Processing and Information Systems. NLDB 2024. Lecture Notes in Computer Science, vol 14762. Springer, Cham. https://doi.org/10.1007/978-3-031-70239-6_28
DOI: https://doi.org/10.1007/978-3-031-70239-6_28
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70238-9
Online ISBN: 978-3-031-70239-6
eBook Packages: Computer Science (R0)