
Exploiting Graph Embeddings from Knowledge Bases for Neural Biomedical Relation Extraction

  • Conference paper
  • Natural Language Processing and Information Systems (NLDB 2024)
  • Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14762)

Abstract

Integrating external knowledge into neural models has been extensively studied to improve the performance of pre-trained language models, especially in the biomedical domain. In this paper, we explore the contribution of graph embeddings to relation extraction (RE) tasks. Given a pair of candidate entity mentions in a text, we hypothesize that the relations between them in an external knowledge base (KB) help predict whether a relation exists in the text, even if the KB relations are different from those of the RE task. Our approach consists of computing KB graph embeddings and estimating the plausibility that a KB relation exists between the candidate entities to better predict the target relation in the text. Experiments conducted on three biomedical RE tasks show that our method outperforms the baseline model PubMedBERT and achieves comparable performance to state-of-the-art methods. Our code is available at https://github.com/Bibliome/KBPubMedBERT.
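To make the plausibility-estimation idea concrete, here is a minimal sketch of how a translational KB embedding model can score a candidate relation between two entities. This uses a TransE-style scoring function with toy embeddings; the entity names, relation names, and dimensionality are illustrative assumptions, not the authors' actual model or data (the paper's code at the repository above is the authoritative implementation):

```python
import numpy as np

def transe_score(head, relation, tail):
    """TransE-style plausibility: higher (less negative) means the
    triple (head, relation, tail) is more plausible in the KB."""
    return -np.linalg.norm(head + relation - tail)

# Toy 4-dimensional embeddings (hypothetical values for illustration).
rng = np.random.default_rng(0)
dim = 4
chemical = rng.normal(size=dim)                  # e.g. a drug mention
interacts_with = np.array([1.0, 0.0, 0.0, 0.0])  # hypothetical KB relation
unrelated = np.array([0.0, 0.0, 5.0, 0.0])       # hypothetical KB relation
# Place the gene so that "interacts_with" translates chemical onto it.
gene = chemical + interacts_with

s1 = transe_score(chemical, interacts_with, gene)
s2 = transe_score(chemical, unrelated, gene)
# s1 > s2: the KB deems "interacts_with" far more plausible for this pair,
# a signal that can be fed to the RE model alongside the text features.
```

In the paper's setting, such plausibility scores over KB relations serve as additional features for the neural RE model, even when the KB relation inventory differs from the task's target relations.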


Notes

  1. https://www.ncbi.nlm.nih.gov/gene.

  2. We use two public pre-trained models: biosyn-sapbert-bc5cdr-chemical for chemicals and biosyn-sapbert-bc2gn for genes.

  3. https://github.com/DeepGraphLearning/KnowledgeGraphEmbedding.

  4. Official evaluation kits: https://codalab.lisn.upsaclay.fr/competitions/8293#participate (DrugProt); http://bibliome.jouy.inra.fr/demo/BioNLP-OST-2019-Evaluation/index.html (BB-Rel_p).



Acknowledgements

We are grateful to the Saclay-IA platform of Université Paris-Saclay for providing computing and storage resources through its Lab-IA GPU cluster.

Author information

Correspondence to Anfu Tang.


Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Tang, A., Deléger, L., Bossy, R., Zweigenbaum, P., Nédellec, C. (2024). Exploiting Graph Embeddings from Knowledge Bases for Neural Biomedical Relation Extraction. In: Rapp, A., Di Caro, L., Meziane, F., Sugumaran, V. (eds) Natural Language Processing and Information Systems. NLDB 2024. Lecture Notes in Computer Science, vol 14762. Springer, Cham. https://doi.org/10.1007/978-3-031-70239-6_28


  • DOI: https://doi.org/10.1007/978-3-031-70239-6_28

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70238-9

  • Online ISBN: 978-3-031-70239-6

  • eBook Packages: Computer Science (R0)
