Abstract
Relation extraction plays an important role in natural language processing. A wide range of datasets is available for benchmarking relation extraction approaches. However, most of these datasets come in different formats with dataset-specific annotation rules, which makes it difficult to run experiments across different types of relation extraction approaches. We present RELD, an RDF knowledge graph of eight openly licensed and publicly available relation extraction datasets. We model these benchmarking datasets with a single ontology that provides a unified format for data access, along with the annotations required for training different types of relation extraction systems. Moreover, RELD abides by the Linked Data principles. To the best of our knowledge, RELD is the largest RDF knowledge graph of entities and relations from text, containing ~1230 million triples that describe 1034 relations, 2 million sentences, 3 million abstracts, and 4013 documents. RELD serves a variety of use cases in the natural language processing community and, in particular, provides a unified and easily accessible data model for benchmarking relation extraction and named entity recognition models.
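Because all eight datasets share one ontology, the dump can be loaded into any RDF store and explored with a single SPARQL query. The following minimal sketch uses rdflib; the file name and the reld: property names (reld:sentenceText, reld:hasRelation) are illustrative placeholders, not the actual RELD vocabulary, which is documented on the project homepage.

```python
# Minimal sketch: querying a local RELD excerpt with rdflib.
# The file name, namespace, and property names are placeholders;
# consult the RELD homepage for the actual schema.
from rdflib import Graph

g = Graph()
g.parse("reld-sample.nt", format="nt")  # hypothetical excerpt of the dump

query = """
PREFIX reld: <http://example.org/reld/>   # placeholder namespace
SELECT ?sentence ?relation
WHERE {
    ?stmt reld:sentenceText ?sentence ;   # assumed property names
          reld:hasRelation  ?relation .
}
LIMIT 10
"""

# Each result row pairs a sentence with the relation it annotates.
for sentence, relation in g.query(query):
    print(relation, "->", sentence)
```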
Notes
- 1.
We exclude datasets that are not freely available (e.g., TACRED) from the current version of RELD. However, they can easily be included in the future.
- 2.
Detailed information on the schema, i.e., its object properties, data properties, and classes, is available on the RELD homepage.
- 3.
Due to space limitations, some details and additional instances are truncated from Listing 1.2.
- 4.
We use the VoID vocabulary to describe the dataset's metadata.
- 5.
We use NLTK [9] for tokenization, part-of-speech tagging, and punctuation handling (see the sketch after these notes).
- 6.
The complete details of the mapping process and the tools used are available in the tutorial https://reld-tutorial.readthedocs.io/en/latest/tutorial.html.
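To make note 5 concrete, the snippet below shows the standard NLTK calls for tokenization and part-of-speech tagging. The example sentence is our own, and the snippet is only a sketch of this preprocessing step, not the exact RELD pipeline.

```python
# Minimal sketch of tokenization and POS tagging with NLTK (cf. note 5).
# The example sentence is illustrative; this is not the exact RELD pipeline.
import nltk

# One-time downloads of the required models.
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "Albert Einstein was born in Ulm, Germany."
tokens = nltk.word_tokenize(sentence)   # ['Albert', 'Einstein', 'was', ...]
tagged = nltk.pos_tag(tokens)           # [('Albert', 'NNP'), ('Einstein', 'NNP'), ...]
print(tagged)
```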
References
Agichtein, E., Gravano, L.: Snowball: extracting relations from large plain-text collections. In: Proceedings of the Fifth ACM Conference on Digital Libraries, pp. 85–94 (2000)
Ali, M., Saleem, M., Ngomo, A.C.N.: ReBench: microbenchmarking framework for relation extraction systems. In: Sattler, U., et al. (eds.) ISWC 2022. LNCS, vol. 13489, pp. 643–659. Springer, Heidelberg (2022). https://doi.org/10.1007/978-3-031-19433-7_37
Batista, D.S., Martins, B., Silva, M.J.: Semi-supervised bootstrapping of relationship extractors with distributional semantics. In: Empirical Methods in Natural Language Processing. ACL (2015)
Elsahar, H., et al.: T-REx: a large scale alignment of natural language with knowledge base triples. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (2018)
Gardent, C., Shimorina, A., Narayan, S., Perez-Beltrachini, L.: Creating training corpora for NLG micro-planners. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada, pp. 179–188. Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/P17-1017. https://aclanthology.org/P17-1017
Han, X., et al.: FewRel: a large-scale supervised few-shot relation classification dataset with state-of-the-art evaluation. In: EMNLP (2018)
Hendrickx, I., et al.: SemEval-2010 task 8: multi-way classification of semantic relations between pairs of nominals. In: Proceedings of the 5th International Workshop on Semantic Evaluation, Uppsala, Sweden, pp. 33–38. Association for Computational Linguistics (2010). https://aclanthology.org/S10-1006
Honnibal, M., Montani, I.: spaCy 2: natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing. 7(1), 411–420 (2017, to appear)
Loper, E., Bird, S.: NLTK: the natural language toolkit. arXiv preprint cs/0205028 (2002)
Martinez-Rodriguez, J.L., Hogan, A., Lopez-Arevalo, I.: Information extraction meets the semantic web: a survey. Semant. Web 11(2), 255–335 (2020)
Moreira, J., Oliveira, C., Macêdo, D., Zanchettin, C., Barbosa, L.: Distantly-supervised neural relation extraction with side information using BERT. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–7 (2020). https://doi.org/10.1109/IJCNN48605.2020.9206648
Moussallem, D., Usbeck, R., Röeder, M., Ngomo, A.C.N.: Mag: a multilingual, knowledge-base agnostic and deterministic entity linking approach. In: Proceedings of the Knowledge Capture Conference, pp. 1–8 (2017)
Nadgeri, A., et al.: KGPool: dynamic knowledge graph context selection for relation extraction. In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, pp. 535–548. Association for Computational Linguistics (2021). https://doi.org/10.18653/v1/2021.findings-acl.48. https://aclanthology.org/2021.findings-acl.48
Ngonga Ngomo, A.C., et al.: LIMES - a framework for link discovery on the semantic web. KI-Künstliche Intelligenz, German Journal of Artificial Intelligence - Organ des Fachbereichs "Künstliche Intelligenz" der Gesellschaft für Informatik e.V. (2021). https://papers.dice-research.org/2021/KI_LIMES/public.pdf
Ning, Q., Feng, Z., Roth, D.: A structured learning approach to temporal relation extraction. arXiv preprint arXiv:1906.04943 (2019)
Orr, D.: 50,000 lessons on how to read: a relation extraction corpus. Online: Google Research Blog, vol. 11 (2013)
Pawar, S., Palshikar, G.K., Bhattacharyya, P.: Relation extraction: a survey. arXiv preprint arXiv:1712.05191 (2017)
Peng, Y., Yan, S., Lu, Z.: Transfer learning in biomedical natural language processing: an evaluation of BERT and ELMo on ten benchmarking datasets. arXiv preprint arXiv:1906.05474 (2019)
Qu, M., Gao, T., Xhonneux, L.P., Tang, J.: Few-shot relation extraction via Bayesian meta-learning on relation graphs. In: Daume III, H., Singh, A. (eds.) Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 7867–7876. PMLR (2020). https://proceedings.mlr.press/v119/qu20a.html
Ravi, S., Larochelle, H.: Optimization as a model for few-shot learning (2016)
Riedel, S., Yao, L., McCallum, A., Marlin, B.M.: Relation extraction with matrix factorization and universal schemas. In: Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 74–84 (2013)
Sorokin, D., Gurevych, I.: Context-aware representations for knowledge base relation extraction. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, pp. 1784–1789. Association for Computational Linguistics (2017). https://doi.org/10.18653/v1/D17-1188. https://aclanthology.org/D17-1188
Sui, D., Chen, Y., Liu, K., Zhao, J., Zeng, X., Liu, S.: Joint entity and relation extraction with set prediction networks. arXiv preprint arXiv:2011.01675 (2020)
Surdeanu, M., Tibshirani, J., Nallapati, R., Manning, C.D.: Multi-instance multi-label learning for relation extraction. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 455–465 (2012)
Tran, T.T., Le, P., Ananiadou, S.: Revisiting unsupervised relation extraction. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 7498–7505. Association for Computational Linguistics (2020). https://www.aclweb.org/anthology/2020.acl-main.669
Tran, T.T., Le, P., Ananiadou, S.: Revisiting unsupervised relation extraction (2020). https://doi.org/10.48550/ARXIV.2005.00087. https://arxiv.org/abs/2005.00087
Walker, C., Strassel, S., Medero, J., Maeda, K.: ACE 2005 multilingual training corpus. Linguistic Data Consortium, Philadelphia, vol. 57, p. 45 (2006)
Wang, Y., Yu, B., Zhang, Y., Liu, T., Zhu, H., Sun, L.: TPLinker: single-stage joint extraction of entities and relations through token pair linking. In: Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, pp. 1572–1582. International Committee on Computational Linguistics (2020). https://doi.org/10.18653/v1/2020.coling-main.138. https://aclanthology.org/2020.coling-main.138
Yao, Y., et al.: DocRED: a large-scale document-level relation extraction dataset. arXiv preprint arXiv:1906.06127 (2019)
Yu, M., Yin, W., Hasan, K.S., Santos, C.d., Xiang, B., Zhou, B.: Improved neural relation detection for knowledge base question answering. arXiv preprint arXiv:1704.06194 (2017)
Zhang, Y., Zhong, V., Chen, D., Angeli, G., Manning, C.D.: Position-aware attention and supervised data improve slot filling. In: Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 35–45 (2017)
Acknowledgment
This work has been supported by the BMBF-funded EuroStars project PORQUE (01QE2056C), the European Union’s Horizon Europe research and innovation programme ENEXA (101070305), the Ministry of Culture and Science of North Rhine-Westphalia (MKW NRW) within the project SAIL (NW21-059D), and the University of Malakand, Pakistan.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Ali, M., Saleem, M., Moussallem, D., Sherif, M.A., Ngonga Ngomo, A.C. (2023). RELD: A Knowledge Graph of Relation Extraction Datasets. In: Pesquita, C., et al. The Semantic Web. ESWC 2023. Lecture Notes in Computer Science, vol. 13870. Springer, Cham. https://doi.org/10.1007/978-3-031-33455-9_20
DOI: https://doi.org/10.1007/978-3-031-33455-9_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-33454-2
Online ISBN: 978-3-031-33455-9