Abstract
This paper provides an overview of graph databases for retrieving and integrating knowledge originating from textual data, attempting to bring together different building blocks that are usually addressed separately. It explores concepts and insights that result from the scientific activities promoted by the GDR MADICS DOING Action (Intelligent Data: turning information into knowledge). The action promoted scientific discussion on the challenges, current findings, and open issues in converting textual data into information and, ultimately, knowledge. This topic has been investigated in a multidisciplinary context, involving specialists in Databases (DB), Natural Language Processing (NLP), and Artificial Intelligence (AI), as well as professionals in various application domains.
Notes
- 1.
DOING is a coordination action funded by the network MADICS of the French Council of Scientific Research - CNRS http://www.madics.fr/actions/doing/. Created as a regional initiative in 2019, DOING extended its scope to a national level within the GDR MADICS in 2020 before attaining official status as an action.
- 2.
The impact of the DOING coordination action goes beyond national boundaries and has inspired further initiatives: (i) a regional project, APR-IA, supported by the Centre Val de Loire region (2021–2024); (ii) the international workshop DOING@ADBIS, which marks its 5th edition this year.
- 3.
- 4.
DOING Webinar: Language-aware indexing for conjunctive path queries, George Fletcher, Eindhoven University of Technology, Netherlands, March 2021.
- 5.
DOING Webinar: Evaluating navigational queries over graphs, Domagoj Vrgoč, Pontificia Universidad Católica de Chile, Chile, July 2021.
- 6.
MADICS Symposium, DOING workshop. In July 2022: Aperçu général des langages de requêtes pour graphes à propriétés (Overview of query languages for property graphs) by Victor Marsault. In May 2023: A Researcher’s Digest of GQL by Liat Peterfreund.
- 7.
DOING Webinar: Natural language processing for epidemiology & public health, Aurélie Névéol, CNRS, LISN, France, 5th July, 2021.
- 8.
MADICS Symposium, DOING workshop: Tools for processing clinical reports in health data warehouses, Perceval Wajsbürt, Assistance Publique - Hôpitaux de Paris (AP-HP), France, 25th June, 2023.
- 9.
DOING Webinar: Building Scientific Knowledge Graphs from Scholarly Data, Davide Buscaldi, LIPN, Université Sorbonne Paris Nord, France, 17th May, 2021.
- 10.
- 11.
DOING Webinars: Managing data quality in the age of big data, Salima Benbernou, Université Paris Descartes, LIPADE, France, 5th June 2020; Building Scientific Knowledge Graphs from Scholarly Data, Davide Buscaldi, LIPN, Université Sorbonne Paris Nord, France, 17th May, 2021.
- 12.
DOING Webinar: From Deep Learning to Deep Semantics, Andre Freitas, University of Manchester, UK, 8th July 2020.
- 13.
DOING Panel, Representing content and extracting knowledge from texts: automatic language learning, graph management systems, machine learning for graph analysis and semantic web approaches, Symposium MADICS, Lyon, 2022.
- 14.
French geological survey https://www.brgm.fr/en.
References
Amarilli, A., Bourhis, P., Mengel, S., Niewerth, M.: Constant-delay enumeration for nondeterministic document spanners. In: ICDT. LIPIcs, vol. 127, pp. 22:1–22:19. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)
Balalau, O., et al.: Statistical claim checking: statcheck in action. In: Hasan, M.A., Xiong, L. (eds.) Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022, pp. 4798–4802. ACM (2022)
Banko, M., Cafarella, M.J., Soderland, S., Broadhead, M., Etzioni, O.: Open information extraction from the web. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence, pp. 2670–2676, IJCAI 2007. Morgan Kaufmann Publishers Inc., San Francisco (2007)
Bonifati, A., Fletcher, G., Voigt, H., Yakovets, N., Jagadish, H.: Querying Graphs, vol. 10. Springer, Cham (2018)
Buitelaar, P., Olejnik, D., Sintek, M.: A Protégé plug-in for ontology extraction from text based on linguistic analysis. In: Bussler, C.J., Davies, J., Fensel, D., Studer, R. (eds.) ESWS 2004. LNCS, vol. 3053, pp. 31–44. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-25956-5_3
Cimiano, P., Hotho, A., Staab, S.: Learning concept hierarchies from text corpora using formal concept analysis. J. Artif. Intell. Res. 24, 305–339 (2005)
Conte, D.: Graphs in pattern recognition: successes, shortcomings, and perspectives. J. Electron. Imaging 32(2), 020701–020701 (2023)
Coste, L., Helmers, F., Kheddouci, H., Le Nestour, L., Niazi, M., Vargas-Solar, G.: Strategies for creating knowledge graphs to depict a multi-perspective queer communities representation. In: Workshops of the EDBT/ICDT 2023 Joint Conference, vol. 3379 (2023)
Dessí, D., Osborne, F., Reforgiato Recupero, D., Buscaldi, D., Motta, E.: SCICERO: a deep learning and NLP approach for generating scientific knowledge graphs in the computer science domain. Knowl.-Based Syst. 258, 109945 (2022)
Drummond, L., Girard, R.: A survey of ontology learning procedures. In: Proceedings of the 3rd Workshop on Ontologies and their Applications (2008)
Fagin, R., Kimelfeld, B., Reiss, F., Vansummeren, S.: Document spanners: a formal approach to information extraction. J. ACM 62(2), 12:1–12:51 (2015)
Farokhnejad, M., Pranesh, R.R., Vargas-Solar, G., Mehr, D.A.: S_covid: an engine to explore COVID-19 scientific literature. In: Proceedings of the 24th International Conference on Extending Database Technology (EDBT), Nicosia, Cyprus, pp. 23–26 (2021)
Faure, D., Nédellec, C.: Knowledge acquisition of predicate argument structures from technical texts using Machine Learning: the system Asium. In: Fensel, D., Studer, R. (eds.) EKAW 1999. LNCS (LNAI), vol. 1621, pp. 329–334. Springer, Heidelberg (1999). https://doi.org/10.1007/3-540-48775-1_22
Florenzano, F., Riveros, C., Ugarte, M., Vansummeren, S., Vrgoc, D.: Constant delay algorithms for regular document spanners. CoRR abs/1803.05277 (2018)
Grabar, N., Claveau, V., Dalloux, C.: CAS: French corpus with clinical cases. In: Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis, pp. 122–128. Association for Computational Linguistics, Brussels, Belgium, October 2018
Hiot, N.: PhD thesis (in preparation)
Lefebvre, P., Moal, S.L., Azough, A., Travers, N.: NeoSGG: a scene graph generation framework for video-surveillance tasks. In: Proceedings 27th International Conference on Extending Database Technology, EDBT 2024, Paestum, Italy, March 25 - March 28, pp. 838–841. OpenProceedings.org (2024)
Lovera, F., Cardinale, Y., Buscaldi, D., Charnois, T.: A knowledge graph-based method for the geolocation of tweets. In: Workshop Proceedings of the 19th International Conference on Intelligent Environments (IE2023), pp. 53–62. IOS Press (2023)
Maekawa, S., Sasaki, Y., Fletcher, G., Onizuka, M.: Benchmarking GNNs with GenCat workbench. In: Amini, M.R., Canu, S., Fischer, A., Guns, T., Kralj Novak, P., Tsoumakas, G. (eds.) ECML PKDD 2022. LNCS, vol. 13718, pp. 607–611. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-26422-1_40
Magnini, B., Altuna, B., Lavelli, A., Speranza, M., Zanoli, R.: The E3C project: collection and annotation of a multilingual corpus of clinical cases. In: Proceedings of the Seventh Italian Conference on Computational Linguistics, CLiC-it 2020, Bologna, Italy, 1–3 March 2021. CEUR Workshop Proceedings, vol. 2769 (2020)
Mali, J., Ahvar, S., Atigui, F., Azough, A., Travers, N.: A global model-driven denormalization approach for schema migration. In: Guizzardi, R., Ralyté, J., Franch, X. (eds.) RCIS 2022. LNBIP, vol. 446, pp. 529–545. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-05760-1_31
Mammar Kouadri, W., Benbernou, S., Ouziri, M., Ben Amor, I.: WSSA: weakly supervised semantic-based approach for sentiment analysis. In: Proceedings of the 34th International Conference on Scientific and Statistical Database Management, pp. 1–4 (2022)
Minard, A.L., Ligozat, A.L., Grau, B.: Apport de la syntaxe pour l’extraction de relations en domaine médical. In: TALN 2011, Montpellier, France, p. 383, June 2011
Minard, A., Roques, A., Hiot, N., Halfeld Ferrari, M., Savary, A.: DOING@DEFT: cascade de CRF pour l’annotation d’entités cliniques imbriquées (DOING@DEFT: cascade of CRF for the annotation of nested clinical entities). In: Actes de la 6e conférence conjointe Journées d’Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Atelier DÉfi Fouille de Textes, Nancy, France, 8–19 June 2020, pp. 66–78. ATALA et AFCP (2020)
Peterfreund, L.: Grammars for document spanners. In: ICDT. LIPIcs, vol. 186, pp. 7:1–7:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021)
Peterfreund, L., ten Cate, B., Fagin, R., Kimelfeld, B.: Recursive programs for document spanners. In: ICDT. LIPIcs, vol. 127, pp. 13:1–13:18. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2019)
Prevoteau, H., Djebali, S., Laiping, Z., Travers, N.: Propagation measure on circulation graphs for tourism behavior analysis. In: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing, pp. 556–563 (2022)
Savary, A., Silvanovich, A., Minard, A., Hiot, N., Halfeld Ferrari, M.: Relation extraction from clinical cases for a knowledge graph. In: Chiusano, S., et al. (eds.) ADBIS 2022. CCIS, vol. 1652, pp. 353–365. Springer, Cham (2022). https://doi.org/10.1007/978-3-031-15743-1_33
Toledo-Alvarado, J.I., Guzman-Arenas, A., Luna, G.L.M.: Automatic building of an ontology from a corpus of text documents using data mining tools. J. Appl. Res. Technol. 10, 398–404 (2012)
Valentino, M., Ferreira, D., Thayaparan, M., Freitas, A., Ustalov, D.: TextGraphs 2022 shared task on natural language premise selection. In: Proceedings of TextGraphs-16: Graph-Based Methods for Natural Language Processing, pp. 105–113 (2022)
Vargas-Solar, G., Marrec, P., Halfeld Ferrari Alves, M.: Comparing graph data science libraries for querying and analysing datasets: towards data science queries on graphs. In: Hacid, H., et al. (eds.) ICSOC 2021. LNCS, vol. 13236, pp. 205–216. Springer, Cham (2021). https://doi.org/10.1007/978-3-031-14135-5_16
Acknowledgements
This work was partially supported by the DOING project, a regional project funded by the council of the Centre Val de Loire Region (APR-IA).
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Halfeld-Ferrari, M., Minard, AL., Vargas-Solar, G. (2025). Transforming Text Into Knowledge with Graphs: Report of the GDR MADICS DOING Action. In: Tekli, J., et al. New Trends in Database and Information Systems. ADBIS 2024. Communications in Computer and Information Science, vol 2186. Springer, Cham. https://doi.org/10.1007/978-3-031-70421-5_13
DOI: https://doi.org/10.1007/978-3-031-70421-5_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-70420-8
Online ISBN: 978-3-031-70421-5
eBook Packages: Computer Science (R0)