Transforming Text Into Knowledge with Graphs: Report of the GDR MADICS DOING Action

  • Conference paper
New Trends in Database and Information Systems (ADBIS 2024)

Abstract

This paper provides an overview of graph databases for the retrieval and integration of knowledge originating from textual data, attempting to bring together different building blocks that are usually addressed separately. It explores concepts and insights that result from the scientific activities promoted by the GDR MADICS DOING Action (Intelligent Data: turning information into knowledge). The action promoted scientific discussion on the challenges, current findings, and open issues in converting textual data into information and, ultimately, knowledge. This topic has been investigated within a multidisciplinary context, involving specialists in Databases (DB), Natural Language Processing (NLP), Artificial Intelligence (AI), and professionals in various application domains.

Notes

  1. DOING is a coordination action funded by the MADICS network of the French National Centre for Scientific Research (CNRS) http://www.madics.fr/actions/doing/. Created as a regional initiative in 2019, DOING extended its scope to a national level within the GDR MADICS in 2020 before attaining official status as an action.

  2. The impact of the DOING coordination action goes beyond national boundaries and has inspired further initiatives: (i) a regional project, APR-IA, supported by the Centre Val de Loire region (2021–2024); (ii) the international workshop DOING@ADBIS, which marks its 5th edition this year and underscores the far-reaching influence of DOING.

  3. https://cambridge-intelligence.com/graph-data-modeling-101/.

  4. DOING Webinar: Language-aware indexing for conjunctive path queries, George Fletcher, Eindhoven University of Technology, Netherlands, March 2021.

  5. DOING Webinar: Evaluating navigational queries over graphs, Domagoj Vrgoč, Pontificia Universidad Católica de Chile, Chile, July 2021.

  6. MADICS Symposium, DOING workshop. In July 2022: Aperçu général des langages de requêtes pour graphes à propriétés (General overview of query languages for property graphs) by Victor Marsault. In May 2023: A Researcher’s Digest of GQL by Liat Peterfreund.

  7. DOING Webinar: Natural language processing for epidemiology & public health, Aurélie Névéol, CNRS, LISN, France, 5th July 2021.

  8. MADICS Symposium, DOING workshop: Tools for processing clinical reports in health data warehouses, Perceval Wajsbürt, Assistance Publique - Hôpitaux de Paris (AP-HP), France, 25th June 2023.

  9. DOING Webinar: Building Scientific Knowledge Graphs from Scholarly Data, Davide Buscaldi, LIPN, Université Sorbonne Paris Nord, France, 17th May 2021.

  10. https://www.esilv.fr.

  11. DOING Webinars: Managing data quality in the age of big data, Salima Benbernou, Université Paris Descartes, LIPADE, France, 5th June 2020; Building Scientific Knowledge Graphs from Scholarly Data, Davide Buscaldi, LIPN, Université Sorbonne Paris Nord, France, 17th May 2021.

  12. DOING Webinar: From Deep Learning to Deep Semantics, Andre Freitas, University of Manchester, UK, 8th July 2020.

  13. DOING Panel, Representing content and extracting knowledge from texts: automatic language learning, graph management systems, machine learning for graph analysis and semantic web approaches, Symposium MADICS, Lyon, 2022.

  14. French geological survey https://www.brgm.fr/en.

Acknowledgements

This work was partially supported by the DOING project, a regional project funded by the council of the Centre Val de Loire Region (APR-IA).

Author information

Corresponding author

Correspondence to Mirian Halfeld-Ferrari.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Halfeld-Ferrari, M., Minard, AL., Vargas-Solar, G. (2025). Transforming Text Into Knowledge with Graphs: Report of the GDR MADICS DOING Action. In: Tekli, J., et al. New Trends in Database and Information Systems. ADBIS 2024. Communications in Computer and Information Science, vol 2186. Springer, Cham. https://doi.org/10.1007/978-3-031-70421-5_13

  • DOI: https://doi.org/10.1007/978-3-031-70421-5_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70420-8

  • Online ISBN: 978-3-031-70421-5

  • eBook Packages: Computer Science, Computer Science (R0)
