Abstract
This paper presents SparkKG-ML, the first open-source library for performing Machine Learning at scale, directly in Python, over semantic data stored in Knowledge Graphs. SparkKG-ML serves as a bridge between (i) the Semantic Web data model, (ii) the distributed computing capabilities of Apache Spark, and (iii) the Python ecosystem. By harnessing the flexibility of Python and the scalability of Spark, SparkKG-ML lowers the barriers for Data Scientists and Machine Learning researchers to work with semantic data, and for Semantic Web experts to develop Machine Learning models.
Resource Type: Software
Repository: https://github.com/IDIASLab/SparkKG-ML
License: Apache License 2.0
Notes
- 19. The number of downloads was computed using Google BigQuery, as described in https://packaging.python.org/en/latest/guides/analyzing-pypi-package-downloads/.
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Gergin, B., Chelmis, C. (2025). SparkKG-ML: A Library to Facilitate End-to-End Large-Scale Machine Learning Over Knowledge Graphs in Python. In: Demartini, G., et al. The Semantic Web – ISWC 2024. ISWC 2024. Lecture Notes in Computer Science, vol 15233. Springer, Cham. https://doi.org/10.1007/978-3-031-77847-6_1
DOI: https://doi.org/10.1007/978-3-031-77847-6_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-77846-9
Online ISBN: 978-3-031-77847-6
eBook Packages: Computer Science, Computer Science (R0)