Abstract
Large knowledge graphs such as DBpedia and YAGO are all based on the same source, namely Wikipedia. However, many other wikis contain information about long-tail entities, for example those hosted on wiki hosting platforms such as Fandom. In this paper, we present the approach and analysis of DBkWik++, a fused knowledge graph built from thousands of wikis. A modified version of the DBpedia framework is applied to each wiki, which results in many isolated knowledge graphs. With an incremental merge-based approach, we reuse one-to-one matching systems to solve the multi-source KG matching task. Based on this alignment, we create a consolidated knowledge graph with more than 15 million instances.
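The incremental merge idea can be illustrated with a minimal sketch. It is not the DBkWik++ implementation: the simplified graph representation (entity id mapped to a label), the label-based `match_one_to_one` stand-in, and the function name `incremental_merge` are all assumptions made for illustration. The sketch only shows how a pairwise matcher can be reused for many sources by matching each new graph against the graph merged so far.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, simplified KG: entity id -> label.
# Entity ids are assumed to be unique across all graphs.
KG = Dict[str, str]


def match_one_to_one(left: KG, right: KG) -> Dict[str, str]:
    """Toy stand-in for a one-to-one matching system.

    Maps entities of `right` to entities of `left` when their normalized
    labels are equal. Each left entity is used at most once.
    """
    index: Dict[str, str] = {}
    for left_id, label in left.items():
        index.setdefault(label.strip().lower(), left_id)

    alignment: Dict[str, str] = {}
    used: Set[str] = set()
    for right_id, label in right.items():
        left_id = index.get(label.strip().lower())
        if left_id is not None and left_id not in used:
            alignment[right_id] = left_id
            used.add(left_id)
    return alignment


def incremental_merge(kgs: List[KG]) -> Tuple[KG, Dict[str, Set[str]]]:
    """Merge graphs one at a time, reusing the pairwise matcher.

    Returns the merged graph and, for each merged entity, the set of
    source entities that were fused into it.
    """
    merged: KG = {}
    clusters: Dict[str, Set[str]] = {}
    for kg in kgs:
        # Match the new graph against everything merged so far.
        alignment = match_one_to_one(merged, kg)
        for entity_id, label in kg.items():
            target = alignment.get(entity_id)
            if target is None:
                # Unmatched entity: becomes a new entity in the merged graph.
                merged[entity_id] = label
                clusters[entity_id] = {entity_id}
            else:
                # Matched entity: fused into the existing merged entity.
                clusters[target].add(entity_id)
    return merged, clusters


if __name__ == "__main__":
    wikis = [
        {"a:Gandalf": "Gandalf", "a:Frodo": "Frodo"},
        {"b:gandalf": "gandalf ", "b:Sauron": "Sauron"},
        {"c:Sauron": "sauron"},
    ]
    merged_graph, fused = incremental_merge(wikis)
    for entity, members in fused.items():
        print(entity, sorted(members))
```

In a scheme like this, each graph is matched only once against the growing union, so the number of matcher invocations grows linearly with the number of sources rather than quadratically as with all-pairs matching.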
Notes
- 7. A template in MediaWiki which usually contains the text "infobox" to visualize important information at the top right corner of a page.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Hertling, S., Paulheim, H. (2022). DBkWik++ - Multi Source Matching of Knowledge Graphs. In: Villazón-Terrazas, B., Ortiz-Rodriguez, F., Tiwari, S., Sicilia, MA., Martín-Moncunill, D. (eds) Knowledge Graphs and Semantic Web. KGSWC 2022. Communications in Computer and Information Science, vol 1686. Springer, Cham. https://doi.org/10.1007/978-3-031-21422-6_1
DOI: https://doi.org/10.1007/978-3-031-21422-6_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21421-9
Online ISBN: 978-3-031-21422-6
eBook Packages: Computer Science, Computer Science (R0)