Abstract
RDF is a data representation format for schema-free structured information that is gaining momentum in areas such as the Semantic Web and the life sciences. With the continuing proliferation of structured data, RDF compression is becoming increasingly important. In this study, we introduce a novel lossless compression technique for RDF datasets (triples), called PIC (Predicate Invention based Compression). By generating informative predicates and constructing an effective mapping from them to the original predicates, PIC needs to store only a dramatically reduced number of triples that use the newly created predicates, and it restores the original triples efficiently using the mapping. These predicates are automatically generated by a decomposable forward-backward procedure, which consequently supports very fast parallel bit computation. As a semantic compression method for structured data, PIC not only reduces syntactic verbosity and data redundancy but also exploits the semantics in the RDF datasets. Experiments on various datasets show competitive results in terms of compression ratio.
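The abstract does not spell out the PIC algorithm, so the following is only a toy sketch of the general idea of storing fewer triples under invented predicates and expanding them through a mapping. It is not the authors' forward-backward procedure. The grouping rule (one invented predicate per distinct set of original predicates shared by a subject-object pair), the bitmask encoding, and all names (`compress`, `decompress`, `inv<mask>`) are assumptions made for illustration.

```python
from collections import defaultdict


def compress(triples):
    """Toy scheme (assumed, not PIC itself): keep one triple per (subject, object) pair."""
    # Assign each original predicate one bit position.
    predicates = sorted({p for _, p, _ in triples})
    bit = {p: 1 << i for i, p in enumerate(predicates)}

    # OR together the bits of every predicate linking the same subject and object.
    masks = defaultdict(int)
    for s, p, o in triples:
        masks[(s, o)] |= bit[p]

    # Each distinct bitmask becomes one invented predicate (hypothetical naming).
    mapping = {}
    compressed = []
    for (s, o), mask in sorted(masks.items()):
        name = f"inv{mask}"
        mapping[name] = [p for p in predicates if bit[p] & mask]
        compressed.append((s, name, o))
    return compressed, mapping


def decompress(compressed, mapping):
    """Expand every invented predicate back into its original predicates."""
    return {(s, p, o) for s, q, o in compressed for p in mapping[q]}


triples = [
    ("alice", "knows", "bob"),
    ("alice", "worksWith", "bob"),
    ("carol", "knows", "dave"),
    ("carol", "worksWith", "dave"),
    ("alice", "knows", "carol"),
]
compressed, mapping = compress(triples)
print(len(triples), "->", len(compressed))  # 5 -> 3
print(decompress(compressed, mapping) == set(triples))  # True
```

In this sketch the five input triples shrink to three stored triples plus a two-entry mapping, and decompression recovers the original set exactly. The mapping itself has a storage cost, so a scheme of this kind only pays off when many subject-object pairs share the same predicate set.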
Acknowledgement
This work is partially funded by the National Science Foundation of China under grants 61602260 and 61702279.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhu, M., Wu, W., Pan, J.Z., Han, J., Huang, P., Liu, Q. (2018). Predicate Invention Based RDF Data Compression. In: Ichise, R., Lecue, F., Kawamura, T., Zhao, D., Muggleton, S., Kozaki, K. (eds) Semantic Technology. JIST 2018. Lecture Notes in Computer Science, vol 11341. Springer, Cham. https://doi.org/10.1007/978-3-030-04284-4_11
DOI: https://doi.org/10.1007/978-3-030-04284-4_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04283-7
Online ISBN: 978-3-030-04284-4
eBook Packages: Computer Science (R0)