Efficient SPARQL Query Evaluation via Automatic Data Partitioning

Yang, Tao; Chen, Jinchuan; Wang, Xiaoyan; Chen, Yueguo; Du, Xiaoyong

doi:10.1007/978-3-642-37450-0_18

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7826)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

The volume of RDF data has grown rapidly over the last five years; for example, the Linked Open Data cloud has grown from 2 billion to 50 billion RDF triples. With their strong scalability, cloud computing platforms such as Hadoop are a good choice for processing queries over large data sets. Previous work on evaluating SPARQL queries with Hadoop mainly focuses on reducing the number of joins through careful splitting of HDFS files and through algorithms for generating Map/Reduce jobs. However, the way RDF data is partitioned can also affect performance. Specifically, a good partitioning can greatly reduce, or even entirely avoid, cross-node joins and thereby significantly reduce the cost of query evaluation. Based on HadoopDB, this work processes SPARQL queries in a hybrid architecture in which Map/Reduce takes charge of the computing tasks and an RDF query engine, RDF-3X, stores the data and evaluates join operations over local data. Based on an analysis of query workloads, we propose a novel algorithm for automatically partitioning RDF data. We also present an approximate solution for physically placing the partitions so as to reduce data redundancy. All the proposed approaches are evaluated by extensive experiments over large RDF data sets.
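
The intuition that workload-aware partitioning can avoid cross-node joins can be illustrated with a small sketch. This is not the algorithm proposed in the paper, which the abstract does not describe; it is a deliberately naive stand-in that assumes each query is summarized by the list of predicates it joins, and that triples are assigned to partitions by predicate. The function names (`group_predicates`, `partition_triples`, `is_local`) and the sample predicates are hypothetical.

```python
from collections import defaultdict


def group_predicates(workload):
    """Group predicates that co-occur in a query, using union-find.

    `workload` is a list of queries; each query is a list of predicate
    names (an assumed, simplified summary of a SPARQL query).
    Returns a dict mapping each predicate to its group representative.
    """
    parent = {}

    def find(p):
        parent.setdefault(p, p)
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for query in workload:
        if not query:
            continue
        root = find(query[0])
        for other in query[1:]:
            other_root = find(other)
            if other_root != root:
                parent[other_root] = root  # merge the two groups
    return {p: find(p) for p in list(parent)}


def partition_triples(triples, groups):
    """Place each (subject, predicate, object) triple in its predicate's group.

    Predicates never seen in the workload get a partition of their own.
    """
    partitions = defaultdict(list)
    for s, p, o in triples:
        partitions[groups.get(p, p)].append((s, p, o))
    return dict(partitions)


def is_local(query, groups):
    """True if all predicates of the query live in one partition,
    so the query needs no cross-partition join."""
    return len({groups.get(p, p) for p in query}) == 1


# Hypothetical workload and data, for illustration only.
workload = [
    ["advisor", "worksFor"],
    ["worksFor", "name"],
    ["takesCourse", "teacherOf"],
]
triples = [
    ("alice", "advisor", "bob"),
    ("bob", "worksFor", "dept1"),
    ("dept1", "name", "Databases"),
    ("alice", "takesCourse", "c1"),
    ("bob", "teacherOf", "c1"),
]

groups = group_predicates(workload)
partitions = partition_triples(triples, groups)
```

Under these assumptions, every query in the workload touches exactly one partition, while a query that mixes `advisor` with `teacherOf` would still need a cross-partition join. The sketch ignores load balance and data redundancy, both of which the paper addresses; with a densely connected workload, this naive grouping would collapse all predicates into a single partition.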

Author information

Authors and Affiliations

School of Information, Renmin University of China, China
Tao Yang, Xiaoyan Wang & Xiaoyong Du
Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, MOE, China
Jinchuan Chen & Yueguo Chen

Editor information

Editors and Affiliations

Department of Computer Science, Binghamton University, 13902, Binghamton, NY, USA
Weiyi Meng
Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Ling Feng
Department of Computer Science, National University of Singapore, 117417, Singapore
Stéphane Bressan
Research Group Data Analytics and Computing, University of Vienna, 1090, Vienna, Austria
Werner Winiwarter
School of Computer, Wuhan University, 430072, Wuhan, China
Wei Song

About this paper

Cite this paper

Yang, T., Chen, J., Wang, X., Chen, Y., Du, X. (2013). Efficient SPARQL Query Evaluation via Automatic Data Partitioning. In: Meng, W., Feng, L., Bressan, S., Winiwarter, W., Song, W. (eds) Database Systems for Advanced Applications. DASFAA 2013. Lecture Notes in Computer Science, vol 7826. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-37450-0_18

DOI: https://doi.org/10.1007/978-3-642-37450-0_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-37449-4
Online ISBN: 978-3-642-37450-0
eBook Packages: Computer Science, Computer Science (R0)
