Enhancing Source Selection for Live Queries over Linked Data via Query Log Mining

Tian, Yuan; Umbrich, Jürgen; Yu, Yong

doi:10.1007/978-3-642-29923-0_12

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7185)

Included in the following conference series:

Joint International Semantic Technology Conference

Abstract

Traditionally, Linked Data query engines execute SPARQL queries over a materialised repository, which on the one hand guarantees fast query answering but on the other hand requires time- and resource-consuming preprocessing steps. In addition, materialised repositories face the ongoing challenge of keeping the index up to date, which is practically infeasible given the size of the Web. Thus, the results for a given SPARQL query are potentially outdated. Recent approaches address this result freshness problem by answering a given query directly over dereferenced, query-relevant Web documents. Our work investigates the problem of efficiently selecting query-relevant sources in this context. As a part of query optimization, source selection tries to estimate the minimum number of sources that must be accessed in order to answer a query. We propose to summarize and index sources based on frequently appearing query graph patterns mined from query logs. We verify the applicability of our approach and empirically show that it significantly reduces the estimated number of relevant sources while keeping the overhead low.
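
The idea of indexing sources by frequent query patterns can be illustrated with a small sketch. It is not the method from the paper: the query log, the source documents, the restriction to star-shaped patterns (sets of predicates sharing one subject), and all function names are assumptions made for illustration. The sketch also ignores matches that span several documents, which a real engine would have to handle.

```python
from collections import Counter, defaultdict

# Hypothetical query log: each query is reduced to the predicates of one
# star-shaped graph pattern (all triple patterns share the same subject).
QUERY_LOG = [
    ("foaf:name", "foaf:knows"),
    ("foaf:name", "foaf:knows"),
    ("foaf:name", "dc:creator"),
    ("foaf:knows", "foaf:name"),
]

# Hypothetical sources: document URI -> triples found in that document.
SOURCES = {
    "http://example.org/alice": [
        ("ex:alice", "foaf:name", "Alice"),
        ("ex:alice", "foaf:knows", "ex:bob"),
    ],
    "http://example.org/bob": [("ex:bob", "foaf:name", "Bob")],
    "http://example.org/doc": [("ex:doc", "dc:creator", "ex:alice")],
}


def mine_frequent_patterns(log, min_support):
    """Return the predicate sets that occur at least min_support times."""
    counts = Counter(frozenset(query) for query in log)
    return {pattern for pattern, n in counts.items() if n >= min_support}


def build_pattern_index(sources, patterns):
    """Map each frequent pattern to the sources that match it on one subject."""
    index = defaultdict(set)
    for uri, triples in sources.items():
        predicates_by_subject = defaultdict(set)
        for subject, predicate, _ in triples:
            predicates_by_subject[subject].add(predicate)
        for pattern in patterns:
            if any(pattern <= preds for preds in predicates_by_subject.values()):
                index[pattern].add(uri)
    return dict(index)


def select_by_predicate(query, sources):
    """Baseline: every source that mentions any predicate of the query."""
    wanted = set(query)
    return {
        uri
        for uri, triples in sources.items()
        if any(predicate in wanted for _, predicate, _ in triples)
    }


def select_sources(query, index, sources):
    """Use the pattern index when the query is a known frequent pattern."""
    key = frozenset(query)
    if key in index:
        return set(index[key])
    return select_by_predicate(query, sources)


frequent = mine_frequent_patterns(QUERY_LOG, min_support=2)
index = build_pattern_index(SOURCES, frequent)
```

With this toy data, `select_sources(("foaf:name", "foaf:knows"), index, SOURCES)` returns only the document about Alice, whereas the per-predicate baseline `select_by_predicate` also returns the document about Bob. A query whose pattern was not frequent in the log falls back to the baseline.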

The work presented in this paper has been funded in part by Science Foundation Ireland under Grant No. SFI/08/CE/I1380 (Lion-2).

Author information

Authors and Affiliations

Shanghai Jiao Tong University, 800 Dongchuan Rd., Shanghai, China
Yuan Tian & Yong Yu
Digital Enterprise Research Institute, National University of Ireland, Galway, Ireland
Jürgen Umbrich

Editor information

Editors and Affiliations

Department of Computing Science, University of Aberdeen, AB24 3UE, Aberdeen, UK
Jeff Z. Pan
College of Computer Science, Zhejiang University, 310027, Hangzhou, China
Huajun Chen & Zhaohui Wu
BIKE Lab, Seoul National University, Yeongun-Dong, Jongro-Gu, 110-749, Seoul, Korea
Hong-Gee Kim
Department of Computer Science and Technology, Tsinghua University, 100084, Beijing, China
Juanzi Li
Oracle Corporation, 500 Oracle Parkway, 94065, Redwood Shores, CA, USA
Zhe Wu
Department of Computer Science, University of Oxford, Wolfson Building, Parks Road, OX1 3QD, Oxford, UK
Ian Horrocks
Institute of Scientific and Industrial Research (ISIR), Osaka University, 8-1 Mihogaoka, 567-0047, Ibaraki, Osaka, Japan
Riichiro Mizoguchi

About this paper

Cite this paper

Tian, Y., Umbrich, J., Yu, Y. (2012). Enhancing Source Selection for Live Queries over Linked Data via Query Log Mining. In: Pan, J.Z., et al. The Semantic Web. JIST 2011. Lecture Notes in Computer Science, vol 7185. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-29923-0_12

DOI: https://doi.org/10.1007/978-3-642-29923-0_12
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-29922-3
Online ISBN: 978-3-642-29923-0
eBook Packages: Computer Science, Computer Science (R0)
