Abstract
This paper introduces probabilistic databases with unmerged duplicates (DBud), i.e., databases containing probabilistic information about instances found to describe the same real-world objects. We discuss the need to query such databases efficiently and to support practical query scenarios that require analytical or summarized information. We also sketch possible methodologies and techniques for processing queries over such probabilistic databases efficiently, in particular without materializing the (potentially huge) collection of all possible deduplication worlds.
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Ioannou, E., Garofalakis, M. (2014). Analytics over Probabilistic Unmerged Duplicates. In: Straccia, U., Calì, A. (eds) Scalable Uncertainty Management. SUM 2014. Lecture Notes in Computer Science, vol 8720. Springer, Cham. https://doi.org/10.1007/978-3-319-11508-5_17
DOI: https://doi.org/10.1007/978-3-319-11508-5_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11507-8
Online ISBN: 978-3-319-11508-5
eBook Packages: Computer Science, Computer Science (R0)