Abstract
Top-k representative skyline queries are important for multi-criteria decision-making applications because they give data analysts an intuitive way to identify the k most significant objects. Despite their importance, these queries have received little attention from the research community, and existing work on the problem considers only certain (deterministic) data models. In this paper, we present the first study on processing top-k representative skyline queries in uncertain databases, based on user-defined preferences regarding the priority of individual dimensions. Instead of a threshold, which may be difficult for an end user to define, we apply the odds ratio to restrict the cardinality of the result set. We then develop two novel algorithms for answering top-k representative skyline queries on uncertain data and propose several pruning conditions to improve their efficiency. Performance evaluations on both real-life and synthetic datasets demonstrate the efficiency, effectiveness, and scalability of the proposed approaches.
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Nguyen, H.T.H., Cao, J. (2015). Preference-Based Top-k Representative Skyline Queries on Uncertain Databases. In: Cao, T., Lim, EP., Zhou, ZH., Ho, TB., Cheung, D., Motoda, H. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2015. Lecture Notes in Computer Science, vol 9078. Springer, Cham. https://doi.org/10.1007/978-3-319-18032-8_22
DOI: https://doi.org/10.1007/978-3-319-18032-8_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18031-1
Online ISBN: 978-3-319-18032-8
eBook Packages: Computer Science (R0)