Abstract
The hash join algorithm family is one of the leading techniques for evaluating equi-joins. OLAP systems borrow this line of research to efficiently implement foreign key joins between dimension tables and large fact tables. From the perspective of data warehouse schemas and workload features, the hash join algorithm can be further simplified with multidimensional mapping, and foreign key join algorithms can be evaluated from multiple perspectives rather than from performance alone. In this paper, we introduce the surrogate-key-index-oriented foreign key join, a schema-conscious design customized for OLAP workloads, and use it to comprehensively evaluate how state-of-the-art join algorithms perform in OLAP workloads. Our experiments and analysis yield the following insights: (1) a foreign key join algorithm customized for OLAP workloads can outperform general-purpose hash joins; (2) a fine-grained join micro-benchmark shows that each join algorithm has strong and weak performance regions, governed by the cache locality ratio input_size/cache_size; (3) the simple, hardware-oblivious shared hash table join outperforms the complex, hardware-conscious radix partitioning hash join in most benchmark cases; (4) the customized foreign key join algorithm with a surrogate key index reduces algorithmic complexity, which makes it easy to implement on different hardware accelerators. Overall, we argue that improving join performance is a systematic effort rather than merely a matter of hardware-conscious algorithm optimization, and that OLAP domain knowledge makes the surrogate key index effective for foreign key joins in data warehousing workloads on both CPUs and hardware accelerators.
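The contrast between a general-purpose hash join and a surrogate key index join can be illustrated with a minimal Python sketch. It assumes that the dimension table's surrogate keys are dense integers 1..n stored in key order, so that a fact table's foreign key value doubles as an array position. The function names and toy data are hypothetical, and this is not the authors' implementation.

```python
def hash_join(dim_keys, dim_vals, fact_fks):
    """General-purpose equi-join: build a hash table, then probe it."""
    # Build phase: map each dimension primary key to its payload.
    table = dict(zip(dim_keys, dim_vals))
    # Probe phase: hash each fact foreign key; unmatched rows are dropped.
    return [(row, table[fk]) for row, fk in enumerate(fact_fks) if fk in table]


def surrogate_key_index_join(dim_vals, fact_fks):
    """Foreign key join via surrogate key addressing (assumed keys 1..n)."""
    # Assumption: surrogate key k is stored at position k - 1, so the join
    # needs no hash function, no build phase, and no collision handling.
    n = len(dim_vals)
    # The bounds check is only defensive; with referential integrity
    # every foreign key falls in 1..n.
    return [(row, dim_vals[fk - 1]) for row, fk in enumerate(fact_fks) if 1 <= fk <= n]


if __name__ == "__main__":
    regions = ["Asia", "Europe", "America", "Africa"]  # surrogate keys 1..4
    fact_fks = [3, 1, 1, 4, 2]
    print(hash_join([1, 2, 3, 4], regions, fact_fks))
    print(surrogate_key_index_join(regions, fact_fks))
```

Both calls return the same `(fact_row, payload)` pairs. The second version replaces each hash probe with a single array access, so its cost depends mainly on whether the dimension array stays cache-resident. That is one way to read the input_size/cache_size ratio mentioned in the abstract.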
Acknowledgements
This work is supported by the National Natural Science Foundation of China (Project Nos. 61732014 and 61772533) and the Academy of Finland (310321).
About this article
Cite this article
Zhang, Y., Zhang, Y., Zhou, X. et al. Main-memory foreign key joins on advanced processors: design and re-evaluations for OLAP workloads. Distrib Parallel Databases 37, 469–506 (2019). https://doi.org/10.1007/s10619-018-7226-4
DOI: https://doi.org/10.1007/s10619-018-7226-4