Abstract
With the rapid development of the Internet of Things and cloud computing, HBase has become a popular choice for massive data storage because it reads and writes data efficiently. However, HBase does not support multi-dimensional queries on non-rowkey data, which hinders data analysis and processing. To address this issue, we first analyze the construction principles and deficiencies of the secondary index and the clustering index, and select the clustering index as the basis for optimization. Then, we choose the Hilbert curve, a space-filling curve, as the linearization technique, design a pre-partition algorithm and a subspace partition algorithm, and implement the Hilbert-curve-based clustering index (HCIndex), which supports multi-dimensional point queries and range queries. Finally, the performance of HCIndex is evaluated in comparison experiments against HBase Scan, HiBase and CCIndex. The experimental results show that HCIndex greatly improves query efficiency at a limited storage cost: the space needed to store the index data is only 1.7 times the size of the original HBase data table. Compared with HBase Scan, HCIndex is more than 4 times as efficient for multi-dimensional point queries and more than 2 times as efficient for range queries. Therefore, the proposed HCIndex is well suited for efficient multi-dimensional and complex queries over massive data in cloud storage systems.
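To illustrate the linearization idea, the following is a minimal sketch of the classic two-dimensional Hilbert mapping, which converts a grid cell (x, y) into a single position along the curve. It is not the HCIndex implementation: the function names `hilbert_index` and `make_rowkey`, the `order` parameter, and the zero-padded rowkey layout are assumptions made for illustration only, and the pre-partition and subspace partition algorithms are not shown.

```python
def hilbert_index(order: int, x: int, y: int) -> int:
    """Map cell (x, y) of a 2^order x 2^order grid to its Hilbert curve position.

    Standard iterative bit-manipulation algorithm for the 2-D Hilbert curve.
    """
    n = 1 << order              # grid side length
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError("coordinates must lie inside the grid")
    d = 0
    s = n >> 1                  # current quadrant size, halves each step
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)   # offset contributed by this quadrant
        # Rotate/reflect so the sub-quadrant has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def make_rowkey(order: int, x: int, y: int) -> str:
    """Hypothetical rowkey: zero-padded Hilbert value.

    Fixed-width padding keeps lexicographic order equal to numeric order,
    which matters because HBase sorts rowkeys as byte strings.
    """
    width = len(str((1 << (2 * order)) - 1))
    return f"{hilbert_index(order, x, y):0{width}d}"


if __name__ == "__main__":
    # Walk the 4 x 4 grid in curve order.
    cells = sorted(((x, y) for x in range(4) for y in range(4)),
                   key=lambda c: hilbert_index(2, *c))
    print(cells)
```

Because consecutive positions on the curve are always neighbouring cells, points that are close in the multi-dimensional space tend to receive nearby one-dimensional keys, so a range query can often be served by a small number of contiguous rowkey scans.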













Data availability
Enquiries about data availability should be directed to the authors.
Funding
This work was supported by the Fundamental Research Funds for the Central Universities (Grant No. BLX201923), the National Natural Science Foundation of China (Grant No. 62072187), the Guangdong Major Project of Basic and Applied Basic Research (Grant No. 2019B030302002), the Major Key Project of PCL (Grant No. PCL2021A09), the Guangdong Marine Economic Development Special Fund Project (Grant No. GDNRC[2022]17), and the Guangzhou Development Zone Science and Technology Project (Grant Nos. 2021GH10 and 2020GH10).
Author information
Contributions
Xinyang Wang, Yu Sun, and Qiao Sun wrote the main manuscript text; Weiwei Lin and James Z. Wang revised the manuscript and provided revision suggestions; Yu Sun and Wei Li wrote the code and processed the experimental data.
Ethics declarations
Conflict of interest
All authors declare that they have no conflict of interest.
Research involving human and animal rights
This paper does not contain any studies with human participants or animals performed by any of the authors.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Wang, X., Sun, Y., Sun, Q. et al. HCIndex: a Hilbert-Curve-based clustering index for efficient multi-dimensional queries for cloud storage systems. Cluster Comput 26, 2011–2025 (2023). https://doi.org/10.1007/s10586-022-03723-y
DOI: https://doi.org/10.1007/s10586-022-03723-y