Abstract
Most encrypted speech retrieval algorithms are over-optimized for discriminability and robustness, which leads to poor security and efficiency. In addition, processing large amounts of data on a single machine is inefficient. Therefore, building on the traditional model of a ciphertext speech retrieval system, this paper proposes an efficient encrypted speech retrieval scheme based on a Hadoop cluster under the SW CPU. The study uses the SW CPU as the cloud and introduces Hadoop cluster technology. In the proposed algorithm, the peak frequency and spectral crest factor of the speech are first extracted and fused. Second, a hyperchaotic measurement matrix is generated from the key and iterated with the fused feature vector, and the result is binarized to generate the BioHashing sequence. A pseudo-random sequence generated from the key is used to perform mapping encryption on the speech segments, producing the encrypted speech, and linear-feedback shift register (LFSR) encryption is applied to the BioHashing sequence to generate the hash index. Finally, the hash index and the encrypted speech are uploaded to the cloud via WinSCP. On the SW CPU, multiple processors operating simultaneously speed up the processing of large amounts of data. The experimental results show that the proposed BioHashing algorithm achieves a good trade-off between discriminability and robustness and that the proposed system model provides good security. Moreover, the Hadoop cluster technology effectively improves retrieval performance.
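As a rough illustration of the hashing and indexing steps summarized above, the Python sketch below runs a toy pipeline end to end. It is not the authors' implementation and rests on several assumptions: a key-seeded logistic map stands in for the hyperchaotic map, the two features are fused by normalizing each to [0, 1] and concatenating, the BioHashing threshold is the median projection value, and the LFSR is a 16-bit Fibonacci register with feedback polynomial x^16 + x^14 + x^13 + x^11 + 1. The speech mapping encryption, the WinSCP upload and the Hadoop distribution are not modelled, and all function names and parameters are hypothetical.

```python
import cmath
import math
import random


def frame_features(signal, frame_len=64, hop=32, sample_rate=8000):
    """Per-frame peak frequency (Hz) and spectral crest factor (max / mean magnitude)."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Naive DFT magnitude spectrum, bins 0 .. frame_len/2 (keeps the sketch stdlib-only).
        mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame)))
                for k in range(frame_len // 2 + 1)]
        peak_bin = max(range(len(mags)), key=mags.__getitem__)
        mean_mag = sum(mags) / len(mags)
        crest = max(mags) / mean_mag if mean_mag > 0 else 0.0
        feats.append((peak_bin * sample_rate / frame_len, crest))
    return feats


def fuse(feats, sample_rate=8000):
    # ASSUMPTION: fusion = normalize each feature to [0, 1], then concatenate.
    max_crest = max(c for _, c in feats) or 1.0
    return [f / (sample_rate / 2) for f, _ in feats] + [c / max_crest for _, c in feats]


def chaotic_matrix(key, rows, cols, r=3.99, burn_in=100):
    # ASSUMPTION: a logistic map seeded by key in (0, 1) stands in for the hyperchaotic map.
    x = key
    for _ in range(burn_in):  # discard transient iterations
        x = r * x * (1 - x)
    matrix = []
    for _ in range(rows):
        row = []
        for _ in range(cols):
            x = r * x * (1 - x)
            row.append(2 * x - 1)  # center entries in [-1, 1]
        matrix.append(row)
    return matrix


def biohash(vector, matrix):
    # Project the fused feature vector onto the key-dependent matrix, then binarize.
    proj = [sum(m * v for m, v in zip(row, vector)) for row in matrix]
    threshold = sorted(proj)[len(proj) // 2]  # ASSUMPTION: median-style threshold
    return [1 if p > threshold else 0 for p in proj]


def lfsr_bits(seed, n, taps=(16, 14, 13, 11), width=16):
    """Fibonacci LFSR keystream for x^16 + x^14 + x^13 + x^11 + 1 (maximal length)."""
    state = seed & ((1 << width) - 1)
    if state == 0:
        raise ValueError("LFSR seed must be non-zero")
    out = []
    for _ in range(n):
        out.append(state & 1)
        fb = 0
        for t in taps:
            fb ^= (state >> (width - t)) & 1
        state = (state >> 1) | (fb << (width - 1))
    return out


def lfsr_encrypt(bits, seed):
    # XOR with the keystream; applying it twice with the same seed decrypts.
    return [b ^ k for b, k in zip(bits, lfsr_bits(seed, len(bits)))]


def ber(a, b):
    # Normalized Hamming distance (bit error rate) used for matching.
    return sum(x != y for x, y in zip(a, b)) / len(a)


def tone(freqs, length=1024, sample_rate=8000):
    # Synthetic stand-in for speech: the frequency switches halfway through.
    half = length // 2
    return [math.sin(2 * math.pi * (freqs[0] if n < half else freqs[1]) * n / sample_rate)
            for n in range(length)]


if __name__ == "__main__":
    M = chaotic_matrix(0.3141, rows=32, cols=62)  # 31 frames x 2 features = 62
    rng = random.Random(7)
    clean = tone((500, 1500))
    noisy = [s + rng.gauss(0, 0.05) for s in clean]
    h_clean = biohash(fuse(frame_features(clean)), M)
    h_noisy = biohash(fuse(frame_features(noisy)), M)
    idx_clean = lfsr_encrypt(h_clean, 0xACE1)
    idx_noisy = lfsr_encrypt(h_noisy, 0xACE1)
    print("BER (plain hashes):   ", ber(h_clean, h_noisy))
    print("BER (encrypted index):", ber(idx_clean, idx_noisy))
```

In this sketch the stored and query hashes are XORed with the same keystream, so their Hamming distance is unchanged by encryption and bit-error-rate matching can be performed directly on the encrypted hash indices; these pairwise comparisons are the kind of independent work that a cluster could split across nodes.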
Data Availability
Raw data were generated at the large-scale facility. Derived data supporting the findings of this study are available from the corresponding author upon request.
Acknowledgements
This work was supported by the Key Science and Technology Foundation of Gansu Province (21JR7RA120).
Ethics declarations
Conflicts of interest
The authors declare that they have no conflicts of interest.
Appendices
Appendix A: Core framework of Hadoop

Appendix B: Content preservation operations

Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, H., Jing, X., Zhang, Y. et al. Efficient encrypted speech retrieval based on hadoop cluster under SW CPU. Multimed Tools Appl 83, 63047–63073 (2024). https://doi.org/10.1007/s11042-023-17932-z
DOI: https://doi.org/10.1007/s11042-023-17932-z