Utterance partitioning for speaker recognition: an experimental review and analysis with new findings under GMM-SVM framework

Sen, Nirmalya; Sahidullah, Md; Patil, Hemant A.; Das Mandal, Shyamal Kumar; Rao, Krothapalli Sreenivasa; Basu, Tapan Kumar

doi:10.1007/s10772-021-09862-8

Published: 13 July 2021

Volume 24, pages 1067–1088, (2021)

International Journal of Speech Technology

Nirmalya Sen¹,
Md Sahidullah²,
Hemant A. Patil³,
Shyamal Kumar Das Mandal⁴,
Krothapalli Sreenivasa Rao⁵ &
…
Tapan Kumar Basu⁶

Abstract

The performance of a speaker recognition system depends heavily on the duration of the speech used for enrollment and testing. This work presents a detailed experimental review and analysis of the Gaussian mixture model-support vector machine (GMM-SVM) based speaker recognition system in the presence of duration variability. It also compares the performance of the GMM-SVM classifier with that of its precursor, the Gaussian mixture model-universal background model (GMM-UBM) classifier, under the same conditions. The goal of this work is not to propose a new algorithm for improving speaker recognition performance in the presence of duration variability. Instead, the main focus is on utterance partitioning (UP), a commonly used strategy for compensating for duration variability. We analyse in detail the impact of partitioning the training utterances on speaker recognition performance under the GMM-SVM framework. We further investigate why utterance partitioning is important for boosting speaker recognition performance, and we show in which cases it is useful and in which it is not. Our study reveals that utterance partitioning does not reduce the data imbalance problem of the GMM-SVM classifier, contrary to the claim made in an earlier study. In addition, we discuss, from the perspective of speech duration, how parameters such as the number of Gaussians, the supervector length, and the amount of splitting affect performance in short- and long-duration test conditions. The experiments were performed on telephone speech from the POLYCOST corpus, which consists of 130 speakers.
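As an illustration of the idea behind utterance partitioning, the following minimal Python sketch splits the feature frames of one training utterance into sub-utterances and derives one GMM supervector from each, so that a single enrollment utterance yields several SVM training vectors. It is not the authors' implementation. The function names, the mean-only MAP adaptation, the relevance factor of 16, the contiguous splitting, the choice to keep the full utterance alongside its parts, and the random toy data are all assumptions made for this example.

```python
import numpy as np


def partition_utterance(frames, num_parts):
    """Split a (T, D) matrix of feature frames into contiguous sub-utterances."""
    return np.array_split(frames, num_parts, axis=0)


def map_adapted_supervector(frames, ubm_weights, ubm_means, ubm_vars, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM (assumed setup).

    Returns the adapted component means stacked into one supervector of
    length C * D, where C is the number of Gaussians and D the feature size.
    """
    # Log-density of every frame under every diagonal Gaussian: shape (T, C).
    diff = frames[:, None, :] - ubm_means[None, :, :]
    log_gauss = -0.5 * (
        np.sum(diff ** 2 / ubm_vars, axis=2)
        + np.sum(np.log(2.0 * np.pi * ubm_vars), axis=1)
    )
    # Component posteriors per frame, computed in a numerically stable way.
    log_post = np.log(ubm_weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    # Zeroth- and first-order statistics per component.
    counts = post.sum(axis=0)                      # (C,)
    first_order = post.T @ frames                  # (C, D)
    data_means = first_order / np.maximum(counts, 1e-10)[:, None]

    # Interpolate between the data means and the UBM means.
    alpha = (counts / (counts + relevance))[:, None]
    adapted_means = alpha * data_means + (1.0 - alpha) * ubm_means
    return adapted_means.reshape(-1)


# Toy data (hypothetical): 300 frames of 4-dimensional features, 8 Gaussians.
rng = np.random.default_rng(0)
num_frames, feat_dim, num_gauss = 300, 4, 8
frames = rng.normal(size=(num_frames, feat_dim))
ubm_weights = np.full(num_gauss, 1.0 / num_gauss)
ubm_means = rng.normal(size=(num_gauss, feat_dim))
ubm_vars = np.ones((num_gauss, feat_dim))

# One supervector for the full utterance plus one per sub-utterance.
parts = partition_utterance(frames, num_parts=4)
segments = [frames] + parts
supervectors = np.stack(
    [map_adapted_supervector(seg, ubm_weights, ubm_means, ubm_vars) for seg in segments]
)
print(supervectors.shape)
```

In this sketch the supervector length is the number of Gaussians times the feature dimension, which is why the number of Gaussians and the amount of splitting interact: each sub-utterance has fewer frames per Gaussian, so its adapted means stay closer to the UBM means.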

Acknowledgements

The authors are grateful to Professor Goutam Saha, Department of E & ECE, IIT Kharagpur, for his help in the experimentation with the POLYCOST database. The first author is extremely grateful to Dr. Richa Mittal, erstwhile student of the Department of CET, IIT Kharagpur, for her help during the preparation of the manuscript. The first author is also extremely grateful to Dr. Rahul Dasgupta, erstwhile student of the Department of CET, IIT Kharagpur, for rigorous technical discussions.

Author information

Authors and Affiliations

R. H. Sapat College of Engineering, Management Studies and Research, Nashik, 422005, India
Nirmalya Sen
CNRS, Inria, LORIA, Université de Lorraine, 54000, Nancy, France
Md Sahidullah
Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar, 382 007, India
Hemant A. Patil
Centre for Educational Technology, Indian Institute of Technology Kharagpur, Kharagpur, 721302, India
Shyamal Kumar Das Mandal
Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur, Kharagpur, 721302, India
Krothapalli Sreenivasa Rao
B. P. Poddar Institute of Management and Technology, V.I.P Road, Kolkata, India
Tapan Kumar Basu

Corresponding author

Correspondence to Md Sahidullah.

About this article

Cite this article

Sen, N., Sahidullah, M., Patil, H.A. et al. Utterance partitioning for speaker recognition: an experimental review and analysis with new findings under GMM-SVM framework. Int J Speech Technol 24, 1067–1088 (2021). https://doi.org/10.1007/s10772-021-09862-8

Received: 06 November 2020
Accepted: 21 May 2021
Published: 13 July 2021
Issue Date: December 2021
DOI: https://doi.org/10.1007/s10772-021-09862-8
