Abstract
A multilingual speech recognition system is required for tasks that use several languages in a single speech recognition application. In this paper, we propose an approach to multilingual speech recognition based on spotting consonant-vowel (CV) units. The important features of the spotting approach are that it needs neither automatic segmentation of speech nor models of higher-level units to recognise the CV units. The main issues in spotting multilingual CV units are locating anchor points and labelling the regions around these anchor points using suitable classifiers. Vowel onset points (VOPs) are used as the anchor points. The distribution-capturing ability of autoassociative neural network (AANN) models is explored for detecting VOPs in continuous speech. We explore classification models such as support vector machines (SVMs), which are capable of discriminating between confusable classes of CV units and of generalising from a limited amount of training data. Data for similar CV units are shared across languages to train the classifiers that recognise CV units of speech in multiple languages. We study the spotting approach for recognition of a large number of CV units in a broadcast news corpus of three Indian languages.
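The two steps of the spotting approach (locating anchor points, then labelling the region around each one) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that some VOP detector (AANN models in the paper) has already produced one evidence score per frame, and that a trained classifier (SVMs in the paper) is available as a plain callable. All function names, the peak-picking rule, the threshold, the minimum gap, and the window sizes are hypothetical choices made for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Frame = Sequence[float]


def find_anchor_points(evidence: Sequence[float], threshold: float, min_gap: int) -> List[int]:
    """Pick frame indices that are local peaks of the VOP evidence.

    A peak must reach `threshold`. When two peaks are closer than
    `min_gap` frames, only the stronger one is kept (an assumed rule).
    """
    peaks = [
        i for i in range(1, len(evidence) - 1)
        if evidence[i] >= threshold
        and evidence[i] > evidence[i - 1]
        and evidence[i] >= evidence[i + 1]
    ]
    accepted: List[int] = []
    # Visit the strongest peaks first so that weaker neighbours are the ones dropped.
    for i in sorted(peaks, key=lambda k: evidence[k], reverse=True):
        if all(abs(i - j) >= min_gap for j in accepted):
            accepted.append(i)
    return sorted(accepted)


def region_around(frames: Sequence[Frame], anchor: int, left: int, right: int) -> List[float]:
    """Concatenate the frames from anchor-left to anchor+right into one vector.

    Indices outside the utterance are clamped, so edge frames are repeated.
    """
    last = len(frames) - 1
    vector: List[float] = []
    for t in range(anchor - left, anchor + right + 1):
        vector.extend(frames[min(max(t, 0), last)])
    return vector


def spot_units(
    frames: Sequence[Frame],
    evidence: Sequence[float],
    classify: Callable[[List[float]], str],
    threshold: float = 0.5,
    min_gap: int = 3,
    left: int = 2,
    right: int = 2,
) -> List[Tuple[int, str]]:
    """Return (anchor frame, predicted label) pairs for one utterance."""
    anchors = find_anchor_points(evidence, threshold, min_gap)
    return [(a, classify(region_around(frames, a, left, right))) for a in anchors]
```

Because the classifier only ever sees a fixed-size region around each anchor, no separate segmentation of the utterance is needed, which is the property the abstract highlights. Any classifier that maps a feature vector to a CV label can be passed as `classify`.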
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gangashetty, S.V., Sekhar, C.C., Yegnanarayana, B. (2006). Spotting Multilingual Consonant-Vowel Units of Speech Using Neural Network Models. In: Faundez-Zanuy, M., Janer, L., Esposito, A., Satue-Villar, A., Roure, J., Espinosa-Duro, V. (eds) Nonlinear Analyses and Algorithms for Speech Processing. NOLISP 2005. Lecture Notes in Computer Science, vol 3817. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11613107_27
DOI: https://doi.org/10.1007/11613107_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-31257-4
Online ISBN: 978-3-540-32586-4
eBook Packages: Computer Science, Computer Science (R0)