Abstract
This paper investigates the factors that lead to high-quality HMM-based speech synthesis with various degrees of articulation. A neutral speech synthesizer is first adapted to generate hypo- and hyperarticulated speech. A subjective evaluation then studies the impact of cepstral adaptation, prosody, phonetic transcription and the adaptation technique on the perceived degree of articulation. It is shown that high-quality hypo- and hyperarticulated speech synthesis requires an efficient adaptation technique such as CMLLR. Moreover, in addition to prosody adaptation, the evaluation assesses the importance of cepstrum adaptation and of a Natural Language Processor able to generate realistic hypo- and hyperarticulated phonetic transcriptions.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Picart, B., Drugman, T., Dutoit, T. (2011). Perceptual Effects of the Degree of Articulation in HMM-Based Speech Synthesis. In: Travieso-González, C.M., Alonso-Hernández, J.B. (eds) Advances in Nonlinear Speech Processing. NOLISP 2011. Lecture Notes in Computer Science, vol 7015. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25020-0_23
DOI: https://doi.org/10.1007/978-3-642-25020-0_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25019-4
Online ISBN: 978-3-642-25020-0
eBook Packages: Computer Science (R0)