Perceptual Effects of the Degree of Articulation in HMM-Based Speech Synthesis

Picart, Benjamin; Drugman, Thomas; Dutoit, Thierry

doi:10.1007/978-3-642-25020-0_23

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7015)

Included in the following conference series:

International Conference on Nonlinear Speech Processing

Abstract

This paper investigates the factors that lead to high-quality HMM-based speech synthesis with various degrees of articulation. A neutral speech synthesizer is first adapted to generate hypo- and hyperarticulated speech. A subjective evaluation then assesses how cepstral adaptation, prosody, phonetic transcription, and the adaptation technique affect the perceived degree of articulation. The results show that high-quality hypo- and hyperarticulated speech synthesis requires an efficient adaptation technique such as CMLLR. In addition to prosody adaptation, the evaluation highlights the importance of cepstrum adaptation and of a Natural Language Processor able to generate realistic hypo- and hyperarticulated phonetic transcriptions.

Author information

Authors and Affiliations

TCTS Lab, Faculté Polytechnique (FPMs), University of Mons (UMons), Belgium
Benjamin Picart, Thomas Drugman & Thierry Dutoit

Editor information

Editors and Affiliations

Institute for Technological Development and Innovation in Communications (IDETIC), Signals and Communications Department, University of Las Palmas de Gran Canaria, Campus de Tafira, s/n, 35017, Las Palmas de Gran Canaria, Spain
Carlos M. Travieso-González & Jesús B. Alonso-Hernández

Cite this paper

Picart, B., Drugman, T., Dutoit, T. (2011). Perceptual Effects of the Degree of Articulation in HMM-Based Speech Synthesis. In: Travieso-González, C.M., Alonso-Hernández, J.B. (eds) Advances in Nonlinear Speech Processing. NOLISP 2011. Lecture Notes in Computer Science, vol 7015. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25020-0_23

DOI: https://doi.org/10.1007/978-3-642-25020-0_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-25019-4
Online ISBN: 978-3-642-25020-0
eBook Packages: Computer Science (R0)
