Abstract
This paper presents a spectral transformation method for emotional speech synthesis based on a voice conversion framework. Three emotions are studied: anger, happiness and sadness. To achieve high naturalness, superior speech quality and emotional expressiveness, our original STASC system is modified by introducing a new feature selection strategy and a hierarchical codebook mapping procedure. Our results show that the low-frequency LSF coefficients carry more emotion-related information, and therefore only these coefficients are converted. Listening tests indicate that the proposed method achieves a satisfactory balance between emotional expression and speech quality in the converted speech signals.
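The idea of converting only the low-frequency LSF coefficients can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `convert_low_lsf`, the parameters `n_low` and `gamma`, and the exponential distance weighting are all assumptions made for illustration, and the hierarchical codebook structure mentioned in the abstract is not modelled here. The sketch assumes paired source (neutral) and target (emotional) codebooks of LSF vectors are already available.

```python
import numpy as np

def convert_low_lsf(frame, src_codebook, tgt_codebook, n_low, gamma=10.0):
    """Convert only the first n_low LSFs of one frame (hypothetical helper).

    frame:        1-D array of LSFs in ascending order
    src_codebook: (K, D) source codewords, paired row by row with
    tgt_codebook: (K, D) target codewords
    n_low:        number of low-frequency LSFs to convert (assumed parameter)
    gamma:        sharpness of the distance weighting (assumed parameter)
    """
    frame = np.asarray(frame, dtype=float)
    src_low = np.asarray(src_codebook, dtype=float)[:, :n_low]
    tgt_low = np.asarray(tgt_codebook, dtype=float)[:, :n_low]

    # Distance from the frame's low-order LSFs to every source codeword.
    dist = np.linalg.norm(src_low - frame[:n_low], axis=1)

    # Closer codewords get larger weights; weights are normalised to sum to 1.
    weights = np.exp(-gamma * (dist - dist.min()))
    weights /= weights.sum()

    # Replace the low-order LSFs with the weighted sum of target codewords;
    # the high-order LSFs are copied from the input unchanged.
    out = frame.copy()
    out[:n_low] = weights @ tgt_low

    # LSFs should remain in ascending order; sorting is a crude safeguard.
    return np.sort(out)
```

With a large `gamma` the mapping approaches a nearest-codeword lookup, while a small `gamma` blends many codewords and gives smoother but less distinctive output.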
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, YP., Ling, ZH., Wang, RH. (2005). Emotional Speech Synthesis Based on Improved Codebook Mapping Voice Conversion. In: Tao, J., Tan, T., Picard, R.W. (eds) Affective Computing and Intelligent Interaction. ACII 2005. Lecture Notes in Computer Science, vol 3784. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11573548_48
DOI: https://doi.org/10.1007/11573548_48
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29621-8
Online ISBN: 978-3-540-32273-3
eBook Packages: Computer Science, Computer Science (R0)