Research Article

Prosody Modeling Techniques for Text-to-Speech Synthesis Systems – A Survey

by Rajeswari K C, Uma Maheswari P
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 39 - Number 16
Year of Publication: 2012
Authors: Rajeswari K C, Uma Maheswari P
DOI: 10.5120/4902-7399

Rajeswari K C, Uma Maheswari P. Prosody Modeling Techniques for Text-to-Speech Synthesis Systems – A Survey. International Journal of Computer Applications. 39, 16 (February 2012), 8-11. DOI=10.5120/4902-7399

@article{ 10.5120/4902-7399,
author = { Rajeswari K C and Uma Maheswari P },
title = { Prosody Modeling Techniques for Text-to-Speech Synthesis Systems -- A Survey },
journal = { International Journal of Computer Applications },
issue_date = { February 2012 },
volume = { 39 },
number = { 16 },
month = { February },
year = { 2012 },
issn = { 0975-8887 },
pages = { 8-11 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume39/number16/4902-7399/ },
doi = { 10.5120/4902-7399 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:26:35.385607+05:30
%A Rajeswari K C
%A Uma Maheswari P
%T Prosody Modeling Techniques for Text-to-Speech Synthesis Systems – A Survey
%J International Journal of Computer Applications
%@ 0975-8887
%V 39
%N 16
%P 8-11
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper presents a study of prosody modeling for speech synthesis. Any Text to Speech system comprises two phases: text analysis and speech synthesis. The task of text analysis is to identify the words, and the task of speech synthesis is to generate the speech. To attain this, different models are available, such as text-as-language models, grapheme-to-phoneme models, the full linguistic analysis model, and the complete prosody generation model. In the complete prosody generation model, quantities such as phrasing and stress are determined in order to generate a natural-sounding synthetic voice. Generating such speech requires an explicit prosodic model, which also makes the speech more understandable. Much research has been done in this area, but better solutions are still required. This paper discusses the strengths and weaknesses of different approaches to prosody modeling.

Index Terms

Computer Science
Information Sciences

Keywords

Prosody model, speech synthesis, Text to Speech systems