Prosody Modeling Techniques for Text-to-Speech Synthesis Systems – A Survey

Rajeswari K C; Uma Maheswari P

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 39 - Number 16

Year of Publication: 2012

Authors: Rajeswari K C, Uma Maheswari P

10.5120/4902-7399

Rajeswari K C, Uma Maheswari P. Prosody Modeling Techniques for Text-to-Speech Synthesis Systems – A Survey. International Journal of Computer Applications. 39, 16 (February 2012), 8-11. DOI=10.5120/4902-7399

@article{10.5120/4902-7399,
  author = {Rajeswari K C and Uma Maheswari P},
  title = {Prosody Modeling Techniques for Text-to-Speech Synthesis Systems -- A Survey},
  journal = {International Journal of Computer Applications},
  issue_date = {February 2012},
  volume = {39},
  number = {16},
  month = {February},
  year = {2012},
  issn = {0975-8887},
  pages = {8-11},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume39/number16/4902-7399/},
  doi = {10.5120/4902-7399},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T20:26:35.385607+05:30
%A Rajeswari K C
%A Uma Maheswari P
%T Prosody Modeling Techniques for Text-to-Speech Synthesis Systems – A Survey
%J International Journal of Computer Applications
%@ 0975-8887
%V 39
%N 16
%P 8-11
%D 2012
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This paper presents a study of prosody modeling for speech synthesis. Any text-to-speech (TTS) system comprises two phases: text analysis and speech synthesis. Text analysis identifies the words, and speech synthesis generates the speech. Several models are available for this purpose, such as text-as-language models, grapheme-to-phoneme models, full linguistic analysis models, and complete prosody generation models. A complete prosody generation model determines quantities such as phrasing and stress in order to generate natural-sounding synthetic speech. Generating such speech requires an explicit prosodic model, which also makes the speech more understandable. Much research has been done in this area, but better solutions are still needed. This paper discusses the strengths and weaknesses of different approaches to prosody modeling.
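The two-phase structure described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the `FUNCTION_WORDS` list, and the rules used here (stress every content word, place a phrase break at punctuation) are assumptions chosen only to show where phrasing and stress decisions sit between text analysis and synthesis. The synthesis phase is a placeholder that returns a marked-up string rather than audio.

```python
import re
from dataclasses import dataclass

# Hypothetical list of unstressed function words; a real system would use a
# part-of-speech tagger or a pronunciation lexicon instead.
FUNCTION_WORDS = {"a", "an", "the", "of", "to", "in", "is", "and", "for", "on"}


@dataclass
class Token:
    word: str
    stressed: bool      # toy stress flag: content words are marked stressed
    phrase_break: bool  # True if a phrase boundary follows this word


def analyze_text(text: str) -> list[Token]:
    """Phase 1 (text analysis): find the words and attach toy prosody marks."""
    tokens = []
    # Each match is a word plus any punctuation that directly follows it.
    for match in re.finditer(r"([A-Za-z']+)([,.;:!?]*)", text):
        word, punct = match.group(1).lower(), match.group(2)
        tokens.append(Token(
            word=word,
            stressed=word not in FUNCTION_WORDS,
            phrase_break=bool(punct),
        ))
    return tokens


def synthesize(tokens: list[Token]) -> str:
    """Phase 2 (speech synthesis) placeholder: render a marked-up string
    instead of audio. Stressed words are upper-cased; '|' marks a break."""
    parts = []
    for tok in tokens:
        parts.append(tok.word.upper() if tok.stressed else tok.word)
        if tok.phrase_break:
            parts.append("|")
    return " ".join(parts)


if __name__ == "__main__":
    print(synthesize(analyze_text("The cat sat, and purred.")))
```

The models surveyed in the paper replace these hand-written rules with rule-based, statistical, or learned predictors of phrasing, stress, duration, and intonation; the sketch only fixes the interface between the two phases.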

Index Terms

Computer Science

Information Sciences

Keywords

Prosody model, speech synthesis, Text-to-Speech systems