Efficient Methodology for Segmentation of Speech Signals in Text to Speech

Research Article

Published in June 2015 by N. S. Raut, V. M. Thakare, S. S. Sherekar

National Conference on Recent Trends in Computer Science and Engineering

Foundation of Computer Science USA

MEDHA2015 - Number 2

June 2015

Authors: N. S. Raut, V. M. Thakare, S. S. Sherekar

N. S. Raut, V. M. Thakare, S. S. Sherekar. Efficient Methodology for Segmentation of Speech Signals in Text to Speech. National Conference on Recent Trends in Computer Science and Engineering. MEDHA2015, 2 (June 2015), 20-22.

@article{
author = { N. S. Raut and V. M. Thakare and S. S. Sherekar },
title = { Efficient Methodology for Segmentation of Speech Signals in Text to Speech },
journal = { National Conference on Recent Trends in Computer Science and Engineering },
issue_date = { June 2015 },
volume = { MEDHA2015 },
number = { 2 },
month = { June },
year = { 2015 },
issn = { 0975-8887 },
pages = { 20-22 },
numpages = { 3 },
url = { /proceedings/medha2015/number2/21434-8028/ },
publisher = { Foundation of Computer Science (FCS), NY, USA },
address = { New York, USA }
}

%0 Proceeding Article
%1 National Conference on Recent Trends in Computer Science and Engineering
%A N. S. Raut
%A V. M. Thakare
%A S. S. Sherekar
%T Efficient Methodology for Segmentation of Speech Signals in Text to Speech
%J National Conference on Recent Trends in Computer Science and Engineering
%@ 0975-8887
%V MEDHA2015
%N 2
%P 20-22
%D 2015
%I International Journal of Computer Applications

Abstract

This paper proposes a method for tuning the weights of unit selection cost functions in a syllable-based text-to-speech (TTS) synthesis system, together with a two-stage feedforward neural network (FFNN) based approach for modeling the fundamental frequency (F0) values of a sequence of syllables. An unrestricted TTS system is capable of synthesizing speech from different domains with improved quality. A clustering technique applied to the annotated speech corpus provides a way to select the appropriate unit for concatenation, based on the lowest total join cost of the speech unit. The unit selection cost functions, namely target cost and concatenation cost, are designed to suit syllables. The method tunes the weights so that perceptual preference patterns are taken into account when selecting units, and it uses a genetic algorithm to derive the optimal weights. The evaluation shows that the two-stage FFNN models achieve better prediction accuracy than the other models considered.
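
The unit selection idea summarized in the abstract can be illustrated with a minimal sketch. It assumes each syllable unit is described only by an F0 value and a duration, and it uses absolute differences as the target and concatenation costs; these features, the function names (`target_cost`, `concat_cost`, `select_units`), and the default weights are hypothetical choices for illustration and are not taken from the paper. The search is a standard dynamic-programming (Viterbi) pass that picks the candidate sequence with the lowest weighted total cost. The weights `w_target` and `w_concat` are the kind of parameters that a genetic algorithm could tune; the tuning itself is not shown.

```python
from typing import List, Sequence, Tuple

# Hypothetical unit description: (f0_hz, duration_ms). Real systems use richer
# features; these two are assumed here only to keep the sketch small.
Unit = Tuple[float, float]


def target_cost(target: Unit, cand: Unit, w_f0: float, w_dur: float) -> float:
    """Weighted distance between the desired syllable spec and a candidate."""
    return w_f0 * abs(target[0] - cand[0]) + w_dur * abs(target[1] - cand[1])


def concat_cost(prev: Unit, cur: Unit) -> float:
    """Join cost: F0 mismatch between adjacent units (an illustrative choice)."""
    return abs(prev[0] - cur[0])


def select_units(
    targets: Sequence[Unit],
    candidates: Sequence[Sequence[Unit]],
    w_target: float = 1.0,
    w_concat: float = 1.0,
    w_f0: float = 1.0,
    w_dur: float = 0.1,
) -> Tuple[List[int], float]:
    """Viterbi search: return chosen candidate indices and the total cost."""
    if not targets:
        return [], 0.0
    # best[i][j]: lowest cost of any path that ends in candidate j at position i
    best = [[w_target * target_cost(targets[0], c, w_f0, w_dur)
             for c in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row, ptr = [], []
        for c in candidates[i]:
            tc = w_target * target_cost(targets[i], c, w_f0, w_dur)
            # Cheapest predecessor, including the weighted join cost
            k, cost = min(
                ((k, best[i - 1][k] + w_concat * concat_cost(p, c))
                 for k, p in enumerate(candidates[i - 1])),
                key=lambda kc: kc[1],
            )
            row.append(cost + tc)
            ptr.append(k)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest final candidate
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for i in range(len(targets) - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path, total


if __name__ == "__main__":
    targets = [(120.0, 200.0), (180.0, 180.0)]
    candidates = [[(120.0, 200.0)], [(180.0, 180.0), (130.0, 180.0)]]
    # A heavier join weight favors the smoother (but less accurate) unit.
    print(select_units(targets, candidates, w_concat=2.0))
    print(select_units(targets, candidates, w_concat=0.5))
```

In this toy input the second syllable has one candidate that matches the target F0 exactly but joins poorly, and one that joins smoothly but misses the target. Changing `w_concat` flips which one is selected, which is why the relative weighting of the two costs matters for perceived quality.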

Index Terms

Computer Science

Information Sciences

Keywords

Text To Speech Synthesis (TTS), Concatenative Synthesis Approach, Intonation Models, Feed Forward Neural Networks, Unit Selection, Target Cost, Tuning Of Weights