Research Article

Efficient Methodology for Segmentation of Speech Signals in Text to Speech

Published in June 2015 by N.S. Raut, V.M. Thakare, S.S. Sherekar
National Conference on Recent Trends in Computer Science and Engineering
Foundation of Computer Science USA
MEDHA2015 - Number 2

N.S. Raut, V.M. Thakare, S.S. Sherekar. Efficient Methodology for Segmentation of Speech Signals in Text to Speech. National Conference on Recent Trends in Computer Science and Engineering. MEDHA2015, 2 (June 2015), 20-22.

@article{raut2015efficient,
author = { N.S. Raut and V.M. Thakare and S.S. Sherekar },
title = { Efficient Methodology for Segmentation of Speech Signals in Text to Speech },
journal = { National Conference on Recent Trends in Computer Science and Engineering },
issue_date = { June 2015 },
volume = { MEDHA2015 },
number = { 2 },
month = { June },
year = { 2015 },
issn = { 0975-8887 },
pages = { 20-22 },
numpages = { 3 },
url = { /proceedings/medha2015/number2/21434-8028/ },
publisher = { Foundation of Computer Science (FCS), NY, USA },
address = { New York, USA }
}
%0 Proceeding Article
%1 National Conference on Recent Trends in Computer Science and Engineering
%A N.S. Raut
%A V.M. Thakare
%A S.S. Sherekar
%T Efficient Methodology for Segmentation of Speech Signals in Text to Speech
%J National Conference on Recent Trends in Computer Science and Engineering
%@ 0975-8887
%V MEDHA2015
%N 2
%P 20-22
%D 2015
%I International Journal of Computer Applications
Abstract

This paper proposes a method for tuning the weights of unit selection cost functions in a syllable-based text-to-speech (TTS) synthesis system, together with a two-stage feedforward neural network (FFNN) approach for modeling the fundamental frequency (F0) values of a sequence of syllables. An unrestricted TTS system is capable of synthesizing speech from different domains with improved quality. A clustering technique is applied to the annotated speech corpus to select the appropriate unit for concatenation, based on the lowest total join cost of the speech units. The unit selection cost functions, namely target cost and concatenation cost, are designed to suit syllables. The method tunes the weights so that perceptual preference patterns are taken into account when selecting units, and it uses a genetic algorithm to derive the optimal weights. The evaluation shows that prediction accuracy is better for the two-stage FFNN models than for the other models considered.
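
The weighted unit selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the `Unit` and `Target` fields, the two toy cost functions, and the weights `w_target` and `w_join` are all assumptions made for illustration. The sketch picks one candidate syllable unit per target position by dynamic programming, minimizing the weighted sum of target and concatenation (join) costs, and assumes every position has at least one candidate.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Unit:
    """Hypothetical candidate syllable unit from the speech corpus."""
    name: str
    f0_start: float  # F0 at the start of the unit (Hz)
    f0_end: float    # F0 at the end of the unit (Hz)
    duration: float  # seconds


@dataclass(frozen=True)
class Target:
    """Hypothetical target specification for one syllable position."""
    f0: float
    duration: float


def target_cost(t: Target, u: Unit) -> float:
    # Toy target cost: mismatch in mean F0 plus mismatch in duration.
    mean_f0 = 0.5 * (u.f0_start + u.f0_end)
    return abs(mean_f0 - t.f0) + abs(u.duration - t.duration)


def join_cost(prev: Unit, cur: Unit) -> float:
    # Toy concatenation cost: F0 discontinuity at the join.
    return abs(prev.f0_end - cur.f0_start)


def select_units(targets: Sequence[Target],
                 candidates: Sequence[Sequence[Unit]],
                 w_target: float = 1.0,
                 w_join: float = 1.0) -> List[Unit]:
    """Return the candidate sequence with the lowest total weighted cost."""
    # best[i][j]: lowest cost of any path ending in candidates[i][j].
    best = [[w_target * target_cost(targets[0], u) for u in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row, ptr = [], []
        for u in candidates[i]:
            costs = [best[i - 1][k] + w_join * join_cost(p, u)
                     for k, p in enumerate(candidates[i - 1])]
            k_best = min(range(len(costs)), key=costs.__getitem__)
            row.append(costs[k_best] + w_target * target_cost(targets[i], u))
            ptr.append(k_best)
        best.append(row)
        back.append(ptr)
    # Trace back from the cheapest final candidate.
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    path = []
    for i in range(len(targets) - 1, -1, -1):
        path.append(candidates[i][j])
        j = back[i][j]
    path.reverse()
    return path


targets = [Target(120.0, 0.2), Target(130.0, 0.2)]
candidates = [
    [Unit("a1", 118.0, 122.0, 0.2), Unit("a2", 100.0, 140.0, 0.3)],
    [Unit("b1", 140.0, 120.0, 0.2), Unit("b2", 122.0, 138.0, 0.2)],
]
chosen = select_units(targets, candidates)
print([u.name for u in chosen])
```

In this sketch the weights decide how strongly a smooth join is preferred over a close match to the target. A weight-tuning method such as the genetic algorithm mentioned in the abstract would search over values like `w_target` and `w_join`, scoring each setting against perceptual preferences.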

Index Terms

Computer Science
Information Sciences

Keywords

Text To Speech Synthesis (TTS), Concatenative Synthesis Approach, Intonation Models, Feed Forward Neural Networks, Unit Selection, Target Cost, Tuning Of Weights