National Conference on Recent Trends in Computer Science and Engineering
Foundation of Computer Science USA
MEDHA2015 - Number 2
June 2015
Authors: N. S. Raut, V. M. Thakare, S. S. Sherekar
N. S. Raut, V. M. Thakare, S. S. Sherekar. Efficient Methodology for Segmentation of Speech Signals in Text to Speech. National Conference on Recent Trends in Computer Science and Engineering. MEDHA2015, 2 (June 2015), 20-22.
This paper proposes a method for tuning the weights of the unit selection cost functions in a syllable-based text-to-speech (TTS) synthesis system, together with a two-stage feedforward neural network (FFNN) approach for modeling the fundamental frequency (F0) values of a sequence of syllables. An unrestricted TTS system is capable of synthesizing speech from different domains with improved quality. A clustering technique applied to the annotated speech corpus provides a way to select the appropriate unit for concatenation, based on the lowest total join cost of the speech units. The unit selection cost functions, namely target cost and concatenation cost, are designed to suit syllables. The method tunes the weights so that perceptual preference patterns are taken into account when selecting units, and it uses a genetic algorithm to derive the optimal weights. The evaluation shows that prediction accuracy is better for the two-stage FFNN models than for the other models considered.
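As a rough illustration of how weighted target and concatenation costs can drive unit selection, the sketch below picks one candidate unit per syllable so that the total weighted cost is lowest, using dynamic programming. It is not the authors' implementation: the feature names (`f0`, `dur`, `f0_start`, `f0_end`), the two cost formulas, and the function `select_units` are assumptions made for the example, and the weights `w_target` and `w_concat` are plain parameters here, whereas the paper derives them with a genetic algorithm.

```python
from typing import Dict, List, Sequence, Tuple

Unit = Dict[str, float]  # hypothetical feature dictionary for one syllable unit


def target_cost(unit: Unit, target: Unit) -> float:
    # Assumed target cost: mismatch in mean F0 and in duration.
    return abs(unit["f0"] - target["f0"]) + abs(unit["dur"] - target["dur"])


def concat_cost(left: Unit, right: Unit) -> float:
    # Assumed concatenation (join) cost: F0 discontinuity at the boundary.
    return abs(left["f0_end"] - right["f0_start"])


def select_units(
    targets: Sequence[Unit],
    candidates: Sequence[Sequence[Unit]],
    w_target: float = 1.0,
    w_concat: float = 1.0,
) -> Tuple[List[int], float]:
    """Return the chosen candidate index per syllable and the total cost."""
    # best[i][j]: lowest cumulative cost of a path ending at candidate j of syllable i
    best = [[w_target * target_cost(u, targets[0]) for u in candidates[0]]]
    back = [[-1] * len(candidates[0])]
    for i in range(1, len(targets)):
        row_cost, row_back = [], []
        for unit in candidates[i]:
            # Cost of reaching this unit from each candidate of the previous syllable
            joins = [
                best[i - 1][k] + w_concat * concat_cost(prev, unit)
                for k, prev in enumerate(candidates[i - 1])
            ]
            k_best = min(range(len(joins)), key=joins.__getitem__)
            row_cost.append(joins[k_best] + w_target * target_cost(unit, targets[i]))
            row_back.append(k_best)
        best.append(row_cost)
        back.append(row_back)
    # Trace back from the cheapest final candidate
    j = min(range(len(best[-1])), key=best[-1].__getitem__)
    total = best[-1][j]
    path = [j]
    for i in range(len(targets) - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    path.reverse()
    return path, total
```

Changing the two weights changes which units win: a larger `w_concat` favors smooth joins over close matches to the target, which is the trade-off that weight tuning is meant to settle.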