Research Article

Design and Development of a Prosody Generator for Arabic TTS Systems

by Zied Mnasri, Fatouma Boukadida, Noureddine Ellouze
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 12 - Number 1
Year of Publication: 2010
DOI: 10.5120/1641-2206

Zied Mnasri, Fatouma Boukadida, Noureddine Ellouze. Design and Development of a Prosody Generator for Arabic TTS Systems. International Journal of Computer Applications. 12, 1 (December 2010), 24-31. DOI=10.5120/1641-2206

@article{ 10.5120/1641-2206,
author = { Zied Mnasri, Fatouma Boukadida, Noureddine Ellouze },
title = { Design and Development of a Prosody Generator for Arabic TTS Systems },
journal = { International Journal of Computer Applications },
issue_date = { December 2010 },
volume = { 12 },
number = { 1 },
month = { December },
year = { 2010 },
issn = { 0975-8887 },
pages = { 24-31 },
numpages = {8},
url = { https://ijcaonline.org/archives/volume12/number1/1641-2206/ },
doi = { 10.5120/1641-2206 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

Prosody modeling has become the backbone of TTS synthesis systems. Among prosodic modeling approaches, phonetic methods that predict duration and the F0 contour are widely favored, thanks to the development of regression tools such as neural networks (NN). In addition, parametric representations such as the Fujisaki model for F0 contour generation reduce the problem to the approximation of a set of parameters. Before prediction, however, text analysis must be carried out to select and encode the necessary input features. To promote Arabic TTS synthesis, we designed the Integrated Model of Arabic Prosody for Speech Synthesis (IMAPSS), a tool that integrates our models for text analysis, NN-based phonemic duration prediction, and Fujisaki-inspired F0 contour generation. The resulting parameters are written to a command file that can be read by speech synthesis systems such as MBROLA.
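
As a rough illustration of the last two stages described in the abstract, the sketch below evaluates the standard Fujisaki formulation of the F0 contour (a base frequency plus phrase and accent components in the log domain) and writes the result as MBROLA-style command lines (phoneme, duration in ms, then position/pitch pairs). It is not the IMAPSS implementation: the function names, the constants alpha, beta and gamma, the phoneme symbols, the durations, and the phrase and accent commands are all assumptions chosen for illustration, and in the paper the durations would come from the neural network rather than a hand-written list.

```python
import math

def phrase_response(t, alpha=2.0):
    """Gp(t): impulse response of the phrase component (alpha is an assumed value)."""
    return alpha * alpha * t * math.exp(-alpha * t) if t >= 0 else 0.0

def accent_response(t, beta=20.0, gamma=0.9):
    """Ga(t): step response of the accent component, capped at gamma (assumed values)."""
    if t < 0:
        return 0.0
    return min(1.0 - (1.0 + beta * t) * math.exp(-beta * t), gamma)

def fujisaki_f0(t, fb, phrases, accents):
    """F0 in Hz at time t (seconds).

    phrases: list of (onset_s, amplitude)
    accents: list of (onset_s, offset_s, amplitude)
    """
    log_f0 = math.log(fb)
    log_f0 += sum(ap * phrase_response(t - t0) for t0, ap in phrases)
    log_f0 += sum(
        aa * (accent_response(t - t1) - accent_response(t - t2))
        for t1, t2, aa in accents
    )
    return math.exp(log_f0)

def to_pho_lines(phonemes, fb, phrases, accents, points=(0, 50, 100)):
    """Render (symbol, duration_ms) pairs as MBROLA-style command lines.

    Each non-silence phoneme gets pitch targets at the given percentages
    of its duration; "_" is treated as silence and gets no pitch targets.
    """
    lines = []
    start = 0.0  # start time of the current phoneme, in seconds
    for symbol, dur_ms in phonemes:
        fields = [symbol, str(dur_ms)]
        if symbol != "_":
            for pct in points:
                t = start + (dur_ms / 1000.0) * (pct / 100.0)
                f0 = fujisaki_f0(t, fb, phrases, accents)
                fields += [str(pct), str(round(f0))]
        lines.append(" ".join(fields))
        start += dur_ms / 1000.0
    return lines

# Placeholder input: symbols and durations are invented, not predicted values.
phonemes = [("_", 100), ("k", 70), ("a", 90), ("t", 60), ("a", 110), ("_", 100)]
phrases = [(0.0, 0.5)]        # one phrase command at t = 0 s
accents = [(0.15, 0.30, 0.4)]  # one accent command from 0.15 s to 0.30 s

for line in to_pho_lines(phonemes, fb=120.0, phrases=phrases, accents=accents):
    print(line)
```

The phoneme symbols accepted in practice depend on the MBROLA voice database in use, so the symbols above would need to be mapped to that database's inventory.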

Index Terms

Computer Science
Information Sciences

Keywords

Arabic TTS, prosodic parameters, text analysis, phonemic duration, F0 contour, neural networks, Fujisaki model