Research Article

Arabic Unit Selection Emotional Speech Synthesis using Blending Data Approach

by Waleed M. Azmy, Sherif Abdou, Mahmoud Shoman
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 81 - Number 8
Year of Publication: 2013
Authors: Waleed M. Azmy, Sherif Abdou, Mahmoud Shoman
10.5120/14033-1887

Waleed M. Azmy, Sherif Abdou, Mahmoud Shoman. Arabic Unit Selection Emotional Speech Synthesis using Blending Data Approach. International Journal of Computer Applications. 81, 8 (November 2013), 22-28. DOI=10.5120/14033-1887

@article{ 10.5120/14033-1887,
author = { Waleed M. Azmy and Sherif Abdou and Mahmoud Shoman },
title = { Arabic Unit Selection Emotional Speech Synthesis using Blending Data Approach },
journal = { International Journal of Computer Applications },
issue_date = { November 2013 },
volume = { 81 },
number = { 8 },
month = { November },
year = { 2013 },
issn = { 0975-8887 },
pages = { 22-28 },
numpages = {7},
url = { https://ijcaonline.org/archives/volume81/number8/14033-1887/ },
doi = { 10.5120/14033-1887 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:56:36.100233+05:30
%A Waleed M. Azmy
%A Sherif Abdou
%A Mahmoud Shoman
%T Arabic Unit Selection Emotional Speech Synthesis using Blending Data Approach
%J International Journal of Computer Applications
%@ 0975-8887
%V 81
%N 8
%P 22-28
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper introduces the work done to build an Arabic unit selection voice that can carry emotional information. Three emotional states were covered: normal, sad, and questions. An emotional speech classifier was used to enhance the intelligibility of the recorded speech database. The classification information was employed in the proposed target cost to produce more natural and emotive synthetic speech. The system was evaluated on the naturalness and emotiveness of the produced speech, and the evaluations show a significant increase in both scores.
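To make the idea of using classifier output inside a target cost concrete, the following is a minimal sketch only, not the authors' formulation. It assumes each database unit carries emotion posteriors from a classifier and that the target cost is a weighted sum of pitch, duration, and emotion-mismatch terms. The field names, the weights, and the helper names `target_cost` and `pick_unit` are all hypothetical, and the join cost of a full unit selection search is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Unit:
    """A candidate database unit (hypothetical fields, for illustration)."""
    phone: str
    f0_hz: float
    duration_ms: float
    emotion_probs: Dict[str, float]  # classifier posteriors, assumed to sum to 1


@dataclass
class Target:
    """The specification the synthesizer wants to realize."""
    phone: str
    f0_hz: float
    duration_ms: float
    emotion: str  # e.g. "normal", "sad", or "question"


def target_cost(target: Target, unit: Unit,
                w_f0: float = 1.0, w_dur: float = 1.0, w_emo: float = 2.0) -> float:
    """Weighted sum of relative pitch, duration, and emotion mismatch.

    The weights are assumed values, not taken from the paper.
    """
    if unit.phone != target.phone:
        return float("inf")  # a unit of the wrong phone is never usable
    f0_cost = abs(unit.f0_hz - target.f0_hz) / target.f0_hz
    dur_cost = abs(unit.duration_ms - target.duration_ms) / target.duration_ms
    # Emotion term: low when the classifier is confident the unit
    # was spoken in the requested emotional state.
    emo_cost = 1.0 - unit.emotion_probs.get(target.emotion, 0.0)
    return w_f0 * f0_cost + w_dur * dur_cost + w_emo * emo_cost


def pick_unit(target: Target, candidates: List[Unit]) -> Unit:
    """Return the candidate with the lowest target cost (join cost ignored)."""
    return min(candidates, key=lambda u: target_cost(target, u))


if __name__ == "__main__":
    target = Target(phone="a", f0_hz=120.0, duration_ms=80.0, emotion="sad")
    neutral = Unit("a", 120.0, 80.0, {"normal": 0.9, "sad": 0.1})
    sad = Unit("a", 126.0, 88.0, {"normal": 0.2, "sad": 0.8})
    best = pick_unit(target, [neutral, sad])
    print(best is sad)
```

In this sketch the unit classified as sad is preferred even though its pitch and duration match the target less closely, because the emotion term carries the largest weight. A blended database of this kind would let one search draw on units from all recorded styles while still favoring the requested emotion.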

Index Terms

Computer Science
Information Sciences

Keywords

Speech synthesis, Emotion, Festival, Emovoice