Research Article

Hidden Markov Model based Speech Synthesis: A Review

by Sangramsing Kayte, Monica Mundada, Jayesh Gujrathi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 130 - Number 3
Year of Publication: 2015
Authors: Sangramsing Kayte, Monica Mundada, Jayesh Gujrathi
10.5120/ijca2015906965

Sangramsing Kayte, Monica Mundada, Jayesh Gujrathi. Hidden Markov Model based Speech Synthesis: A Review. International Journal of Computer Applications 130, 3 (November 2015), 35-39. DOI=10.5120/ijca2015906965

@article{ 10.5120/ijca2015906965,
author = { Sangramsing Kayte, Monica Mundada, Jayesh Gujrathi },
title = { Hidden Markov Model based Speech Synthesis: A Review },
journal = { International Journal of Computer Applications },
issue_date = { November 2015 },
volume = { 130 },
number = { 3 },
month = { November },
year = { 2015 },
issn = { 0975-8887 },
pages = { 35-39 },
numpages = { 5 },
url = { https://ijcaonline.org/archives/volume130/number3/23191-2015906965/ },
doi = { 10.5120/ijca2015906965 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Sangramsing Kayte
%A Monica Mundada
%A Jayesh Gujrathi
%T Hidden Markov Model based Speech Synthesis: A Review
%J International Journal of Computer Applications
%@ 0975-8887
%V 130
%N 3
%P 35-39
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A text-to-speech (TTS) synthesis system is the artificial production of human speech from text. This paper reviews recent research advances in the field of speech synthesis, focusing on the statistical parametric approach based on hidden Markov models (HMMs). In this approach, HMM-based text-to-speech synthesis (HTS) is reviewed in brief. HTS is based on the generation of an optimal parameter sequence from subword HMMs, and the quality of the HTS framework relies on an accurate description of the phone set. The most attractive aspect of the HTS system is that the prosodic characteristics of the voice can be modified by simply varying the HMM parameters, which also reduces the large storage requirement.
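
To make the parameter-generation step concrete, the sketch below is a minimal Python/NumPy illustration (not code from the paper) of maximum-likelihood parameter generation in the style of Tokuda et al. (reference 36 below): per-frame Gaussian means and variances for a static feature and its delta, obtained from the subword HMM state sequence, are turned into a smooth static trajectory c by solving (W^T U^-1 W) c = W^T U^-1 mu, where W stacks the static and delta window coefficients and U is the diagonal covariance. The function name mlpg_1d, the 3-point delta window, and the toy frame values are illustrative assumptions.

import numpy as np

def mlpg_1d(means, variances, delta_win=(-0.5, 0.0, 0.5)):
    # means, variances: arrays of shape (T, 2) holding per-frame
    # [static, delta] Gaussian means and variances from the HMM state sequence.
    T = means.shape[0]
    # W maps the static trajectory c (length T) to the stacked
    # [static; delta] observation vector (length 2T).
    W = np.zeros((2 * T, T))
    for t in range(T):
        W[2 * t, t] = 1.0                          # static row: identity
        for k, w in zip((-1, 0, 1), delta_win):    # delta row: regression window
            if 0 <= t + k < T:
                W[2 * t + 1, t + k] = w
    mu = means.reshape(-1)                         # stacked means, shape (2T,)
    prec = 1.0 / variances.reshape(-1)             # diagonal precisions, shape (2T,)
    A = W.T @ (prec[:, None] * W)                  # W^T U^-1 W
    b = W.T @ (prec * mu)                          # W^T U^-1 mu
    return np.linalg.solve(A, b)                   # smooth static trajectory c

# Toy usage: 5 frames with unit variances and arbitrary means.
means = np.array([[0.0, 0.1], [0.5, 0.2], [1.0, 0.0], [0.8, -0.1], [0.3, -0.2]])
variances = np.ones_like(means)
print(mlpg_1d(means, variances))

The same linear system is what lets prosody be changed by editing HMM parameters: altering the means (for example, of the F0 stream) changes the generated trajectory without storing or modifying any recorded waveforms.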

References
  1. T. Dutoit, “An Introduction to Text-to-Speech Synthesis”, Kluwer Academic Publishers, 1997.
  2. Black, A., Zen, H., Tokuda, K., “Statistical Parametric Speech Synthesis”, in Proc. ICASSP, Honolulu, USA, 2007.
  3. X. Huang, A. Acero, H.-W. Hon, “Spoken Language Processing”, Prentice Hall PTR, 2001.
  4. D. Jurafsky and J.H. Martin, “Speech and Language Processing”, Pearson Education, 2000.
  5. Paul Taylor, “Text-to-Speech Synthesis”, Cambridge University Press, 2009, pp. 442-446.
  6. Newton, “Review of methods of Speech Synthesis”, M.Tech Credit Seminar Report, Electronic Systems Group, November 2011, pp. 1-15.
  7. Christopher Richards, “Normalization of non-standard words”, Computer Speech and Language, 2001, pp. 287–333.
  8. M. B. Chandak, R. V. Dharaskar and V. M. Thakre, “Text to Speech with Prosody Feature: Implementation of Emotion in Speech Output using Forward Parsing”, International Journal of Computer Science and Security, Volume 4, Issue 3.
  9. Ramani Boothalingam, V. Sherlin Solomi, Anushiya Rachel Gladston, S. Lilly Christina, “Development and Evaluation of Unit Selection and HMM-Based Speech Synthesis Systems for Tamil”, IEEE National Conference, 2013, 978-1-4673-5952-8/13.
  10. Heiga Zen, Tomoki Toda and Keiichi Tokuda, “The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006”, Blizzard Challenge Workshop, 2006.
  11. J. Ferguson, Ed., “Hidden Markov Models for Speech”, IDA, Princeton, NJ, 1980.
  12. L. R. Rabiner, “A tutorial on hidden Markov models and selected applications in speech recognition”, Proc. IEEE, 77(2), pp. 257-286, 1989.
  13. L. R. Rabiner and B. H. Juang, “Fundamentals of Speech Recognition”, Prentice-Hall, Englewood Cliffs, New Jersey, 1993.
  14. Furtado X. A. & Sen A., “Synthesis of unlimited speech in Indian languages using formant-based rules”, Sadhana, 1996, pp. 345-362.
  15. Agrawal S. S. & Stevens K., “Towards synthesis of Hindi consonants using KLSYN88”, Proc. ICSLP 92, Canada, 1992, pp. 177-180.
  16. Dan T. K., Datta A. K. & Mukherjee B., “Speech synthesis using signal concatenation”, J. ASI, vol. XVIII (3&4), 1995, pp. 141-145.
  17. Kishore S. P., Kumar R. & Sanghal R., “A data driven synthesis approach for Indian language using syllable as basic unit”, Proc. ICON 2002, Mumbai, 2002.
  18. Agrawal S. S., 2010, “Recent Developments in Speech Corpora in Indian Languages: Country Report of India”, O-COCOSDA, Nepal.
  19. B. Ramani, S. Lilly Christina, G. Anushiya Rachel, V. Sherlin Solomi, Mahesh Kumar Nandwana, Anusha Prakash, Aswin Shanmugam S., Raghava Krishnan, S. P. Kishore, K. Samudravijaya, P. Vijayalakshmi, T. Nagarajan and Hema A. Murthy, “A Common Attribute based Unified HTS framework for Speech Synthesis in Indian Languages”, 8th ISCA Speech Synthesis Workshop, August 31 – September 2, 2013, Barcelona, Spain.
  20. Zen, Heiga, Nose, Takashi, Yamagishi, Junichi, Sako, Shinji, Masuko, Takashi, Black, Alan W., “The HMM-based speech synthesis system (HTS) version 2.0”, 6th ISCA Workshop on Speech Synthesis, Bonn, Germany, August 22-24, 2007.
  21. K. Tokuda, H. Zen, J. Yamagishi, T. Masuko, S. Sako, T. Toda, A. W. Black, T. Nose, and K. Oura, “The HMM-based speech synthesis system (HTS)”, http://hts.sp.nitech.ac.jp/.
  22. H. Zen, K. Tokuda, T. Masuko, T. Kobayashi and T. Kitamura, “A hidden semi-Markov model-based speech synthesis system”, IEICE Trans. Inf. Syst., E90-D(5):825–834, 2007.
  23. J. Yamagishi and T. Kobayashi, “Average-voice-based speech synthesis using HSMM-based speaker adaptation and adaptive training”, IEICE Trans. Inf. Syst., E90-D(2):533–543, 2007.
  24. J. Yamagishi, T. Kobayashi, Y. Nakano, K. Ogata, and J. Isogai, “Analysis of speaker adaptation algorithms for HMM-based speech synthesis and a constrained SMAPLR adaptation algorithm”, IEEE Trans. Audio Speech Lang. Process., 17(1), pp. 66–83, 2009.
  25. H. Kawahara, I. Masuda-Katsuse, and A. de Cheveigné, “Restructuring speech representations using a pitch-adaptive time-frequency smoothing and an instantaneous-frequency-based F0 extraction: possible role of a repetitive structure in sounds”, Speech Comm., 27:187–207, 1999.
  26. H. Zen, T. Toda, M. Nakamura, and K. Tokuda, “Details of Nitech HMM-based speech synthesis system for the Blizzard Challenge 2005”, IEICE Trans. Inf. Syst., E90-D(1):325–333, Jan. 2007.
  27. H. Zen, T. Toda, and K. Tokuda, “The Nitech-NAIST HMM-based speech synthesis system for the Blizzard Challenge 2006”, in Blizzard Challenge Workshop, 2006.
  28. J. Yamagishi, T. Nose, H. Zen, Z.-H. Ling, T. Toda, K. Tokuda, S. King, and S. Renals, “A robust speaker-adaptive HMM-based text-to-speech synthesis”, IEEE Trans. Audio Speech Lang. Process., 2009 (accepted for publication).
  29. T. Yoshimura, K. Tokuda, T. Masuko, T. Kobayashi and T. Kitamura, “Simultaneous Modeling of Spectrum, Pitch and Duration in HMM-Based Speech Synthesis”, in Proc. ICASSP 2000, vol. 3, pp. 1315-1318, June 2000.
  30. Dempster, A., Laird, N., Rubin, D., 1977, “Maximum likelihood from incomplete data via the EM algorithm”, Journal of the Royal Statistical Society 39, 1–38.
  31. Fukada, T., Tokuda, K., Kobayashi, T., Imai, S., 1992, “An adaptive algorithm for mel-cepstral analysis of speech”, in Proc. ICASSP, pp. 137–140.
  32. Young, S., Evermann, G., Gales, M., Hain, T., Kershaw, D., Liu, X.-Y., Moore, G., Odell, J., Ollason, D., Povey, D., Valtchev, V., Woodland, P., 2006, “The Hidden Markov Model Toolkit (HTK) version 3.4”, http://htk.eng.cam.ac.uk/.
  33. Yoshimura, T., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T., 1998, “Duration modeling for HMM-based speech synthesis”, in Proc. ICSLP, pp. 29–32.
  34. Ishimatsu, Y., Tokuda, K., Masuko, T., Kobayashi, T., Kitamura, T., 2001, “Investigation of state duration model based on gamma distribution for HMM-based speech synthesis”, Tech. Rep. of IEICE, vol. 101 of SP 2001-81, pp. 57–62 (in Japanese).
  35. Odell, J., 1995, “The use of context in large vocabulary speech recognition”, Ph.D. thesis, University of Cambridge.
  36. Tokuda, K., Yoshimura, T., Masuko, T., Kobayashi, T., Kitamura, T., 2000, “Speech parameter generation algorithms for HMM-based speech synthesis”, in Proc. ICASSP, pp. 1315–1318.
  37. Tachiwa, W., Furui, S., “A study of speech synthesis using HMMs”, in Proc. Spring Meeting of ASJ, pp. 239–240 (in Japanese), 1999.
  38. Imai, S., Sumita, K., Furuichi, C., “Mel log spectrum approximation (MLSA) filter for speech synthesis”, Electronics and Communications in Japan 66(2), 10–18, 1983.
  39. Stylianou, Y., Cappé, O., Moulines, E., 1998, “Continuous probabilistic transform for voice conversion”, IEEE Trans. Speech Audio Process. 6(2), 131–142.
  40. Sangramsing Kayte, Kavita Waghmare, Bharti Gawali, “Marathi Speech Synthesis: A Review”, International Journal on Recent and Innovation Trends in Computing and Communication, ISSN: 2321-8169, Volume 3, Issue 6.
  41. Monica Mundada, Bharti Gawali, Sangramsing Kayte, “Recognition and classification of speech and its related fluency disorders”, International Journal of Computer Science and Information Technologies (IJCSIT), Vol. 5(5), 2014, 6764-6767.
  42. Monica Mundada, Sangramsing Kayte, Bharti Gawali, “Classification of Fluent and Dysfluent Speech Using KNN Classifier”, International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 9, September 2014.
  43. Sangramsing Kayte, Monica Mundada, “Study of Marathi Phones for Synthesis of Marathi Speech from Text”, International Journal of Emerging Research in Management & Technology, ISSN: 2278-9359, Volume 4, Issue 10, October 2015.
Index Terms

Computer Science
Information Sciences

Keywords

TTS, speech corpus, Marathi phonemes