Research Article

An Assistive Reading System for Visually Impaired using OCR and TTS

by Akshay Sharma, Abhishek Srivastava, Adhar Vashishth
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 95 - Number 2
Year of Publication: 2014
10.5120/16566-6231

Akshay Sharma, Abhishek Srivastava, Adhar Vashishth. An Assistive Reading System for Visually Impaired using OCR and TTS. International Journal of Computer Applications. 95, 2 (June 2014), 13-18. DOI=10.5120/16566-6231

@article{ 10.5120/16566-6231,
author = { Akshay Sharma and Abhishek Srivastava and Adhar Vashishth },
title = { An Assistive Reading System for Visually Impaired using OCR and TTS },
journal = { International Journal of Computer Applications },
issue_date = { June 2014 },
volume = { 95 },
number = { 2 },
month = { June },
year = { 2014 },
issn = { 0975-8887 },
pages = { 13-18 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume95/number2/16566-6231/ },
doi = { 10.5120/16566-6231 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:18:23.320801+05:30
%A Akshay Sharma
%A Abhishek Srivastava
%A Adhar Vashishth
%T An Assistive Reading System for Visually Impaired using OCR and TTS
%J International Journal of Computer Applications
%@ 0975-8887
%V 95
%N 2
%P 13-18
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Reading machines are mechatronic devices that use optical character recognition and text-to-speech technology to produce synthetic speech from printed text. This paper proposes an assistive system for visually impaired or blind persons. It reads textual information on paper and produces the corresponding speech using OCR (Optical Character Recognition) and TTS (text-to-speech) systems. To localize text regions in an image, connected component labeling with histogram analysis is applied to the binarized image. The TTS system uses concatenative synthesis and is built on an SDK (Software Development Kit) platform. The system is operated through a voice-based user interface and also provides a user-friendly GUI (graphical user interface) for scanning text and controlling various speech parameters. The speech signal produced can be saved and replayed later.
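
The text-localization step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the binarized image is a grid of 0/1 values with 1 as foreground, uses 4-connectivity, and the function names `label_components` and `row_histogram` are hypothetical. It labels connected foreground regions, records a bounding box for each, and computes a per-row foreground count as one simple form of histogram analysis.

```python
from collections import deque


def label_components(binary):
    """Label 4-connected foreground regions (value 1) in a binarized image.

    Returns (labels, boxes). `labels` is a grid of ints (0 = background);
    `boxes` maps each label to (top, left, bottom, right), all inclusive.
    """
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    boxes = {}
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            # Skip background pixels and pixels that already have a label.
            if binary[r][c] != 1 or labels[r][c] != 0:
                continue
            next_label += 1
            labels[r][c] = next_label
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            # Breadth-first flood fill over the 4-neighbourhood.
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and labels[ny][nx] == 0):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            boxes[next_label] = (top, left, bottom, right)
    return labels, boxes


def row_histogram(binary):
    """Count foreground pixels per row (a horizontal projection profile)."""
    return [sum(row) for row in binary]


# Toy binarized image, assumed for illustration only.
image = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]
labels, boxes = label_components(image)
profile = row_histogram(image)
```

In a sketch like this, the bounding boxes would serve as candidate character or word regions for the OCR stage, and empty rows in the profile suggest gaps between text lines. How the paper combines the two is not specified in the abstract.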

Index Terms

Computer Science
Information Sciences

Keywords

Text Information Extraction (TIE), Optical Character Recognition (OCR), Connected Component Labeling, Text-to-speech (TTS), Concatenative synthesis, Graphical User Interface (GUI)