Research Article

Head Mounted Device for Real World Text to Speech Conversion

by Nikhil Varghese, Gaurav Tripathi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 155 - Number 5
Year of Publication: 2016
10.5120/ijca2016912309

Nikhil Varghese, Gaurav Tripathi. Head Mounted Device for Real World Text to Speech Conversion. International Journal of Computer Applications. 155, 5 (Dec 2016), 16-20. DOI=10.5120/ijca2016912309

@article{ 10.5120/ijca2016912309,
author = { Nikhil Varghese and Gaurav Tripathi },
title = { Head Mounted Device for Real World Text to Speech Conversion },
journal = { International Journal of Computer Applications },
issue_date = { Dec 2016 },
volume = { 155 },
number = { 5 },
month = { Dec },
year = { 2016 },
issn = { 0975-8887 },
pages = { 16-20 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume155/number5/26600-2016912309/ },
doi = { 10.5120/ijca2016912309 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Nikhil Varghese
%A Gaurav Tripathi
%T Head Mounted Device for Real World Text to Speech Conversion
%J International Journal of Computer Applications
%@ 0975-8887
%V 155
%N 5
%P 16-20
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Despite several advances in technology, there is still no low-cost aid that lets visually impaired people read the text around them. This paper presents a mobile head-mounted device that detects text in natural scenes and converts it to speech. The device consists of a Raspberry Pi, a high-definition webcam, earphones, and a portable power bank. The webcam, connected to the Raspberry Pi, captures an image of the scene. A text detection algorithm based on Class-Specific Extremal Regions (CSERs) locates text in complex natural scenes, and the segmented image is passed to the Tesseract OCR engine for text recognition. The recognized text is then converted to audio on the Raspberry Pi using the espeak Python module. A visually impaired person can thus use the device to hear the text in their surroundings, such as shop names, public notices, billboards, and road directions.
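The abstract describes a four-stage pipeline: capture a frame, detect text with CSERs, recognize it with Tesseract, and speak it with espeak. The Python below is a minimal sketch of that flow under stated assumptions, not the authors' code. It assumes opencv-contrib-python (for the `cv2.text` module), the `trained_classifierNM1.xml` and `trained_classifierNM2.xml` classifier files from the opencv_contrib text samples, `pytesseract` with a Tesseract install, and the `espeak` command-line program on PATH. It calls the espeak CLI through `subprocess` rather than the Python module the paper uses, the ER filter parameters are copied from OpenCV's `textdetection.py` sample rather than from the paper, and helper names such as `order_boxes` and `run_once` are hypothetical.

```python
# Sketch: webcam frame -> CSER text boxes -> Tesseract -> espeak (assumed stack,
# see lead-in). Heavy dependencies are imported lazily inside the stages.
import re
import shutil
import subprocess

NM1_XML = "trained_classifierNM1.xml"  # assumed path to the stage-1 classifier
NM2_XML = "trained_classifierNM2.xml"  # assumed path to the stage-2 classifier


def clamp_rect(rect, width, height):
    """Clip an (x, y, w, h) box to the image; return None if nothing is left."""
    x, y, w, h = rect
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    if x1 <= x0 or y1 <= y0:
        return None
    return (x0, y0, x1 - x0, y1 - y0)


def order_boxes(boxes):
    """Hypothetical helper: drop exact duplicates, sort top-to-bottom, left-to-right."""
    return sorted(set(boxes), key=lambda b: (b[1], b[0]))


def clean_ocr_text(raw):
    """Collapse whitespace and drop OCR lines without any letter or digit."""
    lines = (" ".join(line.split()) for line in raw.splitlines())
    return ". ".join(line for line in lines if re.search(r"[A-Za-z0-9]", line))


def espeak_command(words_per_minute=140):
    """argv for espeak: -s sets the speed, --stdin reads the text from stdin."""
    return ["espeak", "-s", str(words_per_minute), "--stdin"]


def detect_text_boxes(img):
    """CSER detection following OpenCV's textdetection.py sample."""
    import cv2  # lazy import so the pure helpers above run without OpenCV

    er1 = cv2.text.createERFilterNM1(
        cv2.text.loadClassifierNM1(NM1_XML), 16, 0.00015, 0.13, 0.2, True, 0.1)
    er2 = cv2.text.createERFilterNM2(cv2.text.loadClassifierNM2(NM2_XML), 0.5)

    channels = list(cv2.text.computeNMChannels(img))
    # Inverted copies (all but the gradient channel) catch light-on-dark text.
    channels += [255 - c for c in channels[:-1]]
    boxes = []
    for channel in channels:
        regions = cv2.text.detectRegions(channel, er1, er2)
        if len(regions) == 0:
            continue
        rects = cv2.text.erGrouping(img, channel, [r.tolist() for r in regions])
        boxes.extend(tuple(int(v) for v in r) for r in rects)
    return order_boxes(boxes)


def recognize(img, boxes):
    """Run Tesseract on each detected box and join the cleaned results."""
    import cv2
    import pytesseract

    height, width = img.shape[:2]
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    texts = []
    for box in boxes:
        clipped = clamp_rect(box, width, height)
        if clipped is None:
            continue
        x, y, w, h = clipped
        texts.append(pytesseract.image_to_string(gray[y:y + h, x:x + w]))
    return clean_ocr_text("\n".join(texts))


def speak(text, words_per_minute=140):
    """Pipe text to espeak if it is installed; silently skip otherwise."""
    if text and shutil.which("espeak"):
        subprocess.run(espeak_command(words_per_minute), input=text,
                       text=True, check=False)


def run_once(camera_index=0):
    """Grab one webcam frame, read any text in it aloud, and return the text."""
    import cv2

    cap = cv2.VideoCapture(camera_index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        return ""
    text = recognize(frame, detect_text_boxes(frame))
    speak(text)
    return text
```

On the device, `run_once()` would be called whenever the user triggers a capture. Passing text to espeak on stdin avoids OCR output that starts with `-` being parsed as a command-line option. The helpers `clamp_rect`, `order_boxes`, `clean_ocr_text`, and `espeak_command` are plain Python, so they can be checked without a camera, OpenCV, or Tesseract.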

Index Terms

Computer Science
Information Sciences

Keywords

Class-Specific Extremal Region, Head-mounted device, MSER (Maximally Stable Extremal Regions), Raspberry Pi, Tesseract OCR, Probabilistic Hough Lines Transformation