Research Article

An Automatic Approach for Translating Simple Images into Text Descriptions and Speech for Visually Impaired People

by Mrunmayee Patil, Ramesh Kagalkar
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 118 - Number 3
Year of Publication: 2015
Authors: Mrunmayee Patil, Ramesh Kagalkar
DOI: 10.5120/20725-3080

Mrunmayee Patil, Ramesh Kagalkar. An Automatic Approach for Translating Simple Images into Text Descriptions and Speech for Visually Impaired People. International Journal of Computer Applications. 118, 3 (May 2015), 14-19. DOI=10.5120/20725-3080

@article{ 10.5120/20725-3080,
author = { Mrunmayee Patil and Ramesh Kagalkar },
title = { An Automatic Approach for Translating Simple Images into Text Descriptions and Speech for Visually Impaired People },
journal = { International Journal of Computer Applications },
issue_date = { May 2015 },
volume = { 118 },
number = { 3 },
month = { May },
year = { 2015 },
issn = { 0975-8887 },
pages = { 14-19 },
numpages = {6},
url = { https://ijcaonline.org/archives/volume118/number3/20725-3080/ },
doi = { 10.5120/20725-3080 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:00:41.591280+05:30
%A Mrunmayee Patil
%A Ramesh Kagalkar
%T An Automatic Approach for Translating Simple Images into Text Descriptions and Speech for Visually Impaired People
%J International Journal of Computer Applications
%@ 0975-8887
%V 118
%N 3
%P 14-19
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Image processing is a rapidly growing field of research. Images come in many file formats and depict many kinds of subjects: objects, places, people, scientific and astrological scenes, and more. An image is a collection of pixels arranged in rows and columns, and images are captured, processed, and stored for various uses. Most people can identify and analyze everyday images easily, but blind and physically disabled people find this difficult. Unfortunately, no established medium or interface exists that lets them communicate with the world through images, and blind or visually impaired people are often neglected by society, so there is a continuing need to help them. We therefore propose a technique that converts images into text and speech using image-processing steps: pre-processing, image segmentation, edge detection, object detection, and speech synthesis. This paper first explains why blind people need image-to-text conversion and then gives an overview of the image-to-text-and-speech conversion system. Edge detection plays an important role in the system: the Canny edge detection algorithm is used to detect objects in images. Objects are then recognized on the basis of their color, size, texture, and shape.
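
The edge detection stage mentioned in the abstract can be illustrated with a minimal, simplified Canny-style sketch in pure Python. It is not the authors' implementation: the function names (`sobel_gradients`, `non_max_suppression`, `hysteresis`, `canny_sketch`) and the default thresholds are assumptions made for illustration, and the Gaussian smoothing step of the full Canny algorithm is omitted for brevity. The input is assumed to be a grayscale image stored as a list of rows.

```python
import math

def sobel_gradients(img):
    """Return (magnitude, angle in degrees within [0, 180)) for a 2D grayscale grid."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):          # border pixels are left at zero
        for x in range(1, w - 1):
            gx = (img[y-1][x+1] + 2 * img[y][x+1] + img[y+1][x+1]
                  - img[y-1][x-1] - 2 * img[y][x-1] - img[y+1][x-1])
            gy = (img[y+1][x-1] + 2 * img[y+1][x] + img[y+1][x+1]
                  - img[y-1][x-1] - 2 * img[y-1][x] - img[y-1][x+1])
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang

def non_max_suppression(mag, ang):
    """Keep a pixel only if it is a local maximum along its gradient direction."""
    h, w = len(mag), len(mag[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            a = ang[y][x]
            if a < 22.5 or a >= 157.5:
                dy, dx = 0, 1          # gradient roughly horizontal
            elif a < 67.5:
                dy, dx = 1, 1          # gradient along the down-right diagonal
            elif a < 112.5:
                dy, dx = 1, 0          # gradient roughly vertical
            else:
                dy, dx = 1, -1         # gradient along the down-left diagonal
            m = mag[y][x]
            # Strict comparison on one side breaks ties, so a two-pixel-wide
            # ridge is thinned to a single pixel.
            if m > mag[y + dy][x + dx] and m >= mag[y - dy][x - dx]:
                out[y][x] = m
    return out

def hysteresis(nms, low, high):
    """Mark strong pixels (>= high), then grow into connected weak pixels (>= low)."""
    h, w = len(nms), len(nms[0])
    edges = [[False] * w for _ in range(h)]
    stack = [(y, x) for y in range(h) for x in range(w) if nms[y][x] >= high]
    for y, x in stack:
        edges[y][x] = True
    while stack:
        y, x = stack.pop()
        for ny in range(max(0, y - 1), min(h, y + 2)):
            for nx in range(max(0, x - 1), min(w, x + 2)):
                if not edges[ny][nx] and nms[ny][nx] >= low:
                    edges[ny][nx] = True
                    stack.append((ny, nx))
    return edges

def canny_sketch(img, low=100.0, high=300.0):
    """Simplified Canny-style edge map; thresholds are illustrative and must be > 0."""
    mag, ang = sobel_gradients(img)
    return hysteresis(non_max_suppression(mag, ang), low, high)

if __name__ == "__main__":
    # Synthetic 8x8 image: dark left half, bright right half.
    image = [[0] * 4 + [255] * 4 for _ in range(8)]
    for row in canny_sketch(image):
        print("".join("#" if v else "." for v in row))
```

In practice a system like the one described would more likely rely on an established library routine (for example, OpenCV provides `cv2.Canny`), with thresholds tuned to the input images; the sketch only shows the gradient, thinning, and hysteresis steps on which later object detection would build.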

Index Terms

Computer Science
Information Sciences

Keywords

Image Processing, Image Segmentation, Speech Synthesis, Text to Speech Conversion, Edge Detection