Research Article

Online Information Search from Tamil Document Images in World Wide Web

by Abirami.s, Murugappan.s
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 51 - Number 5
Year of Publication: 2012
Authors: Abirami.s, Murugappan.s
10.5120/8039-1350

Abirami.s, Murugappan.s. Online Information Search from Tamil Document Images in World Wide Web. International Journal of Computer Applications. 51, 5 (August 2012), 31-39. DOI=10.5120/8039-1350

@article{ 10.5120/8039-1350,
author = { Abirami.s and Murugappan.s },
title = { Online Information Search from Tamil Document Images in World Wide Web },
journal = { International Journal of Computer Applications },
issue_date = { August 2012 },
volume = { 51 },
number = { 5 },
month = { August },
year = { 2012 },
issn = { 0975-8887 },
pages = { 31-39 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume51/number5/8039-1350/ },
doi = { 10.5120/8039-1350 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:49:37.327761+05:30
%A Abirami.s
%A Murugappan.s
%T Online Information Search from Tamil Document Images in World Wide Web
%J International Journal of Computer Applications
%@ 0975-8887
%V 51
%N 5
%P 31-39
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Information Retrieval (IR) from Tamil document images on the World Wide Web (WWW) has become a challenging problem today due to its rising popularity. Web images are among the most valuable Web assets, yet categorizing them and retrieving information from them is quite difficult. This paper proposes a simple and effective method to separate document images from the available web image sources and to retrieve the information present in those web document images. The system works in two phases. In the first phase, it performs automatic image categorization over web images, employing a filtering technique to discriminate document images from the other images available on the WWW. The filtering technique captures image information through intensity and frequency histograms. In the second phase, for information retrieval, a feature string generation technique produces a feature string for every word image by extracting its shape features, relying on statistical properties such as lines, black and white disposition rates, and outline features of characters, instead of recognizing the letters and assigning their ASCII values as OCR does. This kind of information retrieval has been run over a list of web sites, and the experimental results are recorded.
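
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the filter's decision rule or the feature string alphabet, so the function names, the 16-bin histogram, the 0.85 threshold, and the four-symbol column encoding are all assumptions made for illustration. The filter uses only an intensity histogram, and the feature string encodes only the black and white disposition rate of each pixel column; the frequency histogram, line features, and outline features mentioned in the abstract are not modelled.

```python
from typing import List, Sequence

# A grayscale image as rows of pixel values in 0..255 (assumed representation).
Image = Sequence[Sequence[int]]


def intensity_histogram(image: Image, bins: int = 16) -> List[float]:
    """Return the normalized histogram of gray levels over `bins` equal ranges."""
    counts = [0] * bins
    total = 0
    for row in image:
        for px in row:
            counts[min(px * bins // 256, bins - 1)] += 1
            total += 1
    if total == 0:
        return [0.0] * bins
    return [c / total for c in counts]


def looks_like_document(image: Image, min_extreme_mass: float = 0.85) -> bool:
    """Hypothetical phase-1 filter: scanned text is mostly dark ink on light
    paper, so most pixels should fall in the darkest or lightest bins."""
    hist = intensity_histogram(image, bins=16)
    extreme_mass = hist[0] + hist[1] + hist[-2] + hist[-1]
    return extreme_mass >= min_extreme_mass


def feature_string(word: Image, threshold: int = 128, levels: str = "abcd") -> str:
    """Hypothetical phase-2 encoding: one symbol per pixel column, chosen by
    the fraction of black pixels in that column (no character recognition)."""
    height = len(word)
    if height == 0:
        return ""
    symbols = []
    for x in range(len(word[0])):
        black = sum(1 for y in range(height) if word[y][x] < threshold)
        ratio = black / height
        index = min(int(ratio * len(levels)), len(levels) - 1)
        symbols.append(levels[index])
    return "".join(symbols)
```

Under these assumptions, a query word image and a candidate word image from a filtered document could be compared through their feature strings, for example by exact or approximate string matching, without assigning character codes to the letters.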

Index Terms

Computer Science
Information Sciences

Keywords

Web Search, Information Retrieval, Web Image Categorization, Document Images