Research Article

A Perceptive Method for Arabic Word Segmentation using Bounding Boxes by Morphological Dilation

by Firoj Parwej
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 71 - Number 1
Year of Publication: 2013
Authors: Firoj Parwej
10.5120/12319-8531

Firoj Parwej. A Perceptive Method for Arabic Word Segmentation using Bounding Boxes by Morphological Dilation. International Journal of Computer Applications. 71, 1 (June 2013), 1-7. DOI=10.5120/12319-8531

@article{ 10.5120/12319-8531,
author = { Firoj Parwej },
title = { A Perceptive Method for Arabic Word Segmentation using Bounding Boxes by Morphological Dilation },
journal = { International Journal of Computer Applications },
issue_date = { June 2013 },
volume = { 71 },
number = { 1 },
month = { June },
year = { 2013 },
issn = { 0975-8887 },
pages = { 1-7 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume71/number1/12319-8531/ },
doi = { 10.5120/12319-8531 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:34:18.598310+05:30
%A Firoj Parwej
%T A Perceptive Method for Arabic Word Segmentation using Bounding Boxes by Morphological Dilation
%J International Journal of Computer Applications
%@ 0975-8887
%V 71
%N 1
%P 1-7
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Character recognition is an application of image processing. Character samples are stored digitally in a suitable image format. Each sample image is preprocessed and segmented, and the features that define writer invariants are extracted. Word segmentation is one of the major components of document image analysis. It provides crucial information for skew correction, zone segmentation, and character recognition. Word segmentation seeks to decompose an image of a sequence of words into sub-images of individual symbols. Its decision that a pattern isolated from the image is a word can be right or wrong, and it is wrong often enough to contribute substantially to the error rate of the system. In this paper we present an Arabic word segmentation method for document images. Bounding boxes enclose the characters of the Arabic words, and the spaces between letters are then progressively filled so that the character bounding boxes merge into word bounding boxes. The proposed technique completely avoids the line segmentation step that normally precedes word segmentation in conventional methods. We tested the method on Arabic-script documents and obtained encouraging results.
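
The merging step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes an already binarized page, 8-connected components as "characters", and a single horizontal dilation of fixed `radius` in place of the progressive filling the abstract mentions. All function names (`label`, `boxes_of`, `fill_boxes`, `dilate_horizontal`, `word_boxes`) and the toy page are hypothetical.

```python
from collections import deque


def label(mask):
    """Label 8-connected foreground regions; returns (label grid, region count)."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not labels[r][c]:
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    for ny in range(max(0, y - 1), min(h, y + 2)):
                        for nx in range(max(0, x - 1), min(w, x + 2)):
                            if mask[ny][nx] and not labels[ny][nx]:
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


def boxes_of(labels, count, ink):
    """Bounding box (top, left, bottom, right) of the ink pixels under each label."""
    boxes = [None] * count
    for r, row in enumerate(labels):
        for c, lab in enumerate(row):
            if lab and ink[r][c]:
                b = boxes[lab - 1]
                boxes[lab - 1] = (r, c, r, c) if b is None else (
                    min(b[0], r), min(b[1], c), max(b[2], r), max(b[3], c))
    return [b for b in boxes if b is not None]


def fill_boxes(shape, boxes):
    """Paint each bounding box as a solid rectangle on an empty mask."""
    h, w = shape
    mask = [[0] * w for _ in range(h)]
    for top, left, bottom, right in boxes:
        for r in range(top, bottom + 1):
            for c in range(left, right + 1):
                mask[r][c] = 1
    return mask


def dilate_horizontal(mask, radius):
    """Binary dilation with a 1 x (2*radius + 1) horizontal structuring element."""
    w = len(mask[0])
    return [[int(any(row[max(0, c - radius):min(w, c + radius + 1)]))
             for c in range(w)] for row in mask]


def word_boxes(img, radius):
    """Character boxes -> filled rectangles -> horizontal dilation -> word boxes."""
    labels, n = label(img)
    char_boxes = boxes_of(labels, n, img)
    filled = fill_boxes((len(img), len(img[0])), char_boxes)
    merged, m = label(dilate_horizontal(filled, radius))
    # Measure each word box on the original ink, so dilation does not inflate it.
    return boxes_of(merged, m, img)


# Toy page (assumption): two text lines, '#' is ink, '.' is background.
page = [
    "##.##....##.##",
    "##.##....##.##",
    "..............",
    "...##.#.......",
]
img = [[ch == "#" for ch in row] for row in page]

print(word_boxes(img, 1))  # [(0, 0, 1, 4), (0, 9, 1, 13), (3, 3, 3, 6)]
```

In this sketch, two boxes on overlapping rows merge when the gap between them is at most `2 * radius` columns, so `radius` plays the role of the letter-space threshold. Because the dilation is horizontal only, the two text lines stay separate without any explicit line segmentation; a larger `radius` (for example 2 on this toy page) would also merge the two words of the first line.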

Index Terms

Computer Science
Information Sciences

Keywords

Arabic Text Line Segmentation, Connected Component Labeling, Morphological Dilation, Arabic Scripts, Document Analysis, Preprocessing