Research Article

Identification of Receipts in a Multi-receipt Image using Spectral Clustering

by Siddharth Garimella
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 155 - Number 2
Year of Publication: 2016
Authors: Siddharth Garimella
10.5120/ijca2016912261

Siddharth Garimella. Identification of Receipts in a Multi-receipt Image using Spectral Clustering. International Journal of Computer Applications. 155, 2 (Dec 2016), 14-18. DOI=10.5120/ijca2016912261

@article{ 10.5120/ijca2016912261,
author = { Siddharth Garimella },
title = { Identification of Receipts in a Multi-receipt Image using Spectral Clustering },
journal = { International Journal of Computer Applications },
issue_date = { Dec 2016 },
volume = { 155 },
number = { 2 },
month = { Dec },
year = { 2016 },
issn = { 0975-8887 },
pages = { 14-18 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume155/number2/26576-2016912261/ },
doi = { 10.5120/ijca2016912261 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:00:11.620132+05:30
%A Siddharth Garimella
%T Identification of Receipts in a Multi-receipt Image using Spectral Clustering
%J International Journal of Computer Applications
%@ 0975-8887
%V 155
%N 2
%P 14-18
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Expense reports are often submitted with multiple receipts scanned onto a single page, and these scanned images are manually verified to check the validity of the claimed expenses. In this paper, a method is presented to isolate receipt segments in an image and use Optical Character Recognition (OCR) to identify receipt amounts, reducing validation time and effort. Scanned images are processed to find the contours of all high-contrast objects in the receipts, including letters, and a minimum bounding rectangle (MBR) is found for each contour. Spectral clustering is then used to group these MBRs into clusters that correspond to individual receipts. Each cluster is then processed with OCR to aid the user with validation.
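
The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It starts from MBRs that are assumed to have been extracted already, and omits contour detection and OCR because they need an image library and an OCR engine. The rectangle coordinates, the Gaussian affinity on MBR centers, the `sigma` value, and all function names are hypothetical choices made for illustration. The abstract does not state which affinity or how many clusters the author used, so the sketch simplifies to a two-receipt split based on the sign of the second eigenvector of the normalized affinity matrix.

```python
import math

def mbr_centers(rects):
    """Return the center point of each (x, y, width, height) rectangle."""
    return [(x + w / 2.0, y + h / 2.0) for x, y, w, h in rects]

def gaussian_affinity(points, sigma):
    """Symmetric affinity W[i][j] = exp(-dist^2 / (2 * sigma^2)), zero diagonal."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = (points[i][0] - points[j][0]) ** 2 + (points[i][1] - points[j][1]) ** 2
            W[i][j] = W[j][i] = math.exp(-d2 / (2.0 * sigma * sigma))
    return W

def spectral_bipartition(W, iters=300):
    """Split items into two groups by the sign of the second eigenvector
    of the normalized affinity D^-1/2 W D^-1/2 (found by power iteration)."""
    n = len(W)
    deg = [sum(row) for row in W]  # assumption: every degree is > 0
    inv_sqrt = [1.0 / math.sqrt(d) for d in deg]
    # Shift to (I + D^-1/2 W D^-1/2) / 2 so all eigenvalues lie in [0, 1].
    M = [[0.5 * ((1.0 if i == j else 0.0) + inv_sqrt[i] * W[i][j] * inv_sqrt[j])
          for j in range(n)] for i in range(n)]
    # The leading eigenvector is known in closed form: D^1/2 * 1, normalized.
    total = math.sqrt(sum(deg))
    v1 = [math.sqrt(d) / total for d in deg]
    v = [float(i + 1) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        dot = sum(a * b for a, b in zip(v, v1))
        v = [a - dot * b for a, b in zip(v, v1)]  # deflate the leading eigenvector
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v))
        v = [a / norm for a in v]
    # Label 0 is whichever side the first rectangle falls on.
    return [0 if (a >= 0) == (v[0] >= 0) else 1 for a in v]

def cluster_bounds(rects, labels):
    """Union bounding box (x, y, width, height) of the rectangles in each cluster."""
    boxes = {}
    for (x, y, w, h), lab in zip(rects, labels):
        x0, y0, x1, y1 = boxes.get(lab, (x, y, x + w, y + h))
        boxes[lab] = (min(x0, x), min(y0, y), max(x1, x + w), max(y1, y + h))
    return {lab: (x0, y0, x1 - x0, y1 - y0) for lab, (x0, y0, x1, y1) in boxes.items()}

# Hypothetical MBRs of text lines: four from a left receipt, three from a right one.
rects = [(10, 10, 80, 12), (10, 30, 60, 12), (10, 50, 90, 12), (40, 80, 50, 12),
         (400, 20, 80, 12), (400, 40, 70, 12), (410, 60, 60, 12)]
labels = spectral_bipartition(gaussian_affinity(mbr_centers(rects), sigma=100.0))
print(labels)
print(cluster_bounds(rects, labels))
```

Each union box returned by `cluster_bounds` marks the image region that would be cropped and passed to OCR. A page with more than two receipts would need the general k-way form of spectral clustering, which embeds the points using several eigenvectors and then clusters the embedded points.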

References
  1. Mori, S., Suen, C. Y., and Yamamoto, K. 1992. Historical Review of OCR Research and Development, IEEE Proceedings, vol. 80, no. 7, 1029-1058.
  2. Huang, T. S. and Tang, G. T. 1979. A fast two-dimensional median filtering algorithm, IEEE Trans Acoustics, Speech, and Signal Processing, vol. 27, no. 1, 13-18.
  3. John Canny. 1986. A computational approach to edge detection. Pattern Analysis and Machine Intelligence, IEEE Transactions on, PAMI-8(6):679–698.
  4. Jain, A. K. and Dubes, R. C. 1988. Algorithms for Clustering Data. Englewood Cliffs, NJ: Prentice-Hall.
  5. Jain, A. K., Murty, M. N., and Flynn, P. J. 1999. Data clustering: A review. ACM Computing Surveys, vol. 31(3), 264-323.
  6. Ng, A. Y., Jordan, M. I., and Weiss, Y. 2002. On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems 14, volume 14, 849-856.
  7. Mikhail Belkin and Partha Niyogi. 2003. Laplacian Eigenmaps for Dimensionality Reduction and Data Representation. Neural Computation, vol. 15, 1373-1396.
  8. Smile – Statistical Machine Intelligence and Learning Engine (http://haifengl.github.io/smile/).
  9. Ray Smith. 2007. Tesseract OCR Engine, https://tesseract-ocr.googlecode.com/files/TesseractOSCON.pdf, Google, Inc.
Index Terms

Computer Science
Information Sciences

Keywords

Clustering, Pattern Recognition, Receipt Recognition, and Spectral Clustering Algorithm.