We apologize for a recent technical issue with our email system, which temporarily affected account activations. Accounts have now been activated. Authors may proceed with paper submissions. PhDFocusTM
CFP last date
20 December 2024
Reseach Article

Approximate String Matching Algorithms: A Brief Survey and Comparison

by Syeda Shabnam Hasan, Fareal Ahmed, Rosina Surovi Khan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 120 - Number 8
Year of Publication: 2015
Authors: Syeda Shabnam Hasan, Fareal Ahmed, Rosina Surovi Khan
10.5120/21247-4048

Syeda Shabnam Hasan, Fareal Ahmed, Rosina Surovi Khan . Approximate String Matching Algorithms: A Brief Survey and Comparison. International Journal of Computer Applications. 120, 8 ( June 2015), 26-31. DOI=10.5120/21247-4048

@article{ 10.5120/21247-4048,
author = { Syeda Shabnam Hasan, Fareal Ahmed, Rosina Surovi Khan },
title = { Approximate String Matching Algorithms: A Brief Survey and Comparison },
journal = { International Journal of Computer Applications },
issue_date = { June 2015 },
volume = { 120 },
number = { 8 },
month = { June },
year = { 2015 },
issn = { 0975-8887 },
pages = { 26-31 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume120/number8/21247-4048/ },
doi = { 10.5120/21247-4048 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:05:41.676836+05:30
%A Syeda Shabnam Hasan
%A Fareal Ahmed
%A Rosina Surovi Khan
%T Approximate String Matching Algorithms: A Brief Survey and Comparison
%J International Journal of Computer Applications
%@ 0975-8887
%V 120
%N 8
%P 26-31
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Many database applications require similarity based retrieval on stored text and/or multimedia objects. This is an area of increasing research interest in the sectors of database, data mining, information retrieval and knowledge discovery. This paper presents a brief survey on the existing approximate string matching algorithms by primarily demonstrating three families of algorithms — the Brute force, the Lipschitz Embeddings and the Ball Partitioning algorithms. While Brute Force performs approximate string matching based on distance measures of the query object from each string stored in the database, Lipschitz Embeddings uses a far more efficient approach which embeds the stored strings in database in vector space so that the distances of embedded strings approximates the actual distances. Ball Partitioning algorithm, much more efficient than Brute force but less efficient than Lipschitz algorithm, performs search in approximate string matching based on distances where queries operate on an arbitrary search hierarchy. The paper compares and makes an analysis of these three algorithms which are suitable for approximate matching of strings stored in database text files, an issue much required in the context of similarity based retrieval of objects. The work can be extended for future work by taking into account a larger number of algorithms suited to approximate string matching for the benefit of a wider scope of comparisons and picking out the most optimal one.

References
  1. Approximate String Matching. Available Online source: http://en. wikipedia. org/wiki/Approximate_string_m atching. Last accessed: July 2011.
  2. Luis M. S. Russo, Gonzalo Navarro, Arlindo L. Oliveira and Pedro Morales, "Approximate string matching with compressed indexes", Algorithms, Volume 2, Iss ue 3, Pages 1105–1136, September 2009.
  3. Zheng Liu, James Borneman and Tao Jiang, "A fast algorithm for approximate string matching on gene sequences", in 16th Annual Symposium on Combinatorial Pattern Matching, LNCS, Springer- Verlag, pages 79–90, June 2005.
  4. Heikki Hyyro, Kimmo Fredriksson and Gonzalo Navarro, "Increased bit -parallelism for approximate and multiple string matching", Journal of Experimental Algorithmics (JEA), Volume 10, article 2. 6, December 2005.
  5. William I. Chang and Eugene L. Lawler, "Approximate string matching in sublinear expected time", 31st Annual Symposium on Foundations of Computer Science (FOCS 1990), vol. 1, pages 116–124, October 1990.
  6. Gonzalo Navarro, "A guided tour to approximate string matching", Journal of ACM Computing Surveys, (CSUR), Vol. 33, No. 1, pages 31–88, March 2001.
  7. Heikki Hyyro, "Practical methods for approximate string matching", Acta Electronica Universitatis Tamperensis, 2003.
  8. V. Levenshtein, "Binary codes capable of correcting deletions, insertions and reversals", Soviet Physics Doklady, 10 (8), pages 707–710, 1966.
  9. Christopher Manning, Prabhakar Raghavan and Hinrich Schutze, "Introduction to Information Retrieval", Cambridge University Press, 2008.
  10. Gísli R. Hjaltason and Hanan Samet, "Properties of embedding methods for similarity searching in metric space," IEEE transactions on pattern analysis and machine intelligence, Vol. 25, No. 5, pages 530–549, May 2003
  11. Gísli R. Hjaltason and Hanan Samet, "Index-driven similarity search in metric spaces", ACM Transactions on Database Systems, Vol. 28, No. 4, pages 517–580, December 2003.
Index Terms

Computer Science
Information Sciences

Keywords

Approximate String Matching Algorithm Lipschitz Embeddings Algorithm Ball Partitioning Algorithm.