CFP last date
22 December 2025
Call for Paper
January Edition
IJCA solicits high quality original research papers for the upcoming January edition of the journal. The last date of research paper submission is 22 December 2025

Submit your paper
Know more
Random Articles
Reseach Article

Comparative Analysis of Classical String Matching Algorithms with Insights into Applications, Parallel Processing, and Big Data Frameworks

by Ashishkumar Gor, C.K. Bhensdadia
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 187 - Number 53
Year of Publication: 2025
Authors: Ashishkumar Gor, C.K. Bhensdadia
10.5120/ijca2025925896

Ashishkumar Gor, C.K. Bhensdadia . Comparative Analysis of Classical String Matching Algorithms with Insights into Applications, Parallel Processing, and Big Data Frameworks. International Journal of Computer Applications. 187, 53 ( Nov 2025), 1-7. DOI=10.5120/ijca2025925896

@article{ 10.5120/ijca2025925896,
author = { Ashishkumar Gor, C.K. Bhensdadia },
title = { Comparative Analysis of Classical String Matching Algorithms with Insights into Applications, Parallel Processing, and Big Data Frameworks },
journal = { International Journal of Computer Applications },
issue_date = { Nov 2025 },
volume = { 187 },
number = { 53 },
month = { Nov },
year = { 2025 },
issn = { 0975-8887 },
pages = { 1-7 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume187/number53/comparative-analysis-of-classical-string-matching-algorithms-with-insights-into-applications-parallel-processing-and-big-data-frameworks/ },
doi = { 10.5120/ijca2025925896 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2025-11-18T21:10:40.373352+05:30
%A Ashishkumar Gor
%A C.K. Bhensdadia
%T Comparative Analysis of Classical String Matching Algorithms with Insights into Applications, Parallel Processing, and Big Data Frameworks
%J International Journal of Computer Applications
%@ 0975-8887
%V 187
%N 53
%P 1-7
%D 2025
%I Foundation of Computer Science (FCS), NY, USA
Abstract

String matching remains one of the most fundamental problems in computer science, forming the basis for applications in information retrieval, bioinformatics, plagiarism detection, network security, and large-scale data analytics. Although classical algorithms such as the Na¨ıve method, Knuth–Morris–Pratt (KMP), Z-algorithm, Rabin–Karp, Boyer–Moore, and Aho–Corasick were developed decades ago, their relevance has only grown with the scale and diversity of modern data. This paper provides a structured comparative analysis of these algorithms, considering theoretical complexity, memory requirements, runtime behavior, and domain-specific applicability. Beyond classical analysis, we extend the discussion to how these algorithms integrate with parallel processing and big data frameworks such as Hadoop, Spark, and GPU/FPGA-based accelerators, which are critical for handling real-world datasets at large scale. The results demonstrate that no single algorithm dominates universally: choices depend strongly on factors such as alphabet size, pattern length, and application context. We conclude with insights into open challenges and future directions, including hybrid algorithms, approximate matching, compressed data structures, and hardware-aware implementations.

References
  1. D. E. Knuth, J. H. Morris, and V. R. Pratt, “Fast pattern matching in strings,” SIAM Journal on Computing, vol. 6, no. 2, pp. 323–350, 1977.
  2. R. S. Boyer and J. S. Moore, “A fast string searching algorithm,” Communications of the ACM, vol. 20, no. 10, pp. 762– 772, October 1977.
  3. D. Gusfield, Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology. Cambridge, UK: Cambridge University Press, 1997.
  4. M. O. Rabin and R. M. Karp, “Efficient randomized patternmatching algorithms,” IBM Journal of Research and Development, vol. 31, no. 2, pp. 249–260, 1987.
  5. A. V. Aho and M. J. Corasick, “Efficient string matching: An aid to bibliographic search,” Communications of the ACM, vol. 18, no. 6, pp. 333–340, June 1975.
  6. J. Dean and S. Ghemawat, “Mapreduce: Simplified data processing on large clusters,” Communications of the ACM, vol. 51, no. 1, pp. 107–113, January 2008, available at https://doi.org/10.1145/1327452.1327492.
  7. M. Zaharia, M. Chowdhury, M. J. Franklin, S. Shenker, and I. Stoica, “Spark: Cluster computing with working sets,” in Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing (HotCloud), June 2010, available at https://www.usenix.org/conference/hotcloud-10/sparkcluster- computing-working-sets.
  8. P. Carbone, A. Katsifodimos, S. Ewen, V. Markl, S. Haridi, and K. Tzoumas, “Apache flink: Stream and batch processing in a single engine,” Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, vol. 36, no. 4, pp. 28–38, December 2015, available at https://cs.brown.edu/research/pubs/techreports/reports/CS- 15-02.html.
  9. T. Besard, B. V. Vlimmeren, and B. D. Sutter, “Accelerating aho-corasick string matching using gpus,” in Proceedings of the International Conference on High Performance Computing and Simulation (HPCS). IEEE, July 2012, pp. 40–47.
  10. F. Yu, Z. Liang, Y. Zhang, D.Wang, andW. Li, “Gpu accelerated string matching for deep packet inspection,” in Proceedings of the 2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS). ACM, October 2010, pp. 1–10.
  11. J. Tuck, L. A. Barroso, C. Kozyrakis, and B. Sinharoy, “Flexible memory-based hardware acceleration for string matching in networking applications,” in Proceedings of the 2004 ACM/IEEE Symposium on Architecture for Networking and Communications Systems (ANCS). ACM, October 2004, pp. 75–84.
  12. R. N. Horspool, “Practical fast searching in strings,” Software: Practice and Experience, vol. 10, no. 6, pp. 501–506, June 1980.
  13. D. M. Sunday, “A very fast substring search algorithm,” Communications of the ACM, vol. 33, no. 8, pp. 132–142, August 1990.
  14. R. A. Baeza-Yates and G. H. Gonnet, “A new approach to text searching,” Communications of the ACM, vol. 35, no. 10, pp. 74–82, October 1992.
  15. G. Navarro and M. Raffinot, “A bit-parallel approach to suffix automata: Fast extended string matching,” Algorithmica, vol. 30, no. 1, pp. 89–117, 2001.
  16. P. Weiner, “Linear pattern matching algorithms,” 14th Annual IEEE Symposium on Switching and Automata Theory (SWAT), pp. 1–11, 1973.
  17. E. M. McCreight, “A space-economical suffix tree construction algorithm,” Journal of the ACM, vol. 23, no. 2, pp. 262– 272, April 1976.
  18. E. Ukkonen, “On-line construction of suffix trees,” Algorithmica, vol. 14, no. 3, pp. 249–260, September 1995.
  19. U. Manber and G. Myers, “Suffix arrays: A new method for on-line string searches,” SIAM Journal on Computing, vol. 22, no. 5, pp. 935–948, October 1993.
  20. P. Ferragina and G. Manzini, “Opportunistic data structures with applications,” 41st Annual Symposium on Foundations of Computer Science (FOCS), pp. 390–398, November 2000.
  21. G. Myers, “A fast bit-vector algorithm for approximate string matching based on dynamic programming,” Journal of the ACM, vol. 46, no. 3, pp. 395–415, May 1999.
  22. G. Navarro, “A guided tour to approximate string matching,” ACM Computing Surveys, vol. 33, no. 1, pp. 31–88, March 2001.
  23. C. Charras and T. Lecroq, Handbook of Exact String Matching Algorithms. London, UK: King’s College Publications, 2004.
  24. A. Toshniwal, S. Taneja, A. Shukla, K. Ramasamy, J. M. Patel, S. Kulkarni, J. Jackson, K. Gade, M. Fu, J. Donham, N. Kwan, M. I. Sheykh, B. Menon, S. Ghosh, S. Soman, B. Bhattacharyya, and N. Shores, “Storm@twitter,” in Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data. ACM, June 2014, pp. 147– 156.
  25. M. Abouelhoda and S. Alkuroud, “Spark-based scalable dna sequence analysis,” BMC Bioinformatics, vol. 13, no. Suppl 17, p. S10, December 2012.
  26. M. Roesch, “Snort: Lightweight intrusion detection for networks,” in Proceedings of the 13th USENIX Conference on System Administration (LISA), November 1999, pp. 229–238.
Index Terms

Computer Science
Information Sciences

Keywords

String matching Exact pattern matching Knuth–Morris–Pratt (KMP) Boyer–Moore Rabin–Karp Aho–Corasick Big data frameworks (Hadoop Spark) GPU/FPGA acceleration