Research Article

Automatic Spelling Correction based on n-Gram Model

by S. M. El Atawy, A. Abd ElGhany
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 182 - Number 11
Year of Publication: 2018
10.5120/ijca2018917724

S. M. El Atawy, A. Abd ElGhany. Automatic Spelling Correction based on n-Gram Model. International Journal of Computer Applications. 182, 11 (Aug 2018), 5-9. DOI=10.5120/ijca2018917724

@article{10.5120/ijca2018917724,
author = {S. M. {El Atawy} and A. {Abd ElGhany}},
title = {Automatic Spelling Correction based on n-Gram Model},
journal = {International Journal of Computer Applications},
issue_date = {Aug 2018},
volume = {182},
number = {11},
month = {Aug},
year = {2018},
issn = {0975-8887},
pages = {5-9},
numpages = {5},
url = {https://ijcaonline.org/archives/volume182/number11/29862-2018917724/},
doi = {10.5120/ijca2018917724},
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:11:06.114600+05:30
%A S. M. El Atawy
%A A. Abd ElGhany
%T Automatic Spelling Correction based on n-Gram Model
%J International Journal of Computer Applications
%@ 0975-8887
%V 182
%N 11
%P 5-9
%D 2018
%I Foundation of Computer Science (FCS), NY, USA
Abstract

A spell checker is a basic requirement for digitizing any language. It is software that detects and corrects spelling errors in text written in a particular language. This paper proposes a model for spelling error detection and auto-correction that is based on the n-gram technique and is applied to English as a global language. The proposed model offers correction suggestions by selecting the most suitable candidates from a list of corrective suggestions based on lexical resources and n-gram statistics. It relies on a lexicon of Microsoft words. The proposed model is evaluated on standard English datasets of misspelled words. Error detection, automatic error correction, and replacement are the main features of the proposed model. In the experiments, the model reached an accuracy of approximately 93%, performed similarly to Microsoft Word, and outperformed both Aspell and Google.
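The detect, suggest, and replace pipeline summarized in the abstract can be illustrated with a minimal sketch. It assumes character-bigram overlap (the Dice coefficient) as the n-gram similarity measure and uses word frequency only to break ties. The lexicon, frequency counts, and test pairs are hypothetical toy data, not the authors' Microsoft lexicon, n-gram statistics, or evaluation datasets, and the paper's actual scoring may differ.

```python
from collections import Counter

# Hypothetical toy lexicon and frequency counts (NOT the paper's resources).
LEXICON = {"receive", "believe", "achieve", "separate", "definitely", "necessary"}
FREQ = {"receive": 120, "believe": 90, "achieve": 60,
        "separate": 80, "definitely": 70, "necessary": 100}


def char_ngrams(word: str, n: int = 2) -> Counter:
    """Multiset of character n-grams, with '#' marking word boundaries."""
    padded = f"#{word.lower()}#"
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def dice(a: str, b: str, n: int = 2) -> float:
    """Dice coefficient: 2 * |shared n-grams| / (|n-grams(a)| + |n-grams(b)|)."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    shared = sum((ga & gb).values())  # & keeps the minimum count per n-gram
    total = sum(ga.values()) + sum(gb.values())
    return 2.0 * shared / total if total else 0.0


def is_misspelled(word: str, lexicon: set) -> bool:
    """Detection step: a word is flagged if it is absent from the lexicon."""
    return word.lower() not in lexicon


def suggest(word: str, lexicon: set, freq: dict, k: int = 3, n: int = 2) -> list:
    """Rank lexicon entries by n-gram similarity, then frequency, then alphabetically."""
    ranked = sorted(lexicon, key=lambda c: (-dice(word, c, n), -freq.get(c, 0), c))
    return ranked[:k]


def correct(word: str, lexicon: set, freq: dict) -> str:
    """Auto-correction: keep known words, otherwise replace with the top suggestion."""
    if not is_misspelled(word, lexicon):
        return word
    best = suggest(word, lexicon, freq, k=1)
    return best[0] if best else word


def accuracy(pairs: list, lexicon: set, freq: dict) -> float:
    """Fraction of (misspelling, target) pairs corrected to the target."""
    if not pairs:
        return 0.0
    hits = sum(correct(wrong, lexicon, freq) == right for wrong, right in pairs)
    return hits / len(pairs)


if __name__ == "__main__":
    print(suggest("recieve", LEXICON, FREQ))
    pairs = [("recieve", "receive"), ("seperate", "separate"),
             ("definately", "definitely"), ("neccessary", "necessary"),
             ("beleive", "believe")]
    print(accuracy(pairs, LEXICON, FREQ))
```

With these toy inputs, `suggest("recieve", LEXICON, FREQ)` returns `['receive', 'believe', 'achieve']`: "receive" shares 5 of 8 padded bigrams with the misspelling (Dice 0.625), while "believe" and "achieve" tie at 0.5 and are ordered by the hypothetical frequencies. Plain bigram overlap is sensitive to word length: adding "recipe" to the toy lexicon would make it outrank "receive" for "recieve" (Dice about 0.667 vs 0.625), so a practical system would likely combine this score with other evidence.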

Index Terms

Computer Science
Information Sciences

Keywords

N-gram - Spelling correction - Misspelling detection - Spell checker - Information retrieval.