Arabic Minimal Pairs Word Detection and Disambiguation

Mohamed Taybe Elhadi

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 184 - Number 31

Year of Publication: 2022

Authors: Mohamed Taybe Elhadi

10.5120/ijca2022922371

Mohamed Taybe Elhadi. Arabic Minimal Pairs Word Detection and Disambiguation. International Journal of Computer Applications. 184, 31 (Oct 2022), 11-20. DOI=10.5120/ijca2022922371

@article{10.5120/ijca2022922371,
  author = {Mohamed Taybe Elhadi},
  title = {Arabic Minimal Pairs Word Detection and Disambiguation},
  journal = {International Journal of Computer Applications},
  issue_date = {Oct 2022},
  volume = {184},
  number = {31},
  month = {Oct},
  year = {2022},
  issn = {0975-8887},
  pages = {11-20},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume184/number31/32510-2022922371/},
  doi = {10.5120/ijca2022922371},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-07T01:22:51.978513+05:30
%A Mohamed Taybe Elhadi
%T Arabic Minimal Pairs Word Detection and Disambiguation
%J International Journal of Computer Applications
%@ 0975-8887
%V 184
%N 31
%P 11-20
%D 2022
%I Foundation of Computer Science (FCS), NY, USA

Abstract

This work addresses a common writing pitfall in Arabic: confusing words that differ only in letters such as (among many others) ظ (THA) and ض (DHA). Such words are formally minimal pairs (more precisely, near-minimal pairs) and near homographs or homophones, so the right term must be determined and the resulting ambiguity resolved. The error is not just embarrassing to authors; in many situations it produces wrong word usage and, consequently, ambiguous sentences. Such words and sentences become difficult to interpret, especially for computer applications such as information retrieval, machine translation and summarization. A combined determination process is proposed, comprising multiple stages of feature selection, classifier selection and classification. The process was applied to a sample set of terms. Classifier accuracy varies from term to term, but overall it is reasonably high and the values are close to one another. Matthews correlation coefficient (MCC) values also vary, with some falling in a reasonably good range. Notably, some classifiers did not converge, and in those cases the MCC was set to zero. Considering the classifiers with the highest training rates and those with the highest MCC, Random Forest performs best: it achieves high accuracy on most terms, is often very close to the highest rate where another classifier leads, and scores the highest mean value across all terms. The results suggest that combining features extracted from a corpus with machine learning classification techniques can solve the problem with high accuracy.
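The two ideas in the abstract that lend themselves to a small example are the detection of a confusable word and the MCC score. The following is a minimal Python sketch under stated assumptions, not the author's implementation: the function names, the toy lexicon and the restriction to the ظ/ض pair are all hypothetical, and returning zero when the MCC denominator vanishes is a common convention that may or may not match how the paper assigned zero to classifiers that did not converge.

```python
import math

# Assumption: only the ظ/ض pair named in the abstract is handled;
# the paper mentions "many others" without listing them here.
SWAP = str.maketrans({"ظ": "ض", "ض": "ظ"})


def counterpart(word: str) -> str:
    """Return the word with every ظ swapped for ض and vice versa."""
    return word.translate(SWAP)


def is_confusable(word: str, lexicon: set) -> bool:
    """A word is a candidate when its swapped form is also in the lexicon."""
    other = counterpart(word)
    return other != word and other in lexicon


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero, for example when a
    classifier predicts only one class (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical toy lexicon: ظل and ضل form a pair, كتب has no counterpart.
toy_lexicon = {"ظل", "ضل", "كتب"}
candidates = [w for w in sorted(toy_lexicon) if is_confusable(w, toy_lexicon)]
```

Once a candidate is flagged this way, choosing between the two spellings is the classification step the abstract describes; the sketch covers only the detection and scoring ends of that process.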

Index Terms

Computer Science

Information Sciences

Keywords

Minimal Pairs; Disambiguation; Machine Learning; Classifiers; Arabic Language