Research Article

Handling Text Mining Problems in Arabic using Domain-Specific Approach

by Madeeh Al-gedawy, Osman Hegazy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 45 - Number 16
Year of Publication: 2012
Authors: Madeeh Al-gedawy, Osman Hegazy
10.5120/6867-9474

Madeeh Al-gedawy, Osman Hegazy. Handling Text Mining Problems in Arabic using Domain-Specific Approach. International Journal of Computer Applications. 45, 16 (May 2012), 40-47. DOI=10.5120/6867-9474

@article{ 10.5120/6867-9474,
author = { Madeeh Al-gedawy and Osman Hegazy },
title = { Handling Text Mining Problems in Arabic using Domain-Specific Approach },
journal = { International Journal of Computer Applications },
issue_date = { May 2012 },
volume = { 45 },
number = { 16 },
month = { May },
year = { 2012 },
issn = { 0975-8887 },
pages = { 40-47 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume45/number16/6867-9474/ },
doi = { 10.5120/6867-9474 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:37:47.690575+05:30
%A Madeeh Al-gedawy
%A Osman Hegazy
%T Handling Text Mining Problems in Arabic using Domain-Specific Approach
%J International Journal of Computer Applications
%@ 0975-8887
%V 45
%N 16
%P 40-47
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Traditional text mining techniques work smoothly for Latin-based languages because their written words are definite and each word naturally has a limited set of alternative meanings. Arabic differs in two main ways: 1) Arabic is written today without diacritics in 99% of text, which leaves interpretation indefinite at the level of two consecutive words and, in some cases, at the level of whole sentences; 2) even with diacritics, Arabic words are loosely defined, and a single word may bear more than one meaning depending on context. Handling Arabic text in the same manner as Latin-based languages is therefore time-wasting, and different techniques are needed to enrich the criteria used in text analysis. We propose a domain-specific approach that yielded excellent results on some aspects of Arabic text analysis. Several classifiers were built and tested for this purpose. The approach was compared with others that do not use domain-specific information; the paper concludes that the results obtained with the adopted technique are more promising.
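
To make the idea of domain-specific disambiguation concrete, the following is a minimal sketch, not the authors' classifiers. It assumes a tiny hand-made cue lexicon and sense inventory (`DOMAIN_CUES`, `DOMAIN_SENSES`, `DEFAULT_SENSES` are all hypothetical names and toy data). It first guesses the domain of a text from cue words, then uses that domain to pick a reading for an undiacritized word such as ذهب, which can be read as "gold" or "went".

```python
from collections import Counter

# Hypothetical toy lexicon of domain cue words (illustration only).
DOMAIN_CUES = {
    "finance": {"سعر", "سوق", "بنك"},        # price, market, bank
    "education": {"جامعة", "بحث", "مدرسة"},  # university, research, school
}

# Hypothetical per-domain readings of undiacritized word forms.
DOMAIN_SENSES = {
    "ذهب": {"finance": "gold (dhahab)"},
    "علم": {"education": "science (ilm)"},
}

# Assumed fallback readings when no domain evidence is found.
DEFAULT_SENSES = {
    "ذهب": "went (dhahaba)",
    "علم": "flag (alam)",
}


def guess_domain(tokens):
    """Return the domain with the most cue-word hits, or None if there are none."""
    counts = Counter()
    for domain, cues in DOMAIN_CUES.items():
        counts[domain] = sum(1 for token in tokens if token in cues)
    domain, hits = counts.most_common(1)[0]
    return domain if hits > 0 else None


def disambiguate(word, tokens):
    """Pick a reading for `word` using the guessed domain of the surrounding tokens."""
    domain = guess_domain(tokens)
    by_domain = DOMAIN_SENSES.get(word, {})
    return by_domain.get(domain, DEFAULT_SENSES.get(word))


# Usage: the same written form gets a different reading in each context.
finance_text = "ارتفع سعر ذهب في سوق".split()
general_text = "ذهب الولد".split()
print(disambiguate("ذهب", finance_text))
print(disambiguate("ذهب", general_text))
```

A real system would replace the cue-word count with a trained domain classifier and a much larger sense inventory; the sketch only shows why knowing the domain narrows the set of candidate readings.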

Index Terms

Computer Science
Information Sciences

Keywords

Arabic Domain-specific – Fuzzy Logic – Weighted Decision Trees – Classification – Word Sense Disambiguation – Query Expansion – WordNet