International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 45 - Number 16
Year of Publication: 2012
Authors: Madeeh Al-gedawy, Osman Hegazy
DOI: 10.5120/6867-9474
Madeeh Al-gedawy, Osman Hegazy. Handling Text Mining Problems in Arabic using Domain-Specific Approach. International Journal of Computer Applications. 45, 16 (May 2012), 40-47. DOI=10.5120/6867-9474
Latin-based languages work smoothly with traditional text mining techniques because their words are largely definite and naturally have a limited range of alternative meanings. Arabic differs in two main ways. (1) Arabic is written today without diacritics in 99% of text, which makes interpretation ambiguous at the level of two consecutive words and, in some cases, even at the level of whole sentences. (2) Even with diacritics, Arabic words are very loose: a single word may bear more than one meaning depending on the context. Handling Arabic text in the same manner as Latin-based languages is therefore largely a waste of time, and different techniques are needed to enrich the criteria adopted in text analysis. We propose a domain-specific approach that yielded excellent results on several aspects of Arabic text analysis. Several classifiers were built and tested for this purpose. The approach was compared with others that do not use a domain-specific approach, and the paper concludes that the results obtained with the adopted technique are more appealing and promising.
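The sketch below illustrates point (1) only and is not the authors' domain-specific method. It assumes that "writing without diacritics" means dropping the Arabic short-vowel marks (Unicode U+064B to U+0652). Under that assumption, three distinct vowelled words from the root k-t-b collapse into the same undiacritized surface form. The helper names `strip_diacritics` and `group_by_surface_form` are hypothetical and used only for illustration.

```python
import re
from collections import defaultdict

# Assumption: the diacritics of interest are the tashkeel marks U+064B..U+0652
# (fathatan, dammatan, kasratan, fatha, damma, kasra, shadda, sukun).
DIACRITICS = re.compile("[\u064B-\u0652]")


def strip_diacritics(word: str) -> str:
    """Hypothetical helper: return the word as commonly written, without diacritics."""
    return DIACRITICS.sub("", word)


# Three distinct fully vowelled words built on the root k-t-b.
# Escapes are used so the exact code points are unambiguous.
vowelled = {
    "kataba (he wrote)": "\u0643\u064E\u062A\u064E\u0628\u064E",
    "kutiba (it was written)": "\u0643\u064F\u062A\u0650\u0628\u064E",
    "kutub (books)": "\u0643\u064F\u062A\u064F\u0628",
}


def group_by_surface_form(words: dict[str, str]) -> dict[str, list[str]]:
    """Hypothetical helper: group glosses by their undiacritized spelling."""
    groups: dict[str, list[str]] = defaultdict(list)
    for gloss, form in words.items():
        groups[strip_diacritics(form)].append(gloss)
    return dict(groups)


if __name__ == "__main__":
    for surface, glosses in group_by_surface_form(vowelled).items():
        print(surface, "->", glosses)
```

All three glosses end up under a single key, so a system that sees only the undiacritized token cannot tell them apart without surrounding context. This is the ambiguity the abstract describes.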