Research Article

DACS Dewey index-based Arabic Document Categorization System

by A. F. Alajmi, E. M Saad, M H Awadalla
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 47 - Number 23
Year of Publication: 2012
10.5120/7500-0634

A. F. Alajmi, E. M Saad, M H Awadalla. DACS Dewey index-based Arabic Document Categorization System. International Journal of Computer Applications. 47, 23 (June 2012), 50-57. DOI=10.5120/7500-0634

@article{ 10.5120/7500-0634,
author = { A. F. Alajmi and E. M Saad and M H Awadalla },
title = { DACS Dewey index-based Arabic Document Categorization System },
journal = { International Journal of Computer Applications },
issue_date = { June 2012 },
volume = { 47 },
number = { 23 },
month = { June },
year = { 2012 },
issn = { 0975-8887 },
pages = { 50-57 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume47/number23/7500-0634/ },
doi = { 10.5120/7500-0634 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:42:40.092600+05:30
%A A. F. Alajmi
%A E. M Saad
%A M H Awadalla
%T DACS Dewey index-based Arabic Document Categorization System
%J International Journal of Computer Applications
%@ 0975-8887
%V 47
%N 23
%P 50-57
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper is devoted to the development of an Arabic text categorization system. First, a stop-word list is generated using a statistical approach that captures the inflection of different Arabic words. Second, a feature representation model based on a Hidden Markov Model is developed to extract roots and morphological weights. Third, a semantic synonym-merging technique is presented for feature reduction. Finally, a Dewey-index-based back-propagation artificial neural network is developed for Arabic document categorization. The system was compared with other classifiers, and the results reveal a promising architecture.
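
The abstract does not say which statistic drives the stop-word list or how synonyms are merged, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes that stop words are tokens appearing in a large share of documents (document frequency) and that synonym merging maps each token to a canonical form through a lookup table. The function names, the `min_doc_ratio` threshold, and the transliterated toy tokens are all hypothetical.

```python
from collections import Counter

def statistical_stop_words(docs, min_doc_ratio=0.8):
    """Return tokens that occur in at least min_doc_ratio of the documents.

    Assumption: document frequency stands in for the paper's unspecified statistic.
    """
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each token once per document
    threshold = min_doc_ratio * len(docs)
    return {tok for tok, df in doc_freq.items() if df >= threshold}

def merge_synonyms(tokens, synonym_map):
    """Replace each token with its canonical synonym when one is listed."""
    return [synonym_map.get(tok, tok) for tok in tokens]

def to_features(tokens, stop_words, synonym_map):
    """Drop stop words, merge synonyms, and return term counts."""
    kept = [tok for tok in tokens if tok not in stop_words]
    return Counter(merge_synonyms(kept, synonym_map))

# Hypothetical toy corpus of transliterated Arabic tokens.
docs = [
    ["fi", "kitab", "muallim"],
    ["fi", "madrasa", "mudarris"],
    ["fi", "kitab", "madrasa"],
]
# Hypothetical synonym table: both words mean "teacher".
synonym_map = {"mudarris": "muallim"}

stop_words = statistical_stop_words(docs)
features = to_features(docs[1], stop_words, synonym_map)
print(sorted(stop_words))
print(dict(features))
```

Merging synonyms before counting shrinks the feature space, since two surface forms share one dimension. In a full system the resulting counts would be the input to the classifier stage.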

Index Terms

Computer Science
Information Sciences

Keywords

Arabic Text Processing
Natural Language Processing
Classification
Feature Reduction
Feature Representation
Morphological Analyzer