Research Article

Influence of stemming on Clustering of Arabic texts: Comparative Study in Document Retrieval

by Abdessalem Kelaiaia, Hayet Farida Merouani
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 63 - Number 14
Year of Publication: 2013
DOI: 10.5120/10536-5529

Abdessalem Kelaiaia, Hayet Farida Merouani. Influence of stemming on Clustering of Arabic texts: Comparative Study in Document Retrieval. International Journal of Computer Applications. 63, 14 (February 2013), 36-41. DOI=10.5120/10536-5529

@article{ 10.5120/10536-5529,
author = { Abdessalem Kelaiaia and Hayet Farida Merouani },
title = { Influence of stemming on Clustering of Arabic texts: Comparative Study in Document Retrieval },
journal = { International Journal of Computer Applications },
issue_date = { February 2013 },
volume = { 63 },
number = { 14 },
month = { February },
year = { 2013 },
issn = { 0975-8887 },
pages = { 36-41 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume63/number14/10536-5529/ },
doi = { 10.5120/10536-5529 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:14:22.192399+05:30
%A Abdessalem Kelaiaia
%A Hayet Farida Merouani
%T Influence of stemming on Clustering of Arabic texts: Comparative Study in Document Retrieval
%J International Journal of Computer Applications
%@ 0975-8887
%V 63
%N 14
%P 36-41
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper first studies the influence of stemming on the quality of Arabic text clustering, and then tests an approach that uses this clustering to improve Document Retrieval (DR). A classical local document retrieval system generally employs statistical methods to calculate the similarity between the submitted query and each document in the target collection, and finally provides an ordered list of documents (hit list). In the present approach, the collection is first submitted to a clustering process; the list of returned documents is then built from the formed clusters, based on the cluster representative that is nearest to the user's query. The choice of the Arabic language is motivated by its very particular morpho-syntactic characteristics.
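
The cluster-based retrieval idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the term-frequency vectors, the cosine similarity, the simple k-means-style clustering seeded with the first k documents, the English toy documents, and all function names (`vectorize`, `cluster`, `retrieve`) are assumptions made for illustration. The `stem` parameter is a hypothetical hook where an Arabic stemmer could be plugged in; by default it leaves words unchanged.

```python
import math
from collections import Counter

def vectorize(text, stem=lambda w: w):
    """Term-frequency vector of a whitespace-tokenized text.
    `stem` is a hypothetical hook for a stemmer (identity by default)."""
    return Counter(stem(w) for w in text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def cluster(vectors, k, iterations=10):
    """Assumed clustering step: k-means-style loop using cosine similarity.
    Seeds are simply the first k documents, so the result is deterministic."""
    centroids = [Counter(v) for v in vectors[:k]]
    assignment = []
    for _ in range(iterations):
        # Assign each document to its most similar cluster representative.
        assignment = [max(range(k), key=lambda c: cosine(v, centroids[c]))
                      for v in vectors]
        # Recompute each representative as the sum of its members' vectors
        # (scaling does not matter for cosine similarity).
        for c in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:
                total = Counter()
                for v in members:
                    total.update(v)
                centroids[c] = total
    return assignment, centroids

def retrieve(query, docs, k=2, stem=lambda w: w):
    """Return indices of documents from the cluster whose representative
    is nearest to the query, ranked by similarity to the query."""
    vectors = [vectorize(d, stem) for d in docs]
    assignment, centroids = cluster(vectors, k)
    q = vectorize(query, stem)
    best = max(range(k), key=lambda c: cosine(q, centroids[c]))
    hits = [i for i, a in enumerate(assignment) if a == best]
    return sorted(hits, key=lambda i: cosine(q, vectors[i]), reverse=True)

# Toy collection (illustrative English stand-ins, not data from the paper).
docs = [
    "football match goals team",
    "bank interest rates loan",
    "team wins football cup",
    "loan rates bank market",
]

print(retrieve("football cup", docs))  # [2, 0]
```

In this sketch the query is compared only with the k cluster representatives first, and the hit list is then drawn from the selected cluster, rather than from a similarity ranking over the whole collection as in the classical setting.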

Index Terms

Computer Science
Information Sciences

Keywords

Text, Arabic language, stemming, preprocessing, clustering, local document retrieval