International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 94 - Number 13
Year of Publication: 2014
Authors: M. Kasthuri, S. Britto Ramesh Kumar
10.5120/16406-6114
M. Kasthuri, S. Britto Ramesh Kumar. An Improved Rule based Iterative Affix Stripping Stemmer for Tamil Language using K-Mean Clustering. International Journal of Computer Applications. 94, 13 (May 2014), 36-41. DOI=10.5120/16406-6114
Stemming is an important step in many Information Retrieval (IR) and Natural Language Processing (NLP) tasks. It is usually done by removing attached suffixes and prefixes (affixes) from index terms before the terms are assigned to the index. Stemming is a pre-processing step in text mining applications and a basic requirement in areas such as computational linguistics and information retrieval, where it is used to improve recall. This paper proposes an improved rule-based iterative affix stripping algorithm that produces stemmed Tamil words in fewer computational steps. Further, the K-Means clustering algorithm is used to cluster the stemmed Tamil words in order to improve the performance of Tamil-language information retrieval and extraction. The experimental analysis shows that words stemmed after clustering give better results than words stemmed before clustering.
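As a rough illustration of what iterative affix stripping means, the following minimal Python sketch removes one suffix at a time until no rule applies. It is not the authors' algorithm: the abstract does not list the rule set, so the suffixes below are hypothetical romanized stand-ins (a plural marker and two case markers), the minimum stem length is an assumed safeguard, and prefixes and Tamil sandhi changes are not handled.

```python
# Hypothetical romanized suffix rules, for illustration only.
# The paper's actual Tamil rule set is not given in the abstract.
SUFFIXES = ("kal", "il", "ai")

# Assumed safeguard: never strip a word down to fewer than this many characters.
MIN_STEM_LEN = 3


def strip_once(word, suffixes=SUFFIXES, min_stem_len=MIN_STEM_LEN):
    """Remove the longest matching suffix, or return the word unchanged."""
    for suffix in sorted(suffixes, key=len, reverse=True):
        remaining = len(word) - len(suffix)
        if word.endswith(suffix) and remaining >= min_stem_len:
            return word[:remaining]
    return word


def stem(word, suffixes=SUFFIXES, min_stem_len=MIN_STEM_LEN):
    """Strip suffixes repeatedly until no rule applies."""
    while True:
        shorter = strip_once(word, suffixes, min_stem_len)
        if shorter == word:
            return word
        word = shorter


if __name__ == "__main__":
    for w in ("veedukalil", "veedukalai", "veedu", "kal"):
        print(w, "->", stem(w))
```

With these assumed rules, a doubly inflected form such as "veedukalil" loses "il" on the first pass and "kal" on the second, while the short word "kal" is left alone because stripping it would leave nothing. The stems produced this way would then be the input to a clustering step such as K-Means.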