International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 91 - Number 6
Year of Publication: 2014
Authors: Abhijit Paul, Arindam Dey, Bipul Syam Purkayastha
10.5120/15882-3439
Abhijit Paul, Arindam Dey, Bipul Syam Purkayastha. An Affix Removal Stemmer for Natural Language Text in Nepali. International Journal of Computer Applications. 91, 6 (April 2014), 1-4. DOI=10.5120/15882-3439
Stemming is a prerequisite step in text mining and spell-checking applications, and a basic requirement for Natural Language Processing (NLP) tasks. It is also important in most Information Retrieval (IR) systems. This paper describes an affix-stripping technique for extracting stems from context-free Nepali text that combines lexical lookup with a rule-based approach. It first introduces the different types of lexicon that form the basic unit of the Nepali stemmer, along with a few rules for identifying words in the lexicon. These rules and lexicons are then used to design and implement an extensible stemmer architecture for Nepali text. Finally, the stemmer's performance is evaluated on 1,800 words drawn from different domains: news on economics, health and politics in Nepali, which is written in the Devanagari script. The overall accuracy of the system is 90.48%. In the absence of extensive linguistic resources, this technique performs better than a simple rule-based system.
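To make the idea of combining lexical lookup with rule-based affix stripping more concrete, here is a minimal sketch in Python. It is not the authors' system: the abstract does not list the paper's lexicons or rules. The names `stem`, `SUFFIXES`, `LEXICON` and `MIN_STEM_LEN` are hypothetical. The suffix list is a small illustrative set of common Nepali inflections (the plural marker and a few case postpositions), and the lexicon is a toy root list. The sketch only strips suffixes, and it checks the lexicon before each stripping step so that known roots are never over-stemmed.

```python
# Minimal sketch of a lexicon-lookup + rule-based suffix stripper for Nepali.
# ASSUMPTION: SUFFIXES, LEXICON and MIN_STEM_LEN are hypothetical illustrations,
# not the lexicons or rules defined in the paper.

# A few common Nepali inflectional endings (plural, dative, ablative,
# genitive, ergative, locative). Illustrative only, far from complete.
SUFFIXES = ["हरू", "लाई", "बाट", "को", "ले", "मा"]

# Toy stand-in for a root lexicon.
LEXICON = {"केटा", "घर", "नेपाल", "सीमा"}

# Shortest stem (in Unicode code points) a rule is allowed to leave behind.
MIN_STEM_LEN = 2


def stem(word: str, lexicon: set[str]) -> str:
    """Return the stem of a single Nepali word (sketch)."""
    ordered = sorted(SUFFIXES, key=len, reverse=True)  # longest match first
    current = word
    while True:
        # Lexical lookup: stop as soon as the current form is a known root.
        if current in lexicon:
            return current
        # Rule-based step: strip the longest suffix that leaves a long-enough stem.
        for suffix in ordered:
            if current.endswith(suffix) and len(current) - len(suffix) >= MIN_STEM_LEN:
                current = current[: -len(suffix)]
                break
        else:
            # No rule applies: return whatever remains as the stem.
            return current


if __name__ == "__main__":
    for w in ["केटाहरूलाई", "घरमा", "नेपालको", "सीमा", "सीमामा", "कलम"]:
        print(w, "->", stem(w, LEXICON))
```

Stacked suffixes are removed one at a time, so "केटाहरूलाई" ("to the boys") loses "लाई" and then "हरू" to reach "केटा". The lexicon check is what prevents "सीमा" ("border") from being cut to "सी" by the locative rule for "मा", which illustrates why lookup can help a purely rule-based stripper.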