Research Article

Mining Text for Meaningful Words with Stemming Algorithm

Published in August 2016 by Priti Shende, V. B. Kute
Advanced Computing and Information Technology
Foundation of Computer Science USA
TACIT2016 - Number 1
August 2016
Authors: Priti Shende, V. B. Kute

Priti Shende, V. B. Kute. Mining Text for Meaningful Words with Stemming Algorithm. Advanced Computing and Information Technology. TACIT2016, 1 (August 2016), 13-16.

@article{
author = { Priti Shende and V. B. Kute },
title = { Mining Text for Meaningful Words with Stemming Algorithm },
journal = { Advanced Computing and Information Technology },
issue_date = { August 2016 },
volume = { TACIT2016 },
number = { 1 },
month = { August },
year = { 2016 },
issn = { 0975-8887 },
pages = { 13-16 },
numpages = 4,
url = { /proceedings/tacit2016/number1/25830-it43/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 Advanced Computing and Information Technology
%A Priti Shende
%A V. B. Kute
%T Mining Text for Meaningful Words with Stemming Algorithm
%J Advanced Computing and Information Technology
%@ 0975-8887
%V TACIT2016
%N 1
%P 13-16
%D 2016
%I International Journal of Computer Applications
Abstract

With the explosive growth of information on the Internet, data is easy to obtain. However, raw data becomes useful only when it is mined, which makes mining an important research area. Text mining primarily aims at the discovery and retrieval of useful and interesting patterns from a large database. Identifying and understanding the appropriate words is important for retrieving the appropriate documents. Consulting a dictionary every time to understand the meaning of a word is tedious and time-consuming. This can be avoided by converting the different forms in which a word occurs to its root. The frequency of word occurrences in a file is then used to prioritize documents. This work aims to avoid the generation of incomplete and meaningless words during stemming. We propose a method that compares the different forms of words present in a document up to a certain length: sixty percent of the length of the word is considered for comparison. Words whose letters match over this length are considered different forms of the same root.
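
The comparison outlined in the abstract might be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names (`compare_length`, `same_root`, `group_word_forms`) are hypothetical. The abstract does not say how the sixty percent is rounded, which word's length is used, or which form is reported as the root, so the sketch assumes rounding up, the shorter of the two words, a comparison of leading letters, and the shortest complete word of each group as its root.

```python
from collections import Counter

def compare_length(word, percent=60):
    """Number of leading letters to compare: `percent` of the word length.
    Rounding up is an assumption; the abstract does not specify it."""
    return max(1, -(-len(word) * percent // 100))  # integer ceiling division

def same_root(a, b, percent=60):
    """Treat two words as forms of one root when their leading letters agree
    over 60% of the shorter word (using the shorter word is an assumption)."""
    a, b = a.lower(), b.lower()
    n = compare_length(min(a, b, key=len), percent)
    return a[:n] == b[:n]

def group_word_forms(words, percent=60):
    """Map each root to the total frequency of all its forms.
    The shortest complete word in a group stands in as its root, so every
    reported root is a real word from the text (an assumed design choice)."""
    counts = Counter(w.lower() for w in words)
    groups = {}
    # Visit shorter words first so that they become the group representatives.
    for word in sorted(counts, key=lambda w: (len(w), w)):
        for root in groups:
            if same_root(root, word, percent):
                groups[root] += counts[word]
                break
        else:
            groups[word] = counts[word]
    return groups

words = ["connect", "connected", "connecting", "connection",
         "mining", "mined", "mine", "text"]
print(group_word_forms(words))
```

In this sketch the per-root totals play the role of the word frequencies that the abstract uses to prioritize documents; how those frequencies are turned into a ranking is not described in the abstract and is left out here.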

References
  1. Ms. Anjali Ganesh Jivani, "A comparative study of Stemming algorithms", in Int. J. Comp. Tech. Appl., Vol. 2 (6), pp. 1930-1938
  2. Wahiba Ben Abdessalem Karaa, "A new stemmer to improve information retrieval", in International Journal of Network Security And Its Applications(IJNSA), Vol. 5, No. 4, July 2013
  3. Prasenjit Majumder, Mandar Mitra, Swapan K. Parui, Gobinda Kole, Pabitra Mitra and Kalyankumar Datta, "YASS: Yet Another Suffix Stripper", ACM Transactions on Information Systems, Vol. 25, No. 4, Article 18, October 2007
  4. K. K. Agbele, A. O. Adesina, N. A. Azeez and A. P. Abidoye, "Context-Aware Stemming algorithm for semantically related root words", in African Journal of Computing & ICT, Vol. 5, No. 4, June 2012
  5. Peter Willett, "The Porter stemming algorithm: then and now", in Program: electronic library and information systems, 40(3), pp. 219-223
  6. M. F. Porter, "An algorithm for suffix stripping", originally published in Program, Vol. 14, no. 3, pp. 130-137, July 1980.
  7. Danilo Saft and Volker Nissen, "Analysing full text content by means of flexible co-citation analysis inspired text mining method - exploring 15 years of JASSS articles", Int. J. Business Intelligence and Data Mining, Vol. 9, No. 1, 2014
  8. B. P. Pande, Pawan Tamta, H. S. Dhami, "Generation, Implementation and Appraisal of an N-gram based Stemming Algorithm", in press
  9. William B. Frakes, Christopher J. Fox, "Strength and similarity of affix removal stemming algorithm", in press
Index Terms

Computer Science
Information Sciences

Keywords

Complete Words, Sixty Percent Length, Porter's Stemming Algorithm