Research Article

A Comparative Analysis of Various Classifications in Vector Space Model with Absolute Pruning

by Nandni Patel, Santosh Vishwakarma
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 172 - Number 8
Year of Publication: 2017
Authors: Nandni Patel, Santosh Vishwakarma
10.5120/ijca2017915199

Nandni Patel, Santosh Vishwakarma. A Comparative Analysis of Various Classifications in Vector Space Model with Absolute Pruning. International Journal of Computer Applications. 172, 8 (Aug 2017), 34-38. DOI=10.5120/ijca2017915199

@article{ 10.5120/ijca2017915199,
author = { Nandni Patel and Santosh Vishwakarma },
title = { A Comparative Analysis of Various Classifications in Vector Space Model with Absolute Pruning },
journal = { International Journal of Computer Applications },
issue_date = { Aug 2017 },
volume = { 172 },
number = { 8 },
month = { Aug },
year = { 2017 },
issn = { 0975-8887 },
pages = { 34-38 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume172/number8/28274-2017915199/ },
doi = { 10.5120/ijca2017915199 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:19:49.903812+05:30
%A Nandni Patel
%A Santosh Vishwakarma
%T A Comparative Analysis of Various Classifications in Vector Space Model with Absolute Pruning
%J International Journal of Computer Applications
%@ 0975-8887
%V 172
%N 8
%P 34-38
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Text classification is an important problem in text mining: assigning a category label to a document whose label is unknown. In this work, various classification models are evaluated after pre-processing the text dataset. The pre-processing steps are tokenization, stop word removal, and stemming, after which different term weighting schemes are applied. Various pruning techniques were also applied to get the maximum count of terms. Based on this analysis, we conclude that the Naïve Bayes method gives the highest accuracy compared with the other state-of-the-art text classifiers.
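
The pipeline named in the abstract (tokenization, stop word removal, stemming, pruning, term weighting) can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption rather than the authors' setup: the stop-word list, the crude suffix-stripping `stem` function (a stand-in for a real stemmer), the toy corpus, the reading of "absolute pruning" as keeping only terms whose document count lies between two fixed thresholds, and TF-IDF as the term weighting scheme.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "are", "the", "is", "of", "in", "to", "on"}


def stem(token):
    """Crude suffix stripping; a stand-in for a real stemmer, not Porter's algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize on letters, drop stop words, then stem each remaining token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


def absolute_prune(docs, min_docs, max_docs):
    """Keep terms whose document frequency lies in [min_docs, max_docs] (assumed meaning)."""
    df = Counter(term for doc in docs for term in set(doc))
    return sorted(t for t, n in df.items() if min_docs <= n <= max_docs)


def tfidf_vectors(docs, vocabulary):
    """Build one TF-IDF vector per document over the pruned vocabulary."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocabulary])
    return vectors


# Toy corpus, invented for this example.
corpus = [
    "The cats are chasing the mice",
    "Cats and dogs",
    "Dogs chased a ball in the park",
]
docs = [preprocess(text) for text in corpus]
vocabulary = absolute_prune(docs, min_docs=2, max_docs=2)
vectors = tfidf_vectors(docs, vocabulary)
```

The resulting `vectors` are the vector space representation that a classifier such as Naïve Bayes would consume. The thresholds `min_docs` and `max_docs` are illustrative and would need tuning on a real dataset.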

Index Terms

Computer Science
Information Sciences

Keywords

Text Classification Models, Pruning Methods, Vector Space Model, Absolute Pruning