Research Article

Performance of Naive Bayes Classifier – Multinomial Model on Different Categories of Documents

Published on February 2012 by Hetal Doshi, Maruti Zalte
National Conference on Emerging Trends in Computer Science and Information Technology
Foundation of Computer Science USA
NCETCSIT - Number 1
February 2012

Hetal Doshi, Maruti Zalte. Performance of Naive Bayes Classifier – Multinomial Model on Different Categories of Documents. National Conference on Emerging Trends in Computer Science and Information Technology. NCETCSIT, 1 (February 2012), 10-13.

@article{
author = { Hetal Doshi and Maruti Zalte },
title = { Performance of Naive Bayes Classifier – Multinomial Model on Different Categories of Documents },
journal = { National Conference on Emerging Trends in Computer Science and Information Technology },
issue_date = { February 2012 },
volume = { NCETCSIT },
number = { 1 },
month = { February },
year = { 2012 },
issn = { 0975-8887 },
pages = { 10-13 },
numpages = 4,
url = { /proceedings/ncetcsit/number1/4753-t003/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Proceeding Article
%1 National Conference on Emerging Trends in Computer Science and Information Technology
%A Hetal Doshi
%A Maruti Zalte
%T Performance of Naive Bayes Classifier – Multinomial Model on Different Categories of Documents
%J National Conference on Emerging Trends in Computer Science and Information Technology
%@ 0975-8887
%V NCETCSIT
%N 1
%P 10-13
%D 2012
%I International Journal of Computer Applications
Abstract

Automatic sorting of documents is becoming increasingly important because manually handling and organizing large numbers of documents is too time-consuming to be feasible. This paper explores text classification, a machine learning application used for document classification, and discusses a generative learning algorithm, the Naïve Bayes classifier. Documents from the 20 Newsgroups dataset are divided into two groups: Group 1 consists of two relatively unrelated categories of documents, and Group 2 consists of two relatively similar categories. The multinomial model of the Naïve Bayes classifier is implemented to classify both groups. It is observed that accuracy can be improved by increasing the training set size for both groups, and that classification accuracy is higher for the group whose categories are less similar.
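
As an illustration of the multinomial model named in the abstract, the following is a minimal sketch in Python, not the authors' implementation. It assumes add-one (Laplace) smoothing and whitespace tokenization, neither of which is stated on this page. The function names, the `alpha` parameter, the category labels "space" and "autos", and the four toy sentences are all hypothetical; the paper's actual 20 Newsgroups categories are not listed here.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, labels, alpha=1.0):
    """Estimate class log priors and smoothed per-word log likelihoods.

    docs   -- list of token lists, one per training document
    labels -- list of class labels, same length as docs
    alpha  -- additive smoothing constant (assumed; 1.0 gives Laplace smoothing)
    """
    vocab = sorted({word for tokens in docs for word in tokens})
    docs_per_class = Counter(labels)
    word_counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)

    model = {"log_prior": {}, "log_lik": {}}
    for c, n_docs in docs_per_class.items():
        model["log_prior"][c] = math.log(n_docs / len(docs))
        # Denominator: all word occurrences in class c plus smoothing mass.
        total = sum(word_counts[c].values()) + alpha * len(vocab)
        model["log_lik"][c] = {
            w: math.log((word_counts[c][w] + alpha) / total) for w in vocab
        }
    return model


def predict(model, tokens):
    """Return the class with the highest posterior log score.

    Each token occurrence contributes once, so repeated words count
    multiple times (the multinomial event model). Words never seen in
    training are skipped.
    """
    scores = {}
    for c, log_prior in model["log_prior"].items():
        log_lik = model["log_lik"][c]
        scores[c] = log_prior + sum(log_lik[w] for w in tokens if w in log_lik)
    return max(scores, key=scores.get)


# Hypothetical toy corpus with two dissimilar categories.
train_docs = [
    "the rocket reached orbit".split(),
    "nasa launched a satellite into orbit".split(),
    "the car engine needs new oil".split(),
    "my car has a flat tire".split(),
]
train_labels = ["space", "space", "autos", "autos"]

model = train_multinomial_nb(train_docs, train_labels)
```

Calling `predict(model, "satellite orbit launch".split())` on this toy model selects "space". Working in log space avoids numerical underflow when many small word probabilities are multiplied together for a long document.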

References
  1. Karl-Michael Schneider, “Techniques for Improving the Performance of Naïve Bayes for Text Classification”, University of Passau, Department of General Linguistics, Innstr. 40, 94032 Passau, Germany
  2. Kevin P. Murphy, “Naïve Bayes classifier”, Department of Computer Science, University of British Columbia
  3. Andrew McCallum and Kamal Nigam, “A Comparison of Event Models for Naïve Bayes Text Classification”, in Learning for Text Categorization: Papers from the AAAI Workshop, AAAI Press (1998), 41-48, Technical Report WS-98-05
  4. Steve Renals, “Text Classification using Naïve Bayes”, Learning and Data lecture 7, Informatics 2B, http://www.inf.ed.ac.uk/teaching/courses/inf2b/learnnotes/inf2b11-learnlec07-nup.pdf
  5. S. A. Noah and F. Ismail, “Automatic Classification of Malay Proverbs using Naïve Bayesian Algorithm”, in Information Technology Journal 7 (7): 1016-1022, 2002, ISSN 1812-5638
  6. Carl Liu, “Experiments on Spam Detection with Boosting, SVM and Naïve Bayes”, CMPS 242, Final project, Winter 2008, UCSC
  7. “Generative learning algorithm”, lecture notes 2 for CS229, Department of Computer Science, Stanford University. Available online: http://www.stanford.edu/class/cs229/notes/cs229-notes2.pdf
  8. George Tzanis, Ioannis Katakis, Ioannis Partalas, Ioannis Vlahavas, “Modern Applications of Machine Learning”, in Proceedings of the 1st Annual SEERC Doctoral Student Conference – DSC 2006
Index Terms

Computer Science
Information Sciences

Keywords

Naive Bayes Multinomial Model