Research Article

Analysis of Random Forest and Naive Bayes for Spam Mail using Feature Selection Categorization

by Rachana Mishra, R. S. Thakur
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 80 - Number 3
Year of Publication: 2013

Today, as the number of internet users increases, spam mail has become a major problem and a significant challenge for researchers trying to reduce it. Spam is commonly defined as unsolicited email messages, and the goal of spam categorization is to distinguish between spam and legitimate email messages. This paper presents a classification of spam mail and addresses various related problems in web space. Many machine learning algorithms are used to classify spam and legitimate mail; this paper identifies the best classification approach using a benchmark dataset. The dataset consists of 9324 records and 500 attributes, used for training and testing to build the model. This work can play a significant role in helping to eliminate unsolicited commercial e-mail, viruses, Trojans, and worms, as well as electronically perpetrated fraud and other undesired and troublesome e-mail. Three supervised machine learning algorithms, namely Naive Bayes, Random Tree, and Random Forest, have been applied to the spam mail dataset using two feature selection algorithms.
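The pipeline described in the abstract (select a subset of attributes, then train a supervised classifier to separate spam from legitimate mail) can be illustrated with a minimal, self-contained sketch. It makes several assumptions that are not from the paper: the abstract does not name its two feature selection algorithms, so information-gain ranking is used here as one common filter-style stand-in; Naive Bayes is implemented as a Bernoulli model over binary (present/absent) attributes with add-one smoothing; and the four word attributes and six messages are invented toy data, not the 9324-record benchmark. Random Tree and Random Forest are not shown, and this is not the authors' Weka setup.

```python
import math
from collections import Counter

# ASSUMPTION: information-gain ranking stands in for the paper's (unnamed)
# feature selection algorithms; Bernoulli Naive Bayes is one NB variant.

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(rows, labels, j):
    """Information gain of binary attribute j with respect to the class label."""
    base = entropy(labels)
    n = len(labels)
    remainder = 0.0
    for v in (0, 1):
        subset = [y for x, y in zip(rows, labels) if x[j] == v]
        if subset:
            remainder += len(subset) / n * entropy(subset)
    return base - remainder

def select_top_k(rows, labels, k):
    """Indices (in ascending order) of the k attributes with highest information gain."""
    n_attrs = len(rows[0])
    ranked = sorted(range(n_attrs), key=lambda j: info_gain(rows, labels, j), reverse=True)
    return sorted(ranked[:k])

class BernoulliNaiveBayes:
    """Naive Bayes over binary attributes with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        n = len(labels)
        n_attrs = len(rows[0])
        self.log_prior = {}
        self.p_present = {}
        for c in self.classes:
            members = [x for x, y in zip(rows, labels) if y == c]
            self.log_prior[c] = math.log(len(members) / n)
            # P(attribute j = 1 | class c), smoothed so no estimate is exactly 0 or 1
            self.p_present[c] = [
                (sum(x[j] for x in members) + 1) / (len(members) + 2)
                for j in range(n_attrs)
            ]
        return self

    def predict(self, x):
        """Return the class with the highest log-posterior score for attribute vector x."""
        best_class, best_score = None, -math.inf
        for c in self.classes:
            score = self.log_prior[c]
            for xj, p in zip(x, self.p_present[c]):
                score += math.log(p if xj == 1 else 1.0 - p)
            if score > best_score:
                best_class, best_score = c, score
        return best_class

# Hypothetical toy data. Columns: "free", "winner", "meeting", "report"
# (1 = word present in the message). Invented for illustration only.
rows = [
    (1, 1, 0, 0),
    (1, 0, 0, 0),
    (0, 1, 0, 1),
    (0, 0, 1, 1),
    (0, 0, 1, 0),
    (1, 0, 1, 1),
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Step 1: keep the 2 most informative attributes.
keep = select_top_k(rows, labels, 2)
reduced = [tuple(x[j] for j in keep) for x in rows]

# Step 2: train the classifier on the reduced attribute set.
model = BernoulliNaiveBayes().fit(reduced, labels)

new_message = (1, 1, 0, 0)
print(keep, model.predict(tuple(new_message[j] for j in keep)))
```

On this toy data, the "meeting" attribute perfectly separates the classes and "winner" is the next most informative, so the selected indices are `[1, 2]` and the new message is classified as spam. Working in log-probabilities avoids numerical underflow when the attribute count grows toward the hundreds, as with the 500 attributes mentioned above.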

Index Terms

Computer Science
Information Sciences


Keywords

spam problem
spam classification
Weka