Research Article

Designing Spam Model- Classification Analysis using Decision Trees

by Shweta Rajput, Amit Arora
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 75 - Number 10
Year of Publication: 2013
Authors: Shweta Rajput, Amit Arora
10.5120/13145-0549

Shweta Rajput, Amit Arora. Designing Spam Model- Classification Analysis using Decision Trees. International Journal of Computer Applications. 75, 10 (August 2013), 6-12. DOI=10.5120/13145-0549

@article{ 10.5120/13145-0549,
author = { Shweta Rajput and Amit Arora },
title = { Designing Spam Model- Classification Analysis using Decision Trees },
journal = { International Journal of Computer Applications },
issue_date = { August 2013 },
volume = { 75 },
number = { 10 },
month = { August },
year = { 2013 },
issn = { 0975-8887 },
pages = { 6-12 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume75/number10/13145-0549/ },
doi = { 10.5120/13145-0549 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:43:54.033557+05:30
%A Shweta Rajput
%A Amit Arora
%T Designing Spam Model- Classification Analysis using Decision Trees
%J International Journal of Computer Applications
%@ 0975-8887
%V 75
%N 10
%P 6-12
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Spam has diluted the pool of legitimate messages and frustrates users, which makes automatic processing of email necessary. This study constructs a spam model using classification techniques from data mining. To accomplish this, experiments were conducted on a spam dataset downloaded from the UCI Machine Learning Repository and classified using WEKA, a popular data mining tool. The final classification result is '1' if a message is spam and '0' otherwise. Email is a popular mode of communication, and its user base is growing day by day. However, owing to social networks and electronic business, most email is now unsolicited bulk e-mail, known as spam. Several solutions have been proposed to overcome the spam problem; filtering with decision tree classifiers is one of the most significant techniques. Three decision tree classifiers, J48, J48graft and Simple CART, were used to classify spam messages in e-mail. The trees are induced first, and subtrees are then pruned, which reduces the size and complexity of the tree and improves the predictive accuracy of the final classifier. Grafting is then applied as a post-process to the inferred decision tree. Results showed that J48graft achieved good prediction accuracy compared with the Simple CART and J48 algorithms.
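As a rough illustration of the tree-induction step mentioned in the abstract, the sketch below scores a binary split on one numeric feature by gain ratio, the criterion used by C4.5 (which WEKA's J48 implements). It is a minimal sketch and not the authors' WEKA workflow: the function names, the toy feature values and the labels are all hypothetical, and candidate thresholds are simply taken from the observed values.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split value <= threshold vs value > threshold."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that leaves one side empty carries no information
    total = len(labels)
    weights = [len(left) / total, len(right) / total]
    # Information gain: parent entropy minus weighted child entropy.
    remainder = weights[0] * entropy(left) + weights[1] * entropy(right)
    info_gain = entropy(labels) - remainder
    # Split information penalizes very unbalanced partitions.
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info


def best_threshold(values, labels):
    """Return the observed value that maximizes gain ratio, or None."""
    candidates = sorted(set(values))[:-1]  # largest value cannot split the data
    if not candidates:
        return None
    return max(candidates, key=lambda t: gain_ratio(values, labels, t))


# Hypothetical toy data: frequency of some word per message, 1 = spam, 0 = not spam.
word_freq = [0.0, 0.1, 0.0, 0.9, 1.2, 0.8, 0.05, 1.5]
is_spam = [0, 0, 0, 1, 1, 1, 0, 1]

threshold = best_threshold(word_freq, is_spam)
print(threshold, gain_ratio(word_freq, is_spam, threshold))
```

A full learner would apply this selection recursively over all attributes and then prune the resulting tree; those steps are omitted here.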

Index Terms

Computer Science
Information Sciences

Keywords

Weka, Simple CART, J48, J48graft, Spam filtration, Post pruning, Pre pruning, Classification, Grafting