Research Article

An Anti-Spam System using Naive Bayes Method and Feature Selection Methods

by Masoome Esmaeili, Arezoo Arjomandzadeh, Reza Shams, Morteza Zahedi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 165 - Number 4
Year of Publication: 2017
Authors: Masoome Esmaeili, Arezoo Arjomandzadeh, Reza Shams, Morteza Zahedi
DOI: 10.5120/ijca2017913842

Masoome Esmaeili, Arezoo Arjomandzadeh, Reza Shams, Morteza Zahedi. An Anti-Spam System using Naive Bayes Method and Feature Selection Methods. International Journal of Computer Applications. 165, 4 (May 2017), 1-5. DOI=10.5120/ijca2017913842

@article{ 10.5120/ijca2017913842,
author = { Masoome Esmaeili and Arezoo Arjomandzadeh and Reza Shams and Morteza Zahedi },
title = { An Anti-Spam System using Naive Bayes Method and Feature Selection Methods },
journal = { International Journal of Computer Applications },
issue_date = { May 2017 },
volume = { 165 },
number = { 4 },
month = { May },
year = { 2017 },
issn = { 0975-8887 },
pages = { 1-5 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume165/number4/27558-2017913842/ },
doi = { 10.5120/ijca2017913842 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:11:28.754235+05:30
%A Masoome Esmaeili
%A Arezoo Arjomandzadeh
%A Reza Shams
%A Morteza Zahedi
%T An Anti-Spam System using Naive Bayes Method and Feature Selection Methods
%J International Journal of Computer Applications
%@ 0975-8887
%V 165
%N 4
%P 1-5
%D 2017
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Electronic mail is one of the important means of communication. As a result, this useful tool has been abused by attackers for various purposes. One such abuse is the sending of useless, unwanted e-mails known as spam or junk e-mails. Several methods of spam detection exist, but each has certain weaknesses. This paper addresses these weaknesses by implementing and describing a spam detection system that operates as a text classifier and uses the Bayesian method vs. PCA to filter written spam out of the user’s mailbox. The proposed method first extracts all tokens that appear in the body of the emails so that the emails can be classified based on them. However, some of these tokens are not useful: they occur equally often in both categories, spam and non-spam, so they are not suitable for distinguishing the two types of email. The proposed method therefore selects the best tokens as the main features using feature selection methods such as a genetic algorithm (GA) and forward and backward feature selection.
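
The token-based pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a Bernoulli Naive Bayes model over token presence with Laplace smoothing, and greedy forward selection scored by accuracy on a held-out validation set. The genetic algorithm and backward selection are not shown. All function names (`tokenize`, `train`, `predict`, `accuracy`, `forward_select`) are hypothetical and chosen only for this illustration.

```python
import math
from collections import Counter


def tokenize(body):
    """Split an email body into a set of lowercase tokens (presence only)."""
    return set(body.lower().split())


def train(emails, labels, features):
    """Fit a Bernoulli Naive Bayes model restricted to the given tokens."""
    class_sizes = Counter(labels)
    token_counts = {c: Counter() for c in class_sizes}
    for tokens, label in zip(emails, labels):
        token_counts[label].update(tokens & features)
    model = {}
    for c, size in class_sizes.items():
        log_prior = math.log(size / len(labels))
        # Laplace smoothing keeps every probability strictly inside (0, 1).
        probs = {t: (token_counts[c][t] + 1) / (size + 2) for t in features}
        model[c] = (log_prior, probs)
    return model


def predict(model, tokens):
    """Return the class with the highest log-posterior score."""
    best_label, best_score = None, -math.inf
    for label, (log_prior, probs) in model.items():
        score = log_prior
        for t, p in probs.items():
            score += math.log(p if t in tokens else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, emails, labels):
    """Fraction of emails whose predicted class matches the true label."""
    hits = sum(predict(model, e) == y for e, y in zip(emails, labels))
    return hits / len(labels)


def forward_select(train_x, train_y, val_x, val_y, max_features):
    """Greedily add the token that most improves validation accuracy."""
    candidates = set().union(*train_x)
    selected = set()
    best_acc = accuracy(train(train_x, train_y, selected), val_x, val_y)
    while len(selected) < max_features:
        best_token = None
        for token in sorted(candidates - selected):
            model = train(train_x, train_y, selected | {token})
            acc = accuracy(model, val_x, val_y)
            if acc > best_acc:
                best_acc, best_token = acc, token
        if best_token is None:  # no remaining token improves accuracy
            break
        selected.add(best_token)
    return selected, best_acc
```

Tokens that appear equally often in spam and non-spam mail receive nearly identical probabilities in both classes, so adding them does not improve validation accuracy and the greedy loop leaves them out. This is the filtering effect the abstract attributes to feature selection.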

Index Terms

Computer Science
Information Sciences

Keywords

Spam, Electronic Emails, Genetic Algorithms, Text Classification, Forward backward feature selection, Naïve Bayesian