Research Article

Email classification for Spam Detection using Word Stemming

by D.Karthika Renuka, T.Hamsapriya
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 1 - Number 5
Year of Publication: 2010
Authors: D.Karthika Renuka, T.Hamsapriya
10.5120/125-241

D.Karthika Renuka, T.Hamsapriya. Email classification for Spam Detection using Word Stemming. International Journal of Computer Applications. 1, 5 (February 2010), 45-47. DOI=10.5120/125-241

@article{ 10.5120/125-241,
author = { D.Karthika Renuka and T.Hamsapriya },
title = { Email classification for Spam Detection using Word Stemming },
journal = { International Journal of Computer Applications },
issue_date = { February 2010 },
volume = { 1 },
number = { 5 },
month = { February },
year = { 2010 },
issn = { 0975-8887 },
pages = { 45-47 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume1/number5/125-241/ },
doi = { 10.5120/125-241 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T19:44:23.020725+05:30
%A D.Karthika Renuka
%A T.Hamsapriya
%T Email classification for Spam Detection using Word Stemming
%J International Journal of Computer Applications
%@ 0975-8887
%V 1
%N 5
%P 45-47
%D 2010
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Unsolicited emails, known as spam, are one of the fastest-growing and costliest problems associated with the Internet today. Among the many proposed solutions, Bayesian filtering is considered the most effective weapon against spam. Bayesian filtering works by evaluating the probability of different words appearing in legitimate and spam mails and then classifying mails based on those probabilities. Most current spam email detection systems use keywords to detect spam emails. These keywords can be written as misspellings, for example baank or bannk instead of bank. Misspellings change from time to time, so a spam email detection system needs to update its blacklist constantly to detect spam emails containing misspellings. It is impossible to predict all possible misspellings of a given keyword and add them to the blacklist. In this paper, a better and more successful approach for improving email content classification for spam control is proposed. It uses the Word Stemming or Word Hashing technique to improve the efficiency of the content-based spam filter. The proposed system extracts the base, or stem, of a misspelled or modified word to detect spam emails. It takes each misspelled keyword, applies a word stemming technique, and passes the base word to the content-based filter. Using a proposed if-then rule, we can decide whether or not an unknown mail is spam [1]. This paper also provides an email archiving solution which classifies email relating to a person, family, corporation, association, community, or nation.
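
The pipeline described in the abstract (reduce a misspelled keyword to its base word, pass the base word to a Bayesian content filter, then apply an if-then rule) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `normalize` function is a hypothetical stand-in for the paper's word stemming step that merely collapses repeated letters, the `BayesFilter` class is a plain naive Bayes classifier with Laplace smoothing, and the training messages and the 0.5 threshold are made up for illustration.

```python
import math
import re
from collections import Counter


def normalize(word):
    """Hypothetical stand-in for word stemming: lowercase the word,
    drop non-letters, and collapse runs of a repeated letter."""
    letters = re.sub(r"[^a-z]", "", word.lower())
    return re.sub(r"(.)\1+", r"\1", letters)


def tokens(text):
    """Split a message on whitespace and normalize each word."""
    return [t for t in (normalize(w) for w in text.split()) if t]


class BayesFilter:
    """Naive Bayes over normalized words (an assumption, not the paper's exact filter)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.counts[label].update(tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text):
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_docs = sum(self.docs.values())
        log_score = {}
        for label in ("spam", "ham"):
            total = sum(self.counts[label].values())
            score = math.log(self.docs[label] / total_docs)  # class prior
            for tok in tokens(text):
                # Laplace smoothing keeps unseen words from zeroing the product.
                score += math.log((self.counts[label][tok] + 1) / (total + len(vocab)))
            log_score[label] = score
        # Turn the two log scores into P(spam | message).
        return 1.0 / (1.0 + math.exp(log_score["ham"] - log_score["spam"]))

    def is_spam(self, text, threshold=0.5):
        # If-then rule: if P(spam | message) exceeds the threshold, flag as spam.
        return self.spam_probability(text) > threshold


# Made-up training data, for illustration only.
spam_filter = BayesFilter()
spam_filter.train("verify your bank account now", "spam")
spam_filter.train("win free money from our bank", "spam")
spam_filter.train("meeting notes for the project", "ham")
spam_filter.train("lunch with the project team tomorrow", "ham")

print(normalize("baank"), normalize("bannk"))
print(spam_filter.is_spam("verify your bannk acount"))
print(spam_filter.is_spam("project meeting tomorrow"))
```

Because the same normalization is applied during training and classification, the misspellings baank and bannk map to the same token as bank, so no blacklist entry is needed for each variant. Collapsing repeated letters also alters correctly spelled words (free becomes fre), which is harmless here only because every word passes through the same function; a real stemmer would be more selective.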

References
  1. Leonard and Hsu, 2001. Bayesian methods: an analysis for statisticians and interdisciplinary researchers. Cambridge University Press, Cambridge.
  2. Bernardo and Smith, 1994. Bayesian theory, John Wiley and Sons, Chichester.
  3. Clayton, R. (2004). Stopping spam by extrusion detection. Proceedings of the First Conference on Email and Anti-Spam (CEAS).
  4. Orwant J. et al. Mastering Algorithms with Perl. O’Reilly and Associates, ISBN: 1-56592-398-7, 1999.
  5. Amavisd-new Home Page, http://www.ijs.si/software/amavisd, Accessed 01 July 2004.
  6. Sendmail Home Page, http://www.sendmail.org, Accessed 01 July 2004.
  7. SpamAssassin Home Page, http://www.spamassassin.org, Accessed 01 July 2004.
  8. Procmail Home Page, http://www.procmail.org, Accessed 03 Mar 2004.
  9. Graham, P. Better Bayesian Filtering. In Proceedings of Spam Conference, 2003.
  10. http://www.Blog Spam Database.com
  11. http://www.Email Spam Filter Word List.com
  12. http://www.ceas.cc/papers-2004/172.pdf.
  13. Fallows, D. Internet Users and Spam: What the attitudes and behavior of Internet users can tell us about fighting spam. Pew Internet & American Life Project, Washington, DC, 20036 USA.
Index Terms

Computer Science
Information Sciences

Keywords

Spam Filters, Bayesian, content based spam filter, Word Stemming, Email, Email archiving