International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 90 - Number 8
Year of Publication: 2014
Authors: Kavita Patel
DOI: 10.5120/15595-4341
Kavita Patel. Recognizing Spam Domains by Extracting Features from Spam Emails using Data Mining. International Journal of Computer Applications. 90, 8 (March 2014), 25-30. DOI=10.5120/15595-4341
This paper attempts to develop an algorithm that recognizes spam domains using data mining techniques, with a focus on forensic analysis for law enforcement. Spam filtering has been the main weapon against spam, but it has failed to reduce the number of spam emails sent to indiscriminate sets of recipients. The proposed algorithm takes as input the spam emails of a personal account and extracts stylistic features, semantic features, related email subjects, and the URLs present in the emails. Clustering is then performed on each individual feature, and the resulting clusters are evaluated. These clusters are then mapped to their respective domains. A spam domain is the domain of the URL of the webpage that the spammer is trying to promote, and the WHOIS information for that domain helps identify its source. The result for each individual feature is measured with two parameters: the overall purity, and the number of emails in the cluster with the highest purity. Experimental results show that clustering spam emails by stylistic and semantic features yields clusters 20% less pure than clustering by the other two features (email subjects and URLs).
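The abstract names two concrete steps, pulling the promoted domains out of the URLs in each email and scoring clusters by purity, without defining either. The sketch below illustrates both under stated assumptions. Purity is taken in its common textbook form, $\text{purity} = \frac{1}{N}\sum_k \max_j |\omega_k \cap c_j|$, where $\omega_k$ is a cluster, $c_j$ a reference class, and $N$ the number of emails. The reference label of each email is assumed to be its spam domain. The names `URL_PATTERN`, `extract_domains`, and `cluster_purity_stats` are hypothetical, and the URL pattern is deliberately simple. The cluster assignments could come from any clustering method; the abstract does not say which one the paper uses.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Hypothetical, deliberately simple pattern for http(s) links in an email body.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_domains(email_text: str) -> list[str]:
    """Return the host name of every http(s) URL in the text, minus a leading 'www.'."""
    domains = []
    for url in URL_PATTERN.findall(email_text):
        url = url.rstrip(".,;:!?)")  # drop sentence punctuation glued to the link
        try:
            host = urlparse(url).hostname  # lower-cased by urllib; None if absent
        except ValueError:  # e.g. a malformed IPv6 literal
            continue
        if host:
            domains.append(host.removeprefix("www."))
    return domains


def cluster_purity_stats(cluster_ids, labels):
    """Return (overall purity, number of emails in the purest cluster).

    cluster_ids[i] is the cluster assigned to email i by any clustering method;
    labels[i] is that email's reference label (assumed here: its spam domain).
    Ties in purity are broken in favour of the larger cluster (an assumption).
    """
    if not labels or len(cluster_ids) != len(labels):
        raise ValueError("cluster_ids and labels must be non-empty and equally long")

    # Count how many emails of each label fall into each cluster.
    members = {}
    for cluster, label in zip(cluster_ids, labels):
        members.setdefault(cluster, Counter())[label] += 1

    majority_total = 0
    best = None  # (purity, size) of the purest cluster seen so far
    for counts in members.values():
        size = sum(counts.values())
        majority = max(counts.values())  # emails carrying the cluster's dominant label
        majority_total += majority
        candidate = (majority / size, size)
        if best is None or candidate > best:
            best = candidate

    overall_purity = majority_total / len(labels)
    return overall_purity, best[1]


if __name__ == "__main__":
    print(extract_domains("Buy at http://WWW.Cheap-Pills.example/offer?id=1 now."))
    overall, size = cluster_purity_stats(
        [0, 0, 0, 1, 1, 2],
        ["a.example", "a.example", "b.example", "c.example", "c.example", "a.example"],
    )
    print(round(overall, 3), size)  # 0.833 2
```

In the usage example, cluster 0 is mixed (two of three emails share a label), while clusters 1 and 2 are both fully pure. The tie-break therefore reports cluster 1, which holds two emails. Comparing `(purity, size)` tuples keeps that tie-break explicit. A different rule would change only the second returned value.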