Tuned Artificial Neural Network Model for E-mail Data Classification with Feature Selection

H. S. Hota; Akhilesh Kumar Shrivas; S. K. Singhai

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 67 - Number 25

Year of Publication: 2013

10.5120/11744-7322

H. S. Hota, Akhilesh Kumar Shrivas, S. K. Singhai. Tuned Artificial Neural Network Model for E-mail Data Classification with Feature Selection. International Journal of Computer Applications. 67, 25 (April 2013), 20-25. DOI=10.5120/11744-7322

@article{10.5120/11744-7322,
  author = {H. S. Hota and Akhilesh Kumar Shrivas and S. K. Singhai},
  title = {Tuned Artificial Neural Network Model for E-mail Data Classification with Feature Selection},
  journal = {International Journal of Computer Applications},
  issue_date = {April 2013},
  volume = {67},
  number = {25},
  month = {April},
  year = {2013},
  issn = {0975-8887},
  pages = {20-25},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume67/number25/11744-7322/},
  doi = {10.5120/11744-7322},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T21:26:25.494124+05:30
%A H. S. Hota
%A Akhilesh Kumar Shrivas
%A S. K. Singhai
%T Tuned Artificial Neural Network Model for E-mail Data Classification with Feature Selection
%J International Journal of Computer Applications
%@ 0975-8887
%V 67
%N 25
%P 20-25
%D 2013
%I Foundation of Computer Science (FCS), NY, USA

Abstract

With the rapid development of the Internet, e-mail has become an effective means of communication for sharing information. Through e-mail, we can send text messages, images, audio, and video clips across the world almost instantly. In recent years, e-mail users have faced problems caused by spam e-mails, which are unsolicited commercial or bulk e-mails sent by spammers. Spam causes serious problems: for example, a message may contain a hyperlink to a bogus website that asks for personal information such as a username, password, or bank account number. Spam e-mail wastes not only storage space but also time. To tackle these problems, e-mails must be classified with an intelligent and robust classifier capable of separating spam from non-spam e-mail. The performance of a spam e-mail classifier can be greatly enhanced by using an artificial neural network classification algorithm. An Artificial Neural Network (ANN) is a powerful tool for data classification and can learn from large amounts of high-dimensional data. Several ANN parameters must be tuned for better model performance: the learning rate, the network architecture, and the momentum. All of these play an important role in improving the accuracy of the ANN model. In this paper, Error Back Propagation Network (EBPN) techniques based on ANN are explored with learning rates from 0.2 to 0.9. An EBPN model is derived from the e-mail data set obtained from the UCI repository site, using three different partitions. Because of the high dimensionality of the data set, we applied a feature selection technique to find the best model. The model was tested with various combinations of features, and it is concluded that it produces its highest accuracy of 98.49% on testing samples with 52 features. The derived model was also evaluated with precision, recall, and F-measure, achieving 98.34%, 99.07%, and 98.70%, respectively.
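The three evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not state the formulas used, so this assumes the standard definitions and the balanced F-measure (the harmonic mean of precision and recall). The function names and the confusion-matrix counts below are hypothetical and are not taken from the paper; only the two percentages passed to the consistency check come from the abstract.

```python
def precision_recall_f(tp, fp, fn):
    """Return (precision, recall, F-measure) from confusion-matrix counts.

    tp, fp, fn are true positives, false positives and false negatives
    for the class treated as positive (an assumption of this sketch).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def f_measure_from(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Consistency check using the precision and recall reported in the abstract.
reported_f = f_measure_from(98.34, 99.07)
print(f"F-measure from reported precision and recall: {reported_f:.2f}%")

# Hypothetical counts, chosen only to show how the function is called.
p, r, f = precision_recall_f(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f}")
```

Under this assumption, the harmonic mean of the reported precision (98.34%) and recall (99.07%) agrees with the reported F-measure of 98.70% to two decimal places.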

References

El-Sayed M. El-Alfy et al., "Using GMDH-based networks for improved spam detection and email feature analysis", Applied Soft Computing, vol. 11, pp. 477-488, 2011.
Ismaila Idris, "E-mail spam classification with ANN and Negative selection algorithms", International Journal of Computer Science & Communication Networks, vol. 1(3), pp. 227-231, 2011.
Jiawei Han and Micheline Kamber, "Data Mining Concepts and Techniques", Morgan Kaufmann, San Francisco, Second Edition, 2006.
Omar Saad et al., "A Survey of machine learning Techniques for spam filtering", IJCSNS International Journal of Computer Science and Network Security, vol. 12, no. 2, 2012.
W. A. Awad, "Machine Learning methods for email classification", International Journal of Computer Applications, vol. 16, no. 1, pp. 0975-8887, 2011.
Hota H. S. et al., "Data mining techniques and its ensemble model applied for classification of e-mail data", proceeding of review of business and technology research in International conference EPPICTM, vol. 5, no. 1, pp. 473-479, 2012.
Hota H. S. et al., "E-mail and its security: A modern way of teaching and research", proceeding of International conference on Innovation and Research in technology for Sustainable Development (ICIRT), pp. 168-170, ISBN 978-93-82338-21-5, 2012.
Lei Shi et al., "Spam E-mail classification using decision tree ensemble", Journal of Computational Information Systems, vol. 8, no. 3, pp. 949-956, 2012.
K. J. Cios et al., "Data mining methods for knowledge discovery", 3rd printing, Kluwer Academic Publishers, USA, 2000.
UCI Machine Learning Repository of machine learning databases (2010). University of California, School of Information and Computer Science, Irvine, CA. http://archive.ics.uci.edu/ml/datasets/Spambase, August 2012.
SPSS Clementine help file, http://www.spss.com, last accessed Oct 2012.

Index Terms

Computer Science

Information Sciences

Keywords

Spam e-mail Classification Error Back Propagation Network (EBPN) Feature Selection