Research Article

An Improved Expectation Maximization based Semi-Supervised Email Classification using Naive Bayes and K- Nearest Neighbor

by Hiral Padhiyar, Purvi Rekh
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 101 - Number 6
Year of Publication: 2014
Authors: Hiral Padhiyar, Purvi Rekh
10.5120/17689-8652

Hiral Padhiyar, Purvi Rekh. An Improved Expectation Maximization based Semi-Supervised Email Classification using Naive Bayes and K- Nearest Neighbor. International Journal of Computer Applications. 101, 6 (September 2014), 7-11. DOI=10.5120/17689-8652

@article{ 10.5120/17689-8652,
author = { Hiral Padhiyar and Purvi Rekh },
title = { An Improved Expectation Maximization based Semi-Supervised Email Classification using Naive Bayes and K- Nearest Neighbor },
journal = { International Journal of Computer Applications },
issue_date = { September 2014 },
volume = { 101 },
number = { 6 },
month = { September },
year = { 2014 },
issn = { 0975-8887 },
pages = { 7-11 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume101/number6/17689-8652/ },
doi = { 10.5120/17689-8652 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:31:57.232437+05:30
%A Hiral Padhiyar
%A Purvi Rekh
%T An Improved Expectation Maximization based Semi-Supervised Email Classification using Naive Bayes and K- Nearest Neighbor
%J International Journal of Computer Applications
%@ 0975-8887
%V 101
%N 6
%P 7-11
%D 2014
%I Foundation of Computer Science (FCS), NY, USA
Abstract

With the development of the Internet and the emergence of large volumes of text resources, automatic text classification has become a research hotspot. Email is one of the fastest and cheapest means of communication, and it has become part of everyday life for millions of people, changing the way we work and collaborate. Email accounts for a large percentage of total Internet traffic. Email data is also growing rapidly, creating a need for automated analysis. In many security informatics applications, it is important to detect deceptive communication in email. Each iteration of standard EM-based semi-supervised learning has two steps: first, the classifier constructed in the previous iteration predicts the labels of all unlabeled samples; then, a new classifier is constructed from the new training set. In this work, an EM-based semi-supervised learning algorithm using a Naïve Bayes classifier is proposed in which unlabeled documents are divided into two groups, reliable and misclassified. An ensemble technique is used to add only the reliable unlabeled documents to the training set. In addition, the unlabeled documents are preprocessed before the Naïve Bayes and K-NN classifiers are trained in the first step of EM, which reduces preprocessing time. The proposed work is therefore expected to increase classifier accuracy and decrease execution time.
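The iterative procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes hard label assignment (a self-training simplification of EM), a multinomial Naïve Bayes with Laplace smoothing, a k-NN classifier over Jaccard similarity, and an ensemble rule in which an unlabeled document counts as "reliable" only when both classifiers agree on its label. All function names, the similarity measure, and the toy emails are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace.
    return text.lower().split()

def train_nb(docs, labels):
    # Multinomial Naive Bayes statistics: vocabulary, class counts, word counts.
    vocab = {w for d in docs for w in d}
    class_docs = Counter(labels)
    word_counts = {c: Counter() for c in class_docs}
    for d, c in zip(docs, labels):
        word_counts[c].update(d)
    return vocab, class_docs, word_counts

def predict_nb(model, doc):
    vocab, class_docs, word_counts = model
    n_docs = sum(class_docs.values())
    best, best_score = None, -math.inf
    for c in sorted(class_docs):
        total = sum(word_counts[c].values())
        score = math.log(class_docs[c] / n_docs)
        for w in doc:
            if w in vocab:  # ignore words never seen in training
                # Laplace (add-one) smoothing
                score += math.log((word_counts[c][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def predict_knn(docs, labels, doc, k=3):
    # Majority vote among the k training documents most similar to doc.
    ranked = sorted(zip(docs, labels), key=lambda p: jaccard(p[0], doc), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def semi_supervised_train(labeled, unlabeled, max_iter=5):
    docs = [tokenize(text) for text, _ in labeled]
    labels = [label for _, label in labeled]
    # Unlabeled documents are preprocessed once, before the loop starts.
    pool = [tokenize(text) for text in unlabeled]
    for _ in range(max_iter):
        model = train_nb(docs, labels)
        reliable, remaining = [], []
        for d in pool:
            nb_label = predict_nb(model, d)
            knn_label = predict_knn(docs, labels, d)
            # Assumed ensemble rule: keep only documents both classifiers agree on.
            if nb_label == knn_label:
                reliable.append((d, nb_label))
            else:
                remaining.append(d)
        if not reliable:
            break
        for d, label in reliable:
            docs.append(d)
            labels.append(label)
        pool = remaining
    return train_nb(docs, labels), docs, labels

# Toy data, invented for illustration only.
labeled = [
    ("win free money now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]
unlabeled = ["claim free money", "monday project meeting"]

model, docs, labels = semi_supervised_train(labeled, unlabeled)
print(labels[len(labeled):])
print(predict_nb(model, tokenize("free money prize")))
```

Documents on which the two classifiers disagree stay in the unlabeled pool and are reconsidered in the next iteration, so a single doubtful prediction does not contaminate the training set. A soft-assignment EM variant would instead weight each unlabeled document by its class posterior.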

Index Terms

Computer Science
Information Sciences

Keywords

Email Classification, Naïve Bayes, K-NN, SSL