International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 101 - Number 6
Year of Publication: 2014
Authors: Hiral Padhiyar, Purvi Rekh
DOI: 10.5120/17689-8652
Hiral Padhiyar, Purvi Rekh. An Improved Expectation Maximization based Semi-Supervised Email Classification using Naive Bayes and K-Nearest Neighbor. International Journal of Computer Applications. 101, 6 (September 2014), 7-11. DOI=10.5120/17689-8652
With the growth of the Internet and the emergence of a large number of text resources, automatic text classification has become an active research area. Email is one of the fastest and cheapest means of communication; it has become part of everyday life for millions of people, changing the way we work and collaborate. Email accounts for a large percentage of total Internet traffic, and email data is growing rapidly, creating a need for automated analysis. In many security informatics applications it is important to detect deceptive communication in email. Standard EM-based semi-supervised learning iterates over two steps: first, the classifier constructed in the previous iteration predicts the labels of all unlabeled samples; then, a new classifier is reconstructed from the new training set. In this work, an EM-based semi-supervised learning algorithm using a Naïve Bayesian classifier is proposed in which unlabeled documents are divided into two groups, reliable and misclassified. An ensemble technique is used to add only the reliable unlabeled documents to the training set. In addition, preprocessing of the unlabeled documents is performed before the learning process of the Naïve Bayesian and K-NN classifiers, during the first step of EM, to reduce preprocessing time. With the proposed approach, classifier accuracy is expected to increase and execution time to decrease.
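The iterative scheme described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the ensemble decides which documents are reliable, so the rule used here (a document is reliable when Naive Bayes and K-NN predict the same label) is an assumption, as are the hard-label updates, the whitespace tokenizer, the cosine similarity for K-NN, and every function name and sample message.

```python
import math
from collections import Counter

def tokenize(text):
    """Placeholder preprocessing (assumption): lowercase and split on whitespace."""
    return text.lower().split()

def train_nb(docs, labels):
    """Collect multinomial Naive Bayes counts from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for tokens, label in zip(docs, labels):
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict_nb(model, tokens):
    """Return the class with the highest Laplace-smoothed log-probability."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best, best_score = None, -math.inf
    for c in sorted(class_counts):
        total_words = sum(word_counts[c].values())
        score = math.log(class_counts[c] / total_docs)
        for w in tokens:
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def predict_knn(docs, labels, tokens, k=3):
    """Majority vote among the k training documents most similar to the query."""
    query = Counter(tokens)
    ranked = sorted(zip(docs, labels),
                    key=lambda pair: cosine(Counter(pair[0]), query),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def semi_supervised_em(labeled, unlabeled, max_iter=10, k=3):
    """Grow the training set with unlabeled documents on which NB and K-NN agree."""
    docs = [tokenize(text) for text, _ in labeled]
    labels = [label for _, label in labeled]
    pool = [tokenize(text) for text in unlabeled]  # preprocessed once, before the loop
    for _ in range(max_iter):
        model = train_nb(docs, labels)
        reliable, remaining = [], []
        for tokens in pool:
            nb_label = predict_nb(model, tokens)
            knn_label = predict_knn(docs, labels, tokens, k)
            if nb_label == knn_label:      # assumed reliability rule: agreement
                reliable.append((tokens, nb_label))
            else:
                remaining.append(tokens)
        if not reliable:                   # nothing new to learn from
            break
        for tokens, label in reliable:
            docs.append(tokens)
            labels.append(label)
        pool = remaining
    return train_nb(docs, labels), pool

# Hypothetical toy data, for illustration only.
labeled = [
    ("win free prize money now", "spam"),
    ("free cash offer claim prize", "spam"),
    ("meeting agenda for project review", "ham"),
    ("please review the project report", "ham"),
]
unlabeled = ["claim your free prize", "project meeting tomorrow"]
model, leftover = semi_supervised_em(labeled, unlabeled)
```

The returned `model` can be passed to `predict_nb` to classify new messages, and `leftover` holds the unlabeled documents on which the two classifiers never agreed. Tokenizing the unlabeled pool once, outside the loop, mirrors the abstract's point about avoiding repeated preprocessing.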