International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 182 - Number 36
Year of Publication: 2019
Authors: Olasehinde Olayemi Oladimeji
DOI: 10.5120/ijca2019918354
Olasehinde Olayemi Oladimeji. Text Analysis and Machine Learning Approach to Phished Email Detection. International Journal of Computer Applications. 182, 36 (Jan 2019), 11-16. DOI=10.5120/ijca2019918354
Phishing, the theft of sensitive identity information, poses a serious challenge to the security of personal information. It affects countless internet users and imposes a heavy financial burden on businesses and victims alike. Text mining is a branch of data mining that analyzes large volumes of unstructured text in order to extract meaningful information from it. Machine learning (ML) is a branch of artificial intelligence (AI) that uses data mining methods to discover new or existing characteristics in a set of gathered data that are relevant for classification. Machine learning methods have been found to achieve much better results than other phished email detection techniques such as blacklists, visual similarity, and heuristic techniques. In this work, text mining was carried out on phished and ham emails, and three machine learning techniques (Naive Bayes, K-Nearest Neighbor, and Support Vector Machine) were used to identify phished emails on standard, analyzed phished email and ham corpora. Naive Bayes was found to have the highest classification accuracy at 99.0%, compared with the other two techniques, SVM (98.6%) and KNN (96.9%).
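To make the classification step concrete, the following is a minimal sketch of a bag-of-words Naive Bayes text classifier in plain Python. It is not the authors' pipeline: the tokenizer, the add-one (Laplace) smoothing, the class name `NaiveBayesText`, and the six toy emails are all assumptions made up for illustration, and the toy data has no relation to the corpora or the accuracies reported above.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + len(self.vocab)
            for t in tokens:
                score += math.log((self.word_counts[label][t] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy training set, invented for this sketch.
train_texts = [
    "Verify your account now or it will be suspended",
    "Urgent: confirm your password to restore bank access",
    "Click this link to claim your prize and verify details",
    "Meeting moved to Thursday, agenda attached",
    "Here are the lecture notes from yesterday",
    "Lunch tomorrow? Let me know what time works",
]
train_labels = ["phish", "phish", "phish", "ham", "ham", "ham"]

model = NaiveBayesText().fit(train_texts, train_labels)
```

Calling `model.predict("Please verify your bank password")` on this toy model labels the message as `phish`, because words such as "verify" and "password" occur only in the phishing examples. A real evaluation would need a labeled corpus, a held-out test split, and comparable implementations of KNN and SVM.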