Research Article

Text Analysis and Machine Learning Approach to Phished Email Detection

by Olasehinde Olayemi Oladimeji
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 182 - Number 36
Year of Publication: 2019
Authors: Olasehinde Olayemi Oladimeji
10.5120/ijca2019918354

Olasehinde Olayemi Oladimeji. Text Analysis and Machine Learning Approach to Phished Email Detection. International Journal of Computer Applications. 182, 36 (Jan 2019), 11-16. DOI=10.5120/ijca2019918354

@article{ 10.5120/ijca2019918354,
author = { Olasehinde Olayemi Oladimeji },
title = { Text Analysis and Machine Learning Approach to Phished Email Detection },
journal = { International Journal of Computer Applications },
issue_date = { Jan 2019 },
volume = { 182 },
number = { 36 },
month = { Jan },
year = { 2019 },
issn = { 0975-8887 },
pages = { 11-16 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume182/number36/30297-2019918354/ },
doi = { 10.5120/ijca2019918354 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T01:13:27.270873+05:30
%A Olasehinde Olayemi Oladimeji
%T Text Analysis and Machine Learning Approach to Phished Email Detection
%J International Journal of Computer Applications
%@ 0975-8887
%V 182
%N 36
%P 11-16
%D 2019
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Phishing, the theft of sensitive identity information, poses a serious challenge to the security of personal information. It affects countless internet users and imposes a heavy financial burden on businesses and victims alike. Text mining is a branch of data mining that analyzes large volumes of unstructured text in order to extract meaningful information from it. Machine learning (ML) is a branch of artificial intelligence (AI) that uses data mining methods to discover new or existing characteristics in a set of gathered data that can be relevant for classification. Machine learning methods have been found to achieve much better results than other phished email detection techniques such as blacklists, visual similarity, and heuristic techniques. In this work, text mining of phished and ham emails was carried out, and three machine learning techniques, Naive Bayes, K-Nearest Neighbor (KNN), and Support Vector Machine (SVM), were used to identify phished emails in the analyzed standard phished email and ham corpora. Naive Bayes achieved the highest classification accuracy at 99.0%, compared with 98.6% for SVM and 96.9% for KNN.
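
As a rough illustration of the text-mining-plus-classification idea summarized above, the following is a minimal sketch of a bag-of-words multinomial Naive Bayes classifier with Laplace smoothing. It is not the paper's implementation: the tokenizer, the class name `NaiveBayesTextClassifier`, and the six-message toy corpus are all assumptions made up for this example, and the paper's preprocessing, feature set, and corpora may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesTextClassifier:
    """Hypothetical multinomial Naive Bayes over word counts (Laplace smoothing)."""

    def fit(self, documents, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)          # documents per class
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocabulary = set()
        for counts in self.word_counts.values():
            self.vocabulary.update(counts)
        self.total_docs = len(labels)
        return self

    def log_scores(self, text):
        """Return log P(class) + sum of log P(word | class) for each class."""
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        vocab_size = len(self.vocabulary)
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / self.total_docs)
            for t in tokens:
                # Add-one smoothing keeps unseen class/word pairs from zeroing the score.
                score += math.log((self.word_counts[c][t] + 1) / (total_words + vocab_size))
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy, made-up training messages (NOT the corpora used in the paper).
train_texts = [
    "verify your account password immediately to avoid suspension",
    "urgent click this link to confirm your bank account details",
    "your account has been suspended confirm your password now",
    "meeting agenda attached for the project review on monday",
    "lunch on friday with the team to discuss the report",
    "please review the attached report before the monday meeting",
]
train_labels = ["phish", "phish", "phish", "ham", "ham", "ham"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("confirm your bank password urgent"))
print(model.predict("agenda for the team meeting on friday"))
```

On this toy data the first message is labeled `phish` and the second `ham`. A corpus this small says nothing about real-world accuracy; the 99.0% figure reported above comes from the paper's own experiments, not from this sketch.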

Index Terms

Computer Science
Information Sciences

Keywords

Identity theft
Text mining
Machine Learning
Sensitive information