International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 122 - Number 20
Year of Publication: 2015
Authors: Doaa Hassan
DOI: 10.5120/21813-5191
Doaa Hassan. On Determining the Most Effective Subset of Features for Detecting Phishing Websites. International Journal of Computer Applications. 122, 20 (July 2015), 1-7. DOI=10.5120/21813-5191
Phishing websites mimic legitimate ones in order to steal users' confidential information, such as usernames, passwords, and credit card details. Recently, machine learning and data mining techniques have emerged as a promising approach to detecting phishing websites by distinguishing them from legitimate ones. In this approach, detection is preceded by extracting various features from a website dataset, which are then used to train a classifier to identify phishing sites correctly. However, not all extracted features are effective for classification, nor do they contribute equally to its performance. In this paper, we investigate the effect of feature selection on classification performance for predicting phishing sites. We evaluate various machine learning algorithms on a number of feature subsets, selected from the extracted feature set by various feature selection techniques, in order to determine the subset of features that yields the best classification performance. Empirical results show that, using our newly proposed methodology for selecting features by removing redundant ones that contribute equally to the classification accuracy, the decision tree classifier achieves the best performance, with an overall accuracy of 95.40%, a false positive rate (FPR) of 0.046, and a false negative rate (FNR) of 0.065.
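The three metrics reported in the abstract (overall accuracy, FPR, and FNR) can be computed from a classifier's predictions as in the minimal sketch below. It is illustrative only and is not the paper's evaluation code: the function name `phishing_metrics`, the label encoding (1 = phishing as the positive class, 0 = legitimate), and the toy labels are all assumptions made for this example.

```python
from typing import Dict, Sequence


def phishing_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy, FPR and FNR for a binary phishing classifier.

    Assumed encoding (not taken from the paper):
    1 = phishing (positive class), 0 = legitimate (negative class).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # phishing flagged as phishing
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate passed as legitimate
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate flagged as phishing
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # phishing passed as legitimate

    total = tp + tn + fp + fn
    return {
        # Fraction of all sites classified correctly.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Fraction of legitimate sites wrongly flagged as phishing.
        "fpr": fp / (fp + tn) if (fp + tn) else 0.0,
        # Fraction of phishing sites that slipped through as legitimate.
        "fnr": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Toy labels, made up for illustration: 4 phishing sites and 6 legitimate ones.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = phishing_metrics(y_true, y_pred)
```

Comparing feature subsets then amounts to training the same classifier on each subset and comparing these three numbers; a subset is preferable when it raises accuracy without increasing FPR or FNR.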