Research Article

An Analysis on the Performance of a Classification based Outlier Detection System using Feature Selection

by Kurian M.J., Gladston Raj S.
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 132 - Number 8
Year of Publication: 2015
Authors: Kurian M.J., Gladston Raj S.
10.5120/ijca2015907497

Kurian M.J., Gladston Raj S. An Analysis on the Performance of a Classification based Outlier Detection System using Feature Selection. International Journal of Computer Applications. 132, 8 (December 2015), 15-21. DOI=10.5120/ijca2015907497

@article{ 10.5120/ijca2015907497,
author = { Kurian M.J. and Gladston Raj S. },
title = { An Analysis on the Performance of a Classification based Outlier Detection System using Feature Selection },
journal = { International Journal of Computer Applications },
issue_date = { December 2015 },
volume = { 132 },
number = { 8 },
month = { December },
year = { 2015 },
issn = { 0975-8887 },
pages = { 15-21 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume132/number8/23614-2015907497/ },
doi = { 10.5120/ijca2015907497 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:28:47.574018+05:30
%A Kurian M.J.
%A Gladston Raj S.
%T An Analysis on the Performance of a Classification based Outlier Detection System using Feature Selection
%J International Journal of Computer Applications
%@ 0975-8887
%V 132
%N 8
%P 15-21
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Outlier detection can be viewed as a classification problem if a training data set with class labels is available. In a typical medical data set, such as a cancer data set, if samples with class information are available, a classification-based outlier detection method can be applied. The general idea of classification-based outlier detection is to train a classification model that can distinguish normal data from outliers [7]. Previous work implemented and evaluated three classification-based outlier detection algorithms and found that the k-neighborhood algorithm identified and classified outliers better than the other two algorithms in terms of accuracy, F-score, sensitivity/recall, and error rate; its CPU time was also the lowest [23]. In this work, the performance of outlier detection combined with feature selection algorithms is evaluated. The results show that feature selection has very little impact on the cancer data set and does not improve the overall classification performance.
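A minimal sketch of classification-based outlier detection, assuming the k-neighborhood algorithm behaves like a standard k-nearest-neighbor (k-NN) classifier with Euclidean distance and majority voting. The function names, the label convention (1 = outlier, 0 = normal) and the toy data are hypothetical; this is not the authors' implementation or their cancer data set. The sketch also computes the evaluation measures named in the abstract, treating the outlier class as the positive class.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Label `query` by majority vote among its k nearest training samples.

    An odd k avoids vote ties when there are only two classes.
    """
    ranked = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, recall, F-score and error rate with `positive` (outlier) as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "f_score": f_score, "error_rate": 1 - accuracy}


# Hypothetical toy data: two features per sample, label 0 = normal, 1 = outlier.
train_X = [[1.0, 1.1], [0.9, 1.0], [1.2, 0.8], [8.0, 9.0], [9.1, 8.5]]
train_y = [0, 0, 0, 1, 1]
test_X = [[1.0, 0.9], [8.5, 8.8], [5.0, 5.0]]
test_y = [0, 1, 0]

preds = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
print(preds, evaluate(test_y, preds))
```

In this toy run the borderline point [5.0, 5.0] is flagged as an outlier although its true label is normal, so the metrics show one false positive: recall stays perfect while accuracy and F-score drop.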

References
  1. M. Dash and H. Liu, "Feature Selection for Classification," Intelligent Data Analysis, pp. 131–156, 1997.
  2. R. Kohavi and G. John, "Wrappers for Feature Subset Selection," Artificial Intelligence, Special Issue on Relevance, vol. 97, no. 1–2, pp. 273–324, 1997.
  3. Simon Hawkins, Hongxing He, Graham Williams and Rohan Baxter, "Outlier Detection Using Replicator Neural Networks," Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery (DaWaK 2000), pp. 170–180.
  4. Graham Williams, Rohan Baxter, Hongxing He, Simon Hawkins and Lifang Gu, "A Comparative Study of RNN for Outlier Detection in Data Mining," Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM '02), p. 709, 2002.
  5. L. Yu and H. Liu, "Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution," Proceedings of the Twentieth International Conference on Machine Learning (ICML-03), Washington, D.C., 2003.
  6. Y. Lu and J. Han, "Cancer Classification Using Gene Expression Data," Information Systems, vol. 28, pp. 243–268, 2003.
  7. V. J. Hodge and J. Austin, "A Survey of Outlier Detection Methodologies," Artificial Intelligence Review, vol. 22, no. 2, pp. 85–126, 2004.
  8. Huan Liu and Lei Yu, "Toward Integrating Feature Selection Algorithms for Classification and Clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, April 2005.
  9. Ella Bingham, Aristides Gionis, Niina Haiminen, Heli Hiisilä, Heikki Mannila and Evimaria Terzi, "Segmentation and Dimensionality Reduction," Proceedings of the 2006 SIAM Conference on Data Mining, pp. 372–383, 2006.
  10. I. Guyon, S. Gunn, M. Nikravesh and L. Zadeh, Feature Extraction: Foundations and Applications, Springer, Berlin, 2006.
  11. Y. Saeys, I. Inza and P. Larrañaga, "A Review of Feature Selection Techniques in Bioinformatics," Bioinformatics, vol. 23, pp. 2507–2517, 2007.
  12. Faizah Shaari, Azuraliza Abu Bakar and Abdul Razak Hamdan, "On New Approach in Mining Outlier," Proceedings of the International Conference on Electrical Engineering and Informatics, Indonesia, June 17–19, 2007.
  13. Y. Song, J. Huang, D. Zhou, H. Zha and C. Giles, "IKNN: Informative K-Nearest Neighbor Pattern Classification," Knowledge Discovery in Databases: PKDD 2007, pp. 248–264, 2007.
  14. Yumin Chen, Duoqian Miao and Hongyun Zhang, "Neighborhood Outlier Detection," Expert Systems with Applications, vol. 37, pp. 8745–8749, Elsevier, 2010.
  15. Xiaochun Wang, Xia Li Wang and D. Mitch Wilkes, "A Minimum Spanning Tree-Inspired Clustering-Based Outlier Detection Technique," Advances in Data Mining: Applications and Theoretical Aspects, Lecture Notes in Computer Science, vol. 7377, pp. 209–223, 2012.
  16. Jiawei Han, Micheline Kamber and Jian Pei, Data Mining: Concepts and Techniques, 3rd ed., Morgan Kaufmann (Elsevier), 2012.
  17. Binita Kumari, "Feature Subset Selection in Large Dimensionality Using Correlation Based GA-SVM," International Journal of Computer Applications, vol. 45, no. 6, pp. 5–8, May 2012.
  18. Gouda I. Salama, M. B. Abdelhalim and Magdy Abd-elghany Zeid, "Breast Cancer Diagnosis on Three Different Datasets Using Multi-Classifiers," International Journal of Computer and Information Technology (ISSN 2277-0764), vol. 1, no. 1, September 2012.
  19. L. Rutkowski, L. Pietruczuk, P. Duda and M. Jaworski, "Decision Trees for Mining Data Streams Based on the McDiarmid's Bound," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 6, pp. 1272–1279, 2013.
  20. Ammu P. K. and Preeja V., "Review on Feature Selection Techniques of DNA Microarray Data," International Journal of Computer Applications, vol. 61, no. 12, pp. 39–44, January 2013.
  21. E. T. Venkatesh and A. Kalyana Saravanan, "New Scheme to Identify Intrusion Outliers by Machine Learning Technique," International Journal of Computer Applications, vol. 84, no. 13, pp. 13–16, December 2013.
  22. T. Ediwin Prabakaran and S. Venkata Lakshmi, "Application of K-Nearest Neighbour Classification Method for Intrusion Detection in Network Data," International Journal of Computer Applications, vol. 97, no. 7, pp. 34–37, July 2014.
  23. Kurian M.J. and Gladston Raj S., "Outlier Detection in Multidimensional Cancer Data Using Classification Based Approach," International Journal of Applied Engineering Research, vol. 10, no. 79, pp. 342–348, 2015.
Index Terms

Computer Science
Information Sciences

Keywords

Outlier, Gini Index, Information Gain, Chi-Square
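The keywords name three criteria commonly used to score features for selection. The sketch below computes each one for a single discrete feature against the class labels, assuming the features have already been discretized. The helper names and the toy data are hypothetical, and "Gini Index" is interpreted here as the reduction in Gini impurity, which may differ from the exact variant used in the paper.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def _partition(feature, labels):
    """Group class labels by the value the feature takes for each sample."""
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    return groups


def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    n = len(labels)
    groups = _partition(feature, labels).values()
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)


def gini_gain(feature, labels):
    """Reduction in Gini impurity after splitting on the feature."""
    n = len(labels)
    groups = _partition(feature, labels).values()
    return gini(labels) - sum(len(g) / n * gini(g) for g in groups)


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-value x class contingency table."""
    n = len(labels)
    f_counts, y_counts = Counter(feature), Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for v in f_counts:
        for y in y_counts:
            expected = f_counts[v] * y_counts[y] / n
            stat += (observed[(v, y)] - expected) ** 2 / expected
    return stat


# Hypothetical discretized data: label 1 = outlier.
labels = [0, 0, 0, 0, 1, 1]
features = {
    "f_informative": ["low", "low", "low", "low", "high", "high"],
    "f_noise": ["a", "b", "a", "b", "a", "b"],
}

# Rank features from most to least informative by information gain.
ranking = sorted(features, key=lambda name: information_gain(features[name], labels), reverse=True)
for name in ranking:
    col = features[name]
    print(f"{name}: IG={information_gain(col, labels):.3f} "
          f"Gini={gini_gain(col, labels):.3f} chi2={chi_square(col, labels):.3f}")
```

In a filter-style pipeline the top-ranked subset would then be passed to the classifier; how many features to keep is a design choice the abstract does not specify.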