Research Article

Constraint based Cluster Ensemble to Detect Outliers in Medical Datasets

by Visakh. R, Lakshmipathi. B
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 45 - Number 15
Year of Publication: 2012
Authors: Visakh. R, Lakshmipathi. B
DOI: 10.5120/6854-9393

Visakh. R, Lakshmipathi. B. Constraint based Cluster Ensemble to Detect Outliers in Medical Datasets. International Journal of Computer Applications. 45, 15 (May 2012), 9-15. DOI=10.5120/6854-9393

@article{ 10.5120/6854-9393,
author = { Visakh. R and Lakshmipathi. B },
title = { Constraint based Cluster Ensemble to Detect Outliers in Medical Datasets },
journal = { International Journal of Computer Applications },
issue_date = { May 2012 },
volume = { 45 },
number = { 15 },
month = { May },
year = { 2012 },
issn = { 0975-8887 },
pages = { 9-15 },
numpages = {7},
url = { https://ijcaonline.org/archives/volume45/number15/6854-9393/ },
doi = { 10.5120/6854-9393 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:37:39.208008+05:30
%A Visakh. R
%A Lakshmipathi. B
%T Constraint based Cluster Ensemble to Detect Outliers in Medical Datasets
%J International Journal of Computer Applications
%@ 0975-8887
%V 45
%N 15
%P 9-15
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Outlier analysis in medical datasets can reveal significant traits in the behavioral patterns of genes. The presence of outliers may indicate symptoms of genetic disorders or mutant tumors. For genetic disorders, designing curative medicines is possible only after studying the gene-gene and gene-tumor relationships. Identifying outlier observations alone is therefore insufficient: the source of the outliers, i.e. the tumors to which they are related, must also be clarified. Most existing work adopts a single clustering algorithm to detect outlier patterns in bio-molecular data, but single clustering algorithms lack robustness, stability and accuracy. This work uses a form of semi-supervised cluster ensemble to analyze outlier patterns based on their relations to clusters. Specifically, prior knowledge of a dataset is fed to the cluster ensemble in the form of constraints. The resulting clusters are analyzed to detect outliers by filtering out insignificant clusters. The outlier-cluster association is then calculated using a fuzzy approach. The combined fuzzy-constraint based cluster ensemble approach can be used to effectively analyze outliers in medical datasets.
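
The pipeline outlined in the abstract (constraints fed to an ensemble, small clusters filtered out as outliers, fuzzy outlier-cluster association) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the co-association consensus, the 0.5 threshold, the minimum cluster size, the fuzzy c-means style membership formula, and all function names and toy data are assumptions chosen only for illustration.

```python
import math

def co_association(partitions, must_link=(), cannot_link=()):
    """Fraction of base partitions in which each pair of points shares a
    cluster, then overridden by pairwise constraints (an assumed scheme)."""
    n = len(partitions[0])
    co = [[sum(p[i] == p[j] for p in partitions) / len(partitions)
           for j in range(n)] for i in range(n)]
    for i, j in must_link:       # prior knowledge: same cluster
        co[i][j] = co[j][i] = 1.0
    for i, j in cannot_link:     # prior knowledge: different clusters
        co[i][j] = co[j][i] = 0.0
    return co

def consensus_labels(co, threshold=0.5):
    """Connected components of the graph linking pairs with co > threshold.
    Note: a cannot-link pair can still be joined through other points."""
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

def split_clusters(points, labels, min_size):
    """Return centroids of significant clusters and indices of outliers
    (members of clusters smaller than min_size)."""
    groups = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    centroids, outliers = {}, []
    for lab, members in groups.items():
        if len(members) < min_size:
            outliers.extend(members)
        else:
            coords = zip(*(points[i] for i in members))
            centroids[lab] = tuple(sum(c) / len(members) for c in coords)
    return centroids, sorted(outliers)

def fuzzy_membership(point, centroids, m=2.0):
    """Fuzzy c-means style membership of one point in each cluster."""
    dists = {lab: math.dist(point, c) for lab, c in centroids.items()}
    for lab, d in dists.items():
        if d == 0.0:             # point coincides with a centroid
            return {k: float(k == lab) for k in dists}
    inv = {lab: d ** (-2.0 / (m - 1.0)) for lab, d in dists.items()}
    total = sum(inv.values())
    return {lab: v / total for lab, v in inv.items()}

# Toy data: two tight groups plus one isolated point (index 6).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (4, 4)]
# Three hypothetical base clusterings of the same seven points.
partitions = [
    [0, 0, 0, 1, 1, 1, 2],
    [0, 0, 0, 1, 1, 1, 0],
    [0, 0, 1, 2, 2, 2, 3],
]
co = co_association(partitions, must_link=[(0, 2)], cannot_link=[(2, 6)])
labels = consensus_labels(co)
centroids, outliers = split_clusters(points, labels, min_size=2)
memberships = {i: fuzzy_membership(points[i], centroids) for i in outliers}
print(labels, outliers, memberships)
```

On this toy input the isolated point ends up in a singleton consensus cluster, is flagged as an outlier, and receives a membership of roughly 0.75 in the nearer group and 0.25 in the farther one. The memberships indicate which cluster an outlier is most associated with, which is the kind of outlier-cluster relation the abstract refers to.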

Index Terms

Computer Science
Information Sciences

Keywords

Cluster Ensemble, Spectral Clustering, Fuzzy-combined Constraint Based Cluster Ensemble, Outlier Detection, Constraints, Semi-supervised Clustering