International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 45 - Number 15
Year of Publication: 2012
Authors: Visakh. R, Lakshmipathi. B
DOI: 10.5120/6854-9393
Visakh. R, Lakshmipathi. B. Constraint based Cluster Ensemble to Detect Outliers in Medical Datasets. International Journal of Computer Applications. 45, 15 (May 2012), 9-15. DOI=10.5120/6854-9393
Outlier analysis in medical datasets can reveal significant traits in the behavioral patterns of genes. The presence of outliers may indicate symptoms of genetic disorders or mutant tumors. For genetic disorders, curative medicines can be designed only after the gene-gene and gene-tumor relationships have been studied. Identifying outlier observations alone is therefore insufficient: the source of the outliers, i.e. the tumors to which they are related, must also be clarified. Most existing work adopts a single clustering algorithm to detect outlier patterns in bio-molecular data. However, single clustering algorithms lack robustness, stability and accuracy. This work uses a form of semi-supervised cluster ensemble to analyze outlier patterns based on their relations to clusters. Specifically, prior knowledge of a dataset is fed to the cluster ensemble in the form of constraints. The resulting clusters are analyzed to detect outliers by filtering out insignificant clusters. Then, the outlier-cluster association is calculated using a fuzzy approach. The combined fuzzy-constraint based cluster ensemble approach can be used to effectively analyze outliers in medical datasets.
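The last two steps of the abstract (filtering out insignificant clusters, then computing a fuzzy outlier-cluster association) can be illustrated with a minimal Python sketch. It is not the authors' implementation; the abstract does not give the filtering rule or the fuzzy formula. The sketch assumes that the consensus labels have already been produced by the constraint-based ensemble, that a cluster is "insignificant" when it has fewer than `min_size` members, and that the association uses the standard fuzzy c-means membership formula with fuzzifier `m`. The function names, the threshold, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def split_outliers(points, labels, min_size=3):
    """Separate significant clusters from outlier candidates.

    Assumption: a cluster with fewer than min_size members is
    'insignificant', and its points are treated as outliers.
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    significant = {lab: pts for lab, pts in clusters.items() if len(pts) >= min_size}
    outliers = [p for pts in clusters.values() if len(pts) < min_size for p in pts]
    return significant, outliers


def centroid(pts):
    """Component-wise mean of a list of equal-length tuples."""
    return tuple(sum(p[i] for p in pts) / len(pts) for i in range(len(pts[0])))


def fuzzy_association(outlier, centroids, m=2.0):
    """Fuzzy c-means style membership of one outlier in each cluster.

    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)), so memberships sum to 1
    and closer clusters receive higher membership.
    """
    dists = {lab: math.dist(outlier, c) for lab, c in centroids.items()}
    zero = [lab for lab, d in dists.items() if d == 0.0]
    if zero:  # outlier coincides with one or more centroids
        return {lab: (1.0 / len(zero) if lab in zero else 0.0) for lab in dists}
    exponent = 2.0 / (m - 1.0)
    return {
        lab: 1.0 / sum((d / other) ** exponent for other in dists.values())
        for lab, d in dists.items()
    }


# Toy data: two dense clusters and one singleton cluster (label 2).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (3, 3)]
labels = [0, 0, 0, 1, 1, 1, 2]

significant, outliers = split_outliers(points, labels)
centroids = {lab: centroid(pts) for lab, pts in significant.items()}
memberships = {o: fuzzy_association(o, centroids) for o in outliers}
print(memberships)
```

In this toy example the singleton point (3, 3) is flagged as an outlier and receives a higher membership in cluster 0 than in cluster 1, which is the kind of outlier-to-cluster relation the abstract aims to expose.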