Research Article

Efficient Clustering for Gene Expression Data

by Jacinth Salome J, R M Suresh
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 47 - Number 5
Year of Publication: 2012
Authors: Jacinth Salome J, R M Suresh
DOI: 10.5120/7186-9925

Jacinth Salome J, R M Suresh. Efficient Clustering for Gene Expression Data. International Journal of Computer Applications. 47, 5 (June 2012), 30-35. DOI=10.5120/7186-9925

@article{ 10.5120/7186-9925,
author = { Jacinth Salome J, R M Suresh },
title = { Efficient Clustering for Gene Expression Data },
journal = { International Journal of Computer Applications },
issue_date = { June 2012 },
volume = { 47 },
number = { 5 },
month = { June },
year = { 2012 },
issn = { 0975-8887 },
pages = { 30-35 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume47/number5/7186-9925/ },
doi = { 10.5120/7186-9925 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:41:07.344798+05:30
%A Jacinth Salome J
%A R M Suresh
%T Efficient Clustering for Gene Expression Data
%J International Journal of Computer Applications
%@ 0975-8887
%V 47
%N 5
%P 30-35
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Over the past decade, advances in technology have tremendously increased the amount of biological data, such as DNA sequences and microarray data. Obtaining knowledge from these data, for example exploring relationships between genes, understanding severe diseases, and developing drugs, requires finding patterns in databases of large size and high dimensionality. Information retrieval and data mining are powerful tools for extracting information from such databases and information repositories. The integrative cluster analysis of both clinical and gene expression data has been shown to be an effective alternative for overcoming these problems. In this paper, we focus on how to improve searching and clustering performance on genomic data relative to commonly used clustering techniques. In the proposed gene clustering technique, the high dimensionality of the microarray gene data is first reduced using Locality Preserving Projection (LPP). LPP is chosen for dimensionality reduction because it preserves the locality of neighborhood relationships. The reduced data are then clustered with fuzzy C-means. Performance experiments on real data sets show that the proposed method achieves higher efficiency, clustering quality, and automation than other clustering methods.
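The two-stage pipeline described in the abstract (LPP for dimensionality reduction, then fuzzy C-means for clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the neighborhood size, kernel width, fuzzifier, number of clusters, or any regularization, so `n_neighbors`, `t`, `reg`, `m`, and the function names `lpp` and `fuzzy_c_means` below are all assumptions chosen for illustration. The sketch follows the standard formulations: LPP solves the generalized eigenproblem X^T L X a = λ X^T D X a on a k-nearest-neighbor graph with heat-kernel weights, and fuzzy C-means alternates center and membership updates.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.spatial.distance import cdist


def lpp(X, n_components=2, n_neighbors=5, t=None, reg=1e-3):
    """Locality Preserving Projection of the rows of X (n_samples x n_features).

    All default parameter values are illustrative assumptions.
    """
    X = X - X.mean(axis=0)  # centre the data (a common preprocessing choice)
    n, d = X.shape
    dist2 = cdist(X, X, "sqeuclidean")
    # k nearest neighbours of each row; column 0 of the sort is the row itself
    order = np.argsort(dist2, axis=1)[:, 1:n_neighbors + 1]
    rows = np.repeat(np.arange(n), n_neighbors)
    cols = order.ravel()
    if t is None:
        # assumed heuristic: kernel width = mean squared neighbour distance
        t = dist2[rows, cols].mean()
    W = np.zeros((n, n))
    W[rows, cols] = np.exp(-dist2[rows, cols] / t)  # heat-kernel weights
    W = np.maximum(W, W.T)                          # symmetrise the graph
    deg = W.sum(axis=1)                             # diagonal of D
    B = (X.T * deg) @ X                             # X^T D X
    A = B - X.T @ W @ X                             # X^T L X with L = D - W
    # small ridge keeps B positive definite when features outnumber samples
    B = B + reg * np.trace(B) / d * np.eye(d)
    _, vecs = eigh(A, B)                            # eigenvalues in ascending order
    return X @ vecs[:, :n_components]               # keep the smallest ones


def fuzzy_c_means(X, n_clusters=2, m=2.0, max_iter=200, tol=1e-5, seed=0):
    """Standard fuzzy C-means; returns (centers, membership matrix U)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)               # each row sums to 1
    for _ in range(max_iter):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        dist = cdist(X, centers) + 1e-12            # avoid division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

To use the sketch, pass an expression matrix whose rows are the items to be clustered to `lpp`, then pass the projected matrix to `fuzzy_c_means`; `U.argmax(axis=1)` gives a hard assignment if one is needed. For matrices with thousands of features, the dense feature-by-feature eigenproblem in `lpp` becomes expensive, so a preliminary reduction step may be needed in practice.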

Index Terms

Computer Science
Information Sciences

Keywords

Clustering, Microarray, Locality Preserving Projection (LPP), Fuzzy C-means (FCM), K-means