Research Article

Enhanced Hierarchical Clustering for Gene Expression data

by Geetha.T, Michael Arock
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 1 - Number 22
Year of Publication: 2010
Authors: Geetha.T, Michael Arock
10.5120/436-665

Geetha.T, Michael Arock. Enhanced Hierarchical Clustering for Gene Expression data. International Journal of Computer Applications. 1, 22 (February 2010), 92-98. DOI=10.5120/436-665

@article{ 10.5120/436-665,
author = { Geetha.T and Michael Arock },
title = { Enhanced Hierarchical Clustering for Gene Expression data },
journal = { International Journal of Computer Applications },
issue_date = { February 2010 },
volume = { 1 },
number = { 22 },
month = { February },
year = { 2010 },
issn = { 0975-8887 },
pages = { 92-98 },
numpages = {7},
url = { https://ijcaonline.org/archives/volume1/number22/436-665/ },
doi = { 10.5120/436-665 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T19:47:54.324494+05:30
%A Geetha.T
%A Michael Arock
%T Enhanced Hierarchical Clustering for Gene Expression data
%J International Journal of Computer Applications
%@ 0975-8887
%V 1
%N 22
%P 92-98
%D 2010
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Microarrays are used to assess the transcriptome of many biological systems, and this has generated an enormous amount of data. Cluster analysis is a technique used to group and analyze microarray data. Identifying groups of genes that manifest similar expression patterns is a key step in the analysis of gene expression data. Hierarchical clustering is one of the clustering techniques used for this purpose. In this paper, we design an enhanced hierarchical clustering algorithm that scans the dataset and calculates the distance matrix only once, unlike other approaches (to the best of the authors' knowledge). Our main contribution is to reduce running time, even when a large database is analyzed. The results of hierarchical clustering are represented as a binary tree, which makes the grouping clear and helps to find clustered objects easily. Our algorithm retrieves the number of clusters with the help of a cut distance and measures cluster quality with a validation index in order to obtain the best clustering; it does not require an initial parameter such as the number of clusters.
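
The idea summarized in the abstract (one pass over the data to build the distance matrix, a binary tree of merges, and a cut distance that yields the clusters) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not state the distance measure, the linkage rule, or the validation index, so the sketch assumes Euclidean distance and single linkage and omits the validation step. The names `pairwise_distances`, `build_tree`, `leaves`, and `cut_tree` are hypothetical.

```python
import math
from itertools import combinations


def pairwise_distances(profiles):
    """Compute the Euclidean distance between every pair of profiles, once."""
    return {
        frozenset((i, j)): math.dist(profiles[i], profiles[j])
        for i, j in combinations(range(len(profiles)), 2)
    }


def build_tree(profiles):
    """Agglomerative clustering (assumed: single linkage) as a binary tree.

    Leaves are ("leaf", index); internal nodes are
    ("node", merge_distance, left_subtree, right_subtree).
    """
    dist = pairwise_distances(profiles)  # the only pass over the raw data
    nodes = {i: ("leaf", i) for i in range(len(profiles))}
    next_id = len(profiles)
    while len(nodes) > 1:
        # Pick the closest pair of currently active clusters.
        pair = min(dist, key=dist.get)
        a, b = sorted(pair)
        height = dist.pop(pair)
        # Derive distances to the merged cluster from cached values only;
        # the raw profiles are never consulted again.
        for k in nodes:
            if k in (a, b):
                continue
            d_a = dist.pop(frozenset((a, k)))
            d_b = dist.pop(frozenset((b, k)))
            dist[frozenset((next_id, k))] = min(d_a, d_b)
        nodes[next_id] = ("node", height, nodes.pop(a), nodes.pop(b))
        next_id += 1
    return next(iter(nodes.values()))


def leaves(node):
    """Collect the profile indices stored under a subtree."""
    if node[0] == "leaf":
        return [node[1]]
    return leaves(node[2]) + leaves(node[3])


def cut_tree(node, cut_distance):
    """Split the tree at every merge whose distance exceeds cut_distance."""
    if node[0] == "leaf" or node[1] <= cut_distance:
        return [sorted(leaves(node))]
    return cut_tree(node[2], cut_distance) + cut_tree(node[3], cut_distance)


if __name__ == "__main__":
    # Toy expression profiles (made-up values, two conditions per gene).
    genes = [[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0], [5.0, 20.0]]
    tree = build_tree(genes)
    for cut in (2.0, 10.0):
        print(cut, sorted(cut_tree(tree, cut)))
```

In this sketch the number of clusters is never supplied; it follows from the chosen cut distance. A quality measure could then be computed for several cut distances to select one, but which index to use is left open here because the abstract does not name it.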

Index Terms

Computer Science
Information Sciences

Keywords

Micro array
Hierarchical clustering
Gene expression data
Binary Tree