International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 1 - Number 18
Year of Publication: 2010
Authors: Shyama Das, Sumam Mary Idicula |
DOI: 10.5120/385-576
Shyama Das, Sumam Mary Idicula. Iterative Search with Incremental MSR Difference Threshold for Biclustering Gene Expression Data. International Journal of Computer Applications. 1, 18 (February 2010), 35-43. DOI=10.5120/385-576
The goal of biclustering a gene expression data matrix is to find a submatrix in which the genes show highly correlated activity across all of the submatrix's conditions. A measure called the Mean Squared Residue (MSR) is used to evaluate the coherence of rows and columns within a submatrix simultaneously. This paper develops a new method for biclustering gene expression data. In the first step, high-quality bicluster seeds are generated using the K-Means clustering algorithm. Each seed is then enlarged by adding further genes and conditions (nodes) one at a time. Before a node is added, the MSR of the bicluster, X, is calculated; after the node is added, the MSR is calculated again as Y. The added node is removed if Y minus X exceeds the MSR difference threshold, or if Y exceeds δ, the MSR threshold, which depends on the dataset. The MSR difference threshold takes different values for the gene list and the condition list, and it also depends on the dataset. Suitable values must be identified through experimentation in order to obtain large biclusters. Because the value of the MSR difference threshold is very difficult to determine in advance, the algorithm uses an iterative search in which the threshold is initialized to a small value and incremented after each iteration. A bicluster with a unique structural appearance is obtained from the Yeast dataset. This indicates that the newly introduced concept of an MSR difference threshold yields high-quality biclusters. Results on benchmark datasets show that the algorithm outperforms many existing biclustering algorithms.
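The node-addition rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The MSR is computed with the usual residue definition (each entry minus its row mean, minus its column mean, plus the submatrix mean). The function names (`msr`, `grow`, `iterative_search`), the parameter names (`delta`, `diff_genes`, `diff_conds`, `start_*`, `step_*`, `n_iter`), the order in which candidates are tried (genes first, then conditions), and the fixed additive increment are all assumptions made for illustration. The K-Means seeding step is omitted, so the seed is passed in directly.

```python
import numpy as np


def msr(data, rows, cols):
    """Mean squared residue of the submatrix selected by rows and cols."""
    sub = data[np.ix_(rows, cols)]
    row_mean = sub.mean(axis=1, keepdims=True)
    col_mean = sub.mean(axis=0, keepdims=True)
    residue = sub - row_mean - col_mean + sub.mean()
    return float((residue ** 2).mean())


def grow(data, rows, cols, delta, diff_genes, diff_conds):
    """One pass over all outside genes, then all outside conditions.

    A node is kept only if the MSR increase (Y - X) is within the
    difference threshold and the new MSR (Y) is within delta.
    The genes-then-conditions order is an assumption of this sketch.
    """
    rows, cols = list(rows), list(cols)
    for axis, diff in ((0, diff_genes), (1, diff_conds)):
        members = rows if axis == 0 else cols
        for node in range(data.shape[axis]):
            if node in members:
                continue
            x = msr(data, rows, cols)      # MSR before adding the node
            members.append(node)
            y = msr(data, rows, cols)      # MSR after adding the node
            if y - x > diff or y > delta:
                members.pop()              # reject the node
    return rows, cols


def iterative_search(data, seed_rows, seed_cols, delta,
                     start_genes, start_conds, step_genes, step_conds, n_iter):
    """Repeat the growth pass, raising both difference thresholds each time."""
    rows, cols = list(seed_rows), list(seed_cols)
    diff_genes, diff_conds = start_genes, start_conds
    for _ in range(n_iter):
        rows, cols = grow(data, rows, cols, delta, diff_genes, diff_conds)
        diff_genes += step_genes           # hypothetical additive increment
        diff_conds += step_conds
    return rows, cols
```

Starting with small difference thresholds admits only nodes that barely change the MSR. Later iterations relax the thresholds, so a node rejected early can be reconsidered, while `delta` still caps the overall MSR of the bicluster.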