International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 71 - Number 24
Year of Publication: 2013
Authors: Briti Deb, Satish Narayana Srirama
DOI: 10.5120/12691-9486
Briti Deb, Satish Narayana Srirama. Parallel K-Means Clustering for Gene Expression Data on SNOW. International Journal of Computer Applications. 71, 24 (June 2013), 26-30. DOI=10.5120/12691-9486
The exponential growth in the amount of data brings new challenges for data analysis. Gene expression datasets are one such type of data, requiring analytical methods to mine the patterns implicit in them. Although clustering has been a popular way to analyze such datasets, their increasing size makes it necessary to improve the efficiency of clustering methods. In this paper, we study the use of Principal Components (PCs) as a pre-processing step to provide a more efficient data structure to a parallel formulation of the sequential K-Means algorithm, utilizing the multiple cores available in a desktop computer via the Simple Network of Workstations (SNOW) package. Initial results suggest that the SNOW package provides an intuitive way for biologists to parallelize algorithms and speed up job execution, particularly for jobs like K-Means clustering, which depend on random starting centroid locations.
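The idea of distributing K-Means runs with different random starting centroids across workers can be illustrated with a minimal sketch. It is written in Python rather than R, so it is not the authors' SNOW implementation. The function names (`kmeans_once`, `parallel_kmeans`), the seed-per-run scheme, and the choice to keep the run with the lowest within-cluster sum of squares are all assumptions made for illustration. The PCA pre-processing step is omitted; the input points are assumed to be already reduced.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, seed, max_iter=100):
    """One sequential K-Means run whose starting centroids depend on `seed`."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the run has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Within-cluster sum of squares, used to compare independent runs.
    wss = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return wss, centroids, labels


def parallel_kmeans(points, k, seeds, executor_cls=ThreadPoolExecutor, workers=4):
    """Run one K-Means per seed on a worker pool and keep the best run."""
    with executor_cls(max_workers=workers) as pool:
        runs = list(pool.map(partial(kmeans_once, points, k), seeds))
    return min(runs, key=lambda run: run[0])
```

A call such as `parallel_kmeans(points, 3, seeds=range(8))` would launch eight independent runs. The thread pool default only keeps the sketch easy to run; for CPU-bound work, passing `executor_cls=ProcessPoolExecutor` from a script would be the closer analogue of a pool of worker processes on multiple cores.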