International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 151 - Number 7
Year of Publication: 2016
Authors: Sita Rani, O. P. Gupta
DOI: 10.5120/ijca2016911779
Sita Rani, O. P. Gupta. Empirical Analysis and Performance Evaluation of various GPU Implementations of Protein BLAST. International Journal of Computer Applications. 151, 7 (Oct 2016), 22-27. DOI=10.5120/ijca2016911779
Bioinformatics applications are compute and data intensive by nature. As molecular databases grow daily with the experiments performed in molecular biology, deliberate steps are needed to accelerate bioinformatics applications. Considerable effort has already gone into optimizing most bioinformatics algorithms, and many applications have benefited substantially from Graphics Processing Units (GPUs). Compute Unified Device Architecture (CUDA) is a hardware and software platform used to exploit the multi-threaded architecture of GPUs. The Basic Local Alignment Search Tool (BLAST) is one of the most frequently used algorithms in bioinformatics. Several GPU implementations of protein BLAST have been proposed by different authors, each claiming a different speedup. Because these implementations were evaluated on different hardware platforms and with different databases, their performance is difficult to compare accurately. In this paper, four GPU implementations of protein BLAST are explored in detail. To compare their performance, these GPU versions of BLAST are run on a common hardware platform: an NVIDIA M2050 GPU with 448 processing cores and 3 GB of memory, together with two hex-core Intel Xeon 2.93 GHz processors. Experiments are performed on a 2.38 GB protein database. Performance is analyzed and compared with standard NCBI-BLASTP, using execution time as the comparison parameter. In this environment, the speedup obtained by the different implementations varied from 2.3X to 9.8X.
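The speedup figures quoted above are, by the usual convention, the ratio of the baseline execution time to the accelerated execution time. The following minimal Python sketch illustrates that calculation. It assumes that definition; the function name `speedup`, the labels `gpu_impl_a` and `gpu_impl_b`, and all timings are hypothetical placeholders chosen for illustration, not measurements from the paper.

```python
def speedup(baseline_seconds: float, accelerated_seconds: float) -> float:
    """Return how many times faster the accelerated run is than the baseline."""
    if baseline_seconds <= 0 or accelerated_seconds <= 0:
        raise ValueError("execution times must be positive")
    return baseline_seconds / accelerated_seconds


# Placeholder timings in seconds, invented for illustration only.
baseline = 120.0  # hypothetical CPU-only NCBI-BLASTP run
runs = {"gpu_impl_a": 40.0, "gpu_impl_b": 15.0}  # hypothetical GPU runs

# Compute the speedup of each hypothetical GPU run over the baseline.
results = {name: speedup(baseline, seconds) for name, seconds in runs.items()}

for name, value in results.items():
    print(f"{name}: {value:.1f}X")
```

For such a ratio to be meaningful, the baseline and accelerated runs should use the same query set, database, and hardware, which is the motivation for the common platform described in the abstract.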