International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 78 - Number 9
Year of Publication: 2013
Authors: Robinson Silvester. A, J. Cruz Antony, M. Pratheepa
DOI: 10.5120/13516-1295
Robinson Silvester. A, J. Cruz Antony, M. Pratheepa. Fast and Efficient Hashing for Sequence Similarity Search using Substring Extraction in DNA Sequence Databases. International Journal of Computer Applications. 78, 9 (September 2013), 13-17. DOI=10.5120/13516-1295
Growing interest in genomic research has led to the creation of very large biological sequence databases. Because DNA sequence databases are so large, searching them and retrieving relevant information by conventional means takes considerable processing time, so an efficient search mechanism is essential. In this paper we present an efficient search mechanism based on hashing techniques. First, the data is hashed and indexed according to different window sizes. During this process we eliminate redundancies, recording only distinct patterns and assigning each a corresponding hash value. In the search phase, the length of the search string is checked against the window size; if it exceeds the maximum window size of 4, the string is divided and its first part is used as the search string for the index lookup. Once the index entry is confirmed, the strings that follow the indexed string are matched against the search string to confirm the match. Simulation results show that the proposed method returns results faster while using less memory.
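The index-then-verify flow described in the abstract can be illustrated with a minimal Python sketch. It is an interpretation built on assumptions, not the authors' implementation: the abstract does not specify the hash function or the index layout, so the sketch relies on Python's built-in dictionary hashing and stores, for each distinct substring of length 1 to 4, the list of positions where it starts. The names `build_index`, `search`, and `MAX_WINDOW` are hypothetical.

```python
from collections import defaultdict

MAX_WINDOW = 4  # maximum window size stated in the abstract


def build_index(sequence, max_window=MAX_WINDOW):
    """Map each distinct substring of length 1..max_window to its start positions.

    Assumption: a dict keyed by substring stands in for the paper's hash table,
    so every distinct pattern is stored once regardless of how often it occurs.
    """
    index = defaultdict(list)
    for width in range(1, max_window + 1):
        for start in range(len(sequence) - width + 1):
            index[sequence[start:start + width]].append(start)
    return dict(index)


def search(sequence, index, query, max_window=MAX_WINDOW):
    """Return every start position at which query occurs in sequence."""
    if not query:
        return []
    # A query longer than the window is divided; its first part is the seed.
    seed = query[:max_window]
    candidates = index.get(seed, [])
    if len(query) <= max_window:
        return list(candidates)
    # Confirm each candidate by matching the text that follows the seed.
    return [pos for pos in candidates if sequence.startswith(query, pos)]


dna = "ACGTACGTTAGC"
dna_index = build_index(dna)
print(search(dna, dna_index, "ACGTT"))  # seed "ACGT" is looked up, then verified
```

In this sketch a query of at most four characters is answered by a single dictionary lookup, while a longer query costs one lookup plus a direct comparison at each candidate position.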