International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 57 - Number 1
Year of Publication: 2012
Authors: Md. Syed Mahamud Hossein
10.5120/9075-6994
Md. Syed Mahamud Hossein. A Compression and Encryption Algorithms on DNA Sequences using R^2CP and modified Huffman Technique. International Journal of Computer Applications. 57, 1 (November 2012), 1-10. DOI=10.5120/9075-6994
A lossless, two-phase compression algorithm for genetic sequences is reported. The first phase searches for exact R2CP substrings. The compression results show that exact R2CP substrings are one of the main hidden regularities in DNA sequences. The proposed algorithm replaces each R2CP substring with a corresponding ASCII character and builds an online library file that acts as a look-up table. Protecting data from piracy is one of the most challenging problems in information security, and the proposed method may help protect the data from attackers. For stronger security, the second phase introduces a selective encryption method, in which the data are encrypted in the look-up table, in the compressed file, or in both. The ASCII code and the online library file, which acts as a signature, also contribute to data security. The library file is very small compared with the compressed file. Compressing genome sequences makes them more efficient to use. Encryption speed and security level are two important measures for evaluating any encryption system. Selective encryption, in which part of a message is encrypted while the rest is left unencrypted, can be a viable option for running an encryption system in a resource-constrained environment. The algorithm is tested on benchmark DNA sequences. Its running time is a few seconds and its complexity is O(n^2). The first-phase compression technique reaches a compression rate of 3.447387 bits/base; its output is then passed to the second-phase compression technique, giving a final compression rate of 2.01 bits/base. When a user searches for a sequence of an organism, an encrypted compressed sequence file can be sent from the data source to the user.
The encrypted compressed file can then be decrypted and decompressed at the client end, which reduces transmission time over the Internet. The result is an encrypted compression algorithm that provides a moderately high compression rate together with encryption, and requires minimal decryption and decompression time.
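
As a rough illustration of the look-up-table idea described in the abstract, the Python sketch below replaces frequently repeated substrings with single ASCII characters and reports the resulting rate in bits per base. It is not the author's algorithm: it looks only for exact repeats of a fixed length, a simplification of the R2CP search, and it omits the modified Huffman and encryption phases. The fixed length k, the greedy left-to-right matching, the cap on library size, the assumption that the input contains only the uppercase bases A, C, G and T, and all function names are assumptions made for this example.

```python
from collections import Counter


def build_library(seq, k=4, max_entries=20):
    """Map the most frequent repeated length-k substrings to ASCII symbols.

    Hypothetical helper: k and max_entries are illustrative choices,
    not values taken from the paper.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Printable ASCII symbols that cannot be confused with the bases.
    symbols = [chr(c) for c in range(33, 127) if chr(c) not in "ACGT"]
    # Keep only substrings that occur more than once.
    frequent = [s for s, n in counts.most_common(max_entries) if n > 1]
    return dict(zip(frequent, symbols))


def compress(seq, library, k=4):
    """Greedy left-to-right substitution using the look-up table."""
    out, i = [], 0
    while i < len(seq):
        chunk = seq[i:i + k]
        if chunk in library:
            out.append(library[chunk])  # one symbol stands for k bases
            i += k
        else:
            out.append(seq[i])          # copy the base unchanged
            i += 1
    return "".join(out)


def decompress(data, library):
    """Invert the substitution; the library must travel with the data."""
    reverse = {sym: sub for sub, sym in library.items()}
    return "".join(reverse.get(ch, ch) for ch in data)


def bits_per_base(compressed, original):
    """Rate assuming each output character is stored as one 8-bit byte."""
    return 8 * len(compressed) / len(original)


if __name__ == "__main__":
    seq = "ACGTACGTACGTTTGA"
    library = build_library(seq)
    packed = compress(seq, library)
    assert decompress(packed, library) == seq
    print(packed, bits_per_base(packed, seq))
```

Because the substitution cannot be inverted without the library, anyone holding only the compressed file cannot recover the sequence. That dependency is presumably why the abstract treats the look-up table as a natural target for selective encryption.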