International Journal of Computer Applications |
Foundation of Computer Science (FCS), NY, USA |
Volume 76 - Number 15 |
Year of Publication: 2013 |
Authors: Moin Mahmud Tanvee, Shaikh Jeeshan Kabeer, Tareque Mohmud Chowdhury, Asif Ahmed Sarja, Md. Tayeb Hasan Shuvo |
10.5120/13322-0819 |
Moin Mahmud Tanvee, Shaikh Jeeshan Kabeer, Tareque Mohmud Chowdhury, Asif Ahmed Sarja, Md. Tayeb Hasan Shuvo . Mining Maximal Adjacent Frequent Patterns from DNA Sequences using Location Information. International Journal of Computer Applications. 76, 15 ( August 2013), 26-32. DOI=10.5120/13322-0819
The rapid development of bioinformatics has resulted in the explosion of DNA sequence data which is characterized by large number of items. Studies have shown that biological functions are dictated by contagious portions of the DNA sequence. Finding contiguous frequent patterns from long data sequences such as DNA sequences is a particularly challenging task and can pave the way towards new breakthroughs. Apriori based techniques were among the first to be used in frequent contagious pattern mining. Later improved approaches like GSP, Prefix Span were also applied but the approaches required either large number of sequence scans, generated large number of candidates or required higher number of intermediate sequential patterns. In this paper an improvement of the positional based approach for contagious frequent pattern mining is DNA sequences is proposed. The proposed algorithm improves the existing positional based approach by introducing a new amalgamated sorting and joining technique which helps to reduce time and space complexity. The proposed approach outperforms traditional existing contiguous frequent mining approaches.