International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 95 - Number 14
Year of Publication: 2014
Authors: S. Rajasekaran, L. Arockiam
DOI: 10.5120/16661-6646
S. Rajasekaran, L. Arockiam. Frequent Contiguous Pattern Mining Algorithms for Biological Data Sequences. International Journal of Computer Applications. 95, 14 (June 2014), 15-20. DOI=10.5120/16661-6646
Transaction sequences in market-basket analysis draw on a large alphabet but are short, whereas bio-sequences draw on a small alphabet but are long and may contain gaps. Pattern-finding algorithms for these two kinds of sequences differ accordingly. Small patterns are more likely to recur in bio-sequences than in transaction sequences. These repeatedly occurring small patterns are called Frequent Contiguous Patterns (FCP). Finding FCPs is the challenging task in bio-sequence pattern mining. FCPs give clues for genetic discovery and functional analysis, and also help to assemble the whole genome of a species. Most existing FCP algorithms are based on the Apriori method. They require repeated scanning of the database and a large number of intermediate tables to produce their results, and therefore need large space and high computation time. In this paper, we analyze a few of the currently available FCP algorithms along with their advantages and disadvantages.
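To make the cost of the Apriori-style approach concrete, the following is a minimal sketch of level-wise contiguous pattern mining. It is not taken from any of the surveyed algorithms; the function name `frequent_contiguous_patterns`, the `max_length` cap, and the choice to count support as the number of sequences containing a pattern are all assumptions made for illustration. Note that the database is rescanned once for every pattern length, and an intermediate table of counts is built at each level.

```python
from collections import Counter


def frequent_contiguous_patterns(sequences, min_support, max_length):
    """Level-wise (Apriori-style) mining of frequent contiguous patterns.

    Assumption: support = number of sequences that contain the pattern
    at least once. Returns a dict mapping pattern -> support.
    """
    frequent = {}
    prev_level = None  # frequent patterns of length k-1, once known

    for k in range(1, max_length + 1):
        counts = Counter()  # intermediate table for this level
        # One full scan of the database per pattern length k.
        for seq in sequences:
            seen = set()  # count each pattern at most once per sequence
            for i in range(len(seq) - k + 1):
                candidate = seq[i:i + k]
                # Apriori pruning: a contiguous pattern can only be frequent
                # if both of its (k-1)-length sub-patterns are frequent.
                if prev_level is not None and (
                    candidate[:-1] not in prev_level
                    or candidate[1:] not in prev_level
                ):
                    continue
                seen.add(candidate)
            counts.update(seen)

        level = {p: c for p, c in counts.items() if c >= min_support}
        if not level:
            break  # no frequent pattern of length k, so none longer either
        frequent.update(level)
        prev_level = set(level)

    return frequent


# Hypothetical toy DNA sequences, used only to exercise the sketch.
toy_sequences = ["ACGTACGT", "ACGTT", "TTACG"]
patterns = frequent_contiguous_patterns(toy_sequences, min_support=3, max_length=4)
```

On these toy sequences, `ACG` is the longest pattern that occurs in all three. Even in this small case the loop scans every sequence once per length and keeps a separate count table per level, which is the space and time overhead the abstract attributes to Apriori-based methods.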