Research Article

Mining Maximal Adjacent Frequent Patterns from DNA Sequences using Location Information

by Moin Mahmud Tanvee, Shaikh Jeeshan Kabeer, Tareque Mohmud Chowdhury, Asif Ahmed Sarja, Md. Tayeb Hasan Shuvo
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 76 - Number 15
Year of Publication: 2013
Authors: Moin Mahmud Tanvee, Shaikh Jeeshan Kabeer, Tareque Mohmud Chowdhury, Asif Ahmed Sarja, Md. Tayeb Hasan Shuvo
10.5120/13322-0819

Moin Mahmud Tanvee, Shaikh Jeeshan Kabeer, Tareque Mohmud Chowdhury, Asif Ahmed Sarja, Md. Tayeb Hasan Shuvo. Mining Maximal Adjacent Frequent Patterns from DNA Sequences using Location Information. International Journal of Computer Applications. 76, 15 (August 2013), 26-32. DOI=10.5120/13322-0819

@article{ 10.5120/13322-0819,
author = { Moin Mahmud Tanvee and Shaikh Jeeshan Kabeer and Tareque Mohmud Chowdhury and Asif Ahmed Sarja and Md. Tayeb Hasan Shuvo },
title = { Mining Maximal Adjacent Frequent Patterns from DNA Sequences using Location Information },
journal = { International Journal of Computer Applications },
issue_date = { August 2013 },
volume = { 76 },
number = { 15 },
month = { August },
year = { 2013 },
issn = { 0975-8887 },
pages = { 26-32 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume76/number15/13322-0819/ },
doi = { 10.5120/13322-0819 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:48:45.257960+05:30
%A Moin Mahmud Tanvee
%A Shaikh Jeeshan Kabeer
%A Tareque Mohmud Chowdhury
%A Asif Ahmed Sarja
%A Md. Tayeb Hasan Shuvo
%T Mining Maximal Adjacent Frequent Patterns from DNA Sequences using Location Information
%J International Journal of Computer Applications
%@ 0975-8887
%V 76
%N 15
%P 26-32
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The rapid development of bioinformatics has resulted in an explosion of DNA sequence data, which is characterized by a large number of items. Studies have shown that biological functions are dictated by contiguous portions of the DNA sequence. Finding contiguous frequent patterns in long data sequences such as DNA sequences is a particularly challenging task and can pave the way towards new breakthroughs. Apriori-based techniques were among the first to be used in frequent contiguous pattern mining. Improved approaches such as GSP and PrefixSpan were applied later, but they required a large number of sequence scans, generated a large number of candidates, or required a higher number of intermediate sequential patterns. This paper proposes an improvement of the position-based approach to contiguous frequent pattern mining in DNA sequences. The proposed algorithm improves the existing position-based approach by introducing a new amalgamated sorting and joining technique that helps reduce time and space complexity. The proposed approach outperforms traditional contiguous frequent pattern mining approaches.
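
As a rough illustration of the general idea of position-based contiguous pattern mining, the sketch below records where each symbol occurs, grows patterns one symbol at a time by joining a pattern's start locations with the locations of single symbols, and finally keeps only the maximal patterns. It is not the algorithm proposed in the paper: the amalgamated sorting and joining technique is not reproduced, the function name `mine_maximal_contiguous` is hypothetical, and support is assumed to mean the number of sequences that contain a pattern.

```python
from collections import defaultdict


def mine_maximal_contiguous(sequences, min_support):
    """Return the maximal contiguous patterns found in >= min_support sequences.

    Illustrative sketch only; support counting per sequence is an assumption.
    """
    def support(locations):
        # Number of distinct sequences in which the pattern occurs.
        return len({sid for sid, _ in locations})

    # Level 1: every (sequence id, start offset) at which each symbol occurs.
    occurrences = defaultdict(list)
    for sid, seq in enumerate(sequences):
        for pos, symbol in enumerate(seq):
            occurrences[symbol].append((sid, pos))

    level = {s: locs for s, locs in occurrences.items()
             if support(locs) >= min_support}
    # Location sets of frequent single symbols, used as the join partner.
    single = {s: set(locs) for s, locs in level.items()}
    frequent = set(level)

    while level:
        next_level = {}
        for pattern, locs in level.items():
            k = len(pattern)
            for symbol, symbol_locs in single.items():
                # Join: the pattern starts at pos and the symbol sits at pos + k.
                joined = [(sid, pos) for sid, pos in locs
                          if (sid, pos + k) in symbol_locs]
                if support(joined) >= min_support:
                    next_level[pattern + symbol] = joined
        frequent.update(next_level)
        level = next_level

    # A pattern is maximal if no other frequent pattern contains it.
    return sorted(p for p in frequent
                  if not any(p != q and p in q for q in frequent))


if __name__ == "__main__":
    reads = ["ACGTAC", "TACGTA", "GGACGT"]
    print(mine_maximal_contiguous(reads, min_support=3))
```

Because each level keeps the start locations of its patterns, a longer pattern is validated by looking up locations rather than rescanning the sequences, which is the property that position-based approaches rely on.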

References
  1. Shuang Bai, Si-Xue Bai, "The Maximal Frequent Pattern Mining of DNA Sequence", GrC, pp. 23-26, 2009.
  2. T. H. Kang, J. S. Yoo and H. Y. Kim, "Mining frequent contiguous sequence patterns in biological sequences", in Proceedings of the 7th IEEE International Conference on Bioinformatics and Bioengineering, pp. 723-728, 2007.
  3. R. Wagner and M. Fischer, "The string-to-string Correction Problem", J. of the ACM (JACM), Vol. 21, No. 1, pp. 168-173, 1974.
  4. D. Hirschberg, "Algorithms for the longest common subsequence problem", J. of the ACM (JACM), Vol. 24, No. 4, pp. 664-675, 1977.
  5. M. Garofalakis, R. Rastogi, and K. Shim, "Spirit: Sequential pattern mining with regular expression constraints." In Proc. 1999 Int. Conf. Very Large Data Bases (VLDB'99), pages 223–234, Edinburgh, UK, Sept. 1999.
  6. R. Srikant and R. Agrawal, "Mining sequential patterns: Generalizations and performance improvements." In Proc. 5th Int. Conf. Extending Database Technology (EDBT'96), pages 3–17, Avignon, France, Mar. 1996.
  7. R. Agrawal and R. Srikant, "Fast algorithms for mining association rules." In Proc. 1994 Int. Conf. Very Large Databases (VLDB'94), pages 487–499, Santiago, Chile, Sept. 1994.
  8. J. Pei, J. Han, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. C. Hsu, "PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth." In ICDE'01, Germany, April 2001.
  9. J. Pan, P. Wang, W. Wang, B. Shi and G. Yang, "Efficient algorithms for mining maximal frequent concatenate sequences in biological datasets", in Proceedings of the Fifth International Conference on Computer and Information Technology (CIT), pp. 98-104, 2005.
  10. Zerin SF, Ahmed CF, Tanbeer SK, Jeong BS, "A fast indexed-based contiguous sequential pattern mining technique in biological data sequences." In: Proceedings of the 2nd International Conference on Emerging Databases (EBD'10), 2010 Aug 30-31, Jeju.
  11. Rashid MM, Karim MR, Hossain MA, Jeong BS, "An efficient approach for mining significant contiguous frequent patterns in biological sequences." In: Proceedings of the 3rd International Conference on Emerging Databases (EBD'11), 2011 Aug 25-27, Incheon.
  12. Rashid MM, Karim MR, Hossain MA, Jeong BS and -Jin Choi, "Efficient Mining of Interesting Patterns in Large Biological Sequences." Genomics & Informatics, Vol. 10(1), 44-50, March 2012.
  13. Jiawei Han & Micheline Kamber, "Data Mining Concepts and Techniques", Elsevier, 2006.
  14. Pang-Ning Tan, Vipin Kumar, Michael Steinbach, "Introduction to Data Mining", Pearson Education Inc, 2006.
Index Terms

Computer Science
Information Sciences

Keywords

DNA Sequence Data Maximal Contiguous Frequent Pattern Mining Sequence Information.