Research Article

A DNA based Approach to find Closed Repetitive Gapped Subsequences from a Sequence Database

by B.Lavanya, A.Murugan
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 29 - Number 5
Year of Publication: 2011
Authors: B.Lavanya, A.Murugan
10.5120/3558-4893

B. Lavanya, A. Murugan. A DNA based Approach to find Closed Repetitive Gapped Subsequences from a Sequence Database. International Journal of Computer Applications. 29, 5 (September 2011), 45-49. DOI=10.5120/3558-4893

@article{ 10.5120/3558-4893,
author = { B. Lavanya and A. Murugan },
title = { A DNA based Approach to find Closed Repetitive Gapped Subsequences from a Sequence Database },
journal = { International Journal of Computer Applications },
issue_date = { September 2011 },
volume = { 29 },
number = { 5 },
month = { September },
year = { 2011 },
issn = { 0975-8887 },
pages = { 45-49 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume29/number5/3558-4893/ },
doi = { 10.5120/3558-4893 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T20:15:02.279731+05:30
%A B.Lavanya
%A A.Murugan
%T A DNA based Approach to find Closed Repetitive Gapped Subsequences from a Sequence Database
%J International Journal of Computer Applications
%@ 0975-8887
%V 29
%N 5
%P 45-49
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

In bioinformatics, discovering transcription factor binding affinities is important, and it is typically done through sequence analysis of microarray data. Accurately determining continuous and gapped motifs in a long sequence, such as genetic data, is challenging and requires detailed study. In this paper, we propose a new DNA algorithmic approach that accurately determines both continuous and gapped motifs, in parallel and in optimum time. The algorithm can find short continuous, short gapped, long continuous, and long gapped motifs, and can also establish that a motif does not occur (negative existence). It first uses DNA operations to generate a modified Position Weight Matrix for the searched motif pattern, which records the positions at which the pattern appears in the given database. This Position Weight Matrix is then used to search for continuous and gapped subsequences. The proposed algorithm can be used to search genetic, scientific, and commercial databases. Implementation results show the correctness of the algorithm. Finally, the validity of the algorithm is checked and its complexity is analyzed.
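The two-stage idea in the abstract (first record where the pattern's symbols appear, then search that table for continuous and gapped occurrences) can be illustrated with a minimal sequential sketch in Python. This is not the authors' DNA-operation algorithm and it does not reproduce their modified Position Weight Matrix; the function names `position_table`, `continuous_matches`, and `gapped_end_positions`, the `max_gap` parameter, and the example sequence are all assumptions made for illustration.

```python
def position_table(sequence, pattern):
    """Hypothetical stand-in for a position-recording matrix:
    row k lists the 0-based positions where pattern[k] occurs in sequence."""
    return [[i for i, ch in enumerate(sequence) if ch == sym] for sym in pattern]


def continuous_matches(sequence, pattern):
    """Start positions where the pattern occurs with no gaps."""
    table = position_table(sequence, pattern)
    if not table:
        return []
    later_rows = [set(row) for row in table[1:]]
    # A start is valid if symbol k+1 of the pattern sits exactly k+1 places later.
    return [start for start in table[0]
            if all(start + k + 1 in row for k, row in enumerate(later_rows))]


def gapped_end_positions(sequence, pattern, max_gap=None):
    """End positions of occurrences of the pattern as a gapped subsequence.

    max_gap (an assumed parameter) bounds the number of skipped symbols
    between consecutive pattern symbols; None means unbounded.
    An empty result signals that the motif does not occur.
    """
    table = position_table(sequence, pattern)
    if not table:
        return []
    reachable = table[0]
    for row in table[1:]:
        extended = []
        for p in row:
            # A predecessor q must lie before p and within the allowed gap.
            lowest = 0 if max_gap is None else p - 1 - max_gap
            if any(lowest <= q < p for q in reachable):
                extended.append(p)
        reachable = extended
    return reachable


if __name__ == "__main__":
    seq = "ACGTACGGT"  # made-up example sequence
    print(continuous_matches(seq, "ACG"))
    print(gapped_end_positions(seq, "AGT", max_gap=1))
    print(gapped_end_positions(seq, "AAA"))
```

In this sketch a continuous motif is simply the gapped case with `max_gap=0`, and an empty list plays the role of the "negative existence" outcome mentioned in the abstract.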

Index Terms

Computer Science
Information Sciences

Keywords

DNA computation, DNA operations, Motifs, PWM