Research Article

A Genetic Algorithm with Clustering for Finding Regulatory Motifs in DNA Sequences

Published in 2011 by Shripal Vijayvargiya, Pratyoosh Shukla
Artificial Intelligence Techniques - Novel Approaches & Practical Applications
Foundation of Computer Science USA
AIT - Number 1
2011
Authors: Shripal Vijayvargiya, Pratyoosh Shukla

Shripal Vijayvargiya, Pratyoosh Shukla. A Genetic Algorithm with Clustering for Finding Regulatory Motifs in DNA Sequences. Artificial Intelligence Techniques - Novel Approaches & Practical Applications. AIT, 1 (2011), 6-10.

@article{
author = { Shripal Vijayvargiya, Pratyoosh Shukla },
title = { A Genetic Algorithm with Clustering for Finding Regulatory Motifs in DNA Sequences },
journal = { Artificial Intelligence Techniques - Novel Approaches & Practical Applications },
issue_date = { 2011 },
volume = { AIT },
number = { 1 },
year = { 2011 },
issn = { 0975-8887 },
pages = { 6-10 },
numpages = 5,
url = { /specialissues/ait/number1/2826-207/ },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Special Issue Article
%1 Artificial Intelligence Techniques - Novel Approaches & Practical Applications
%A Shripal Vijayvargiya
%A Pratyoosh Shukla
%T A Genetic Algorithm with Clustering for Finding Regulatory Motifs in DNA Sequences
%J Artificial Intelligence Techniques - Novel Approaches & Practical Applications
%@ 0975-8887
%V AIT
%N 1
%P 6-10
%D 2011
%I International Journal of Computer Applications
Abstract

Identifying transcription factor binding sites (TFBS), also called motifs, in the promoter regions of genes remains an important and unsolved problem in computational biology. Motifs are short, recurring patterns in DNA sequences that are presumed to have a biological function. In this paper, we propose an evolutionary approach to identifying transcription factor binding sites, based on a genetic algorithm with population clustering. A simple genetic algorithm favors selection of the fittest individuals, and this selective pressure tends to reduce the diversity of the population. Moreover, the promoter sequences of some genes contain multiple motifs, all of which need to be identified. The proposed algorithm uses a clustering scheme to partition the population into clusters and allows mating only within a cluster. This scheme enables the algorithm to retain population diversity over the generations despite the selection pressure, and to find multiple motifs in the promoter sequences of co-regulated genes. We applied this approach to various data sets, and the results show that it can find the correct binding sites.
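
The idea of clustering the population and restricting mating to within a cluster can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the encoding, fitness function, or clustering method, so everything below is an assumption made for illustration. Each individual is taken to be a list of start positions (one per sequence), fitness is a simple consensus count, and clusters are formed by leader clustering on the Hamming distance between consensus strings. The names `evolve`, `cluster`, `radius`, and `p_mut` are hypothetical.

```python
import random
from collections import Counter

def sites(seqs, starts, w):
    """Extract the length-w window selected in each sequence."""
    return [s[p:p + w] for s, p in zip(seqs, starts)]

def consensus(seqs, starts, w):
    """Most common base in each column of the aligned windows."""
    return "".join(Counter(col).most_common(1)[0][0]
                   for col in zip(*sites(seqs, starts, w)))

def fitness(seqs, starts, w):
    """Assumed fitness: sum over columns of the majority-base count."""
    return sum(Counter(col).most_common(1)[0][1]
               for col in zip(*sites(seqs, starts, w)))

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def cluster(pop, seqs, w, radius):
    """Leader clustering (an assumption): an individual joins the first
    cluster whose leader consensus is within `radius` mismatches."""
    clusters = []  # list of (leader consensus, members)
    for ind in pop:
        c = consensus(seqs, ind, w)
        for leader, members in clusters:
            if hamming(leader, c) <= radius:
                members.append(ind)
                break
        else:
            clusters.append((c, [ind]))
    return [members for _, members in clusters]

def evolve(seqs, w, pop_size=40, generations=30, radius=2, p_mut=0.2, seed=0):
    """Return (consensus, score) for the best individual of each final
    cluster, best first. Requires at least two sequences."""
    rng = random.Random(seed)
    limits = [len(s) - w for s in seqs]  # last valid start per sequence
    pop = [[rng.randint(0, m) for m in limits] for _ in range(pop_size)]

    def score(ind):
        return fitness(seqs, ind, w)

    for _gen in range(generations):
        next_pop = []
        for members in cluster(pop, seqs, w, radius):
            members.sort(key=score, reverse=True)
            next_pop.append(members[0])  # keep each cluster's best
            parents = members[:max(1, len(members) // 2)]
            for _ in range(len(members) - 1):
                # Restricted mating: both parents come from the same cluster.
                a, b = rng.choice(parents), rng.choice(parents)
                cut = rng.randrange(1, len(seqs))  # one-point crossover
                child = a[:cut] + b[cut:]
                if rng.random() < p_mut:  # mutation: move one start position
                    i = rng.randrange(len(seqs))
                    child[i] = rng.randint(0, limits[i])
                next_pop.append(child)
        pop = next_pop

    best = [max(m, key=score) for m in cluster(pop, seqs, w, radius)]
    return sorted(((consensus(seqs, b, w), score(b)) for b in best),
                  key=lambda t: t[1], reverse=True)

if __name__ == "__main__":
    toy = ["TTGACATTTT", "GGTTGACAGG", "CCCTTGACAC", "ATTGACAAAA"]
    for motif, s in evolve(toy, w=6):
        print(motif, s)
```

Because every cluster keeps its own best individual and offspring are produced only from parents in the same cluster, a weaker candidate motif is not displaced merely because another cluster scores higher. In this sketch, that is the mechanism by which several distinct motifs can survive to the final generation.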

Index Terms

Computer Science
Information Sciences

Keywords

motif, transcription factor, regulatory binding sites, genetic algorithms, clustering