International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 70 - Number 4
Year of Publication: 2013
Authors: Christiane Hespel, Farida Benmakrouha, Danielle Quichaud |
DOI: 10.5120/11947-7763
Christiane Hespel, Farida Benmakrouha, Danielle Quichaud. The Weighted Factors Automaton: A Tool for DNA Sequences Analysis. International Journal of Computer Applications. 70, 4 (May 2013), 1-7. DOI=10.5120/11947-7763
Many computing tools are used to analyze DNA sequences, such as trees, automata, and dictionaries, each one reserved for a particular problem. A. Blumer et al. proposed a more general tool: the smallest automaton recognizing the subwords of a text (DAWG). In this paper we propose the concept of a "weighted factors automaton", which produces every occurrence of any factor. Its transitions are labeled by the letter read and weighted by the set of indices at which factors begin. A factor is obtained by concatenating the letters read, and the indices at which it begins are obtained by intersecting the weighting sets along the path from the initial state to a final state. We believe this automaton can be processed more easily than the DAWG, and we present a comparison between the two: the set of beginning indices of a factor and the factor frequencies are obtained more easily with our automaton, and restricting our automaton to the factors of length k preserves the automaton structure, whereas the DAWG cannot easily be restricted. The applications are numerous: by selecting factors of length 1 we obtain the coding regions, and by selecting factors of length 3 we obtain the expression level of a given gene. The weighted factors automaton also allows us to find pattern matches and to study homology, with the FASTA and BLAST algorithms being significantly simplified.
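The intersection idea in the abstract can be illustrated with a small sketch. The abstract does not give the paper's exact construction, so the layered layout below is an assumption made for illustration: the state reached after reading d letters has one outgoing transition per letter, weighted by the set of start indices i for which that letter sits at position i + d of the text. The names `build_weighted_layers` and `occurrences` are hypothetical, and indices are 0-based.

```python
from collections import defaultdict


def build_weighted_layers(text, k):
    """Build weights for factors of length at most k (assumed layered layout).

    weights[d][letter] is the set of start indices i such that
    text[i + d] == letter, i.e. the weight of the transition that
    reads `letter` as the (d + 1)-th letter of a factor.
    """
    weights = [defaultdict(set) for _ in range(k)]
    for d in range(k):
        for i in range(len(text) - d):
            weights[d][text[i + d]].add(i)
    return weights


def occurrences(weights, factor):
    """Return the sorted start indices of `factor` in the indexed text.

    The result is the intersection of the weighting sets met while
    reading the factor letter by letter from the initial state.
    """
    if not factor or len(factor) > len(weights):
        raise ValueError("factor length must be between 1 and k")
    result = None
    for d, letter in enumerate(factor):
        weight = weights[d].get(letter, set())
        result = set(weight) if result is None else result & weight
        if not result:
            break  # no start index survives, so the factor does not occur
    return sorted(result)


text = "ACGTACGA"
weights = build_weighted_layers(text, 3)
print(occurrences(weights, "ACG"))       # [0, 4]
print(len(occurrences(weights, "A")))    # frequency of the factor "A": 3
```

In this sketch the frequency of a factor is simply the size of the resulting index set, and restricting the analysis to factors of length k amounts to keeping the first k layers of weights.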