Research Article

A New GC Based HMM Algorithm for Disease Classification

by Dr.V.Anuradha, S.K.M.Habeeb, A.Praveena, AmalaPriya
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 16 - Number 5
Year of Publication: 2011
10.5120/2009-2710

Dr.V.Anuradha, S.K.M.Habeeb, A.Praveena, AmalaPriya. A New GC Based HMM Algorithm for Disease Classification. International Journal of Computer Applications. 16, 5 (February 2011), 19-22. DOI=10.5120/2009-2710

@article{ 10.5120/2009-2710,
author = { Dr.V.Anuradha, S.K.M.Habeeb, A.Praveena, AmalaPriya },
title = { A New GC Based HMM Algorithm for Disease Classification },
journal = { International Journal of Computer Applications },
issue_date = { February 2011 },
volume = { 16 },
number = { 5 },
month = { February },
year = { 2011 },
issn = { 0975-8887 },
pages = { 19-22 },
numpages = {4},
url = { https://ijcaonline.org/archives/volume16/number5/2009-2710/ },
doi = { 10.5120/2009-2710 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Dr.V.Anuradha
%A S.K.M.Habeeb
%A A.Praveena
%A AmalaPriya
%T A New GC Based HMM Algorithm for Disease Classification
%J International Journal of Computer Applications
%@ 0975-8887
%V 16
%N 5
%P 19-22
%D 2011
%I Foundation of Computer Science (FCS), NY, USA
Abstract

This paper presents a hidden Markov model (HMM) that classifies proteins into two classes: normal proteins and diseased proteins. On a dataset of 50 protein sequences, the method classified the proteins with an accuracy of 81%. We used the HMM functions available in MATLAB to train the model and to classify normal and diseased proteins on the basis of 16 combinations of amino acids. First, patterns are extracted using a 2-gram amino acid encoding method, which yields 16 patterns that code for GC. The scores of these 16 patterns are then given as input to the HMM. The HMM was trained on the two classes of proteins using the known patterns, and the trained model was used to classify the dataset. These results can help biologists and computer scientists design high-performance protein classification systems.
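The abstract does not list the 16 GC patterns or define how a pattern is scored. The following is a minimal sketch of the 2-gram extraction step under two loudly stated assumptions. First, the 16 patterns are taken to be all ordered 2-grams over the four amino acids that can be encoded by a codon whose first two bases are both G or C (Ala: GCN, Gly: GGN, Pro: CCN, Arg: CGN), which gives 4 × 4 = 16. Second, a pattern's score is taken to be its relative frequency among the overlapping 2-grams of a sequence. The names `GC_AMINO_ACIDS`, `GC_PATTERNS` and `gc_2gram_scores` are hypothetical.

```python
from itertools import product
from typing import Dict

# ASSUMPTION (not stated in the paper): the "GC" amino acids are the four that
# can be encoded by a codon whose first two bases are both G or C:
# Ala (GCN), Gly (GGN), Pro (CCN) and Arg (CGN).
GC_AMINO_ACIDS = "AGPR"

# All 4 x 4 = 16 ordered 2-grams over that alphabet: "AA", "AG", ..., "RR".
GC_PATTERNS = ["".join(pair) for pair in product(GC_AMINO_ACIDS, repeat=2)]


def gc_2gram_scores(sequence: str) -> Dict[str, float]:
    """Score each of the 16 GC 2-grams by its relative frequency in `sequence`.

    HYPOTHETICAL scoring: overlapping 2-grams are read with a sliding window
    (width 2, step 1) and each pattern's count is divided by the total number
    of 2-grams, so 2-grams outside the 16 patterns only enlarge the denominator.
    """
    seq = sequence.upper()
    n_grams = len(seq) - 1
    if n_grams <= 0:
        # Fewer than two residues: no 2-grams, so every score is zero.
        return dict.fromkeys(GC_PATTERNS, 0.0)
    counts = dict.fromkeys(GC_PATTERNS, 0)
    for i in range(n_grams):
        gram = seq[i:i + 2]
        if gram in counts:
            counts[gram] += 1
    return {pattern: count / n_grams for pattern, count in counts.items()}


if __name__ == "__main__":
    scores = gc_2gram_scores("MAGPRAG")
    # The 16 scores, in GC_PATTERNS order, form one sequence's feature vector.
    print([round(scores[p], 3) for p in GC_PATTERNS])
```

For example, `"MAGPRAG"` contains six overlapping 2-grams (MA, AG, GP, PR, RA, AG). AG occurs twice and GP, PR and RA once each, so AG scores 2/6, while MA is not one of the 16 patterns and only counts toward the total.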

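The classification step can be sketched as a likelihood comparison, assuming one HMM is trained per class (normal and diseased) and a sequence is assigned to the class whose model gives it the higher log-likelihood. In MATLAB this would roughly correspond to estimating each class model with `hmmtrain` and comparing the `logpseq` output of `hmmdecode`. The pure-Python sketch below instead computes the log-likelihood directly with the scaled forward algorithm. Everything else here is an assumption for illustration only, not the paper's setup: the two-symbol alphabet (0 = a 2-gram outside the GC patterns, 1 = a GC pattern), the explicit start probabilities, and all numeric parameters. In practice the parameters would be estimated from labelled training sequences, for example with Baum-Welch.

```python
import math
from typing import Dict, Sequence, Tuple

Matrix = Sequence[Sequence[float]]
# A discrete HMM as (start probabilities, transition matrix, emission matrix).
Model = Tuple[Sequence[float], Matrix, Matrix]


def log_likelihood(obs: Sequence[int], start: Sequence[float],
                   trans: Matrix, emit: Matrix) -> float:
    """Return log P(obs | model) using the scaled forward algorithm."""
    if not obs:
        raise ValueError("observation sequence must be non-empty")
    n = len(start)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1).
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_prob = 0.0
    for t, symbol in enumerate(obs):
        if t > 0:
            # Induction: alpha_t(j) = (sum_i alpha_{t-1}(i) * a_ij) * b_j(o_t).
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][symbol]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # obs cannot be generated by this model
        # Normalise to avoid underflow; the log scale factors sum to log P(obs).
        alpha = [a / scale for a in alpha]
        log_prob += math.log(scale)
    return log_prob


def classify(obs: Sequence[int], models: Dict[str, Model]) -> str:
    """Return the class label whose HMM gives obs the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))


# HYPOTHETICAL toy parameters (2 hidden states, 2 symbols), not from the paper.
TRANS = [[0.9, 0.1], [0.1, 0.9]]
MODELS: Dict[str, Model] = {
    "normal": ([0.5, 0.5], TRANS, [[0.8, 0.2], [0.6, 0.4]]),
    "diseased": ([0.5, 0.5], TRANS, [[0.4, 0.6], [0.2, 0.8]]),
}

if __name__ == "__main__":
    print(classify([1, 1, 1, 0, 1, 1, 1, 1], MODELS))  # prints: diseased
    print(classify([0, 0, 0, 1, 0, 0, 0, 0], MODELS))  # prints: normal
```

Summing log scale factors rather than multiplying raw probabilities keeps the computation stable for long protein sequences, where the unscaled forward probabilities would underflow to zero.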
Index Terms

Computer Science
Information Sciences

Keywords

HMM, MATLAB, 2-gram