A New GC Based HMM Algorithm for Disease Classification

Dr.V.Anuradha; S.K.M.Habeeb; A.Praveena; AmalaPriya

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 16 - Number 5

Year of Publication: 2011

DOI: 10.5120/2009-2710

Dr.V.Anuradha, S.K.M.Habeeb, A.Praveena, AmalaPriya. A New GC Based HMM Algorithm for Disease Classification. International Journal of Computer Applications. 16, 5 (February 2011), 19-22. DOI=10.5120/2009-2710

@article{10.5120/2009-2710,
  author = {Dr.V.Anuradha and S.K.M.Habeeb and A.Praveena and AmalaPriya},
  title = {A New GC Based HMM Algorithm for Disease Classification},
  journal = {International Journal of Computer Applications},
  issue_date = {February 2011},
  volume = {16},
  number = {5},
  month = {February},
  year = {2011},
  issn = {0975-8887},
  pages = {19-22},
  numpages = {9},
  url = {https://ijcaonline.org/archives/volume16/number5/2009-2710/},
  doi = {10.5120/2009-2710},
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

Abstract

This paper presents a hidden Markov model (HMM) that classifies proteins into two classes: normal proteins and diseased proteins. We used Matlab and its HMM functions to train the model. First, patterns are extracted using the 2-gram amino acid encoding method; this yields 16 patterns, the 16 amino acid combinations that code for GC. The scores of these 16 patterns are then given as input to the HMM. The HMM was trained on the two classes of proteins using the known patterns, and the trained model was used to classify the dataset. On a dataset of 50 protein sequences, the method classified the proteins with an accuracy of 81%. The results provide insights that can help biologists and computer scientists design high-performance protein classification systems.
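The pipeline described in the abstract (2-gram encoding, then scoring under one HMM per class) can be illustrated with a minimal Python sketch. This is not the authors' Matlab code. Several things here are assumptions that the abstract does not state: that the 16 patterns are the 2-grams over the four residues whose codons can be written with G and C only (Gly, Ala, Pro, Arg), that every other 2-gram is mapped to a single "other" symbol, and that a sequence is assigned to the class whose HMM gives the higher likelihood. The names `encode`, `log_likelihood`, `classify`, and `emission_row`, and all model parameters, are hypothetical; in practice the parameters would be estimated from training data.

```python
import math
from itertools import product

# Assumption (not from the abstract): the four residues whose codons can be
# written with G and C only: Gly (G), Ala (A), Pro (P), Arg (R).
GC_RESIDUES = "GAPR"
PATTERNS = ["".join(p) for p in product(GC_RESIDUES, repeat=2)]  # 16 2-grams
INDEX = {p: i for i, p in enumerate(PATTERNS)}
OTHER = len(PATTERNS)  # symbol 16: any 2-gram outside the 16 patterns


def encode(seq):
    """Map each overlapping 2-gram of a protein sequence to a symbol in 0..16."""
    seq = seq.upper()
    return [INDEX.get(seq[i:i + 2], OTHER) for i in range(len(seq) - 1)]


def log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM."""
    if not obs:
        return 0.0
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][obs[t]]
                for s in range(n)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")
        # Rescale to avoid underflow on long sequences; accumulate the log.
        alpha = [a / scale for a in alpha]
        log_p += math.log(scale)
    return log_p


def classify(seq, models):
    """Return the label of the model with the highest log-likelihood."""
    obs = encode(seq)
    return max(models, key=lambda label: log_likelihood(obs, *models[label]))


def emission_row(p_gc):
    """Hypothetical emission row: mass p_gc spread evenly over the 16 patterns."""
    return [p_gc / 16] * 16 + [1.0 - p_gc]


# Hypothetical two-state models given as (start, transition, emission).
TRANS = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "normal": ([0.5, 0.5], TRANS, [emission_row(0.05), emission_row(0.15)]),
    "diseased": ([0.5, 0.5], TRANS, [emission_row(0.30), emission_row(0.50)]),
}

print(classify("GAPRGGAAPPRR", models))  # diseased
print(classify("MKLVMKLV", models))      # normal
```

With these toy parameters the "diseased" model simply emits the 16 patterns more often, so a sequence rich in those 2-grams scores higher under it. The sketch says nothing about whether that relationship holds for real proteins.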

Index Terms

Computer Science

Information Sciences

Keywords

HMM, Matlab, 2-gram