International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 8 - Number 2
Year of Publication: 2010
Authors: Smitha Sunil Kumaran Nair, N. V. Subba Reddy, Hareesha K. S |
10.5120/1189-1661 |
Smitha Sunil Kumaran Nair, N. V. Subba Reddy, Hareesha K. S. Article: An Evaluation of Feature Selection Approaches in Finding Amyloidogenic Regions in Protein Sequences. International Journal of Computer Applications. 8, 2 (October 2010), 1-6. DOI=10.5120/1189-1661
Amyloidogenic regions in polypeptide chains are associated with a number of diseases. Experimental evidence strongly supports the hypothesis that small segments of proteins are responsible for their amyloidogenic behavior. Identifying these short peptides is therefore critical for understanding diseases associated with protein misfolding and for developing sequence-targeted anti-aggregation drugs. In silico approaches that use phenomenological models based on bio-physio-chemical properties of amino acids suffer from the “curse of dimensionality”, which must be addressed before standard classification algorithms are applied to predict such fibril motifs. The present study evaluates the performance of feature selection algorithms, namely filter, wrapper, and embedded models, in conjunction with a Support Vector Machine classifier. We also propose a novel integrated feature selection strategy based on a Genetic Algorithm and a Support Vector Machine to obtain an optimal number of features for predicting amyloid fibril-forming short stretches of peptides. In addition, we investigate the performance of the feature selection models, which yield new and complementary sets of properties, and conclude that the proposed integrated dimensionality reduction technique outperforms all other methods, achieving the highest sensitivity and specificity of 86% and 82%, respectively.
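To make the idea of a Genetic Algorithm wrapper around a classifier concrete, the following is a minimal sketch and not the authors' implementation. Each candidate solution is a bit mask over the features, and its fitness is the classifier's accuracy on the selected features minus a small penalty for mask size. Everything here is an assumption made for illustration: the function names, the toy data, the GA settings, the penalty weight, and in particular the nearest-centroid classifier, which stands in for the Support Vector Machine only to keep the example free of external dependencies.

```python
import random

def make_toy_data(n_samples=30, n_features=6, seed=1):
    """Hypothetical data: features 0 and 1 carry the class signal, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 3 * label
        row[1] -= 3 * label
        X.append(row)
        y.append(label)
    return X, y

def centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on the masked features.
    Stand-in for the SVM (an assumption of this sketch, not the paper's classifier)."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for k in range(len(X)):
            if k == i:
                continue  # hold out sample i
            s = sums.setdefault(y[k], [0.0] * len(cols))
            for c, j in enumerate(cols):
                s[c] += X[k][j]
            counts[y[k]] = counts.get(y[k], 0) + 1
        best_label, best_dist = None, float("inf")
        for label, s in sums.items():
            d = sum((X[i][j] - s[c] / counts[label]) ** 2 for c, j in enumerate(cols))
            if d < best_dist:
                best_label, best_dist = label, d
        correct += best_label == y[i]
    return correct / len(X)

def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small penalty proportional to the fraction of features kept."""
    return centroid_accuracy(X, y, mask) - penalty * sum(mask) / len(mask)

def ga_select(X, y, pop_size=20, generations=10, p_mut=0.1, seed=0):
    """Evolve feature masks with elitism, one-point crossover, and bit-flip mutation."""
    rng = random.Random(seed)
    n = len(X[0])
    # Seed the population with the all-features mask plus random masks.
    pop = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        elite = scored[:2]  # the two best masks survive unchanged
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(scored[: pop_size // 2], 2)  # parents from the top half
            cut = rng.randrange(1, n)
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < p_mut else bit for bit in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda m: fitness(X, y, m))

X, y = make_toy_data()
best_mask = ga_select(X, y)
print("selected features:", [j for j, keep in enumerate(best_mask) if keep])
```

Because the all-features mask is in the initial population and the best masks are carried over unchanged, the returned mask never scores worse than using every feature under this fitness function. In practice the stand-in classifier would be replaced by an SVM evaluated with cross-validation, and the fitness could weight sensitivity and specificity rather than plain accuracy.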