Research Article

Predicting Cellular Protein localization Sites on Ecoli

by Safari Yonasi, Rose Nakasi, Yashik Singh
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 181 - Number 13
Year of Publication: 2018
10.5120/ijca2018917723

Safari Yonasi, Rose Nakasi, Yashik Singh. Predicting Cellular Protein localization Sites on Ecoli. International Journal of Computer Applications. 181, 13 (Aug 2018), 1-8. DOI=10.5120/ijca2018917723

@article{ 10.5120/ijca2018917723,
author = { Safari Yonasi, Rose Nakasi, Yashik Singh },
title = { Predicting Cellular Protein localization Sites on Ecoli },
journal = { International Journal of Computer Applications },
issue_date = { Aug 2018 },
volume = { 181 },
number = { 13 },
month = { Aug },
year = { 2018 },
issn = { 0975-8887 },
pages = { 1-8 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume181/number13/29875-2018917723/ },
doi = { 10.5120/ijca2018917723 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%A Safari Yonasi
%A Rose Nakasi
%A Yashik Singh
%T Predicting Cellular Protein localization Sites on Ecoli
%J International Journal of Computer Applications
%@ 0975-8887
%V 181
%N 13
%P 1-8
%D 2018
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Several machine learning classification techniques have been applied to predicting the protein localization sites of E. coli. However, little to no research has predicted the localization sites of proteins on E. coli's minimal dataset using only the most informative features obtained through different feature selection techniques. This study investigated several machine learning classification and feature selection techniques as applied to E. coli's minimal dataset. The classifiers were used to predict the localization sites of E. coli's minimal subset from the informative features identified by the feature selection techniques. The results were obtained in four parts: data collection, cleaning and preprocessing; feature selection, in which the most informative features are selected; classification, in which the localization sites of the proteins are predicted; and evaluation of the classifiers' performance using several measures, including cross-validation accuracy and the area under the ROC curve (AUROCC), in order to recommend the best classifier. Among the classifiers used, Extra Trees and Gradient Boosting performed best, followed by Random Forest, as seen from their precision, recall and F-measure scores. AdaBoost performed worst, at 83%.
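The abstract names its evaluation measures (cross-validation accuracy, precision, recall, F-measure and AUROC) without defining them. The following is a minimal, dependency-free Python sketch of how these measures are commonly computed; it is not the authors' code. It assumes contiguous, unshuffled folds (the fold count and splitting scheme are not stated here), macro (unweighted) averaging across classes, and AUROC computed for a single class treated one-vs-rest using the pairwise-ranking (Mann-Whitney) formulation. All function names, the toy labels and the `majority_class` baseline are hypothetical illustrations. The labels only borrow the style of E. coli localization class names, and the numbers are not results from the study.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_prf(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, Tuple[float, float, float]]:
    """Precision, recall and F-measure for each class, treating that class one-vs-rest."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores: Dict[str, Tuple[float, float, float]]) -> Tuple[float, ...]:
    """Unweighted mean of per-class precision, recall and F-measure (assumed averaging scheme)."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


def binary_auroc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative; ties count half."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def k_fold_splits(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """(train, test) index lists for k contiguous folds; no shuffling or stratification (assumption)."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    splits, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        test = list(range(start, start + size))
        train = [j for j in range(n) if j < start or j >= start + size]
        splits.append((train, test))
        start += size
    return splits


def cross_val_accuracy(X: Sequence, y: Sequence[str], k: int,
                       fit_predict: Callable[[List, List[str], List], List[str]]) -> float:
    """Mean accuracy over k folds; fit_predict(X_train, y_train, X_test) returns test predictions."""
    fold_scores = []
    for train, test in k_fold_splits(len(y), k):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in test])
        fold_scores.append(accuracy([y[i] for i in test], preds))
    return sum(fold_scores) / len(fold_scores)


def majority_class(X_train: List, y_train: List[str], X_test: List) -> List[str]:
    """Hypothetical baseline: always predict the most frequent training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return [most_common] * len(X_test)


if __name__ == "__main__":
    y_true = ["cp", "cp", "im", "pp", "im", "cp"]  # toy labels, not study data
    y_pred = ["cp", "im", "im", "pp", "cp", "cp"]
    print(accuracy(y_true, y_pred))
    print(macro_average(per_class_prf(y_true, y_pred)))
    print(binary_auroc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

In practice, `fit_predict` could wrap any of the classifiers the study compares, for example a scikit-learn `ExtraTreesClassifier` by calling its `fit` and then `predict` methods; the majority-class baseline is included only so the sketch runs without third-party packages.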

Index Terms

Computer Science
Information Sciences

Keywords

Predicting Ensemble and Non Ensemble Classifiers and Machine Learning Techniques