Research Article

An Efficient Method for Predicting the 5-year Survivability of Breast Cancer

by Turan Jahanbazi, Mohammad H. Nadimi
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 155 - Number 8
Year of Publication: 2016
Authors: Turan Jahanbazi, Mohammad H. Nadimi
DOI: 10.5120/ijca2016912378

Turan Jahanbazi, Mohammad H. Nadimi. An Efficient Method for Predicting the 5-year Survivability of Breast Cancer. International Journal of Computer Applications. 155, 8 (Dec 2016), 11-19. DOI=10.5120/ijca2016912378

@article{ 10.5120/ijca2016912378,
author = { Turan Jahanbazi and Mohammad H. Nadimi },
title = { An Efficient Method for Predicting the 5-year Survivability of Breast Cancer },
journal = { International Journal of Computer Applications },
issue_date = { Dec 2016 },
volume = { 155 },
number = { 8 },
month = { Dec },
year = { 2016 },
issn = { 0975-8887 },
pages = { 11-19 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume155/number8/26624-2016912378/ },
doi = { 10.5120/ijca2016912378 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-07T00:00:43.613353+05:30
%A Turan Jahanbazi
%A Mohammad H. Nadimi
%T An Efficient Method for Predicting the 5-year Survivability of Breast Cancer
%J International Journal of Computer Applications
%@ 0975-8887
%V 155
%N 8
%P 11-19
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Breast cancer is one of the most severe types of cancer and is the most common cause of death among female cancer patients. To ease decision making and financial arrangements, it is essential to know the survivability of patients. In recent years, effective data-mining techniques have been employed to predict the 5-year survivability of cancer patients with reasonable accuracy. The efficiency of these models can be improved by making them accessible on smartphones. To achieve this, the maximum memory required by the prediction models must be reduced, since a smartphone has limited available memory. This issue, which is still an open area of research, is the concern of the present study. A hybrid method is enhanced by combining the synthetic minority over-sampling technique (SMOTE), information gain attribute evaluation (InfoGainAttributeEval), the AdaBoost.M1 algorithm, and a decision tree. The more effective attributes are selected using InfoGainAttributeEval, and the less effective nodes are removed by pre-pruning while the decision tree is being built. The hybrid method is further simplified by post-pruning the decision tree after it has been created. The proposed method was applied to a 5-year cancer survivability dataset and showed a considerable reduction in the maximum required memory while maintaining prediction accuracy.
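
The two preprocessing steps named in the abstract, minority over-sampling and information-gain attribute ranking, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes nominal attributes for the ranking step and numeric feature vectors for the over-sampling step, and every name and data value below (`info_gain`, `smote`, `attr_a`, `attr_b`, `minority`) is hypothetical and chosen only for illustration. The boosting and pruning stages are left out.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def info_gain(values, labels):
    """Information gain of one nominal attribute with respect to the class."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy of the class after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Made-up toy records (assumption: NOT taken from any real dataset).
labels = ["yes", "yes", "yes", "no", "no", "no"]
attr_a = ["x", "x", "y", "y", "z", "z"]   # separates the classes fairly well
attr_b = ["p", "q", "p", "q", "p", "q"]   # nearly uninformative

# Rank attributes by information gain, highest first.
gains = {"attr_a": info_gain(attr_a, labels),
         "attr_b": info_gain(attr_b, labels)}
ranked = sorted(gains, key=gains.get, reverse=True)

# Over-sample a hypothetical minority class of three 2-D points.
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0]]
new_points = smote(minority, n_new=4)

print(ranked)
print(new_points)
```

In this sketch, attributes with low gain would be dropped before training, which is one way a model's memory footprint can shrink; the synthetic points would be appended to the minority class to balance the training set.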

Index Terms

Computer Science
Information Sciences

Keywords

Breast cancer, Decision tree, Synthetic minority over-sampling technique, Information gain attribute evaluation, Maximum required memory, Smartphones, Hybrid method