Parallel Computing to Predict Breast Cancer Recurrence on SEER Dataset using Map-Reduce Approach

Umesh D. R.; B. Ramachandra

Research Article

International Journal of Computer Applications

Foundation of Computer Science (FCS), NY, USA

Volume 149 - Number 12

Year of Publication: 2016

Authors: Umesh D. R., B. Ramachandra

10.5120/ijca2016911669

Umesh D. R., B. Ramachandra. Parallel Computing to Predict Breast Cancer Recurrence on SEER Dataset using Map-Reduce Approach. International Journal of Computer Applications. 149, 12 (Sep 2016), 31-35. DOI=10.5120/ijca2016911669

@article{ 10.5120/ijca2016911669,
  author = { Umesh D. R. and B. Ramachandra },
  title = { Parallel Computing to Predict Breast Cancer Recurrence on SEER Dataset using Map-Reduce Approach },
  journal = { International Journal of Computer Applications },
  issue_date = { Sep 2016 },
  volume = { 149 },
  number = { 12 },
  month = { Sep },
  year = { 2016 },
  issn = { 0975-8887 },
  pages = { 31-35 },
  numpages = {9},
  url = { https://ijcaonline.org/archives/volume149/number12/26052-2016911669/ },
  doi = { 10.5120/ijca2016911669 },
  publisher = {Foundation of Computer Science (FCS), NY, USA},
  address = {New York, USA}
}

%0 Journal Article
%1 2024-02-06T23:54:36.473323+05:30
%A Umesh D. R.
%A B. Ramachandra
%T Parallel Computing to Predict Breast Cancer Recurrence on SEER Dataset using Map-Reduce Approach
%J International Journal of Computer Applications
%@ 0975-8887
%V 149
%N 12
%P 31-35
%D 2016
%I Foundation of Computer Science (FCS), NY, USA

Abstract

Because of the recent rapid growth of large-scale data, developing faster processing algorithms with optimal performance has become a critical need. In this paper, a parallel Map-Reduce algorithm is proposed that enables multiple computing nodes to participate concurrently in building a classifier on the SEER breast cancer data set. The algorithm can induce boosted models whose generalization performance is close to that of the respective baseline classifier. By exploiting its own parallel architecture, the algorithm gains significant speedup. In addition, the algorithm does not require individual computing nodes to communicate with each other, to share their data, or to share the knowledge derived from their data; consequently, it is also robust in preserving the privacy of computation. The algorithm is implemented in the Map-Reduce framework, and experiments on the SEER breast cancer data set demonstrate its performance in terms of classification accuracy and speedup.
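The pattern described in the abstract, where each computing node trains on its own partition without exchanging data and the resulting models are combined afterwards, can be illustrated with a minimal Python sketch. The abstract does not specify the base learner, the merging rule, or the data layout, so everything here is an assumption made for illustration: the one-feature threshold classifiers ("stumps"), the majority vote, the helper names (`train_stump`, `merge`, `predict`, `speedup`), and the synthetic rows that stand in for SEER records. The built-in `map` and `functools.reduce` only emulate the two phases on one machine; a real deployment would run the map step on separate nodes.

```python
from functools import reduce


def predict_stump(stump, x):
    """Classify x with a one-feature threshold rule (feature, threshold, sign)."""
    feature, threshold, sign = stump
    return 1 if sign * (x[feature] - threshold) > 0 else 0


def train_stump(partition):
    """Map step (assumed): fit the lowest-error stump using only this partition."""
    best_errors, best_stump = None, None
    for feature in range(len(partition[0][0])):
        for x, _ in partition:
            for sign in (1, -1):
                stump = (feature, x[feature], sign)
                errors = sum(predict_stump(stump, xi) != yi for xi, yi in partition)
                if best_errors is None or errors < best_errors:
                    best_errors, best_stump = errors, stump
    return [best_stump]  # a one-model list, so the reduce step can concatenate


def merge(models_a, models_b):
    """Reduce step (assumed): only models are combined, never the raw rows."""
    return models_a + models_b


def predict(ensemble, x):
    """Majority vote over the per-partition models."""
    votes = sum(predict_stump(stump, x) for stump in ensemble)
    return 1 if 2 * votes > len(ensemble) else 0


def speedup(t_serial, t_parallel):
    """Speedup as the ratio of serial to parallel running time."""
    return t_serial / t_parallel


# Synthetic stand-in data (NOT SEER): the label is 1 when the first feature exceeds 0.5.
rows = []
for i in range(300):
    x = ((i * 37 % 300) / 300, (i * 11 % 300) / 300)
    rows.append((x, 1 if x[0] > 0.5 else 0))

# Three disjoint partitions, one per hypothetical computing node.
partitions = [rows[p::3] for p in range(3)]

ensemble = reduce(merge, map(train_stump, partitions))
accuracy = sum(predict(ensemble, x) == y for x, y in rows) / len(rows)
print(len(ensemble), round(accuracy, 3))
```

Because the reduce step sees only the trained models, no partition's rows leave the node that holds them, which is the privacy property the abstract points to. Measured wall-clock times for a serial and a parallel run could be passed to `speedup` to obtain the kind of figure the abstract mentions.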

Index Terms

Computer Science

Information Sciences

Keywords

Breast cancer, Big data analytics, Classification, Parallel Computing, MapReduce, SEER