Research Article

Boosting the Performance of MapReduce by Better Resource Utilization in Cluster

by Pooja Malikwade, S.B.Jadhav
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 112 - Number 16
Year of Publication: 2015
Authors: Pooja Malikwade, S.B.Jadhav
10.5120/19753-1535

Pooja Malikwade, S.B.Jadhav. Boosting the Performance of MapReduce by Better Resource Utilization in Cluster. International Journal of Computer Applications. 112, 16 (February 2015), 29-33. DOI=10.5120/19753-1535

@article{ 10.5120/19753-1535,
author = { Pooja Malikwade and S.B.Jadhav },
title = { Boosting the Performance of MapReduce by Better Resource Utilization in Cluster },
journal = { International Journal of Computer Applications },
issue_date = { February 2015 },
volume = { 112 },
number = { 16 },
month = { February },
year = { 2015 },
issn = { 0975-8887 },
pages = { 29-33 },
numpages = {5},
url = { https://ijcaonline.org/archives/volume112/number16/19753-1535/ },
doi = { 10.5120/19753-1535 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T22:49:40.737786+05:30
%A Pooja Malikwade
%A S.B.Jadhav
%T Boosting the Performance of MapReduce by Better Resource Utilization in Cluster
%J International Journal of Computer Applications
%@ 0975-8887
%V 112
%N 16
%P 29-33
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

MapReduce implementations are widely used to process large data sets. MapReduce speeds up job processing by performing computations in parallel. During parallel computation, skew that arises from large indivisible records or uneven data distribution slows down job execution and lowers cluster throughput. We propose an automatic skew-handling system that is compatible with the MapReduce framework and transparent to users. The proposed system uses idle resources in the cluster for skew handling, which is implemented through task repartitioning. The output order is maintained even after task repartitioning. The system requires no extra input from users and imposes minimal overhead in the absence of skew.
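The idea of repartitioning a skewed task across idle resources while keeping the output order can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `split_evenly` and `repartition_skewed_task`, the use of a thread pool to stand in for idle cluster slots, and the choice of contiguous, near-equal chunks are all assumptions made for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split_evenly(records: Sequence[T], parts: int) -> List[Sequence[T]]:
    """Split records into contiguous chunks whose sizes differ by at most one.

    Hypothetical helper: contiguous chunks make it easy to restore
    the original output order afterwards.
    """
    parts = max(1, min(parts, len(records)))
    base, extra = divmod(len(records), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + base + (1 if i < extra else 0)
        chunks.append(records[start:end])
        start = end
    return chunks


def repartition_skewed_task(
    remaining: Sequence[T],
    idle_slots: int,
    process: Callable[[T], R],
) -> List[R]:
    """Process the unread input of a slow task on idle slots, preserving order.

    Assumption: the slow task keeps one chunk and each idle slot takes one,
    so with no idle slots the work is simply done in a single chunk.
    """
    chunks = split_evenly(remaining, idle_slots + 1)

    def run_chunk(chunk: Sequence[T]) -> List[R]:
        return [process(record) for record in chunk]

    # Executor.map yields results in the order of its inputs, so joining
    # the per-chunk outputs reproduces the original record order.
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        outputs = list(pool.map(run_chunk, chunks))
    return [item for chunk_output in outputs for item in chunk_output]


if __name__ == "__main__":
    unread = list(range(10))
    print(repartition_skewed_task(unread, idle_slots=3, process=lambda x: x * x))
```

Because the chunks are contiguous and their outputs are concatenated in chunk order, the combined result matches what the single slow task would have produced on its own.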

Index Terms

Computer Science
Information Sciences

Keywords

Data skew, MapReduce, parallel database systems, performance gain, skew handling