Research Article

A Survey on Big Data Management and Job Scheduling

by Sreedhar C., N. Kasiviswanath, P. Chenna Reddy
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 130 - Number 13
Year of Publication: 2015
Authors: Sreedhar C., N. Kasiviswanath, P. Chenna Reddy
10.5120/ijca2015907161

Sreedhar C., N. Kasiviswanath, P. Chenna Reddy. A Survey on Big Data Management and Job Scheduling. International Journal of Computer Applications. 130, 13 (November 2015), 41-49. DOI=10.5120/ijca2015907161

@article{ 10.5120/ijca2015907161,
author = { Sreedhar C. and N. Kasiviswanath and P. Chenna Reddy },
title = { A Survey on Big Data Management and Job Scheduling },
journal = { International Journal of Computer Applications },
issue_date = { November 2015 },
volume = { 130 },
number = { 13 },
month = { November },
year = { 2015 },
issn = { 0975-8887 },
pages = { 41-49 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume130/number13/23272-2015907161/ },
doi = { 10.5120/ijca2015907161 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:25:29.581021+05:30
%A Sreedhar C.
%A N. Kasiviswanath
%A P. Chenna Reddy
%T A Survey on Big Data Management and Job Scheduling
%J International Journal of Computer Applications
%@ 0975-8887
%V 130
%N 13
%P 41-49
%D 2015
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Big data has gained popularity in recent years because sophisticated methods are needed to collect, process, analyze and visualize the huge volumes of data generated by our digital and computing world. Handling petabytes of information, commonly called Big data, poses several challenges that need to be addressed more efficiently. Big data management (BDM) is the process of collecting, storing, analyzing and visualizing large volumes of data, which can be structured, unstructured or semi-structured. Problems such as data acquisition, data storage, data retrieval, data analysis, and data visualization can no longer be handled by traditional database systems. The primary purpose of this paper is to provide a comprehensive survey of Big data management and an overview of various job scheduling algorithms in Hadoop, along with the latest advancements. These research directions can lead to further exploration of the Big data domain and to the development of optimal techniques and scheduling algorithms that address the problems faced in Big data.

Index Terms

Computer Science
Information Sciences

Keywords

Big data, Big data management, Job Scheduling, Hadoop, MapReduce