CFP last date
20 December 2024
Reseach Article

Characterization of Randomized Shuffle and Sort Quantifiability in MapReduce Model

by Kiran M., Saikat Mukherjee, Ravi Prakash G.
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 79 - Number 5
Year of Publication: 2013
Authors: Kiran M., Saikat Mukherjee, Ravi Prakash G.
10.5120/13741-1551

Kiran M., Saikat Mukherjee, Ravi Prakash G. . Characterization of Randomized Shuffle and Sort Quantifiability in MapReduce Model. International Journal of Computer Applications. 79, 5 ( October 2013), 51-59. DOI=10.5120/13741-1551

@article{ 10.5120/13741-1551,
author = { Kiran M., Saikat Mukherjee, Ravi Prakash G. },
title = { Characterization of Randomized Shuffle and Sort Quantifiability in MapReduce Model },
journal = { International Journal of Computer Applications },
issue_date = { October 2013 },
volume = { 79 },
number = { 5 },
month = { October },
year = { 2013 },
issn = { 0975-8887 },
pages = { 51-59 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume79/number5/13741-1551/ },
doi = { 10.5120/13741-1551 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T21:52:15.977478+05:30
%A Kiran M.
%A Saikat Mukherjee
%A Ravi Prakash G.
%T Characterization of Randomized Shuffle and Sort Quantifiability in MapReduce Model
%J International Journal of Computer Applications
%@ 0975-8887
%V 79
%N 5
%P 51-59
%D 2013
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Quantifiability is a concept in MapReduce Analytics based on the following two conditions: (a) a mapper should be cautious, that is, should not exclude any reducer's shuffle and sort strategy from consideration; and (b) a mapper should respect the reducers' shuffle and sort preferences, that is, should deem a reducer's shuffle and sort strategy ki infinitely more likely than k'i if it premises the reducer to prefer ki to k'i. A shuffle and sort strategy is quantifiable if it can optimally be chosen under common shuffle and sort conjecture in the events (a) and (b). In this paper we present an algorithm that for every finite MapReduce operation computes the set of all quantifiable shuffle and sort strategies. The algorithm is based on the new idea of a key-value preference limitation, which is a pair (ki, Vi) consisting of a shuffle and sort strategy ki, and a subset of shuffle and sort strategies Vi, for mapper i. The interpretation is that mapper i prefers some shuffle and sort strategy in Vi to ki. The algorithm proceeds by successively adding key-value preference limitations to the MapReduce.

References
  1. Rajagopal Ananthanarayanan, Karan Gupta, Prashant Pandey, Himabindu Pucha, Prasenjit Sarkar, Mansi Shah, and Renu Tewari. Cloud analytics: Do we really need to reinvent the storage stack? In Proceedings of the 2009 Workshop on Hot Topics in Cloud Computing (HotCloud 09), San Diego, California, 2009.
  2. Mark Dredze, Alex Kulesza, and Koby Crammer. Multi-domain learning by confidence-weighted parameter combination. Machine Learning, 79:123–149, 2010.
  3. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert Gruber. Bigtable: A distributed storage system for structured data. In Proceedings of the 7th Symposium on Operating System Design and Implementation (OSDI 2006), pages 205–218, Seattle, Washington, 2006.
  4. Arthur Asuncion, Padhraic Smyth, and Max Welling. Asynchronous distributed learning of topic models. In Advances in Neural Information Processing Systems 21 (NIPS 2008), pages 81–88, Vancouver, British Columbia, Canada, 2008.
  5. Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with YCSB. In Proceedings of the First ACM Symposium on Cloud Computing (ACM SOCC 2010), Indianapolis, Indiana, 2010.
  6. Leon Bottou. Stochastic learning. In Olivier Bousquet and Ulrike von Luxburg, editors, Advanced Lectures on Machine Learning, Lecture Notes in Artificial Intelligence, LNAI 3176, pages 146-168. Springer Verlag, Berlin, 2004.
  7. Thorsten Brants and Peng Xu. Distributed Language models. Morgan & Claypool Publishers, 2010.
  8. Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swami Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon's highly available key-value store. In Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP 2007), pages 205–220, Stevenson, Washington, 2007.
  9. Tony Hey, Stewart Tansley, and Kristin Tolle. The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research, Redmond, Washington, 2009.
  10. Tony Hey, Stewart Tansley, and Kristin Tolle. Jim Gray on eScience: A transformed scientific method. In Tony Hey, Stewart Tansley, and Kristin Tolle, editors, The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research, Redmond, Washington, 2009.
  11. Eric Brill, Jimmy Lin, Michele Banko, Susan Dumais, and Andrew Ng. Data-intensive question answering. In Proceedings of the Tenth Text REtrieval Conference (TREC 2001), pages 393–400, Gaithersburg, Maryland, 2001.
  12. Christopher Southan and Graham Cameron. Beyond the tsunami: Developing the infrastructure to deal with life sciences data. In Tony Hey, Stewart Tansley, and Kristin Tolle, editors, The Fourth Paradigm: Data-Intensive Scientific Discovery. Microsoft Research, Redmond, Washington, 2009.
  13. David Culler, Richard Karp, David Patterson, Abhijit Sahay, Klaus Erik Schauser, Eunice Santos, Ramesh Subramonian, and Thorsten von Eicken. LogP: Towards a realistic model of parallel computation. ACM SIGPLAN Notices, 28(7):1-12, 1993.
  14. Wittawat Tantisiriroj, Swapnil Patil, and Garth Gibson. Data-intensive file systems for Internet services: Arose by any other name…. Technical Report CMU-PDL-08-114,Parallel Data Laboratory, Carnegie Mellon University, 2008.
  15. Howard Karloff, Siddharth Suri, and Sergei Vassilvitskii. A model of computation for MapReduce. In Proceedings of the 21st Annual ACM-SIAM Symposium on Discrete Algorithms (SODA 2010), Austin, Texas, 2010.
  16. Thomas Sandholm and Kevin Lai. MapReduce optimization using regulated dynamic prioritization. In Proceedings of the Eleventh International Joint Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '09), pages 299-310, Seattle, Washington, 2009.
  17. M. Mustafa Rafique, Benjamin Rose, Ali R. Butt, and Dimitrios S. Nikolopoulos. Supporting MapReduce on large-scale asymmetric multi-core clusters. ACM Operating Systems Review, 43(2):25–34, 2009.
  18. Jeffrey Dean and Sanjay Ghemawat. MapReduce: A flexible data processing tool. Communications of the ACM, 53(1):72–77, 2010.
  19. Jonathan Cohen. Graph twiddling in a MapReduce world. Computing in Science and Engineering, 11(4):29–41, 2009.
  20. Michael Stonebraker, Daniel Abadi, David J. DeWitt, Sam Madden, Erik Paulson, Andrew Pavlo, and Alexander Rasin. MapReduce and parallel DBMSs: Friends or foes? Communications of the ACM, 53(1): 64–71, 2010.
Index Terms

Computer Science
Information Sciences

Keywords

MapReduce analytics quantifiability key-value preference limitation shuffle and sort Totally Ordered Data-Intensive Systems