Abstract
Large-scale MapReduce clusters that routinely process big data pose challenges for cloud computing. A key challenge is to reduce the response time of these clusters by minimizing the makespan of the jobs they run. It has been observed that the order in which jobs are executed can have a significant impact on their overall makespan and on resource utilization. In this work, we consider a scheduling model for multiple MapReduce jobs, with the goal of designing a job scheduler that minimizes the makespan of such a set of jobs. We exploit the classical Johnson model and propose a novel framework, HScheduler, which combines features of Johnson's algorithm and MapReduce to minimize the makespan of both offline and online jobs. Offline HScheduler reaches the theoretical lower bound (the optimum), and Online HScheduler is 2-competitive, which is the best-known constant ratio for minimizing the makespan. Through extensive tests on real data, we find that HScheduler outperforms the best-known approach by 10.6–11.7% on average for offline scheduling and by 8–10% on average for online scheduling. HScheduler can be applied to improve response time, throughput, and energy efficiency in cloud computing.
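The classical Johnson's rule that the abstract builds on can be illustrated with a minimal Python sketch. It assumes each job is modeled as a two-stage flow shop (a map stage followed by a reduce stage) with known durations, and that each stage processes one job at a time. This is the classical 1954 rule, not the HScheduler algorithm itself; the functions `johnson_order` and `makespan` and the job names and durations are hypothetical.

```python
from typing import List, Tuple

# Hypothetical job model: (name, map_time, reduce_time), durations known in advance.
Job = Tuple[str, float, float]


def johnson_order(jobs: List[Job]) -> List[Job]:
    """Order jobs with Johnson's rule for a two-stage (map -> reduce) flow shop."""
    # Jobs whose map stage is shorter than their reduce stage go first,
    # in increasing order of map time.
    front = sorted((j for j in jobs if j[1] < j[2]), key=lambda j: j[1])
    # All remaining jobs go last, in decreasing order of reduce time.
    back = sorted((j for j in jobs if j[1] >= j[2]), key=lambda j: j[2], reverse=True)
    return front + back


def makespan(order: List[Job]) -> float:
    """Makespan when the map and reduce stages each run one job at a time."""
    map_done = 0.0
    reduce_done = 0.0
    for _, m, r in order:
        map_done += m  # the map stage runs jobs back to back
        # A job's reduce starts once its own map has finished
        # and the previous job's reduce has finished.
        reduce_done = max(reduce_done, map_done) + r
    return reduce_done


if __name__ == "__main__":
    jobs = [("J1", 4, 5), ("J2", 4, 1), ("J3", 30, 4), ("J4", 6, 30), ("J5", 2, 3)]
    order = johnson_order(jobs)
    print([name for name, _, _ in order], makespan(order))
    # → ['J5', 'J1', 'J4', 'J3', 'J2'] 47.0
```

For these hypothetical jobs, running them in input order gives a makespan of 77.0, while Johnson's order gives 47.0. That value equals the total map time (46) plus the smallest reduce time (1), so it cannot be improved in this simplified model.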

Acknowledgments
This research is partially supported by the China National Science Foundation (CNSF) under project ID 61450110440, the Sichuan Province Technology Plan (ID 2016GZ0322), and the Chongqing Research Program of Basic Research and Frontier Technology (ID cstc2015jcyjB0244). Prof. Wenhong Tian completed most of this work while he was a visiting fellow at the CLOUDS lab led by Prof. Rajkumar Buyya at the University of Melbourne, Australia. The author thanks the members of the CLOUDS team for their comments, which helped polish the manuscript.
Additional information
This research is sponsored by the Natural Science Foundation of China (NSFC) Grant 61450110440.
About this article
Cite this article
Tian, W., Li, G., Yang, W. et al. HScheduler: an optimal approach to minimize the makespan of multiple MapReduce jobs. J Supercomput 72, 2376–2393 (2016). https://doi.org/10.1007/s11227-016-1737-4
DOI: https://doi.org/10.1007/s11227-016-1737-4