Optimal Execution of MapReduce Jobs in Cloud
Anshul Aggarwal, Software Engineer, Cisco Systems
Session Length: 1 Hour
Tue March 10 21:30 PST / Wed March 11 0:30 EST / Wed March 11 4:30 UTC / Wed March 11 10:00 IST / Wed March 11 15:30 Sydney
Voices 2015 www.globaltechwomen.com

We use the MapReduce programming paradigm because it lends itself well to most data-intensive analytics jobs run in the cloud today, given its ability to scale out and leverage many machines to process data in parallel. Research has demonstrated that existing approaches to provisioning other applications in the cloud do not carry over directly to MapReduce-based applications. Provisioning a MapReduce job entails requesting the optimum number of resource sets (RS) and configuring MapReduce parameters so that each resource set is maximally utilized. Each application has a different bottleneck resource (CPU, disk, or network) and a different bottleneck resource utilization, and thus needs a different combination of these parameters, chosen from the job profile, so that its bottleneck resource is maximally utilized. The problem at hand is thus to define a resource provisioning framework for MapReduce jobs running in a cloud, keeping in mind performance goals such as optimal resource utilization with minimum incurred cost, lower execution time, energy awareness, automatic handling of node failures, and high scalability.
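As a rough sketch of the provisioning idea described above (all function names, profile numbers, and capacities here are hypothetical illustrations, not taken from the talk), a provisioner could read a job profile, identify the bottleneck resource as the one with the highest demand-to-capacity ratio, and then size the number of resource sets so that the bottleneck finishes within a target execution time:

```python
import math

# Hypothetical sketch of bottleneck-driven provisioning for a MapReduce job.
# The job profile and resource-set capacities below are illustrative only.

def pick_bottleneck(profile, capacity):
    """Return the resource (cpu/disk/network) with the highest demand-to-capacity ratio."""
    return max(profile, key=lambda r: profile[r] / capacity[r])

def resource_sets_needed(profile, capacity, target_time):
    """Number of resource sets (RS) so the bottleneck resource finishes within target_time."""
    bottleneck = pick_bottleneck(profile, capacity)
    # Work one RS can complete on the bottleneck resource within target_time.
    per_rs_work = capacity[bottleneck] * target_time
    return math.ceil(profile[bottleneck] / per_rs_work), bottleneck

# Example job profile: total demand per resource over the whole job.
profile = {"cpu": 3600.0,        # core-seconds of computation
           "disk": 800_000.0,    # MB read and written
           "network": 120_000.0} # MB shuffled between nodes

# What a single resource set can deliver per second.
capacity = {"cpu": 4.0, "disk": 200.0, "network": 100.0}

n_rs, bottleneck = resource_sets_needed(profile, capacity, target_time=600)
print(bottleneck, n_rs)  # for these numbers, disk is the bottleneck and 7 RS are needed
```

With this profile, disk has the highest demand-to-capacity ratio, so the RS count is driven by disk throughput rather than CPU; a CPU-bound job would instead be sized by core-seconds. This is only a first-order model: it ignores failure handling, cost, and energy, which the framework described above also targets.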