OS-Assisted Task Preemption for Hadoop

This work introduces a new task preemption primitive for Hadoop that allows tasks to be suspended and resumed by exploiting memory management mechanisms readily available in modern operating systems. Our technique fills the gap between the two extreme options of killing tasks (which wastes work) and waiting for their completion (which introduces latency): experimental results indicate superior performance and very small overheads compared to existing alternatives.

  1. OS-Assisted Task Preemption for Hadoop. Mario Pastorelli, Matteo Dell'Amico, Pietro Michiardi (EURECOM, France). DCPerf 2014, Madrid, 30 June 2014.
  2. Outline: (1) Why Task Preemption On Hadoop; (2) Our Approach; (3) Experiments.
  3. Section: Why Task Preemption On Hadoop.
  4. Data-Intensive Scalable Computing & Hadoop. Hadoop MapReduce brings the computation to the data, which is split into blocks across a cluster. Map: one task per block of the Hadoop filesystem (HDFS), typically 64–512 MB; each map task stores key-value pairs locally, e.g., for word count: [(red, 15), (green, 7), ...].
  5. (continued) Reduce: the number of reduce tasks is set by the programmer; mapper output is partitioned by key and pulled from the "mappers"; the Reduce function operates on all values for a single key, e.g., (green, [7, 42, 13, ...]).
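To make the Map and Reduce roles above concrete, here is a minimal word-count sketch against the standard org.apache.hadoop.mapreduce API. It is illustrative only (it is not taken from the slides), and the job driver that wires the classes together is omitted:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: one task per HDFS block; emits (word, 1) pairs that are
// stored locally before being pulled by the reducers.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);
        }
    }
}

// Reduce: receives all counts for one key, e.g. (green, [7, 42, 13, ...]).
class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
```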
  6. Why You Need Preemption: High-Priority Tasks. MapReduce jobs are made of several tasks; we focus on task granularity. Sometimes you have high-priority tasks: humans waiting for the results, or high-value computations. Some tasks may take very long, because of errors in the implementation or simply a lot of computation. Solution: preempt low-priority tasks and give the resources they are using to high-priority ones.
  7. Preemptive Scheduling. Priority can be decided by a scheduler: fairness, guaranteeing that no user can "cheat the system" [Zaharia et al., EuroSys 2010]; deadline scheduling, ensuring jobs are completed by a due date [Kc and Anyanwu, CloudCom 2010]; response-time optimization, letting small jobs pass in front [Wolf et al., Middleware 2010; Pastorelli et al., BIGDATA 2013].
  8. In Hadoop, Now. Currently, Hadoop can only preempt tasks by killing them, which wastes work; the alternative is waiting for them to finish, which introduces latency. We want to do better.
  9. Section: Our Approach.
  10. Delegating To the OS. Hadoop tasks are standard POSIX processes and already communicate through POSIX signals; we use the same strategy with SIGTSTP and SIGCONT. Our implementation mirrors the one Hadoop uses for killing tasks, with SIGTSTP taking the place of SIGTERM. The state of the computation is implicitly saved by the OS and paged to disk if necessary.
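As a rough illustration of the suspend/resume primitive described above (not the authors' actual patch), a task process can be paused and resumed from Java by shelling out to kill, mirroring how Hadoop's existing kill path delivers SIGTERM; the TaskSuspender class and its pid argument are illustrative:

```java
import java.io.IOException;

// Illustrative sketch: suspend and resume a task process by PID using
// the same shell-out-to-kill approach Hadoop uses to terminate tasks.
public class TaskSuspender {

    // Send SIGTSTP: the kernel stops the process; its pages stay in RAM
    // and are paged out lazily only if memory pressure requires it.
    public static void suspend(long pid) throws IOException, InterruptedException {
        run("kill", "-TSTP", Long.toString(pid));
    }

    // Send SIGCONT: the process resumes exactly where it stopped,
    // faulting pages back in on demand.
    public static void resume(long pid) throws IOException, InterruptedException {
        run("kill", "-CONT", Long.toString(pid));
    }

    private static void run(String... cmd) throws IOException, InterruptedException {
        int rc = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        if (rc != 0) {
            throw new IOException("command failed: " + String.join(" ", cmd));
        }
    }
}
```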
  11. The OS and Paging. Memory is occupied by running processes and the file system cache; when it is full, pages are evicted using an LRU-like policy that prioritizes clean pages (not modified after reading), which do not need to be paged out. Page-out operations are clustered to improve throughput, amortizing disk seeks. Thrashing occurs when the working set (the memory used by running programs) is larger than physical memory.
  12. OS and Paging In Our Case. In Hadoop, a best practice is to configure the OS to prioritize running processes over the disk cache: Hadoop reads in streams, so the cache is not important, and this minimizes paging out. When paging out does happen, it proceeds efficiently, close to maximum disk speed. No thrashing: suspended tasks are not in the working set.
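On Linux, "prioritize running processes over the disk cache" usually translates to keeping vm.swappiness low; the slide does not name the knob, so treat that as an assumption. A small sketch that checks a node's setting before relying on suspension (the threshold of 10 is arbitrary):

```java
import java.nio.file.Files;
import java.nio.file.Paths;

// Sketch: verify that the kernel prefers evicting page cache over
// swapping out process memory (low vm.swappiness), so running tasks
// stay in RAM and only suspended tasks become paging candidates.
public class SwappinessCheck {
    public static void main(String[] args) throws Exception {
        int swappiness = Integer.parseInt(
                Files.readString(Paths.get("/proc/sys/vm/swappiness")).trim());
        if (swappiness > 10) {  // threshold chosen for illustration only
            System.err.println("vm.swappiness=" + swappiness
                    + ": consider lowering it (e.g. `sysctl -w vm.swappiness=10`)"
                    + " so the page cache is evicted before task memory.");
        } else {
            System.out.println("vm.swappiness=" + swappiness + ": OK");
        }
    }
}
```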
  13. Section: Experiments.
  14. Experimental Settings. tl and th denote tasks with low and high priority. Synthetic tasks parse randomly generated data in 512 MB blocks; we vary the arrival time of th.
  15. Results, Standard Case. [Two plots: sojourn time of th (s) and makespan (s) versus tl progress at the launch of th (10–90%), comparing the wait, kill, and suspend strategies.]
  16. Results, Worst Case. [Same plots as above for the worst case: sojourn time of th and makespan versus tl progress at the launch of th, for wait, kill, and suspend.] Each job allocates 2 GB of memory; this is a lot and requires modifying the Hadoop configuration.
  17. Overheads Due To Memory Usage. [Plot: paged bytes (MB) and overhead (s) versus the memory allocated by th (0 to 2.5 GB), showing swap volume and the overhead on makespan and on th sojourn time.]
  18. Another Approach: Natjam. Natjam [Cho et al., SoCC 2013] works at the application layer and requires explicit handling by the application: it currently works for stateless MapReduce programs and proposes hooks for serialization/deserialization to deal with state.
  19. (continued) Pro: it might compress the amount of data written to disk. Con: it would always, pessimistically, write to disk, and it incurs serialization/deserialization overhead. The two approaches can both be made available to a scheduler.
  20. Resume Locality. A suspended task can only be resumed on the node where it was suspended; process migration could be implemented, but it would be expensive, or the task could simply be restarted from scratch. Delay scheduling [Zaharia et al., EuroSys 2010], waiting up to a threshold before scheduling non-local work, can also be applied here.
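A rough sketch of how a delay-scheduling-style rule could choose between resuming locally and restarting elsewhere; the types, method names, and threshold are illustrative, not taken from the paper:

```java
// Illustrative delay-scheduling-style decision: prefer resuming a
// suspended task on its original node; only after waiting past a
// threshold, give up locality and restart the task from scratch elsewhere.
public class ResumePolicy {
    public enum Decision { RESUME_LOCALLY, RESTART_ELSEWHERE, KEEP_WAITING }

    private final long maxWaitMillis;

    public ResumePolicy(long maxWaitMillis) {
        this.maxWaitMillis = maxWaitMillis;
    }

    /**
     * @param freeSlotNode    node that currently has a free slot
     * @param suspendedOnNode node where the task was suspended
     * @param waitedMillis    how long the suspended task has waited for a slot
     */
    public Decision decide(String freeSlotNode, String suspendedOnNode, long waitedMillis) {
        if (freeSlotNode.equals(suspendedOnNode)) {
            return Decision.RESUME_LOCALLY;     // SIGCONT on the original node
        }
        if (waitedMillis >= maxWaitMillis) {
            return Decision.RESTART_ELSEWHERE;  // lose the work, regain the slot now
        }
        return Decision.KEEP_WAITING;           // skip this slot, wait for a local one
    }
}
```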
  21. Implications On Scheduling. To optimize wall time, suspend tasks that are closest to completion, avoiding stragglers (late tasks) as much as possible. To avoid redundant work, suspend tasks with a smaller memory footprint, limiting swapping overheads.
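A sketch of how a scheduler could rank suspension victims by these two criteria; the TaskInfo fields and the lexicographic ordering are hypothetical, and a real scheduler would likely weigh the criteria against each other:

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical view of a running low-priority task, as a scheduler might see it.
record TaskInfo(String id, double progress, long residentBytes) {}

public class VictimSelection {
    // Best victim first: most progress (short post-resume tail, fewer
    // stragglers), then smallest resident memory (less data to page out).
    static final Comparator<TaskInfo> BEST_VICTIM_FIRST =
            Comparator.comparingDouble(TaskInfo::progress).reversed()
                      .thenComparingLong(TaskInfo::residentBytes);

    // min() returns the first element under BEST_VICTIM_FIRST, i.e. the best victim.
    static TaskInfo pickVictim(List<TaskInfo> lowPriorityTasks) {
        return lowPriorityTasks.stream().min(BEST_VICTIM_FIRST).orElseThrow();
    }
}
```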
  22. Implementing Suspension-Friendly Tasks. Controlling memory footprint: it may be worth optimizing tasks to use less memory, hinting the garbage collector to run on suspension, and using garbage collectors that actually return RAM to the OS. External state: some tasks interact with the outside world; suspension should be handled correctly, but this probably needs testing.
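One way the "hint the GC on suspension" idea could look inside a task JVM; this is a speculative sketch, not part of the paper's implementation, and it assumes the JVM leaves SIGTSTP free to trap via the unsupported sun.misc.Signal API and that the collector in use returns freed memory to the OS:

```java
import sun.misc.Signal;

// Speculative sketch: shrink the task's footprint right before it is
// suspended, so fewer dirty pages have to be swapped out.
public class SuspensionFriendlyTask {
    public static void installSuspendHook() {
        // Trapping SIGTSTP replaces its default "stop" action, so after the
        // GC hint we must stop ourselves explicitly with SIGSTOP (which
        // cannot be caught); SIGCONT from the scheduler still resumes us.
        Signal.handle(new Signal("TSTP"), sig -> {
            System.gc();  // hint: release as much heap as possible now
            try {
                long pid = ProcessHandle.current().pid();
                new ProcessBuilder("kill", "-STOP", Long.toString(pid)).start().waitFor();
            } catch (Exception e) {
                Thread.currentThread().interrupt();
            }
        });
    }

    public static void main(String[] args) throws Exception {
        installSuspendHook();
        // ... the task's real work would run here; sleep stands in for it.
        Thread.sleep(60_000);
    }
}
```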
  23. Take-Home Messages. Task preemption is important for Hadoop scheduling: priorities, fairness, deadlines, size-based schedulers, and more. We do not need to reinvent the wheel: OSes have been suspending processes for many years and do it well, so let's just use them. Swapping is not bad per se: Hadoop's mechanisms keep the working set under control and avoid thrashing.
