Multi-Tasking Map (MapReduce, Tasks in Rust)


  1. 1. cs4414 Fall 2013, University of Virginia, David Evans. Jodhpur, India (Dec 2011)
  2. 2. Plan for Today: • Recap list map • Google’s MapReduce • Tasks in Rust • Multi-threaded map. 26 September 2013, University of Virginia cs4414. PS2 is due Monday (30 Sept) at 8:59pm. The submission form will be posted later today and will include signup for scheduling your demo/review. All team members are expected to participate in the review, except in extreme circumstances.
  3. 3.
        struct Node { head : int, tail : Option<~Node> }
        type List = Option<~Node>;
        trait Map { fn mapr(&self, &fn(int) -> int) -> List; }
        impl Map for List {
            fn mapr(&self, f: &fn(int) -> int) -> List {
                match(*self) {
                    None => None,
                    Some(ref node) => {
                        Some(~Node{ head: f(node.head), tail: node.tail.mapr(f) })
                    },
                }
            }
        }
        You should understand everything in this code. Ask questions now if anything is unclear.
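
The slide code uses 2013-era Rust syntax (`~Node`, `&fn`, `int`) that current compilers no longer accept. As a rough sketch of the same list map in present-day Rust, assuming `Box<Node>` in place of `~Node` and `i64` in place of `int`; the helpers `from_slice` and `to_vec` are hypothetical additions for building and inspecting lists, not part of the slides:

```rust
// Singly linked list of integers, mirroring the slide's Node/List types.
#[derive(Debug, PartialEq)]
struct Node {
    head: i64,
    tail: List,
}

type List = Option<Box<Node>>;

trait Map {
    fn mapr(&self, f: &dyn Fn(i64) -> i64) -> List;
}

impl Map for List {
    // Build a new list whose elements are f applied to each element of self.
    fn mapr(&self, f: &dyn Fn(i64) -> i64) -> List {
        match self {
            None => None,
            Some(node) => Some(Box::new(Node {
                head: f(node.head),
                tail: node.tail.mapr(f),
            })),
        }
    }
}

// Hypothetical helper: build a list from a slice, preserving order.
fn from_slice(xs: &[i64]) -> List {
    xs.iter()
        .rev()
        .fold(None, |tail, &head| Some(Box::new(Node { head, tail })))
}

// Hypothetical helper: copy the list's elements into a Vec.
fn to_vec(list: &List) -> Vec<i64> {
    let mut out = Vec::new();
    let mut cur = list;
    while let Some(node) = cur {
        out.push(node.head);
        cur = &node.tail;
    }
    out
}
```

For example, `to_vec(&from_slice(&[1, 2, 3]).mapr(&|x: i64| x * 2))` yields `[2, 4, 6]`.
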
  4. 4. Cost of Map [diagram: Core 1]. What is the running time of p.map(f) using one core, where p is a list of N elements and each evaluation of f(x) takes 1 ms?
  5. 5. Cost of Multi-Core Map [diagram: Core 1, Core 2, Core 3, Core 4]. What is the running time of p.map(f) using k cores, where p is a list of N elements and each evaluation of f(x) takes 1 ms?
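
One way to reason about the two questions above is an idealized estimate: with one core the N evaluations run back to back (about N ms), and with k cores the best case splits them evenly (about ⌈N/k⌉ ms). The sketch below encodes only that estimate. It assumes every f(x) takes exactly 1 ms and ignores list traversal, task creation, and communication costs; the function name is hypothetical.

```rust
/// Idealized running time in milliseconds for mapping over `n` elements on
/// `k` cores, assuming 1 ms per element, an even split, and zero overhead.
fn ideal_map_time_ms(n: u64, k: u64) -> u64 {
    assert!(k > 0, "need at least one core");
    // Ceiling division: the busiest core handles ceil(n / k) elements.
    (n + k - 1) / k
}
```

Under these assumptions a 16-element list takes 16 ms on one core and 4 ms on four cores. Real measurements will be worse, since the overheads ignored here are not free.
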
  6. 6. How should we parallelize map?
        fn mapr(&self, f: &fn(int) -> int) -> List {
            match(*self) {
                None => None,
                Some(ref node) => {
                    Some(~Node{ head: f(node.head), tail: node.tail.mapr(f) })
                },
            }
        }
  7. 7. “MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines.” (OSDI 2004)
  8. 8. Did Google invent map?
  9. 9. John McCarthy, 1927-2011
  10. 10. (no text on this slide)
  11. 11. 1955-1960: First “mass-produced” computer (sold 123 of them). 1 accumulator register (38 bits), 3 decrement registers (15 bits). Instructions had a 3-bit opcode, 15-bit decrement, and 15-bit address. Magnetic core memory: 32,000 36-bit words. 40,000 instructions/second.
  12. 12. John McCarthy playing chess with IBM 7090 (1967)
  13. 13.
        fn mapr(&self, f: &fn(int) -> int) -> List {
            match(*self) {
                None => None,
                Some(ref node) => {
                    Some(~Node{ head: f(node.head), tail: node.tail.mapr(f) })
                },
            }
        }
  14. 14. (no text on this slide)
  15. 15. @ pointers (in 1960)
  16. 16. (no text on this slide)
  17. 17. MapReduce. Google’s map:
        fn mapr(&self, f: &fn(int) -> int) -> List {
            match(*self) {
                None => None,
                Some(ref node) => {
                    Some(~Node{ head: f(node.head), tail: node.tail.mapr(f) })
                },
            }
        }
  18. 18. (no text on this slide)
  19. 19.
        fn mapg<K1, V1, K2, V2>(List<Pair<K1, V1>>,
                                f: &fn(K1, V1) -> (K2, V2)) -> List<Pair<K2, V2>>
        fn reduceg<K, V, R>(K, List<V>) -> List<R>
  20. 20.
        fn mapg<K1, V1, K2, V2>(List<Pair<K1, V1>>,
                                f: &fn(K1, V1) -> (K2, V2)) -> List<Pair<K2, V2>>
        fn reduceg<K, V, R>(K, List<V>) -> List<R>
        fn map_reduce<K1, V1, K2, V2, R>(
            List<Pair<K1, V1>>,
            mapf: &fn(K1, V1) -> (K2, V2),
            reducef: &fn(K2, List<V2>) -> R) -> List<R>
  21. 21.
        fn map_reduce<K1, V1, K2, V2, R>(
            data: List<Pair<K1, V1>>,
            mapf: &fn(K1, V1) -> (K2, V2),
            reducef: &fn(K2, List<V2>) -> R) -> List<R> {
        }
  22. 22.
        fn map_reduce<K1, V1, K2, V2, R>(
            data: List<Pair<K1, V1>>,
            mapf: &fn(K1, V1) -> (K2, V2),
            reducef: &fn(K2, List<V2>) -> R) -> List<R> {
            let ivalues = data.map(mapf);
            let mvalues = // merge ivalues by k2
            mvalues.map(reducef)
        }
        Completing the code (with the parallel map we will finish today) is left as a sticker-worthy exercise!
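
To make the three phases in the outline concrete, here is a minimal sequential sketch in present-day Rust. It is one possible reading of the outline, not the exercise's intended solution: it assumes `Vec<(K, V)>` in place of the slide's `List<Pair<K, V>>`, uses a `BTreeMap` for the "merge by k2" step (which additionally requires `K2: Ord`), and does nothing in parallel.

```rust
use std::collections::BTreeMap;

// Sequential map_reduce sketch: map every input pair, group the intermediate
// values by key, then reduce each group to a single result.
fn map_reduce<K1, V1, K2: Ord, V2, R>(
    data: Vec<(K1, V1)>,
    mapf: impl Fn(K1, V1) -> (K2, V2),
    reducef: impl Fn(K2, Vec<V2>) -> R,
) -> Vec<R> {
    // Map phase: one intermediate (k2, v2) pair per input pair.
    let ivalues = data.into_iter().map(|(k, v)| mapf(k, v));

    // Merge phase: collect all intermediate values that share a key.
    let mut mvalues: BTreeMap<K2, Vec<V2>> = BTreeMap::new();
    for (k2, v2) in ivalues {
        mvalues.entry(k2).or_default().push(v2);
    }

    // Reduce phase: one result per distinct key, in ascending key order.
    mvalues
        .into_iter()
        .map(|(k2, vs)| reducef(k2, vs))
        .collect()
}
```

For instance, mapping each value `v` to the pair `(v % 2, v)` and reducing by summing groups the inputs 3, 4, 5, 6 into an even-sum result and an odd-sum result.
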
  23. 23. Mapping in Parallel
  24. 24. Processes, Threads, Tasks. Process: originally, an abstraction for owning the whole machine. What do you need? Thread: (illusion of) an independent sequence of instructions. What do you need?
  25. 25. Processes, Threads, Tasks. Process: originally, an abstraction for owning the whole machine. What do you need? Own program counter; own stack, registers; own memory space. Thread: (illusion of) an independent sequence of instructions. What do you need? Own program counter; own stack, registers; shares memory space.
  26. 26. Tasks in Rust
  27. 27. Tasks: own PC; own stack, registers; safely shared immutable memory; safely independent own memory.
        fn spawn(f: ~fn())
        spawn( | | { println("Get back to work!"); });
        syntactic sugar:
        do spawn { println("Get back to work!"); }
        Task = Thread – unsafe memory sharing
        or
        Task = Process + safe memory sharing – cost of OS process
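
The `spawn` and `do spawn` forms on this slide were removed from Rust before 1.0. The nearest counterpart today is `std::thread::spawn`, which starts an OS thread rather than one of the 2013 runtime's tasks. A small sketch, with the hypothetical wrapper name `spawn_worker`:

```rust
use std::thread;

// Run a closure on a new thread. The `move` closure takes ownership of
// `message`, so the parent can no longer touch it while the worker runs.
fn spawn_worker(message: String) -> thread::JoinHandle<usize> {
    thread::spawn(move || {
        println!("{}", message);
        // The closure's return value is handed back through the JoinHandle.
        message.len()
    })
}
```

Calling `spawn_worker(String::from("Get back to work!")).join()` waits for the thread to finish and returns the value it computed. The ownership transfer enforced by `move` is what keeps the spawned work from sharing mutable memory with its parent.
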
  28. 28. Original single-threaded mapr:
        impl Map for List {
            fn mapr(&self, f: &fn(int) -> int) -> List {
                match(*self) {
                    None => None,
                    Some(ref node) => {
                        Some(~Node{ head: f(node.head), tail: node.tail.mapr(f) })
                    },
                }
            }
        }
        fn spawn(f: ~fn())
  29. 29. First attempt:
        impl Map for List {
            fn mapr(&self, f: extern fn(int) -> int) -> List {
                match(*self) {
                    None => None,
                    Some(ref node) => {
                        do spawn { f(node.head) }  // Cannot use node here!
                        Some(~Node{ head: ?, tail: node.tail.mapr(f) })
                    },
                }
            }
        }
  30. 30.
        impl Map for List {
            fn mapr(&self, f: extern fn(int) -> int) -> List {
                match(*self) {
                    None => None,
                    Some(ref node) => {
                        let val = node.head;
                        do spawn { f(val) }
                        Some(~Node{ head: ?, tail: node.tail.mapr(f) })
                    },
                }
            }
        }
        How can we get results back from a spawned task without shared memory?
  31. 31. Channels
        let (port, chan) : (Port<int>, Chan<int>) = stream();
        let val = node.head;
        do spawn { chan.send(f(val)); }
        let newval = port.recv();
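
In current Rust the same pattern can be sketched with `std::sync::mpsc::channel`, where the sender half plays the role of `chan` and the receiver half the role of `port`. The function name `compute_remotely` is hypothetical, and `i64` stands in for the slide's `int`:

```rust
use std::sync::mpsc;
use std::thread;

// Evaluate f(val) on another thread and get the result back over a channel,
// without the two threads sharing any mutable memory.
fn compute_remotely(f: fn(i64) -> i64, val: i64) -> i64 {
    let (tx, rx) = mpsc::channel::<i64>(); // tx ~ chan, rx ~ port
    thread::spawn(move || {
        // The sender is moved into the worker, which sends exactly one value.
        tx.send(f(val)).expect("receiver was dropped");
    });
    // Blocks until the worker has sent its result.
    rx.recv().expect("worker ended without sending")
}
```

For example, `compute_remotely(|x| x + 1, 41)` returns 42 once the worker thread has sent its value.
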
  32. 32. Using streams to spawn is dangerous for salmon, but Rust saves you from (data) races with the bears!
  33. 33. First attempt:
        fn mapr(&self, f: extern fn(int) -> int) -> List {
            match(*self) {
                None => None,
                Some(ref node) => {
                    let (port, chan) : (Port<int>, Chan<int>) = stream();
                    let newtail = node.tail.mapr(f);
                    let val = node.head;
                    do spawn { chan.send(f(val)); }
                    Some(~Node{ head: port.recv(), tail: newtail })
                }
            }
        }
        Compiles and runs fine and produces correct output… but has a major bug!
  34. 34. Now we’re spawning!
        fn mapr(&self, f: extern fn(int) -> int) -> List {
            match(*self) {
                None => None,
                Some(ref node) => {
                    let (port, chan) : (Port<int>, Chan<int>) = stream();
                    let val = node.head;
                    do spawn { chan.send(f(val)); }
                    let newtail = node.tail.mapr(f);
                    Some(~Node{ head: port.recv(), tail: newtail })
                }
            }
        }
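
The difference between the last two versions is only the order of two lines. In the first attempt the recursive call on the tail finishes before the task for the current element is spawned, and each task is waited on right after it starts, so at most one task appears to be doing useful work at a time. Spawning before recursing lets every element's task start before any result is awaited. A present-day sketch of the second ordering, assuming `std::thread` and `std::sync::mpsc` in place of tasks and streams, one OS thread per element, and hypothetical helper names:

```rust
use std::sync::mpsc;
use std::thread;

struct Node {
    head: i64,
    tail: List,
}

type List = Option<Box<Node>>;

// Map f over the list, evaluating each element on its own thread.
fn mapr_parallel(list: &List, f: fn(i64) -> i64) -> List {
    match list {
        None => None,
        Some(node) => {
            let (tx, rx) = mpsc::channel();
            let val = node.head;
            // Spawn first, so f(val) runs while we recurse on the tail.
            thread::spawn(move || {
                tx.send(f(val)).expect("receiver was dropped");
            });
            let newtail = mapr_parallel(&node.tail, f);
            // Wait for this element only after all later ones have started.
            Some(Box::new(Node {
                head: rx.recv().expect("worker ended without sending"),
                tail: newtail,
            }))
        }
    }
}

// Hypothetical helper: build a list from a slice, preserving order.
fn from_slice(xs: &[i64]) -> List {
    xs.iter()
        .rev()
        .fold(None, |tail, &head| Some(Box::new(Node { head, tail })))
}

// Hypothetical helper: copy the list's elements into a Vec.
fn to_vec(list: &List) -> Vec<i64> {
    let mut out = Vec::new();
    let mut cur = list;
    while let Some(node) = cur {
        out.push(node.head);
        cur = &node.tail;
    }
    out
}
```

One thread per element is reasonable for a 16-element list of expensive computations; for long lists a bounded pool of workers would likely be a better fit.
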
  35. 35.
        fn collatz_steps(n: int) -> int {
            if n == 1 {
                0
            } else {
                1 + collatz_steps(if n % 2 == 0 { n / 2 } else { 3*n + 1 })
            }
        }
        fn find_collatz(k: int) -> int {
            // Returns the minimum value, n, with Collatz stopping time >= k.
            let mut n = 1;
            while collatz_steps(n) < k { n += 1; }
            n
        }
        fn main() {
            let lst0 : List = Some(~Node{head: 400, tail:
                              Some(~Node{head: 410, tail: // … 16 total similar elements
                              } );
            println(lst0.to_str());
            let lst1 = lst0.mapr(find_collatz);
            println(lst1.to_str());
            let lst2 = lst1.mapr(find_collatz);
            println(lst2.to_str());
        }
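
For readers who want to try the workload with a current compiler, the two Collatz functions can be sketched as follows. This version swaps the slide's recursion for a loop and uses `u64` instead of `int`; both are choices made here, not taken from the slides.

```rust
// Number of Collatz steps needed to reach 1 from n (n must be at least 1).
fn collatz_steps(n: u64) -> u64 {
    assert!(n >= 1, "Collatz steps are only defined for n >= 1");
    let mut n = n;
    let mut steps = 0;
    while n != 1 {
        n = if n % 2 == 0 { n / 2 } else { 3 * n + 1 };
        steps += 1;
    }
    steps
}

// Returns the minimum value, n, with Collatz stopping time >= k.
fn find_collatz(k: u64) -> u64 {
    let mut n = 1;
    while collatz_steps(n) < k {
        n += 1;
    }
    n
}
```

As small checks, `collatz_steps(27)` is 111 and `find_collatz(8)` is 6. The cost of `find_collatz` grows quickly with `k`, which is what makes it a useful stand-in for an expensive `f` in the mapping experiments.
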
  36. 36. When 350+% of your CPU isn’t fast enough, it’s time to buy a new computer!
  37. 37. (no text on this slide)
  38. 38. Intel i7 Quad-Core Processor
  39. 39. Intel i7 Quad-Core Processor [diagram: Core, Core, Core, Core; Shared Memory Cache (L3 = 6 MB); ~256 KB L2 Cache(?)]
  40. 40. Why so few?
  41. 41. Portuguese, was beavering away in the library when ‘smoke suddenly started to come out’ of her computer. Fortunately, she removed the fire hazard from the library, averting disaster at the last moment. The student gave The Tab her version of the story: “I was in the library working at my computer when smoke suddenly started to come out of it. I freaked out for a second, trying to save my work onto my hard disk, but then I realised it was probably more important to take it out of the library.” (The Tab (Oxford), “Laptop Fire Almost Destroys College Library”)
  42. 42. Where the Cores Are. nVIDIA GeForce GTX 650M: 384 cores (but even harder for typical programs to use well than Intel’s cores)
  43. 43. How much faster will my Rust mapping program be on my new machine? 2013 MacBook Pro: Intel i7-3740QM, 2.7 GHz, 4 cores (8 threads), 6 MB shared L3 cache. 2011 MacBook Air: Intel i5-2557M, 1.7 GHz, 2 cores (4 threads), 3 MB shared L3 cache. Both support “hyperthreading” (two threads per core). 60 seconds (normalized time, running on 16-element list) vs. ?
  44. 44. (no text on this slide)
  45. 45. Submit your “guesses” and reasoning in the course forum… hopefully I will know the actual answer by Tuesday! PS2 is due Monday (30 Sept) at 8:59pm. The submission form will be posted later today and will include signup for scheduling your demo/review. All team members are expected to participate in the review, except in extreme circumstances.
  46. 46. (no text on this slide)
