REEF: Towards a Big Data Stdlib

508
-1

Published on

Published in: Technology, Education
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total Views
508
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
0
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

REEF: Towards a Big Data Stdlib

  1. 1. Resource Managers
  2. 2. Resource Managers
  3. 3. Resource Managers True multi-tenancy… Many workloads: Streaming, Batch, Interactive, … Many users: Production Jobs, Ad-Hoc Jobs, Experiments, … …but, only for sophisticated apps
  4. 4. Resource Managers True multi-tenancy… Many workloads: Streaming, Batch, Interactive, … Many users: Production Jobs, Ad-Hoc Jobs, Experiments, … …but, only for sophisticated apps Fault tolerance Pre-emption Elasticity
  5. 5. Example 1: SQL / MapReduce Fault tolerance Elasticity σ π ⋈ ⋈⋈ ⋈⋈
  6. 6. Example 2: Machine learning Fault tolerance Elasticity Iterative computations
  7. 7. Example 3: Graph processing Fault tolerance Elasticity Iterative computations Low latency communication
  8. 8. Silos Silos are hard to build Each duplicates the same mechanisms under the hood In practice, silos form pipelines In each step: Read from and write to HDFS Synchronize on complete data between steps  Slow σ π ⋈
  9. 9. REEF Breadth Mechanism over Policy Avoid silos Recognize the need for different models But: allow them to be composed. Bridge the JVM/CLR/… divide Different parts of the computation can be in either of them. Resource Manager and DFS := Cluster OS REEF := stdlib
  10. 10. REEF Control Flow Yarn ( ) handles resource management (security, quotas, priorities) Per-job Drivers ( ) request resources, coordinate computations, and handle events: faults, preemption, etc… REEF Evaluators ( ) hold hardware resources, allowing multiple Tasks ( , , , , , , etc…) to use the same cached state. πσ + 3 σ σ σ
  11. 11. Retaining Evaluators Handover of pre-partitioned and parsed data between frameworks Iterative computation Interactive queries $…
  12. 12. Wake: Events + I/O Event based programming and remoting API: A static subset of Rx → static checking of event flows → aggressive JVM event inlining Implementation: “SEDA++” → Global thread pool → Thread sharing where possible
  13. 13. Tang Configuration is hard Errors often show up at runtime only State of receiving process is unknown to the configuring process Our approach: Configuration as Dependency Injection Configuration here is pure data  Early static and dynamic checks Command = ‘ls’ Error: Unknown parameter “Command” Missing required parameter “cmd” cmd = ‘ls’ ShellTask Evaluator Error: Required instanceof Evaluator Got ShellTask Task YarnEvaluator Evaluator Error: container-4872364523847-02.stderr: NullPointerException at: java…eval():1234 ShellTask.helper():546 ShellTask.onNext():789 YarnEvaluator.onNext():12
  14. 14. REEF Data Plane Fault-tolerant communication Group communication / shuffle Low-latency communication Storage & Checkpointing
  15. 15. REEF Data Plane Fault-tolerant communication Group communication / shuffle Low-latency communication Storage & Checkpointing
  16. 16. Objective Function
  17. 17. Start with a random 𝑤0 Until convergence: Step 1: Compute the gradient 𝜕 𝑤 = 𝑥,𝑦 𝜖 𝑋 2 𝑤, 𝑥 − 𝑦 Apply the gradient to the model 𝑤𝑡+1 = 𝑤𝑡 − 𝜕 𝑤 Data parallel in X  Reduce Needed by Partitions  Broadcast
  18. 18. On REEF Driver requests Evaluators
  19. 19. On REEF Driver requests Evaluators Driver sends Tasks to load & parse data
  20. 20. On REEF Driver requests Evaluators Driver sends Tasks to load & parse data Driver sends ComputeGradient and master Tasks
  21. 21. On REEF Driver requests Evaluators Driver sends Tasks to load & parse data Driver sends ComputeGradient and master Tasks Computation commences in sequence of Broadcast and Reduce
  22. 22. On REEF Driver requests Evaluators Driver sends Tasks to load & parse data Driver sends ComputeGradient and master Tasks Computation commences in sequence of Broadcast and Reduce Failure: Node takes part in computation atomically
  23. 23. On REEF Driver requests Evaluators Driver sends Tasks to load & parse data Driver sends ComputeGradient and master Tasks Computation commences in sequence of Broadcast and Reduce Failure: Node takes part in computation atomically Added node takes part in the next call
  24. 24. https://github.com/Microsoft-CISL
  25. 25. bgchun@gmail.com http://www.reef-project.org/surf/ sears.russell@gmail.com http://www.reef-project.org/grouper/
  26. 26. mweimer@microsoft.com

×