REEF: Towards a Big Data Stdlib
Published in: Technology, Education
Transcript of "REEF: Towards a Big Data Stdlib"

  1. Resource Managers
  2. Resource Managers
  3. Resource Managers. True multi-tenancy… Many workloads: Streaming, Batch, Interactive, … Many users: Production Jobs, Ad-Hoc Jobs, Experiments, … …but only for sophisticated apps.
  4. Resource Managers. True multi-tenancy… Many workloads: Streaming, Batch, Interactive, … Many users: Production Jobs, Ad-Hoc Jobs, Experiments, … …but only for sophisticated apps: Fault tolerance, Pre-emption, Elasticity.
  5. Example 1: SQL / MapReduce. Fault tolerance; Elasticity. [Figure: query plan of σ, π, and ⋈ operators]
  6. Example 2: Machine learning. Fault tolerance; Elasticity; Iterative computations.
  7. Example 3: Graph processing. Fault tolerance; Elasticity; Iterative computations; Low-latency communication.
  8. Silos. Silos are hard to build. Each duplicates the same mechanisms under the hood. In practice, silos form pipelines. In each step: read from and write to HDFS; synchronize on complete data between steps → Slow. [Figure: pipeline of σ, π, and ⋈ operators]
  9. REEF. Breadth. Mechanism over Policy. Avoid silos: recognize the need for different models, but allow them to be composed. Bridge the JVM/CLR/… divide: different parts of the computation can be in either of them. Resource Manager and DFS := Cluster OS; REEF := stdlib.
  10. REEF Control Flow. YARN handles resource management (security, quotas, priorities). Per-job Drivers request resources, coordinate computations, and handle events: faults, preemption, etc. REEF Evaluators hold hardware resources, allowing multiple Tasks to use the same cached state.
  11. Retaining Evaluators. Handover of pre-partitioned and parsed data between frameworks. Iterative computation. Interactive queries. $…
  12. Wake: Events + I/O. Event-based programming and remoting. API: a static subset of Rx → static checking of event flows → aggressive JVM event inlining. Implementation: “SEDA++” → global thread pool → thread sharing where possible.
  13. Tang. Configuration is hard: errors often show up at runtime only; the state of the receiving process is unknown to the configuring process. Our approach: Configuration as Dependency Injection. Configuration here is pure data → early static and dynamic checks. [Figure text: Command = ‘ls’ → Error: Unknown parameter “Command”; Missing required parameter “cmd”. cmd = ‘ls’; ShellTask, Evaluator → Error: Required instanceof Evaluator, Got ShellTask. Task, YarnEvaluator, Evaluator → Error: container-4872364523847-02.stderr: NullPointerException at: java…eval():1234 ShellTask.helper():546 ShellTask.onNext():789 YarnEvaluator.onNext():12]
  14. REEF Data Plane. Fault-tolerant communication. Group communication / shuffle. Low-latency communication. Storage & Checkpointing.
  15. REEF Data Plane. Fault-tolerant communication. Group communication / shuffle. Low-latency communication. Storage & Checkpointing.
  16. Objective Function
  17. Start with a random $w_0$. Until convergence: Step 1: Compute the gradient $\partial_w = \sum_{(x,y) \in X} 2\,(\langle w, x \rangle - y)\,x$ (data parallel in $X$ → Reduce). Apply the gradient to the model: $w_{t+1} = w_t - \partial_w$ (the model is needed by the partitions → Broadcast).
  18. On REEF. Driver requests Evaluators.
  19. On REEF. Driver requests Evaluators. Driver sends Tasks to load & parse data.
  20. On REEF. Driver requests Evaluators. Driver sends Tasks to load & parse data. Driver sends ComputeGradient and master Tasks.
  21. On REEF. Driver requests Evaluators. Driver sends Tasks to load & parse data. Driver sends ComputeGradient and master Tasks. Computation commences in sequence of Broadcast and Reduce.
  22. On REEF. Driver requests Evaluators. Driver sends Tasks to load & parse data. Driver sends ComputeGradient and master Tasks. Computation commences in sequence of Broadcast and Reduce. Failure: Node takes part in computation atomically.
  23. On REEF. Driver requests Evaluators. Driver sends Tasks to load & parse data. Driver sends ComputeGradient and master Tasks. Computation commences in sequence of Broadcast and Reduce. Failure: Node takes part in computation atomically. Added node takes part in the next call.
  24. https://github.com/Microsoft-CISL
  25. bgchun@gmail.com http://www.reef-project.org/surf/ sears.russell@gmail.com http://www.reef-project.org/grouper/
  26. mweimer@microsoft.com
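
Slide 13 describes configuration as pure data that can be checked before anything is instantiated. The following is a minimal Python sketch of that idea only; it is not Tang's API. The `PARAMS` table, `check`, `check_binding`, and `inject` are hypothetical names introduced for illustration, and the class names are borrowed from the slide's figure.

```python
class ConfigError(Exception):
    """Raised when a configuration fails validation before instantiation."""


class Evaluator:
    """Stand-in for the Evaluator interface named on the slide."""


class YarnEvaluator(Evaluator):
    """Stand-in for a concrete Evaluator implementation."""


class ShellTask:
    # Hypothetical declaration of the parameters this class accepts.
    PARAMS = {"cmd": str}

    def __init__(self, cmd):
        self.cmd = cmd


def check(target, config):
    """Validate a plain-dict configuration against target.PARAMS.

    Returns a list of error messages; an empty list means the
    configuration is acceptable. Nothing is instantiated here.
    """
    errors = []
    declared = target.PARAMS
    for name in config:
        if name not in declared:
            errors.append(f'Unknown parameter "{name}"')
    for name, expected in declared.items():
        if name not in config:
            errors.append(f'Missing required parameter "{name}"')
        elif not isinstance(config[name], expected):
            errors.append(
                f"Required instanceof {expected.__name__}, "
                f"got {type(config[name]).__name__}"
            )
    return errors


def check_binding(interface, implementation):
    """Check that an implementation class can stand in for an interface."""
    if not issubclass(implementation, interface):
        return [
            f"Required instanceof {interface.__name__}, "
            f"got {implementation.__name__}"
        ]
    return []


def inject(target, config):
    """Instantiate target from config only if validation passes."""
    errors = check(target, config)
    if errors:
        raise ConfigError("; ".join(errors))
    return target(**config)
```

In this sketch, `check(ShellTask, {"Command": "ls"})` reports both the unknown and the missing parameter on the configuring side, and `check_binding(Evaluator, ShellTask)` rejects the wrong binding, so neither mistake has to surface as a stack trace in a remote container log.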
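
The iteration on slides 17 to 23 (broadcast the model, compute per-partition gradients, reduce them, update) can be sketched in a single process as follows. This is an illustration of the data flow under stated assumptions, not REEF code: the function names are hypothetical, the partitions are plain Python lists rather than Evaluators, a fixed step size is added (the slide's update has none), and the model starts at zero instead of a random $w_0$ so that the run is deterministic.

```python
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], float]  # one (x, y) pair


def partial_gradient(w: Sequence[float], partition: List[Example]) -> List[float]:
    """Gradient contribution of one partition: sum of 2(<w, x> - y) x."""
    grad = [0.0] * len(w)
    for x, y in partition:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2.0 * err * xi
    return grad


def reduce_sum(vectors: List[List[float]]) -> List[float]:
    """Reduce step: element-wise sum of the per-partition gradients."""
    return [sum(column) for column in zip(*vectors)]


def train(partitions: List[List[Example]], dim: int,
          step: float = 0.01, iterations: int = 500) -> List[float]:
    # Assumption: zero initialisation and a fixed step size, for determinism.
    w = [0.0] * dim
    for _ in range(iterations):
        # "Broadcast": every partition sees the current model w.
        partials = [partial_gradient(w, p) for p in partitions]
        # "Reduce": combine the partial gradients into one.
        grad = reduce_sum(partials)
        # Apply the gradient to the model.
        w = [wi - step * gi for wi, gi in zip(w, grad)]
    return w


def demo() -> List[float]:
    # Toy data for y = 2 * x1 + 1; the constant second feature acts as a bias.
    data = [([float(i), 1.0], 2.0 * i + 1.0) for i in range(-3, 4)]
    partitions = [data[:3], data[3:5], data[5:]]
    return train(partitions, dim=2)
```

Because the gradient is a sum over examples, how the data is split across partitions does not change the reduced result beyond floating-point rounding, which is what makes the computation data parallel in $X$.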