SlideShare a Scribd company logo
a library to simplify and unify the lower layers
of big data systems on modern resource managers.
• One cluster used by all workloads
(interactive, batch, streaming, …)
• Resources are handed out as
containers
Container is slice of a machine
Fixed RAM, CPU, I/O, …
• Examples:
Apache Hadoop YARN
Apache Mesos
Google Borg
Resource
Managers
Resource
Managers
Enable true multi-tenancy…
Many workloads:
Streaming, Batch, Interactive…
Many users:
Production, Ad-Hoc, Experiments…
…but, only for sophisticated apps
Resource
Managers
Enable true multi-tenancy…
Many workloads:
Streaming, Batch, Interactive…
Many users:
Production, Ad-Hoc, Experiments…
…but, only for sophisticated apps
Fault tolerance
Resource
Managers
Enable true multi-tenancy…
Many workloads:
Streaming, Batch, Interactive…
Many users:
Production, Ad-Hoc, Experiments…
…but, only for sophisticated apps
Fault tolerance
Resource
Managers
Enable true multi-tenancy…
Many workloads:
Streaming, Batch, Interactive…
Many users:
Production, Ad-Hoc, Experiments…
…but, only for sophisticated apps
Fault tolerance
Preemption
Resource
Managers
Enable true multi-tenancy…
Many workloads:
Streaming, Batch, Interactive…
Many users:
Production, Ad-Hoc, Experiments…
…but, only for sophisticated apps
Fault tolerance
Preemption
Elasticity
Example 1:
SQL / MapReduce
Elasticity
Fault tolerance
σ
π
⋈
Example 1:
SQL / MapReduce
Elasticity
Fault tolerance
σ
π
⋈ ⋈⋈ ⋈⋈
Example 1:
SQL / MapReduce
Elasticity
Fault tolerance
π
⋈ ⋈⋈ ⋈⋈
Example 2:
Machine learning
Fault tolerance
Elasticity
Iterative computations
Example 3:
Graph processing
Fault tolerance
Elasticity
Iterative computations
Low latency communication
Silos
Silos are hard to build
Each duplicates the same
mechanisms under the hood
In practice, silos form pipelines
• In each step:
Read from and write to HDFS
• Synchronize on complete data
between steps
 Slow!
σ
π
⋈
Fun new problems,
boring old ones
Control flow setup
Master / Slave
Membership
Via heartbeats
Task Submission
Inter-Task Messaging
…
σ
π
⋈
REEF
Breadth
Mechanism over Policy
Avoid silos
Recognize the need
for different models
But: allow them to be composed
RM portability
Make app independent from low-
level Resource Manager APIs.
Bridge the JVM/CLR/… divide
Different parts of the computation
can be in either of them.
Resource Manager and DFS := Cluster OS
REEF := stdlib
REEF Control Flow
Yarn ( ) handles resource
management (security, quotas,
priorities)
Per-job Drivers ( ) request
resources, coordinate computations,
and handle events: faults,
preemption, etc…
+ 3
REEF Control Flow
Yarn ( ) handles resource
management (security, quotas,
priorities)
Per-job Drivers ( ) request
resources, coordinate computations,
and handle events: faults,
preemption, etc.
+ 3
REEF Control Flow
Yarn ( ) handles resource
management (security, quotas,
priorities)
Per-job Drivers ( ) request
resources, coordinate computations,
and handle events: faults,
preemption, etc…
REEF Evaluators ( ) hold hardware
resources, allowing multiple Tasks
( , , , , , , etc…) to
use the same cached state.
πσ
+ 3
REEF Control Flow
Yarn ( ) handles resource
management (security, quotas,
priorities)
Per-job Drivers ( ) request
resources, coordinate computations,
and handle events: faults,
preemption, etc…
REEF Evaluators ( ) hold hardware
resources, allowing multiple Tasks
( , , , , , , etc…) to
use the same cached state.
πσ
REEF Control Flow
Yarn ( ) handles resource
management (security, quotas,
priorities)
Per-job Drivers ( ) request
resources, coordinate computations,
and handle events: faults,
preemption, etc…
REEF Evaluators ( ) hold hardware
resources, allowing multiple Tasks
( , , , , , , etc…) to
use the same cached state.
πσ
REEF Control Flow
Yarn ( ) handles resource
management (security, quotas,
priorities)
Per-job Drivers ( ) request
resources, coordinate computations,
and handle events: faults,
preemption, etc…
REEF Evaluators ( ) hold hardware
resources, allowing multiple Tasks
( , , , , , , etc…) to
use the same cached state.
πσ
REEF Control Flow
Yarn ( ) handles resource
management (security, quotas,
priorities)
Per-job Drivers ( ) request
resources, coordinate computations,
and handle events: faults,
preemption, etc…
REEF Evaluators ( ) hold hardware
resources, allowing multiple Tasks
( , , , , , , etc…) to
use the same cached state.
πσ
σ σ
σ
REEF Control Flow
Yarn ( ) handles resource
management (security, quotas,
priorities)
Per-job Drivers ( ) request
resources, coordinate computations,
and handle events: faults,
preemption, etc…
REEF Evaluators ( ) hold hardware
resources, allowing multiple Tasks
( , , , , , , etc…) to
use the same cached state.
πσ
Hello World
1. Client submits the Driver
allocation request to the
Resource Manager ( )
$…
Hello World
1. Client submits the Driver
allocation request to the
Resource Manager ( )
2. Once started, Driver ( )
requests one Evaluator
container from YARN
$…
Hello World
1. Client submits the Driver
allocation request to the
Resource Manager ( )
2. Once started, Driver ( )
requests one Evaluator
container from YARN
3. Driver submits HelloWorld
Task to the newly allocated
Evaluator
$…
Hello World
1. Client submits the Driver
allocation request to the
Resource Manager ( )
2. Once started, Driver ( )
requests one Evaluator
container from YARN
3. Driver submits a HelloWorld
Task to the newly allocated
Evaluator
4. HelloWorld Task prints a
greeting to the log and quits
$…
LOG.log(Level.INFO, "Hello REEF!");
Hello World
1. Client submits the Driver
allocation request to the
Resource Manager ( )
2. Once started, Driver ( )
requests one Evaluator
container from YARN
3. Driver submits a HelloWorld
Task to the newly allocated
Evaluator
4. HelloWorld Task prints a
greeting to the log and quits
5. Driver receives CompletedTask
notification, releases the
Evaluator and quits $…
public final class HelloTask implements Task {
@Inject
private HelloTask() {}
@Override
public byte[] call(final byte[] memento) {
LOG.log(Level.INFO, "Hello REEF!");
return null;
}
}
public final class StartHandler implements EventHandler<StartTime> {
@Override
public void onNext(final StartTime startTime) {
requestor.submit(EvaluatorRequest.newBuilder()
.setNumber(1).setMemory(64).setNumberOfCores(1).build());
}
}
public final class EvaluatorAllocatedHandler
implements EventHandler<AllocatedEvaluator> {
@Override
public void onNext(final AllocatedEvaluator allocatedEvaluator) {
allocatedEvaluator.submitTask(TaskConfiguration.CONF
.set(TaskConfiguration.IDENTIFIER, "HelloREEFTask")
.set(TaskConfiguration.TASK, HelloTask.class)
.build();
}
}
final Configuration runtimeConfig = YarnRuntimeConfiguration.CONF.build();
final Configuration driverConfig = DriverConfiguration.CONF
.set(DriverConfiguration.DRIVER_IDENTIFIER, "HelloREEF")
.set(DriverConfiguration.GLOBAL_LIBRARIES,
EnvironmentUtils.getClassLocation(HelloDriver.class))
.set(DriverConfiguration.ON_DRIVER_STARTED, HelloDriver.StartHandler.class)
.set(DriverConfiguration.ON_EVALUATOR_ALLOCATED,
HelloDriver.EvaluatorAllocatedHandler.class)
.build();
DriverLauncher
.getLauncher(runtimeConfig)
.run(driverConfig, JOB_TIMEOUT);
Tang
Command = ‘ls’
Error:
container-487236457-02.stderr:
NullPointerException at:
java…eval():1234
ShellTask.helper():546
ShellTask.onNext():789
YarnEvaluator.onNext():12
Configuration is hard
- Errors often show up at runtime only
- State of receiving process is
unknown to the configuring process
Tang
Command = ‘ls’
Error:
Unknown parameter "Command"
Missing required parameter "cmd"
Configuration is hard
- Errors often show up at runtime only
- State of receiving process is
unknown to the configuring process
Our approach:
- Use Dependency Injection
- Configuration is pure data
 Early static and dynamic checks
Tang
cmd = ‘ls’
ShellTask
Evaluator
Error:
Required instanceof Evaluator
Got ShellTask
Configuration is hard
- Errors often show up at runtime only
- State of receiving process is
unknown to the configuring process
Our approach:
- Use Dependency Injection
- Configuration is pure data
 Early static and dynamic checks
Tang
cmd = ‘ls’
ShellTask
Task
YarnEvaluator
Evaluator
Configuration is hard
- Errors often show up at runtime only
- State of receiving process is
unknown to the configuring process
Our approach:
- Use Dependency Injection
- Configuration is pure data
 Early static and dynamic checks
Wake:
Events + I/O
Event-based
programming and remoting
API: A static subset of Rx
- Static checking of event flows
- Aggressive JVM event inlining
Implementation: “SEDA++”
- Global thread pool
- Thread sharing where possible
Distributed Shell
1. Client submits Driver configuration
to YARN runtime. Configuration
has cmd shell command, and n
number of Evaluators to run it on
$…
Distributed Shell
1. Client submits Driver configuration
to YARN runtime. Configuration
has cmd shell command, and n
number of Evaluators to run it on
2. Once started, Driver requests n
Evaluators from YARN
$…
Distributed Shell
1. Client submits Driver configuration
to YARN runtime. Configuration
has cmd shell command, and n
number of Evaluators to run it on
2. Once started, Driver requests n
Evaluators from YARN
3. Driver submits ShellTask with cmd
to each Evaluator
#!
$…
#!
#!
Distributed Shell
1. Client submits Driver configuration
to YARN runtime. Configuration
has cmd shell command, and n
number of Evaluators to run it on
2. Once started, Driver requests n
Evaluators from YARN
3. Driver submits ShellTask with cmd
to each Evaluator
4. Each ShellTask runs the command,
logs its stdout, and quits
#!
$…
#!
#!
Distributed Shell
1. Client submits Driver configuration
to YARN runtime. Configuration
has cmd shell command, and n
number of Evaluators to run it on
2. Once started, Driver requests n
Evaluators from YARN
3. Driver submits ShellTask with cmd
to each Evaluator
4. Each ShellTask runs the command,
logs its stdout, and quits
5. Driver receives CompletedTask
notifications, releases Evaluators
and quits when all Evaluators are
gone
$…
@Inject
private ShellTask(@Parameter(Command.class) final String command) {
this.command = command;
}
@Override
public byte[] call(final byte[] memento) {
final String result = CommandUtils.runCommand(this.command);
LOG.log(Level.INFO, result);
return CODEC.encode(result);
}
@NamedParameter(doc="Number of evaluators", short_name="n", default_value="1")
public final class NumEvaluators implements Name<Integer> {}
@NamedParameter(doc="The shell command", short_name="cmd")
public final class Command implements Name<String> {}
public final class EvaluatorAllocatedHandler
implements EventHandler<AllocatedEvaluator> {
@Override
public void onNext(final AllocatedEvaluator allocatedEvaluator) {
final JavaConfigurationBuilder taskConfigBuilder =
tang.newConfigurationBuilder(TaskConfiguration.CONF
.set(TaskConfiguration.IDENTIFIER, "ShellTask")
.set(TaskConfiguration.TASK, ShellTask.class)
.build());
taskConfigBuilder.bindNamedParameter(Command.class, command);
allocatedEvaluator.submitTask(taskConfigBuilder.build());
}
}
final JavaConfigurationBuilder driverConfig =
tang.newConfigurationBuilder(DriverConfiguration.CONF
.set(DriverConfiguration.DRIVER_IDENTIFIER, "DistributedShell")
.set(DriverConfiguration.GLOBAL_LIBRARIES,
EnvironmentUtils.getClassLocation(ShellDriver.class))
.set(DriverConfiguration.ON_DRIVER_STARTED, ShellDriver.StartHandler.class)
.set(DriverConfiguration.ON_EVALUATOR_ALLOCATED,
ShellDriver.EvaluatorAllocatedHandler.class)
.build());
new CommandLine(driverConfigBuilder)
.registerShortNameOfClass(Command.class)
.registerShortNameOfClass(NumEvaluators.class)
.processCommandLine(args);
DriverLauncher.getLauncher(runtimeConfig).run(driverConfig.build(), JOB_TIMEOUT);
YARN Example: 1333 lines of code
YARN Example: 1333 lines of code
Simple REEF Application: 83 lines
94%
YARN Example: 1333 lines of code
Simple REEF Application: 83 lines
Interactive Fault-Tolerant
Web Application on REEF: 361 lines
27%
•
•
•
•
•
Group
Communication
Also: Collective Communication
Communications with many
participants
Contrast: Peer-to-peer communication
Most commonly used interface: MPI
Broadcast
Mechanism
The sender (S) sends a value
All receivers (R) receive the identical value
Use case
Distribute a model
Distribute a descent direction for line search
Optimizations
Trees to distribute the data
Be mindful of the topology of machines
Do peer-to-peer sends
…
S
Reduce
Mechanism
The senders (S) each send a value
Those values are aggregated
The receiver (R) receives the aggregate total
Use case
Aggregate gradients, losses
Optimizations
Aggregation trees
Pipelining
// We can send and receive any Java serializable data, e.g. jBLAS matrices
private final Broadcast.Sender<DoubleMatrix> modelSender;
private final Broadcast.Receiver<DoubleMatrix[]> resultReceiver;
// Broadcast the model, collect the results, repeat.
do {
this.modelSender.send(modelSlice);
// ...
final DoubleMatrix[] result = this.resultReceiver.reduce();
} while (notConverged(modelSlice, prevModelSlice));
http://reef.apache.org
http://github.com/cmssnu/dolphin
http://github.com/Microsoft-CISL/TensorFactorization-LDA
Contact dev@reef.apache.org
Website http://reef.apache.org
Start with a random 𝑤0
Until convergence:
Compute the gradient
𝜕 𝑤 = ෍
𝑥,𝑦 𝜖 𝑋
𝑤, 𝑥 − 𝑦
Apply gradient and regularizer to the model
𝑤𝑡+1 = 𝑤𝑡 − 𝜂 𝜕 𝑤 + 𝜆
Data parallel in X
 Reduce
Needed by Partitions
 Broadcast
On REEF
Driver requests Evaluators
On REEF
Driver requests Evaluators
Driver sends Tasks to load & parse
data
On REEF
Driver requests Evaluators
Driver sends Tasks to load & parse
data
Driver sends ComputeGradient and
master Tasks
On REEF
Driver requests Evaluators
Driver sends Tasks to load & parse
data
Driver sends ComputeGradient and
master Tasks
Computation commences in
sequence of Broadcast and Reduce

Start with a random 𝑤0
Until convergence:
Compute the gradient
𝜕 𝑤 = ෍
𝑥,𝑦 𝜖 𝑋
2 𝑤, 𝑥 − 𝑦
Apply gradient to the model
𝑤𝑡+1 = 𝑤𝑡 − 𝜕 𝑤
Not having some machines means
training on a (random) subset of X
On REEF
First iteration
On REEF
Second Iteration
On REEF
End state
Apache REEF - stdlib for big data

More Related Content

What's hot

Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Evan Chan
 
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared ClustersMercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
DataWorks Summit
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
P. Taylor Goetz
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
Rahul Jain
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Helena Edelson
 
Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
Hari Shreedharan
 
Reactive app using actor model & apache spark
Reactive app using actor model & apache sparkReactive app using actor model & apache spark
Reactive app using actor model & apache spark
Rahul Kumar
 
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Spark Summit
 
Spark stream - Kafka
Spark stream - Kafka Spark stream - Kafka
Spark stream - Kafka
Dori Waldman
 
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Helena Edelson
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoop
hitesh1892
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
DataWorks Summit/Hadoop Summit
 
Meetup ml spark_ppt
Meetup ml spark_pptMeetup ml spark_ppt
Meetup ml spark_ppt
Snehal Nagmote
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
Rahul Jain
 
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Lucidworks
 
Spark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics RevisedSpark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics Revised
Michael Spector
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
DataWorks Summit/Hadoop Summit
 
Spark Summit 2014: Spark Job Server Talk
Spark Summit 2014:  Spark Job Server TalkSpark Summit 2014:  Spark Job Server Talk
Spark Summit 2014: Spark Job Server TalkEvan Chan
 
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Robert "Chip" Senkbeil
 

What's hot (20)

Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
 
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared ClustersMercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
Mercury: Hybrid Centralized and Distributed Scheduling in Large Shared Clusters
 
Apache storm vs. Spark Streaming
Apache storm vs. Spark StreamingApache storm vs. Spark Streaming
Apache storm vs. Spark Streaming
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
Streaming Big Data with Spark, Kafka, Cassandra, Akka & Scala (from webinar)
 
Spark+flume seattle
Spark+flume seattleSpark+flume seattle
Spark+flume seattle
 
Reactive app using actor model & apache spark
Reactive app using actor model & apache sparkReactive app using actor model & apache spark
Reactive app using actor model & apache spark
 
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
Recipes for Running Spark Streaming Applications in Production-(Tathagata Das...
 
Spark stream - Kafka
Spark stream - Kafka Spark stream - Kafka
Spark stream - Kafka
 
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
Delivering Meaning In Near-Real Time At High Velocity In Massive Scale with A...
 
Running Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache HadoopRunning Non-MapReduce Big Data Applications on Apache Hadoop
Running Non-MapReduce Big Data Applications on Apache Hadoop
 
The Future of Apache Storm
The Future of Apache StormThe Future of Apache Storm
The Future of Apache Storm
 
Meetup ml spark_ppt
Meetup ml spark_pptMeetup ml spark_ppt
Meetup ml spark_ppt
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
Near Real Time Indexing Kafka Messages into Apache Blur: Presented by Dibyend...
 
Spark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics RevisedSpark Streaming Recipes and "Exactly Once" Semantics Revised
Spark Streaming Recipes and "Exactly Once" Semantics Revised
 
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark ClustersA Container-based Sizing Framework for Apache Hadoop/Spark Clusters
A Container-based Sizing Framework for Apache Hadoop/Spark Clusters
 
Spark Summit 2014: Spark Job Server Talk
Spark Summit 2014:  Spark Job Server TalkSpark Summit 2014:  Spark Job Server Talk
Spark Summit 2014: Spark Job Server Talk
 
Streaming SQL
Streaming SQLStreaming SQL
Streaming SQL
 
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
Spark Kernel Talk - Apache Spark Meetup San Francisco (July 2015)
 

Similar to Apache REEF - stdlib for big data

Web Sphere Problem Determination Ext
Web Sphere Problem Determination ExtWeb Sphere Problem Determination Ext
Web Sphere Problem Determination ExtRohit Kelapure
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container Era
Sadayuki Furuhashi
 
Cdcr apachecon-talk
Cdcr apachecon-talkCdcr apachecon-talk
Cdcr apachecon-talk
Amrit Sarkar
 
Deploy and Destroy: Testing Environments - Michael Arenzon - DevOpsDays Tel A...
Deploy and Destroy: Testing Environments - Michael Arenzon - DevOpsDays Tel A...Deploy and Destroy: Testing Environments - Michael Arenzon - DevOpsDays Tel A...
Deploy and Destroy: Testing Environments - Michael Arenzon - DevOpsDays Tel A...
DevOpsDays Tel Aviv
 
Introduction to Laravel Framework (5.2)
Introduction to Laravel Framework (5.2)Introduction to Laravel Framework (5.2)
Introduction to Laravel Framework (5.2)
Viral Solani
 
Harmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and PuppetHarmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and Puppet
Achieve Internet
 
Introduction to PowerShell
Introduction to PowerShellIntroduction to PowerShell
Introduction to PowerShell
Boulos Dib
 
Automating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps ApproachAutomating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps ApproachAkshaya Mahapatra
 
The Integration of Laravel with Swoole
The Integration of Laravel with SwooleThe Integration of Laravel with Swoole
The Integration of Laravel with Swoole
Albert Chen
 
Everything you wanted to know about writing async, concurrent http apps in java
Everything you wanted to know about writing async, concurrent http apps in java Everything you wanted to know about writing async, concurrent http apps in java
Everything you wanted to know about writing async, concurrent http apps in java
Baruch Sadogursky
 
YARN Services
YARN ServicesYARN Services
YARN Services
Steve Loughran
 
Building Distributed Systems in Scala
Building Distributed Systems in ScalaBuilding Distributed Systems in Scala
Building Distributed Systems in Scala
Alex Payne
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomy
nzhang
 
Velocity 2018 preetha appan final
Velocity 2018   preetha appan finalVelocity 2018   preetha appan final
Velocity 2018 preetha appan final
preethaappan
 
OGSA-DAI DQP: A Developer's View
OGSA-DAI DQP: A Developer's ViewOGSA-DAI DQP: A Developer's View
OGSA-DAI DQP: A Developer's View
Bartosz Dobrzelecki
 
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
Daniel Bryant
 
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
Flink Forward
 
Running Microservices on AWS Elastic Beanstalk
Running Microservices on AWS Elastic BeanstalkRunning Microservices on AWS Elastic Beanstalk
Running Microservices on AWS Elastic Beanstalk
Amazon Web Services
 
Apache Flink Hands On
Apache Flink Hands OnApache Flink Hands On
Apache Flink Hands On
Robert Metzger
 
Running Microservices and Docker on AWS Elastic Beanstalk - August 2016 Month...
Running Microservices and Docker on AWS Elastic Beanstalk - August 2016 Month...Running Microservices and Docker on AWS Elastic Beanstalk - August 2016 Month...
Running Microservices and Docker on AWS Elastic Beanstalk - August 2016 Month...
Amazon Web Services
 

Similar to Apache REEF - stdlib for big data (20)

Web Sphere Problem Determination Ext
Web Sphere Problem Determination ExtWeb Sphere Problem Determination Ext
Web Sphere Problem Determination Ext
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container Era
 
Cdcr apachecon-talk
Cdcr apachecon-talkCdcr apachecon-talk
Cdcr apachecon-talk
 
Deploy and Destroy: Testing Environments - Michael Arenzon - DevOpsDays Tel A...
Deploy and Destroy: Testing Environments - Michael Arenzon - DevOpsDays Tel A...Deploy and Destroy: Testing Environments - Michael Arenzon - DevOpsDays Tel A...
Deploy and Destroy: Testing Environments - Michael Arenzon - DevOpsDays Tel A...
 
Introduction to Laravel Framework (5.2)
Introduction to Laravel Framework (5.2)Introduction to Laravel Framework (5.2)
Introduction to Laravel Framework (5.2)
 
Harmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and PuppetHarmonious Development: Via Vagrant and Puppet
Harmonious Development: Via Vagrant and Puppet
 
Introduction to PowerShell
Introduction to PowerShellIntroduction to PowerShell
Introduction to PowerShell
 
Automating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps ApproachAutomating Software Development Life Cycle - A DevOps Approach
Automating Software Development Life Cycle - A DevOps Approach
 
The Integration of Laravel with Swoole
The Integration of Laravel with SwooleThe Integration of Laravel with Swoole
The Integration of Laravel with Swoole
 
Everything you wanted to know about writing async, concurrent http apps in java
Everything you wanted to know about writing async, concurrent http apps in java Everything you wanted to know about writing async, concurrent http apps in java
Everything you wanted to know about writing async, concurrent http apps in java
 
YARN Services
YARN ServicesYARN Services
YARN Services
 
Building Distributed Systems in Scala
Building Distributed Systems in ScalaBuilding Distributed Systems in Scala
Building Distributed Systems in Scala
 
Hive Anatomy
Hive AnatomyHive Anatomy
Hive Anatomy
 
Velocity 2018 preetha appan final
Velocity 2018   preetha appan finalVelocity 2018   preetha appan final
Velocity 2018 preetha appan final
 
OGSA-DAI DQP: A Developer's View
OGSA-DAI DQP: A Developer's ViewOGSA-DAI DQP: A Developer's View
OGSA-DAI DQP: A Developer's View
 
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
J1 2015 "Debugging Java Apps in Containers: No Heavy Welding Gear Required"
 
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
Flink Forward SF 2017: Joe Olson - Using Flink and Queryable State to Buffer ...
 
Running Microservices on AWS Elastic Beanstalk
Running Microservices on AWS Elastic BeanstalkRunning Microservices on AWS Elastic Beanstalk
Running Microservices on AWS Elastic Beanstalk
 
Apache Flink Hands On
Apache Flink Hands OnApache Flink Hands On
Apache Flink Hands On
 
Running Microservices and Docker on AWS Elastic Beanstalk - August 2016 Month...
Running Microservices and Docker on AWS Elastic Beanstalk - August 2016 Month...Running Microservices and Docker on AWS Elastic Beanstalk - August 2016 Month...
Running Microservices and Docker on AWS Elastic Beanstalk - August 2016 Month...
 

Recently uploaded

Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
takuyayamamoto1800
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Natan Silnitsky
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
kalichargn70th171
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
Ortus Solutions, Corp
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
Srikant77
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 

Recently uploaded (20)

Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024Globus Compute wth IRI Workflows - GlobusWorld 2024
Globus Compute wth IRI Workflows - GlobusWorld 2024
 
BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024BoxLang: Review our Visionary Licenses of 2024
BoxLang: Review our Visionary Licenses of 2024
 
RISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent EnterpriseRISE with SAP and Journey to the Intelligent Enterprise
RISE with SAP and Journey to the Intelligent Enterprise
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 

Apache REEF - stdlib for big data

  • 1.
  • 2. a library to simplify and unify the lower layers of big data systems on modern resource managers.
  • 3. • One cluster used by all workloads (interactive, batch, streaming, …) • Resources are handed out as containers Container is slice of a machine Fixed RAM, CPU, I/O, … • Examples: Apache Hadoop YARN Apache Mesos Google Borg Resource Managers
  • 4. Resource Managers Enable true multi-tenancy… Many workloads: Streaming, Batch, Interactive… Many users: Production, Ad-Hoc, Experiments… …but, only for sophisticated apps
  • 5. Resource Managers Enable true multi-tenancy… Many workloads: Streaming, Batch, Interactive… Many users: Production, Ad-Hoc, Experiments… …but, only for sophisticated apps Fault tolerance
  • 6. Resource Managers Enable true multi-tenancy… Many workloads: Streaming, Batch, Interactive… Many users: Production, Ad-Hoc, Experiments… …but, only for sophisticated apps Fault tolerance
  • 7. Resource Managers Enable true multi-tenancy… Many workloads: Streaming, Batch, Interactive… Many users: Production, Ad-Hoc, Experiments… …but, only for sophisticated apps Fault tolerance Preemption
  • 8. Resource Managers Enable true multi-tenancy… Many workloads: Streaming, Batch, Interactive… Many users: Production, Ad-Hoc, Experiments… …but, only for sophisticated apps Fault tolerance Preemption Elasticity
  • 9. Example 1: SQL / MapReduce Elasticity Fault tolerance σ π ⋈
  • 10. Example 1: SQL / MapReduce Elasticity Fault tolerance σ π ⋈ ⋈⋈ ⋈⋈
  • 11. Example 1: SQL / MapReduce Elasticity Fault tolerance π ⋈ ⋈⋈ ⋈⋈
  • 12. Example 2: Machine learning Fault tolerance Elasticity Iterative computations
  • 13. Example 3: Graph processing Fault tolerance Elasticity Iterative computations Low latency communication
  • 14. Silos Silos are hard to build Each duplicates the same mechanisms under the hood In practice, silos form pipelines • In each step: Read from and write to HDFS • Synchronize on complete data between steps  Slow! σ π ⋈
  • 15. Fun new problems, boring old ones Control flow setup Master / Slave Membership Via heartbeats Task Submission Inter-Task Messaging … σ π ⋈
  • 16. REEF Breadth Mechanism over Policy Avoid silos Recognize the need for different models But: allow them to be composed RM portability Make app independent from low- level Resource Manager APIs. Bridge the JVM/CLR/… divide Different parts of the computation can be in either of them. Resource Manager and DFS := Cluster OS REEF := stdlib
  • 17.
  • 18.
  • 19.
  • 20. REEF Control Flow Yarn ( ) handles resource management (security, quotas, priorities) Per-job Drivers ( ) request resources, coordinate computations, and handle events: faults, preemption, etc… + 3
  • 21. REEF Control Flow Yarn ( ) handles resource management (security, quotas, priorities) Per-job Drivers ( ) request resources, coordinate computations, and handle events: faults, preemption, etc. + 3
  • 22. REEF Control Flow Yarn ( ) handles resource management (security, quotas, priorities) Per-job Drivers ( ) request resources, coordinate computations, and handle events: faults, preemption, etc… REEF Evaluators ( ) hold hardware resources, allowing multiple Tasks ( , , , , , , etc…) to use the same cached state. πσ + 3
  • 23. REEF Control Flow Yarn ( ) handles resource management (security, quotas, priorities) Per-job Drivers ( ) request resources, coordinate computations, and handle events: faults, preemption, etc… REEF Evaluators ( ) hold hardware resources, allowing multiple Tasks ( , , , , , , etc…) to use the same cached state. πσ
  • 24. REEF Control Flow Yarn ( ) handles resource management (security, quotas, priorities) Per-job Drivers ( ) request resources, coordinate computations, and handle events: faults, preemption, etc… REEF Evaluators ( ) hold hardware resources, allowing multiple Tasks ( , , , , , , etc…) to use the same cached state. πσ
  • 25. REEF Control Flow Yarn ( ) handles resource management (security, quotas, priorities) Per-job Drivers ( ) request resources, coordinate computations, and handle events: faults, preemption, etc… REEF Evaluators ( ) hold hardware resources, allowing multiple Tasks ( , , , , , , etc…) to use the same cached state. πσ
  • 26. REEF Control Flow Yarn ( ) handles resource management (security, quotas, priorities) Per-job Drivers ( ) request resources, coordinate computations, and handle events: faults, preemption, etc… REEF Evaluators ( ) hold hardware resources, allowing multiple Tasks ( , , , , , , etc…) to use the same cached state. πσ σ σ σ
  • 27. REEF Control Flow Yarn ( ) handles resource management (security, quotas, priorities) Per-job Drivers ( ) request resources, coordinate computations, and handle events: faults, preemption, etc… REEF Evaluators ( ) hold hardware resources, allowing multiple Tasks ( , , , , , , etc…) to use the same cached state. πσ
  • 28.
  • 29.
  • 30. Hello World 1. Client submits the Driver allocation request to the Resource Manager ( ) $…
  • 31. Hello World 1. Client submits the Driver allocation request to the Resource Manager ( ) 2. Once started, Driver ( ) requests one Evaluator container from YARN $…
  • 32. Hello World 1. Client submits the Driver allocation request to the Resource Manager ( ) 2. Once started, Driver ( ) requests one Evaluator container from YARN 3. Driver submits HelloWorld Task to the newly allocated Evaluator $…
  • 33. Hello World 1. Client submits the Driver allocation request to the Resource Manager ( ) 2. Once started, Driver ( ) requests one Evaluator container from YARN 3. Driver submits a HelloWorld Task to the newly allocated Evaluator 4. HelloWorld Task prints a greeting to the log and quits $… LOG.log(Level.INFO, "Hello REEF!");
  • 34. Hello World 1. Client submits the Driver allocation request to the Resource Manager ( ) 2. Once started, Driver ( ) requests one Evaluator container from YARN 3. Driver submits a HelloWorld Task to the newly allocated Evaluator 4. HelloWorld Task prints a greeting to the log and quits 5. Driver receives CompletedTask notification, releases the Evaluator and quits $…
  • 35. public final class HelloTask implements Task { @Inject private HelloTask() {} @Override public byte[] call(final byte[] memento) { LOG.log(Level.INFO, "Hello REEF!"); return null; } }
  • 36. public final class StartHandler implements EventHandler<StartTime> { @Override public void onNext(final StartTime startTime) { requestor.submit(EvaluatorRequest.newBuilder() .setNumber(1).setMemory(64).setNumberOfCores(1).build()); } } public final class EvaluatorAllocatedHandler implements EventHandler<AllocatedEvaluator> { @Override public void onNext(final AllocatedEvaluator allocatedEvaluator) { allocatedEvaluator.submitTask(TaskConfiguration.CONF .set(TaskConfiguration.IDENTIFIER, "HelloREEFTask") .set(TaskConfiguration.TASK, HelloTask.class) .build(); } }
  • 37. final Configuration runtimeConfig = YarnRuntimeConfiguration.CONF.build(); final Configuration driverConfig = DriverConfiguration.CONF .set(DriverConfiguration.DRIVER_IDENTIFIER, "HelloREEF") .set(DriverConfiguration.GLOBAL_LIBRARIES, EnvironmentUtils.getClassLocation(HelloDriver.class)) .set(DriverConfiguration.ON_DRIVER_STARTED, HelloDriver.StartHandler.class) .set(DriverConfiguration.ON_EVALUATOR_ALLOCATED, HelloDriver.EvaluatorAllocatedHandler.class) .build(); DriverLauncher .getLauncher(runtimeConfig) .run(driverConfig, JOB_TIMEOUT);
  • 38. Tang Command = ‘ls’ Error: container-487236457-02.stderr: NullPointerException at: java…eval():1234 ShellTask.helper():546 ShellTask.onNext():789 YarnEvaluator.onNext():12 Configuration is hard - Errors often show up at runtime only - State of receiving process is unknown to the configuring process
  • 39. Tang Command = ‘ls’ Error: Unknown parameter "Command" Missing required parameter "cmd" Configuration is hard - Errors often show up at runtime only - State of receiving process is unknown to the configuring process Our approach: - Use Dependency Injection - Configuration is pure data  Early static and dynamic checks
  • 40. Tang cmd = ‘ls’ ShellTask Evaluator Error: Required instanceof Evaluator Got ShellTask Configuration is hard - Errors often show up at runtime only - State of receiving process is unknown to the configuring process Our approach: - Use Dependency Injection - Configuration is pure data  Early static and dynamic checks
  • 41. Tang cmd = ‘ls’ ShellTask Task YarnEvaluator Evaluator Configuration is hard - Errors often show up at runtime only - State of receiving process is unknown to the configuring process Our approach: - Use Dependency Injection - Configuration is pure data  Early static and dynamic checks
  • 42. Wake: Events + I/O Event-based programming and remoting API: A static subset of Rx - Static checking of event flows - Aggressive JVM event inlining Implementation: “SEDA++” - Global thread pool - Thread sharing where possible
  • 43.
  • 44. Distributed Shell 1. Client submits Driver configuration to YARN runtime. Configuration has cmd shell command, and n number of Evaluators to run it on $…
  • 45. Distributed Shell 1. Client submits Driver configuration to YARN runtime. Configuration has cmd shell command, and n number of Evaluators to run it on 2. Once started, Driver requests n Evaluators from YARN $…
  • 46. Distributed Shell 1. Client submits Driver configuration to YARN runtime. Configuration has cmd shell command, and n number of Evaluators to run it on 2. Once started, Driver requests n Evaluators from YARN 3. Driver submits ShellTask with cmd to each Evaluator #! $… #! #!
  • 47. Distributed Shell 1. Client submits Driver configuration to YARN runtime. Configuration has cmd shell command, and n number of Evaluators to run it on 2. Once started, Driver requests n Evaluators from YARN 3. Driver submits ShellTask with cmd to each Evaluator 4. Each ShellTask runs the command, logs its stdout, and quits #! $… #! #!
  • 48. Distributed Shell 1. Client submits Driver configuration to YARN runtime. Configuration has cmd shell command, and n number of Evaluators to run it on 2. Once started, Driver requests n Evaluators from YARN 3. Driver submits ShellTask with cmd to each Evaluator 4. Each ShellTask runs the command, logs its stdout, and quits 5. Driver receives CompletedTask notifications, releases Evaluators and quits when all Evaluators are gone $…
  • 49. @Inject private ShellTask(@Parameter(Command.class) final String command) { this.command = command; } @Override public byte[] call(final byte[] memento) { final String result = CommandUtils.runCommand(this.command); LOG.log(Level.INFO, result); return CODEC.encode(result); }
  • 50. @NamedParameter(doc="Number of evaluators", short_name="n", default_value="1") public final class NumEvaluators implements Name<Integer> {} @NamedParameter(doc="The shell command", short_name="cmd") public final class Command implements Name<String> {}
  • 51. public final class EvaluatorAllocatedHandler implements EventHandler<AllocatedEvaluator> { @Override public void onNext(final AllocatedEvaluator allocatedEvaluator) { final JavaConfigurationBuilder taskConfigBuilder = tang.newConfigurationBuilder(TaskConfiguration.CONF .set(TaskConfiguration.IDENTIFIER, "ShellTask") .set(TaskConfiguration.TASK, ShellTask.class) .build()); taskConfigBuilder.bindNamedParameter(Command.class, command); allocatedEvaluator.submitTask(taskConfigBuilder.build()); } }
  • 52. final JavaConfigurationBuilder driverConfig = tang.newConfigurationBuilder(DriverConfiguration.CONF .set(DriverConfiguration.DRIVER_IDENTIFIER, "DistributedShell") .set(DriverConfiguration.GLOBAL_LIBRARIES, EnvironmentUtils.getClassLocation(ShellDriver.class)) .set(DriverConfiguration.ON_DRIVER_STARTED, ShellDriver.StartHandler.class) .set(DriverConfiguration.ON_EVALUATOR_ALLOCATED, ShellDriver.EvaluatorAllocatedHandler.class) .build()); new CommandLine(driverConfigBuilder) .registerShortNameOfClass(Command.class) .registerShortNameOfClass(NumEvaluators.class) .processCommandLine(args); DriverLauncher.getLauncher(runtimeConfig).run(driverConfig.build(), JOB_TIMEOUT);
  • 53. YARN Example: 1333 lines of code
  • 54. YARN Example: 1333 lines of code Simple REEF Application: 83 lines 94%
  • 55. YARN Example: 1333 lines of code Simple REEF Application: 83 lines Interactive Fault-Tolerant Web Application on REEF: 361 lines 27%
  • 57.
  • 58. Group Communication Also: Collective Communication Communications with many participants Contrast: Peer-to-peer communication Most commonly used interface: MPI
  • 59. Broadcast Mechanism The sender (S) sends a value All receivers (R) receive the identical value Use case Distribute a model Distribute a descent direction for line search Optimizations Trees to distribute the data Be mindful of the topology of machines Do peer-to-peer sends … S
  • 60. Reduce Mechanism The senders (S) each send a value Those values are aggregated The receiver (R) receives the aggregate total Use case Aggregate gradients, losses Optimizations Aggregation trees Pipelining
  • 61.
  • 62.
  • 63. // We can send and receive any Java serializable data, e.g. jBLAS matrices private final Broadcast.Sender<DoubleMatrix> modelSender; private final Broadcast.Receiver<DoubleMatrix[]> resultReceiver; // Broadcast the model, collect the results, repeat. do { this.modelSender.send(modelSlice); // ... final DoubleMatrix[] result = this.resultReceiver.reduce(); } while (notConverged(modelSlice, prevModelSlice));
  • 65.
  • 68.
  • 69.
  • 70. Start with a random 𝑤0 Until convergence: Compute the gradient 𝜕 𝑤 = ෍ 𝑥,𝑦 𝜖 𝑋 𝑤, 𝑥 − 𝑦 Apply gradient and regularizer to the model 𝑤𝑡+1 = 𝑤𝑡 − 𝜂 𝜕 𝑤 + 𝜆 Data parallel in X  Reduce Needed by Partitions  Broadcast
  • 72. On REEF Driver requests Evaluators Driver sends Tasks to load & parse data
  • 73. On REEF Driver requests Evaluators Driver sends Tasks to load & parse data Driver sends ComputeGradient and master Tasks
  • 74. On REEF Driver requests Evaluators Driver sends Tasks to load & parse data Driver sends ComputeGradient and master Tasks Computation commences in sequence of Broadcast and Reduce
  • 75.
  • 76.
  • 77. Start with a random 𝑤0 Until convergence: Compute the gradient 𝜕 𝑤 = ෍ 𝑥,𝑦 𝜖 𝑋 2 𝑤, 𝑥 − 𝑦 Apply gradient to the model 𝑤𝑡+1 = 𝑤𝑡 − 𝜕 𝑤 Not having some machines means training on a (random) subset of X