8. Building a New Processing Framework on YARN
Copyright 2012 Cloudera Inc. All rights reserved
9. A Terrifyingly Accurate Paraphrasing of JWZ
Some people, when confronted with a tedious problem, say, “I know, I’ll write a framework.” Now they have two tedious problems.
11. The Example YARN App: Distributed Shell
12. Do We Need a New Programming Language for Developing YARN Applications?
13. Do We Need a New Programming Language for Developing YARN Applications?
14. Leverage Existing Frameworks
• Popular RPC libraries with support for multiple languages
• C++, Java, Python
• We need to make it easy to deploy existing applications on YARN
16. Design Pattern: The Unified Application Master
• Contains business logic and YARN logic
• Primary reason: Communication
• Also: dynamic resource allocation
• Develop our master/worker applications locally and then deploy them on YARN
17. YARN Lifecycle Management as a Service
• Specifically, extensions of Guava’s Service interface
• YarnClientService
• AppMasterService
• Contains all of the logic for creating applications and keeping an eye on them
18. Moving the Configuration Logic Out of Java
19. Lua as a Configuration Language
• Small and Simple
• Looks like a configuration file
• Functions are there when/if you need them
• Inheritance
• Don’t Repeat Yourself
• Forgiving of undefined values
• Java/C++ Integration
20. First Kitten Utility: The cat Function
21. Second Kitten Utility: The yarn Function
24. The Challenge of Parallel Branch and Bound: Unbalanced Search Space
• Some branches are pruned quickly
• Can be difficult to determine the best splits a priori
• Easy to revert to a de facto single-threaded search
25. The Solution: Work Stealing
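To make the idea concrete, here is a minimal single-JVM sketch of branch and bound on top of a work-stealing scheduler. It is not the distributed YARN implementation this talk describes: it uses threads in the JDK's ForkJoinPool (which is a work-stealing pool) rather than containers, and the 0/1 knapsack problem, the class name KnapsackSearch, and the simple "sum of remaining values" bound are all assumptions chosen for illustration. Idle workers steal forked subtrees from busy ones, so an unbalanced search space does not collapse into a single-threaded search.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class: 0/1 knapsack solved by parallel branch and bound.
class KnapsackSearch {
    final int[] weights;
    final int[] values;
    final int capacity;
    final int[] suffixValue;                            // suffixValue[i] = values[i] + ... + values[n-1]
    final AtomicInteger best = new AtomicInteger(0);    // shared incumbent (best value seen so far)

    KnapsackSearch(int[] weights, int[] values, int capacity) {
        this.weights = weights;
        this.values = values;
        this.capacity = capacity;
        this.suffixValue = new int[values.length + 1];
        for (int i = values.length - 1; i >= 0; i--) {
            suffixValue[i] = suffixValue[i + 1] + values[i];
        }
    }

    // One node of the search tree: items before 'index' have already been decided.
    final class Node extends RecursiveAction {
        final int index, weight, value;

        Node(int index, int weight, int value) {
            this.index = index;
            this.weight = weight;
            this.value = value;
        }

        @Override
        protected void compute() {
            // Every node is a feasible partial solution, so it may improve the incumbent.
            best.accumulateAndGet(value, Math::max);
            if (index == weights.length) {
                return;
            }
            // Bound: even taking every remaining item cannot beat the incumbent, so prune.
            if (value + suffixValue[index] <= best.get()) {
                return;
            }
            Node skip = new Node(index + 1, weight, value);
            if (weight + weights[index] <= capacity) {
                Node take = new Node(index + 1, weight + weights[index], value + values[index]);
                // Forked subtrees land on this worker's deque, where idle workers can steal them.
                invokeAll(take, skip);
            } else {
                skip.compute();
            }
        }
    }

    int solve() {
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new Node(0, 0, 0));
        } finally {
            pool.shutdown();
        }
        return best.get();
    }

    public static void main(String[] args) {
        int[] weights = {12, 2, 1, 1, 4};
        int[] values = {4, 2, 2, 1, 10};
        System.out.println(new KnapsackSearch(weights, values, 15).solve());
    }
}
```

The shared AtomicInteger plays the role that merged global state plays in a distributed setting: each worker prunes against the best bound any worker has found.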
26. You Write Three Classes
• A Task class that implements Writable
• A GlobalState class that implements Writable and has a mergeWith(GlobalState other) method
• A Processor class that defines:
• execute(T task, BranchReduceContext<T, GlobalState> ctxt);
• With optional initialize and cleanup methods
• Configuration is done via BranchReduceJob
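A rough sketch of what the first two classes might look like follows. Everything here is illustrative: the local Writable interface is a stand-in that mirrors the two methods of Hadoop's org.apache.hadoop.io.Writable so the example compiles without Hadoop on the classpath, and SubproblemTask, BestValueState, and WritableDemo are hypothetical names, not classes from the framework. The Processor is left out because the slides do not show which methods BranchReduceContext offers.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for org.apache.hadoop.io.Writable (same two method signatures).
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical Task: one node of a search tree that can be shipped between workers.
class SubproblemTask implements Writable {
    int index, weight, value;

    SubproblemTask() { }                        // no-arg constructor for deserialization

    SubproblemTask(int index, int weight, int value) {
        this.index = index;
        this.weight = weight;
        this.value = value;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(index);
        out.writeInt(weight);
        out.writeInt(value);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        index = in.readInt();
        weight = in.readInt();
        value = in.readInt();
    }
}

// Hypothetical global state: the best objective value found by any worker.
class BestValueState implements Writable {
    int bestValue;

    BestValueState() { }

    BestValueState(int bestValue) {
        this.bestValue = bestValue;
    }

    // Merging must give the same result regardless of order; max satisfies that.
    void mergeWith(BestValueState other) {
        bestValue = Math.max(bestValue, other.bestValue);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(bestValue);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        bestValue = in.readInt();
    }
}

class WritableDemo {
    // Serializes 'source' to bytes and reads them back into 'target'.
    static <W extends Writable> W roundTrip(W source, W target) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            source.write(new DataOutputStream(bytes));
            target.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SubproblemTask copy = roundTrip(new SubproblemTask(3, 7, 11), new SubproblemTask());
        System.out.println(copy.index + " " + copy.weight + " " + copy.value);
    }
}
```

Making tasks and state Writable is what allows a stolen task, or an updated bound, to travel between processes as plain bytes.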