Scale 12 x Efficient Multi-tenant Hadoop 2 Workloads with Yarn

Hadoop 2: Efficient multi-tenant workloads that enable the Modern Data Architecture

SCALE 12X, Los Angeles
February 23, 2014

David Kaiser
@ddkaiser
linkedin.com/in/dkaiser
facebook.com/dkaiser
dkaiser@cdk.com
dkaiser@hortonworks.com

Who Am I?
•  20+ years of experience with Linux
•  3 years of experience with Hadoop
•  Career experiences:
   – Data Warehousing
   – Geospatial Analytics
   – Open-source Solutions and Architecture
•  Employed at Hortonworks as a Senior Solutions Engineer

Hadoop 2: Efficient multi-tenant workloads that enable the Modern Data Architecture
•  Abstract:
   – Hadoop is about so much more than batch processing. With the recent release of Hadoop 2, there have been significant changes to how a Hadoop cluster uses resources.
   – YARN, the new resource management component, allows for a more efficient mix of workloads across hardware resources, and enables new applications and new processing paradigms such as stream processing.
   – This talk will discuss the new design and components of Hadoop 2, and provide examples of Modern Data Architectures that leverage Hadoop 2.

What is This Thing?

http://hadoop.apache.org/

Misconceptions
•  Bucket brigade for large or slow data processing tasks
•  Batch processor – Another mainframe
•  Dumb/inflexible, trendy, too simple
•  Incorrect assumption that Java == SLOW
•  Incorrect assumption that Java == EVIL

Hadoop + Linux
Provides a 100% Open-Source framework for efficient, scalable data processing on commodity hardware
•  Hadoop – The Open-source Data Operating System
•  Linux – The Open-source Operating System
•  Commodity Hardware

Hadoop Fundamentals
•  Hadoop is a single system, across multiple Linux systems
•  Two basic capabilities of Hadoop
   – Reliable, Redundant and Distributed Storage
   – Distributed Computation
•  Storage: Hadoop Distributed File System (HDFS)
   – Replicated, distributed filesystem
   – Blocks written to the underlying filesystem on multiple nodes
•  Computation
   – Resource management
   – Frameworks to divide workloads across a collection of resources
      – Hadoop V1: MapReduce framework only
      – Hadoop V2: MapReduce, Tez, Spark, others…

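HDFS: File create lifecycle
[Figure: An HDFS CLIENT splits FILE into blocks B1 and B2. The client sends Create to the NameNode and receives an ack, writes replicas of B1 and B2 to DataNodes spread across RACK1, RACK2 and RACK3 (each write is acked), and finally sends Complete to the NameNode.]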
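The spread of block replicas across racks in the lifecycle above follows HDFS's default rack-aware policy: with the default replication factor of 3, one replica goes on the writer's node (or a random node if the client is not a DataNode), one on a node in a different rack, and one on another node in that same remote rack. The Python sketch below illustrates block splitting and that placement rule. The 128 MB block size (the Hadoop 2 default for `dfs.blocksize`), the `topology` dictionary, and the function names are assumptions for illustration, not the NameNode's real placement code.

```python
import random

# Assumed block size: 128 MB, the Hadoop 2 default for dfs.blocksize.
BLOCK_SIZE = 128 * 1024 * 1024


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Sizes of the blocks a file of `file_size` bytes is stored as."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(writer, topology, rng=random):
    """Pick 3 DataNodes for one block using the default rack-aware rule.

    `topology` (hypothetical structure) maps rack name -> list of DataNodes.
    Assumes at least two racks, each with at least two nodes.
    """
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    # 1st replica: the writer's own node, or a random node if the writer
    # is a client outside the cluster.
    first = writer if writer in rack_of else rng.choice(sorted(rack_of))
    # 2nd replica: a node on a different (remote) rack.
    remote_rack = rng.choice([r for r in sorted(topology) if r != rack_of[first]])
    second = rng.choice(topology[remote_rack])
    # 3rd replica: a different node on that same remote rack.
    third = rng.choice([n for n in topology[remote_rack] if n != second])
    return [first, second, third]


topology = {"RACK1": ["h1", "h2"], "RACK2": ["h3", "h4"], "RACK3": ["h5", "h6"]}
for i, size in enumerate(split_into_blocks(300 * 1024 * 1024), start=1):
    print(f"B{i}: {size} bytes -> {place_replicas('h1', topology)}")
```

Keeping two of the three replicas in one remote rack reduces cross-rack write traffic while still letting the block survive the loss of an entire rack.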

Hadoop 1 Computation
•  MapReduce Framework
   – Combined both Resource Management and Application Logic in the same code
•  Limitations
   – Resource allocation units (slots) fixed per cluster
   – Difficult to use a cluster for differing or simultaneous workloads

The 1st Generation of Hadoop: Batch
[Figure: HADOOP 1.0, built for web-scale batch apps. Separate single-app silos (BATCH, INTERACTIVE, ONLINE), each on its own HDFS.]
•  All other usage patterns must leverage that same infrastructure
•  Forces the creation of silos for managing mixed workloads

Hadoop MapReduce Classic
•  JobTracker
   – Manages cluster resources and job scheduling
•  TaskTracker
   – Per-node agent
   – Manages tasks

MapReduce Classic: Limitations
•  Scalability
   – Maximum cluster size: 4,000 nodes
   – Maximum concurrent tasks: 40,000
   – Coarse synchronization in JobTracker
•  Availability
   – Failure kills all queued and running jobs
•  Hard partition of resources into map and reduce slots
   – Low resource utilization
•  Lacks support for alternate paradigms and services
   – Iterative applications implemented using MapReduce are 10x slower

Hadoop 1: Poor Utilization of Cluster Resources
•  Hadoop 1 JobTracker and TaskTracker used fixed-sized “slots” for resource allocation
•  Map tasks can be left waiting even while reduce slots sit idle, because a reduce slot cannot run a map task
•  Slot counts are hard-coded values: the TaskTracker must be restarted after a change
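The slot problem can be made concrete with a simplified model of one worker node. Under Hadoop 1, map tasks may only use map slots and reduce tasks only reduce slots; the pooled mode below is a rough, hypothetical stand-in for YARN's generic containers in which any task can use any free unit. The function name, the map-first policy in pooled mode, and the slot counts in the example are assumptions for illustration.

```python
def running_tasks(pending_maps, pending_reduces, map_slots, reduce_slots, pooled=False):
    """Return (maps_running, reduces_running, idle_slots) for one node.

    Fixed mode models Hadoop 1 slots: each task type is capped by its own
    slot count. Pooled mode treats all slots as interchangeable; maps are
    placed first, then reduces fill whatever is left.
    """
    total = map_slots + reduce_slots
    if pooled:
        maps = min(pending_maps, total)
        reduces = min(pending_reduces, total - maps)
    else:
        maps = min(pending_maps, map_slots)
        reduces = min(pending_reduces, reduce_slots)
    return maps, reduces, total - maps - reduces


# A map-heavy phase: 20 maps waiting, no reduces ready yet.
print(running_tasks(20, 0, map_slots=11, reduce_slots=9))               # (11, 0, 9)
print(running_tasks(20, 0, map_slots=11, reduce_slots=9, pooled=True))  # (20, 0, 0)
```

With fixed slots, 9 of the 20 slots stay idle while 9 map tasks wait; the pooled model runs all 20 maps.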

Hadoop 2: Moving Past MapReduce
•  HADOOP 1.0: Single Use System (Batch Apps)
   – MapReduce (cluster resource management & data processing)
   – HDFS (redundant, reliable storage)
•  HADOOP 2.0: Multi Purpose Platform (Batch, Interactive, Online, Streaming, …)
   – MapReduce and Others (data processing)
   – YARN (cluster resource management)
   – HDFS2 (redundant, highly-available & reliable storage)

Apache Tez as the new Primitive
•  HADOOP 1.0: MapReduce as Base
   – Pig (data flow), Hive (sql), Others (cascading)
   – MapReduce (cluster resource management & data processing)
   – HDFS (redundant, reliable storage)
•  HADOOP 2.0: Apache Tez as Base
   – Batch (MapReduce), Data Flow (Pig), SQL (Hive), Others (cascading) on Tez (execution engine)
   – Real Time Stream Processing: Storm (continuous execution)
   – Online Data Processing: HBase, Accumulo on ?? (HOYA)
   – YARN (cluster resource management)
   – HDFS2 (redundant, reliable storage)

Tez – Execution Performance
•  Performance gains over MapReduce
   – Eliminate replicated write barrier between successive computations.
   – Eliminate job launch overhead of workflow jobs.
   – Eliminate extra stage of map reads in every workflow job.
   – Eliminate queue and resource contention suffered by workflow jobs that are started after a predecessor job completes.
[Figure: Pig/Hive execution on MR compared with Pig/Hive execution on Tez]
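To make the first three bullets concrete, here is a deliberately simplified counting model of a workflow that Pig or Hive would compile into a chain of N MapReduce jobs, compared with the same work run as a single Tez DAG. It counts only the overheads named on the slide (job launches, replicated intermediate writes to HDFS, and extra map stages that merely re-read a predecessor's output); the class and function names are hypothetical, and the model ignores everything else that affects runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanOverhead:
    job_launches: int            # separate jobs submitted and scheduled
    replicated_hdfs_writes: int  # intermediate results materialized in HDFS
    extra_map_stages: int        # map stages that only re-read a predecessor's output


def mapreduce_chain(num_jobs):
    """N MapReduce jobs run back to back: every job but the last writes its
    output to HDFS, and every job but the first starts with a map stage that
    reads that output back."""
    return PlanOverhead(num_jobs, num_jobs - 1, num_jobs - 1)


def tez_dag(num_jobs):
    """The same work expressed as one Tez DAG: one launch, with intermediate
    data passed between vertices instead of through replicated HDFS files.
    In this model the overhead does not depend on num_jobs."""
    return PlanOverhead(1, 0, 0)


print(mapreduce_chain(3))
print(tez_dag(3))
```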

YARN: Taking Hadoop Beyond Batch
Store ALL DATA in one place…
Interact with that data in MULTIPLE WAYS

with Predictable Performance and Quality of Service
Applications Run Natively in Hadoop
•  BATCH (MapReduce)
•  INTERACTIVE (Tez)
•  ONLINE (HBase)
•  STREAMING (Storm, S4,…)
•  GRAPH (Giraph)
•  IN-MEMORY (Spark)
•  HPC MPI (OpenMPI)
•  OTHER (Search, Weave…)
YARN (Cluster Resource Management)
HDFS2 (Redundant, Reliable Storage)

YARN Overview
•  Goals:
   – Reduce the responsibilities of the JobTracker
      – Separate the resource management duties from the job coordination duties
   – Allow multiple simultaneous jobs
      – Enable workloads of different styles and sizes in one cluster
•  Design:
   – A separate Resource Manager
      – 1 global Resource Scheduler for the entire cluster
      – Each worker (slave) node runs a Node Manager, which manages the life-cycle of containers
   – The JobTracker's job coordination role moves to a per-application Application Master
      – Each application has 1 Application Master
      – The Application Master manages application scheduling and task execution

YARN Architecture
[Figure: YARN architecture. Client 1 and Client 2 submit applications to the ResourceManager, which contains the Scheduler. Every worker runs a NodeManager. NodeManagers host application 1's ApplicationMaster (AM 1) and its Containers 1.1, 1.2 and 1.3, and application 2's ApplicationMaster (AM2) and its Containers 2.1, 2.2, 2.3 and 2.4.]
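The division of labor in the diagram can be sketched as a toy ResourceManager that tracks free memory on each NodeManager and grants containers to applications, naming them `<application>.<sequence>` as in the figure. This is a hypothetical, heavily simplified model that uses first-fit placement on memory only; the real ResourceManager delegates placement decisions to a pluggable scheduler (such as the Capacity Scheduler) and receives requests from ApplicationMasters over heartbeats.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    name: str
    free_mb: int
    containers: list = field(default_factory=list)


class ToyResourceManager:
    """Hypothetical ResourceManager: grants each container on the first
    NodeManager with enough free memory (first fit)."""

    def __init__(self, node_managers):
        self.node_managers = node_managers
        self.last_seq = {}  # application id -> last container sequence number

    def allocate(self, app_id, memory_mb):
        """Return (container_id, node name), or None if no node has room."""
        for nm in self.node_managers:
            if nm.free_mb >= memory_mb:
                seq = self.last_seq.get(app_id, 0) + 1
                self.last_seq[app_id] = seq
                container_id = f"{app_id}.{seq}"  # e.g. "1.1", as in the figure
                nm.free_mb -= memory_mb
                nm.containers.append(container_id)
                return container_id, nm.name
        return None  # the request would wait until capacity is released


rm = ToyResourceManager([NodeManager("nm1", 4096), NodeManager("nm2", 2048)])
print(rm.allocate(1, 2048))  # ('1.1', 'nm1')
print(rm.allocate(2, 2048))  # ('2.1', 'nm1')
print(rm.allocate(1, 2048))  # ('1.2', 'nm2')
print(rm.allocate(2, 1024))  # None
```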

Capacity Sharing: Concepts
•  Application
   – An application is a temporal job or a service submitted to YARN
   – Examples
      – MapReduce Job (job)
      – Storm topology (service)
•  Container
   – Basic unit of allocation
   – Fine-grained resource allocation across multiple resource types (memory, cpu, disk, network, etc.)
      – container_0 = 2GB
      – container_1 = 1GB
   – Replaces fixed map/reduce slots (from Hadoop 1.x)

YARN – Resource Allocation & Usage
•  ResourceRequest
   – Fine-grained resource ask to the ResourceManager
   – Ask for a specific amount of resources (memory, cpu etc.) on a specific machine or rack
   – Use special value of * for resource name for any machine

ResourceRequest fields: priority, resourceName, capability, numContainers

priority   capability      resourceName   numContainers
0          <2gb, 1 core>   host01         1
                           rack0          1
                           *              1
1          <4gb, 1 core>   *              1
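The ResourceRequest table can be modeled directly as data. The sketch below encodes the slide's four asks and shows which of them a free container on a given host could satisfy. The field names follow the slide (YARN's Java API has a corresponding `org.apache.hadoop.yarn.api.records.ResourceRequest` record, with `ResourceRequest.ANY` for `*`); the `satisfiable` helper, the assumption that a lower number means a higher priority, and the node-local, rack-local, then any-host ordering are illustrative and not YARN's actual scheduler logic.

```python
from dataclasses import dataclass

ANY = "*"  # the special resourceName meaning "any machine"


@dataclass(frozen=True)
class ResourceRequest:
    priority: int
    resource_name: str  # a host, a rack, or ANY
    memory_mb: int
    vcores: int
    num_containers: int


# The four asks from the slide's table (2gb/4gb expressed in MB).
REQUESTS = [
    ResourceRequest(0, "host01", 2048, 1, 1),
    ResourceRequest(0, "rack0", 2048, 1, 1),
    ResourceRequest(0, ANY, 2048, 1, 1),
    ResourceRequest(1, ANY, 4096, 1, 1),
]


def satisfiable(requests, host, rack):
    """Requests a free container on `host` (in `rack`) could satisfy,
    ordered by priority (assumed: lower number first) and then by
    locality: node-local, rack-local, any host."""
    locality = {host: 0, rack: 1, ANY: 2}
    hits = [r for r in requests
            if r.num_containers > 0 and r.resource_name in locality]
    return sorted(hits, key=lambda r: (r.priority, locality[r.resource_name]))


for r in satisfiable(REQUESTS, "host01", "rack0"):
    print(r.priority, r.resource_name, r.memory_mb, r.vcores)
```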

CGroup
•  Linux kernel capability to limit, account and isolate resources
   – CPU: Controlling the prioritization of processes in the group. Think of it as a more advanced nice level
   – Memory: Allow for setting limits on RAM and swap usage
   – Disk I/O
   – Network
•  YARN currently supports CPU and memory
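The cgroup (v1) CPU controller divides contended CPU time in proportion to each group's `cpu.shares` weight, whose kernel default is 1024. The sketch below computes those proportions for two hypothetical YARN containers. Mapping one vcore to 1024 shares is an assumption made for illustration, and the proportions only hold while every group is competing for CPU; time a group leaves unused is available to the others.

```python
# Kernel default weight for a cgroup v1 CPU group.
DEFAULT_CPU_SHARES = 1024


def shares_for_vcores(vcores):
    """Hypothetical mapping: give each container 1024 shares per vcore."""
    return DEFAULT_CPU_SHARES * vcores


def cpu_fractions(shares_by_group):
    """Fraction of CPU each group receives when every group is CPU-bound.

    cpu.shares is a relative weight, so a group's slice is its shares
    divided by the total shares of all competing groups.
    """
    total = sum(shares_by_group.values())
    return {group: shares / total for group, shares in shares_by_group.items()}


containers = {
    "container_a (1 vcore)": shares_for_vcores(1),
    "container_b (2 vcores)": shares_for_vcores(2),
}
print(cpu_fractions(containers))  # container_a gets 1/3, container_b gets 2/3
```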

List of YARN Apps
•  MapReduce (of course)
•  Apache Tez
   – Apache Hive
   – Apache Pig
•  Apache Hama - Iterative, Bulk Synchronous Parallel (BSP) engine
•  Apache Giraph - Iterative, BSP-based Graph Analysis engine
•  HBase on YARN (HOYA)
•  Apache Storm – Real-time stream processing
•  Apache Spark – Advanced DAG execution engine that supports cyclic data
flow and in-memory computing
•  Apache S4 – Real-time processing
•  Open MPI – Open source Message Passing Interface for HPC
http://wiki.apache.org/hadoop/PoweredByYarn

The YARN Book
•  “Coming Soon”
•  Expected by 2nd Quarter 2014
•  Complete coverage of YARN

Modern Data Architecture
•  Effective use of data – especially BIG Data – is enhanced when data is
co-located, enabling discovery and mining of unanticipated patterns.
•  A “Data Lake” is the growing body of all data
   – Encompassing more than a single warehouse
   – Data can continuously stream into and out of the lake

Multi-Tenancy Requirements
Multi-Tenancy in one shared cluster
•  Multiple Business Units
•  Multiple Applications

Requirements
•  Shared Processing Capacity
•  Shared Storage Capacity
•  Data Access Security


Multi-Tenancy: Capabilities
•  Group and User:
   – Use of Linux and HDFS permissions to separate files and directories to create tenant boundaries; can be integrated with LDAP (or AD)
•  Security
   – Used to enforce tenant boundaries; can be integrated with Kerberos
•  Capacity:
   – Storage quota setup to manage consumption
   – Capacity resource scheduler queues to balance shared processing resources between tenants; use ACLs to define tenants

The Capacity Scheduler
•  Capacity Sharing
   – Queues with priorities
   – ACLs for job submit permissions
•  Capacity Enforcement
   – Max capacity per queue
   – User limits within queue
•  Administration
   – Monitoring + Management Admin ACLs
   – capacity-scheduler.xml

Roadmap: Capacity Scheduling
•  CS Pre-emption
   – Enhance SLA support
   – Re-claim capacity from tasks in queues that have been over-scheduled
•  Queue Hierarchy
   – Granular configuration of queues
   – Provide constraints across a set of queues
•  Node Labels
   – Schedule tasks on specific cluster nodes
   – Account for optimized hardware
•  Container Isolation
   – Stronger isolation of resources for each container, incorporating CPU
•  CPU Scheduling
   – Schedule and share CPU core capacity across tasks

Capacity Scheduler by example
Total Cluster capacity
•  20 slots
•  11 Mappers
•  9 Reducers

Queue: Production
•  Guarantee 70% resources
•  14 slots – 8M / 6R
•  Max 100%

Queue: Dev
•  Guarantee 10% resources
•  2 slots – 1M / 1R
•  Max 50%

Queue: Default
•  Guarantee 20% resources
•  4 slots – 2M / 2R
•  Max 80%
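The slot counts on this slide follow directly from the percentages: each queue's guarantee and maximum are fractions of the 20-slot cluster. A minimal sketch of that arithmetic (the function and dictionary names are hypothetical):

```python
def slot_limits(total_slots, queues):
    """Map each queue to (guaranteed_slots, max_slots).

    `queues` maps a queue name to (guarantee %, maximum %) of the cluster.
    Floor division is used; the slide's numbers divide evenly.
    """
    return {
        name: (total_slots * guarantee // 100, total_slots * maximum // 100)
        for name, (guarantee, maximum) in queues.items()
    }


QUEUES = {"Production": (70, 100), "Dev": (10, 50), "Default": (20, 80)}

for name, (guaranteed, ceiling) in slot_limits(20, QUEUES).items():
    print(f"{name}: {guaranteed} slots guaranteed, up to {ceiling} when others are idle")
```

Dev, for instance, is guaranteed 2 slots but may grow to 10 when the rest of the cluster is idle, and the three guarantees together account for all 20 slots.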

Hierarchical queues
root
•  Dev 10%
   – Eng 20%
   – Test 80%
•  Default 20%
•  Production 70%
   – DevOps 10%
   – Reserved 20%
   – Prod 70%
      – P0 70%
      – P1 30%
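In a queue hierarchy each percentage is relative to the parent queue, so a queue's share of the whole cluster is the product of the percentages on its path from root. The sketch below computes these absolute capacities; the tree encodes one reading of the diagram (Eng and Test under Dev; DevOps, Reserved and Prod under Production; P0 and P1 under Prod), and the names `QUEUE_TREE` and `absolute_capacities` are hypothetical.

```python
# Child queue -> percentage of its parent, per parent (one reading of the diagram).
QUEUE_TREE = {
    "root": {"Dev": 10, "Default": 20, "Production": 70},
    "Dev": {"Eng": 20, "Test": 80},
    "Production": {"DevOps": 10, "Reserved": 20, "Prod": 70},
    "Prod": {"P0": 70, "P1": 30},
}


def absolute_capacities(tree, root="root"):
    """Each queue's fraction of the whole cluster: the product of the
    percentages on the path from root down to that queue."""
    absolute = {root: 1.0}
    pending = [root]
    while pending:
        parent = pending.pop()
        children = tree.get(parent, {})
        if children and sum(children.values()) != 100:
            raise ValueError(f"children of {parent} do not sum to 100")
        for child, percent in children.items():
            absolute[child] = absolute[parent] * percent / 100
            pending.append(child)
    return absolute


for queue, share in absolute_capacities(QUEUE_TREE).items():
    print(f"{queue}: {share:.1%}")
```

P0, for example, is guaranteed 70% of 70% of 70%, about 34.3% of the cluster, while Test receives 80% of Dev's 10%, or 8%.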

CS: Example Queue Configuration
•  Default: 10 users | Ad-hoc BI Query jobs etc. | General User SLAs
•  Dev: 4 users | Ad-hoc Data Science Only (Pig+Mahout) | Lower SLAs
•  Applications: 2 users | Batch ETL and Report Generation jobs | Production SLAs
yarn.scheduler.capacity.root.default
   – Capacity: Min: 0.10 | Max: 0.20 | User Limit: 0.8
   – ACLs: ‘Users’ group
yarn.scheduler.capacity.root.dev
   – Capacity: Min: 0.10 | Max: 0.10 | User Limit: 0.5
   – ACLs: ‘Engineering’ group
yarn.scheduler.capacity.root.production
   – Capacity: Min: 0.20 | Max: 0.70 | User Limit: 1.0
   – ACLs: ‘Applications’ group

CS: Configuration
•  yarn.scheduler.capacity.root.default.acl_administer_queue=*
•  yarn.scheduler.capacity.root.default.acl_submit_applications=*
•  yarn.scheduler.capacity.root.default.capacity=100
•  yarn.scheduler.capacity.root.default.maximum-capacity=100
•  yarn.scheduler.capacity.root.default.user-limit-factor=1

•  http://hadoop.apache.org/docs/r2.2.0/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html
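On disk, these settings live in `capacity-scheduler.xml`, which uses Hadoop's standard `<configuration>`, `<property>`, `<name>`, `<value>` layout. The sketch below renders a dictionary of settings into that layout using Python's standard `xml.etree.ElementTree`. The function name is hypothetical; `yarn.scheduler.capacity.root.queues`, which declares the child queues of root, is added so the example describes a complete single-queue setup.

```python
import xml.etree.ElementTree as ET


def to_hadoop_xml(props):
    """Render {name: value} pairs as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


DEFAULT_QUEUE = {
    "yarn.scheduler.capacity.root.queues": "default",
    "yarn.scheduler.capacity.root.default.capacity": 100,
    "yarn.scheduler.capacity.root.default.maximum-capacity": 100,
    "yarn.scheduler.capacity.root.default.user-limit-factor": 1,
}

print(to_hadoop_xml(DEFAULT_QUEUE))
```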

CS: Configuration
•  yarn.scheduler.capacity.root.default.acl_administer_queue=Admin
•  yarn.scheduler.capacity.root.default.acl_submit_applications=Users
•  yarn.scheduler.capacity.root.default.capacity=10
•  yarn.scheduler.capacity.root.default.maximum-capacity=20
•  yarn.scheduler.capacity.root.default.user-limit-factor=0.8
•  yarn.scheduler.capacity.root.dev.acl_administer_queue=Engineering
•  yarn.scheduler.capacity.root.dev.acl_submit_applications=Engineering
•  yarn.scheduler.capacity.root.dev.capacity=10
•  yarn.scheduler.capacity.root.dev.maximum-capacity=10
•  yarn.scheduler.capacity.root.dev.user-limit-factor=0.5
•  yarn.scheduler.capacity.root.production.acl_administer_queue=Admin
•  yarn.scheduler.capacity.root.production.acl_submit_applications=Applications
•  yarn.scheduler.capacity.root.production.capacity=20
•  yarn.scheduler.capacity.root.production.maximum-capacity=70
•  yarn.scheduler.capacity.root.production.user-limit-factor=1.0
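The Capacity Scheduler documentation requires the capacities of sibling queues at each level to sum to 100 percent, and a queue's maximum-capacity should not be below its capacity. A check like the sketch below, which parses slide-style `key=value` lines, catches configurations that break these rules before the scheduler sees them; the helper names are hypothetical. Run against the three capacities listed above (10, 10 and 20), it reports that root's children total only 40, so those guarantees would need rebalancing before the scheduler could be expected to accept them.

```python
from collections import defaultdict

PREFIX = "yarn.scheduler.capacity."


def parse_properties(lines):
    """Parse slide-style 'key=value' lines (optionally bulleted) into a dict."""
    props = {}
    for line in lines:
        line = line.strip().lstrip("•").strip()
        if "=" in line:
            key, value = line.split("=", 1)
            props[key.strip()] = value.strip()
    return props


def capacity_problems(props):
    """List problems with queue capacities: sibling capacities must total 100,
    and a queue's maximum-capacity should not be below its capacity."""
    problems = []
    siblings = defaultdict(dict)  # parent queue path -> {queue path: capacity}
    for key, value in props.items():
        if not (key.startswith(PREFIX) and key.endswith(".capacity")):
            continue
        path = key[len(PREFIX):-len(".capacity")]  # e.g. "root.dev"
        if "." not in path:
            continue  # the root queue itself has no siblings
        parent = path.rsplit(".", 1)[0]
        siblings[parent][path] = float(value)
        max_key = f"{PREFIX}{path}.maximum-capacity"
        if max_key in props and float(props[max_key]) < float(value):
            problems.append(f"{path}: maximum-capacity is below capacity")
    for parent, capacities in siblings.items():
        total = sum(capacities.values())
        if abs(total - 100) > 1e-6:
            problems.append(f"children of {parent} total {total:g}, not 100")
    return problems


example = [
    "yarn.scheduler.capacity.root.a.capacity=60",
    "yarn.scheduler.capacity.root.b.capacity=40",
]
print(capacity_problems(parse_properties(example)))  # []
```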

[Chart shown at each step: cluster resources split among Production, Development, Default and Idle]

•  Job 1: Launch in production queue
   – Requires 100 slots
   – Gets 14 slots at a time

•  Job 1: Running in production queue
   – Using 14 slots
•  Job 2: Scheduled in development queue
   – Requires 50 slots
   – Gets 4 slots at a time

•  Job 1: Running in production queue
   – 98 tasks complete; only 2 slots in use until finish
•  Job 2: Scheduled in development queue
   – Requires 50 slots
   – Still only getting 4 slots at a time

Summary
•  YARN is the logical extension of Apache Hadoop
   – Complements HDFS, the data reservoir
•  Resource Management for the Enterprise Data Lake
   – Shared, secure, multi-tenant Hadoop

Allows for all processing in Hadoop
•  BATCH (MapReduce)
•  INTERACTIVE (Tez)
•  ONLINE (HBase)
•  STREAMING (Storm, S4,…)
•  GRAPH (Giraph)
•  IN-MEMORY (Spark)
•  HPC MPI (OpenMPI)
•  OTHER (Search, Weave…)
YARN (Cluster Resource Management)
HDFS2 (Redundant, Reliable Storage)

Your Fastest On-ramp to Enterprise Hadoop™!

http://hortonworks.com/products/hortonworks-sandbox/

The Sandbox lets you experience Apache Hadoop from the convenience of your own laptop: no data center, no cloud and no internet connection needed!

The Hortonworks Sandbox is:
•  A free download: http://hortonworks.com/products/hortonworks-sandbox/
•  A complete, self-contained virtual machine with Apache Hadoop pre-configured
•  A personal, portable and standalone Hadoop environment
•  A set of hands-on, step-by-step tutorials that allow you to learn and explore Hadoop

Questions?
