A session focused on ramping you up on what Hadoop is, how it works, and what it's capable of. We will also look at what Hadoop 2.x and YARN bring to the table, and some future projects in the Hadoop space to keep an eye on.
Hadoop: Past, Present and Future - v2.1 - SQLSaturday #340 (Big Data Joe™ Rossi)
This document discusses the past, present, and future of Hadoop. It describes how Hadoop 1.0 consisted of HDFS for storage and MapReduce for processing. Hadoop 2.0 introduced YARN to replace MapReduce and allow various processing engines. YARN provides a framework for multiple applications to run on the same Hadoop cluster and access the same data. The future of Hadoop includes SQL interfaces like Hive on Tez/Spark, dynamic HBase clusters on YARN, and machine learning frameworks like REEF.
This document provides an overview of Hadoop past, present and future. It discusses the components of Hadoop 1.x including HDFS and MapReduce. It then covers the new features in Hadoop 2.x including YARN which replaces MapReduce and allows multiple data processing engines. Finally, it outlines the future roadmap of Hadoop including projects to enable interactive query, machine learning, and heterogeneous storage support in HDFS.
YARN - Hadoop Next Generation Compute Platform (Bikas Saha)
The presentation emphasizes the new mental model of YARN as the cluster OS, where one can write and run different applications on Hadoop in a cooperative multi-tenant cluster.
Vinod Kumar Vavilapalli presented on Apache Hadoop YARN: Present and Future. He discussed how YARN improved on Hadoop 1 by separating resource management from processing, allowing multiple types of applications on the same platform. He summarized recent Hadoop releases including YARN enhancements like high availability and preemption. Future plans include improved isolation, multi-dimensional scheduling, and supporting long-running services. YARN aims to be a general resource management platform powering a growing ecosystem of applications beyond just MapReduce.
Operating multi-tenant clusters requires careful capacity planning for on-time launch of big data projects and applications, within the expected budget and with appropriate SLA guarantees. Making such guarantees with a set of standard hardware configurations is key to operating big data platforms as a hosted service for your organization.
This talk highlights the tools, techniques and methodology applied on a per-project or user basis across three primary multi-tenant deployments in the Apache Hadoop ecosystem, namely MapReduce/YARN and HDFS, HBase, and Storm due to the significance of capital investments with increasing scale in data nodes, region servers, and supervisor nodes respectively. We will demo the estimation tools developed for these deployments that can be used for capital planning and forecasting, and cluster resource and SLA management, including making latency and throughput guarantees to individual users and projects.
As we discuss the tools, we will share the considerations incorporated to arrive at the most appropriate calculation for each of these three primary deployments. We will discuss the data sources for the calculations, the resource drivers for different use cases, and how to plan optimum capacity allocation per project with respect to given standard hardware configurations.
Scale 12x: Efficient Multi-tenant Hadoop 2 Workloads with YARN (David Kaiser)
Hadoop is about so much more than batch processing. With the recent release of Hadoop 2, there have been significant changes to how a Hadoop cluster uses resources. YARN, the new resource management component, allows for a more efficient mix of workloads across hardware resources, and enables new applications and new processing paradigms such as stream-processing. This talk will discuss the new design and components of Hadoop 2, and examples of Modern Data Architectures that leverage Hadoop for maximum business efficiency.
This document provides a summary of improvements made to Hive's performance through the use of Apache Tez and other optimizations. Some key points include:
- Hive was improved to use Apache Tez as its execution engine instead of MapReduce, reducing latency for interactive queries and improving throughput for batch queries.
- Statistics collection was optimized to gather column-level statistics from ORC file footers, speeding up statistics gathering.
- The cost-based optimizer Optiq was added to Hive, allowing it to choose better execution plans.
- Vectorized query processing, broadcast joins, dynamic partitioning, and other optimizations improved individual query performance by over 100x in some cases.
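Most of the optimizations above are toggled through Hive session or site settings. As a hedged illustration (property names follow the Hive configuration documentation, but availability and defaults vary by Hive version), a session might enable them like this:

```sql
-- Illustrative Hive session settings; verify names/defaults for your version
SET hive.execution.engine=tez;                  -- run queries on Tez instead of MapReduce
SET hive.vectorized.execution.enabled=true;     -- process rows in batches (vectorization)
SET hive.cbo.enable=true;                       -- cost-based optimizer (Optiq/Calcite)
SET hive.stats.fetch.column.stats=true;         -- use column stats (e.g. from ORC footers)
```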
The document discusses YARN (Yet Another Resource Negotiator), which is the cluster resource management layer of Hadoop. It describes the limitations of the previous Hadoop 1.0 architecture where MapReduce was responsible for both data processing and resource management. YARN was created to address these limitations by separating resource management from data processing. It discusses the components of YARN including the Resource Manager, Node Manager, Containers, and Application Master. It also provides examples of workloads that can run on YARN beyond MapReduce and describes the YARN architecture and how applications run on the YARN framework.
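To make those component roles concrete, here is a deliberately tiny, single-process Python sketch of the flow (all class and method names are simplified stand-ins, not the real YARN Java APIs): an application is submitted to the ResourceManager, its ApplicationMaster negotiates containers, and the RM grants them while cluster capacity remains.

```python
# Toy model of the YARN allocation flow; illustrative only. Real YARN is a
# distributed system with NodeManagers, schedulers, and heartbeats.
class ResourceManager:
    def __init__(self, cluster_mem_gb):
        self.free = cluster_mem_gb  # total unallocated memory in the cluster
        self.next_id = 0

    def submit_application(self):
        # The RM launches an ApplicationMaster for each submitted application
        return ApplicationMaster(self)

    def allocate(self, mem_gb):
        # Grant a container if capacity remains (a real RM consults a scheduler)
        if self.free >= mem_gb:
            self.free -= mem_gb
            self.next_id += 1
            return {"id": self.next_id, "mem_gb": mem_gb}
        return None  # in a real cluster the request would stay pending

    def release(self, container):
        self.free += container["mem_gb"]


class ApplicationMaster:
    """Per-application master: negotiates containers and tracks its own tasks."""
    def __init__(self, rm):
        self.rm = rm
        self.containers = []

    def run_tasks(self, n, mem_gb):
        # Ask the RM for one container per task; keep whatever is granted
        for _ in range(n):
            c = self.rm.allocate(mem_gb)
            if c:
                self.containers.append(c)
        return len(self.containers)
```

For example, an 8 GB toy cluster can grant only two 4 GB containers, so an ApplicationMaster asking for three tasks gets two containers and would wait for the third.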
The Future of Hadoop: MapR VP of Product Management, Tomer Shiran (MapR Technologies)
(1) The amount of data in the world is growing exponentially, with unstructured data making up over 80% of collected data by 2020. (2) Apache Drill provides data agility for Hadoop by enabling self-service data exploration through a flexible data model and schema discovery. (3) Drill allows business users to rapidly query diverse data sources like files, HBase tables, and Hive without requiring IT, through a simple SQL interface.
Apache Hadoop: design and implementation. Lecture in the Big data computing course (http://twiki.di.uniroma1.it/twiki/view/BDC/WebHome), Department of Computer Science, Sapienza University of Rome.
This document discusses the integration of Apache Pig with Apache Tez. Pig provides a procedural scripting language for data processing workflows, while Tez is a framework for executing directed acyclic graphs (DAGs) of tasks. Migrating Pig to use Tez as its execution engine provides benefits like reduced resource usage, improved performance, and container reuse compared to Pig's default MapReduce execution. The document outlines the design changes needed to compile Pig scripts to Tez DAGs and provides examples and performance results. It also discusses ongoing work to achieve full feature parity with MapReduce and further optimize performance.
Hadoop was originally designed for running large batch jobs, but users wanted to share clusters for better utilization and lower costs. Sharing requires a scheduler that provides guaranteed capacity for production jobs while also giving interactive jobs good response times. The Fair Scheduler was developed to address this by assigning jobs to pools that each get a minimum share of resources, with excess allocated fairly between pools. However, strictly following queues can hurt data locality. Delay Scheduling improves locality by relaxing the queues for a short time to allow more data-local scheduling opportunities.
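The delay-scheduling idea lends itself to a short sketch. The following toy Python version (a hypothetical simplification, not the actual Fair Scheduler code) lets a job pass up to `max_skips` scheduling opportunities while waiting for a data-local slot, then relaxes locality:

```python
# Toy sketch of delay scheduling; names and structure are simplified.
class Job:
    def __init__(self, name, tasks):
        # tasks: list of (task_id, preferred_node) pairs
        self.name = name
        self.tasks = list(tasks)
        self.skips = 0  # consecutive non-local opportunities passed up

    def local_tasks(self, node):
        return [t for t in self.tasks if t[1] == node]

    def pop_local_task(self, node):
        t = self.local_tasks(node)[0]
        self.tasks.remove(t)
        return t

    def pop_any_task(self):
        return self.tasks.pop(0)


def delay_schedule(jobs, node, max_skips=2):
    """Return a task to launch on `node`, relaxing locality after max_skips.

    `jobs` is assumed to be ordered by fair-share deficit, i.e. the job
    furthest below its minimum share is considered first."""
    for job in jobs:
        if not job.tasks:
            continue
        if job.local_tasks(node):
            job.skips = 0
            return job.pop_local_task(node)  # data-local: launch immediately
        if job.skips >= max_skips:
            job.skips = 0  # waited long enough; accept a non-local task
            return job.pop_any_task()
        job.skips += 1  # skip this turn, hoping for a local slot soon
    return None
```

A job whose data lives elsewhere declines the first `max_skips` opportunities on a node, which in practice is usually enough for a local slot to free up.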
MapR M7: Providing an enterprise quality Apache HBase API (mcsrivas)
The document provides an overview of MapR M7, an integrated system for structured and unstructured data. M7 combines aspects of LSM trees and B-trees to provide faster reads and writes compared to Apache HBase. It achieves instant recovery from failures through its use of micro write-ahead logs and parallel region recovery. Benchmark results show MapR M7 providing 5-11x faster performance than HBase for common operations like reads, updates, and scans.
Arun C Murthy, Founder and Architect at Hortonworks Inc., talks about the upcoming Next Generation Apache Hadoop MapReduce framework at the Hadoop Summit, 2011.
- The document discusses Apache Hadoop YARN, including its past, present, and future.
- In the past, YARN started as a sub-project of Hadoop and had several alpha and beta releases before the first stable release in 2013.
- Currently, YARN enables rolling upgrades, long running services, node labels, and improved cluster management features like preemption scheduling and fine-grained resource isolation.
This document introduces MapR and Hadoop. It provides an overview of Hadoop, including how MapReduce works and the Hadoop ecosystem of tools. It explains that MapR is mostly compatible with Hadoop but aims to improve reliability, performance, and management compared to other Hadoop distributions through its architecture and features. The objectives are to explain why Hadoop is important for big data, describe MapReduce jobs, identify Hadoop tools, and compare MapR to other Hadoop distributions.
YARN (Yet Another Resource Negotiator) is a resource management framework for Hadoop clusters that improves on the scalability limitations of the original MapReduce framework. YARN separates resource management from job scheduling to allow multiple data processing engines like MapReduce, Spark, and Storm to share common cluster resources. It introduces a new architecture with a ResourceManager to allocate resources among applications and per-application ApplicationMasters to manage containers and scheduling within an application. This provides improved scalability, utilization, and multi-tenancy for a variety of workloads compared to the original Hadoop architecture.
This document discusses challenges faced with running Hive at large scale at Yahoo. It describes how Yahoo runs Hive on 18 Hadoop clusters with over 400,000 nodes and 580PB of data. Even with optimizations like Tez, ORC, and vectorization, Yahoo encountered slow queries, out of memory errors, and slow partition pruning for queries on tables with millions of partitions. Fixes involved throwing more hardware at the metastore, client-side tuning, and addressing memory leaks and inefficiencies in the metastore and filesystem cache.
Hadoop Summit San Jose 2015: Towards SLA-based Scheduling on YARN Clusters (Sumeet Singh)
In this talk, we look at the YARN scheduler choices available today for Apache Hadoop 2 and discuss their pros and cons. We dive deeper into Capacity Scheduler by providing a comprehensive overview of its various settings, with examples from real large-scale Hadoop clusters, to promote a broader understanding of schedulers' current state and the best practices in place today when it comes to queue nomenclature, planning, allocations, and ongoing management. We present detailed cluster, queue, and job behaviors from several different capacity management philosophies.
We then propose practical solutions, without any change to the scheduler or core Hadoop, that allow managing queue creation and capacity allocation while optimizing cluster utilization and maintaining SLA guarantees. A unified queue nomenclature and admission and capacity re-allocation policies across BUs, applications, and clusters make service automation possible. Transparency in resources consumed allows for defining realistic SLA expectations. Finally, consistent application tagging completes the feedback loop, with SLAs observed through application-level reporting.
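For a concrete flavor of the Capacity Scheduler settings discussed here, a minimal `capacity-scheduler.xml` fragment might carve cluster capacity into two queues (the queue names are invented for illustration; the property names follow the Hadoop Capacity Scheduler documentation):

```xml
<!-- Illustrative capacity-scheduler.xml fragment -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>prod,adhoc</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.prod.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
  <value>30</value>
</property>
<property>
  <!-- adhoc may borrow idle capacity up to this elastic ceiling -->
  <name>yarn.scheduler.capacity.root.adhoc.maximum-capacity</name>
  <value>60</value>
</property>
```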
Hadoop Summit San Jose 2014: Costing Your Big Data Operations (Sumeet Singh)
As organizations begin to make use of large data sets, approaches to understanding and managing the true costs of big data will become increasingly important with growing scale of operations.
Whether an on-premise or cloud-based platform is used for storing, processing and analyzing data, our approach explains how to calculate the total cost of ownership (TCO), develop a deeper understanding of compute and storage resources, and run the big data operations with its own P&L, full transparency in costs, and with metering and billing provisions. While our approach is generic, we will illustrate the methodology with three primary deployments in the Apache Hadoop ecosystem, namely MapReduce and HDFS, HBase, and Storm due to the significance of capital investments with increasing scale in data nodes, region servers, and supervisor nodes respectively.
As we discuss our approach, we will share insights gathered from the exercise conducted on one of the largest data infrastructures in the world. We will illustrate how to organize cluster resources, compile data required and typical sources, develop TCO models tailored for individual situations, derive unit costs of usage, measure resources consumed, optimize for higher utilization and ROI, and benchmark the cost.
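As a hedged sketch of the kind of unit-cost derivation described above (the cost categories and figures are invented for illustration, not taken from the talk), annualized node cost can be divided by the effective capacity at the observed utilization:

```python
# Illustrative TCO unit-cost calculation; all inputs are hypothetical.
def tco_per_node_year(capex, amortization_years, opex_per_year):
    """Annualized total cost of ownership for one node:
    amortized hardware cost plus yearly operating cost."""
    return capex / amortization_years + opex_per_year

def unit_costs(node_cost_year, cores_per_node, tb_per_node, utilization):
    """Derive cost per core-year and per TB-year at a given utilization.

    Idle capacity inflates unit cost, so the annual node cost is scaled
    by 1/utilization before dividing by raw capacity."""
    effective = node_cost_year / utilization
    return {
        "per_core_year": effective / cores_per_node,
        "per_tb_year": effective / tb_per_node,
    }
```

For example, a $12,000 node amortized over 4 years with $1,000/year of operating cost works out to $4,000/year; at 80% utilization, a 16-core node then costs $312.50 per core-year.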
BIGDATA - Survey on Scheduling Methods in Hadoop MapReduce (Mahantesh Angadi)
The document summarizes a technical seminar presentation on scheduling methods in the Hadoop MapReduce framework. The presentation covers the motivation for Hadoop and MapReduce, provides an introduction to big data and Hadoop, and describes HDFS and the MapReduce programming model. It then discusses challenges in MapReduce scheduling and surveys the literature on existing scheduling methods. The presentation surveys five papers on proposed MapReduce scheduling methods, summarizing the key points of each. It concludes that improving data locality can enhance performance and that future work could consider scheduling algorithms for heterogeneous clusters.
This presentation will give you information about:
HDFS Overview and Architecture
1. Configuring HDFS
2. Interacting With HDFS
3. HDFS Permissions and Security
4. Additional HDFS Tasks
5. HDFS Installation
6. Hadoop File System Shell
7. File System Java API
Building an enterprise advanced analytics platform (Haoran Du)
Raymond Fu gave a presentation on building an enterprise analytics platform at the SoCal Data Science Conference. He has over 16 years of experience in big data, business intelligence, and enterprise architecture. He discussed how big data disrupts traditional architecture and requires new skills. Advanced analytics involves creating predictive models through machine learning to enable strategic and operational decisions. An enterprise analytics strategy involves data management, modernizing data platforms, and operationalizing advanced analytics models. Fu outlined the key capabilities needed for data management, analytics creation, and analytics operationalization. He provided examples of reference architectures and services that can be used to build an enterprise analytics platform.
This document provides materials and tips for preparing for a job interview at Trace-3, including:
1. Sample answers to common interview questions like discussing weaknesses, knowledge of the company, reasons for wanting to work there, how the applicant can contribute, and asking their own questions.
2. Suggestions for researching the company beforehand using their website, LinkedIn, and press releases.
3. Additional interview preparation resources like types of interviews, thank you letters, and common interview questions for different roles.
4. Other general tips on practicing, preparing questions for the employer, and researching job titles.
Driving Retail Success with Machine Data Intelligence (Sumo Logic)
Gain a competitive edge this holiday season by harnessing the power of machine data. Watch the on-demand webinar to learn how the out-of-the-box integration between Sumo Logic and Akamai allows organizations to:
• Gain a competitive edge by identifying purchasing trends in real-time
• Improve service by correlating Akamai data sets for reduced errors and downtime
• Strengthen security posture through compliance and web application firewall (WAF) monitoring
• Elastically scale to meet unforeseen or projected spikes in business
• Streamline order management, store performance and loss prevention
See the integration in action.
The document outlines eight disciplines of enterprise modernization: total service orientation, innate entrepreneurship, business ecology, on demand enterprise architecture, centers of excellence, continuous improvement, sustainability, and tenacious leadership. It discusses each discipline in 1-2 paragraphs and provides examples. The overall document promotes an approach to continuously improving and innovating the enterprise in order to survive in the new economy.
This document provides an overview and agenda for Week 6 of the DSE 400 course. It outlines discussions to be held on social media platforms, recommended learning materials to review like videos and articles on Hadoop and Hive, and hands-on activities like installing Hadoop and querying datasets. The assignment requires students to perform queries on a NYSE stocks dataset using Hive or R, and submit the queries and results as a PDF. Mentoring and help resources are also listed.
DSE 400 is a free online course that provides an introduction to data science over 8 weeks. Week 1 focuses on getting started with R and RStudio, reading introductory materials, and importing and displaying the Housing dataset from UCI. Participants are asked to engage in online discussions, work on collaborative presentations, and complete assignments like importing the Housing dataset into R. Upcoming weeks will cover topics like statistics, machine learning, Hadoop, visualizations and building data products.
This document provides an overview and roadmap for the DSE 400 - Fast Track to Data Science course. The week 1 agenda includes introductions, reading assignments on data science topics, installing R and RStudio, practicing with math and machine learning datasets, and an assignment to import and display the Housing dataset from UCI Machine Learning Repository in R. The course aims to provide an introduction to data science, analytics, and visualization over 8 weeks covering topics like statistics, machine learning, Hadoop, ethics, and building data products.
A document outlines plans to build a Center of Excellence for Big Data Analytics. It will provide expertise, manage governance practices, and support analytics projects. The COE will maximize quality and efficiency of analytics across business lines. It will focus on business strategy alignment, best practices, advice, community services, communication, technical architecture, support, education, and governance alignment. Keys to success include having a clear strategy, demonstrating value, engaging people, establishing processes, and selecting the right technologies.
This document provides an overview and agenda for Week 8 of the Data Scientist Enablement (DSE) 400 program. It outlines the week's discussions on ethics in big data, recommended learning materials, activities including exploring datasets and starting a blog, and an assignment to cleanse and visualize a sensor dataset or complete an alternative task. The timeline for the full DSE program and options for adaptive learning and proficiency certification are also summarized.
Dr Mohan K Bavirisetty - 8 Disciplines of Enterprise Modernization - Final
The document summarizes the 8 disciplines of enterprise modernization according to Dr. Mohan K. Bavirisetty: 1) total service orientation, 2) innate entrepreneurship, 3) business ecology, 4) continuous improvement, 5) enterprise architecture on demand, 6) thought leadership through centers of excellence, 7) sustainability, and 8) tenacious leadership. The document provides details on each discipline and examples of organizations that have successfully implemented aspects of enterprise modernization.
This document provides an overview of advanced analytics frameworks, platforms, and methodologies. It begins with introducing advanced analytics and defining it. It then discusses various frameworks, platforms from companies like IBM, AeroSpike, and BlueMix. It also covers methodologies for analytics like CRISP-DM, SEMMA, and SMAM. The document references several Gartner reports and ends with taking questions.
Week 7 of the Data Scientist Enablement program focuses on advanced topics including MapReduce, Lambda Architecture, and Google BigQuery. Participants are instructed to continue tutorials on Hortonworks and explore Google public datasets. The assignment involves performing queries on a baseball statistics dataset using Hadoop, R, or BigQuery to analyze maximum and average runs by year and identify top players. Participants can earn a proficiency certificate based on their overall performance and mastery of concepts across the four module program.
This document outlines a data science enablement roadmap created by the Advanced Center of Excellence at Modern Renaissance Corporation. The roadmap consists of 1 introductory course and 3 advanced courses that can earn a student a master's level certificate in data science. The introductory course provides a broad overview of topics like algorithms, statistics, machine learning, and big data platforms. The advanced courses focus on specific skills like machine learning with R, modern data platforms using Hadoop, and advanced big data analytics techniques. The goal is to give students a versatile, practical skill set for a career in data science or big data engineering.
This document provides an introduction to polyglot processing using various big data frameworks. It discusses the lambda and kappa architectures for handling batch, micro-batch, and streaming workloads. The document then demonstrates Apache Spark, Storm, Kafka and Redis for stream processing and compares these tools to Flink. It concludes that polyglot processing allows for any data type or workload to be handled and that frameworks like Spark, Storm and Flink each have strengths for distributed, real-time computation.
Business Analytics Competency Centre: A Strategic Differentiator - BSGAfrica
The document discusses establishing a business analytics competency center (BACC) to help organizations better utilize analytics. It notes that effective analytics requires more than just technology and emphasizes the importance of aligning business and IT perspectives. A BACC can serve as a central hub to develop analytics infrastructure, promote collaboration, and ensure analytics efforts are in line with business priorities. The goal of a BACC is to facilitate a strategic, enterprise-wide approach to analytics through joint ownership between business and IT.
The document discusses a Business Intelligence Competence Center (BICC) and its role in business analytics strategies. It notes that few organizations currently have a BICC. A BICC is a cross-functional team that supports and promotes effective BI use across an organization. It outlines the typical tasks of a BICC, including developing a corporate BI strategy and data management strategy, as well as providing education and support. The document also discusses BICC organization, skills, risks, and the impacts of big data and mobile BI. It provides an example of how USG People implements a BICC within its extended organization.
Institute H: The Road to Becoming a Center of Excellence
Thursday, October 8, 9:00 am - 12:00 p.m., Executive C D
Lisa D'Adamo-Weinstein, Director, Academic Support
Northeast Center of SUNY Empire State College
Elaine Richardson, Retired Director, Academic Success Center
Clemson University
Laura Sanders, Assistant Dean, Student Success, College of Engineering
Valparaiso University
The purpose of the Centers of Excellence Designation Program is to:
promote professional standards of excellence for learning centers;
encourage centers to develop, maintain and assess quality programs and services to enhance student learning;
honor the history of established and unique learning centers; and
celebrate the outstanding achievements of centers that meet and exceed these standards.
This post-conference institute will walk participants through the rationale for the creation of the designation program, review the criteria for evaluation, and discuss the steps for completing an application. We will also share insights gathered during the first two rounds of application reviews to assist participants in developing a clear plan for how they can best put together their own application.
This document provides an overview of Hadoop versions 1.x and 2.x. Hadoop 1.x included HDFS for storage and MapReduce for processing. It had limitations around scalability, availability, and resources. Hadoop 2.x introduced YARN to replace MapReduce and address its limitations. YARN provides a framework for multiple data processing models and improved cluster utilization. It allows multiple applications like streaming, interactive query, and graph processing to run on the same Hadoop cluster.
2013 Nov 20 Toronto Hadoop User Group (THUG) - Hadoop 2.2.0 - Adam Muise
The document discusses Hadoop 2.2.0 and new features in YARN and MapReduce. Key points include: YARN introduces a new application framework and resource management system that replaces the jobtracker, allowing multiple data processing engines besides MapReduce; MapReduce is now a library that runs on YARN; Tez is introduced as a new data processing framework to improve performance beyond MapReduce.
Bikas Saha: The Next Generation of Hadoop - Hadoop 2 and YARN - hdhappy001
The document discusses Apache YARN, which is the next-generation resource management platform for Apache Hadoop. YARN was designed to address limitations of the original Hadoop 1 architecture by supporting multiple data processing models (e.g. batch, interactive, streaming) and improving cluster utilization. YARN achieves this by separating resource management from application execution, allowing various data processing engines like MapReduce, HBase and Storm to run natively on Hadoop. This provides a flexible, efficient and shared platform for distributed applications.
Apache Hadoop YARN: Understanding the Data Operating System of Hadoop - Hortonworks
This deck covers concepts and motivations behind Apache Hadoop YARN, the key technology in Hadoop 2 to deliver a Data Operating System for the enterprise.
Combine SAS High-Performance Capabilities with Hadoop YARN - Hortonworks
The document discusses combining SAS capabilities with Hadoop YARN. It provides an introduction to YARN and how it allows SAS workloads to run on Hadoop clusters alongside other workloads. The document also discusses resource settings for SAS workloads on YARN and upcoming features for YARN like delegated containers and Kubernetes integration.
YARN (Yet Another Resource Negotiator) improves on MapReduce by separating cluster resource management from job scheduling and tracking. It introduces the ResourceManager for global resource management and per-application ApplicationMasters to manage individual applications. This provides improved scalability, availability, and allows various data processing frameworks beyond MapReduce to operate on shared Hadoop clusters. Key components of YARN include the ResourceManager, NodeManagers, ApplicationMasters and Containers as the basic unit of resource allocation. MRv2 uses a generalized architecture and APIs to provide benefits like rolling upgrades, multi-tenant clusters, and higher resource utilization.
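The division of labor described above can be illustrated with a toy model: an ApplicationMaster requests containers, and the ResourceManager grants them from NodeManager capacity. This is a minimal sketch of the idea, not the real YARN API; all class and method names here are illustrative.

```python
# Toy model of YARN resource negotiation. Containers are the basic unit of
# allocation; the ResourceManager grants them against NodeManager headroom.
from dataclasses import dataclass

@dataclass
class Container:
    node: str
    memory_mb: int
    vcores: int

class NodeManager:
    def __init__(self, name, memory_mb, vcores):
        self.name = name
        self.free_memory = memory_mb
        self.free_vcores = vcores

    def can_fit(self, memory_mb, vcores):
        return self.free_memory >= memory_mb and self.free_vcores >= vcores

class ResourceManager:
    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, memory_mb, vcores):
        # Grant a container on the first node with enough headroom,
        # or None if the request cannot currently be satisfied.
        for node in self.nodes:
            if node.can_fit(memory_mb, vcores):
                node.free_memory -= memory_mb
                node.free_vcores -= vcores
                return Container(node.name, memory_mb, vcores)
        return None

rm = ResourceManager([NodeManager("node1", 8192, 4), NodeManager("node2", 8192, 4)])
granted = [rm.allocate(4096, 2) for _ in range(4)]  # four containers fill the cluster
denied = rm.allocate(4096, 2)                       # a fifth request must wait
```

Unlike MRv1's fixed map and reduce slots, nothing in this model cares what runs inside a container, which is what lets arbitrary frameworks share the cluster.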
MapR is a distribution of Apache Hadoop that includes over a dozen projects like HBase, Hive, Pig, and Spark. It provides capabilities for big data and constantly upgrades projects within 90 days of release. MapR also contributes to open source. Key benefits include high availability without special configurations, superior performance reducing costs, and data protection through snapshots. It also supports real-time applications, security, multi-tenancy, and assistance from MapR data scientists and engineers.
The job throughput and Apache Hadoop cluster utilization benefits of YARN and MapReduce v2 are widely known. Who wouldn’t want job throughput increased by 2x? Most likely you’ve heard (repeatedly) about the key benefits that could be gained from migrating your Hadoop cluster from MapReduce v1 to YARN: namely around improved job throughput and cluster utilization, as well as around permitting different computational frameworks to run on Hadoop. What you probably haven’t heard about are the configuration tweaks needed to ensure your existing MR v1 jobs can run on your YARN cluster as well as YARN specific configuration settings. In this session we’ll start with a list of recommended YARN configurations, and then step through the most common use-cases we’ve seen in the field. Production migrations can quickly go awry without proper guidance. Learn from others’ misconfigurations to get your YARN cluster configured right the first time.
Big Data Everywhere Chicago: Getting Real with the MapR Platform (MapR) - BigDataEverywhere
Jim Scott, Director of Enterprise Strategy, MapR; Cofounder, CHUG
In this talk, we will take a look back at the short history of Hadoop, along with the trials and tribulations that have come along with this ground-breaking technology. We will explore why enterprises need to look deeper into their wants and needs, and further into the future, to prepare for where they are going.
Talk held at a combined meeting of the Web Performance Karlsruhe (http://www.meetup.com/Karlsruhe-Web-Performance-Group/events/153207062) & Big Data Karlsruhe/Stuttgart (http://www.meetup.com/Big-Data-User-Group-Karlsruhe-Stuttgart/events/162836152) user groups.
Agenda:
- Why Hadoop 2?
- HDFS 2
- YARN
- YARN Apps
- Write your own YARN App
- Tez, Hive & Stinger Initiative
Tez: Accelerating Data Pipelines (fifthel) - t3rmin4t0r
This document provides an overview of Tez, an Apache project offering a framework for executing data processing jobs on Hadoop clusters. Tez allows expressing data processing jobs as directed acyclic graphs (DAGs) of tasks and executes these tasks in an optimized manner. It addresses limitations of MapReduce by providing a more flexible execution engine that can improve performance and resource utilization.
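The DAG idea at the heart of Tez can be sketched in a few lines: tasks are vertices, data dependencies are edges, and each task runs once all of its inputs are ready. This is illustrative Python using the standard library's topological sorter, not the Tez Java API; the vertex names are made up.

```python
# Express a small pipeline as a DAG: each vertex lists its predecessors.
from graphlib import TopologicalSorter

dag = {
    "map_a": [],
    "map_b": [],
    "join": ["map_a", "map_b"],  # the join waits for both map stages
    "aggregate": ["join"],       # runs once the join output is ready
}

# A valid execution order: every task appears after all of its inputs.
order = list(TopologicalSorter(dag).static_order())
```

The contrast with MapReduce is that a chain of MR jobs would materialize each intermediate result to HDFS, whereas a single DAG lets the engine schedule all stages together and stream data between them.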
Developing YARN Applications - Integrating Natively to YARN - July 24, 2014 - Hortonworks
This document provides an overview of developing applications for YARN, the resource management framework in Hadoop 2.0. It describes YARN concepts like containers and the ApplicationMaster, the APIs used to develop YARN applications, and walks through building a simple distributed shell application. It also discusses the Application Timeline Server for application metrics and monitoring.
YARN (Yet Another Resource Negotiator) is a distributed operating system for large scale data processing. It improves on MapReduce by allowing multiple data processing engines and frameworks to share common distributed compute resources and data storage on large Hadoop clusters. YARN introduces a resource management layer separate from job scheduling and processing logic. This allows Hadoop to support diverse workloads including batch processing, interactive queries, real-time streams and more. YARN also enables multi-tenant clusters to share resources among multiple users and applications in a secure manner through queues and containers.
Introduction to Tez by Olivier Renault of Hortonworks - Meetup of 25/11/2014 - Modern Data Stack France
During this presentation, Olivier will introduce Apache Tez: what it does, why many see it as MapReduce v2, and how it helps Hive, Pig, Cascading, and others improve their performance.
Speaker: Olivier Renault is a Principal Solution Engineer at Hortonworks, the company behind the Hortonworks Data Platform. Olivier is an expert in deploying Hadoop at scale in a secure and performant manner.
YARN is a resource management framework for Hadoop that allows multiple data processing engines such as MapReduce, Spark, and Storm to run on the same cluster. It introduces a global ResourceManager and per-node NodeManagers to allocate and manage resources across applications. YARN supports multi-tenant clusters with queues that provide resource guarantees and isolation between users and workloads. A demo showed preemption and multi-tenant queues handling different workloads hitting the cluster.
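Queue-based multi-tenancy like the demo above is typically configured through YARN's CapacityScheduler. As a hedged sketch, a `capacity-scheduler.xml` along these lines carves the cluster into two queues with guaranteed shares; the queue names and percentages are illustrative, not a recommended layout.

```xml
<!-- capacity-scheduler.xml: two illustrative queues under root -->
<configuration>
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>etl,adhoc</value>
  </property>
  <property>
    <!-- etl is guaranteed 70% of cluster resources -->
    <name>yarn.scheduler.capacity.root.etl.capacity</name>
    <value>70</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.adhoc.capacity</name>
    <value>30</value>
  </property>
  <property>
    <!-- adhoc may borrow idle capacity up to 50%, reclaimable via preemption -->
    <name>yarn.scheduler.capacity.root.adhoc.maximum-capacity</name>
    <value>50</value>
  </property>
</configuration>
```

Guarantees plus elastic maximums are what make preemption demos like the one described possible: a queue can borrow idle resources, then give them back when the guaranteed owner needs them.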
This talk takes you on a rollercoaster ride through Hadoop 2 and explains the most significant changes and components.
The talk has been held on the JavaLand conference in Brühl, Germany on 25.03.2014.
Agenda:
- Welcome Office
- YARN Land
- HDFS 2 Land
- YARN App Land
- Enterprise Land
Pivotal: Hadoop for Powerful Processing of Unstructured Data for Valuable Ins... - EMC
Pivotal has set up and operationalized a 1000-node Hadoop cluster called the Analytics Workbench. It takes special setup and skills to manage such a large deployment. This session shares how we set it up and how you will manage it.
After this session you will be able to:
Objective 1: Understand what it takes to operationalize a 1000-node Hadoop cluster.
Objective 2: Understand how to set up and manage the day-to-day challenges of a large Hadoop deployment.
Objective 3: Have a view of the tools necessary to solve the challenges of managing a large Hadoop cluster.
YARN - Presented at Dallas Hadoop User Group - Rommel Garcia
This document provides an overview of YARN (Yet Another Resource Negotiator) in Hadoop 2.0. It discusses:
1) How YARN improves on Hadoop 1.X by allowing multiple applications to share cluster resources and enabling new types of applications beyond just MapReduce. YARN serves as the cluster resource manager.
2) Key YARN concepts like applications, containers, the resource manager, node manager, and application master. Containers are the basic unit of allocation that replace static map and reduce slots.
3) How MapReduce runs on YARN by using an application master and negotiating containers from the resource manager, rather than being tied to static slots. This improves efficiency.
YARN - Next Generation Compute Platform for Hadoop - Hortonworks
YARN was developed as part of Hadoop 2.0 to address limitations in the original Hadoop 1.0 architecture. YARN introduces a centralized resource management framework to allow multiple data processing engines like MapReduce, interactive queries, graph processing, and stream processing to efficiently share common Hadoop cluster resources. It also improves cluster utilization, scalability, and supports multiple paradigms beyond just batch processing. Major companies like Yahoo have realized significant performance and resource utilization gains with YARN in production environments.
Similar to Hadoop - Past, Present and Future - v2.0 (20)
The document discusses consistent hashing and how it allows for efficient data distribution and load balancing across nodes in a distributed system. It describes the consistent hashing algorithm, which maps data items to nodes on a ring. When a node is added or removed, only nearby items need to be remapped, allowing other items and nodes to remain undisturbed. The algorithm facilitates smooth handoffs of data items between nodes to maintain balanced storage.
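The ring behavior described above is easy to demonstrate: hash nodes and keys onto the same ring, assign each key to the first node clockwise from it, and observe that adding a node remaps only the keys falling on the new node's arc. A compact sketch, with made-up node names and no virtual nodes (real systems usually add those to smooth the balance):

```python
# Minimal consistent hash ring: each key is owned by the first node at or
# after the key's position on the ring, wrapping around at the end.
import bisect
import hashlib

def _hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes=()):
        self._ring = []  # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str):
        bisect.insort(self._ring, (_hash(node), node))

    def get_node(self, key: str) -> str:
        positions = [pos for pos, _ in self._ring]
        idx = bisect.bisect(positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
before = {key: ring.get_node(key) for key in map(str, range(100))}

ring.add_node("node-d")  # only keys on node-d's new arc change owner
after = {key: ring.get_node(key) for key in map(str, range(100))}
moved = [k for k in before if before[k] != after[k]]
```

Every key in `moved` now belongs to `node-d`; all other keys keep their original owner, which is exactly the smooth-handoff property the summary describes.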
The document provides an overview of IBM's BigInsights product. It discusses how BigInsights can help businesses gain insights from large, complex datasets through features like built-in text analytics, SQL support, spreadsheet-style analysis, and accelerators for domain-specific analytics like social media. The document also summarizes capabilities of BigInsights like Big SQL, Big Sheets, Big R, and its text analytics engine that allow businesses to explore, analyze, and model large datasets.
This document discusses WANdisco's Non-Stop Hadoop solution, which provides continuous availability of Hadoop across local and wide area networks using an active-active replication technique. It addresses key problems with multi-cluster Hadoop deployments like lack of 100% uptime and challenges sharing data globally. The solution utilizes WANdisco's patented distributed coordination engine to achieve consensus across data centers for metadata operations and absolute consistency. Use cases highlighted include eliminating single point of failures, enabling parallel data ingest across locations, optimizing resource utilization through cluster zoning, and achieving near-zero RTO disaster recovery.
The document provides an overview of IBM's BigInsights product. It discusses how BigInsights can help businesses gain insights from large, complex datasets through features like built-in text analytics, SQL support, spreadsheet-style analysis, and accelerators for domain-specific analytics like social media. The document also summarizes capabilities of BigInsights like Big SQL, Big Sheets, Big R, and its embedded text analytics engine.
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data - Kiwi Creative
Harness the power of AI-backed reports, benchmarking and data analysis to predict trends and detect anomalies in your marketing efforts.
Peter Caputa, CEO at Databox, reveals how you can discover the strategies and tools to increase your growth rate (and margins!).
From metrics to track to data habits to pick up, enhance your reporting for powerful insights to improve your B2B tech company's marketing.
- - -
This is the webinar recording from the June 2024 HubSpot User Group (HUG) for B2B Technology USA.
Watch the video recording at https://youtu.be/5vjwGfPN9lw
Sign up for future HUG events at https://events.hubspot.com/b2b-technology-usa/
State of Artificial Intelligence Report 2023 - kuntobimo2016
Artificial intelligence (AI) is a multidisciplinary field of science and engineering whose goal is to create intelligent machines.
We believe that AI will be a force multiplier on technological progress in our increasingly digital, data-driven world. This is because everything around us today, ranging from culture to consumer products, is a product of intelligence.
The State of AI Report is now in its sixth year. Consider this report as a compilation of the most interesting things we’ve seen with a goal of triggering an informed conversation about the state of AI and its implication for the future.
We consider the following key dimensions in our report:
Research: Technology breakthroughs and their capabilities.
Industry: Areas of commercial application for AI and its business impact.
Politics: Regulation of AI, its economic implications and the evolving geopolitics of AI.
Safety: Identifying and mitigating catastrophic risks that highly-capable future AI systems could pose to us.
Predictions: What we believe will happen in the next 12 months and a 2022 performance review to keep us honest.
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag... - sameer shah
"Join us for STATATHON, a dynamic 2-day event dedicated to exploring statistical knowledge and its real-world applications. From theory to practice, participants engage in intensive learning sessions, workshops, and challenges, fostering a deeper understanding of statistical methodologies and their significance in various fields."
Analysis insight about a Flyball dog competition team's performance - roli9797
Insights from my analysis of a Flyball dog competition team's performance last year. Find more: https://github.com/rolandnagy-ds/flyball_race_analysis/tree/main
The Building Blocks of QuestDB, a Time Series Database - javier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone through over the past two years to deal with late and unordered data, non-blocking writes, read replicas, and faster batch ingestion.
End-to-end Pipeline Agility - Berlin Buzzwords 2024 - Lars Albertsson
We describe how we achieve high change agility in data engineering by eliminating the fear of breaking downstream data pipelines through end-to-end pipeline testing, and by using schema metaprogramming to safely eliminate boilerplate involved in changes that affect whole pipelines.
A quick poll on agility in changing pipelines from end to end indicated a huge span in capabilities. For the question "How long time does it take for all downstream pipelines to be adapted to an upstream change," the median response was 6 months, but some respondents could do it in less than a day. When quantitative data engineering differences between the best and worst are measured, the span is often 100x-1000x, sometimes even more.
A long time ago, we suffered at Spotify from fear of changing pipelines due to not knowing what the impact might be downstream. We made plans for a technical solution to test pipelines end-to-end to mitigate that fear, but the effort failed for cultural reasons. We eventually solved this challenge, but in a different context. In this presentation we will describe how we test full pipelines effectively by manipulating workflow orchestration, which enables us to make changes in pipelines without fear of breaking downstream.
Making schema changes that affect many jobs also involves a lot of toil and boilerplate. Using schema-on-read mitigates some of it, but has drawbacks since it makes it more difficult to detect errors early. We will describe how we have rejected this tradeoff by applying schema metaprogramming, eliminating boilerplate but keeping the protection of static typing, thereby further improving agility to quickly modify data pipelines without fear.
Global Situational Awareness of A.I. and Where It's Headed - vikram sood
You can see the future first in San Francisco.
Over the past year, the talk of the town has shifted from $10 billion compute clusters to $100 billion clusters to trillion-dollar clusters. Every six months another zero is added to the boardroom plans. Behind the scenes, there’s a fierce scramble to secure every power contract still available for the rest of the decade, every voltage transformer that can possibly be procured. American big business is gearing up to pour trillions of dollars into a long-unseen mobilization of American industrial might. By the end of the decade, American electricity production will have grown tens of percent; from the shale fields of Pennsylvania to the solar farms of Nevada, hundreds of millions of GPUs will hum.
The AGI race has begun. We are building machines that can think and reason. By 2025/26, these machines will outpace college graduates. By the end of the decade, they will be smarter than you or I; we will have superintelligence, in the true sense of the word. Along the way, national security forces not seen in half a century will be unleashed, and before long, The Project will be on. If we’re lucky, we’ll be in an all-out race with the CCP; if we’re unlucky, an all-out war.
Everyone is now talking about AI, but few have the faintest glimmer of what is about to hit them. Nvidia analysts still think 2024 might be close to the peak. Mainstream pundits are stuck on the wilful blindness of “it’s just predicting the next word”. They see only hype and business-as-usual; at most they entertain another internet-scale technological change.
Before long, the world will wake up. But right now, there are perhaps a few hundred people, most of them in San Francisco and the AI labs, that have situational awareness. Through whatever peculiar forces of fate, I have found myself amongst them. A few years ago, these people were derided as crazy—but they trusted the trendlines, which allowed them to correctly predict the AI advances of the past few years. Whether these people are also right about the next few years remains to be seen. But these are very smart people—the smartest people I have ever met—and they are the ones building this technology. Perhaps they will be an odd footnote in history, or perhaps they will go down in history like Szilard and Oppenheimer and Teller. If they are seeing the future even close to correctly, we are in for a wild ride.
Let me tell you what we see.