Hadoop: Past, Present and Future - v2.2 - SQLSaturday #326 - Tampa BA Edition

HADOOP: PAST, PRESENT AND FUTURE
BIG DATA INTELLIGENCE PRACTICE
© 2014 Trace3, All rights reserved.

Roadmap (~1 hour)
1. What Makes Up Hadoop 1.x?
2. What’s New In Hadoop 2.x?
3. The Future Of Hadoop …

WHAT MAKES UP HADOOP 1.0?

What’s a “Node”?
Node aka Server
• Processes / Daemons / Services
• Operating System
• Compute
• Storage
• Memory

Hadoop 1.0: HDFS + MapReduce
[Diagram: Client, NameNode, JobTracker, and four DataNode / TaskTracker nodes.]

Hadoop 1.0: HDFS + MapReduce
[Diagram: Client with labels 1-1, 1-2, 1-3; NameNode; JobTracker; four DataNode / TaskTracker nodes with Map and Reduce tasks and labels 2-1, 3-2, 3-3, 4-1, 2-3, 4-2, 2-2, 3-1, 4-3.]
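
As a rough illustration of the Map and Reduce steps named in the diagram, the sketch below runs a word count in plain Python on a single machine. It does not use Hadoop. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for this example, and each string stands in for one block of input data.

```python
from collections import defaultdict


def map_phase(block):
    """Emit a (word, 1) pair for every word in one input block."""
    return [(word.lower(), 1) for word in block.split()]


def shuffle(pairs):
    """Group values by key, as happens between the Map and Reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single count."""
    return key, sum(values)


def word_count(blocks):
    """Run map, shuffle, and reduce over a list of input blocks."""
    pairs = []
    for block in blocks:  # in a cluster, each block would be mapped on a worker node
        pairs.extend(map_phase(block))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Hypothetical input: two blocks of text.
blocks = ["hadoop stores data", "hadoop processes data"]
counts = word_count(blocks)
print(counts)
```

In a real cluster the map calls would run in parallel on the nodes that hold each block, and the grouped keys would be sent over the network to the reducers.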

MapReduce v1 Limitations
• Scalability
– Maximum cluster size is 4,000 nodes and maximum concurrent tasks is 40,000
• Availability
– JobTracker failure kills all queued and running jobs
• Resources Partitioned into Map and Reduce
– Hard partitioning of Map and Reduce slots led to low resource utilization
• No Support for Alternate Paradigms / Services
– Only MapReduce batch jobs, nothing else
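
The utilization point can be made concrete with a little arithmetic. The sketch below is a simplified model, not Hadoop code: the slot counts, the pending task counts, and the assumption that one pooled container is the same size as one slot are all hypothetical values chosen for illustration.

```python
def running_with_slots(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Tasks that can run when map and reduce slots are separate, fixed pools."""
    return min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)


def running_with_pool(capacity, pending_maps, pending_reduces):
    """Tasks that can run when any free unit of capacity can take any task."""
    return min(capacity, pending_maps + pending_reduces)


# Hypothetical node: 4 map slots and 2 reduce slots, 6 units of capacity in total.
MAP_SLOTS, REDUCE_SLOTS = 4, 2
# Hypothetical workload: a job still in its map phase, with no reduces ready yet.
PENDING_MAPS, PENDING_REDUCES = 6, 0

slot_tasks = running_with_slots(MAP_SLOTS, REDUCE_SLOTS, PENDING_MAPS, PENDING_REDUCES)
pool_tasks = running_with_pool(MAP_SLOTS + REDUCE_SLOTS, PENDING_MAPS, PENDING_REDUCES)

total = MAP_SLOTS + REDUCE_SLOTS
print(f"fixed slots: {slot_tasks}/{total} busy")
print(f"shared pool: {pool_tasks}/{total} busy")
```

With fixed slots, the two reduce slots sit idle while map tasks wait. With a shared pool, the same capacity is fully used.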

Hadoop 1.0: Single Use System
HADOOP 1.0: Single Use System, Batch Apps
Stack, top to bottom:
• Pig, Hive
• MapReduce (cluster resource management and data processing)
• HDFS (redundant, reliable storage)

WHAT’S NEW IN HADOOP 2.0?

YARN
• YARN: Yet Another Resource Negotiator
• YARN replaces MapReduce for cluster resource management
• YARN will be the de facto distributed operating system for Big Data

YARN = BIG DATA

YARN: No Longer Just Batch Apps
Store DATA in one place
Interact with that data in MULTIPLE WAYS
with Predictable Performance and Quality of Service
Applications Run Natively IN Hadoop:
• BATCH (MapReduce)
• INTERACTIVE (Tez)
• ONLINE (HBase)
• STREAMING (DataTorrent)
• GRAPH (Giraph)
YARN (cluster resource management)
HDFS2 (redundant, reliable storage)

YARN: Applications
Running all on the same Hadoop cluster to give applications access to all the same source data!
• Online
• MapReduce v2
• Real-Time Stream Processing
• Master-Worker
• In-Memory
• Apache Storm

YARN: Quickly Maturing
[Timeline: 2010 to 2014 (Today)]
• Conceived at Yahoo!
• Alpha Releases – 2.0
• Beta Releases – 2.1
• GA Released – 2.2
• Version 2.3
• Version 2.4
• Version 2.5
200,000+ nodes, 800,000+ jobs daily
10 million+ hours of compute daily

YARN: What Has Changed?
YARN                                             MRv1
ResourceManager (RM) + ApplicationMaster (AM)    JobTracker (JT)
Scheduler                                        Scheduler
NodeManager (NM)                                 TaskTracker (TT)
Container                                        Map & Reduce Slot
[Diagram: on the YARN side, a ResourceManager with its Scheduler and NodeManagers hosting an ApplicationMaster and Containers; on the MRv1 side, a JobTracker with its Scheduler and TaskTrackers hosting Map and Reduce slots.]

The 6 Benefits Of YARN
• Scale
• New programming models and services
• Improved cluster utilization
• Agility
• Backwards compatible with MapReduce v1
• Mixed workloads on the same source of data

THE FUTURE OF HADOOP

SQL on Hadoop
• Speed
– Deliver interactive query performance.
• SQL
– Support an array of SQL semantics for analytic applications running against Hadoop.
• Scale
– SQL interface to Hadoop designed for queries that scale from terabytes to petabytes.

SQL on Hadoop
Engine                  Distribution
Hive on Apache Tez      Hortonworks HDP2
Hive on Apache Spark    Cloudera CDH5
Apache Drill            MapR M7
Cloudera Impala         Cloudera CDH5
Pivotal HAWQ            Pivotal Big Data Suite

Apache Spark
Stack, top to bottom:
• Apache Spark (Databricks)
• YARN (cluster resource management)
• HDFS2 (redundant, reliable storage)
Features:
• Programming Languages
– Java, Scala, Python, R*
• Interactive Shell
– Ability to write code and get output.
• Faster by ~100x
– Due to how it handles data in memory.

HOYA: HBase (NoSQL) on YARN
• Dynamic Scaling
– On-demand cluster size. Increase and decrease the size with load.
• Easier Deployment
– APIs to create, start, stop and delete HBase clusters.
• Availability
– Recover from Region Server loss with a new container.

Apache REEF
Retainable Evaluator Execution Framework
• Machine Learning
– Framework well suited for building machine learning jobs.
• Scalable / Fault Tolerant
– Makes it easy to implement scalable, fault-tolerant runtime environments for a range of computational models.
• Maintain State
– Users can build jobs that utilize data from where it’s needed and also maintain state after jobs are done.

Hadoop Roadmap
• Apache Hadoop 2.5 (Q3 2014)
– NodeManager Restart w/o disruption
• Apache Hadoop 2.6 (Q4 2014)
– Memory As Storage Tier
– Dynamic Resource Configuration
– Support For Docker Containers
