The Evolution and Future of Hadoop Storage (Hadoop Conference Japan 2016 keynote presentation slides)

The Evolution and Future of Hadoop Storage

Todd Lipcon | Engineer at Cloudera
Twitter: @tlipcon | todd@cloudera.com

© Cloudera, Inc. All rights reserved.

Introduction (the evolution and future of me)

[Chart: mailing list messages sent by Todd Lipcon]

- Early user of Hadoop
- Joined Cloudera as Software Engineer
- Work on HDFS, HBase, MR (HA, performance, stability, etc.)
- Became a committer, PMC member, and ASF Member
- Founded the Kudu project within Cloudera
- Secretly developing with a small team for 3 years
- Kudu announced and contributed to the ASF as Apache Kudu (incubating)

Spoke at HCJ 2011!

Happy birthday!

Hadoop: the last 10 years

Evolution of the Hadoop Platform

Components in the stack, by the year in which each first appears on the timeline:

- 2006: Core Hadoop (HDFS, MapReduce)
- 2007: Solr, Pig
- 2008: HBase, ZooKeeper
- 2009: Hive, Mahout
- 2010: Sqoop, Whirr, Avro
- 2011: Flume, Bigtop, Oozie, MRUnit, HCatalog, Hue, YARN
- 2012: Spark, Tez, Impala, Kafka, Drill
- 2013: Parquet, Sentry
- 2014-15: Ibis, Flink

The stack is continually evolving and growing!

Basics
- Very basic Hadoop
- Batch processes only
- Not stable, fast, or featureful

Production
- Expanding feature set
- Basic security, HA, stability
- Commercial distributions

Enterprise
- Security
- Performance
- Fast full-featured SQL

Evolution of Storage (Basics / 2006-2007)

• HDFS only
• Supports basic batch workloads. No HA.
• Performance not important
  • MapReduce is too slow, anyway!
  • Batch only
• Early adopters (Facebook, Yahoo, etc.)

Evolution of Storage (Production / 2008-2011)

• HDFS evolves to add high availability and security
  • Focused on batch workloads
  • Inefficient file formats commonly used (text)
  • Query engines are slow! No need for better performance
• Apache HBase becomes an Apache Top-Level Project (TLP)
  • Introduces fast random access
  • Early adopters experiment with new use cases
  • Deployed at Facebook and other large companies

Evolution of Storage (Enterprise / 2012-2015)

• Reliable core brings new users
  • Enterprise features: access control, disaster recovery, encryption
• Introduction of fast query engines
  • 10-100x faster SQL-on-Hadoop (Impala, Spark, etc.)
  • Pushes HDFS performance improvements: caching, CPU efficiency, columnar file formats (Apache Parquet, ORCFile)
• HBase evolves to 1.0
  • Improved stability, scalability, security
  • Good random access, but not fast for SQL analytics
• Initial support for cloud storage
  • Rising adoption of AWS, Azure, Google Compute, etc.

2016-2020 (Next-gen): storage hardware

• Spinning disk -> solid state storage
  • NAND flash: up to 450k read / 250k write IOPS, about 2 GB/sec read and 1.5 GB/sec write throughput, at a price of less than $3/GB and dropping fast
  • 3D XPoint memory (1000x faster than NAND, cheaper than RAM)
• RAM is cheaper and more abundant:
  • 64 -> 128 -> 256 GB over last few years
• HDFS and HBase were not designed for next-generation hardware.
  • Not using full speed of flash or RAM size
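
As a rough back-of-envelope illustration of the flash figures above, the Python sketch below computes how long a sequential pass over a dataset would take at the quoted throughput, and the average interval between operations implied by the quoted IOPS. The 1 TB dataset size and the function names are assumptions chosen for illustration; they do not come from the slides.

```python
# Back-of-envelope arithmetic using the NAND flash figures quoted on the slide.
# The 1 TB dataset size is a hypothetical example, not a number from the talk.

FLASH_READ_GB_PER_SEC = 2.0     # "about 2GB/sec read"
FLASH_WRITE_GB_PER_SEC = 1.5    # "1.5GB/sec write"
FLASH_READ_IOPS = 450_000       # "up to 450k read" IOPS
FLASH_WRITE_IOPS = 250_000      # "250k write" IOPS


def scan_seconds(dataset_gb: float, throughput_gb_per_sec: float) -> float:
    """Time to stream a dataset sequentially at a given throughput."""
    return dataset_gb / throughput_gb_per_sec


def microseconds_per_op(iops: int) -> float:
    """Average interval between completed operations at a given IOPS rate.

    This is a throughput-derived budget, not the latency of a single
    operation, because devices serve many operations in parallel.
    """
    return 1_000_000 / iops


if __name__ == "__main__":
    dataset_gb = 1000.0  # hypothetical 1 TB (decimal) dataset on one device
    print(f"read pass:    {scan_seconds(dataset_gb, FLASH_READ_GB_PER_SEC):.0f} s")
    print(f"write pass:   {scan_seconds(dataset_gb, FLASH_WRITE_GB_PER_SEC):.0f} s")
    print(f"per read op:  {microseconds_per_op(FLASH_READ_IOPS):.2f} us")
    print(f"per write op: {microseconds_per_op(FLASH_WRITE_IOPS):.2f} us")
```

At these rates, software has only a few microseconds of CPU budget per operation if it is to keep one device busy, which is one way to read the claim that systems not designed for flash leave its speed unused.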

2016-2020 (Next-gen): gaps in capabilities

HDFS good at:
• Batch ingest only (e.g. hourly)
• Efficiently scanning large amounts of data (analytics)

HBase good at:
• Efficiently finding and writing individual rows
• Making data mutable

Gaps exist when these properties are needed simultaneously
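
The gap can be illustrated with a toy, in-memory sketch in plain Python. It is not HDFS or HBase code: `ColumnarFile` is a hypothetical stand-in for an immutable, scan-friendly file, and `RowStore` is a hypothetical stand-in for a mutable, key-addressed store. All class and method names are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


class ColumnarFile:
    """Immutable, column-oriented batch: cheap to scan, no in-place updates."""

    def __init__(self, rows: List[Tuple[int, float]]) -> None:
        # Columns are stored separately so a scan touches only what it needs.
        self._keys = tuple(k for k, _ in rows)
        self._values = tuple(v for _, v in rows)

    def sum_values(self) -> float:
        # Analytic scan: one pass over a single contiguous column.
        return sum(self._values)

    def lookup(self, key: int) -> Optional[float]:
        # Point lookup needs a linear search in this toy layout.
        for k, v in zip(self._keys, self._values):
            if k == key:
                return v
        return None


class RowStore:
    """Mutable, key-addressed store: cheap point reads and writes."""

    def __init__(self) -> None:
        self._rows: Dict[int, Dict[str, float]] = {}

    def upsert(self, key: int, value: float) -> None:
        # In-place mutation of a single row, addressed by key.
        self._rows[key] = {"value": value}

    def lookup(self, key: int) -> Optional[float]:
        row = self._rows.get(key)
        return None if row is None else row["value"]

    def sum_values(self) -> float:
        # A scan must visit every row object and pull one field from each.
        return sum(row["value"] for row in self._rows.values())


if __name__ == "__main__":
    data = [(1, 10.0), (2, 20.0), (3, 30.0)]
    batch = ColumnarFile(data)
    store = RowStore()
    for key, value in data:
        store.upsert(key, value)

    store.upsert(2, 25.0)        # mutable: update one row in place
    print(batch.sum_values())    # scan over the immutable batch
    print(store.lookup(2))       # point read sees the update
    print(store.sum_values())    # scan works, but row by row
```

In this sketch, a workload that needs both fast scans and row-level updates has to pick one layout or maintain both copies, which is the situation the slide describes.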

2016-2020 (Next-gen): Apache Kudu (incubating)

• High throughput for big scans
  Goal: within 2x of Parquet
• Low latency for short accesses
  Goal: 1 ms read/write on SSD
• Relational data model
• SQL queries are easy
• "NoSQL" style scan/insert/update (Java/C++ client)
• Expands Hadoop use cases
• Real-time analytics and time series
• Internet-of-things

Kudu: Open source, scalable and fast tabular storage

• Scalable
  • Designed to scale to 1000s of nodes, tens of PBs
• Fast
  • Designed for modern hardware
  • Millions of read/write operations per second across cluster
  • Multiple GB/second read throughput per node
• Tabular
  • Store tables like a normal database (support SQL, Spark, etc.)
  • NoSQL-style access to 100+ billion row tables (Java/C++/Python APIs)

2016-2020 (Next-gen): Predictions

• Kudu will evolve an enterprise feature set and enable simple high-performance real-time architectures
  • Increasing ability to migrate traditional applications
• HDFS and HBase will continue to innovate and adapt to next-generation hardware
  • Steady improvements in performance, efficiency, and scalability (e.g. erasure coding)
• Cloud storage will become increasingly important
  • Hadoop ecosystem will evolve to coexist
