The Evolution and Future of Hadoop Storage
Todd Lipcon | Engineer at Cloudera
Twitter @tlipcon | todd@cloudera.com
© Cloudera, Inc. All rights reserved.
  
Introduction (the evolution and future of me)
[Chart: mailing list messages sent by Todd Lipcon, annotated over time]
- Early user of Hadoop
- Joined Cloudera as Software Engineer
- Work on HDFS, HBase, MR (HA, performance, stability, etc)
- Became a committer, PMC member, and ASF Member
- Founded the Kudu project within Cloudera
- Secretly developing with a small team for 3 years
- Kudu announced and contributed to the ASF as Apache Kudu (incubating)
Spoke at HCJ 2011!
  
Happy birthday!

Hadoop: the last 10 years
  
Evolution of the Hadoop Platform
The stack is continually evolving and growing!

[Timeline chart: projects in the stack by year]
2006: Core Hadoop (HDFS, MapReduce)
2007: adds Solr, Pig
2008: adds HBase, ZooKeeper
2009: adds Hive, Mahout
2010: adds Sqoop, Whirr, Avro
2011: adds Flume, Bigtop, Oozie, MRUnit, HCatalog, Hue, YARN
2012: adds Spark, Tez, Impala, Kafka, Drill
2013: adds Parquet, Sentry
2014-15: adds Ibis, Flink

Basics
- Very basic Hadoop
- Batch processes only
- Not stable, fast, or featureful

Production
- Expanding feature set
- Basic security, HA, stability
- Commercial distributions

Enterprise
- Security
- Performance
- Fast full-featured SQL
  
Evolution of Storage (Basics / 2006-2007)
• HDFS only
• Support basic batch workloads. No HA.
• Performance not important
  • MapReduce is too slow, anyway!
  • Batch only
• Early adopters (Facebook, Yahoo, etc.)
  
Evolution of Storage (Production / 2008-2011)
• HDFS evolves to add high availability and security
  • Focused on batch workloads
  • Inefficient file formats commonly used (text)
  • Query engines are slow! No need for better performance
• Apache HBase becomes an Apache Top-Level Project (TLP)
  • Introduces fast random access
  • Early adopters experiment with new use cases
  • Deployed at Facebook and other large companies
  
Evolution of Storage (Enterprise / 2012-2015)
• Reliable core brings new users
  • Enterprise features: access control, disaster recovery, encryption
• Introduction of fast query engines
  • 10-100x faster SQL-on-Hadoop (Impala, Spark, etc.)
  • Pushes HDFS performance improvements: caching, CPU efficiency, columnar file formats (Apache Parquet, ORCFile)
• HBase evolves to 1.0
  • Improved stability, scalability, security
  • Good random access - not fast for SQL analytics.
• Initial support for cloud storage
  • Rising adoption of AWS, Azure, Google Compute, etc.
  
So what’s the next generation?
2016 and beyond
  
2016-2020 (Next-gen): storage hardware
• Spinning disk -> solid state storage
  • NAND flash: up to 450k read and 250k write IOPS, about 2GB/sec read and 1.5GB/sec write throughput, at a price of less than $3/GB and dropping fast
  • 3D XPoint memory (1000x faster than NAND, cheaper than RAM)
• RAM is cheaper and more abundant:
  • 64->128->256GB over last few years
• HDFS and HBase were not designed for next-generation hardware.
  • Not using full speed of flash or RAM size
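To put the throughput figures above in perspective, here is a rough back-of-envelope sketch. The NAND rates are the ones quoted on the slide; the 1 TB dataset size and the spinning-disk rate of about 150 MB/sec are assumptions added only for comparison, and the helper name `seconds_to_transfer` is hypothetical.

```python
# Back-of-envelope transfer times using the throughput figures from the slide.
# The dataset size and the spinning-disk rate are assumptions for comparison.

NAND_READ_GB_PER_SEC = 2.0       # "about 2GB/sec read" (from the slide)
NAND_WRITE_GB_PER_SEC = 1.5      # "1.5GB/sec write" (from the slide)
ASSUMED_HDD_GB_PER_SEC = 0.15    # assumption: roughly 150 MB/sec sequential per spinning disk


def seconds_to_transfer(size_gb: float, gb_per_sec: float) -> float:
    """Seconds to stream size_gb at a sustained rate, ignoring seeks and CPU cost."""
    if gb_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return size_gb / gb_per_sec


if __name__ == "__main__":
    dataset_gb = 1000.0  # hypothetical 1 TB of data on one device
    rates = [
        ("NAND flash read", NAND_READ_GB_PER_SEC),
        ("NAND flash write", NAND_WRITE_GB_PER_SEC),
        ("spinning disk (assumed)", ASSUMED_HDD_GB_PER_SEC),
    ]
    for label, rate in rates:
        print(f"{label}: {seconds_to_transfer(dataset_gb, rate):.0f} s")
```

At the quoted read rate the hypothetical 1 TB streams in about 500 seconds per device, so software overhead in the storage layer, rather than the device, can easily become the limiting factor.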
  	
  
2016-2020 (Next-gen): gaps in capabilities
HDFS good at:
• Batch ingest only (e.g. hourly)
• Efficiently scanning large amounts of data (analytics)
HBase good at:
• Efficiently finding and writing individual rows
• Making data mutable
Gaps exist when these properties are needed simultaneously
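The two access patterns above can be pictured with a toy model. This is only an illustrative sketch: `BatchFile` and `RowStore` are hypothetical names, plain Python lists and dicts stand in for on-disk layouts, and nothing here reflects how HDFS or HBase are actually implemented.

```python
from typing import Dict, List, Tuple


class BatchFile:
    """Toy stand-in for an immutable file written by a batch job (HDFS-style)."""

    def __init__(self, rows: List[Tuple[str, int]]) -> None:
        # Written once; values are kept together so a scan is a single pass.
        self._keys = [key for key, _ in rows]
        self._values = [value for _, value in rows]

    def scan_total(self) -> int:
        """Analytics-style access: read every value in one sequential pass."""
        return sum(self._values)

    def rewritten_with(self, key: str, value: int) -> "BatchFile":
        """Changing one row means producing a whole new file."""
        rows = [(k, value if k == key else v) for k, v in zip(self._keys, self._values)]
        return BatchFile(rows)


class RowStore:
    """Toy stand-in for a keyed, mutable store (HBase-style)."""

    def __init__(self) -> None:
        self._rows: Dict[str, int] = {}

    def put(self, key: str, value: int) -> None:
        """Insert or overwrite a single row in place."""
        self._rows[key] = value

    def get(self, key: str) -> int:
        """Find a single row directly by key."""
        return self._rows[key]


if __name__ == "__main__":
    batch = BatchFile([("sensor-1", 10), ("sensor-2", 20)])
    print(batch.scan_total())            # scan over the whole file

    store = RowStore()
    store.put("sensor-1", 10)
    store.put("sensor-1", 11)            # in-place update of one row
    print(store.get("sensor-1"))

    # Needing both at once: one changed row forces a full rewrite of the file.
    updated = batch.rewritten_with("sensor-1", 11)
    print(updated.scan_total())
```

In this toy the cost difference between the two layouts is not modeled; the point is only that the scan-friendly structure has no cheap single-row update, while the keyed structure is organized around individual rows.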
  
	
  
2016-2020 (Next-gen): Apache Kudu (incubating)
• High throughput for big scans
  Goal: Within 2x of Parquet
• Low-latency for short accesses
  Goal: 1ms read/write on SSD
• Relational data model
  • SQL queries are easy
  • “NoSQL” style scan/insert/update (Java/C++ client)
• Expands Hadoop use cases
  • Real-time analytics and time series
  • Internet-of-things
  
Kudu: Open source, scalable and fast tabular storage
• Scalable
  • Designed to scale to 1000s of nodes, tens of PBs
• Fast
  • Designed for modern hardware
  • Millions of read/write operations per second across cluster
  • Multiple GB/second read throughput per node
• Tabular
  • Store tables like a normal database (support SQL, Spark, etc)
  • NoSQL-style access to 100+ billion row tables (Java/C++/Python APIs)
  
2016-2020 (Next gen): Predictions
• Kudu will evolve an enterprise feature set and enable simple high-performance real-time architectures
  • Increasing ability to migrate traditional applications
• HDFS and HBase will continue to innovate and adapt to next generation hardware
  • Steady improvements in performance, efficiency, and scalability (e.g. erasure coding)
• Cloud storage will become increasingly important
  • Hadoop ecosystem will evolve to coexist
  
Thank you very much
@tlipcon
@ApacheKudu
To learn more about Kudu, please attend my session at 13:45, Conference Room B (7F)
  	
  

The Evolution and Future of Hadoop Storage (Hadoop Conference Japan 2016 keynote slides)

 
Hortonworks Presentation at Big Data London
Hortonworks Presentation at Big Data LondonHortonworks Presentation at Big Data London
Hortonworks Presentation at Big Data LondonHortonworks
 
Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Application architectures with hadoop – big data techcon 2014
Application architectures with hadoop – big data techcon 2014Jonathan Seidman
 

The Evolution and Future of Hadoop Storage (Hadoop Conference Japan 2016 keynote slides)

  • 1. The Evolution and Future of Hadoop Storage
       Todd Lipcon | Engineer at Cloudera
       Twitter @tlipcon | todd@cloudera.com
       © Cloudera, Inc. All rights reserved.
  • 2-6. Introduction (the evolution and future of me)
       Mailing list messages sent by Todd Lipcon
       - Early user of Hadoop
       - Joined Cloudera as Software Engineer
       - Spoke at HCJ 2011!
       - Work on HDFS, HBase, MR (HA, performance, stability, etc.)
       - Became a committer, PMC member, and ASF Member
       - Founded the Kudu project within Cloudera
       - Secretly developing with a small team for 3 years
       - Kudu announced and contributed to the ASF as Apache Kudu (incubating)
  • 7. Happy birthday!
       Hadoop: the last 10 years
  • 9-12. Evolution of the Hadoop Platform
       The stack is continually evolving and growing! Components added by year (each year's stack includes everything from earlier years):
       - 2006: Core Hadoop (HDFS, MapReduce)
       - 2007: Solr, Pig
       - 2008: HBase, ZooKeeper
       - 2009: Hive, Mahout
       - 2010: Sqoop, Whirr, Avro
       - 2011: Flume, Bigtop, Oozie, MRUnit, HCatalog, Hue, YARN
       - 2012: Spark, Tez, Impala, Kafka, Drill
       - 2013: Parquet, Sentry
       - 2014-15: Ibis, Flink
       Phases:
       - Basics: very basic Hadoop; batch processes only; not stable, fast, or featureful
       - Production: expanding feature set; basic security, HA, stability; commercial distributions
       - Enterprise: security; performance; fast full-featured SQL
  • 13. Evolution of Storage (Basics / 2006-2007)
       • HDFS only
       • Support basic batch workloads. No HA.
       • Performance not important
           • MapReduce is too slow, anyway!
           • Batch only
       • Early adopters (Facebook, Yahoo, etc.)
  • 14. Evolution of Storage (Production / 2008-2011)
       • HDFS evolves to add high availability and security
           • Focused on batch workloads
           • Inefficient file formats commonly used (text)
           • Query engines are slow! No need for better performance
       • Apache HBase becomes an Apache Top-Level Project (TLP)
           • Introduces fast random access
           • Early adopters experiment with new use cases
           • Deployed at Facebook and other large companies
  • 15. Evolution of Storage (Enterprise / 2012-2015)
       • Reliable core brings new users
           • Enterprise features: access control, disaster recovery, encryption
       • Introduction of fast query engines
           • 10-100x faster SQL-on-Hadoop (Impala, Spark, etc.)
           • Pushes HDFS performance improvements: caching, CPU efficiency, columnar file formats (Apache Parquet, ORCFile)
       • HBase evolves to 1.0
           • Improved stability, scalability, security
           • Good random access - not fast for SQL analytics.
       • Initial support for cloud storage
           • Rising adoption of AWS, Azure, Google Compute, etc.
  • 16. So what's the next generation?
       2016 and beyond
  • 17. 2016-2020 (Next-gen): storage hardware
       • Spinning disk -> solid state storage
           • NAND flash: up to 450k read and 250k write IOPS, about 2 GB/sec read and 1.5 GB/sec write throughput, at a price of less than $3/GB and dropping fast
           • 3D XPoint memory (1000x faster than NAND, cheaper than RAM)
       • RAM is cheaper and more abundant:
           • 64 -> 128 -> 256 GB over last few years
       • HDFS and HBase were not designed for next-generation hardware.
           • Not using full speed of flash or RAM size
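To put the flash figures on this slide in perspective, a back-of-the-envelope calculation might look like the sketch below. The throughput and price values are the ones quoted on the slide; the 1 TB dataset size and the decimal convention (1 TB = 1000 GB) are assumptions chosen purely for illustration.

```python
# Back-of-the-envelope numbers for NAND flash, using the figures quoted on the slide.
# Assumption (not from the talk): a 1 TB dataset, with 1 TB = 1000 GB.

DATASET_GB = 1000          # hypothetical dataset size
READ_GB_PER_SEC = 2.0      # "about 2GB/sec read"
WRITE_GB_PER_SEC = 1.5     # "1.5GB/sec write"
PRICE_PER_GB_USD = 3.0     # "less than $3/GB", so this is an upper bound


def seconds_to_transfer(size_gb: float, gb_per_sec: float) -> float:
    """Time to stream size_gb at a sustained rate of gb_per_sec."""
    return size_gb / gb_per_sec


read_seconds = seconds_to_transfer(DATASET_GB, READ_GB_PER_SEC)
write_seconds = seconds_to_transfer(DATASET_GB, WRITE_GB_PER_SEC)
max_cost_usd = DATASET_GB * PRICE_PER_GB_USD

print(f"full read:  {read_seconds:.0f} s")
print(f"full write: {write_seconds:.0f} s")
print(f"cost bound: ${max_cost_usd:.0f}")
```

Under these assumptions a single device could stream the whole dataset in well under fifteen minutes, which is the kind of headroom the slide suggests storage software designed around spinning disks may leave unused.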
  • 18. 2016-2020 (Next-gen): gaps in capabilities
       HDFS good at:
       • Batch ingest only (e.g. hourly)
       • Efficiently scanning large amounts of data (analytics)
       HBase good at:
       • Efficiently finding and writing individual rows
       • Making data mutable
       Gaps exist when these properties are needed simultaneously
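The trade-off on this slide can be illustrated with a toy contrast between a column-oriented layout (cheap to scan) and a key-indexed row layout (cheap to look up and mutate). This is a conceptual sketch only: neither structure reflects how HDFS or HBase is actually implemented, and all names and sample data are hypothetical.

```python
# Conceptual sketch only; not how HDFS or HBase store data.
# All names and the sample data are hypothetical.

# Column-oriented layout: one list per column, cheap to scan a single column.
columnar = {
    "user_id": [1, 2, 3],
    "clicks": [10, 20, 30],
}

# Key-indexed row layout: cheap to find and change one row.
keyed_rows = {
    1: {"clicks": 10},
    2: {"clicks": 20},
    3: {"clicks": 30},
}


def total_clicks_columnar(table: dict) -> int:
    """Analytic scan: touches only the 'clicks' column."""
    return sum(table["clicks"])


def total_clicks_keyed(rows: dict) -> int:
    """Analytic scan on the keyed layout: visits every row object."""
    return sum(row["clicks"] for row in rows.values())


def update_clicks_keyed(rows: dict, user_id: int, clicks: int) -> None:
    """Point update: a single keyed lookup, no scan."""
    rows[user_id]["clicks"] = clicks


def update_clicks_columnar(table: dict, user_id: int, clicks: int) -> None:
    """Point update on the columnar layout: must first search for the row position."""
    position = table["user_id"].index(user_id)  # linear search
    table["clicks"][position] = clicks


# Apply the same update to both layouts, then run the same aggregate on both.
update_clicks_keyed(keyed_rows, 2, 25)
update_clicks_columnar(columnar, 2, 25)
print(total_clicks_columnar(columnar), total_clicks_keyed(keyed_rows))
```

Both layouts give the same answers; what differs is which operation needs a search or a full pass, which is the gap the slide describes when fast scans and fast row-level changes are needed at once.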
  • 19. 2016-2020 (Next-gen): Apache Kudu (incubating)
       • High throughput for big scans
           Goal: Within 2x of Parquet
       • Low-latency for short accesses
           Goal: 1ms read/write on SSD
       • Relational data model
       • SQL queries are easy
       • “NoSQL” style scan/insert/update (Java/C++ client)
       • Expands Hadoop use cases
       • Real-time analytics and time series
       • Internet-of-things
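The combination of a relational data model with "NoSQL" style scan/insert/update might be pictured with a small in-memory stand-in like the one below. It is a hypothetical illustration of the shape of those operations, not the Kudu client API; the class `ToyTable`, its methods, and the sample data are all invented for this sketch.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Row = Dict[str, object]


class ToyTable:
    """In-memory stand-in for a table with fixed columns and a primary key.

    Hypothetical illustration only; this is not the Kudu client API.
    """

    def __init__(self, columns: List[str], primary_key: str) -> None:
        if primary_key not in columns:
            raise ValueError("primary key must be one of the columns")
        self.columns = columns
        self.primary_key = primary_key
        self._rows: Dict[object, Row] = {}

    def insert(self, row: Row) -> None:
        """Add a new row; the primary key must not already exist."""
        if set(row) != set(self.columns):
            raise ValueError("row must supply exactly the table's columns")
        key = row[self.primary_key]
        if key in self._rows:
            raise KeyError(f"duplicate primary key: {key!r}")
        self._rows[key] = dict(row)

    def update(self, key: object, changes: Row) -> None:
        """Change non-key columns of an existing row, located by primary key."""
        if self.primary_key in changes:
            raise ValueError("primary key cannot be updated")
        unknown = set(changes) - set(self.columns)
        if unknown:
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        self._rows[key].update(changes)  # raises KeyError if the row is missing

    def scan(
        self,
        columns: List[str],
        predicate: Optional[Callable[[Row], bool]] = None,
    ) -> Iterator[Tuple[object, ...]]:
        """Yield the requested columns of matching rows, in primary-key order."""
        for key in sorted(self._rows):
            row = self._rows[key]
            if predicate is None or predicate(row):
                yield tuple(row[c] for c in columns)


# Usage: a small time-series-like table of per-host CPU readings.
metrics = ToyTable(["host", "cpu"], primary_key="host")
metrics.insert({"host": "a", "cpu": 10})
metrics.insert({"host": "b", "cpu": 95})
metrics.update("a", {"cpu": 99})
busy = list(metrics.scan(["host"], lambda r: r["cpu"] > 90))
print(busy)
```

The sketch keeps rows in a dictionary for simplicity; a real storage engine would have to make both the keyed operations and the column scans fast at scale, which is the design goal the slide states.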
  • 20. Kudu: Open source, scalable and fast tabular storage
       • Scalable
           • Designed to scale to 1000s of nodes, tens of PBs
       • Fast
           • Designed for modern hardware
           • Millions of read/write operations per second across cluster
           • Multiple GB/second read throughput per node
       • Tabular
           • Store tables like a normal database (support SQL, Spark, etc.)
           • NoSQL-style access to 100+ billion row tables (Java/C++/Python APIs)
  • 21. 2016-2020 (Next gen): Predictions
       • Kudu will evolve an enterprise feature set and enable simple high-performance real-time architectures
           • Increasing ability to migrate traditional applications
       • HDFS and HBase will continue to innovate and adapt to next generation hardware
           • Steady improvements in performance, efficiency, and scalability (e.g. erasure coding)
       • Cloud storage will become increasingly important
           • Hadoop ecosystem will evolve to coexist
  • 22. Thank you
       @tlipcon
       @ApacheKudu
       To learn more about Kudu, please attend my session at 13:45, Conference Room B (7F)