
The Evolution and Future of Hadoop Storage (Hadoop Conference Japan 2016 keynote slides)

Keynote slides from Hadoop / Spark Conference Japan 2016
The Evolution and Future of Hadoop Storage
Todd Lipcon, Cloudera


  1. The Evolution and Future of Hadoop Storage
     Todd Lipcon | Engineer at Cloudera
     Twitter @tlipcon | todd@cloudera.com
     © Cloudera, Inc. All rights reserved.
  2. Introduction (the evolution and future of me)
     Mailing list messages sent by Todd Lipcon
     Spoke at HCJ 2011!
  3. Introduction (the evolution and future of me)
     Mailing list messages sent by Todd Lipcon
     - Early user of Hadoop
     - Joined Cloudera as Software Engineer
     Spoke at HCJ 2011!
  4. Introduction (the evolution and future of me)
     Mailing list messages sent by Todd Lipcon
     - Early user of Hadoop
     - Joined Cloudera as Software Engineer
     - Work on HDFS, HBase, MR (HA, performance, stability, etc)
     - Became a committer, PMC member, and ASF Member
     Spoke at HCJ 2011!
  5. Introduction (the evolution and future of me)
     Mailing list messages sent by Todd Lipcon
     - Early user of Hadoop
     - Joined Cloudera as Software Engineer
     - Work on HDFS, HBase, MR (HA, performance, stability, etc)
     - Became a committer, PMC member, and ASF Member
     - Founded the Kudu project within Cloudera
     - Secretly developing with a small team for 3 years
     Spoke at HCJ 2011!
  6. Introduction (the evolution and future of me)
     Mailing list messages sent by Todd Lipcon
     - Early user of Hadoop
     - Joined Cloudera as Software Engineer
     - Work on HDFS, HBase, MR (HA, performance, stability, etc)
     - Became a committer, PMC member, and ASF Member
     - Founded the Kudu project within Cloudera
     - Secretly developing with a small team for 3 years
     - Kudu announced and contributed to the ASF as Apache Kudu (incubating)
     Spoke at HCJ 2011!
  7. Happy birthday!
     Hadoop: the last 10 years
  9. Evolution of the Hadoop Platform
     The stack is continually evolving and growing!
     Projects in the stack by year (each year's stack includes everything from earlier years):
     • 2006: Core Hadoop (HDFS, MapReduce)
     • 2007: + Solr, Pig
     • 2008: + HBase, ZooKeeper
     • 2009: + Hive, Mahout
     • 2010: + Sqoop, Whirr, Avro
     • 2011: + Flume, Bigtop, Oozie, MRUnit, HCatalog, Hue, YARN
     • 2012: + Spark, Tez, Impala, Kafka, Drill
     • 2013: + Parquet, Sentry
     • 2014-15: + Ibis, Flink
  10. Evolution of the Hadoop Platform (timeline as on slide 9, with one phase labeled)
      Basics
      - Very basic Hadoop
      - Batch processes only
      - Not stable, fast, or featureful
  11. Evolution of the Hadoop Platform (timeline as on slide 9, with two phases labeled)
      Basics
      - Very basic Hadoop
      - Batch processes only
      - Not stable, fast, or featureful
      Production
      - Expanding feature set
      - Basic security, HA, stability
      - Commercial distributions
  12. Evolution of the Hadoop Platform (timeline as on slide 9, with three phases labeled)
      Basics
      - Very basic Hadoop
      - Batch processes only
      - Not stable, fast, or featureful
      Production
      - Expanding feature set
      - Basic security, HA, stability
      - Commercial distributions
      Enterprise
      - Security
      - Performance
      - Fast full-featured SQL
  13. Evolution of Storage (Basics / 2006-2007)
      • HDFS only
      • Support basic batch workloads. No HA.
      • Performance not important
          • MapReduce is too slow, anyway!
          • Batch only
      • Early adopters (Facebook, Yahoo, etc)
  14. Evolution of Storage (Production / 2008-2011)
      • HDFS evolves to add high availability and security
          • Focused on batch workloads
          • Inefficient file formats commonly used (text)
          • Query engines are slow! No need for better performance
      • Apache HBase becomes an Apache Top-Level Project (TLP)
          • Introduces fast random access
          • Early adopters experiment with new use cases
          • Deployed at Facebook and other large companies
  15. Evolution of Storage (Enterprise / 2012-2015)
      • Reliable core brings new users
          • Enterprise features: access control, disaster recovery, encryption
      • Introduction of fast query engines
          • 10-100x faster SQL-on-Hadoop (Impala, Spark, etc.)
          • Pushes HDFS performance improvements: caching, CPU efficiency, columnar file formats (Apache Parquet, ORCFile)
      • HBase evolves to 1.0
          • Improved stability, scalability, security
          • Good random access - not fast for SQL analytics.
      • Initial support for cloud storage
          • Rising adoption of AWS, Azure, Google Compute, etc.
  16. So what's the next generation?
      2016 and beyond
  17. 2016-2020 (Next-gen): storage hardware
      • Spinning disk -> solid state storage
          • NAND flash: up to 450k read and 250k write IOPS, about 2GB/sec read and 1.5GB/sec write throughput, at a price of less than $3/GB and dropping fast
          • 3D XPoint memory (1000x faster than NAND, cheaper than RAM)
      • RAM is cheaper and more abundant:
          • 64->128->256GB over last few years
      • HDFS and HBase were not designed for next-generation hardware.
          • Not using full speed of flash or RAM size
  18. 2016-2020 (Next-gen): gaps in capabilities
      HDFS good at:
      • Batch ingest only (e.g. hourly)
      • Efficiently scanning large amounts of data (analytics)
      HBase good at:
      • Efficiently finding and writing individual rows
      • Making data mutable
      Gaps exist when these properties are needed simultaneously
  19. 2016-2020 (Next-gen): Apache Kudu (incubating)
      • High throughput for big scans
        Goal: Within 2x of Parquet
      • Low-latency for short accesses
        Goal: 1ms read/write on SSD
      • Relational data model
      • SQL queries are easy
      • "NoSQL" style scan/insert/update (Java/C++ client)
      • Expands Hadoop use cases
      • Real-time analytics and time series
      • Internet-of-things
  20. Kudu: Open source, scalable and fast tabular storage
      • Scalable
          • Designed to scale to 1000s of nodes, tens of PBs
      • Fast
          • Designed for modern hardware
          • Millions of read/write operations per second across cluster
          • Multiple GB/second read throughput per node
      • Tabular
          • Store tables like a normal database (support SQL, Spark, etc)
          • NoSQL-style access to 100+ billion row tables (Java/C++/Python APIs)
  21. 2016-2020 (Next gen): Predictions
      • Kudu will evolve an enterprise feature set and enable simple high-performance real-time architectures
          • Increasing ability to migrate traditional applications
      • HDFS and HBase will continue to innovate and adapt to next generation hardware
          • Steady improvements in performance, efficiency, and scalability (e.g. erasure coding)
      • Cloud storage will become increasingly important
          • Hadoop ecosystem will evolve to coexist
  22. Thank you
      @tlipcon
      @ApacheKudu
      To learn more about Kudu, please attend my session at 13:45, Conference Room B (7F)
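
Slide 18 says gaps appear when fast scans and mutable, individually addressable rows are needed at the same time, and slides 19 and 20 position Kudu as tabular storage that offers both. The following is a minimal Python sketch of that combination under stated assumptions. It is a toy in-memory structure, not Kudu's API or storage design, and the names `ToyTable`, `upsert`, `get`, and `scan_sum` are hypothetical, chosen only for illustration.

```python
class ToyTable:
    """Toy in-memory table: a primary-key index over column-wise lists.

    Hypothetical illustration only; this is not how Kudu is implemented.
    """

    def __init__(self, columns):
        self.columns = tuple(columns)
        self._data = {name: [] for name in self.columns}  # one list per column
        self._row_of_key = {}                             # primary key -> row position

    def upsert(self, key, **values):
        """Insert a new row, or update some columns of an existing row in place."""
        unknown = set(values) - set(self.columns)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        row = self._row_of_key.get(key)
        if row is None:
            # A new row must supply every column so all lists stay the same length.
            missing = set(self.columns) - set(values)
            if missing:
                raise ValueError(f"new row is missing columns: {sorted(missing)}")
            self._row_of_key[key] = len(self._row_of_key)
            for name in self.columns:
                self._data[name].append(values[name])
        else:
            for name, value in values.items():
                self._data[name][row] = value

    def get(self, key):
        """Point lookup by primary key; raises KeyError if the key is absent."""
        row = self._row_of_key[key]
        return {name: self._data[name][row] for name in self.columns}

    def scan_sum(self, column):
        """Analytics-style scan: reads one column's list and nothing else."""
        return sum(self._data[column])


table = ToyTable(columns=("temperature", "humidity"))
table.upsert("sensor-1", temperature=20.0, humidity=0.5)
table.upsert("sensor-2", temperature=22.0, humidity=0.25)
table.upsert("sensor-1", temperature=21.0)  # mutate one row without rewriting the others
latest = table.get("sensor-1")
total_temperature = table.scan_sum("temperature")
```

The point of the sketch is only that one structure serves both access patterns: `upsert` and `get` go through the key index, as in the HBase column of slide 18, while `scan_sum` touches a single column, as in the scan-oriented HDFS column. A real system also has to handle persistence, distribution, and concurrency, none of which is modeled here.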
