Apache HBase: Where We've Been and What's Upcoming

Jon Hsieh, Software Engineer @ Cloudera and HBase Committer

Apache HBase is a distributed non-relational database that provides low-latency random read/write access to massive quantities of data. This talk is broken up into two parts. First I'll talk about how, in the past few years, HBase has been deployed in production at companies like Facebook, Pinterest, Groupon, and eBay, and about the vibrant community of contributors from around the world, including folks at Cloudera, Salesforce.com, Intel, HortonWorks, Yahoo!, and XiaoMi. Second I'll talk about the features in the newest release, 0.96.x, and in the upcoming 0.98.x release.

  1. Apache HBase – Where we've been and what's upcoming. Jonathan Hsieh | @jmhsieh. Tech lead / Software Engineer at Cloudera | HBase PMC Member. Hadoop Users Group UK, April 10, 2014
  2. Who Am I? • Cloudera: • Tech Lead, HBase Team • Software Engineer • Apache HBase committer / PMC • Apache Flume founder / PMC • U of Washington: • Research in Distributed Systems
  3. What is Apache HBase? Apache HBase is a reliable, column-oriented data store that provides consistent, low-latency, random read/write access. [Diagram: App, MR, ZK, HDFS]
  4. HBase provides Low-latency Random Access • Writes: 1-3 ms, 1k-20k writes/sec per node • Reads: 0-3 ms cached, 10-30 ms from disk; 10k-40k reads/sec per node from cache • Cell size: 0 B-3 MB • Read, write, and insert data anywhere in the table
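To make the random read/write model above concrete, here is a minimal sketch using the 0.96-era Java client API; the "users" table and its "d" column family are hypothetical and assumed to already exist.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class RandomAccessExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical "users" table with a "d" column family.
        HTable table = new HTable(conf, "users");

        // Random write: a single Put keyed by row.
        Put put = new Put(Bytes.toBytes("jon"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("city"), Bytes.toBytes("London"));
        table.put(put);

        // Random read: a single Get keyed by row.
        Result result = table.get(new Get(Bytes.toBytes("jon")));
        byte[] city = result.getValue(Bytes.toBytes("d"), Bytes.toBytes("city"));
        System.out.println(Bytes.toString(city));

        table.close();
    }
}
```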
  5. Core Properties • ACID guarantees on a row • Writes are durable • Strong consistency first, then availability • After failure, recover and return the current value instead of returning a stale value • CAS and atomic increments can be efficient • Sorted by primary key • Short scans are efficient • Partitioned by primary key • Log-structured merge tree • Writes are extremely efficient • Reads are efficient • Periodic layout optimizations for read optimization ("compactions") required
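The row-level CAS and atomic increment operations mentioned above look roughly like this with the same 0.96-era client API; the "counters" table and its "d" family are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class RowAtomicityExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "counters");  // hypothetical table with family "d"

        byte[] row = Bytes.toBytes("jon");
        byte[] fam = Bytes.toBytes("d");

        // Atomic increment of a single cell; returns the new value.
        long views = table.incrementColumnValue(row, fam, Bytes.toBytes("views"), 1L);

        // Check-and-put: apply the Put only if the current value matches the expected one.
        Put put = new Put(row);
        put.add(fam, Bytes.toBytes("state"), Bytes.toBytes("active"));
        boolean applied = table.checkAndPut(row, fam, Bytes.toBytes("state"),
                Bytes.toBytes("new"), put);

        System.out.println("views=" + views + ", CAS applied=" + applied);
        table.close();
    }
}
```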
  6. An HBase History – Where We've Been
  7. Apache HBase Timeline • Nov '06: Google BigTable OSDI '06 • Apr '07: First Apache HBase commit, as a Hadoop contrib project • Jan '08: Promoted to Hadoop subproject • Summer '09: StumbleUpon goes production on HBase ~0.20 • Apr '10: Apache HBase becomes a top-level project • Apr '11: CDH3 GA with HBase 0.90.1 • Summer '11: Messages on HBase; Web Crawl Cache • Nov '11: Cassini on HBase • Jan '12: 0.92.0 • May '12: 0.94.0; HBaseCon 2012 • Jan '13: Phoenix on HBase • Jun '13: HBaseCon 2013 • Oct '13: 0.96.0 • Feb '14: 0.98.0
  8. Developer Community • Active community! • Diverse committers from many organizations
  9. Apache HBase "Nascar" Slide
  10. Apache HBase Core Development • Vendors • Self Service
  11. Apache HBase Sample Users • Inbox • Storage • Web • Search • Analytics • Monitoring
  12. Apache HBase Ecosystem Projects
  13. What's here and new today? Today: Apache HBase 0.96.2 / 0.98.1
  14. Critical Features • Disaster Recovery: • Cluster replication • Table snapshots • Copy table • Import / export tables • Metadata corruption repair tool (hbck) • Administrative and Continuity: • Kerberos-based authentication • ACL-based authorization • Config change via rolling restart • Within-version rolling upgrade • Protobuf-based wire protocol for RPC future-proofing
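As an illustration of the snapshot tooling listed above, here is a sketch of taking and cloning a table snapshot with the 0.96-era admin API; the table and snapshot names are made up.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class SnapshotExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Take a point-in-time snapshot of an existing table.
        admin.snapshot("users_snap_20140410", "users");

        // Materialize the snapshot as a new table (e.g. for a restore or a copy).
        admin.cloneSnapshot("users_snap_20140410", "users_restored");

        admin.close();
    }
}
```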
  15. Hardened for 0.96 • Table Administration: • Online schema change • Online region merging • Continuous fault-injection testing with "Chaos Monkey" • Performance Tuning: • Alternate key encodings for efficient memory usage • Exploring compactor policy minimizes compaction storms • Smart and adaptive stochastic region load balancer • Fast split policy for new tables
  16. MR over Table Snapshots (0.98, CDH5.0) • Previously MapReduce jobs over HBase required an online full table scan • Idea: take a snapshot and run the MR job over the snapshot files • Doesn't use the HBase client • Avoids affecting HBase caches • 3-5x perf boost [Diagram: map/reduce tasks reading from a snapshot instead of the live table]
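A sketch of how a job might be wired to read from a snapshot instead of the live table via TableMapReduceUtil.initTableSnapshotMapperJob; the snapshot name, restore directory, and mapper are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
import org.apache.hadoop.hbase.mapreduce.TableMapper;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Job;

public class SnapshotMRExample {
    static class CountMapper extends TableMapper<ImmutableBytesWritable, LongWritable> {
        @Override
        protected void map(ImmutableBytesWritable row, Result value, Context ctx) {
            // Process snapshot rows here without touching the live cluster.
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "scan-over-snapshot");
        job.setJarByClass(SnapshotMRExample.class);

        // Reads the snapshot's files directly from HDFS, bypassing region servers.
        TableMapReduceUtil.initTableSnapshotMapperJob(
                "users_snap_20140410",               // snapshot to read (hypothetical)
                new Scan(),                          // scan over the snapshot
                CountMapper.class,
                ImmutableBytesWritable.class,
                LongWritable.class,
                job,
                true,                                // add HBase jars to the job classpath
                new Path("/tmp/snapshot-restore"));  // scratch dir for the restored snapshot

        job.setNumReduceTasks(0);
        job.waitForCompletion(true);
    }
}
```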
  17. Mean Time to Recovery (MTTR) • Machine failures happen in distributed systems • Average unavailability when automatically recovering from a failure • Recovery time for an unclean data center power cycle [Timeline: detect, repair, notify, recovered; region unavailable, then region available but client unaware, then region available and client aware]
  18. Fast notification and detection (0.96) • Proactive notification of HMaster failure (0.96) • Proactive notification of RS failure (0.96) • Notify client on recovery (0.96) • Fast server failover (hardware) [Timeline: detect, split, assign, replay, recovered; region unavailable, then region available for RW]
  19. Distributed log replay (experimental, 0.96) • Previously had two IO-intensive passes: • Log splitting to intermediate files • Assign and log replay • Now just one IO-heavy pass: assign first, then split+replay • Improves read and write recovery times • Off by default currently* [Timeline: detect, assign, split + replay, recovered; region unavailable, then region available for replay writes, then region available for RW] *Caveat: if you override timestamps you could have READ REPEATED isolation violations (use tags to fix this)
  20. Cell Tags (0.98, experimental) • Mechanism for attaching arbitrary metadata to cells • Motivation: finer-grained isolation • Use for Accumulo-style cell-level visibility • Main feature for 0.98 • Other uses: • Add sequence numbers to enable correct fast read/write recovery • Potential for schema tags
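A minimal sketch of the cell-visibility feature built on tags in 0.98, assuming the VisibilityController coprocessor is enabled and a "secret" label has already been defined; the "docs" table is hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.security.visibility.Authorizations;
import org.apache.hadoop.hbase.security.visibility.CellVisibility;
import org.apache.hadoop.hbase.util.Bytes;

public class CellVisibilityExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "docs");  // hypothetical table with family "d"

        // Write a cell visible only to users holding the "secret" label.
        Put put = new Put(Bytes.toBytes("doc1"));
        put.add(Bytes.toBytes("d"), Bytes.toBytes("body"), Bytes.toBytes("classified"));
        put.setCellVisibility(new CellVisibility("secret"));
        table.put(put);

        // Read it back, presenting the authorizations this reader holds.
        Get get = new Get(Bytes.toBytes("doc1"));
        get.setAuthorizations(new Authorizations("secret"));
        Result result = table.get(get);
        System.out.println(result.isEmpty() ? "not visible" : "visible");

        table.close();
    }
}
```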
  21. HTrace (0.96, experimental) • Problem: where is time being spent inside HBase? • Solution: the HTrace framework • Inspired by Google Dapper • Threaded through HBase and HDFS • Tracks time spent in calls in a distributed system by tracking spans* on different machines • *Some assembly still required
  22. HTrace: Distributed Tracing in HBase and HDFS • Framework inspired by Google Dapper • Tracks time spent in calls in RPCs across different machines • Threaded through HBase (0.96) and future HDFS [Diagram: a span following RPC calls from the HBase client through hbase:meta, a region server, ZK, and HDFS DataNode/NameNode]
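For flavor, a sketch of what wrapping client work in an HTrace span looks like. The HTrace package has moved between releases (org.cloudera.htrace, later org.htrace / org.apache.htrace), so treat the imports as an assumption about the 0.96-era layout rather than a definitive API.

```java
// Assumed 0.96-era package layout; later releases use org.htrace / org.apache.htrace.
import org.cloudera.htrace.Sampler;
import org.cloudera.htrace.Trace;
import org.cloudera.htrace.TraceScope;

public class TracingExample {
    public static void main(String[] args) throws Exception {
        // Start a span; HBase client calls made inside it can be tied to this trace.
        TraceScope scope = Trace.startSpan("my-client-operation", Sampler.ALWAYS);
        try {
            // ... issue Gets/Puts here; the RPCs carry the span downstream ...
        } finally {
            scope.close();  // close the span so it can be collected and visualized
        }
    }
}
```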
  23. Zipkin – Visualizing Spans • UI + visualization system • Written by Twitter • Zipkin HBase storage • Zipkin HTrace integration • View where time from a specific call is spent in HBase, HDFS, and ZK
  24. A Future HBase – What's Upcoming
  25. Outline • Improved mean time to recovery (MTTR) • Improved predictability • Improved usability • Improved multitenancy
  26. Faster read recovery – Improving MTTR Further
  27. Distributed log replay (experimental, 0.96) [recap of slide 19] • Previously had two IO-intensive passes: • Log splitting to intermediate files • Assign and log replay • Now just one IO-heavy pass: assign first, then split+replay • Improves read and write recovery times • Off by default currently* *Caveat: if you override timestamps you could have READ REPEATED isolation violations (use tags to fix this)
  28. Distributed log replay with fast write recovery • Writes in HBase do not incur reads • With distributed log replay, we already have regions open for write • Allow fresh writes while replaying old logs* [Timeline: detect, assign, split + replay, recovered; region unavailable, then region available for all writes, then region available for RW] *Caveat: if you override timestamps you could have READ REPEATED isolation violations (use tags to fix this)
  29. Fast Read Recovery (proposed) • Idea: pristine-region fast read recovery • If a region has not been edited, it is consistent and can recover RW immediately • Idea: shadow regions for fast read recovery • A shadow region tails the WAL of the primary region • The shadow memstore is one HDFS block behind; catch up, then recover RW • Currently some progress in trunk [Timeline: detect, assign, recovered; region unavailable, then "can guarantee no new edits?" region available for all RW, then "can guarantee we have all edits?" region available for all RW]
  30. Improving the 99th percentile – Improving Predictability
  31. Common causes of performance variability • Locality loss (mitigation: favored nodes, HDFS block affinity) • Compaction (mitigation: exploring compactor) • GC (mitigation: off-heap cache) • Hardware hiccups (mitigation: multi-WAL, HDFS speculative reads)
  32. Performance degraded after recovery • After recovery, reads suffer a performance hit • Regions have lost locality • To maintain performance after failover, we need to regain locality • Compact the region to regain locality • We can do better by using HDFS features [Timeline: recovery, then service recovered with degraded performance, then performance recovered because compaction restores locality]
  33. Read Throughput: Favored Nodes (experimental, 0.96) • Control and track where block replicas are • All files for a region are created such that blocks go to the same set of favored nodes • When failing over, assign the region to one of those favored nodes • Currently a preview feature in 0.96 • Disabled by default because it doesn't work well with the latest balancer or with splits • Will likely use upcoming HDFS block affinity for better operability • Originally on Facebook's 0.89, ported to 0.96 [Timeline: recovery, then service recovered; performance sustained because the region is assigned to a favored node]
  34. Read latency: HDFS hedged reads (CDH5.0) • HBase's region servers use the HDFS client to read 1 of 3 HDFS block replicas • If you chose the slow node, your reads are slow • If a read is taking too long, speculatively go to another replica that may be faster [Diagram: RS reads a slow replica, then issues the same read against another of the 3 HDFS replicas]
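Hedged reads are switched on through two HDFS client properties; below is a sketch with illustrative values, which would normally be set in hbase-site.xml rather than in code.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HedgedReadConfig {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();

        // A non-zero thread pool size enables hedged reads in the HDFS client
        // that the region server uses.
        conf.setInt("dfs.client.hedged.read.threadpool.size", 20);

        // If the first replica hasn't answered within this many milliseconds,
        // speculatively issue the same read against another replica.
        conf.setLong("dfs.client.hedged.read.threshold.millis", 10);

        System.out.println("hedged read pool = "
                + conf.getInt("dfs.client.hedged.read.threadpool.size", 0));
    }
}
```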
  35. Read latency: Read Replicas (in progress) • The HBase client reads from primary region servers • If you chose the slow node, your reads are slow • Idea: read replicas assigned to other region servers; replicas periodically catch up (via snapshots or shadow region memstores) • The client specifies if a stale read is OK; if a read is taking too long, speculatively go to another replica that may be faster [Diagram: HBase client reads the slow primary, then reads a stale region replica]
  36. Write latency: Multiple WALs (in progress) • HBase's HDFS client writes 3 replicas • Min write latency is bounded by the slowest of the 3 replicas • Idea: if a write is taking too long, duplicate it on another set of replicas that may be faster [Diagram: RS writes to a slow set of HDFS replicas, then writes to another replica set]
  37. Improving Usability – Autotuning, Tracing, and SQL
  38. Making HBase easier to use and tune • Difficult to see what is happening inside HBase • Easy to make poor design decisions early without realizing it • New developments: • Memory auto-tuning • HTrace + Zipkin • Frameworks for schema design
  39. Memory Use Auto-tuning (trunk) • Memory is divided between: • the memstore (used for serving recent writes) • the block cache (used for read hot spots) • Need to choose the balance for the workload [Diagram: memstore vs. block cache split for read-heavy, balanced, and write-heavy workloads]
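These are the two fractions auto-tuning adjusts at runtime; a sketch of setting them by hand for a read-heavy workload, with illustrative values. Note the memstore property was named hbase.regionserver.global.memstore.upperLimit in the 0.96 era and was renamed in later releases.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class MemoryBalanceConfig {
    public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();

        // Fraction of region server heap given to the block cache (read path).
        conf.setFloat("hfile.block.cache.size", 0.5f);

        // Fraction of heap the memstores may use before flushes are forced
        // (write path); 0.96-era property name, renamed in later releases.
        conf.setFloat("hbase.regionserver.global.memstore.upperLimit", 0.3f);

        System.out.println("block cache fraction = "
                + conf.getFloat("hfile.block.cache.size", 0.25f));
    }
}
```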
  40. HBase Schemas • HBase application developers must iterate to find a suitable HBase schema • Schema is critical for performance at scale • How can we make this easier? • How can we reduce the expertise required to do this? • Today: • Lots of tuning knobs • Developers need to understand column families, rowkey design, data encoding, … • Some are expensive to change after the fact
  41. How should I arrange my data? • Isomorphic data representations! • Tall skinny table with a compound rowkey: rows "bob-col1" … "jon-col4", each holding a single value in family d: • Short fat table using column qualifiers: rows "bob" and "jon" with columns d:col1 … d:col4 • Short fat table using column families: rows "bob" and "jon" with families col1: … col4:
  42. How should I arrange my data? • Isomorphic data representations! (same three layouts as the previous slide: tall skinny with a compound rowkey, short fat using column qualifiers, short fat using column families) • With great power comes great responsibility! • How can we make this easier for users?
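A small sketch of what two of these isomorphic layouts look like at the client-API level: the same logical (user, column, value) fact written as a tall-skinny row with a compound rowkey versus a short-fat row with one qualifier per column. Table and family names are hypothetical.

```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SchemaLayoutExample {
    // Tall-skinny layout: the column name is folded into a compound rowkey,
    // and every cell lands in the same qualifier of family "d".
    static Put tallSkinny(String user, String col, String value) {
        Put put = new Put(Bytes.toBytes(user + "-" + col));
        put.add(Bytes.toBytes("d"), Bytes.toBytes(""), Bytes.toBytes(value));
        return put;
    }

    // Short-fat layout: one row per user, one qualifier per logical column.
    static Put shortFat(String user, String col, String value) {
        Put put = new Put(Bytes.toBytes(user));
        put.add(Bytes.toBytes("d"), Bytes.toBytes(col), Bytes.toBytes(value));
        return put;
    }

    public static void main(String[] args) {
        // The same logical fact ("bob", "col1", "aaaa") in two isomorphic layouts.
        Put tall = tallSkinny("bob", "col1", "aaaa");
        Put fat = shortFat("bob", "col1", "aaaa");
        System.out.println(tall + "\n" + fat);
    }
}
```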
  43. Impala • Scalable low-latency SQL querying for HDFS (and HBase!) • ODBC/JDBC driver interface • Highlights: • Uses the Hive metastore and its hive-hbase connector configuration conventions • Native code implementation; uses JIT for query execution optimization • Authorization via Kerberos support • Open sourced by Cloudera • https://github.com/cloudera/impala
  44. Phoenix • A SQL skin over HBase targeting low-latency queries • JDBC SQL interface • Highlights: • Adds types • Handles compound row key encoding • Secondary indices in development • Provides some pushdown aggregations (coprocessor) • Open sourced by Salesforce.com • Work from James Taylor, Jesse Yates, et al. • https://github.com/forcedotcom/phoenix
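A sketch of what the Phoenix JDBC path looks like from Java; the ZooKeeper host and table are hypothetical, and Phoenix leaves auto-commit off by default, hence the explicit commit.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PhoenixExample {
    public static void main(String[] args) throws Exception {
        // The Phoenix JDBC URL points at the cluster's ZooKeeper quorum.
        Connection conn = DriverManager.getConnection("jdbc:phoenix:zk-host");
        Statement stmt = conn.createStatement();

        // Phoenix maps typed SQL columns onto HBase rowkeys and qualifiers.
        stmt.execute("CREATE TABLE IF NOT EXISTS users ("
                + "id VARCHAR PRIMARY KEY, city VARCHAR)");
        stmt.execute("UPSERT INTO users VALUES ('jon', 'London')");
        conn.commit();

        // Aggregations like COUNT can be pushed down via coprocessors.
        ResultSet rs = stmt.executeQuery("SELECT COUNT(*) FROM users");
        while (rs.next()) {
            System.out.println("rows = " + rs.getLong(1));
        }
        conn.close();
    }
}
```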
  45. Kite (née Cloudera Development Kit / CDK) • APIs that provide a Dataset abstraction • Provides a get/put/delete API on Avro objects • HBase support in progress • Highlights: • Supports multiple components of the Hadoop distros (Flume, Morphlines, Hive, Crunch, HCat) • Provides types using Avro and Parquet formats for encoding entities • Manages schema evolution • Open sourced by Cloudera • https://github.com/kite-sdk/kite
  46. Many apps and users in a single cluster – Multi-tenancy
  47. Growing HBase • Pre-0.96.0: scaling up HBase for single HBase applications • Essentially a single user for a single app • Ex: Facebook Messages; one application, many HBase clusters • Shard users to different pods • Focused on continuity and disaster recovery features: • Cross-cluster replication • Table snapshots • Rolling upgrades [Chart: # of isolated applications vs. # of clusters; scalability toward "one giant application, multiple clusters"]
  48. Growing HBase • In 0.96 we introduce primitives for supporting multitenancy • Many users, many applications, one HBase cluster • Need to have some control over the interactions different users cause • Ex: manage MR analytics and low-latency serving in one cluster [Chart: multitenancy toward "many applications in one shared cluster"]
  49. Namespaces (0.96) • Namespaces provide an abstraction for multiple tenants to create and manage their own tables within a large HBase instance [Diagram: namespaces blue, green, and orange]
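A sketch of creating a tenant namespace and a table inside it with the 0.96 admin API; the "blue" namespace and "events" table are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.NamespaceDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class NamespaceExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Create a per-tenant namespace...
        admin.createNamespace(NamespaceDescriptor.create("blue").build());

        // ...and a table inside it, addressed as "blue:events".
        HTableDescriptor desc =
                new HTableDescriptor(TableName.valueOf("blue", "events"));
        desc.addFamily(new HColumnDescriptor("d"));
        admin.createTable(desc);

        admin.close();
    }
}
```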
  50. Multitenancy goals • Security (0.96): separate admin ACLs for different sets of tables • Quotas (in progress): max tables, max regions • Performance isolation (in progress): limit the performance impact load on one table has on others • Priority (future): prioritize some workloads/tables/users before others
  51. Isolation with Region Server Groups (in progress) • Region assignment distribution (no region server groups) [Diagram: regions from namespaces blue, green, and orange mixed across all region servers]
  52. Isolation with Region Server Groups (in progress) • Region assignment distribution with Region Server Groups (RSG) [Diagram: RSG blue hosts namespace blue; RSG green+orange hosts namespaces green and orange]
  53. Conclusions
  54. Summary by Version • 0.90 (CDH3): new features: stability; MTTR: recovery in hours; perf: baseline; usability: HBase developer expertise • 0.92/0.94 (CDH4): new features: reliability; MTTR: recovery in minutes; perf: better throughput; usability: HBase operational experience • 0.96 (CDH5): new features: continuity; MTTR: recovery of writes in seconds, reads in tens of seconds; perf: optimizing performance; usability: distributed systems admin experience • Next (0.98 / 1.0.0): new features: multitenancy; MTTR: recovery in seconds (reads + writes); perf: predictable performance; usability: application developer experience
  55. Questions? @jmhsieh
