
HBase: Extreme Makeover

Speaker: Vladimir Rodionov (bigbase.org)

This talk introduces a new multi-level caching implementation for HBase called BigBase. BigBase has a big advantage over HBase 0.94/0.96 because of its ability to utilize all available server RAM in the most efficient way, and because of a novel implementation of an L3 cache on fast SSDs. The talk shows that the different cache types in BigBase work best for different types of workloads, and that the combination of these caches (L1/L2/L3) increases the overall performance of HBase by a very wide margin.


HBase: Extreme Makeover

  1. 1. HBase: Extreme Makeover. Vladimir Rodionov, Hadoop/HBase architect, founder of BigBase.org. HBaseCon 2014, Features & Internals track.
  2. 2. Agenda  
  3. 3. About myself • Principal Platform Engineer @ Carrier IQ, Sunnyvale, CA • Prior to Carrier IQ, I worked at GE, eBay, Plumtree/BEA. • HBase user since 2009. • HBase hacker since 2013. • Areas of expertise include (but are not limited to) Java, HBase, Hadoop, Hive, large-scale OLAP/analytics, and in-memory data processing. • Founder of BigBase.org
  4. 4. What?  
  5. 5. BigBase  =  EM(HBase)  
  6. 6. BigBase  =  EM(HBase)   EM(*)  =  ?  
  7. 7. BigBase  =  EM(HBase)   EM(*)  =  
  8. 8. BigBase  =  EM(HBase)   EM(*)  =   Seriously?  
  9. 9. BigBase = EM(HBase)   EM(*) =   Seriously? It's a Multi-Level Caching solution for HBase.
  10. 10. Real Agenda • Why BigBase? • Brief history of the BigBase.org project • BigBase MLC high-level architecture (L1/L2/L3) • Level 1: Row Cache • Level 2/3: Block Cache RAM/SSD • YCSB benchmark results • Upcoming features in R1.5, 2.0, 3.0 • Q&A
  11. 11. HBase • Still lacks some of the original BigTable features. • Still not able to utilize all RAM efficiently. • No good mixed-storage (SSD/HDD) support. • Single-level caching only. Simple. • HBase + large JVM heap (MemStore) = ?
  12. 12. BigBase • Adds Row Cache and block cache compression. • Utilizes all RAM efficiently (TBs). • Supports mixed storage (SSD/HDD). • Has Multi-Level Caching. Not that simple. • Will move MemStore off heap in R2.
  13. 13. BigBase  History  
  14. 14. Koda (2010) • Koda: Java off-heap object cache, similar to Terracotta's BigMemory. • Delivers 4x more transactions and 10x better latencies than BigMemory 4. • Compression (Snappy, LZ4, LZ4HC, Deflate). • Disk persistence and periodic cache snapshots. • Tested up to 240GB.
  15. 15. Karma (2011-12) • Karma: Java off-heap BTree implementation to support fast in-memory queries. • Supports extra-large heaps: hundreds of millions to billions of objects. • Stores 300M objects in less than 10GB of RAM. • Block compression. • Tested up to 240GB. • Off-heap MemStore in R2.
  16. 16. Yamm (2013) • Yet Another Memory Manager. – Pure 100% Java memory allocator. – Replaced jemalloc in Koda. – Now Koda is 100% Java. – Karma is next (still on jemalloc). – Similar to the memcached slab allocator (see the sketch below). • BigBase project started (Summer 2013).
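The slide above compares YAMM to the memcached slab allocator. As a rough illustration of that idea only (none of the class or method names below come from YAMM or BigBase), a slab allocator reserves large off-heap slabs, carves each slab into fixed-size chunks for a given size class, and recycles chunks through per-class free lists, so allocations never create garbage on the Java heap:

    import java.nio.ByteBuffer;
    import java.util.ArrayDeque;
    import java.util.Deque;

    // Minimal sketch of a memcached-style slab allocator over off-heap memory.
    // Class and method names are illustrative, not YAMM/BigBase APIs.
    public class SlabAllocatorSketch {
        private static final int SLAB_SIZE = 1 << 20;                     // 1 MB slabs
        private static final int[] SIZE_CLASSES = {64, 128, 256, 512, 1024, 2048};

        // One free list of fixed-size chunks per size class.
        private final Deque<ByteBuffer>[] freeLists;

        @SuppressWarnings("unchecked")
        public SlabAllocatorSketch() {
            freeLists = new Deque[SIZE_CLASSES.length];
            for (int i = 0; i < SIZE_CLASSES.length; i++) {
                freeLists[i] = new ArrayDeque<ByteBuffer>();
            }
        }

        // Pick the smallest size class that fits the request.
        private int classFor(int size) {
            for (int i = 0; i < SIZE_CLASSES.length; i++) {
                if (size <= SIZE_CLASSES[i]) return i;
            }
            throw new IllegalArgumentException("object too large for slab classes: " + size);
        }

        // Carve a fresh off-heap slab into equal chunks of the class size.
        private void grow(int clazz) {
            ByteBuffer slab = ByteBuffer.allocateDirect(SLAB_SIZE);
            int chunk = SIZE_CLASSES[clazz];
            for (int off = 0; off + chunk <= SLAB_SIZE; off += chunk) {
                slab.position(off);
                slab.limit(off + chunk);
                freeLists[clazz].push(slab.slice());
            }
        }

        public ByteBuffer allocate(int size) {
            int clazz = classFor(size);
            if (freeLists[clazz].isEmpty()) grow(clazz);
            return freeLists[clazz].pop();
        }

        public void free(int size, ByteBuffer chunk) {
            chunk.clear();
            freeLists[classFor(size)].push(chunk);
        }

        public static void main(String[] args) {
            SlabAllocatorSketch a = new SlabAllocatorSketch();
            ByteBuffer b = a.allocate(100);      // served from the 128-byte class
            System.out.println("chunk capacity: " + b.capacity());
            a.free(100, b);
        }
    }

A real allocator such as YAMM also has to deal with concurrency, eviction of whole slabs, and moving slabs between size classes; this sketch omits all of that.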
  17. 17. BigBase  Architecture  
  18. 18. MLC – Multi-Level Caching. HBase 0.94: Disk, JVM RAM, LRUBlockCache
  19. 19. MLC – Multi-Level Caching. HBase 0.94: Disk, JVM RAM, LRUBlockCache. HBase 0.96: Disk, JVM RAM, Bucket Cache. One level of caching: RAM (L2)
  20. 20. MLC – Multi-Level Caching. HBase 0.94: Disk, JVM RAM, LRUBlockCache. HBase 0.96: Bucket Cache, JVM RAM. One level of caching: RAM (L2) or DISK (L3)
  21. 21. MLC – Multi-Level Caching. HBase 0.94: Disk, JVM RAM, LRUBlockCache. HBase 0.96: Disk, JVM RAM, Bucket Cache. BigBase 1.0: JVM RAM, Row Cache L1, Block Cache L2, Block Cache L3 on SSD
  22. 22. MLC – Multi-Level Caching. HBase 0.94: Disk, JVM RAM, LRUBlockCache. HBase 0.96: Disk, JVM RAM, Bucket Cache. BigBase 1.0: JVM RAM, Row Cache L1, Block Cache L2, Block Cache L3 over the network
  23. 23. MLC – Multi-Level Caching. HBase 0.94: Disk, JVM RAM, LRUBlockCache. HBase 0.96: Disk, JVM RAM, Bucket Cache. BigBase 1.0: JVM RAM, Row Cache L1, Block Cache L2, Block Cache L3 in memcached
  24. 24. MLC – Multi-Level Caching. HBase 0.94: Disk, JVM RAM, LRUBlockCache. HBase 0.96: Disk, JVM RAM, Bucket Cache. BigBase 1.0: JVM RAM, Row Cache L1, Block Cache L2, Block Cache L3 in DynamoDB
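The diagrams above place the Row Cache (L1) and Block Cache (L2) in RAM and the L3 Block Cache on SSD or an external store. A minimal sketch of that read path, assuming hypothetical cache interfaces (none of these names are BigBase classes, and in reality L1 holds rows while L2/L3 hold HFile blocks rather than one uniform value type):

    import java.util.Optional;

    // Illustrative L1 -> L2 -> L3 -> disk lookup chain; all names are made up.
    public class MultiLevelCacheSketch {
        interface CacheLevel {
            Optional<byte[]> get(String key);
            void put(String key, byte[] value);
        }

        private final CacheLevel l1RowCache;      // off-heap RAM, row granularity
        private final CacheLevel l2BlockCache;    // off-heap RAM, block granularity
        private final CacheLevel l3BlockCache;    // SSD, network, memcached, or DynamoDB

        public MultiLevelCacheSketch(CacheLevel l1, CacheLevel l2, CacheLevel l3) {
            this.l1RowCache = l1;
            this.l2BlockCache = l2;
            this.l3BlockCache = l3;
        }

        // Check each level in turn; on a hit in a slower level, promote the value
        // into the faster level so later reads are served earlier in the chain.
        public byte[] read(String key) {
            Optional<byte[]> v = l1RowCache.get(key);
            if (v.isPresent()) return v.get();

            v = l2BlockCache.get(key);
            if (v.isPresent()) {
                l1RowCache.put(key, v.get());
                return v.get();
            }

            v = l3BlockCache.get(key);
            if (v.isPresent()) {
                l2BlockCache.put(key, v.get());
                return v.get();
            }

            byte[] fromDisk = loadFromHdfs(key);   // full miss: fall back to HDFS/HDD
            l2BlockCache.put(key, fromDisk);
            l3BlockCache.put(key, fromDisk);
            return fromDisk;
        }

        // Placeholder for the actual HFile read path.
        private byte[] loadFromHdfs(String key) {
            return new byte[0];
        }
    }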
  25. 25. BigBase  Row  Cache  (L1)  
  26. 26. Where is BigTable's Scan Cache? • Scan Cache caches hot row data. • Complementary to the Block Cache. • Still missing in HBase (as of 0.98). • It's very hard to implement in Java (on heap). • Max GC pause is ~0.5-2 sec per 1GB of heap. • G1 GC in Java 7 does not solve the problem. • We call it the Row Cache in BigBase.
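For scale, applying the slide's own figure of roughly 0.5-2 s of worst-case GC pause per 1 GB of heap, a 100 GB on-heap cache could stall a region server for roughly 50-200 s in the worst case, which is the motivation for keeping the cache off heap.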
  27. 27. Row Cache vs. Block Cache (diagram: HFile blocks)
  28. 28. Row  Cache  vs.  Block  Cache  
  29. 29. Row  Cache  vs.  Block  Cache   BLOCK  CACHE   ROW  CACHE  
  30. 30. Row  Cache  vs.  Block  Cache   ROW  CACHE   BLOCK  CACHE  
  31. 31. Row  Cache  vs.  Block  Cache   ROW  CACHE   BLOCK  CACHE  
  32. 32. BigBase Row Cache • Off-heap Scan Cache for HBase. • Cache size: 100s of GBs to TBs. • Eviction policies: LRU, LFU, FIFO, Random. • Pure 100%-compatible Java. • Sub-millisecond latencies, zero GC. • Implemented as a RegionObserver coprocessor (a hedged sketch follows below). (Component stack: Row Cache, YAMM, Codecs, Kryo SerDe, KODA)
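The slide above says the Row Cache is implemented as a RegionObserver coprocessor. A sketch of what such a coprocessor can look like against the HBase 0.98-era API; OffHeapRowCache and its methods are hypothetical placeholders, not BigBase code, and a real implementation would also hook deletes and other mutations:

    import java.io.IOException;
    import java.util.List;

    import org.apache.hadoop.hbase.Cell;
    import org.apache.hadoop.hbase.client.Durability;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

    public class RowCacheObserverSketch extends BaseRegionObserver {

      // Hypothetical off-heap cache keyed by row key.
      private final OffHeapRowCache cache = new OffHeapRowCache();

      @Override
      public void preGetOp(ObserverContext<RegionCoprocessorEnvironment> ctx,
                           Get get, List<Cell> results) throws IOException {
        List<Cell> cached = cache.get(get.getRow());
        if (cached != null) {
          results.addAll(cached);   // serve the row from the off-heap cache
          ctx.bypass();             // skip the normal MemStore/HFile read path
        }
      }

      @Override
      public void postGetOp(ObserverContext<RegionCoprocessorEnvironment> ctx,
                            Get get, List<Cell> results) throws IOException {
        cache.put(get.getRow(), results);   // read-through: populate on miss
      }

      @Override
      public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                         Put put, WALEdit edit, Durability durability) throws IOException {
        cache.invalidate(put.getRow());     // invalidate the key on every mutation
      }

      // Hypothetical placeholder for the real off-heap (Koda/YAMM-backed) cache.
      static class OffHeapRowCache {
        List<Cell> get(byte[] row) { return null; }
        void put(byte[] row, List<Cell> cells) { }
        void invalidate(byte[] row) { }
      }
    }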
  33. 33. BigBase Row Cache • Read-through cache. • It caches rowkey:CF. • Invalidates the key on every mutation. • Can be enabled/disabled per table and per table:CF (see the example below). • New ROWCACHE attribute. • Best for small rows (< block size). (Component stack: Row Cache, YAMM, Codecs, Kryo SerDe, KODA)
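The slide mentions a new ROWCACHE table attribute for enabling the cache per table or per column family. A sketch of setting such an attribute with the standard 0.94/0.98-era admin API; the attribute name comes from the slide, while the table name and the accepted value "true" are assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class EnableRowCacheSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        try {
          byte[] table = Bytes.toBytes("usertable");
          HTableDescriptor desc = admin.getTableDescriptor(table);

          // Per-table switch; per-CF control would set the same attribute on the
          // corresponding HColumnDescriptor instead.
          desc.setValue("ROWCACHE", "true");

          admin.disableTable(table);
          admin.modifyTable(table, desc);
          admin.enableTable(table);
        } finally {
          admin.close();
        }
      }
    }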
  34. 34. Performance-Scalability • GET (small rows < 100 bytes): 175K operations per second per region server (from cache). • MULTI-GET (small rows < 100 bytes): > 1M records per second (network limited) per region server. • LATENCY: 99% < 1ms (for GETs) at 100K ops. • Vertical scalability: tested up to 240GB (the maximum available in Amazon EC2). • Horizontal scalability: limited by HBase scalability. • No more memcached farms in front of HBase clusters.
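The MULTI-GET figures above correspond to the standard batched-get client call, which fetches many small rows in a single round trip per region server. A minimal example with made-up table and row-key names:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MultiGetSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "usertable");
        try {
          // Batch many small-row lookups into one multi-get request.
          List<Get> gets = new ArrayList<Get>();
          for (int i = 0; i < 1000; i++) {
            gets.add(new Get(Bytes.toBytes("user" + i)));
          }
          Result[] results = table.get(gets);
          System.out.println("fetched " + results.length + " rows");
        } finally {
          table.close();
        }
      }
    }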
  35. 35. BigBase  Block  Cache  (L2,  L3)  
  36. 36. What is wrong with Bucket Cache? Scalability: LIMITED. Multi-Level Caching (MLC): NOT SUPPORTED. Persistence ('offheap' mode): NOT SUPPORTED. Low-latency apps: NOT SUPPORTED. SSD friendliness ('file' mode): NOT FRIENDLY. Compression: NOT SUPPORTED.
  42. 42. Here comes BigBase. Scalability: HIGH. Multi-Level Caching (MLC): SUPPORTED. Persistence ('offheap' mode): SUPPORTED. Low-latency apps: SUPPORTED. SSD friendliness ('file' mode): SSD-FRIENDLY. Compression: SNAPPY, LZ4, LZ4HC, DEFLATE.
  48. 48. Wait, there is more … Scalability: HIGH. Multi-Level Caching (MLC): SUPPORTED. Persistence ('offheap' mode): SUPPORTED. Low-latency apps: SUPPORTED. SSD friendliness ('file' mode): SSD-FRIENDLY. Compression: SNAPPY, LZ4, LZ4HC, DEFLATE. Non-disk-based L3 cache: SUPPORTED. RAM cache optimization: IBCO.
  50. 50. BigBase 1.0 vs. HBase 0.98 (BigBase / HBase 0.98): Row Cache (L1): YES / NO. Block Cache RAM (L2): YES (fully off heap) / YES (partially off heap). Block Cache (L3) DISK: YES (SSD-friendly) / YES (not SSD-friendly). Block Cache (L3) NON-DISK: YES / NO. Compression: YES / NO. RAM cache persistence: YES (both L1 and L2) / NO. Low-latency optimized: YES / NO. MLC support: YES (L1, L2, L3) / NO (either L2 or L3). Scalability: HIGH / MEDIUM (limited by JVM heap).
  51. 51. YCSB  Benchmark  
  52. 52. Test setup (AWS) • Common: Whirr 0.8.2; 1 (Master + ZK) + 5 RS; m1.xlarge: 15GB RAM, 4 vCPU, 4x420GB HDD. • HBase 0.94.15: RS: 11.5GB heap (6GB LruBlockCache on heap); Master: 4GB heap. • BigBase 1.0 (0.94.15): RS: 4GB heap (6GB off-heap cache); Master: 4GB heap. • HBase 0.96.2: RS: 4GB heap (6GB Bucket Cache off heap); Master: 4GB heap. • Clients: 5 (30 threads each), collocated with Region Servers. • Data sets: 100M and 200M rows, approximately 120GB / 240GB. Only 25% fits in cache. • Workloads: 100% read (read100, read200, hotspot100), 100% scan (scan100, scan200), zipfian. • YCSB 0.1.4 (modified to generate compressible data). Compressible data (factor of ~2.5x) was generated only for scan workloads, to evaluate the effect of compression in the BigBase block cache implementation.
  54. 54. Benchmark results (RPS), BigBase R1.0 / HBase 0.96.2 / HBase 0.94.15: read100: 11405 / 6123 / 5553; read200: 6265 / 4086 / 3850; hotspot100: 15150 / 3512 / 2855; scan100: 3224 / 1500 / 709; scan200: 820 / 434 / 228.
  55. 55. Average latency (ms), BigBase R1.0 / HBase 0.96.2 / HBase 0.94.15: read100: 13 / 24 / 27; read200: 23 / 36 / 39; hotspot100: 10 / 44 / 52; scan100: 48 / 102 / 223; scan200: 187 / 375 / 700.
  56. 56. 95% latency (ms), BigBase R1.0 / HBase 0.96.2 / HBase 0.94.15: read100: 51 / 91 / 100; read200: 88 / 124 / 138; hotspot100: 38 / 152 / 197; scan100: 175 / 405 / 950; scan200: 729 / … / ….
  57. 57. 99% latency (ms), BigBase R1.0 / HBase 0.96.2 / HBase 0.94.15: read100: 133 / 190 / 213; read200: 225 / 304 / 338; hotspot100: 111 / 554 / 632; scan100: 367 / 811 / …; scan200: ….
  59. 59. YCSB 100% Read, per server (BigBase R1.0 / HBase 0.94.15): 50M: 3621 / 1308; 100M: 2281 / 1111; 200M: 1253 / 770. • 50M = 2.77X (40% fits in cache) • 100M = 2.05X (20% fits in cache) • 200M = 1.63X (10% fits in cache) • What is the maximum? ~75X (hotspot 2.5/100): 56K (BigBase) vs. 750 (HBase), 100% in cache.
  61. 61. All data in cache • Setup: BigBase 1.0, 48GB RAM, (8/16) CPU cores, 5 nodes (1 + 4). • Data set: 200M rows (300GB). • Test: Read 100%, hotspot (2.5/100). • YCSB 0.1.4, 4 clients. • Throughput: 40 threads: 100K; 100 threads: 168K; 200 threads: 224K; 400 threads: 262K. • Latency (ms) at 100K / 168K / 224K / 262K ops: avg 0.4 / 0.6 / 0.9 / 1.5; 95%: 1 / 1 / 2 / 3; 99%: 1 / 2 / 3 / 7. • 100K ops: 99% < 1ms.
  62. 62. What is next? • Release 1.1 (2014 Q2) – Support HBase 0.96, 0.98, trunk. – Fully tested L3 cache (SSD). • Release 1.5 (2014 Q3) – YAMM: memory allocator compacting mode. – Integration with Hadoop metrics. – Row Cache: merge rows on update (good for counters). – Block Cache: new eviction policy (LRU-2Q). – File reads with posix_fadvise (bypass OS page cache). – Row Cache: make it available to server-side apps.
  63. 63. What is next? • Release 2.0 (2014 Q3) – HBASE-5263: preserving cache data on compaction. – Cache data blocks on MemStore flush (configurable). – HBASE-10648: pluggable MemStore; off-heap implementation based on Karma (off-heap BTree lib). • Release 3.0 (2014 Q4) – Real Scan Cache: caches results of Scan operations on immutable store files. – Scan Cache integration with Phoenix and with other 3rd-party libs that provide rich query APIs for HBase.
  64. 64. Download/Install/Uninstall • Download BigBase 1.0 from www.bigbase.org • Installation/upgrade takes 10-20 minutes. • The beautification operator EM(*) is invertible: HBase = EM^-1(BigBase) (the same 10-20 min).
  65. 65. Q & A. Vladimir Rodionov, Hadoop/HBase architect, founder of BigBase.org. HBase: Extreme Makeover, Features & Internals track.
