HBase Applications - Atlanta HUG - May 2014
HBase is good at various workloads, ranging from sequential range scans to purely random access. These access patterns can be translated into application types, usually falling into two major groups: entities and events. This presentation discusses the underlying implications and how to approach those use-cases. Examples taken from Facebook show how this has been tackled in real life.

Presentation Transcript

  • 1 HBase Applications: Selected Use-Cases around a Common Theme. Atlanta HUG - May 2014. Lars George, Cloudera EMEA Chief Architect
  • 2 About Me • EMEA Chief Architect @ Cloudera • Consulting on Hadoop projects (everywhere) • Apache Committer • HBase and Whirr • O'Reilly Author • HBase - The Definitive Guide • Now in Japanese! • Contact • lars@cloudera.com • @larsgeorge • The Japanese edition is out, too!
  • 3 The Content... • HBase - Strengths and weaknesses • Common use-cases and patterns • Focus on a specific type of application • Summary
  • 4 CONFIDENTIAL - RESTRICTED. HBase: Strengths and Weaknesses
  • 5 IOPS vs Throughput Mythbusters: It is all physics in the end; you cannot solve an I/O problem without reducing I/O in general. Parallelize access and read/write sequentially.
  • 6 HBase: Strengths & Weaknesses. Strengths: • Random access to small(ish) key-value pairs • Rows and columns stored sorted lexicographically • Adds table and region concepts to group related KVs • Stores and reads data sequentially • Parallelizes across all clients • Non-blocking I/O throughout
  • 7 HBase: Strengths & Weaknesses. Weaknesses: • Not optimized (yet) for 100% of the possible throughput of the underlying storage layer • And HDFS is not fully optimized either • Single-writer issue with WALs • Single-server hot-spotting with non-distributed keys
  • 8 Patterns • There are common patterns in many common use-cases, like programming patterns. • We need to extract these common patterns and make them repeatable. • Similar to the "Gang of Four" (Gamma, Helm, Johnson, Vlissides) or the "Three Amigos" (Booch, Jacobson, Rumbaugh)
  • 9 CONFIDENTIAL - RESTRICTED. Common Patterns
  • 10 HBase Dilemma: Although HBase can host many applications, they may require completely opposite features. (Diagram: Events vs. Entities, with Time Series and Message Store as the examples.)
  • 11 This talk (at this event) • Message Store • Information exchange between entities • Sending/receiving information is an event • Time Series • A sequence of data points measured at successive points in time, spaced at uniform intervals • Measuring a data point is an event
  • 12 Using HBase Strengths
  • 13 HBase "Indexes" (cont.) • Use primary keys, aka the row keys, as a sorted index • One sort direction only • Use a "secondary index" to get reverse sorting • Lookup table or the same table • Use secondary keys, aka the column qualifiers, as a sorted index within the main record • Use prefixes within a column family, or separate column families
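
To make the reverse-sort trick concrete, here is a minimal sketch (not from the talk) that writes each message twice: once under its natural key and once under an index key whose timestamp is inverted with Long.MAX_VALUE - ts, so a forward scan returns newest-first. The table name (msgs), column family, and key layout are all hypothetical.

    // Hypothetical sketch: a reverse-sorted secondary index via inverted timestamps.
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class ReverseIndexSketch {
      public static void storeMessage(Connection conn, String user,
          long ts, byte[] message) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("msgs"))) {
          // Main row: user + timestamp sorts oldest-first (one sort direction only).
          Put main = new Put(Bytes.add(Bytes.toBytes(user + "|"), Bytes.toBytes(ts)));
          main.addColumn(Bytes.toBytes("d"), Bytes.toBytes("body"), message);
          // Secondary "index" row: inverted timestamp sorts newest-first.
          Put index = new Put(Bytes.add(Bytes.toBytes(user + "|rev|"),
              Bytes.toBytes(Long.MAX_VALUE - ts)));
          index.addColumn(Bytes.toBytes("d"), Bytes.toBytes("body"), message);
          table.put(java.util.Arrays.asList(main, index));
        }
      }
    }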
  • 14 CONFIDENTIAL - RESTRICTED. Common Use-Cases
  • 15 Use-Case I: Messages
  • 16 HBase Message Store. Use-Case: • Store incoming messages in HBase, such as emails, SMS, MMS, IM • Constant updates of existing entities • e.g. email read, flagged, starred, moved, deleted • Reading of top-N entries, sorted by time • Newest 20 messages, last 20 conversations • Examples: • Facebook Messages
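
Reading the newest 20 messages then becomes a short forward scan over the inverted-timestamp rows from the sketch above. Again a hypothetical layout, and it assumes the HBase 2.x client (Scan.setLimit; on 1.x a PageFilter would be needed instead).

    // Hypothetical sketch: fetch the newest 20 messages for one user.
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TopNSketch {
      public static void newestTwenty(Connection conn, String user) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("msgs"))) {
          Scan scan = new Scan()
              .setRowPrefixFilter(Bytes.toBytes(user + "|rev|")) // newest-first rows
              .setLimit(20);                                     // top-N only (2.x API)
          try (ResultScanner rs = table.getScanner(scan)) {
            for (Result r : rs) {
              System.out.println(Bytes.toStringBinary(r.getRow()));
            }
          }
        }
      }
    }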
  • 17 Problem Description • Records are of varying size • Large ones hinder smaller ones • Massive index issue • Users can sort and filter by everything • At the same time, reading top-N should be fast • But what to do for automated accounts? The 80/20 rule? • Only doable with heuristics • Only create minimal indexes • Create additional ones when the user asks for them • Cross-mailbox issues with conversations • Similar to the timeline in Facebook • Overall requirements for I/O
  • 18 Interlude I: Compaction Details. Write Amplification in HBase
  • 19 Compactions in HBase • Must happen to keep data in check • Combine small flush files into larger ones • Remove old data (during major compactions) • Two types: minor and major compactions • Minor ones are triggered with API mutation calls • Major ones are time-scheduled (or auto-promoted) • Both can be triggered manually if needed • Add extra background I/O that grows over time • Write amplification! • Has to be tuned for heavy-write systems
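
Triggering compactions manually goes through the Admin API; a minimal sketch:

    // Sketch: manually queue minor and major compactions via the Admin API.
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;

    public class CompactSketch {
      public static void compact(Connection conn, String tableName) throws Exception {
        try (Admin admin = conn.getAdmin()) {
          admin.compact(TableName.valueOf(tableName));      // queue a minor compaction
          admin.majorCompact(TableName.valueOf(tableName)); // queue a major compaction
        }
      }
    }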
  • 20-37 Writes: Flushes and Compactions (animated diagram sequence; x-axis: time, older to newer; y-axis: file size in MB, 0-1000). Memstore flushes create store files HF1, HF2, HF3, ..., which compactions merge into ever larger files CF1, CF2, CF3; the first compactions are auto-promoted to major ones. Settings shown along the way:
    • hbase.hregion.memstore.flush.size = 128MB
    • hbase.hstore.compaction.min = 3 (hbase.hstore.compactionThreshold in 0.90)
    • hbase.hstore.compaction.max = 10
    • hbase.hstore.compaction.ratio = 1.2 (eliminate older-to-newer files until within the 120% ratio)
    • hbase.hstore.compaction.min.size = flush size
  • 38 Additional Notes #1. There are a few more settings for compactions: • hbase.hstore.compaction.max = 10: limit on the maximum number of files per compaction • hbase.hstore.compaction.max.size = Long.MAX_VALUE: exclude files larger than this setting (0.92+) • hbase.hregion.majorcompaction = 1d: scheduled major compactions
  • 39 Additional Notes #2 • hbase.hstore.compaction.kv.max = 10: limits internal scanner caching during reads of files to be compacted • hbase.hstore.blockingStoreFiles = 7: enforces an upper limit of files before compactions must catch up; blocks user operations! • hbase.hstore.blockingWaitTime = 90s: upper limit on blocking user operations
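
These settings normally land in hbase-site.xml on the region servers; purely as a sketch, the same keys set on a Configuration object (values are the defaults named on the slides):

    // Sketch: the compaction-related settings from the slides on a Configuration.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class CompactionTuningSketch {
      public static Configuration tuned() {
        Configuration conf = HBaseConfiguration.create();
        conf.setInt("hbase.hstore.compaction.min", 3);        // min files per compaction
        conf.setInt("hbase.hstore.compaction.max", 10);       // max files per compaction
        conf.setFloat("hbase.hstore.compaction.ratio", 1.2f); // selection ratio (120%)
        conf.setInt("hbase.hstore.blockingStoreFiles", 7);    // block writes above this count
        conf.setLong("hbase.hregion.memstore.flush.size", 128L * 1024 * 1024); // 128MB
        return conf;
      }
    }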
  • 40 Write Fragmentation: Yo, where's the data at?
  • 41-48 Writes: Flushes and Compactions (diagram sequence, same axes). We are looking at two specific rows: one is never changed after insert ("Unique Row Inserts"), the other is mutated frequently ("Existing Row Mutations"). Across flushes and compactions, the frequently mutated row ends up fragmented over many store files, while the write-once row stays in a single location.
  • 49 Source: http://www.ngdata.com/visualizing-hbase-flushes-and-compactions/
  • 50 Compaction Summary • Compaction tuning is important • Do not be too aggressive, or write amplification is noticeable under load • Use timestamps/time ranges in Get/Scan to limit files
    Ratio / Effect:
    • 1.0: dampened; causes more store files, needs to be combined with effective Bloom filter usage (non-random keys)
    • 1.2: default value, a moderate setting
    • 1.4: more aggressive; keeps the number of files low, causes more auto-promoted major compactions to occur
  • 51 Interlude II: Bloom Filters. Call me maybe, baby?
  • 52 Background on Bloom Filters
  • 53 Background on Bloom Filters • Bit array of m bits, and k hash functions • HBase uses hash folding • Returns "no" or "maybe" only • Error rate is tunable, usually about 1% • At a 1% error rate the optimal k is about 7, at roughly 9.6 bits per key (diagram: m=18, k=3)
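
For reference, those numbers follow from the standard Bloom filter sizing formulas (n keys, m bits, k hash functions, target false-positive rate p):

    \[
      k_{\mathrm{opt}} = \frac{m}{n}\,\ln 2,
      \qquad
      \frac{m}{n} = -\,\frac{\ln p}{(\ln 2)^2}
    \]
    % For p = 0.01: m/n = -ln(0.01)/(ln 2)^2 ~ 9.6 bits per key,
    % and k_opt ~ 9.6 * ln 2 ~ 6.7, i.e. about 7 hash functions.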
  • 54 Seeking with Bloom Filters
  • 55 Read Time Series Entry • Event record is written once and never deleted or updated • Keeps the entire record in a specific location in the storage files • Use a time range to indicate what is needed • {Get|Scan}.setTimeRange() • Helps the system skip unnecessary (older) files • The Bloom filter helps for given row key(s) and column qualifiers • Can skip files not containing the requested details
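
A minimal sketch of the time-range hint; setTimeRange takes millisecond bounds, minimum inclusive and maximum exclusive. The events table name is hypothetical.

    // Sketch: limit a read to a time window so older store files can be skipped.
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;

    public class TimeRangeSketch {
      public static Result lastDay(Connection conn, byte[] rowKey) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("events"))) {
          long now = System.currentTimeMillis();
          Get get = new Get(rowKey);
          get.setTimeRange(now - 24L * 3600 * 1000, now); // [min, max), ms precision
          return table.get(get);
        }
      }
    }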
  • 56 Writes: Flushes and Compactions (diagram, same setup): a single block read (64KB) suffices; the Bloom filter and/or time range eliminates all other store files
  • 57 Read Updateable Entity • Data is updated regularly, aging out at intervals • Reading an entity needs to read all details to reconstitute the current state • Deletes mask out attributes • Updates override (or complement) attributes • Bloom filters will have a hard time saying "no", since most files might contain entity attributes • A time filter on scans or gets also has few options to skip files, since older attributes might still be important
  • 58 Writes: Flushes and Compactions (diagram, same setup): the Bloom filter returns "yes" for all but two files, so 7+ block loads (64KB each) are needed
  • 59 Bloom Filter Options. There are three choices: • NONE: Duh! Use this when the Bloom filter is not useful for the use-case (default setting) • ROW: indexes only the row key; needs one entry per row key in the Bloom filter • ROWCOL: indexes the row and column key; requires an entry in the filter for every column cell (KeyValue)
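
The Bloom filter type is set per column family; a sketch using the HBase 2.x descriptor builders (table and family names hypothetical):

    // Sketch: create a table whose column family carries a ROW Bloom filter.
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.regionserver.BloomType;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BloomSketch {
      public static void create(Connection conn) throws Exception {
        try (Admin admin = conn.getAdmin()) {
          ColumnFamilyDescriptor cf = ColumnFamilyDescriptorBuilder
              .newBuilder(Bytes.toBytes("d"))
              .setBloomFilterType(BloomType.ROW)   // or ROWCOL, or NONE
              .build();
          admin.createTable(TableDescriptorBuilder
              .newBuilder(TableName.valueOf("msgs"))
              .setColumnFamily(cf)
              .build());
        }
      }
    }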
  • 60 How to decide?
  • 61 Bloom Filter Summary • They help a lot, but not always • Highly depends on write patterns • Keep an eye on size, since they are cached • HFile v2 helps here as it only loads root index info. "Bloom filters can get as large as 100 MB per HFile, which adds up to 2 GB when aggregated over 20 regions. Block indexes can grow as large as 6 GB in aggregate size over the same set of regions." Source: http://hbase.apache.org/book/hfilev2.html
  • 62 Interlude III: Write-ahead Log. The lonesome writer tale.
  • 63 Write-ahead Log - Data Flow
  • 64 Write-ahead Log - Overview • One file per region server • All regions have a reference to this file • Actually a wrapper around the physical file • The file is, in the end, a Hadoop SequenceFile • Stored in HDFS so it can be recovered after a server failure • There is a synchronization barrier that impacts all parallel writers, aka clients • Overall performance is BAD, maybe 10MB/s
  • 65 Write-ahead Log - Workarounds • Enable log compression: hbase.regionserver.wal.enablecompression • Disable the WAL for secondary records • Restore indexes or derived records from the main one • But be careful with the coprocessor hook, as it cannot access the currently replaying region • Work on upstream JIRAs • Multiple logs per server • Fix the single-writer issue in HDFS
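
Both workarounds, as a sketch: WAL compression is a server-side hbase-site.xml setting, while skipping the WAL is a per-mutation durability hint; skipped records are lost on a server crash, so they must be rebuildable from the main record. Table and qualifier names are hypothetical.

    // Sketch: WAL compression (server side) plus a per-Put WAL skip (client side).
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WalSketch {
      public static void example(Connection conn) throws Exception {
        // Server side (belongs in hbase-site.xml): compress WAL entries.
        Configuration conf = HBaseConfiguration.create();
        conf.setBoolean("hbase.regionserver.wal.enablecompression", true);

        // Client side: skip the WAL for a rebuildable secondary record.
        try (Table table = conn.getTable(TableName.valueOf("msgs_index"))) {
          Put put = new Put(Bytes.toBytes("index-row"));
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("ref"), Bytes.toBytes("main-row"));
          put.setDurability(Durability.SKIP_WAL); // lost on crash; restore from main record
          table.put(put);
        }
      }
    }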
  • 66 Back to the main theme... Yes, message stores.
  • 67 Schema • Every row is an inbox • Indexes as CFs or separate tables • Random updates and inserts cause storage file churn • Facebook used more than 4 or 5 schema iterations • Not really representative: pure blob storage • Evolved over time to be more HBase-like • Another customer iterated over various schemas at about the same time • Difficult to keep indexes up to date
  • 68 Facebook Messages. An interesting use-case…
  • 69 Facebook Messages - Statistics. Source: HBaseCon 2012, Anshuman Singh
  • 70  
  • 71  
  • 72   Schema 1
  • 73 Notes on Facebook Schema 1. This is basically the same as the NameNode, i.e. the application only writes edits, and those are merged with a snapshot of the data. The application does not use HBase as an operational store; all data is cached in memory. It occasionally writes large chunks, and reads only a few times to merge or recover.
  • 74 Notes on Facebook Schema 1. Three column families: • Snapshot, Actions, Keywords. Settings changes: • DFS block size: 256MB • Since large KVs are written • Efficiency of the HFile block index is a concern • Compaction ratio: 1.4 • Be more aggressive to clean up files • Split size: 2TB • Manage splitting manually • Major compactions: 3 days
  • 75   Schema 2
  • 76 Notes on Facebook Schema 2 • Eight column families • Snapshots per thread (user to user). Settings changes: • Block cache size: 55% • Cache more data on the HBase side • Blocking store files: 25 • Allow more files to be around • Compaction min size: 4MB • Reduce the number of unconditionally selected files • Major compactions: 14 days
  • 77 Schema 3
  • 78 Notes on Facebook Schema 3 • Eleven column families • Twenty regions per server • One hundred servers per cluster. Settings changes: • Block cache size: 60% • Cache more data on the HBase side • Region slop: 5% (from 20%) • Keep strict boundaries on regions per server
  • 79  
  • 80 Note the imbalance! Recall that flushes are interconnected and cause compaction storms.
  • 81 FB Messages Summary • Triggered many changes in HBase: • Changed compaction selection algorithm • Upper bounds on file sizes • Pools for small and large compactions • Online schema changes • Finer-grained metrics • Lazy seeking in files • Point-seek optimizations • …
  • 82 FB Messages Summary • Went from "snapshot" to a more proper schema • Needed to wait for the schema to settle • Could sustain warped load for a while • Eventually uses HBase more as a KV store • Tweaked settings depending on the schema • Tuned compactions from aggressive to relaxed • Changed block sizes to fit KV sizes • Strict limit on I/O • 100 servers • 20 regions per server • 50 million users per cluster
  • 83 Use-Case II: Time Series Database
  • 84 Events make big data big • The majority of use-cases deal with event-based data • Especially at the HDFS and MapReduce level • Machine scale vs. human scale • An event has attributes • Type • Identifier • Actor • Other attributes
  • 85 Events contd. • Accessing event data • Give me everything about event e_id1 • Give me everything in [t1,t2] • Give me everything for event type e_t1 in [t1,t2] • Give me everything for actor a1 in [t1,t2] • Give me everything for event type e_t1 by actor a1 in [t1,t2] • Aggregate based on some parameters (like the above) and report • Find events that match some other given criteria
  • 86 HBase and Time Series • Access patterns suited for HBase • Random access to event data or aggregate data • Serving... not real-time computing (that's Impala) • Schema design is the tricky part • OpenTSDB does this well (but is limited) • Key principle: • Collocate data you want to read together • Spread out as much as possible at write time • These two conflict in a lot of cases, so you decide on the trade-off
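
One way to see the trade-off in a row key, loosely modeled on OpenTSDB's layout (metric first, then a time bucket, then tags); the field widths and names here are hypothetical, not OpenTSDB's actual encoding:

    // Sketch: event row key of metric id, hourly time bucket, actor id.
    // Collocates one metric's recent data for fast range reads, but spreads
    // writes across regions only as far as there are distinct metrics.
    import org.apache.hadoop.hbase.util.Bytes;

    public class EventKeySketch {
      public static byte[] rowKey(int metricId, long eventTimeMs, int actorId) {
        long hourBucket = eventTimeMs / (3600 * 1000L); // one hour per key bucket
        return Bytes.add(Bytes.toBytes(metricId),
                         Bytes.toBytes(hourBucket),
                         Bytes.toBytes(actorId));
      }
    }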
  • 87 Time Series design patterns • Ingest • Flume or direct writes via the app • HDFS • Batch queries in Hive • Faster queries in Impala • No user-time serving • HBase • Serve individual events (OpenTSDB) • Serve pre-computed aggregates (OpenTSDB, FB Insights) • Solr • To make individual events searchable
  • 88 Time Series design patterns • Land data in HDFS and HBase • Aggregate in HDFS and write to HBase • HBase can do some aggregates too (counters) • Keep serveable data in HBase, then discard (TTL ftw) • Keep all data in HDFS for future use
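
The discard step can be expressed directly on the column family via a TTL (in seconds); a sketch, with the 30-day window and family name chosen for illustration:

    // Sketch: keep 30 days of serveable data in HBase; expired cells are
    // dropped during major compactions. The full history stays in HDFS.
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
    import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
    import org.apache.hadoop.hbase.util.Bytes;

    public class TtlSketch {
      public static ColumnFamilyDescriptor thirtyDayFamily() {
        return ColumnFamilyDescriptorBuilder
            .newBuilder(Bytes.toBytes("d"))
            .setTimeToLive(30 * 24 * 3600) // TTL in seconds
            .build();
      }
    }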
  • 89 The story with only HBase • Landing destination • Aggregates via counters • Serving end users • Event -> Flume/App -> HBase • Raw entry in HBase for the exact value • Multiple counter increments for aggregates • OSS implementation: OpenTSDB
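
The "raw entry plus counter increments" write path, as a sketch; the table, family, and qualifier names are hypothetical:

    // Sketch: write the raw event, then bump per-hour and per-day aggregates.
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CounterSketch {
      public static void record(Connection conn, byte[] eventRow, byte[] payload,
          byte[] hourRow, byte[] dayRow) throws Exception {
        try (Table table = conn.getTable(TableName.valueOf("events"))) {
          Put raw = new Put(eventRow); // raw entry keeps the exact value
          raw.addColumn(Bytes.toBytes("d"), Bytes.toBytes("v"), payload);
          table.put(raw);
          // Atomic server-side increments for the aggregates.
          table.incrementColumnValue(hourRow, Bytes.toBytes("a"), Bytes.toBytes("n"), 1L);
          table.incrementColumnValue(dayRow, Bytes.toBytes("a"), Bytes.toBytes("n"), 1L);
        }
      }
    }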
  • 90 Overall Summary
  • 91 Applications in HBase: requires working with schema peculiarities and implementation idiosyncrasies. What is important is to compute the write rate and un-optimize the schema to fit the given hardware; if hardware is no issue, then the optimum is achievable. Trifecta of good performance: compactions, Bloom filters, and key design (but also look out for Memstore and BlockCache settings).
  • 92 Questions?