HBase 0.96+
A Report on the Current Status
Lars George | EMEA Chief Architect
  
About Me
•  EMEA Chief Architect @ Cloudera
•  Consulting on Hadoop projects (everywhere)
•  Apache Committer (HBase and Whirr)
•  O’Reilly Author (HBase: The Definitive Guide)
•  Now in Japanese! (The Japanese edition is out, too!)
•  Contact: lars@cloudera.com, @larsgeorge
  
The Content...
•  Version History
•  Overview of new Features
•  Summary
  
Version History
A Timeline Overview
  
HBase Releases
URL: http://s.apache.org/hbase-releases
  
HBase Releases: Issues Closed (JIRA)
URL: http://s.apache.org/hbase-releases
  
HBase Releases: Issues Closed (Distinct)
URL: http://s.apache.org/hbase-releases
  
HBase Book?
I targeted 0.92.0 but…

r1130336 | stack | 2011-06-02 00:52:45 +0200 (Thu, 02 Jun 2011) | 1 line
Add link to meet up
...
r1234894 | stack | 2012-01-23 17:50:43 +0100 (Mon, 23 Jan 2012) | 1 line
Move version on past 0.92.0 to 0.92.1-SNAPSHOT

$ svn log -r 1130336:1234894 | grep "^r" | wc -l
807

HBase Book?
I am trailing 0.92.0 by 800+ commits, including, for example:

r1153634 | tedyu | 2011-08-03 21:59:48 +0200 (Wed, 03 Aug 2011) | 2 lines
HBASE-3857 Change the HFile Format (Mikhail & Liyin)

…which is not “unimportant”. :)

I am working on an update!
  
Coprocessors and more…
HBase 0.92
  
HBase 0.92 - Highlights
•  682 issues addressed
•  811 issues total in 0.92.x line
•  New logo! (HBASE-4312)
•  HFile v2 (HBASE-3857)
•  Distributed Log Splitting (HBASE-1364)
•  Enhanced Master UI
•  Major compaction progress (HBASE-3900)
•  Regions in transition (HBASE-4291)
•  Tasks (HBASE-3839)
•  Slow Query Metrics (HBASE-4117)
  
HBase 0.92 - Highlights
•  Coprocessors (HBASE-2000)
•  Off-heap cache (HBASE-4027)
•  Online Table Schema Change (HBASE-1730)
•  Default region size raised from 256MB to 1GB (HBASE-4374)
•  Hadoop 1 Support (HBASE-5125)
•  Snappy Support (HBASE-3691)
•  Keep last version with TTL (HBASE-4071)
•  Multithreaded Compactions (HBASE-4572)
  
HFile v1: HBase 0.90
•  Previously the file layout was data blocks, meta blocks, and then file metadata like indexes.
•  Each data block held a magic header and then the actual data sequentially.
  
HFile v2: HBase 0.92+
The 2nd version of HFile splits the indexes and Bloom filters up into a hierarchy and interleaves those with data blocks.
The data block header now holds additional info on the block itself.
Source: http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/
  
Coprocessors: Observers

Coprocessors: RPC Calls
  
Slab Cache: Off-heap Block Cache
•  The off-heap cache uses Java NIO’s Direct ByteBuffer structures
•  Uses its own slab allocation handling
•  Does copy-on-read during access of data
•  Acts as an L2 cache and replaces the OS buffer cache
Source: http://blog.cloudera.com/blog/2012/01/caching-in-hbase-slabcache/
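The slab idea above can be sketched in a few lines of Java. This is a minimal illustration under stated assumptions, not HBase's SlabCache code: the class name `SlabSketch`, its methods, and the single fixed block size are all hypothetical. It shows one direct ByteBuffer carved into equally sized blocks, a free list for reuse, and copy-on-read on access.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one slab of fixed-size blocks inside a direct ByteBuffer. */
final class SlabSketch {
    private final ByteBuffer slab;                 // off-heap memory, outside the Java heap
    private final int blockSize;
    private final Deque<Integer> freeOffsets = new ArrayDeque<>();
    private final Map<String, int[]> index = new HashMap<>(); // key -> {offset, length}

    SlabSketch(int blockSize, int blockCount) {
        this.blockSize = blockSize;
        this.slab = ByteBuffer.allocateDirect(blockSize * blockCount);
        for (int i = 0; i < blockCount; i++) {
            freeOffsets.push(i * blockSize);       // every block starts out free
        }
    }

    /** Copies data into a free block; false if it does not fit, is a duplicate, or the slab is full. */
    boolean cache(String key, byte[] data) {
        if (data.length > blockSize || freeOffsets.isEmpty() || index.containsKey(key)) {
            return false;
        }
        int offset = freeOffsets.pop();
        ByteBuffer view = slab.duplicate();        // same memory, independent position
        view.position(offset);
        view.put(data);
        index.put(key, new int[] {offset, data.length});
        return true;
    }

    /** Copy-on-read: the caller receives an on-heap copy, never a view into the slab. */
    byte[] get(String key) {
        int[] entry = index.get(key);
        if (entry == null) {
            return null;
        }
        byte[] copy = new byte[entry[1]];
        ByteBuffer view = slab.duplicate();
        view.position(entry[0]);
        view.get(copy);
        return copy;
    }

    /** Returns the block to the free list so that it can be reused. */
    void evict(String key) {
        int[] entry = index.remove(key);
        if (entry != null) {
            freeOffsets.push(entry[0]);
        }
    }
}
```

Because `get` hands out a copy, a reader can keep or modify its bytes while the block is evicted and reused; the price is one extra copy per access.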
  
Performance Tuning…
HBase 0.94
  
HBase 0.94 - Highlights
•  420 issues addressed
•  1394 issues total in 0.94.x line
•  Read Caching Improvements (HBASE-5074)
•  Seek Optimization
•  Bloom Filter for Delete Family (HBASE-4532)
•  Lazy Seeks (HBASE-4465)
•  Write to WAL Optimizations
•  WAL Compression (HBASE-4608)
•  Data Block Encoding of KeyValues (HBASE-4218)
•  Improved HBaseFsck (HBASE-5128)
  
HBase 0.94 - Highlights
•  Simplified Region Sizing (HBASE-4365)
•  Smarter Transaction Semantics
•  Atomic Put&Delete in One Call (HBASE-3584)
•  Snapshots (0.94.6) (HBASE-7360)
•  Atomic Appends (HBASE-4102)
•  Multi Increment and Append (HBASE-2947)
•  More Aggressive Off-Peak Compactions (HBASE-4463)
  
HBase 0.94 - Highlights
•  Per Column Family Metrics (HBASE-4219)
•  Multi-row local transactions (HBASE-5229)
•  Pluggable Split Key Policy (HBASE-5304)
•  Load balance regions by table (HBASE-3373), also backported to 0.92.1
•  Make Compaction Code Pluggable (HBASE-6427)
•  Deprecate HTablePool (0.94.11) (HBASE-6580)
•  Canary Test Tool (HBASE-4393)
  
Block Encoding
•  Reduces the data footprint in memory
•  Only encodes the key portion of a key/value pair
•  Encoded keys also stay encoded during flushes
•  Compression on top of encoding takes care of the values and the remainder of the key data

Example:
•  Key length: 90B
•  Value length: 8B

Type                 Ratio
Key Compression      92%
Total Compression    85%
LZO on same data     85%
LZO after encoding   91%

https://issues.apache.org/jira/browse/HBASE-4218
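The total compression figure can be checked from the key compression figure with a little arithmetic. The sketch below assumes that encoding shrinks only the 90-byte key by 92%, that the 8-byte value is stored unchanged, and that per-cell length fields are ignored. The class and method names are hypothetical.

```java
/** Hypothetical helper: overall size reduction when only the key part is encoded. */
final class EncodingRatioSketch {
    /**
     * @param keyBytes     size of the unencoded key
     * @param valueBytes   size of the value, which encoding leaves untouched
     * @param keyReduction fraction of the key removed by encoding, e.g. 0.92
     * @return fraction of the whole key/value pair that is saved
     */
    static double totalReduction(int keyBytes, int valueBytes, double keyReduction) {
        double encodedSize = keyBytes * (1.0 - keyReduction) + valueBytes;
        return 1.0 - encodedSize / (keyBytes + valueBytes);
    }

    public static void main(String[] args) {
        // 90 * 0.08 + 8 = 15.2 bytes remain out of 98, so roughly 84.5% is saved.
        System.out.println(totalReduction(90, 8, 0.92));
    }
}
```

Under these assumptions the result is about 0.845, which is consistent with the 85% total compression reported for the example.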
  
Block Encoding: None
•  With no encoding, the Key/Value structures are stored verbatim (with some overhead for lengths)
•  In the past you were advised to keep the “keys” short for that reason
Source: http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/
  
	
  
Block Encoding: Prefix Encoding
•  The encoding patch added a new Cell abstraction that allows for extra fields in a Key/Value
•  The fields are used to track necessary details for the encoding
Source: http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/
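The general idea of prefix encoding can be sketched as follows. This is an illustration of the technique on plain strings, not HBase's encoder or its on-disk format: each sorted key is stored as the number of leading characters it shares with the previous key plus the remaining suffix. The class `PrefixEncodingSketch` and the sample keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of prefix encoding over a sorted list of string keys. */
final class PrefixEncodingSketch {
    /** One encoded key: shared prefix length with the previous key, plus the rest. */
    static final class Entry {
        final int sharedPrefixLength;
        final String suffix;

        Entry(int sharedPrefixLength, String suffix) {
            this.sharedPrefixLength = sharedPrefixLength;
            this.suffix = suffix;
        }
    }

    static List<Entry> encode(List<String> sortedKeys) {
        List<Entry> encoded = new ArrayList<>();
        String previous = "";
        for (String key : sortedKeys) {
            int shared = 0;
            int limit = Math.min(previous.length(), key.length());
            while (shared < limit && previous.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            encoded.add(new Entry(shared, key.substring(shared)));
            previous = key;
        }
        return encoded;
    }

    static List<String> decode(List<Entry> entries) {
        List<String> keys = new ArrayList<>();
        String previous = "";
        for (Entry entry : entries) {
            // Rebuild the key from the previous key's prefix and the stored suffix.
            String key = previous.substring(0, entry.sharedPrefixLength) + entry.suffix;
            keys.add(key);
            previous = key;
        }
        return keys;
    }
}
```

Decoding has to walk the entries in order, because every key depends on the one before it. Long keys that repeat most of their content, such as many columns of the same row, shrink the most.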
  
Block Encoding: Diff Encoding
•  Apart from prefix encoding, there are other ways of encoding the keys
•  Diff encoding is one such approach
Source: http://blog.cloudera.com/blog/2012/06/hbase-io-hfile-input-output/
  
Block Encoding
•  The advantage of block encoding is faster decompression/decoding
•  20-80% faster than LZO
•  Encoded data can still be seeked, which is not possible with compressed data
•  The penalty is slightly slower read performance compared to non-encoded keys
•  Watch the sizes and repetition of key data; encoding might not be useful for random data
https://issues.apache.org/jira/browse/HBASE-4218
  
The Singularity
HBase 0.96
  
HBase 0.96 - Highlights
•  1219 issues addressed
•  2243 issues total in 0.96.x line
•  Improved Stability (HBASE-6241/6201)
•  ZK based Read/Write locks for table operations (HBASE-7305)
•  Scalability Improvements (HBASE-8877)
•  Schema Storage (HBASE-8778)
•  Log Cleaner for Replication Speed Up (HBASE-9208)
•  Mean-Time-To-Recovery (MTTR) Improvements (HBASE-5844/5926)
•  Distributed Log Replay (HBASE-7006)
•  Dedicated WAL for System Table (HBASE-7213/8631)
  
HBase 0.96 - Highlights
•  Operability Improvements
•  Hooks for Health Scripts (HBASE-7399/7406)
•  Trace Lagging Calls with HTrace (HBASE-9121)
•  Versioned RPCs and Metadata (Protobufs) (HBASE-3505)
•  Parallel Seeks in Stores (HBASE-7495)
•  Hadoop 1 and 2 Support
•  Secure Short Circuit Reads on H2 (HBASE-6783)
•  Namespaces Support (HBASE-8015)
•  New Metrics v2 (HBASE-4050)
•  Cell Interface vs KeyValue (HBASE-7162)
  
HBase 0.96 - Highlights
•  No more ROOT table (HBASE-3171)
•  Remove HFile v1 (HBASE-7660)
•  Trie Data Block Encoding (HBASE-4676)
•  Remove Client-side Row Locks (HBASE-7263/7315)
•  Compaction and Flush Improvements (HBASE-7516/7763/6466/7678) (HBASE-7667/7110/7603/7519/7842)
•  Improved Default Configuration (HBASE-4657?)
•  Client-side Type Library (HBASE-8089)
  
	
  
HBase 0.96 - Highlights
•  Online Region Merging (HBASE-7403/8219)
•  Bucket Cache Support (HBASE-7404)
•  Remove older ICV Calls (HBASE-7032)
•  New “Bootstrap” based UIs! (HBASE-6135)
•  Remove Client-side Row Locks (HBASE-7263/7315)
•  Compaction and Flush Improvements (HBASE-7516/7763/6466/7678) (HBASE-7667/7110/7603/7519/7842)
  
	
  
Quote: Michael Stack, HBase PMC Chair
  
Mean-Time-To-Recovery (MTTR)
•  A lot of effort went into reducing how long data might be inaccessible during a region move
•  The offline period is made up of phases:
   •  a detection phase,
   •  a repair phase,
   •  reassignment, and finally,
   •  clients noticing the data available in its new location
•  Improvements in many of those areas
   •  Faster detection, efficient repair, parallel replay
   •  Dedicated WAL for system tables
https://blog.cloudera.com/blog/2013/10/hbase-0-96-0-released/
  
Cell Level Security
HBase 0.98
  
HBase 0.98 - Highlights
•  1303 issues addressed
•  1458 issues total in 0.98.x line
•  Cell Level Security (HBASE-6222/7663/7662)
•  Server-side Encryption (HBASE-7544)
•  WAL Throughput Improvements (HBASE-8755)
•  Reverse Scanner (HBASE-4811)
•  MapReduce over Snapshot Files (HBASE-8369)
•  Striped Compactions (HBASE-7667)
•  Throttle Replication (HBASE-9501)
  
Cell Level Security
•  Added HFile v3, which can store arbitrary metadata in a cell, called tags
•  Also extended ACL checks to apply at the cell level
•  With this, visibility labels can be stored in tags
•  An API and CLI tools are provided that are akin to Accumulo’s, after which the feature is modeled
•  Additional encryption of data at rest further protects sensitive data
https://blogs.apache.org/hbase/entry/hbase_cell_security
  
Visibility Labels
The API allows setting visibility using expressions with “&”, “|”, and “!”, as well as “(” and “)”. For example, the label set { confidential, secret, topsecret, probationary } could be combined as

( secret | topsecret ) & !probationary

At runtime the expressions are evaluated against the labels of a user, and the result is applied to each cell.
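How such an expression might be evaluated against a user's labels can be sketched with a small recursive-descent evaluator. This is not HBase's implementation. The class `VisibilitySketch` is hypothetical, and the sketch assumes that “!” binds tightest, then “&”, then “|”, and that labels consist of letters and digits only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical evaluator for label expressions built from &, |, ! and parentheses. */
final class VisibilitySketch {
    private final String text;
    private final Set<String> userLabels;
    private int pos;

    private VisibilitySketch(String expression, Set<String> userLabels) {
        this.text = expression.replaceAll("\\s+", "");   // whitespace carries no meaning
        this.userLabels = userLabels;
    }

    /** Returns true if a user holding userLabels satisfies the expression. */
    static boolean isVisible(String expression, Set<String> userLabels) {
        VisibilitySketch parser = new VisibilitySketch(expression, userLabels);
        boolean result = parser.or();
        if (parser.pos != parser.text.length()) {
            throw new IllegalArgumentException("Unexpected input at " + parser.pos);
        }
        return result;
    }

    private boolean or() {
        boolean value = and();
        while (peek() == '|') {
            pos++;
            boolean right = and();      // always parse the right side, then combine
            value = value || right;
        }
        return value;
    }

    private boolean and() {
        boolean value = not();
        while (peek() == '&') {
            pos++;
            boolean right = not();
            value = value && right;
        }
        return value;
    }

    private boolean not() {
        if (peek() == '!') {
            pos++;
            return !not();
        }
        if (peek() == '(') {
            pos++;
            boolean value = or();
            if (peek() != ')') {
                throw new IllegalArgumentException("Missing ')' at " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < text.length() && Character.isLetterOrDigit(text.charAt(pos))) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Label expected at " + pos);
        }
        return userLabels.contains(text.substring(start, pos));
    }

    private char peek() {
        return pos < text.length() ? text.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        Set<String> user = new HashSet<>(Arrays.asList("secret"));
        System.out.println(isVisible("( secret | topsecret ) & !probationary", user));
    }
}
```

With the example expression, a user holding only `secret` or only `topsecret` sees the cell, while a user who also holds `probationary`, or who holds only `confidential`, does not.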
  
The Future…
HBase 0.??
  
HBase Future
•  Not much is written in stone yet
•  Master gets rewritten and also META table handling
•  Built-in consensus (HBASE-10296)
•  Co-locate Master and META (HBASE-10569)
•  MTTR is further extended into interesting areas
•  Read replicas (HBASE-10070)
It remains to be seen when 1.0.0 will be released and what it will contain. Your opinion counts!
  
Questions?
@larsgeorge
  

HBase Status Report - Hadoop Summit Europe 2014
