Lift the Ceiling of Throughputs (HBaseCon Asia 2017)
Yu Li, Lijin Bin
{jueding.ly, tianzhao.blj}@alibaba-inc.com
Agenda
- What/Where/When
  - History of HBase in Alibaba Search
- Why
  - Throughput means a lot
- How
  - Lift the ceiling of read throughput
  - Lift the ceiling of write throughput
- About future
HBase in Alibaba Search
- HBase has been the core storage in the Alibaba search system since 2010
- History of versions used online
  - 2010~2014: 0.20.6 → 0.90.3 → 0.92.1 → 0.94.1 → 0.94.2 → 0.94.5
  - 2014~2015: 0.94 → 0.98.1 → 0.98.4 → 0.98.8 → 0.98.12
  - 2016: 0.98.12 → 1.1.2
- Cluster scale and use case
  - Multiple clusters, the largest with more than 1,500 nodes
  - Co-located with Flink/YARN, serving over 40 million ops/s throughout the day
  - Main source/sink for the search and machine learning platforms
Throughput means a lot
- Machine learning generates huge workloads
  - Both read and write, with no upper limit
  - Both IO- and CPU-bound
- Throughput decides the speed of ML processing
  - Higher throughput means more iterations per unit of time
- Speed of processing decides the accuracy of decisions made
  - Recommendation quality
  - Fraud detection accuracy
Lift ceiling of read throughput
- NettyRpcServer (HBASE-17263)
  - Why Netty?
    - Enlightened by real-world suffering
      - HBASE-11297
    - Better thread model and performance
  - Effect
    - Online RT under high pressure: 0.92 ms → 0.25 ms
    - Throughput almost doubled
Lift ceiling of read throughput (cont'd)
- RowIndexDBE (HBASE-16213)
  - Why
    - Seeking within a row during random reads is one of the main consumers of CPU
    - All data block encodings (DBE) except Prefix Tree use sequential search
  - How
    - Add a row index to each HFileBlock to enable binary search (HBASE-16213)
  - Effect
    - Less CPU used and better throughput; with 64-byte KeyValues, throughput increased 10%
Lift ceiling of read throughput (cont'd)
- End-to-end read path offheap
  - Why
    - Faster disk IO capability causes quicker cache eviction
    - Suffering from GC caused by on-heap copies
  - How
    - Backport the E2E read-path offheap work to branch-1 (HBASE-17138)
    - For more details, refer to Anoop/Ram's session
  - Effect
    - Throughput increased 30%
    - Much more stable, with fewer spikes
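The core of the offheap read path is serving reads directly from buffers outside the managed heap, with no per-read copy onto the heap for the GC to chase. As a loose analogy only (Python has no offheap heap, and this is not HBase code), a memoryview gives a zero-copy view over a buffer in the same spirit:

```python
# A bytearray standing in for a cached data block held outside the GC'd heap.
block = bytearray(b"row1:val1|row2:val2")

def read_cell_copy(buf, start, end):
    # The old on-heap style: materialize a copy for every read,
    # generating garbage on the hot path.
    return bytes(buf[start:end])

def read_cell_view(buf, start, end):
    # The offheap style: hand out a zero-copy view into the buffer.
    return memoryview(buf)[start:end]

view = read_cell_view(block, 5, 9)
assert view.tobytes() == b"val1"

# The view tracks the underlying buffer: mutating the block is visible
# through it, demonstrating that no copy was taken.
block[5:9] = b"VAL1"
```

In the real read path this removes allocation (and thus GC pauses) from every cache hit, which is where the stability gain on the slide comes from.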
Lift ceiling of read throughput (cont'd)
- End-to-end read path offheap
  - Before / After: comparison charts (images not preserved in this transcript)
Lift ceiling of write throughput
- MVCC pre-assign (HBASE-16698, HBASE-17509/17471)
  - Why
    - Issue located from real-world suffering: no more active handlers
    - MVCC is assigned after the WAL append
    - WAL append is designed to be RS-level sequential, so throughput is limited
  - How
    - Assign the MVCC number before the WAL append, while still assuring the append order
      - Originally designed to use a lock inside FSHLog (HBASE-16698)
      - Improved by generating the sequence id inside the existing MVCC lock (HBASE-17471)
  - Effect
    - SYNC_WAL throughput improved 30%; ASYNC_WAL even more (70%)
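The HBASE-17471 refinement can be sketched as follows. This is not FSHLog or MultiVersionConcurrencyControl code, only a hedged illustration with hypothetical names: generate the sequence id inside the one lock MVCC already holds, so the id exists before the WAL append and append order still matches sequence order without a second lock:

```python
import threading

class MiniMvcc:
    """Toy stand-in for MVCC with the pre-assign idea applied."""

    def __init__(self):
        self._lock = threading.Lock()   # the single existing MVCC lock
        self._seq = 0
        self.wal = []                   # appended entries, (seq, payload)

    def write(self, payload):
        # Assigning the id and appending under the same critical section
        # guarantees WAL entries land in sequence order, with no extra
        # lock inside the WAL itself.
        with self._lock:
            self._seq += 1
            self.wal.append((self._seq, payload))
            return self._seq

mvcc = MiniMvcc()
threads = [
    threading.Thread(target=mvcc.write, args=(f"put-{i}",)) for i in range(100)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
seqs = [s for s, _ in mvcc.wal]
```

However the 100 writer threads interleave, the WAL ends up strictly in sequence order, which is the invariant the pre-assign change has to preserve.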
Lift ceiling of write throughput (cont'd)
- Refine the write path (experimenting)
  - Why
    - Far from taking full advantage of the IO capacity of new hardware like PCIe SSDs
    - WAL sync is IO-bound, while RPC handling is CPU-bound
      - Write handlers should be non-blocking: do not wait for sync
      - Respond asynchronously
    - WAL append is sequential, while region puts are parallel
      - Unnecessary context switches
    - WAL append is IO-bound, while MemStore insertion is CPU-bound
      - Possible to parallelize?
Lift ceiling of write throughput (cont'd)
- Refine the write path (experimenting)
  - How
    - Break the write path into 3 stages
      - Pre-append, sync, post-sync
      - Buffer/queue between stages
    - Handlers only handle the pre-append stage and respond in the post-sync stage
    - Bind regions to specific handlers
      - Reduce unnecessary context switches
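The 3-stage split above can be sketched as a small queued pipeline. This is a hedged illustration, not the experimental HBase code: handlers do only the CPU-bound pre-append work, one syncer thread does the IO-bound sync, and a post-sync thread sends responses; queues decouple the stages (all names are hypothetical):

```python
import queue
import threading

sync_q = queue.Queue()   # pre-append -> sync
resp_q = queue.Queue()   # sync -> post-sync
responses = []
DONE = object()          # sentinel to shut the stage threads down

def pre_append(op):
    # Stage 1 (handler, CPU-bound): build the WAL edit, hand it off, and
    # return immediately instead of waiting for the disk sync.
    sync_q.put(op.upper())

def syncer():
    # Stage 2 (IO-bound): in reality this batches and fsyncs the WAL.
    while True:
        item = sync_q.get()
        if item is DONE:
            resp_q.put(DONE)
            return
        resp_q.put(item + ":synced")

def post_sync():
    # Stage 3: respond to clients only after their edits are durable.
    while True:
        item = resp_q.get()
        if item is DONE:
            return
        responses.append(item)

stages = [threading.Thread(target=syncer), threading.Thread(target=post_sync)]
for t in stages:
    t.start()
for op in ["put-a", "put-b", "put-c"]:
    pre_append(op)       # handlers are free again right after this call
sync_q.put(DONE)
for t in stages:
    t.join()
```

Because the stages overlap, the IO-bound sync no longer serializes the CPU-bound handler work, which is what lets throughput climb toward the hardware's IO ceiling.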
Lift ceiling of write throughput (cont'd)
- Refine the write path (experimenting)
  - Effect (lab data)
    - Throughput tripled: 140K → 420K ops/s with PCIe SSD
  - TODO
    - PCIe SSD IO utilization currently reaches only 20%, leaving much more room to improve
    - Integration with write-path offheap: more to expect
    - Upstream the work after it is verified online
About Future
- HBase is still a kid: only 10 years old
  - More ceilings to break
    - Improving, but still a long way to go
    - Far from fully utilizing hardware capability, whether CPU or IO
  - More scenarios to try
    - Embedded mode (HBASE-17743)
  - More to expect
    - 2.0 coming, 3.0 in plan
- Hopefully more community involvement from Asia
  - More upstream, less private
Q & A
Thank You!