hbaseconasia2017: Lift the ceiling of HBase throughputs

Yu Li and Lijin Bin

HBase is the core storage of Alibaba's search infrastructure, and improving its throughput is a major challenge: throughput determines the speed of machine learning processing and thus the accuracy of the recommendations made. In this session we will talk about work done and in progress to increase both read and write throughput, as well as real-world performance on the past Singles' Day and the latest laboratory benchmark data.

hbaseconasia2017 hbasecon hbase https://www.eventbrite.com/e/hbasecon-asia-2017-tickets-34935546159#

  1. Lift the Ceiling of Throughputs. Yu Li, Lijin Bin ({jueding.ly, tianzhao.blj}@alibaba-inc.com)
  2. Agenda
     • What/Where/When
         ◦ History of HBase in Alibaba Search
     • Why
         ◦ Throughputs mean a lot
     • How
         ◦ Lift the ceiling of read throughputs
         ◦ Lift the ceiling of write throughputs
     • About future
  3. HBase in Alibaba Search
     • HBase has been the core storage in the Alibaba search system since 2010
     • History of versions used online
         ◦ 2010~2014: 0.20.6 → 0.90.3 → 0.92.1 → 0.94.1 → 0.94.2 → 0.94.5
         ◦ 2014~2015: 0.94 → 0.98.1 → 0.98.4 → 0.98.8 → 0.98.12
         ◦ 2016: 0.98.12 → 1.1.2
     • Cluster scale and use case
         ◦ Multiple clusters, the largest with more than 1,500 nodes
         ◦ Co-located with Flink/Yarn, serving over 40 million ops/s throughout the day
         ◦ Main source/sink for the search and machine learning platform
  4. Throughputs mean a lot
     • Machine learning generates huge workloads
         ◦ Both read and write, no upper limit
         ◦ Both IO and CPU bound
     • Throughput decides the speed of ML processing
         ◦ More throughput means more iterations per unit of time
     • Speed of processing decides the accuracy of the decisions made
         ◦ Recommendation quality
         ◦ Fraud detection accuracy
  5. Lift ceiling of read throughput
     • NettyRpcServer (HBASE-17263)
         ◦ Why Netty?
             ▪ Enlightened by real world suffering
                 - HBASE-11297
             ▪ Better thread model and performance
         ◦ Effect
             ▪ Online RT under high pressure: 0.92ms → 0.25ms
             ▪ Throughputs almost doubled
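
The response-time figures on slide 5 can be put in relative terms with a little arithmetic. The Python sketch below uses only the two numbers quoted on the slide (0.92 ms before, 0.25 ms after); the variable names are illustrative, and the calculation says nothing about how the measurements were taken.

```python
# Online response time (RT) under high pressure, as quoted on slide 5.
rt_before_ms = 0.92  # before NettyRpcServer
rt_after_ms = 0.25   # after NettyRpcServer

# How many times faster a single request completes.
speedup = rt_before_ms / rt_after_ms

# Share of the original response time that was removed, in percent.
reduction_pct = (rt_before_ms - rt_after_ms) / rt_before_ms * 100

print(f"RT speedup: {speedup:.2f}x")
print(f"RT reduction: {reduction_pct:.1f}%")
```

The latency ratio is not the same thing as the throughput gain: the slide reports throughput as almost doubled, which is a separate measurement.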
