Yu Li and Lijin Bin
HBase is the core storage of Alibaba's search infrastructure and faces a big challenge in improving its throughput, which determines how fast the machine learning pipelines can run and thus the accuracy of the recommendations they produce. In this session we talk about the work, both finished and in progress, to increase read and write throughput, along with the real-world performance from the past Singles' Day and the latest benchmark data from the lab.
hbaseconasia2017: Lift the ceiling of HBase throughputs
1. Lift the Ceiling of Throughputs
Yu Li, Lijin Bin
{jueding.ly, tianzhao.blj}@alibaba-inc.com
2. Agenda
• What / Where / When
  ◦ History of HBase in Alibaba Search
• Why
  ◦ Throughputs mean a lot
• How
  ◦ Lift the ceiling of read throughputs
  ◦ Lift the ceiling of write throughputs
• About future
3. HBase in Alibaba Search
• HBase has been the core storage in Alibaba's search system since 2010
• History of versions used online
  ◦ 2010~2014: 0.20.6 → 0.90.3 → 0.92.1 → 0.94.1 → 0.94.2 → 0.94.5
  ◦ 2014~2015: 0.94 → 0.98.1 → 0.98.4 → 0.98.8 → 0.98.12
  ◦ 2016: 0.98.12 → 1.1.2
• Cluster scale and use case
  ◦ Multiple clusters, the largest with more than 1,500 nodes
  ◦ Co-located with Flink/YARN, serving over 40 million ops/s throughout the day
  ◦ Main source/sink for the search and machine learning platforms
4. Throughputs mean a lot
• Machine learning generates huge workloads
  ◦ Both read and write, with no upper limit
  ◦ Both IO- and CPU-bound
• Throughput decides the speed of ML processing
  ◦ Higher throughput means more iterations per unit of time
• Processing speed decides the accuracy of the decisions made
  ◦ Recommendation quality
  ◦ Fraud detection accuracy
5. Lift ceiling of read throughput
• NettyRpcServer (HBASE-17263); see the config sketch below
  ◦ Why Netty?
    ▪ Inspired by real-world suffering
      - HBASE-11297
    ▪ Better thread model and performance
  ◦ Effect
    ▪ Online RT under high pressure: 0.92 ms → 0.25 ms
    ▪ Throughput almost doubled
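Below is a minimal sketch of switching the Netty-based RPC server on once HBASE-17263 is in the build. It assumes the upstream hbase.rpc.server.impl property and the org.apache.hadoop.hbase.ipc.NettyRpcServer class; verify both against the HBase version in use. The setting normally lives in hbase-site.xml on the region servers, and the Configuration API is used here only for illustration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class EnableNettyRpcServer {
      public static void main(String[] args) {
        // Region-server-side switch; normally set in hbase-site.xml rather than in code.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rpc.server.impl",
            "org.apache.hadoop.hbase.ipc.NettyRpcServer");
        System.out.println("RPC server implementation: " + conf.get("hbase.rpc.server.impl"));
      }
    }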
14. Lift ceiling of read throughput (cont'd)
• RowIndexDBE (HBASE-16213)
  ◦ Why
    ▪ Seeking within the row during random reads is one of the main consumers of CPU
    ▪ All DBEs except Prefix Tree use sequential search
  ◦ How
    ▪ Add a row index to each HFileBlock for binary search (HBASE-16213); see the enabling sketch below
  ◦ Effect
    ▪ Uses less CPU and improves throughput: with 64-byte KeyValues, throughput increased 10%
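A minimal sketch of enabling the row-index encoding on a column family. It assumes a build that already includes HBASE-16213, so that DataBlockEncoding.ROW_INDEX_V1 is available; the table name "demo" and family "f" are made-up examples.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

    public class CreateRowIndexTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          HColumnDescriptor family = new HColumnDescriptor("f");
          // Keep a per-block index of row offsets so a random read can binary-search
          // rows inside an HFileBlock instead of scanning it sequentially.
          family.setDataBlockEncoding(DataBlockEncoding.ROW_INDEX_V1);
          HTableDescriptor table = new HTableDescriptor(TableName.valueOf("demo"));
          table.addFamily(family);
          admin.createTable(table);
        }
      }
    }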
15. Lift ceiling of read throughput (cont'd)
• End-to-end read path offheap
  ◦ Why
    ▪ Faster disk IO capability causes quicker cache eviction
    ▪ Suffering from GC caused by on-heap copies
  ◦ How
    ▪ Backport the end-to-end read-path offheap work to branch-1 (HBASE-17138); see the config sketch below
    ▪ For more details, please refer to Anoop/Ram's session
  ◦ Effect
    ▪ Throughput increased 30%
    ▪ Much more stable, fewer spikes
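The off-heap read path relies on data blocks being served from an off-heap BucketCache. The sketch below shows the usual properties for that; they normally go into hbase-site.xml, the 8192 MB size is only an example value, and the JVM additionally needs a matching -XX:MaxDirectMemorySize budget in hbase-env.sh.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class OffheapBucketCacheConfig {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Put the L2 block cache into off-heap (direct) memory instead of the Java heap.
        conf.set("hbase.bucketcache.ioengine", "offheap");
        // Off-heap cache size in MB; an example value only.
        conf.set("hbase.bucketcache.size", "8192");
        System.out.println("BucketCache engine: " + conf.get("hbase.bucketcache.ioengine"));
      }
    }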
17. Lift ceiling of write throughput
• MVCC pre-assign (HBASE-16698, HBASE-17509/17471)
  ◦ Why
    ▪ Issue located from real-world suffering: no more active handlers
    ▪ MVCC is assigned after the WAL append
    ▪ WAL append is designed to be RS-level sequential, so throughput is limited
  ◦ How
    ▪ Assign the mvcc before the WAL append while still preserving the append order; see the sketch below
      - Originally designed to use a lock inside FSHLog (HBASE-16698)
      - Improved by generating the sequence id inside the existing MVCC lock (HBASE-17471)
  ◦ Effect
    ▪ SYNC_WAL throughput improved 30%, ASYNC_WAL even more (70%)
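A conceptual sketch, not the actual HBase code, of the pre-assign idea behind HBASE-17471: the sequence id is handed out inside the existing MVCC lock, which fixes the ordering of edits up front, and the edits are queued in that order so a single WAL thread can append them without the write handlers serializing on the append itself. All names here are my own.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    /** Illustrative only: order-preserving sequence-id pre-assignment, not HBase internals. */
    public class MvccPreAssignSketch {
      private final Object mvccLock = new Object();
      private long nextSeqId = 1;
      private final BlockingQueue<Long> appendQueue = new LinkedBlockingQueue<>();

      /** Called by each write handler: cheap, only reserves the ordering slot. */
      public long beginWrite() throws InterruptedException {
        synchronized (mvccLock) {
          long seqId = nextSeqId++;
          appendQueue.put(seqId); // queued in seq-id order, so the append order is preserved
          return seqId;
        }
      }

      /** A single WAL thread drains the queue and appends/syncs edits in the reserved order. */
      public void walAppendLoop() throws InterruptedException {
        while (true) {
          long seqId = appendQueue.take();
          // append the edit carrying seqId to the WAL, then sync; handlers are not blocked here
        }
      }
    }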
18. Lift ceiling of write throughput (cont’d)
• Refine the write path (experimenting)
  ◦ Why
    ▪ Far from making full use of the IO capacity of new hardware like PCIe SSDs
    ▪ WAL sync is IO-bound, while RPC handling is CPU-bound
      - Write handlers should be non-blocking: do not wait for the sync
      - Respond asynchronously
    ▪ WAL append is sequential, while region puts are parallel
      - Unnecessary context switches
    ▪ WAL append is IO-bound, while MemStore insertion is CPU-bound
      - Possible to parallelize?
19. Lift ceiling of write throughput (cont’d)
• Refine the write path (experimenting)
  ◦ How
    ▪ Break the write path into 3 stages; see the sketch below
      - Pre-append, sync, post-sync
      - Buffer/queue between the stages
    ▪ Handlers only handle the pre-append stage and respond in the post-sync stage
    ▪ Bind regions to a specific handler
      - Reduces unnecessary context switches
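A conceptual sketch of the three-stage design under experiment, not the actual patch: handlers do only the pre-append work and get back a future, a single WAL thread appends and syncs, and a post-sync stage completes the futures so responses go out asynchronously. Class and queue names, queue sizes, and the omission of the region-to-handler binding are simplifications of my own.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.CompletableFuture;

    /** Illustrative only: a staged write path with queues between the stages. */
    public class StagedWritePathSketch {
      static final class Edit {
        final byte[] row;
        final CompletableFuture<Void> done = new CompletableFuture<>();
        Edit(byte[] row) { this.row = row; }
      }

      private final BlockingQueue<Edit> syncQueue = new ArrayBlockingQueue<>(10_000);
      private final BlockingQueue<Edit> postSyncQueue = new ArrayBlockingQueue<>(10_000);

      /** Stage 1 (handler thread): pre-append only; never waits for the WAL sync. */
      public CompletableFuture<Void> write(byte[] row) throws InterruptedException {
        Edit edit = new Edit(row);
        // ... build the WALEdit, insert into the MemStore, etc. ...
        syncQueue.put(edit);
        return edit.done; // the RPC layer responds when this future completes
      }

      /** Stage 2 (single WAL thread): sequential append plus group sync. */
      public void syncLoop() throws InterruptedException {
        while (true) {
          Edit edit = syncQueue.take();
          // append to the WAL and sync, ideally batching several edits per sync
          postSyncQueue.put(edit);
        }
      }

      /** Stage 3 (post-sync thread): advance MVCC and trigger the queued responses. */
      public void postSyncLoop() throws InterruptedException {
        while (true) {
          postSyncQueue.take().done.complete(null);
        }
      }
    }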
20. Lift ceiling of write throughput (cont’d)
• Refine the write path (experimenting)
  ◦ Effect (lab data)
    ▪ Throughput tripled: 140K → 420K with PCIe-SSD
  ◦ TODO
    ▪ PCIe-SSD IO utilization currently reaches only 20%, so there is much more room to improve
    ▪ Integration with write-path offheap – more to expect
    ▪ Upstream the work after it is verified online
21. About Future
• HBase is still a kid – only 10 years old
  ◦ More ceilings to break
    ▪ Improving, but still a long way to go
    ▪ Far from fully utilizing the hardware capability, whether CPU or IO
  ◦ More scenarios to try
    ▪ Embedded mode (HBASE-17743)
  ◦ More to expect
    ▪ 2.0 coming, 3.0 in the plan
    ▪ Hopefully more community involvement from Asia
      - More upstream, less private