Yu Li and Lijin Bin
HBase is the core storage of Alibaba's search infrastructure and faces a big challenge in improving its throughput, which determines how fast the machine learning pipelines can run and thus the accuracy of the recommendations they produce. In this session we talk about the work, both finished and in progress, to increase read and write throughput, along with the real-world performance from the past Singles' Day and the latest benchmark data from the lab.
hbaseconasia2017: Lift the ceiling of HBase throughputs
1. Lift the Ceiling of Throughputs
Yu Li, Lijin Bin
{jueding.ly, tianzhao.blj}@alibaba-inc.com
2. Agenda
• What / Where / When
  ◦ History of HBase in Alibaba Search
• Why
  ◦ Throughputs mean a lot
• How
  ◦ Lift the ceiling of read throughputs
  ◦ Lift the ceiling of write throughputs
• About future
3. HBase in Alibaba Search
• HBase has been the core storage in Alibaba's search system since 2010
• History of versions used online
  ◦ 2010~2014: 0.20.6 → 0.90.3 → 0.92.1 → 0.94.1 → 0.94.2 → 0.94.5
  ◦ 2014~2015: 0.94 → 0.98.1 → 0.98.4 → 0.98.8 → 0.98.12
  ◦ 2016: 0.98.12 → 1.1.2
• Cluster scale and use case
  ◦ Multiple clusters, the largest with more than 1,500 nodes
  ◦ Co-located with Flink/YARN, serving over 40 million ops/s throughout the day
  ◦ Main source/sink for the search and machine learning platforms
4. Throughputs mean a lot
• Machine learning generates huge workloads
  ◦ Both read and write, with no upper limit
  ◦ Both IO- and CPU-bound
• Throughput decides the speed of ML processing
  ◦ Higher throughput means more iterations per unit of time
• Processing speed decides the accuracy of the decisions made
  ◦ Recommendation quality
  ◦ Fraud detection accuracy
5. Lift ceiling of read throughput
• NettyRpcServer (HBASE-17263); see the config sketch below
  ◦ Why Netty?
    ▪ Inspired by real-world suffering
      - HBASE-11297
    ▪ Better thread model and performance
  ◦ Effect
    ▪ Online RT under high pressure: 0.92 ms → 0.25 ms
    ▪ Throughput almost doubled
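Below is a minimal sketch of switching the Netty-based RPC server on once HBASE-17263 is in the build. It assumes the upstream hbase.rpc.server.impl property and the org.apache.hadoop.hbase.ipc.NettyRpcServer class; verify both against the HBase version in use. The setting normally lives in hbase-site.xml on the region servers, and the Configuration API is used here only for illustration.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class EnableNettyRpcServer {
      public static void main(String[] args) {
        // Region-server-side switch; normally set in hbase-site.xml rather than in code.
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.rpc.server.impl",
            "org.apache.hadoop.hbase.ipc.NettyRpcServer");
        System.out.println("RPC server implementation: " + conf.get("hbase.rpc.server.impl"));
      }
    }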
14. Lift ceiling of read throughput (cont'd)
• RowIndexDBE (HBASE-16213)
  ◦ Why
    ▪ Seeking within the row during random reads is one of the main consumers of CPU
    ▪ All DBEs except Prefix Tree use sequential search
  ◦ How
    ▪ Add a row index to each HFileBlock for binary search (HBASE-16213); see the enabling sketch below
  ◦ Effect
    ▪ Uses less CPU and improves throughput: with 64-byte KeyValues, throughput increased 10%
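A minimal sketch of enabling the row-index encoding on a column family. It assumes a build that already includes HBASE-16213, so that DataBlockEncoding.ROW_INDEX_V1 is available; the table name "demo" and family "f" are made-up examples.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Admin;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

    public class CreateRowIndexTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Admin admin = conn.getAdmin()) {
          HColumnDescriptor family = new HColumnDescriptor("f");
          // Keep a per-block index of row offsets so a random read can binary-search
          // rows inside an HFileBlock instead of scanning it sequentially.
          family.setDataBlockEncoding(DataBlockEncoding.ROW_INDEX_V1);
          HTableDescriptor table = new HTableDescriptor(TableName.valueOf("demo"));
          table.addFamily(family);
          admin.createTable(table);
        }
      }
    }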
15. Lift ceiling of read throughput (cont'd)
• End-to-end read path offheap
  ◦ Why
    ▪ Faster disk IO capability causes quicker cache eviction
    ▪ Suffering from GC caused by on-heap copies
  ◦ How
    ▪ Backport the end-to-end read-path offheap work to branch-1 (HBASE-17138); see the config sketch below
    ▪ For more details, please refer to Anoop/Ram's session
  ◦ Effect
    ▪ Throughput increased 30%
    ▪ Much more stable, fewer spikes
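The off-heap read path relies on data blocks being served from an off-heap BucketCache. The sketch below shows the usual properties for that; they normally go into hbase-site.xml, the 8192 MB size is only an example value, and the JVM additionally needs a matching -XX:MaxDirectMemorySize budget in hbase-env.sh.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class OffheapBucketCacheConfig {
      public static void main(String[] args) {
        Configuration conf = HBaseConfiguration.create();
        // Put the L2 block cache into off-heap (direct) memory instead of the Java heap.
        conf.set("hbase.bucketcache.ioengine", "offheap");
        // Off-heap cache size in MB; an example value only.
        conf.set("hbase.bucketcache.size", "8192");
        System.out.println("BucketCache engine: " + conf.get("hbase.bucketcache.ioengine"));
      }
    }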
17. Lift ceiling of write throughput
• MVCC pre-assign (HBASE-16698, HBASE-17509/17471)
  ◦ Why
    ▪ Issue located from real-world suffering: no more active handlers
    ▪ MVCC is assigned after the WAL append
    ▪ WAL append is designed to be RS-level sequential, so throughput is limited
  ◦ How
    ▪ Assign the mvcc before the WAL append while still preserving the append order; see the sketch below
      - Originally designed to use a lock inside FSHLog (HBASE-16698)
      - Improved by generating the sequence id inside the existing MVCC lock (HBASE-17471)
  ◦ Effect
    ▪ SYNC_WAL throughput improved 30%, ASYNC_WAL even more (70%)
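A conceptual sketch, not the actual HBase code, of the pre-assign idea behind HBASE-17471: the sequence id is handed out inside the existing MVCC lock, which fixes the ordering of edits up front, and the edits are queued in that order so a single WAL thread can append them without the write handlers serializing on the append itself. All names here are my own.

    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    /** Illustrative only: order-preserving sequence-id pre-assignment, not HBase internals. */
    public class MvccPreAssignSketch {
      private final Object mvccLock = new Object();
      private long nextSeqId = 1;
      private final BlockingQueue<Long> appendQueue = new LinkedBlockingQueue<>();

      /** Called by each write handler: cheap, only reserves the ordering slot. */
      public long beginWrite() throws InterruptedException {
        synchronized (mvccLock) {
          long seqId = nextSeqId++;
          appendQueue.put(seqId); // queued in seq-id order, so the append order is preserved
          return seqId;
        }
      }

      /** A single WAL thread drains the queue and appends/syncs edits in the reserved order. */
      public void walAppendLoop() throws InterruptedException {
        while (true) {
          long seqId = appendQueue.take();
          // append the edit carrying seqId to the WAL, then sync; handlers are not blocked here
        }
      }
    }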
18. Lift ceiling of write throughput (cont’d)
• Refine the write path (experimenting)
  ◦ Why
    ▪ Far from making full use of the IO capacity of new hardware like PCIe SSDs
    ▪ WAL sync is IO-bound, while RPC handling is CPU-bound
      - Write handlers should be non-blocking: do not wait for the sync
      - Respond asynchronously
    ▪ WAL append is sequential, while region puts are parallel
      - Unnecessary context switches
    ▪ WAL append is IO-bound, while MemStore insertion is CPU-bound
      - Possible to parallelize?
19. Lift ceiling of write throughput (cont’d)
• Refine the write path (experimenting)
  ◦ How
    ▪ Break the write path into 3 stages; see the sketch below
      - Pre-append, sync, post-sync
      - Buffer/queue between the stages
    ▪ Handlers only handle the pre-append stage and respond in the post-sync stage
    ▪ Bind regions to a specific handler
      - Reduces unnecessary context switches
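A conceptual sketch of the three-stage design under experiment, not the actual patch: handlers do only the pre-append work and get back a future, a single WAL thread appends and syncs, and a post-sync stage completes the futures so responses go out asynchronously. Class and queue names, queue sizes, and the omission of the region-to-handler binding are simplifications of my own.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.CompletableFuture;

    /** Illustrative only: a staged write path with queues between the stages. */
    public class StagedWritePathSketch {
      static final class Edit {
        final byte[] row;
        final CompletableFuture<Void> done = new CompletableFuture<>();
        Edit(byte[] row) { this.row = row; }
      }

      private final BlockingQueue<Edit> syncQueue = new ArrayBlockingQueue<>(10_000);
      private final BlockingQueue<Edit> postSyncQueue = new ArrayBlockingQueue<>(10_000);

      /** Stage 1 (handler thread): pre-append only; never waits for the WAL sync. */
      public CompletableFuture<Void> write(byte[] row) throws InterruptedException {
        Edit edit = new Edit(row);
        // ... build the WALEdit, insert into the MemStore, etc. ...
        syncQueue.put(edit);
        return edit.done; // the RPC layer responds when this future completes
      }

      /** Stage 2 (single WAL thread): sequential append plus group sync. */
      public void syncLoop() throws InterruptedException {
        while (true) {
          Edit edit = syncQueue.take();
          // append to the WAL and sync, ideally batching several edits per sync
          postSyncQueue.put(edit);
        }
      }

      /** Stage 3 (post-sync thread): advance MVCC and trigger the queued responses. */
      public void postSyncLoop() throws InterruptedException {
        while (true) {
          postSyncQueue.take().done.complete(null);
        }
      }
    }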
20. Lift ceiling of write throughput (cont’d)
• Refine the write path (experimenting)
  ◦ Effect (lab data)
    ▪ Throughput tripled: 140K → 420K with PCIe-SSD
  ◦ TODO
    ▪ PCIe-SSD IO utilization currently reaches only 20%, so there is much more room to improve
    ▪ Integration with write-path offheap – more to expect
    ▪ Upstream the work after it is verified online
21. About Future
• HBase is still a kid – only 10 years old
  ◦ More ceilings to break
    ▪ Improving, but still a long way to go
    ▪ Far from fully utilizing the hardware capability, whether CPU or IO
  ◦ More scenarios to try
    ▪ Embedded mode (HBASE-17743)
  ◦ More to expect
    ▪ 2.0 coming, 3.0 in the plan
    ▪ Hopefully more community involvement from Asia
      - More upstream, less private