Separating hot-cold data into heterogeneous storage based on layered compaction
Allan Yang (HBase Committer)
Content
01 Typical Scenarios at Alibaba
02 Hot-cold Data Separation
   - Hot-cold Data Recognition
   - Layered Compaction
   - Query Optimizations
03 Conclusions
01 Typical Scenarios at Alibaba
• Contacts & Chat
• AI Bots
• Risk Control
• Bills
• Logistics tracking
• GMV
Typical Scenarios
Commonality in some scenarios:
• Massive data volume
• No TTLs
• Only a very small part of the data is frequently accessed
• Hotspots change as time goes by
Definitions
Hot Data
• Accessed very frequently
• Relatively small amount
• Low latency is critical
Cold Data
• Rarely accessed
• Large amount
• Cost is the main concern
02 Hot-cold Data Separation
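Old Architecture
Pros
• Simple, no HBase code change needed
Cons
• High maintenance cost
• Client must be aware of both tables
• Hard to keep the tables consistent
[Figure: a Client, a Hot Table and a Cold Table; arrows labeled Write, Hot query and Cold query; data is moved between the tables via Coprocessor, CopyTable, Replication, ……]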
Current Architecture
• Hot and cold data are separated automatically within a single table
• Transparent to the user
• Different storage policy for each layer
• Automatic query optimization
Hot-cold Data Recognition
Separating hot cold data
The problems of separating data by KV timestamp:
• The timestamp may not represent the heat of business data very well
• A KeyValue's timestamp is also used as the version number in HBase
e.g. an order is written with a ts ahead of the current time
e.g. the data source (Kafka, Spark…) is delayed, so the ts lags behind
Secondary Field slicer
Besides ts, we provide a way to parse a Secondary Field from the rowkey and use it as the boundary of Hot/Warm/Cold data.
• FixPosFieldSlicer (fixed-size prefix)
  Rowkey: 16bit UserID | 64bit timestamp (Secondary Field) | Postfix
• DelimiterFieldSlicer (variant-size prefix)
  Rowkey: Variant prefix | # (Delimiter) | 64bit timestamp (Secondary Field) | Postfix
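The two slicers above can be sketched roughly as follows. This is only an illustration, not the actual Alibaba implementation: the function names, the big-endian encoding of the timestamp, and the assumption that the delimiter never occurs inside the variant prefix are all hypothetical.

```python
def fix_pos_slice(rowkey: bytes, offset: int = 2, length: int = 8) -> int:
    """Read the secondary field at a fixed position.

    Assumed layout: 16-bit UserID (2 bytes), then a 64-bit timestamp (8 bytes).
    """
    field = rowkey[offset:offset + length]
    if len(field) != length:
        raise ValueError("rowkey too short for the secondary field")
    return int.from_bytes(field, "big")


def delimiter_slice(rowkey: bytes, delimiter: bytes = b"#", length: int = 8) -> int:
    """Read the secondary field that follows the first delimiter.

    Assumes the variable-size prefix never contains the delimiter byte.
    """
    idx = rowkey.find(delimiter)
    if idx < 0:
        raise ValueError("delimiter not found in rowkey")
    start = idx + len(delimiter)
    field = rowkey[start:start + length]
    if len(field) != length:
        raise ValueError("rowkey too short for the secondary field")
    return int.from_bytes(field, "big")


# Hypothetical rowkeys built to match the two layouts on the slide.
fixed_key = (7).to_bytes(2, "big") + (1234).to_bytes(8, "big") + b"tail"
variant_key = b"user-abc#" + (99).to_bytes(8, "big") + b"tail"
```

Either function returns a plain integer, so the compaction code only needs to compare that value with the layer boundary and does not care which slicer produced it.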
Layered Compaction
Default compaction in HBase
The default HBase compaction strategy is Size-Tiered, which aims to compact small files into bigger files.
Size-Tiered Compaction Strategy
• Size is the only concern
• Old data and new data are spread across all HFiles
• Can’t be used for separating hot and cold data
[Figure: time range of each HFile compared with the time range we want]
Date-Tiered Compaction Strategy (HBASE-15181)
Multiple time windows are compacted into one tier as time goes by. The older the tier, the bigger it is.
[Figure: time windows, logical view and physical view]
Separating hot cold data
• Only Cold/Warm/Hot windows are needed
• Data moves from the hot window to the warm window, then to the cold window
• The Secondary Field or the timestamp is used
Our layered compaction is inspired by Date-Tiered Compaction.
Layered Compaction
• HFiles flushed from the Memstore are always in L0
• The Hot/Warm/Cold layers each have their own compaction strategy
• Data is separated by secondary field or timestamp
• Data that falls outside a layer's boundary is compacted out to the next layer
Layered Compaction
• The compactor outputs multiple HFiles according to the separation boundary
• The Secondary Field range is written into the FileInfo section of each HFile
e.g. Rowkey: userid + ts
  UserA002
  UserA005
  UserB003
  UserB007
  Secondary Field Range: 002…007
[Figure: the Compactor reads several HFiles and writes an HFile (hot) and an HFile (cold), each with its own Secondary Field Range]
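A minimal sketch of the compactor behaviour described above, with only two layers for brevity. The class and function names are hypothetical, and treating "secondary field >= boundary" as hot is an assumption; the slides do not say on which side the boundary value itself falls.

```python
import heapq
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class OutputHFile:
    """Stand-in for one compaction output file and its FileInfo metadata."""
    layer: str
    rowkeys: List[str] = field(default_factory=list)
    field_min: Optional[int] = None  # Secondary Field Range, lower end
    field_max: Optional[int] = None  # Secondary Field Range, upper end

    def append(self, rowkey: str, value: int) -> None:
        self.rowkeys.append(rowkey)
        self.field_min = value if self.field_min is None else min(self.field_min, value)
        self.field_max = value if self.field_max is None else max(self.field_max, value)


def layered_compact(
    inputs: Sequence[Sequence[str]],
    boundary: int,
    slicer: Callable[[str], int],
) -> Tuple[OutputHFile, OutputHFile]:
    """Merge sorted input files and route each row to a hot or cold output."""
    hot, cold = OutputHFile("hot"), OutputHFile("cold")
    # heapq.merge keeps the global rowkey order, so each output stays sorted.
    for rowkey in heapq.merge(*inputs):
        value = slicer(rowkey)
        target = hot if value >= boundary else cold  # assumed boundary rule
        target.append(rowkey, value)
    return hot, cold


# Rowkey is userid + ts, as in the slide example; the last 3 characters are the ts.
hot_file, cold_file = layered_compact(
    inputs=[["UserA002", "UserB007"], ["UserA005", "UserB003"]],
    boundary=5,
    slicer=lambda key: int(key[5:]),
)
```

Tracking the minimum and maximum while writing means the Secondary Field Range is known as soon as the output file is closed, without a second pass over the data.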
Heterogeneous storage
We can specify data encoding, compression, and storage type for each layer. Here is an example:
Type | Data Encoding | Compression | Storage
Hot  | None          | None        | SSD/RAM
Warm | DIFF          | LZO         | One_SSD
Cold | DIFF          | LZ4         | HDD/EC
[Figure: replica placement per storage policy: All_RAM (RAM, RAM, RAM), All_SSD (SSD, SSD, SSD), One_SSD (SSD, HDD, HDD), All_HDD (HDD, HDD, HDD), Erasure-Coding]
Storage Computing Separation
• Apsara HBase provides an architecture that separates storage from computing
• High-density HDD will be available in Apsara HBase about this September
Welcome to try Apsara HBase at
https://www.aliyun.com/product/hbase
Query Optimizations
HBase Read Path
A quick tour of the HBase read path
HFile4 is filtered out by:
• Bloom filter
• Time range
• Key range
[Figure: a Region contains a Store with a Memstore and HFile 1 to HFile 4; a scan from Scan Start to Scan End merges the Memstore, HFile 1, HFile 2 and HFile 3 in a KeyValue Heap]
Goal of Query Optimization
• Query optimization is only for hot queries
• We must do our best to filter out the cold HFiles and avoid seeking in them
• Seeking in cold HFiles can tremendously increase RT for hot queries
[Figure: a Client Query reads through a KeyValue Heap over an HFile (hot) and an HFile (cold)]
Query Optimization: Case 1
• Scenario: Monitoring, e.g. OpenTSDB
• Rowkey: MetricName + ts + postfix(tags)
Rowkey             ts
cpuA001server1     001
cpuA002server1     002
cpuA003server1     003
cpuA004server1     004
diskB001server1    001
diskB002server1    002
diskB003server1    003
diskB004server1    004
Separate data by boundary: ts = 003
HFile(hot), Time Range: 003…004
  cpuA003server1
  cpuA004server1
  diskB003server1
  diskB004server1
HFile(cold), Time Range: 001…002
  cpuA001server1
  cpuA002server1
  diskB001server1
  diskB002server1
Query: Scan scan = new Scan(cpuA003, cpuA004)
Optimization: Scan.setTimeRange(003, 004)
The cold HFile can be filtered out easily by time range
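The effect of the time range in Case 1 can be illustrated with a small file-pruning sketch. The names are hypothetical and both ranges are treated as inclusive for simplicity, which is not exactly how HBase treats the scan stop row or the time range upper bound.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class HFileMeta:
    """Per-file metadata used to decide whether a file must be read."""
    name: str
    first_key: str
    last_key: str
    min_ts: int
    max_ts: int


def ranges_overlap(lo_a, hi_a, lo_b, hi_b) -> bool:
    """True when the inclusive ranges [lo_a, hi_a] and [lo_b, hi_b] intersect."""
    return lo_a <= hi_b and lo_b <= hi_a


def files_to_read(
    files: List[HFileMeta],
    start_key: str,
    stop_key: str,
    min_ts: Optional[int] = None,
    max_ts: Optional[int] = None,
) -> List[str]:
    selected = []
    for f in files:
        # Key range check: cannot separate hot from cold here, because both
        # files cover every metric name.
        if not ranges_overlap(f.first_key, f.last_key, start_key, stop_key):
            continue
        # Time range check: only applied when the query supplies a time range.
        if min_ts is not None and max_ts is not None:
            if not ranges_overlap(f.min_ts, f.max_ts, min_ts, max_ts):
                continue
        selected.append(f.name)
    return selected


hot = HFileMeta("hot", "cpuA003server1", "diskB004server1", 3, 4)
cold = HFileMeta("cold", "cpuA001server1", "diskB002server1", 1, 2)

without_time_range = files_to_read([hot, cold], "cpuA003", "cpuA004")
with_time_range = files_to_read([hot, cold], "cpuA003", "cpuA004", 3, 4)
```

Without the time range both files pass the key range check; with it, only the hot file remains.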
Query Optimization: Case 2
• Scenario: Tracing system
• Rowkey: TraceID (events are recorded in different columns)
Rowkey      ts
traceid1    001
traceid2    002
traceid3    003
traceid4    004
traceid5    005
traceid6    006
traceid7    007
traceid8    008
Separate data by boundary: ts = 004
HFile(hot), with Bloom Filter
  traceid5
  traceid6
  traceid7
  traceid8
HFile(cold), with Bloom Filter
  traceid1
  traceid2
  traceid3
  traceid4
Query: Get get = new Get("traceid7")
Optimization: the cold HFile can be filtered out by the Bloom Filter
Problem: false positives of the bloom filter can cause latency spikes
Lazy Seek (HBASE-4465)
• Scenario: KV Store
• Rowkey: key (with only one qualifier)
Query: Select row >= Row2,f:q and limit = 1
HFile1, Time Range: 005…008
  Row,column,ts
  Row1,f:q,008
  Row2,f:q,007
  Row2,f:q,006
  Row3,f:q,005
HFile2, Time Range: 001…004
  Row,column,ts
  Row1,f:q,001
  Row2,f:q,003
  Row2,f:q,002
  Row3,f:q,004
Lazy Seek: create a fake row with the biggest ts possible for each HFile
• HFile1 fake row: Row2,f:q,008
• HFile2 fake row: Row2,f:q,004
[Figure: KeyValue Heap over HFile1 and HFile2, shown without and with lazy seek, with steps ① and ②]
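The lazy seek idea can be sketched with a small heap-based merge. This is a simplified model and not the HBase code from HBASE-4465: the class and function names are hypothetical, cells are reduced to (row, ts) pairs, and cells are ordered by row ascending and ts descending.

```python
import bisect
import heapq


class HFile:
    def __init__(self, name, cells):
        # cells: iterable of (row, ts); store as (row, -ts) so that sorting
        # gives row ascending, ts descending.
        self.name = name
        self.keys = sorted((row, -ts) for row, ts in cells)
        self.max_ts = max(ts for _, ts in cells)  # upper end of the time range
        self.seeks = 0  # number of real (expensive) seeks performed

    def seek(self, row):
        """Real seek: first key of `row` or of a later row, if any."""
        self.seeks += 1
        i = bisect.bisect_left(self.keys, (row, -self.max_ts))
        return self.keys[i] if i < len(self.keys) else None


def first_cell_at_or_after(files, row):
    """Return (file name, row, ts) of the first cell at or after `row`."""
    heap = []
    for order, f in enumerate(files):
        # Fake key: the requested row with the biggest ts the file can hold.
        # Building it needs only the file's time range, no I/O.
        heapq.heappush(heap, ((row, -f.max_ts), 1, order, f))
    while heap:
        key, is_fake, order, f = heapq.heappop(heap)
        if not is_fake:
            return f.name, key[0], -key[1]
        # A fake key reached the top of the heap, so this file must be seeked.
        real = f.seek(row)
        if real is not None:
            heapq.heappush(heap, (real, 0, order, f))
    return None


hfile1 = HFile("HFile1", [("Row1", 8), ("Row2", 7), ("Row2", 6), ("Row3", 5)])
hfile2 = HFile("HFile2", [("Row1", 1), ("Row2", 3), ("Row2", 2), ("Row3", 4)])
result = first_cell_at_or_after([hfile1, hfile2], "Row2")
```

In this model the fake key of HFile2 sorts after the real cell found in HFile1, so the query is answered before HFile2 is ever seeked.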
Query Optimization: Case 3
• Scenario: KV Store
• Rowkey: key (with only one qualifier)
Row,Column    ts
Row1,f:q1     001
Row2,f:q1     002
Row3,f:q1     003
Row4,f:q1     004
Row5,f:q1     005
Row6,f:q1     006
Separate data by boundary: ts = 004
HFile(hot), with Bloom Filter, Time Range: 004…006
  Row4,f:q1
  Row5,f:q1
  Row6,f:q1
  Fake row: Row5,f:q1,006
HFile(cold), with Bloom Filter, Time Range: 001…003
  Row1,f:q1
  Row2,f:q1
  Row3,f:q1
  Fake row: Row5,f:q1,003
  False positive of bloom filter
Query: Get get = new Get("Row5,f:q1")
Optimization: the cold HFile will not be seeked because of lazy seek
Query Optimization: Case 4
• Scenario: Logistics tracking in Alibaba
• Rowkey: traceNo + actionCode + ts
Rowkey               ts
trace1Collect001     001
trace1Arrive002      002
trace1Delivery003    003
trace1Done004        004
trace2Collect005     005
trace2Arrive006      006
trace2Delivery007    007
trace2Done008        008
Separate data by boundary: ts = 004
HFile(hot), Time Range: 005…008
  trace2Collect005
  trace2Arrive006
  trace2Delivery007
  trace2Done008
HFile(cold), Time Range: 001…004
  trace1Collect001
  trace1Arrive002
  trace1Delivery003
  trace1Done004
Query: Scan scan = new Scan("trace2", "trace2~")
Problem: the scan uses a prefix, so no time range can be provided
Prefix Bloom Filter
• Use the prefix part of a rowkey to generate and to check the bloom filter
Rowkey: 32bits traceNo (fixed-size prefix, used to generate and check the Bloom Filter) | actionCode | ts (other parts of the rowkey)
HFile(hot), with Prefix Bloom Filter
  trace2Collect005
  trace2Arrive006
  trace2Delivery007
  trace2Done008
HFile(cold), with Prefix Bloom Filter
  trace1Collect001
  trace1Arrive002
  trace1Delivery003
  trace1Done004
Query: Scan scan = new Scan("trace2", "trace2~")
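A prefix bloom filter can be sketched as an ordinary bloom filter that hashes only the first bytes of each rowkey. The class below is a toy model with hypothetical names and sizes; it uses SHA-256 purely for convenience and is not the hash scheme HBase uses.

```python
import hashlib


class PrefixBloomFilter:
    def __init__(self, prefix_len: int, num_bits: int = 256, num_hashes: int = 3):
        self.prefix_len = prefix_len
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit set stored in a Python int

    def _positions(self, prefix: str):
        # Derive num_hashes bit positions from the prefix only.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{prefix}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add_rowkey(self, rowkey: str) -> None:
        """Generate: called for every rowkey written to the file."""
        for pos in self._positions(rowkey[: self.prefix_len]):
            self.bits |= 1 << pos

    def might_contain_prefix(self, prefix: str) -> bool:
        """Check: False means no rowkey with this prefix is in the file.

        True means "maybe"; false positives are possible.
        """
        return all(
            (self.bits >> pos) & 1
            for pos in self._positions(prefix[: self.prefix_len])
        )


# In this toy example the traceNo prefix is the first 6 characters.
hot_filter = PrefixBloomFilter(prefix_len=6)
for key in ["trace2Collect005", "trace2Arrive006", "trace2Delivery007", "trace2Done008"]:
    hot_filter.add_rowkey(key)

empty_filter = PrefixBloomFilter(prefix_len=6)
```

A scan for the prefix "trace2" always passes the hot file's filter. A filter built from the cold file's "trace1" rows would normally reject it, although a false positive is still possible.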
Query Optimization: Case 5
• Scenario: Bills History in Alibaba
• Rowkey: userID + reverse(ts) + (orderID)
Rowkey      ts
userA991    008
userA992    007
userA993    006
userA994    005
userA995    004
userA996    003
userA997    002
userA998    001
Separate data by boundary: ts = 004
HFile(hot), Time Range: 005…008
  userA991
  userA992
  userA993
  userA994
HFile(cold), Time Range: 001…004
  userA995
  userA996
  userA997
  userA998
Query: Scan scan = new Scan("userA"), Limit 4
Problem: the scan uses a prefix with no end key, so no time range can be provided
Secondary Field Lazy Seek
• Store the Secondary Field Range in the HFile's FileInfo section
• Create a fake key to perform lazy seek
Rowkey: 32bits userID (prefix) | ts (Secondary Field)
HFile(hot), Secondary Field Range: 991…994
  Fake Row: userA991
  userA991
  userA992
  userA993
  userA994
HFile(cold), Secondary Field Range: 995…998
  Fake Row: userA995
  userA995
  userA996
  userA997
  userA998
Query: Scan scan = new Scan("userA"), Limit 4
[Figure: KeyValue Heap over the hot and cold HFiles, with steps ① to ⑤]
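The following sketch shows how a fake key built from the Secondary Field Range might keep a prefix scan with a limit out of the cold file. It is an illustrative model with hypothetical names, and it assumes a fixed-length prefix followed directly by a fixed-width secondary field that compares correctly as a string.

```python
import bisect
import heapq
from itertools import islice


class HFile:
    def __init__(self, name, rowkeys, prefix_len):
        self.name = name
        self.rowkeys = sorted(rowkeys)
        # Lower end of the Secondary Field Range, as it would be read from
        # the FileInfo section.
        self.field_min = min(key[prefix_len:] for key in self.rowkeys)
        self.seeks = 0  # number of real seeks performed

    def seek(self, key):
        """Real seek: index of the first rowkey >= key."""
        self.seeks += 1
        return bisect.bisect_left(self.rowkeys, key)


def scan_prefix(files, prefix):
    """Yield rowkeys starting with `prefix` in order, seeking files lazily."""
    heap = []
    for order, f in enumerate(files):
        # Fake key: the smallest rowkey this file could hold for the prefix.
        heapq.heappush(heap, (prefix + f.field_min, 1, order, f, -1))
    while heap:
        key, is_fake, order, f, pos = heapq.heappop(heap)
        if is_fake:
            pos = f.seek(prefix)  # the file is touched only now
        else:
            yield key
            pos += 1
        if pos < len(f.rowkeys) and f.rowkeys[pos].startswith(prefix):
            heapq.heappush(heap, (f.rowkeys[pos], 0, order, f, pos))


hot = HFile("hot", ["userA991", "userA992", "userA993", "userA994"], prefix_len=5)
cold = HFile("cold", ["userA995", "userA996", "userA997", "userA998"], prefix_len=5)

# Scan with prefix "userA" and Limit 4, as in the slide.
rows = list(islice(scan_prefix([hot, cold], "userA"), 4))
```

Because every real key in the hot file sorts before the cold file's fake key, the limit is reached before the cold file's fake key gets to the top of the heap.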
Query Optimization: Case 1 - revisit
• Scenario: Monitoring, e.g. OpenTSDB
• Rowkey: MetricName + ts(Secondary Field) + postfix(tags)
Rowkey            ts
cpuA001server1    001
cpuA002server1    002
cpuA003server1    003
cpuA004server1    004
cpuB001server1    001
cpuB002server1    002
cpuB003server1    003
cpuB004server1    004
Separate data by boundary: ts = 003
HFile(hot), Secondary Field Range: 003…004
  cpuA003server1
  cpuA004server1
  cpuB003server1
  cpuB004server1
HFile(cold), Secondary Field Range: 001…002
  cpuA001server1
  cpuA002server1
  cpuB001server1
  cpuB002server1
Query: Scan scan = new Scan(cpuA003, cpuA004)
Optimization: the cold HFile can be filtered out easily by Secondary Field
03 Conclusion
• A new approach to separating hot and cold data was introduced
• A new Secondary Field Slicer is used, besides timestamp, to decide layer boundaries
• Layered compaction is used to separate data into different layers
• Heterogeneous storage is used to balance cost and performance
• New techniques such as Prefix Bloom Filter and Secondary Field Range Lazy Seek are used for automatic query optimization
• Production tests show that our approach can lower query RT by 50% and decrease storage usage by 25%
We are hiring!
• If you are interested in or familiar with the Hadoop ecosystem or any other NoSQL database
• If you are eager to take on the challenge of building high-concurrency, low-latency, flexible systems
FAQ

HBaseConAsia2018 Track1-6: Separating hot-cold data into heterogeneous storage based on layered compaction

  • 1. Separating hot-cold data into heterogeneous storage based on layered compaction Allan Yang(HBase Committer)
  • 2. Content 01 02 03 Typical Scenarios at Alibaba Hot-cold Data Separation — Hot-cold Data Recognition — Layered Compaction — Query Optimizations Conclusions
  • 4. Typical Scenarios at Alibaba Contacts&Chat AI BOTs Risk Control Bills Logistics trackingGMV
  • 5. Typical Scenarios Commonality in some Scenarios  Mass data  No TTLs  Only very small parts of data is frequently visited  Hotspots change as time goes by
  • 6. Definitions Hot Data • Access very frequently • Relatively small amount • Low latency is very critical HOT COLD Hot Data • Access very frequently • Relatively small amount • Low latency is very critical Cold Data • Access rarely • Big amount • Cost is more concerned
  • 8. Old Architecture Pros • Simple, no HBase code change needed Cons • High maintenance cost • Client aware • Hard to keep consistency Client Cold TableHot Table Coprocessor CopyTable Replication …… Write Hot query Cold query
  • 9. Current Architecture • Separating hot cold data automatically in a single table • Transparent to user • Different storage policy for each layer • Auto query optimization
  • 10. Hot-cold Data Separation — Hot-cold Data Recognition — Layered Compaction — Query Optimizations
  • 11. Separating hot cold data The Problems of separating data by KV timestamp • Timestamp may not represents the heat of business data very well • KeyValue’s timestamp is also used as version number in HBase e.g. Write an order ID advance in current ts e.g. Data Source(Kafka, Spark…) delayed, resulting ts lag
  • 12. Secondary Field slicer Besides ts, we provide a way to parse a Secondary Field from Rowkey, use it as the boundary of Hot/Warn/Cold data • FixPosFieldSlicer • DelimiterFieldSlicer Rowkey: Fixed size prefix 16bit UserID 64bit timestamp Postfix Secondary Field Variant size prefix Variant prefix 64bit timestamp Postfix Secondary Field Rowkey: # Delimiter
  • 13. Hot-cold Data Separation — Hot-cold Data Recognition — Layered Compaction — Query Optimizations
  • 14. Default compaction in HBase Default HBase compaction Strategy is Size-Tiered, which is aimed to compact small files to bigger files.
  • 15. Size-Tiered Compaction Strategy • Size is the only concern • Old data and new data will spread around all HFiles • Can’t be used for Separating hot cold data Time range of HFile Time range we want
  • 16. Date-Tiered Compaction Strategy Date-Tiered Compaction Strategy(HBASE-15181) Time Window Compact multiple time windows in to one tier when time goes by. The older, the bigger tier is. Logic view Physical view
  • 17. Separating hot cold data • Only Cold/Warm/Hot window is needed • Data will move from hot to warn then to Cold window • Secondary Filed or timestamp is used Our layered compaction is inspired by Date-tiered Compaction.
  • 18. Layered Compaction • HFile flushed by Memstore is always in L0 • Hot/Warm/Cold layer have their own compaction Strategy • Data is separated by secondary field or timestamp • Data out of boundary will be compacted out to next layer
  • 19. Layered Compaction • Compactor will output multiple HFiles according to the separation boundary • Secondary Field range will be written into the FileInfo section of HFile e.g. Rowkey:userid+ts UserA002 UserA005 UserB003 UserB007 Secondary Field Range: 002…007 HFile HFile HFile Compactor HFile (hot) HFile (cold) Secondary Field Range Secondary Field Range
  • 20. Heterogeneous storage We can specify Data encoding, Compression, and storage type for each layer Here is an example: Type Data Encoding Compression Storage Hot None None SSD/RAM Warn DIFF LZO One_SSD Cold DIFF LZ4 HDD/EC RAM RAM RAM SSD SSD SSD All_RAM All_SSD Erasure-Coding SSD HDD HDD One_SSD HDD HDD HDD All_HDD
  • 21. Storage Computing Separation • Apsara HBase provides an architecture that separates storage from computing • High-density HDD will be available in Apsara HBase about this September. Welcome to try Apsara HBase at https://www.aliyun.com/product/hbase
  • 22. Hot-cold Data Separation — Hot-cold Data Recognition — Layered Compaction — Query Optimizations
  • 23. HBase Read Path: A quick tour of the HBase read path. HFile4 is filtered out by: • Bloom filter • Time range • Key range (Figure: a Region holds Stores, each with a Memstore and HFiles; between Scan Start and Scan End, the Memstore and HFile 1 to HFile 3 feed the KeyValue Heap)
  • 24. Goal of Query Optimization • Query optimization is only for hot queries • We have to try our best to filter out the cold HFiles and avoid seeking in them • Seeking in cold HFiles can tremendously increase RT for hot queries (Figure: a Client Query reads through the KeyValue Heap over HFile (hot) and HFile (cold))
  • 25. Query Optimization: Case 1 • Scenario: Monitoring, e.g. OpenTSDB • Rowkey: MetricName + ts + postfix(tags) • Data (Rowkey, ts): cpuA001server1 001; cpuA002server1 002; cpuA003server1 003; cpuA004server1 004; diskB001server1 001; diskB002server1 002; diskB003server1 003; diskB004server1 004 • Separate data by boundary: ts = 003 • HFile(hot): cpuA003server1, cpuA004server1, diskB003server1, diskB004server1; Time Range: 003…004 • HFile(cold): cpuA001server1, cpuA002server1, diskB001server1, diskB002server1; Time Range: 001…002 • Query: Scan scan = new Scan(cpuA003, cpuA004) • Optimization: Scan.setTimeRange(003, 004); the cold HFile can be filtered out easily by time range
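The time-range check behind this case can be sketched as a simple interval-overlap test. `TimeRangePruneSketch` and `mayContain` are hypothetical names; the sketch assumes the query range is half-open, `[queryMin, queryMax)`, which is how HBase's `Scan.setTimeRange(minStamp, maxStamp)` treats its arguments, so a real query that must include ts 004 would pass 005 as the upper bound.

```java
final class TimeRangePruneSketch {
    /**
     * Returns true if a file whose cells span [fileMinTs, fileMaxTs] (inclusive)
     * may hold cells inside the half-open query range [queryMin, queryMax).
     */
    static boolean mayContain(long fileMinTs, long fileMaxTs, long queryMin, long queryMax) {
        return fileMinTs < queryMax && fileMaxTs >= queryMin;
    }
}
```

A file for which `mayContain` is false can be skipped without opening any of its blocks, which is what makes this filter cheap for hot queries.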
  • 26. Query Optimization: Case 2 • Scenario: Tracing system • Rowkey: TraceID (events are recorded in different columns) • Data (Rowkey, ts): traceid1 001; traceid2 002; traceid3 003; traceid4 004; traceid5 005; traceid6 006; traceid7 007; traceid8 008 • Separate data by boundary: ts = 004 • HFile(hot): traceid5, traceid6, traceid7, traceid8; Bloom Filter • HFile(cold): traceid1, traceid2, traceid3, traceid4; Bloom Filter • Query: Get get = new Get("traceid7") • Optimization: the cold HFile can be filtered out by its Bloom Filter • Problem: false positives of the bloom filter can cause latency spikes
  • 27. Lazy Seek (HBASE-4465) • Query: Select row >= Row2,f:q and limit = 1 • First diagram (Row,column,ts): HFile1: Row1,f:q,008; Row2,f:q,007; Row2,f:q,006; Row3,f:q,005; Time Range: 005…007. HFile2: Row1,f:q,001; Row2,f:q,003; Row2,f:q,002; Row3,f:q,004; Time Range: 001…004. Both HFiles are seeked (①) to fill the KeyValue Heap • Second diagram, with lazy seek: create a fake row with the biggest ts possible for each HFile. HFile1: Fake row: row2,f:q,008; Row1,f:q,008; Row2,f:q,007; Row2,f:q,006; Row3,f:q,005; Time Range: 005…008. HFile2: Fake row: row2,f:q,004; Row1,f:q,001; Row2,f:q,002; Row2,f:q,003; Row3,f:q,004; Time Range: 001…004. The fake rows enter the KeyValue Heap first (①), and a real seek (②) happens only for the HFile at the top of the heap
  • 28. Query Optimization: Case 3 • Scenario: KV Store • Rowkey: key (with only one qualifier) • Data (Row,Column ts): Row1,f:q1 001; Row2,f:q1 002; Row3,f:q1 003; Row4,f:q1 004; Row5,f:q1 005; Row6,f:q1 006 • Separate data by boundary: ts = 004 • HFile(hot): Row4,f:q1; Row5,f:q1; Row6,f:q1; Bloom Filter; Time Range: 004…006; Fake row: Row5,f:q1,006 • HFile(cold): Row1,f:q1; Row2,f:q1; Row3,f:q1; Bloom Filter (false positive); Time Range: 001…003; Fake row: Row5,f:q1,003 • Query: Get get = new Get("Row5,f:q1") • Optimization: even with a bloom filter false positive, the cold HFile will not be seeked because of lazy seek
  • 29. Query Optimization: Case 4 • Scenario: Logistics tracking in Alibaba • Rowkey: traceNo + actionCode + ts • Data (Rowkey, ts): trace1Collect001 001; trace1Arrive002 002; trace1Delivery003 003; trace1Done004 004; trace2Collect005 005; trace2Arrive006 006; trace2Delivery007 007; trace2Done008 008 • Separate data by boundary: ts = 004 • HFile(hot): trace2Collect005, trace2Arrive006, trace2Delivery007, trace2Done008; Time Range: 005…008 • HFile(cold): trace1Collect001, trace1Arrive002, trace1Delivery003, trace1Done004; Time Range: 001…004 • Query: Scan scan = new Scan("trace2", "trace2~") • Problem: the scan uses a prefix, so no time range can be provided
  • 30. Prefix Bloom Filter • Use the prefix part of a rowkey to generate and to check the bloom filter • Rowkey: traceNo (fixed-size prefix, 32 bits) | actionCode | ts (other parts of the Rowkey) • HFile(hot): trace2Collect005, trace2Arrive006, trace2Delivery007, trace2Done008; Prefix Bloom Filter • HFile(cold): trace1Collect001, trace1Arrive002, trace1Delivery003, trace1Done004; Prefix Bloom Filter • Query: Scan scan = new Scan("trace2", "trace2~")
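A prefix bloom filter can be sketched as an ordinary bloom filter that only ever hashes the first `prefixLen` bytes of a rowkey. The class below is a toy illustration with two hash functions over a `java.util.BitSet`; `PrefixBloomSketch` and its methods are hypothetical names and do not reflect HBase's bloom filter classes or on-disk format.

```java
import java.util.Arrays;
import java.util.BitSet;

final class PrefixBloomSketch {
    private final BitSet bits;
    private final int numBits;
    private final int prefixLen;

    PrefixBloomSketch(int numBits, int prefixLen) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.prefixLen = prefixLen;
    }

    /** Two bit positions derived from the prefix bytes (toy hash functions). */
    private int[] positions(byte[] prefix) {
        int h1 = Arrays.hashCode(prefix);
        int h2 = 31 * h1 + 0x9E3779B9;
        return new int[] {Math.floorMod(h1, numBits), Math.floorMod(h2, numBits)};
    }

    /** Called for each row when the file is written: only the prefix is hashed. */
    void add(byte[] rowkey) {
        byte[] prefix = Arrays.copyOf(rowkey, Math.min(prefixLen, rowkey.length));
        for (int p : positions(prefix)) {
            bits.set(p);
        }
    }

    /** Called before scanning the file; false means no row with this prefix is in it. */
    boolean mightContainPrefix(byte[] scanPrefix) {
        if (scanPrefix.length < prefixLen) {
            return true; // prefix shorter than what was hashed: the filter cannot decide
        }
        for (int p : positions(Arrays.copyOf(scanPrefix, prefixLen))) {
            if (!bits.get(p)) {
                return false;
            }
        }
        return true;
    }
}
```

As with any bloom filter, a `true` answer can be a false positive, but a `false` answer is definite, so a prefix scan can safely skip the cold file when its filter returns `false`.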
  • 31. Query Optimization: Case 5 • Scenario: Bills History in Alibaba • Rowkey: userID + reverse(ts) + (orderID) • Data (Rowkey, ts): userA991 008; userA992 007; userA993 006; userA994 005; userA995 004; userA996 003; userA997 002; userA998 001 • Separate data by boundary: ts = 004 • HFile(hot): userA991, userA992, userA993, userA994; Time Range: 005…008 • HFile(cold): userA995, userA996, userA997, userA998; Time Range: 001…004 • Query: Scan scan = new Scan("userA"), Limit 4 • Problem: the scan uses a prefix with no end key, so no time range can be provided
  • 32. Secondary Field Lazy Seek • Store the Secondary Field Range in the HFile's FileInfo section • Create a fake key to perform lazy seek • Rowkey: userID (prefix, 32 bits) | ts (Secondary Field) • HFile(hot): Fake Row: userA991; userA991, userA992, userA993, userA994; Secondary Field Range: 991…994 • HFile(cold): Fake Row: userA995; userA995, userA996, userA997, userA998; Secondary Field Range: 995…998 • Query: Scan scan = new Scan("userA"), Limit 4 (Figure: numbered steps ① to ⑤ show the fake rows entering the KeyValue Heap and the rows being read)
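The idea can be sketched with a small heap of file scanners. Each scanner first exposes only a fake key (scan prefix plus the smallest secondary field recorded for the file) and performs its real seek only when it reaches the top of the heap. All names here (`LazySeekSketch`, `FileScanner`, `realSeek`) are hypothetical, and rowkeys are simplified to strings; this is not the HBase `KeyValueHeap` code.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class LazySeekSketch {
    static final class FileScanner {
        final List<String> sortedRows; // rows of one HFile, in rowkey order
        final String fakeKey;          // scan prefix + smallest secondary field in the file
        boolean realSeekDone = false;
        int pos = 0;

        FileScanner(List<String> sortedRows, String prefix, String minField) {
            this.sortedRows = sortedRows;
            this.fakeKey = prefix + minField;
        }

        /** Current key: the fake key until a real seek happens, then the actual row (or null at end). */
        String peek() {
            if (!realSeekDone) {
                return fakeKey;
            }
            return pos < sortedRows.size() ? sortedRows.get(pos) : null;
        }

        /** The expensive step: position on the first row >= fakeKey. */
        void realSeek() {
            while (pos < sortedRows.size() && sortedRows.get(pos).compareTo(fakeKey) < 0) {
                pos++;
            }
            realSeekDone = true;
        }
    }

    static List<String> scan(List<FileScanner> files, int limit) {
        PriorityQueue<FileScanner> heap = new PriorityQueue<>(Comparator.comparing(FileScanner::peek));
        heap.addAll(files);
        List<String> out = new ArrayList<>();
        while (out.size() < limit && !heap.isEmpty()) {
            FileScanner top = heap.poll();
            if (!top.realSeekDone) {
                top.realSeek();      // only the scanner at the top of the heap pays for a seek
            } else {
                out.add(top.peek());
                top.pos++;
            }
            if (top.peek() != null) {
                heap.add(top);       // re-insert with its new current key
            }
        }
        return out;
    }
}
```

With the slide's data and Limit 4, the four rows come from the hot file and the cold scanner never leaves its fake key, so the cold HFile is never actually read.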
  • 33. Query Optimization: Case 1 - revisit • Scenario: Monitoring, e.g. OpenTSDB • Rowkey: MetricName + ts(Secondary Field) + postfix(tags) • Data (Rowkey, ts): cpuA001server1 001; cpuA002server1 002; cpuA003server1 003; cpuA004server1 004; cpuB001server1 001; cpuB002server1 002; cpuB003server1 003; cpuB004server1 004 • Separate data by boundary: ts = 003 • HFile(hot): cpuA003server1, cpuA004server1, cpuB003server1, cpuB004server1; Secondary Field Range: 003…004 • HFile(cold): cpuA001server1, cpuA002server1, cpuB001server1, cpuB002server1; Secondary Field Range: 001…002 • Query: Scan scan = new Scan(cpuA003, cpuA004) • Optimization: the cold HFile can be filtered out easily by Secondary Field
  • 35. Conclusion • A new approach to separating hot and cold data was introduced • A new Secondary Field Slicer was used to decide layer boundaries, in addition to timestamp • Layered compaction was used to separate data into different layers • Heterogeneous storage was used to balance cost and performance • New techniques such as Prefix Bloom Filter and Secondary Field Range Lazy Seek were used for automatic query optimization • Production tests show that our approach can lower query RT by 50% and decrease storage usage by 25%
  • 36. We are hiring! • If you are interested in or familiar with the Hadoop ecosystem or any other NoSQL database • If you are eager to accept the challenge of building high-concurrency, low-latency and flexible systems
  • 37. FAQ