Time-Series Apache HBase

TIME SERIES IN HBASE
Staff Software Engineer
VLADIMIR RODIONOV

© Hortonworks Inc. 2011 – 2016. All Rights Reserved
TIME SERIES
 Sequence of data points
 Triplet: [ID][TIME][VALUE] – basic
 Multiplet: [ID][TIME][TAG1][…][TAGN][VALUE]
 Stock Closing Value DJIA
 User behavior (web clicks)
 Credit card transactions
 Health data
 Fitness indicators
 Sensor data (IoT)
 Application and system metrics - ODS
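
The triplet and multiplet layouts above can be pictured as a small record type. This is only an illustrative sketch: the field names, the epoch-seconds unit and the sample values are assumptions, not something the slides specify.

```python
from typing import NamedTuple, Tuple

class DataPoint(NamedTuple):
    """One time-series sample: [ID][TIME][TAG1]...[TAGN][VALUE]."""
    series_id: str
    timestamp: int                          # epoch seconds (assumed unit)
    value: float
    tags: Tuple[Tuple[str, str], ...] = ()  # empty for the basic triplet

# Sample values are made up for illustration.
triplet = DataPoint("stock.close", 1456790400, 100.0)
multiplet = DataPoint("cpu.user", 1456790460, 0.42,
                      (("dc", "east"), ("host", "web01")))

# A series is a sequence of points ordered by time.
series = sorted([multiplet, triplet], key=lambda p: p.timestamp)
```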

Time Series DB requirements
 Data Store MUST preserve temporal locality of data for better in-memory caching
– Facebook ODS : 85% queries are for last 26 hours
 Data Store MUST provide efficient compression
– Time series are highly compressible (less than 2 bytes per data point in some cases)
– Facebook custom compression codec produces less than 1.4 bytes per data point
 Data Store MUST provide automatic time-based rollup aggregations: sum, count, avg,
min, max, etc., by minute, hour, day and so on (configurable). Most of the time it is
aggregated data we are interested in.
 Efficient caching policy (RAM/SSD), presumably FIFO
 SQL API (nice to have, but it is optional)
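
To give a feel for why time series compress so well, here is a minimal sketch of delta-of-delta encoding for timestamps with variable-length integers. It is not the Facebook codec mentioned above and it covers timestamps only; the function names are hypothetical. For regularly spaced samples the second difference is zero, so each timestamp after the first two costs one byte.

```python
def zigzag(n: int) -> int:
    """Map signed ints to unsigned so small magnitudes stay small."""
    return (n << 1) if n >= 0 else ((-n << 1) - 1)

def unzigzag(z: int) -> int:
    return (z >> 1) if z & 1 == 0 else -((z + 1) >> 1)

def put_varint(out: bytearray, z: int) -> None:
    """Append z using 7 bits per byte; the high bit marks continuation."""
    while z >= 0x80:
        out.append((z & 0x7F) | 0x80)
        z >>= 7
    out.append(z)

def get_varint(buf: bytes, pos: int):
    shift, result = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if b < 0x80:
            return result, pos
        shift += 7

def encode_timestamps(timestamps):
    out = bytearray()
    prev, prev_delta = 0, 0
    for t in timestamps:
        delta = t - prev
        put_varint(out, zigzag(delta - prev_delta))  # delta of delta
        prev, prev_delta = t, delta
    return bytes(out)

def decode_timestamps(data: bytes):
    result, pos, prev, prev_delta = [], 0, 0, 0
    while pos < len(data):
        z, pos = get_varint(data, pos)
        prev_delta += unzigzag(z)
        prev += prev_delta
        result.append(prev)
    return result

# 1000 samples at a fixed 60-second interval (made-up data).
ts = [1456790400 + 60 * i for i in range(1000)]
encoded = encode_timestamps(ts)
```

Values need their own encoding on top of this, so the sketch says nothing about the total bytes per data point of a real codec.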

OpenTSDB 2.x
 Data Store MUST preserve temporal locality of data for better in-memory caching – NO
– Size-tiered HBase compaction does not preserve temporal locality of data. A major compaction, for example,
creates a single file in which recent data is stored together with data that is months or years old.
– Compaction also trashes the block cache, which decreases read performance and increases latency.
 Data Store MUST provide efficient compression – NO
– OpenTSDB supports compression, but it is very heavy (it runs externally), and users usually disable it in
production.
 Data Store MUST provide automatic time-based rollup aggregations – NOT
IMPLEMENTED
 SQL – Not supported

Ideal HBase Time Series DB
 Keeps raw data for hours
 Does not compact raw data at all
 Preserves raw data in memory cache for periodic compactions and time-based rollup
aggregations
 Stores full resolution data only in compressed form
 Has different TTL for different aggregation resolutions:
– Days for by_min, by_10min etc.
– Months, years for by_hour
 Compaction should preserve temporal locality of both full-resolution data and
aggregated data.
 FIFO block cache
 Integration with Phoenix (SQL)
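
The time-based rollups mentioned above can be sketched as a simple bucketing pass. This is a rough illustration in plain Python, not coprocessor code; the `RESOLUTIONS` table reuses the by_min, by_10min and by_hour names from the slide, and the function name and input layout are assumptions.

```python
# Bucket widths in seconds for each aggregation resolution.
RESOLUTIONS = {"by_min": 60, "by_10min": 600, "by_hour": 3600}

def rollup(points, bucket_seconds):
    """Aggregate (series_id, timestamp, value) tuples per time bucket.

    Returns {(series_id, bucket_start): {"sum", "count", "min", "max", "avg"}}.
    """
    acc = {}
    for series_id, ts, value in points:
        key = (series_id, ts - ts % bucket_seconds)  # align to bucket start
        agg = acc.get(key)
        if agg is None:
            acc[key] = {"sum": value, "count": 1, "min": value, "max": value}
        else:
            agg["sum"] += value
            agg["count"] += 1
            agg["min"] = min(agg["min"], value)
            agg["max"] = max(agg["max"], value)
    for agg in acc.values():
        agg["avg"] = agg["sum"] / agg["count"]
    return acc

# Made-up sample: two points in the first minute, one in the second.
points = [("s1", 0, 1.0), ("s1", 30, 3.0), ("s1", 61, 5.0)]
by_min = rollup(points, RESOLUTIONS["by_min"])
by_hour = rollup(points, RESOLUTIONS["by_hour"])
```

Each resolution would then be written to its own column family, so that every family can carry its own TTL.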

Time Series DB HBase
[Architecture diagram: Raw Events, Region Server, HDFS. The Region Server holds CF:Raw, CF:Compressed and CF:Aggregates, with a Compressor Coprocessor (C) and an Aggregator Coprocessor (A).]
CF:Raw – TTL hours
CF:Compressed – TTL days/months
CF:Aggregates – TTL months/years (CF per resolution)
HBASE-14468 FIFO compaction
 First-In-First-Out
 No compaction at all
 TTL-expired data just gets archived
 Ideal for raw data storage
 No compaction – no block cache trashing
 Raw data can be cached on write or on read
 Sustains 100s of MB/s of write throughput per Region Server
 Available in 0.98.17, 1.2+, HDP-2.4+
 Can be easily backported to 1.0-1.1
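
The FIFO idea above can be modelled in a few lines: store files are never merged or rewritten, and a file is dropped as a whole once its newest cell is older than the TTL. This is a toy model of the behaviour described on the slide, not HBase code; the `StoreFile` type and function name are hypothetical.

```python
from typing import List, NamedTuple, Tuple

class StoreFile(NamedTuple):
    name: str
    max_timestamp: int  # timestamp of the newest cell in the file
    size: int           # bytes

def fifo_compact(files: List[StoreFile], now: int,
                 ttl: int) -> Tuple[List[StoreFile], List[StoreFile]]:
    """Split files into (kept, expired). Nothing is rewritten or merged."""
    kept = [f for f in files if now - f.max_timestamp <= ttl]
    expired = [f for f in files if now - f.max_timestamp > ttl]
    return kept, expired

# Made-up example: TTL of 6 hours, evaluated at t = 100000 seconds.
files = [
    StoreFile("f1", 70000, 128),
    StoreFile("f2", 90000, 128),
    StoreFile("f3", 99000, 128),
]
kept, expired = fifo_compact(files, now=100000, ttl=6 * 3600)
```

Because surviving files are left untouched, blocks already cached from them stay valid.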

Exploring (Size-Tiered) Compaction
 Does not preserve temporal locality of data.
 Compaction trashes block cache
 No efficient caching of data is possible
 It hurts the most-recent-most-valuable data access pattern.
 Compression/Aggregation is very heavy.
 To read back recent raw data and run it through the compressor, many IO operations are
required, because …
 We can’t guarantee that recent data is in the block cache.

HBASE-15181 Date Tiered Compaction
 DateTieredCompactionPolicy
 CASSANDRA-6602
 Works better for time series than ExploringCompactionPolicy
 Better temporal locality helps with reads
 Good choice for compressed full resolution and aggregated data.
 Available in 0.98.17 and 1.2+; HDP-2.4 will have it as well
 But it has too many knobs to control
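
To illustrate the windowing idea behind date-tiered compaction, the sketch below assigns store files to time windows that grow with age, so files are only ever merged with neighbours of similar age. It is a deliberate simplification: the real DateTieredCompactionPolicy aligns windows differently and has more parameters, and the names `base_window` and `windows_per_tier` are used here as assumptions.

```python
from collections import defaultdict

def window_for(now, ts, base_window, windows_per_tier):
    """Return the (start, end] window that contains timestamp ts.

    Tier 0 has `windows_per_tier` windows of `base_window` width; each older
    tier has the same number of windows, each `windows_per_tier` times wider.
    """
    age = now - ts
    if age < 0:
        raise ValueError("timestamp is in the future")
    size, tier_start = base_window, 0
    while age >= tier_start + size * windows_per_tier:
        tier_start += size * windows_per_tier
        size *= windows_per_tier
    index = (age - tier_start) // size
    newest_age = tier_start + index * size
    return (now - newest_age - size, now - newest_age)

def group_by_window(files, now, base_window, windows_per_tier):
    """files: iterable of (name, max_timestamp). Groups names per window."""
    groups = defaultdict(list)
    for name, max_ts in files:
        groups[window_for(now, max_ts, base_window, windows_per_tier)].append(name)
    return dict(groups)

# Made-up example: now = 1000, base window 10, 4 windows per tier.
files = [("a", 995), ("b", 992), ("c", 985), ("d", 900), ("e", 890)]
groups = group_by_window(files, now=1000, base_window=10, windows_per_tier=4)
```

Only files that land in the same window are compaction candidates for each other, which is what keeps old and recent data in separate files.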

Date Tiered Compaction Policy

Exploring Compaction + Max Size
 Set hbase.hstore.compaction.max.size
 This effectively emulates Date-Tiered Compaction
 Preserves temporal locality of data.
 Compaction does not trash block cache
 Efficient caching of recent data is possible
 Good for most-recent-most-valuable data access pattern.
 Use it for compressed and aggregated data
 Helps to keep recent data in a block cache.
 Abbreviation: ECPM (Exploring Compaction Policy + Max size)
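
A rough model of why the size cap helps: files larger than `hbase.hstore.compaction.max.size` are left out of compaction selection, so large, old files are never merged with fresh ones. The sketch ignores the ratio-based selection that the exploring policy also performs; the tuple layout and function names are assumptions.

```python
def select_for_compaction(files, max_size):
    """files: list of (name, size_bytes, min_ts, max_ts).

    Keep only files at or below the size cap as compaction candidates.
    """
    return [f for f in files if f[1] <= max_size]

def time_span(files):
    """Time range a merged file would cover."""
    return (min(f[2] for f in files), max(f[3] for f in files))

# Made-up example: one large old file and three small recent ones.
files = [
    ("old", 10_000, 0, 5_000),
    ("new1", 100, 9_000, 9_300),
    ("new2", 120, 9_300, 9_600),
    ("new3", 90, 9_600, 9_900),
]
capped = select_for_compaction(files, max_size=1_000)
span_without_cap = time_span(files)
span_with_cap = time_span(capped)
```

With the cap, the merged output covers only the recent range, while merging everything would produce one file spanning the whole history.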

HBASE-14496 Delayed compaction
 Files are eligible for minor compaction if their age > delay
 Good for applications where the most recent data is the most valuable.
 Prevents block cache trashing for recent data caused by frequent minor compactions
of fresh store files
 Will enable this feature for Exploring Compaction Policy
 Improves read latency for most recent data.
 ECP + Max + Delay (1-2 days) is a good option for compressed full resolution and
aggregated data. Abbreviation: ECPMD
 Patch available.
 HBase 1.0+ (can be back-ported to 0.98)
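
The eligibility rule on this slide (age > delay) amounts to a filter applied before the normal selection. The sketch below models only that rule, not the HBASE-14496 patch itself; the function name and the tuple layout are hypothetical.

```python
DAY = 24 * 3600

def eligible_for_minor_compaction(files, now, delay):
    """files: iterable of (name, created_at). Keep files older than delay."""
    return [name for name, created_at in files if now - created_at > delay]

# Made-up example: delay of 1 day, evaluated at the start of day 3.
now = 3 * DAY
files = [
    ("f_day0", 0),
    ("f_day1", 1 * DAY),
    ("f_day2_noon", 2 * DAY + 12 * 3600),
]
eligible = eligible_for_minor_compaction(files, now, delay=1 * DAY)
```

Files younger than the delay are left alone, so their cached blocks are not invalidated by a rewrite.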

Time Series DB HBase
[Architecture diagram as before: Raw Events, Region Server, HDFS, Compressor Coprocessor (C), Aggregator Coprocessor (A), now annotated with a compaction policy per column family.]
CF:Raw – TTL hours – FIFO
CF:Compressed – TTL days/months – ECPM(D)
CF:Aggregates – TTL months/years (CF per resolution) – ECPM(D)
HBase Block Cache
 Current policy (LRU) is not optimal for time-series applications
 We need something similar to FIFO (both in RAM and on SSD)
 We need support for TB size RAM/SSD-based caches
 Current off-heap bucket cache does not scale well (it keeps keys in Java heap)
 For SSD cache we could mirror most recent store files, thus providing FIFO semantics
w/o any complexity of disk-based cache management.
 Today …
– Disable cache for raw data
– Enable cache on write/read for compressed data and aggregations

Summary
 Disable major compaction
 Do not run HDFS balancer
 Disable HBase auto region balancing: balance_switch false
 Disable region splits (DisabledRegionSplitPolicy)
 Presplit table in advance.
 Have separate column families for raw, compressed and aggregated data (each
aggregate resolution – its own family)
 Increase hbase.hstore.blockingStoreFiles for all column families
 FIFO for Raw, ECPM(D) or DTCP for compressed and aggregations

Summary (continued)
 Periodically run an internal job (coprocessor) to compress data and produce time-based
rollup aggregations.
 Do not cache raw data; enable cache on write/read for the others (if using ECPM(D))
 Enable WAL Compression, use maximum compression for Raw data.
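
Some of the settings in the summary map to HBase configuration properties. The sketch below renders a few of them as an hbase-site.xml fragment. Treat it as a starting point only: the property names are given as I recall them from HBase's default configuration and should be checked against your version, the `blockingStoreFiles` value is an arbitrary assumption, and several items (balancer switch, pre-splitting, per-family TTL and compaction policy) are set through the shell or table descriptors instead.

```python
import xml.etree.ElementTree as ET

# Property names should be verified against the HBase version in use.
SETTINGS = {
    # 0 disables time-based automatic major compaction.
    "hbase.hregion.majorcompaction": "0",
    # Disable region splits.
    "hbase.regionserver.region.split.policy":
        "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy",
    # Raised store-file limit; the value 1000 is an assumed example.
    "hbase.hstore.blockingStoreFiles": "1000",
    # Enable WAL compression.
    "hbase.regionserver.wal.enablecompression": "true",
}

def to_hbase_site(settings) -> str:
    """Render a dict of properties as a <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

xml_text = to_hbase_site(SETTINGS)
```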

Thank you
 Q&A
