1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
IoT: what about data storage?
Vladimir Rodionov
Staff Software Engineer
IoT data stream
 Sequence of data points
 Triplet: [ID][TIME][VALUE] – basic time series
 Multiplet: [ID][TIME][TAG1][…][TAGN][VALUE] – time series with tags
 Sometimes with location – spatial data
 But, strictly, time series
 Do we have a good time-series data store?
 Open source?
 But commercially supported?
Apache HBase
 Open Source
 Scalable
 Distributed
 NoSQL Data Store
 Commercially supported
 Temporal? Sure, you can do temporal stuff!
 Out of the box?
Time Series DB requirements
 Data Store MUST preserve temporal locality of data for better in-memory caching
 Data Store MUST provide efficient compression
– Time series are highly compressible (less than 2 bytes per data point in some cases)
– Facebook's custom compression codec produces less than 1.4 bytes per data point
 Data Store MUST provide automatic time-based rollup aggregations: sum, count, avg, min, max, etc., by minute, hour, day and so on – configurable. Most of the time it is the aggregated data we are interested in.
 Efficient caching policy (RAM/SSD)
 SQL API (nice to have, but optional)
 Support for IoT use cases (write/read ratio up to 99/1, millions of ops)
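The "less than 2 bytes per data point" figure is plausible because IoT timestamps are nearly equidistant and values drift slowly, so second-order deltas are almost all zeros and entropy-code down to a bit or two each. A minimal Java sketch of the delta-of-delta idea (illustrative only, not Facebook's actual codec, which additionally XOR-encodes the values):

```java
// Illustrative delta-of-delta transform: for a regular time series the
// second-order timestamp deltas are almost all zero, which is what makes
// sub-2-bytes-per-point compression achievable.
public class DeltaOfDelta {

    // Returns {t0, delta0, dod1, dod2, ...}
    static long[] encode(long[] ts) {
        long[] out = new long[ts.length];
        out[0] = ts[0];
        long prevDelta = 0;
        for (int i = 1; i < ts.length; i++) {
            long delta = ts[i] - ts[i - 1];
            out[i] = delta - prevDelta; // delta-of-delta
            prevDelta = delta;
        }
        return out;
    }

    static long[] decode(long[] enc) {
        long[] ts = new long[enc.length];
        ts[0] = enc[0];
        long delta = 0;
        for (int i = 1; i < enc.length; i++) {
            delta += enc[i];
            ts[i] = ts[i - 1] + delta;
        }
        return ts;
    }
}
```

For points arriving every 60 s with one point 1 s late, `encode` of `{1000, 1060, 1120, 1181, 1241}` yields `{1000, 60, 0, 1, -1}` – mostly zeros after the first delta.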
Ideal HBase Time Series DB
 Keeps raw data for hours
 Does not compact raw data at all
 Preserves raw data in the memory cache for periodic compactions and time-based rollup aggregations
 Stores full-resolution data only in compressed form
 Has a different TTL for each aggregation resolution:
– Days for by_min, by_10min, etc.
– Months or years for by_hour
 Compaction should preserve temporal locality of both full-resolution data and aggregated data
 Integration with Phoenix (SQL)
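Concretely, this layout maps onto one HBase table with a column family per retention class plus one per rollup resolution. A sketch using the HBase 1.x admin API (family names, TTL values and the table name are illustrative assumptions, not prescriptions):

```java
// Sketch only: requires hbase-client on the classpath; values are examples.
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.io.compress.Compression;

HTableDescriptor table = new HTableDescriptor(TableName.valueOf("tsdb"));

HColumnDescriptor raw = new HColumnDescriptor("r");        // raw events
raw.setTimeToLive(6 * 3600);                               // hours
raw.setCompressionType(Compression.Algorithm.GZ);          // heaviest compression

HColumnDescriptor full = new HColumnDescriptor("c");       // compressed full resolution
full.setTimeToLive(90 * 24 * 3600);                        // days/months

HColumnDescriptor byMin = new HColumnDescriptor("a_min");  // one CF per rollup resolution
byMin.setTimeToLive(30 * 24 * 3600);                       // days

HColumnDescriptor byHour = new HColumnDescriptor("a_hour");
byHour.setTimeToLive(2 * 365 * 24 * 3600);                 // months/years

table.addFamily(raw);
table.addFamily(full);
table.addFamily(byMin);
table.addFamily(byHour);
```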
Write Path (for 99%)
Time Series DB HBase
[Diagram: Raw Events → Region Server (Compressor Coprocessor, Aggregator Coprocessor) → HDFS]
CF:Raw – TTL hours
CF:Compressed – TTL days/months
CF:Aggregates – TTL months/years (one CF per resolution)
HBASE-14468 FIFO compaction
 First-In-First-Out
 No compaction at all
 TTL-expired data just gets archived
 Ideal for raw data storage
 No compaction – no block cache thrashing
 Raw data can be cached on write or on read
 Sustains write throughput in the hundreds of MB/s per RegionServer
 Available in 0.98.17, 1.1+, 1.2+, HDP 2.4+
 Can be easily back-ported to 1.0 (do we need this?)
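FIFO compaction is selected per table or per column family by overriding the compaction policy class; since it never rewrites files, expired data is dropped a whole store file at a time via the CF's TTL. A sketch assuming the HBase 1.x client API (the family name and TTL are illustrative):

```java
// Sketch only: requires hbase-client. FIFO relies on the CF's TTL to drop
// whole expired store files instead of ever rewriting them.
HColumnDescriptor raw = new HColumnDescriptor("r");
raw.setTimeToLive(6 * 3600); // raw data lives for hours
raw.setConfiguration(
    "hbase.hstore.defaultengine.compactionpolicy.class",
    "org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy");
```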
Exploring (Size-Tiered) Compaction
 Does not preserve temporal locality of data
 Compaction thrashes the block cache
 No efficient caching of data is possible
 It hurts the most-recent-most-valuable data access pattern
 Compression/aggregation becomes very heavy:
– To read recent raw data back and run it through the compressor, many IO operations are required, because …
– We can't guarantee that recent data is in the block cache
HBASE-15181 Date Tiered Compaction
 DateTieredCompactionPolicy
 Based on CASSANDRA-6602
 Works better for time series than ExploringCompactionPolicy
 Better temporal locality helps with reads
 Good choice for compressed full-resolution and aggregated data
 Available in 0.98.17 and 1.2+; HDP 2.4 has it as well
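Date-tiered compaction is likewise enabled per column family, by swapping in the date-tiered store engine. A sketch assuming the HBase 1.x client API (family name illustrative):

```java
// Sketch only: switch the CF's store engine to the date-tiered implementation.
HColumnDescriptor full = new HColumnDescriptor("c");
full.setConfiguration(
    "hbase.hstore.engine.class",
    "org.apache.hadoop.hbase.regionserver.DateTieredStoreEngine");
```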
Exploring Compaction + Max Size
 Set hbase.hstore.compaction.max.size
 This emulates Date-Tiered Compaction
 Preserves temporal locality of data – data points which are close in time will be stored in the same file, distant ones in separate files
 Compaction works better with the block cache
 More efficient caching of recent data is possible
 Good for the most-recent-most-valuable data access pattern
 Use it for compressed and aggregated data
 Helps to keep recent data in the block cache
 We call this ECPM (Exploring Compaction Policy + Max size)
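The only knob ECPM needs is the maximum file size eligible for compaction: files above the threshold are never re-picked, so older data settles into stable, time-ordered files. A sketch (the 512 MB threshold is an illustrative assumption):

```java
// Sketch only: cap the size of files the exploring policy may compact.
HColumnDescriptor full = new HColumnDescriptor("c");
full.setConfiguration("hbase.hstore.compaction.max.size",
    String.valueOf(512L * 1024 * 1024));
```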
HBASE-14496 Delayed compaction
 Files are eligible for minor compaction only if their age > delay
 Good for applications where the most recent data is the most valuable
 Prevents block cache thrashing for recent data due to frequent minor compactions of fresh store files
 Will enable this feature for the Exploring Compaction Policy
 Improves read latency for the most recent data
 ECP + Max size + Delay (1-2 days) is a good option for compressed full-resolution and aggregated data; we call it ECPMD
 Patch available
 HBase 1.0+ (can be back-ported to 0.98)
Time Series DB HBase
[Diagram: Raw Events → Region Server (Compressor Coprocessor, Aggregator Coprocessor) → HDFS]
CF:Raw – TTL hours – FIFO
CF:Compressed – TTL days/months – ECPM or DTCP
CF:Aggregates – TTL months/years (one CF per resolution) – ECPM or DTCP
HBase Block Cache and Time Series
 The current policy (LRU) is not optimal for time-series applications
 We need something similar to FIFO (both in RAM and on SSD)
 We need support for TB-size RAM/SSD-based caches
 The current off-heap bucket cache does not scale well (it keeps keys in the Java heap)
 For an SSD cache we could mirror the most recent store files, thus providing FIFO semantics without any of the complexity of disk-based cache management
 All of the above are future work items, but today:
– Disable cache for raw data (prevents extreme cache churn)
– Enable cache on write/read for compressed data and aggregations
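The two "today" items map directly onto per-CF cache flags in the HBase 1.x API (family names are illustrative):

```java
// Sketch only: the raw CF bypasses the block cache entirely, while
// compressed/aggregate CFs are cached at write time so recent data is
// already warm when the 1% of reads arrives.
HColumnDescriptor raw = new HColumnDescriptor("r");
raw.setBlockCacheEnabled(false);      // no caching for raw data

HColumnDescriptor full = new HColumnDescriptor("c");
full.setCacheDataOnWrite(true);       // cache-on-write for compressed data

HColumnDescriptor byHour = new HColumnDescriptor("a_hour");
byHour.setCacheDataOnWrite(true);     // and for aggregations
```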
Flexible Retention Policies
 Raw – hours
 Compressed – months
 Aggregates – years
Read/Write IO Reduction (estimate for 250K/sec data points)
 Base: 100 (50-100 MB/s)
 FIFO+ECPM: ~50 (25-50 MB/s)
 +Compaction: ~10 (5-10 MB/s)
Summary
 Disable major compactions
 Do not run the HDFS balancer
 Disable HBase auto region balancing: balance_switch false
 Disable region splits (DisabledRegionSplitPolicy)
 Pre-split the table in advance
 Have separate column families for raw, compressed and aggregated data (each aggregate resolution in its own family)
 Increase hbase.hstore.blockingStoreFiles for all column families
 FIFO for Raw; ECPM(D) or DTCP (next session) for compressed and aggregated data
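Several of these checklist items are table-level settings. A sketch assuming the HBase 1.x API (the split-point scheme and the blockingStoreFiles value are illustrative assumptions):

```java
// Sketch only: disable splits, raise the blocking-store-files limit,
// and pre-split on the first row-key byte at table creation.
HTableDescriptor table = new HTableDescriptor(TableName.valueOf("tsdb"));
table.setRegionSplitPolicyClassName(
    "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");
table.setConfiguration("hbase.hstore.blockingStoreFiles", "200");

byte[][] splits = new byte[255][];
for (int i = 1; i <= 255; i++) {
    splits[i - 1] = new byte[] {(byte) i};
}
admin.createTable(table, splits); // 'admin' is a connected org.apache.hadoop.hbase.client.Admin
```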
Summary (continued)
 Periodically run an internal job (coprocessor) to compress data and produce time-based rollup aggregations
 Do not cache raw data; use write/read caching for the others (if ECPM(D))
 Enable WAL compression – decreases write IO
 Use maximum compression for raw data (GZ) – decreases write IO
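The rollup job itself is conceptually simple: bucket raw points by the target resolution and keep count/sum/min/max, from which avg falls out for free. An illustrative in-memory Java sketch (a real version would run inside a coprocessor and write to the per-resolution column families):

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative rollup: bucket raw points by minute and keep
// {count, sum, min, max} per bucket; avg = sum / count.
public class Rollup {
    static Map<Long, double[]> byMinute(long[] ts, double[] vals) {
        Map<Long, double[]> out = new TreeMap<>();
        for (int i = 0; i < ts.length; i++) {
            long bucket = ts[i] / 60_000 * 60_000; // epoch millis -> minute bucket
            double[] agg = out.computeIfAbsent(bucket,
                k -> new double[] {0, 0, Double.MAX_VALUE, -Double.MAX_VALUE});
            agg[0] += 1;                        // count
            agg[1] += vals[i];                  // sum
            agg[2] = Math.min(agg[2], vals[i]); // min
            agg[3] = Math.max(agg[3], vals[i]); // max
        }
        return out;
    }
}
```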
Read Path (for 1%)
SQL (Phoenix) integration
 Each time series has a set of named attributes, which we call meta (tags in OpenTSDB)
 Keep time-series meta in Phoenix-type table(s)
 Adding, deleting or updating a time series is a DML/DDL operation on a Phoenix table
 Meta is (mostly) static
 Define the set of attributes in meta which form the PK
 Translate the PK to a unique ID
 Store ID, RTS (reversed timestamp) and VALUE in HBase
 Now you can index time series by any attribute(s) in Phoenix
 A query is a two-step process: Phoenix first, to select the list of IDs, then HBase, to run the query on the ID list
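The ID-plus-reversed-timestamp row key can be sketched as a fixed-width binary key; reversing the timestamp makes the newest point of a series sort first, which suits the most-recent-most-valuable access pattern (the 4-byte ID width is an illustrative assumption):

```java
import java.nio.ByteBuffer;

// Illustrative row key layout: [4-byte series ID][8-byte reversed timestamp].
// Within one series, later points get smaller reversed timestamps and
// therefore sort first in HBase's lexicographic key order.
public class RowKey {
    static byte[] make(int seriesId, long epochMillis) {
        return ByteBuffer.allocate(12)
            .putInt(seriesId)
            .putLong(Long.MAX_VALUE - epochMillis) // RTS: reversed timestamp
            .array();
    }
}
```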
Query Flow

Time-Series Definition – META (Phoenix SQL):

ID | Active | Version | … | MFG
11 | true | 1.1 | … | SA
12 | true | 1.3 | … | SA
15 | true | 1.4 | … | GE
17 | true | 1.1 | … | GE
… | … | … | … | …
345 | false | 1.0 | … | SA

Time-Series Data (HBase Time Series DB):

ID | Timestamp | Value
11 | 143897653 | 10.0
12 | 143897753 | 11.3
15 | 143897953 | 11.6
17 | 143897853 | 11.9
… | … | …
345 | 143897753 | 11.0

1) SELECT ID FROM META WHERE MFG = 'SA' AND Version = '1.1' → ID set
2) GetAvgByIdSet(ID set, now(), now() - 24h)
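The two-step flow can be simulated in a few lines: step 1 plays the role of the Phoenix SELECT over META, step 2 the HBase-side aggregation over the resulting ID set. Illustrative Java, with the slide's sample data wired into demo():

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative two-step query: meta filter -> ID set -> aggregation.
public class QueryFlow {
    // meta: id -> {MFG, Version}; data: rows of {id, timestamp, value}
    static double avgForMfgVersion(Map<Integer, String[]> meta,
                                   List<double[]> data,
                                   String mfg, String version) {
        // Step 1 (Phoenix): SELECT ID FROM META WHERE MFG = ? AND Version = ?
        Set<Integer> ids = new HashSet<>();
        for (Map.Entry<Integer, String[]> e : meta.entrySet())
            if (e.getValue()[0].equals(mfg) && e.getValue()[1].equals(version))
                ids.add(e.getKey());
        // Step 2 (HBase): GetAvgByIdSet over the matching data points
        double sum = 0;
        int n = 0;
        for (double[] row : data)
            if (ids.contains((int) row[0])) { sum += row[2]; n++; }
        return n == 0 ? Double.NaN : sum / n;
    }

    // The slide's sample: only series 11 matches MFG=SA, Version=1.1.
    static double demo() {
        Map<Integer, String[]> meta = new HashMap<>();
        meta.put(11, new String[] {"SA", "1.1"});
        meta.put(12, new String[] {"SA", "1.3"});
        meta.put(15, new String[] {"GE", "1.4"});
        meta.put(17, new String[] {"GE", "1.1"});
        List<double[]> data = Arrays.asList(
            new double[] {11, 143897653, 10.0},
            new double[] {12, 143897753, 11.3},
            new double[] {15, 143897953, 11.6},
            new double[] {17, 143897853, 11.9});
        return avgForMfgVersion(meta, data, "SA", "1.1");
    }
}
```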
Time-Series DB API
 Group operations on ID sets by time range
– Min, Max, Avg, Count, Sum, other aggregations
 Pluggable aggregation functions
 Support for different time resolutions
 With different approximations (linear, cubic, bi-cubic)
 Batch load support (for writes)
 Can be implemented in an HBase coprocessor layer
 Can work much faster than a regular SQL DBMS
 Because we already have the aggregated data
Thank you
 Q&A
