1 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
IoT: what about data storage?
Vladimir Rodionov
Staff Software Engineer
IoT data stream
 Sequence of data points
 Triplet: [ID][TIME][VALUE] – basic time series
 Multiplet: [ID][TIME][TAG1][…][TAGN][VALUE] – time series with tags
 Sometimes with location – spatial data
 But, strictly, time series
 Do we have a good time-series data store?
 Open source?
 But commercially supported?
Apache HBase
 Open Source
 Scalable
 Distributed
 NoSQL Data Store
 Commercially supported
 Temporal? Sure, you can do temporal stuff!
 Out of the box?
Time Series DB requirements
 Data Store MUST preserve temporal locality of data for better in-memory caching
 Data Store MUST provide efficient compression
– Time series are highly compressible (less than 2 bytes per data point in some cases)
– Facebook's custom compression codec produces less than 1.4 bytes per data point
 Data Store MUST provide automatic time-based rollup aggregations: sum, count, avg, min, max, etc., by minute, hour, day and so on – configurable. Most of the time it is the aggregated data we are interested in.
 Efficient caching policy (RAM/SSD)
 SQL API (nice to have, but optional)
 Support for IoT use cases (write/read ratio up to 99/1, millions of ops)
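The "less than 2 bytes per data point" figure is plausible because IoT timestamps are nearly equidistant and values drift slowly, so second-order deltas are almost all zeros and entropy-code down to a bit or two each. A minimal Java sketch of the delta-of-delta idea (illustrative only, not Facebook's actual codec, which additionally XOR-encodes the values):

```java
// Illustrative delta-of-delta transform: for a regular time series the
// second-order timestamp deltas are almost all zero, which is what makes
// sub-2-bytes-per-point compression achievable.
public class DeltaOfDelta {

    // Returns {t0, delta0, dod1, dod2, ...}
    static long[] encode(long[] ts) {
        long[] out = new long[ts.length];
        out[0] = ts[0];
        long prevDelta = 0;
        for (int i = 1; i < ts.length; i++) {
            long delta = ts[i] - ts[i - 1];
            out[i] = delta - prevDelta; // delta-of-delta
            prevDelta = delta;
        }
        return out;
    }

    static long[] decode(long[] enc) {
        long[] ts = new long[enc.length];
        ts[0] = enc[0];
        long delta = 0;
        for (int i = 1; i < enc.length; i++) {
            delta += enc[i];
            ts[i] = ts[i - 1] + delta;
        }
        return ts;
    }
}
```

For points arriving every 60 s with one point 1 s late, `encode` of `{1000, 1060, 1120, 1181, 1241}` yields `{1000, 60, 0, 1, -1}` – mostly zeros after the first delta.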
Ideal HBase Time Series DB
 Keeps raw data for hours
 Does not compact raw data at all
 Preserves raw data in the memory cache for periodic compactions and time-based rollup aggregations
 Stores full-resolution data only in compressed form
 Has a different TTL for each aggregation resolution:
– Days for by_min, by_10min, etc.
– Months or years for by_hour
 Compaction should preserve temporal locality of both full-resolution data and aggregated data
 Integration with Phoenix (SQL)
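Concretely, this layout maps onto one HBase table with a column family per retention class plus one per rollup resolution. A sketch using the HBase 1.x admin API (family names, TTL values and the table name are illustrative assumptions, not prescriptions):

```java
// Sketch only: requires hbase-client on the classpath; values are examples.
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.io.compress.Compression;

HTableDescriptor table = new HTableDescriptor(TableName.valueOf("tsdb"));

HColumnDescriptor raw = new HColumnDescriptor("r");        // raw events
raw.setTimeToLive(6 * 3600);                               // hours
raw.setCompressionType(Compression.Algorithm.GZ);          // heaviest compression

HColumnDescriptor full = new HColumnDescriptor("c");       // compressed full resolution
full.setTimeToLive(90 * 24 * 3600);                        // days/months

HColumnDescriptor byMin = new HColumnDescriptor("a_min");  // one CF per rollup resolution
byMin.setTimeToLive(30 * 24 * 3600);                       // days

HColumnDescriptor byHour = new HColumnDescriptor("a_hour");
byHour.setTimeToLive(2 * 365 * 24 * 3600);                 // months/years

table.addFamily(raw);
table.addFamily(full);
table.addFamily(byMin);
table.addFamily(byHour);
```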
Write Path (for 99%)
Time Series DB HBase
[Diagram: Raw Events → Region Server (Compressor Coprocessor, Aggregator Coprocessor) → HDFS]
CF:Raw – TTL hours
CF:Compressed – TTL days/months
CF:Aggregates – TTL months/years (one CF per resolution)
HBASE-14468 FIFO compaction
 First-In-First-Out
 No compaction at all
 TTL-expired data just gets archived
 Ideal for raw data storage
 No compaction – no block cache thrashing
 Raw data can be cached on write or on read
 Sustains write throughput in the hundreds of MB/s per RegionServer
 Available in 0.98.17, 1.1+, 1.2+, HDP 2.4+
 Can be easily back-ported to 1.0 (do we need this?)
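FIFO compaction is selected per table or per column family by overriding the compaction policy class; since it never rewrites files, expired data is dropped a whole store file at a time via the CF's TTL. A sketch assuming the HBase 1.x client API (the family name and TTL are illustrative):

```java
// Sketch only: requires hbase-client. FIFO relies on the CF's TTL to drop
// whole expired store files instead of ever rewriting them.
HColumnDescriptor raw = new HColumnDescriptor("r");
raw.setTimeToLive(6 * 3600); // raw data lives for hours
raw.setConfiguration(
    "hbase.hstore.defaultengine.compactionpolicy.class",
    "org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy");
```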
Exploring (Size-Tiered) Compaction
 Does not preserve temporal locality of data
 Compaction thrashes the block cache
 No efficient caching of data is possible
 It hurts the most-recent-most-valuable data access pattern
 Compression/aggregation becomes very heavy:
– To read recent raw data back and run it through the compressor, many IO operations are required, because …
– We can't guarantee that recent data is in the block cache
HBASE-15181 Date Tiered Compaction
 DateTieredCompactionPolicy
 Based on CASSANDRA-6602
 Works better for time series than ExploringCompactionPolicy
 Better temporal locality helps with reads
 Good choice for compressed full-resolution and aggregated data
 Available in 0.98.17 and 1.2+; HDP 2.4 has it as well
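Date-tiered compaction is likewise enabled per column family, by swapping in the date-tiered store engine. A sketch assuming the HBase 1.x client API (family name illustrative):

```java
// Sketch only: switch the CF's store engine to the date-tiered implementation.
HColumnDescriptor full = new HColumnDescriptor("c");
full.setConfiguration(
    "hbase.hstore.engine.class",
    "org.apache.hadoop.hbase.regionserver.DateTieredStoreEngine");
```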
Exploring Compaction + Max Size
 Set hbase.hstore.compaction.max.size
 This emulates Date-Tiered Compaction
 Preserves temporal locality of data – data points which are close in time will be stored in the same file, distant ones in separate files
 Compaction works better with the block cache
 More efficient caching of recent data is possible
 Good for the most-recent-most-valuable data access pattern
 Use it for compressed and aggregated data
 Helps to keep recent data in the block cache
 We call this ECPM (Exploring Compaction Policy + Max size)
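The only knob ECPM needs is the maximum file size eligible for compaction: files above the threshold are never re-picked, so older data settles into stable, time-ordered files. A sketch (the 512 MB threshold is an illustrative assumption):

```java
// Sketch only: cap the size of files the exploring policy may compact.
HColumnDescriptor full = new HColumnDescriptor("c");
full.setConfiguration("hbase.hstore.compaction.max.size",
    String.valueOf(512L * 1024 * 1024));
```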
HBASE-14496 Delayed compaction
 Files are eligible for minor compaction only if their age > delay
 Good for applications where the most recent data is the most valuable
 Prevents block cache thrashing for recent data due to frequent minor compactions of fresh store files
 Will enable this feature for the Exploring Compaction Policy
 Improves read latency for the most recent data
 ECP + Max size + Delay (1-2 days) is a good option for compressed full-resolution and aggregated data; we call it ECPMD
 Patch available
 HBase 1.0+ (can be back-ported to 0.98)
Time Series DB HBase
[Diagram: Raw Events → Region Server (Compressor Coprocessor, Aggregator Coprocessor) → HDFS]
CF:Raw – TTL hours – FIFO
CF:Compressed – TTL days/months – ECPM or DTCP
CF:Aggregates – TTL months/years (one CF per resolution) – ECPM or DTCP
HBase Block Cache and Time Series
 The current policy (LRU) is not optimal for time-series applications
 We need something similar to FIFO (both in RAM and on SSD)
 We need support for TB-size RAM/SSD-based caches
 The current off-heap bucket cache does not scale well (it keeps keys in the Java heap)
 For an SSD cache we could mirror the most recent store files, thus providing FIFO semantics without any of the complexity of disk-based cache management
 All of the above are future work items, but today:
– Disable cache for raw data (prevents extreme cache churn)
– Enable cache on write/read for compressed data and aggregations
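The two "today" items map directly onto per-CF cache flags in the HBase 1.x API (family names are illustrative):

```java
// Sketch only: the raw CF bypasses the block cache entirely, while
// compressed/aggregate CFs are cached at write time so recent data is
// already warm when the 1% of reads arrives.
HColumnDescriptor raw = new HColumnDescriptor("r");
raw.setBlockCacheEnabled(false);      // no caching for raw data

HColumnDescriptor full = new HColumnDescriptor("c");
full.setCacheDataOnWrite(true);       // cache-on-write for compressed data

HColumnDescriptor byHour = new HColumnDescriptor("a_hour");
byHour.setCacheDataOnWrite(true);     // and for aggregations
```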
Flexible Retention Policies
 Raw – hours
 Compressed – months
 Aggregates – years
Read/Write IO Reduction (estimate for 250K/sec data points)
 Base: 100 (50-100 MB/s)
 FIFO+ECPM: ~50 (25-50 MB/s)
 +Compaction: ~10 (5-10 MB/s)
Summary
 Disable major compactions
 Do not run the HDFS balancer
 Disable HBase auto region balancing: balance_switch false
 Disable region splits (DisabledRegionSplitPolicy)
 Pre-split the table in advance
 Have separate column families for raw, compressed and aggregated data (each aggregate resolution in its own family)
 Increase hbase.hstore.blockingStoreFiles for all column families
 FIFO for Raw; ECPM(D) or DTCP (next session) for compressed and aggregated data
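Several of these checklist items are table-level settings. A sketch assuming the HBase 1.x API (the split-point scheme and the blockingStoreFiles value are illustrative assumptions):

```java
// Sketch only: disable splits, raise the blocking-store-files limit,
// and pre-split on the first row-key byte at table creation.
HTableDescriptor table = new HTableDescriptor(TableName.valueOf("tsdb"));
table.setRegionSplitPolicyClassName(
    "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");
table.setConfiguration("hbase.hstore.blockingStoreFiles", "200");

byte[][] splits = new byte[255][];
for (int i = 1; i <= 255; i++) {
    splits[i - 1] = new byte[] {(byte) i};
}
admin.createTable(table, splits); // 'admin' is a connected org.apache.hadoop.hbase.client.Admin
```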
Summary (continued)
 Periodically run an internal job (coprocessor) to compress data and produce time-based rollup aggregations
 Do not cache raw data; use write/read caching for the others (if ECPM(D))
 Enable WAL compression – decreases write IO
 Use maximum compression for raw data (GZ) – decreases write IO
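The rollup job itself is conceptually simple: bucket raw points by the target resolution and keep count/sum/min/max, from which avg falls out for free. An illustrative in-memory Java sketch (a real version would run inside a coprocessor and write to the per-resolution column families):

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative rollup: bucket raw points by minute and keep
// {count, sum, min, max} per bucket; avg = sum / count.
public class Rollup {
    static Map<Long, double[]> byMinute(long[] ts, double[] vals) {
        Map<Long, double[]> out = new TreeMap<>();
        for (int i = 0; i < ts.length; i++) {
            long bucket = ts[i] / 60_000 * 60_000; // epoch millis -> minute bucket
            double[] agg = out.computeIfAbsent(bucket,
                k -> new double[] {0, 0, Double.MAX_VALUE, -Double.MAX_VALUE});
            agg[0] += 1;                        // count
            agg[1] += vals[i];                  // sum
            agg[2] = Math.min(agg[2], vals[i]); // min
            agg[3] = Math.max(agg[3], vals[i]); // max
        }
        return out;
    }
}
```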
Read Path (for 1%)
SQL (Phoenix) integration
 Each time series has a set of named attributes, which we call meta (tags in OpenTSDB)
 Keep time-series meta in Phoenix-type table(s)
 Adding, deleting or updating a time series is a DML/DDL operation on a Phoenix table
 Meta is (mostly) static
 Define the set of attributes in meta which form the PK
 Translate the PK to a unique ID
 Store ID, RTS (reversed timestamp) and VALUE in HBase
 Now you can index time series by any attribute(s) in Phoenix
 A query is a two-step process: Phoenix first, to select the list of IDs, then HBase, to run the query on the ID list
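The ID-plus-reversed-timestamp row key can be sketched as a fixed-width binary key; reversing the timestamp makes the newest point of a series sort first, which suits the most-recent-most-valuable access pattern (the 4-byte ID width is an illustrative assumption):

```java
import java.nio.ByteBuffer;

// Illustrative row key layout: [4-byte series ID][8-byte reversed timestamp].
// Within one series, later points get smaller reversed timestamps and
// therefore sort first in HBase's lexicographic key order.
public class RowKey {
    static byte[] make(int seriesId, long epochMillis) {
        return ByteBuffer.allocate(12)
            .putInt(seriesId)
            .putLong(Long.MAX_VALUE - epochMillis) // RTS: reversed timestamp
            .array();
    }
}
```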
Query Flow

Time-Series Definition – META (Phoenix SQL):

ID | Active | Version | … | MFG
11 | true | 1.1 | … | SA
12 | true | 1.3 | … | SA
15 | true | 1.4 | … | GE
17 | true | 1.1 | … | GE
… | … | … | … | …
345 | false | 1.0 | … | SA

Time-Series Data (HBase Time Series DB):

ID | Timestamp | Value
11 | 143897653 | 10.0
12 | 143897753 | 11.3
15 | 143897953 | 11.6
17 | 143897853 | 11.9
… | … | …
345 | 143897753 | 11.0

1) SELECT ID FROM META WHERE MFG = 'SA' AND Version = '1.1' → ID set
2) GetAvgByIdSet(ID set, now(), now() - 24h)
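The two-step flow can be simulated in a few lines: step 1 plays the role of the Phoenix SELECT over META, step 2 the HBase-side aggregation over the resulting ID set. Illustrative Java, with the slide's sample data wired into demo():

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative two-step query: meta filter -> ID set -> aggregation.
public class QueryFlow {
    // meta: id -> {MFG, Version}; data: rows of {id, timestamp, value}
    static double avgForMfgVersion(Map<Integer, String[]> meta,
                                   List<double[]> data,
                                   String mfg, String version) {
        // Step 1 (Phoenix): SELECT ID FROM META WHERE MFG = ? AND Version = ?
        Set<Integer> ids = new HashSet<>();
        for (Map.Entry<Integer, String[]> e : meta.entrySet())
            if (e.getValue()[0].equals(mfg) && e.getValue()[1].equals(version))
                ids.add(e.getKey());
        // Step 2 (HBase): GetAvgByIdSet over the matching data points
        double sum = 0;
        int n = 0;
        for (double[] row : data)
            if (ids.contains((int) row[0])) { sum += row[2]; n++; }
        return n == 0 ? Double.NaN : sum / n;
    }

    // The slide's sample: only series 11 matches MFG=SA, Version=1.1.
    static double demo() {
        Map<Integer, String[]> meta = new HashMap<>();
        meta.put(11, new String[] {"SA", "1.1"});
        meta.put(12, new String[] {"SA", "1.3"});
        meta.put(15, new String[] {"GE", "1.4"});
        meta.put(17, new String[] {"GE", "1.1"});
        List<double[]> data = Arrays.asList(
            new double[] {11, 143897653, 10.0},
            new double[] {12, 143897753, 11.3},
            new double[] {15, 143897953, 11.6},
            new double[] {17, 143897853, 11.9});
        return avgForMfgVersion(meta, data, "SA", "1.1");
    }
}
```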
Time-Series DB API
 Group operations on ID sets by time range
– Min, Max, Avg, Count, Sum, other aggregations
 Pluggable aggregation functions
 Support for different time resolutions
 With different approximations (linear, cubic, bi-cubic)
 Batch load support (for writes)
 Can be implemented in an HBase coprocessor layer
 Can work much faster than a regular SQL DBMS
 Because we already have the aggregated data
Thank you
 Q&A
