
IoT: what about data storage?


  1. IoT: what about data storage? – Vladimir Rodionov, Staff Software Engineer, Hortonworks
  2–9. IoT data stream
     • Sequence of data points
     • Triplet: [ID][TIME][VALUE] – basic time series
     • Multiplet: [ID][TIME][TAG1][…][TAGN][VALUE] – time series with tags (see the sketch below)
     • Sometimes with location – spatial data
     • But strictly time series
     • Do we have a good time-series data store?
     • Open source?
     • But commercially supported?
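The talk defines no concrete types for these shapes; a minimal Java sketch of the two point forms, with illustrative field and class names, might look like this:

```java
import java.util.Map;

public class DataPoints {
  /** Triplet: [ID][TIME][VALUE] – the basic time-series point. */
  static final class Triplet {
    final long id; final long time; final double value;
    Triplet(long id, long time, double value) {
      this.id = id; this.time = time; this.value = value;
    }
  }

  /** Multiplet: [ID][TIME][TAG1][…][TAGN][VALUE] – a point with named tags. */
  static final class Multiplet {
    final long id; final long time;
    final Map<String, String> tags;  // e.g. {"host": "web-1", "dc": "us-east"}
    final double value;
    Multiplet(long id, long time, Map<String, String> tags, double value) {
      this.id = id; this.time = time; this.tags = tags; this.value = value;
    }
  }
}
```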
  12–14. Apache HBase
     • Open source
     • Scalable
     • Distributed
     • NoSQL data store
     • Commercially supported
     • Temporal? Sure, you can do temporal stuff!
     • Out of the box?
  17. Time Series DB requirements
     • The data store MUST preserve temporal locality of data for better in-memory caching
     • The data store MUST provide efficient compression
       – Time series are highly compressible (less than 2 bytes per data point in some cases)
       – Facebook's custom compression codec produces less than 1.4 bytes per data point
     • The data store MUST provide automatic time-based rollup aggregations (sum, count, avg, min, max, etc., by minute, hour, day and so on – configurable). Most of the time it is the aggregated data we are interested in. (See the rollup sketch below.)
     • Efficient caching policy (RAM/SSD)
     • SQL API (nice to have, but optional)
     • Support for IoT use cases (write/read ratio up to 99/1, millions of ops)
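To make the rollup requirement concrete, here is a minimal sketch of a by-minute rollup computed client-side in plain Java (illustrative only; in the architecture below this work is done by a coprocessor):

```java
import java.util.DoubleSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;

public class Rollup {
  /** By-minute rollup: bucket timestamps to minute boundaries and keep
      count/sum/min/max/avg for each bucket, as the requirement describes. */
  static Map<Long, DoubleSummaryStatistics> byMinute(long[] timesMs, double[] values) {
    Map<Long, DoubleSummaryStatistics> buckets = new TreeMap<>();
    for (int i = 0; i < timesMs.length; i++) {
      long minute = timesMs[i] / 60_000 * 60_000;   // truncate to minute boundary
      buckets.computeIfAbsent(minute, k -> new DoubleSummaryStatistics())
             .accept(values[i]);
    }
    return buckets; // per bucket: getAverage(), getMin(), getMax(), getCount(), getSum()
  }
}
```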
  18. Ideal HBase Time Series DB
     • Keeps raw data for hours
     • Does not compact raw data at all
     • Preserves raw data in the memory cache for periodic compactions and time-based rollup aggregations
     • Stores full-resolution data only in compressed form
     • Has different TTLs for different aggregation resolutions:
       – Days for by_min, by_10min, etc.
       – Months or years for by_hour
     • Compaction should preserve temporal locality of both full-resolution and aggregated data
     • Integration with Phoenix (SQL)
  19. Write Path (for 99%)
  20. Time Series DB write path (diagram): raw events land in an HBase RegionServer; a Compressor coprocessor (C) and an Aggregator coprocessor (A) move data between column families stored in HDFS:
     – CF:Raw – TTL hours
     – CF:Compressed – TTL days/months
     – CF:Aggregates – TTL months/years (one CF per resolution)
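A minimal sketch of how such a table could be declared with the HBase 1.x Java API; the table name, family names and TTL values are illustrative, not from the talk:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class CreateTsdbTable {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      HTableDescriptor table = new HTableDescriptor(TableName.valueOf("tsdb"));

      HColumnDescriptor raw = new HColumnDescriptor("r");        // CF:Raw
      raw.setTimeToLive(6 * 3600);                               // hours of raw data

      HColumnDescriptor compressed = new HColumnDescriptor("c"); // CF:Compressed
      compressed.setTimeToLive(90 * 24 * 3600);                  // days/months

      HColumnDescriptor aggregates = new HColumnDescriptor("a"); // CF:Aggregates
      aggregates.setTimeToLive(2 * 365 * 24 * 3600);             // months/years
      // In practice, one aggregates family per resolution (by_min, by_hour, ...).

      table.addFamily(raw);
      table.addFamily(compressed);
      table.addFamily(aggregates);
      admin.createTable(table);
    }
  }
}
```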
  21. HBASE-14468 FIFO compaction
     • First-in, first-out
     • No compaction at all
     • TTL-expired data just gets archived
     • Ideal for raw data storage (configuration sketch below)
     • No compaction – no block cache trashing
     • Raw data can be cached on write or on read
     • Sustains 100s of MB/s write throughput per RegionServer
     • Available in 0.98.17, 1.1+, 1.2+, HDP-2.4+
     • Can easily be back-ported to 1.0 (do we need this?)
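FIFO compaction is enabled per column family (or per table) by overriding the compaction policy class; continuing the table-creation sketch above, with an illustrative TTL:

```java
import org.apache.hadoop.hbase.HColumnDescriptor;

public class FifoRawFamily {
  /** CF:Raw with FIFO compaction (HBASE-14468); FIFO relies on a CF-level TTL,
      since expired store files are simply archived instead of compacted. */
  static HColumnDescriptor build() {
    HColumnDescriptor raw = new HColumnDescriptor("r");
    raw.setTimeToLive(6 * 3600); // illustrative: keep raw data for hours
    raw.setConfiguration("hbase.hstore.defaultengine.compactionpolicy.class",
        "org.apache.hadoop.hbase.regionserver.compactions.FIFOCompactionPolicy");
    return raw;
  }
}
```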
  22. Exploring (Size-Tiered) Compaction
     • Does not preserve temporal locality of data
     • Compaction trashes the block cache
     • No efficient caching of data is possible
     • It hurts the most-recent-most-valuable data access pattern
     • Compression/aggregation becomes very heavy: reading recent raw data back to run it through the compressor requires many IO operations, because we cannot guarantee recent data is in the block cache
  23. HBASE-15181 Date Tiered Compaction
     • DateTieredCompactionPolicy
     • CASSANDRA-6602
     • Works better for time series than ExploringCompactionPolicy
     • Better temporal locality helps with reads
     • A good choice for compressed full-resolution and aggregated data (configuration sketch below)
     • Available in 0.98.17 and 1.2+; HDP-2.4 has it as well
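Date-tiered compaction is switched on per family via the store engine class; a sketch continuing the earlier example, where the window settings and their values are illustrative assumptions:

```java
import org.apache.hadoop.hbase.HColumnDescriptor;

public class DateTieredFamily {
  /** CF switched to date-tiered compaction (HBASE-15181). */
  static HColumnDescriptor build() {
    HColumnDescriptor cf = new HColumnDescriptor("c");
    cf.setConfiguration("hbase.hstore.engine.class",
        "org.apache.hadoop.hbase.regionserver.DateTieredStoreEngine");
    // Illustrative windowing: 6-hour base windows, 4 windows per tier.
    cf.setConfiguration("hbase.hstore.compaction.date.tiered.base.window.millis",
        String.valueOf(6L * 3600 * 1000));
    cf.setConfiguration("hbase.hstore.compaction.date.tiered.windows.per.tier", "4");
    return cf;
  }
}
```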
  24. Exploring Compaction + Max Size (ECPM)
     • Set hbase.hstore.compaction.max.size (sketch below)
     • This emulates date-tiered compaction
     • Preserves temporal locality of data – data points which are close in time are stored in the same file, distant ones in separate files
     • Compaction works better with the block cache
     • More efficient caching of recent data is possible
     • Good for the most-recent-most-valuable data access pattern
     • Use it for compressed and aggregated data
     • Helps keep recent data in the block cache
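ECPM is just the default exploring policy plus a cap on the maximum file size eligible for compaction; a sketch with an illustrative 512 MB cap:

```java
import org.apache.hadoop.hbase.HColumnDescriptor;

public class EcpmFamily {
  /** ECPM: files above the cap are never re-selected for compaction,
      so older data stays in older files, preserving temporal locality. */
  static HColumnDescriptor build() {
    HColumnDescriptor cf = new HColumnDescriptor("a");
    cf.setConfiguration("hbase.hstore.compaction.max.size",
        String.valueOf(512L * 1024 * 1024)); // illustrative 512 MB cap
    return cf;
  }
}
```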
  25. HBASE-14496 Delayed compaction
     • Files are eligible for minor compaction only if their age > delay
     • Good for applications where the most recent data is the most valuable
     • Prevents block cache trashing for recent data due to frequent minor compactions of fresh store files
     • Will enable this feature for ExploringCompactionPolicy
     • Improves read latency for the most recent data
     • ECP + Max + Delay (1–2 days), "ECPMD", is a good option for compressed full-resolution and aggregated data
     • Patch available
     • HBase 1.0+ (can be back-ported to 0.98)
  26. Time Series DB write path (diagram, revisited with compaction policies): raw events flow through the Compressor (C) and Aggregator (A) coprocessors on the RegionServer into HDFS-backed column families:
     – CF:Raw – TTL hours; FIFO compaction
     – CF:Compressed – TTL days/months; ECPM or DTCP
     – CF:Aggregates – TTL months/years (one CF per resolution); ECPM or DTCP
  27. HBase Block Cache and Time Series
     • The current policy (LRU) is not optimal for time-series applications
     • We need something similar to FIFO (both in RAM and on SSD)
     • We need support for TB-size RAM/SSD-based caches
     • The current off-heap bucket cache does not scale well (it keeps keys in the Java heap)
     • For an SSD cache we could mirror the most recent store files, providing FIFO semantics without any of the complexity of disk-based cache management
     • All of the above are future work items, but today:
       – Disable cache for raw data (prevents extreme cache churn)
       – Enable cache on write/read for compressed data and aggregations (see the sketch below)
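The two "today" items map directly onto per-family cache settings in the HBase API; a sketch with the same illustrative family names as above:

```java
import org.apache.hadoop.hbase.HColumnDescriptor;

public class CacheSettings {
  /** Raw data: don't cache at all, to avoid churning the block cache. */
  static HColumnDescriptor rawFamily() {
    HColumnDescriptor raw = new HColumnDescriptor("r");
    raw.setBlockCacheEnabled(false);
    return raw;
  }
  /** Compressed/aggregated data: cache on write so fresh blocks stay hot. */
  static HColumnDescriptor compressedFamily() {
    HColumnDescriptor cf = new HColumnDescriptor("c");
    cf.setCacheDataOnWrite(true);
    return cf;
  }
}
```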
  28. Flexible Retention Policies (diagram): Raw – hours; Compressed – months; Aggregates – years
  29–31. Read/Write IO Reduction (chart; estimate for 250K data points/sec): Base = 100 (50–100 MB/s); FIFO+ECPM ≈ 50 (25–50 MB/s); +Compaction ≈ 10 (5–10 MB/s)
  32. Summary
     • Disable major compaction
     • Do not run the HDFS balancer
     • Disable HBase auto region balancing: balance_switch false
     • Disable region splits (DisabledRegionSplitPolicy)
     • Pre-split the table in advance
     • Have separate column families for raw, compressed and aggregated data (each aggregate resolution gets its own family)
     • Increase hbase.hstore.blockingStoreFiles for all column families
     • FIFO for raw data; ECPM(D) or DTCP (next session) for compressed and aggregated data (table-level sketch below)
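A sketch of the table-level items from this list in the HBase 1.x Java API; the split count and the blockingStoreFiles value are illustrative assumptions (balance_switch and the HDFS balancer are operational settings, not table properties):

```java
import java.io.IOException;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.util.RegionSplitter;

public class TableTuning {
  static void create(Admin admin) throws IOException {
    HTableDescriptor table = new HTableDescriptor(TableName.valueOf("tsdb"));
    table.addFamily(new HColumnDescriptor("r"));   // plus "c", "a", ... families
    // Disable region splits for this table.
    table.setRegionSplitPolicyClassName(
        "org.apache.hadoop.hbase.regionserver.DisabledRegionSplitPolicy");
    // Raise the blocking-store-files limit (illustrative value).
    table.setConfiguration("hbase.hstore.blockingStoreFiles", "200");
    // Pre-split into 32 regions (illustrative count) on hex-string row keys.
    byte[][] splits = new RegionSplitter.HexStringSplit().split(32);
    admin.createTable(table, splits);
  }
}
```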
  33. Summary (continued)
     • Periodically run an internal job (coprocessor) to compress data and produce time-based rollup aggregations
     • Do not cache raw data; enable the write/read cache for the others (if ECPM(D))
     • Enable WAL compression – decreases write IO
     • Use maximum compression for raw data (GZ) – decreases write IO (sketch below)
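The GZ setting is per column family; WAL compression is a server-side property. A sketch, again with the illustrative raw-family name:

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.io.compress.Compression;

public class RawCompression {
  /** Per-CF: maximum compression (GZ) for raw data to cut write IO.
      WAL compression is set server-side in hbase-site.xml:
      hbase.regionserver.wal.enablecompression = true */
  static HColumnDescriptor build() {
    HColumnDescriptor raw = new HColumnDescriptor("r");
    raw.setCompressionType(Compression.Algorithm.GZ);
    return raw;
  }
}
```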
  34. Read Path (for 1%)
  35. SQL (Phoenix) integration
     • Each time series has a set of named attributes, which we call meta (tags in OpenTSDB)
     • Keep the time-series meta in Phoenix table(s)
     • Adding, deleting or updating a time series is a DML/DDL operation on a Phoenix table
     • Meta is (mostly) static
     • Define the set of attributes in meta that forms the PK
     • Translate the PK to a unique ID
     • Store ID, RTS (reversed timestamp), VALUE in HBase
     • Now you can index time series by any attribute(s) in Phoenix
     • A query is a two-step process: Phoenix first, to select the list of IDs, then HBase, to run the query on the ID list (see the sketches below)
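A minimal sketch of what such a META table could look like in Phoenix, created over the Phoenix JDBC driver; the table and column names are illustrative (loosely following the columns shown on the next slide), not the talk's actual schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateMetaTable {
  public static void main(String[] args) throws Exception {
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
         Statement st = conn.createStatement()) {
      st.execute(
          "CREATE TABLE IF NOT EXISTS META ("
        + "  MFG     VARCHAR NOT NULL,"   // attributes forming the PK
        + "  SERIAL  VARCHAR NOT NULL,"
        + "  ID      BIGINT,"             // unique time-series ID used in HBase
        + "  ACTIVE  BOOLEAN,"
        + "  VERSION VARCHAR,"
        + "  CONSTRAINT pk PRIMARY KEY (MFG, SERIAL))");
      // Any attribute can be indexed for fast lookup.
      st.execute("CREATE INDEX IF NOT EXISTS META_VER ON META (VERSION)");
    }
  }
}
```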
  36. Query Flow (diagram; see the sketch below)
     Step 1 – Phoenix SQL against the time-series definition (META):
       SELECT ID FROM META WHERE MFG = 'SA' AND VERSION = '1.1'
       META columns: ID | ACTIVE | VERSION | … | MFG (e.g. 11 | true | 1.1 | SA; 12 | true | 1.3 | SA; 15 | true | 1.4 | GE; 17 | true | 1.1 | GE; …; 345 | false | 1.0 | SA)
     Step 2 – HBase Time Series DB against the time-series data, using the resulting ID set:
       GetAvgByIdSet(ID set, now(), now() - 24h)
       Data columns: ID | TIMESTAMP | VALUE (e.g. 11 | 143897653 | 10.0; 12 | 143897753 | 11.3; 15 | 143897953 | 11.6; 17 | 143897853 | 11.9; …; 345 | 143897753 | 11.0)
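A sketch of the two-step flow in Java: step 1 is a real Phoenix JDBC query against the illustrative META table above; step 2 calls a hypothetical GetAvgByIdSet-style client for the coprocessor API, which is not a real library call:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class TwoStepQuery {
  static List<Long> selectIds() throws Exception {
    // Step 1: Phoenix resolves the ID set from META.
    List<Long> ids = new ArrayList<>();
    try (Connection conn = DriverManager.getConnection("jdbc:phoenix:localhost");
         Statement st = conn.createStatement();
         ResultSet rs = st.executeQuery(
             "SELECT ID FROM META WHERE MFG = 'SA' AND VERSION = '1.1'")) {
      while (rs.next()) ids.add(rs.getLong(1));
    }
    return ids;
    // Step 2 (hypothetical coprocessor client, not a real API):
    // double avg = tsdb.getAvgByIdSet(ids, now - 24h, now);
  }
}
```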
  37–38. Time-Series DB API
     • Group operations on ID sets by time range
       – Min, Max, Avg, Count, Sum, other aggregations
     • Pluggable aggregation functions
     • Support for different time resolutions
     • With different approximations (linear, cubic, bi-cubic)
     • Batch load support (for writes)
     • Can be implemented in an HBase coprocessor layer
     • Can work much, much faster than a regular SQL DBMS…
     • …because we already have aggregated data (interface sketch below)
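What this API could look like as a Java interface, under the assumption that it is fronted by a client library; all names and signatures here are illustrative, since the talk describes the capabilities but not a concrete API:

```java
import java.util.List;
import java.util.Map;

public interface TimeSeriesDb {
  enum Aggregation { MIN, MAX, AVG, COUNT, SUM }
  enum Resolution { BY_MIN, BY_10MIN, BY_HOUR, BY_DAY }

  /** Group operation over a set of series IDs and a time range [startMs, endMs);
      returns, per ID, one aggregated value per resolution bucket. */
  Map<Long, double[]> aggregate(List<Long> ids, long startMs, long endMs,
                                Aggregation fn, Resolution res);

  /** Batch load of [ID][TIME][VALUE] triplets (arrays are index-aligned). */
  void putBatch(long[] ids, long[] timesMs, double[] values);
}
```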
  39. Thank you • Q&A
