SlideShare a Scribd company logo
1 of 17
TIME SERIES IN HBASE
Staff Software Engineer
VLADIMIR RODIONOV
2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
TIME SERIES
 Sequence of data points
 Triplet: [ID][TIME][VALUE] – basic
 Multiplet: [ID][TIME][TAG1][…][TAGN][VALUE]
 Stock Closing Value DJIA
 User behavior (web clicks)
 Credit card transactions
 Health data
 Fitness indicators
 Sensor data (IoT)
 Application and system metrics - ODS
3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Time Series DB requirements
 Data Store MUST preserve temporal locality of data for better in-memory caching
– Facebook ODS : 85% queries are for last 26 hours
 Data Store MUST provide efficient compression
– Time – series are highly compressible (less than 2 bytes per data point in some cases)
– Facebook custom compression codec produces less than 1.4 bytes per data point
 Data Store MUST provide automatic time-based rollup aggregations: sum, count, avg,
min, max, etc., by min, hour, day and so on – configurable. Most of the time its
aggregated data we are interested in.
 Efficient caching policy (RAM/SSD), presumably FIFO
 SQL API (nice to have, but it is optional)
4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
OpenTSDB 2.x
 Data Store MUST preserve temporal locality of data for better in-memory caching – NO
– Size-Tiered HBase compaction does not preserve temporal locality of data. Major compaction
creates single file, for example, where recent data is stored with data which is months or years old.
– Compaction trashes block cache as well decreases read performance and increases latencies.
 Data Store MUST provide efficient compression – NO
– OpenTSDB supports compression, but its very heavy (runs externally) and usually users disable it in
production.
 Data Store MUST provide automatic time-based rollup aggregations – NOT
IMPLEMENTED
 SQL – Not supported
5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Ideal HBase Time Series DB
 Keeps raw data for hours
 Does not compact raw data at all
 Preserves raw data in memory cache for periodic compactions and time-based rollup
aggregations
 Stores full resolution data only in compressed form
 Has different TTL for different aggregation resolutions:
– Days for by_min, by_10min etc.
– Months, years for by_hour
 Compaction should preserve temporal locality of both: full resolution data and
aggregated data.
 FIFO block cache
 Integration with Phoenix (SQL)
6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Time Series DB HBase
Raw Events
Region Server
HDFS
CF:Compressed
CF:Raw
CF:Aggregates
C
A
C
A
Compressor Coprocessor
Aggregator Coprocessor
CF:Aggregates
CF:Compressed – TTL days/months
CF:Aggregates – TTL months/years (CF per resolution)
CF:Raw – TTL hours
7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HBASE-14468 FIFO compaction
 First-In-First-Out
 No compaction at all
 TTL expired data just get archived
 Ideal for raw data storage
 No compaction – no block cache trashing
 Raw data can be cached on write or on read
 Sustains 100s MB/s write throughput per RS
 Available 0.98.17, 1.2+, HDP-2.4+
 Can be easily back ported to 1.0-1.1
8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Exploring (Size-Tiered) Compaction
 Does not preserve temporal locality of data.
 Compaction trashes block cache
 No efficient caching of data is possible
 It hurts most-recent-most-valuable data access pattern.
 Compression/Aggregation is very heavy.
 To read back recent raw data and run it through compressor, many IO operations are
required, because …
 We can’t guarantee recent data in a block cache.
9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HBASE-15181 Date Tiered Compaction
 DateTieredCompactionPolicy
 CASSANDRA-6602
 Works better for time series than ExploringCompactionPolicy
 Better temporal locality helps with reads
 Good choice for compressed full resolution and aggregated data.
 Available in 0.98.17, 1.2+, HDP-2.4 will have it as well
 But, too many knobs to control
10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Date Tiered Compaction Policy
11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Exploring Compaction + Max Size
 Set hbase.hstore.compaction.max.size
 This efficiently emulates Date-Tiered Compaction
 Preserves temporal locality of data.
 Compaction does not trash block cache
 Efficient caching of recent data is possible
 Good for most-recent-most-valuable data access pattern.
 Use it for compressed and aggregated data
 Helps to keep recent data in a block cache.
 ECPM
12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HBASE-14496 Delayed compaction
 Files are eligible for minor compaction if their age > delay
 Good for application where most recent data is most valuable.
 Prevents block cache from trashing for recent data due to frequent minor compactions
of a fresh store files
 Will enable this feature for Exploring Compaction Policy
 Improves read latency for most recent data.
 ECP + Max +Delay (1-2 days) is good option for compressed full resolution and
aggregated data. ECPMD
 Patch available.
 HBase 1.0+ (can be back-ported to 0.98)
13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Time Series DB HBase
Raw Events
Region Server
HDFS
CF:Compressed
CF:Raw
CF:Aggregates
C
A
C
A
Compressor Coprocessor
Aggregator Coprocessor
CF:Aggregates
CF:Compressed – TTL days/months
CF:Aggregates – TTL months/years (CF per resolution)
CF:Raw – TTL hours
ECPM(D)
FIFO
ECPM(D)
14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
HBase Block Cache
 Current policy (LRU) is not optimal for time-series applications
 We need something similar to FIFO (both in RAM and on SSD)
 We need support for TB size RAM/SSD-based caches
 Current off-heap bucket cache does not scale well (it keeps keys in Java heap)
 For SSD cache we could mirror most recent store files, thus providing FIFO semantics
w/o any complexity of disk-based cache management.
 Today …
– Disable cache for raw data
– Enable cache on write/read for compressed data and aggregations
15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Summary
 Disable major compaction
 Do not run HDFS balancer
 Disable HBase auto region balancing: balance_switch false
 Disable region splits (DisabledRegionSplitPolicy)
 Presplit table in advance.
 Have separate column families for raw, compressed and aggregated data (each
aggregate resolution – its own family)
 Increase hbase.hstore.blockingStoreFiles for all column families
 FIFO for Raw, ECPM(D) or DTCP for compressed and aggregations
16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Summary (continued)
 Run periodically internal job (coprocessor) to compress data and produce time-based
rollup aggregations.
 Do not cache raw data, write/read cache for others (if ECPM(D))
 Enable WAL Compression, use maximum compression for Raw data.
17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Thank you
 Q&A

More Related Content

What's hot

Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
DataWorks Summit
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
enissoz
 

What's hot (20)

HBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ SalesforceHBaseCon 2015: HBase Performance Tuning @ Salesforce
HBaseCon 2015: HBase Performance Tuning @ Salesforce
 
File Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & ParquetFile Format Benchmark - Avro, JSON, ORC & Parquet
File Format Benchmark - Avro, JSON, ORC & Parquet
 
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of FacebookTech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
Tech Talk: RocksDB Slides by Dhruba Borthakur & Haobo Xu of Facebook
 
Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive

Apache Kudu: Technical Deep Dive


Apache Kudu: Technical Deep Dive


 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and Hudi
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Apache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data ProcessingApache Tez - A New Chapter in Hadoop Data Processing
Apache Tez - A New Chapter in Hadoop Data Processing
 
HBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBaseHBase and HDFS: Understanding FileSystem Usage in HBase
HBase and HDFS: Understanding FileSystem Usage in HBase
 
HBase Application Performance Improvement
HBase Application Performance ImprovementHBase Application Performance Improvement
HBase Application Performance Improvement
 
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data AnalyticsApache Ignite vs Alluxio: Memory Speed Big Data Analytics
Apache Ignite vs Alluxio: Memory Speed Big Data Analytics
 
What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?What is new in Apache Hive 3.0?
What is new in Apache Hive 3.0?
 
RocksDB Performance and Reliability Practices
RocksDB Performance and Reliability PracticesRocksDB Performance and Reliability Practices
RocksDB Performance and Reliability Practices
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
 
ORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big DataORC File - Optimizing Your Big Data
ORC File - Optimizing Your Big Data
 
Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem Apache NiFi in the Hadoop Ecosystem
Apache NiFi in the Hadoop Ecosystem
 
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested ColumnsMaterialized Column: An Efficient Way to Optimize Queries on Nested Columns
Materialized Column: An Efficient Way to Optimize Queries on Nested Columns
 
Facebook Messages & HBase
Facebook Messages & HBaseFacebook Messages & HBase
Facebook Messages & HBase
 
Ozone- Object store for Apache Hadoop
Ozone- Object store for Apache HadoopOzone- Object store for Apache Hadoop
Ozone- Object store for Apache Hadoop
 
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBaseHBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
HBaseCon 2013: Apache HBase and HDFS - Understanding Filesystem Usage in HBase
 

Viewers also liked

Viewers also liked (6)

Apache HBase: State of the Union
Apache HBase: State of the UnionApache HBase: State of the Union
Apache HBase: State of the Union
 
Date-tiered Compaction Policy for Time-series Data
Date-tiered Compaction Policy for Time-series DataDate-tiered Compaction Policy for Time-series Data
Date-tiered Compaction Policy for Time-series Data
 
Apache Spark on Apache HBase: Current and Future
Apache Spark on Apache HBase: Current and Future Apache Spark on Apache HBase: Current and Future
Apache Spark on Apache HBase: Current and Future
 
Improvements to Apache HBase and Its Applications in Alibaba Search
Improvements to Apache HBase and Its Applications in Alibaba Search Improvements to Apache HBase and Its Applications in Alibaba Search
Improvements to Apache HBase and Its Applications in Alibaba Search
 
Apache HBase - Just the Basics
Apache HBase - Just the BasicsApache HBase - Just the Basics
Apache HBase - Just the Basics
 
Apache HBase at Airbnb
Apache HBase at Airbnb Apache HBase at Airbnb
Apache HBase at Airbnb
 

Similar to Time-Series Apache HBase

Similar to Time-Series Apache HBase (20)

IoT:what about data storage?
IoT:what about data storage?IoT:what about data storage?
IoT:what about data storage?
 
Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...Dancing elephants - efficiently working with object stores from Apache Spark ...
Dancing elephants - efficiently working with object stores from Apache Spark ...
 
Apache Hive ACID Project
Apache Hive ACID ProjectApache Hive ACID Project
Apache Hive ACID Project
 
ACID Transactions in Hive
ACID Transactions in HiveACID Transactions in Hive
ACID Transactions in Hive
 
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
Dancing Elephants - Efficiently Working with Object Stores from Apache Spark ...
 
LLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in HiveLLAP: Sub-Second Analytical Queries in Hive
LLAP: Sub-Second Analytical Queries in Hive
 
Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016Hive ACID Apache BigData 2016
Hive ACID Apache BigData 2016
 
Apache Hive on ACID
Apache Hive on ACIDApache Hive on ACID
Apache Hive on ACID
 
Apache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community UpdateApache Hadoop 3.0 Community Update
Apache Hadoop 3.0 Community Update
 
Apache Hive on ACID
Apache Hive on ACIDApache Hive on ACID
Apache Hive on ACID
 
Hadoop Research
Hadoop Research Hadoop Research
Hadoop Research
 
Evolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage SubsystemEvolving HDFS to a Generalized Storage Subsystem
Evolving HDFS to a Generalized Storage Subsystem
 
Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019Hive Performance Dataworks Summit Melbourne February 2019
Hive Performance Dataworks Summit Melbourne February 2019
 
Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?Fast SQL on Hadoop, Really?
Fast SQL on Hadoop, Really?
 
Operating and supporting HBase Clusters
Operating and supporting HBase ClustersOperating and supporting HBase Clusters
Operating and supporting HBase Clusters
 
Operating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and ImprovementsOperating and Supporting Apache HBase Best Practices and Improvements
Operating and Supporting Apache HBase Best Practices and Improvements
 
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...Building a high-performance data lake analytics engine at Alibaba Cloud with ...
Building a high-performance data lake analytics engine at Alibaba Cloud with ...
 
Hadoop project design and a usecase
Hadoop project design and  a usecaseHadoop project design and  a usecase
Hadoop project design and a usecase
 
How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...How the Development Bank of Singapore solves on-prem compute capacity challen...
How the Development Bank of Singapore solves on-prem compute capacity challen...
 
VM-aware Adaptive Storage Cache Prefetching
VM-aware Adaptive Storage Cache PrefetchingVM-aware Adaptive Storage Cache Prefetching
VM-aware Adaptive Storage Cache Prefetching
 

More from HBaseCon

More from HBaseCon (20)

hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kuberneteshbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
hbaseconasia2017: Building online HBase cluster of Zhihu based on Kubernetes
 
hbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beamhbaseconasia2017: HBase on Beam
hbaseconasia2017: HBase on Beam
 
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huaweihbaseconasia2017: HBase Disaster Recovery Solution at Huawei
hbaseconasia2017: HBase Disaster Recovery Solution at Huawei
 
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinteresthbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
hbaseconasia2017: Removable singularity: a story of HBase upgrade in Pinterest
 
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
hbaseconasia2017: HareQL:快速HBase查詢工具的發展過程
 
hbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Neteasehbaseconasia2017: Apache HBase at Netease
hbaseconasia2017: Apache HBase at Netease
 
hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践hbaseconasia2017: HBase在Hulu的使用和实践
hbaseconasia2017: HBase在Hulu的使用和实践
 
hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台hbaseconasia2017: 基于HBase的企业级大数据平台
hbaseconasia2017: 基于HBase的企业级大数据平台
 
hbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.comhbaseconasia2017: HBase at JD.com
hbaseconasia2017: HBase at JD.com
 
hbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecturehbaseconasia2017: Large scale data near-line loading method and architecture
hbaseconasia2017: Large scale data near-line loading method and architecture
 
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huaweihbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
hbaseconasia2017: Ecosystems with HBase and CloudTable service at Huawei
 
hbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMihbaseconasia2017: HBase Practice At XiaoMi
hbaseconasia2017: HBase Practice At XiaoMi
 
hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0hbaseconasia2017: hbase-2.0.0
hbaseconasia2017: hbase-2.0.0
 
HBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBaseHBaseCon2017 Democratizing HBase
HBaseCon2017 Democratizing HBase
 
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in PinterestHBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
HBaseCon2017 Removable singularity: a story of HBase upgrade in Pinterest
 
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBaseHBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
HBaseCon2017 Quanta: Quora's hierarchical counting system on HBase
 
HBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBaseHBaseCon2017 Transactions in HBase
HBaseCon2017 Transactions in HBase
 
HBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBaseHBaseCon2017 Highly-Available HBase
HBaseCon2017 Highly-Available HBase
 
HBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at DidiHBaseCon2017 Apache HBase at Didi
HBaseCon2017 Apache HBase at Didi
 
HBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase ClientHBaseCon2017 gohbase: Pure Go HBase Client
HBaseCon2017 gohbase: Pure Go HBase Client
 

Recently uploaded

Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
chiefasafspells
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
masabamasaba
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
shinachiaurasa2
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
masabamasaba
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
Health
 

Recently uploaded (20)

Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
Love witchcraft +27768521739 Binding love spell in Sandy Springs, GA |psychic...
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare%in Harare+277-882-255-28 abortion pills for sale in Harare
%in Harare+277-882-255-28 abortion pills for sale in Harare
 
8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students8257 interfacing 2 in microprocessor for btech students
8257 interfacing 2 in microprocessor for btech students
 
WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?WSO2CON 2024 - Does Open Source Still Matter?
WSO2CON 2024 - Does Open Source Still Matter?
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
Crypto Cloud Review - How To Earn Up To $500 Per DAY Of Bitcoin 100% On AutoP...
 
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
%+27788225528 love spells in Colorado Springs Psychic Readings, Attraction sp...
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
WSO2Con2024 - From Code To Cloud: Fast Track Your Cloud Native Journey with C...
 
Artyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptxArtyushina_Guest lecture_YorkU CS May 2024.pptx
Artyushina_Guest lecture_YorkU CS May 2024.pptx
 
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni%in Benoni+277-882-255-28 abortion pills for sale in Benoni
%in Benoni+277-882-255-28 abortion pills for sale in Benoni
 
What Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the SituationWhat Goes Wrong with Language Definitions and How to Improve the Situation
What Goes Wrong with Language Definitions and How to Improve the Situation
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 

Time-Series Apache HBase

  • 1. TIME SERIES IN HBASE Staff Software Engineer VLADIMIR RODIONOV
  • 2. 2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved TIME SERIES  Sequence of data points  Triplet: [ID][TIME][VALUE] – basic  Multiplet: [ID][TIME][TAG1][…][TAGN][VALUE]  Stock Closing Value DJIA  User behavior (web clicks)  Credit card transactions  Health data  Fitness indicators  Sensor data (IoT)  Application and system metrics - ODS
  • 3. 3 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Time Series DB requirements  Data Store MUST preserve temporal locality of data for better in-memory caching – Facebook ODS : 85% queries are for last 26 hours  Data Store MUST provide efficient compression – Time – series are highly compressible (less than 2 bytes per data point in some cases) – Facebook custom compression codec produces less than 1.4 bytes per data point  Data Store MUST provide automatic time-based rollup aggregations: sum, count, avg, min, max, etc., by min, hour, day and so on – configurable. Most of the time its aggregated data we are interested in.  Efficient caching policy (RAM/SSD), presumably FIFO  SQL API (nice to have, but it is optional)
  • 4. 4 © Hortonworks Inc. 2011 – 2016. All Rights Reserved OpenTSDB 2.x  Data Store MUST preserve temporal locality of data for better in-memory caching – NO – Size-Tiered HBase compaction does not preserve temporal locality of data. Major compaction creates single file, for example, where recent data is stored with data which is months or years old. – Compaction trashes block cache as well decreases read performance and increases latencies.  Data Store MUST provide efficient compression – NO – OpenTSDB supports compression, but its very heavy (runs externally) and usually users disable it in production.  Data Store MUST provide automatic time-based rollup aggregations – NOT IMPLEMENTED  SQL – Not supported
  • 5. 5 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Ideal HBase Time Series DB  Keeps raw data for hours  Does not compact raw data at all  Preserves raw data in memory cache for periodic compactions and time-based rollup aggregations  Stores full resolution data only in compressed form  Has different TTL for different aggregation resolutions: – Days for by_min, by_10min etc. – Months, years for by_hour  Compaction should preserve temporal locality of both: full resolution data and aggregated data.  FIFO block cache  Integration with Phoenix (SQL)
  • 6. 6 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Time Series DB HBase Raw Events Region Server HDFS CF:Compressed CF:Raw CF:Aggregates C A C A Compressor Coprocessor Aggregator Coprocessor CF:Aggregates CF:Compressed – TTL days/months CF:Aggregates – TTL months/years (CF per resolution) CF:Raw – TTL hours
  • 7. 7 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HBASE-14468 FIFO compaction  First-In-First-Out  No compaction at all  TTL expired data just get archived  Ideal for raw data storage  No compaction – no block cache trashing  Raw data can be cached on write or on read  Sustains 100s MB/s write throughput per RS  Available 0.98.17, 1.2+, HDP-2.4+  Can be easily back ported to 1.0-1.1
  • 8. 8 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Exploring (Size-Tiered) Compaction  Does not preserve temporal locality of data.  Compaction trashes block cache  No efficient caching of data is possible  It hurts most-recent-most-valuable data access pattern.  Compression/Aggregation is very heavy.  To read back recent raw data and run it through compressor, many IO operations are required, because …  We can’t guarantee recent data in a block cache.
  • 9. 9 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HBASE-15181 Date Tiered Compaction  DateTieredCompactionPolicy  CASSANDRA-6602  Works better for time series than ExploringCompactionPolicy  Better temporal locality helps with reads  Good choice for compressed full resolution and aggregated data.  Available in 0.98.17, 1.2+, HDP-2.4 will have it as well  But, too many knobs to control
  • 10. 10 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Date Tiered Compaction Policy
  • 11. 11 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Exploring Compaction + Max Size  Set hbase.hstore.compaction.max.size  This efficiently emulates Date-Tiered Compaction  Preserves temporal locality of data.  Compaction does not trash block cache  Efficient caching of recent data is possible  Good for most-recent-most-valuable data access pattern.  Use it for compressed and aggregated data  Helps to keep recent data in a block cache.  ECPM
  • 12. 12 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HBASE-14496 Delayed compaction  Files are eligible for minor compaction if their age > delay  Good for application where most recent data is most valuable.  Prevents block cache from trashing for recent data due to frequent minor compactions of a fresh store files  Will enable this feature for Exploring Compaction Policy  Improves read latency for most recent data.  ECP + Max +Delay (1-2 days) is good option for compressed full resolution and aggregated data. ECPMD  Patch available.  HBase 1.0+ (can be back-ported to 0.98)
  • 13. 13 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Time Series DB HBase Raw Events Region Server HDFS CF:Compressed CF:Raw CF:Aggregates C A C A Compressor Coprocessor Aggregator Coprocessor CF:Aggregates CF:Compressed – TTL days/months CF:Aggregates – TTL months/years (CF per resolution) CF:Raw – TTL hours ECPM(D) FIFO ECPM(D)
  • 14. 14 © Hortonworks Inc. 2011 – 2016. All Rights Reserved HBase Block Cache  Current policy (LRU) is not optimal for time-series applications  We need something similar to FIFO (both in RAM and on SSD)  We need support for TB size RAM/SSD-based caches  Current off-heap bucket cache does not scale well (it keeps keys in Java heap)  For SSD cache we could mirror most recent store files, thus providing FIFO semantics w/o any complexity of disk-based cache management.  Today … – Disable cache for raw data – Enable cache on write/read for compressed data and aggregations
  • 15. 15 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Summary  Disable major compaction  Do not run HDFS balancer  Disable HBase auto region balancing: balance_switch false  Disable region splits (DisabledRegionSplitPolicy)  Presplit table in advance.  Have separate column families for raw, compressed and aggregated data (each aggregate resolution – its own family)  Increase hbase.hstore.blockingStoreFiles for all column families  FIFO for Raw, ECPM(D) or DTCP for compressed and aggregations
  • 16. 16 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Summary (continued)  Run periodically internal job (coprocessor) to compress data and produce time-based rollup aggregations.  Do not cache raw data, write/read cache for others (if ECPM(D))  Enable WAL Compression, use maximum compression for Raw data.
  • 17. 17 © Hortonworks Inc. 2011 – 2016. All Rights Reserved Thank you  Q&A