OpenTSDB 2.x
Distributed, Scalable Time Series Database
Benoit Sigoure tsunanet@gmail.com
Chris Larsen clarsen@yahoo-inc.com
Who We Are
Benoit Sigoure
● Created OpenTSDB at StumbleUpon
● Software Engineer @ Arista Networks
Chris Larsen
● Release manager for OpenTSDB 2.x
● Software Engineer @ Yahoo
What Is OpenTSDB?
● Open Source Time Series Database
● Store trillions of data points
● Sucks up all data and keeps going
● Never lose precision
● Scales using HBase
What good is it?
● Systems Monitoring & Measurement
o Servers
o Networks
● Sensor Data
o The Internet of Things
o SCADA
● Financial Data
● Scientific Experiment Results
Use Cases
● > 100 Region Servers ~ 30TB
● 60 TSDs
● 600,000 writes per second
● Primary data store for operational data
● Telemetry from StatsD, Ostrich and derived
data from Storm
Use Cases
● Monitoring application performance and
statistics
● 50 region servers, 2.4M writes/s ~ 200TB
● Multi-tenant and Kerberos secure HBase
● ~200k writes per second per TSD
● Central monitoring for all Yahoo properties
● Over a billion time series served
Some Other Users
● Box: 23 servers, 90K wps; system, app, network and business metrics
● Limelight Networks: 8 servers, 30K wps, 24TB of data
● Ticketmaster: 13 servers, 90K wps, ~40GB a day
What Are Time Series?
● Time Series: data points for an identity
over time
● Typical Identity:
o Dotted string: web01.sys.cpu.user.0
● OpenTSDB Identity:
o Metric: sys.cpu.user
o Tags (name/value pairs):
host=web01 cpu=0
What Are Time Series?
Data Point:
● Metric + Tags
● + Value: 42
● + Timestamp: 1234567890
sys.cpu.user 1234567890 42 host=web01 cpu=0
^ a data point ^
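To make the anatomy of a data point concrete, here is a minimal Python sketch that splits the line above into metric, timestamp, value and tags. The `DataPoint` class and `parse_data_point` function are hypothetical illustrations, not OpenTSDB code, and the validation OpenTSDB performs (allowed characters, tag count limits) is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, Union

@dataclass
class DataPoint:
    metric: str
    timestamp: int
    value: Union[int, float]
    tags: Dict[str, str] = field(default_factory=dict)

def parse_data_point(line: str) -> DataPoint:
    """Split 'metric timestamp value tagk=tagv ...' into its parts."""
    metric, timestamp, value, *tag_pairs = line.split()
    try:
        parsed = int(value)        # keep integers exact
    except ValueError:
        parsed = float(value)      # otherwise treat the value as floating point
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    return DataPoint(metric, int(timestamp), parsed, tags)

dp = parse_data_point("sys.cpu.user 1234567890 42 host=web01 cpu=0")
```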
How it Works
Writing Data
1) Open a Telnet-style socket and write:
put sys.cpu.user 1234567890 42 host=web01 cpu=0
2) ...or POST JSON to:
http://<host>:<port>/api/put
3) ...or import big files with the CLI
● No schema definition
● No RRD file creation
● Just write!
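A sketch of the first two write paths, assuming a TSD listening on the default port 4242: `telnet_put` builds the Telnet-style `put` line, and `http_put_body` builds a single data point object for `/api/put`. All helper names are hypothetical; the network helpers are defined but not called, so the formatting can be checked offline.

```python
import json
import socket
import urllib.request

def telnet_put(metric, timestamp, value, tags):
    """Build one newline-terminated line for the Telnet-style 'put' command."""
    tag_str = " ".join(f"{k}={v}" for k, v in tags.items())
    return f"put {metric} {timestamp} {value} {tag_str}\n"

def http_put_body(metric, timestamp, value, tags):
    """Build a JSON object describing one data point for POST /api/put."""
    return json.dumps({"metric": metric, "timestamp": timestamp,
                       "value": value, "tags": tags})

def send_telnet(host, port, line):
    """Write a 'put' line over a plain TCP socket."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(line.encode("utf-8"))

def send_http(host, port, body):
    """POST the JSON body to the TSD's /api/put endpoint and return the status code."""
    req = urllib.request.Request(
        f"http://{host}:{port}/api/put", data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status

tags = {"host": "web01", "cpu": "0"}
line = telnet_put("sys.cpu.user", 1234567890, 42, tags)
body = http_put_body("sys.cpu.user", 1234567890, 42, tags)
```

Usage against a live TSD would be `send_telnet("localhost", 4242, line)` or `send_http("localhost", 4242, body)`.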
Querying Data
● Graph with the GUI
● CLI tools
● HTTP API
● Aggregate multiple series
● Simple query language
To average all CPUs on a host:
start=1h-ago
avg sys.cpu.user
host=web01
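The same query expressed against the 2.x HTTP API, as a sketch: `start` carries the relative start time and `m` carries `aggregator:metric{tags}`. Because only `host` is pinned, every `cpu` series on web01 is averaged together. The `build_query_url` helper and the `localhost:4242` address are assumptions for illustration.

```python
from urllib.parse import urlencode

def build_query_url(host, port, start, aggregator, metric, tags):
    """Build a GET /api/query URL whose 'm' parameter is aggregator:metric{tags}."""
    sub_query = f"{aggregator}:{metric}"
    if tags:
        sub_query += "{" + ",".join(f"{k}={v}" for k, v in tags.items()) + "}"
    params = urlencode({"start": start, "m": sub_query})  # percent-encodes ':', '{', '='
    return f"http://{host}:{port}/api/query?{params}"

url = build_query_url("localhost", 4242, "1h-ago", "avg", "sys.cpu.user", {"host": "web01"})
```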
HBase Data Tables
● tsdb - Data point table. Massive
● tsdb-uid - Name to UID and UID to
name mappings
● tsdb-meta - Time series index and
meta-data
● tsdb-tree - Config and index for
hierarchical naming schema
Data Table Schema
● Row key is a concatenation of UIDs and time:
o metric + timestamp + tagk1 + tagv1… + tagkN + tagvN
● sys.cpu.user 1234567890 42 host=web01 cpu=0
\x00\x00\x01\x49\x95\xFB\x70\x00\x00\x01\x00\x00\x01\x00\x00\x02\x00\x00\x02
● Timestamp normalized on 1 hour boundaries
● All data points for an hour are stored in one row
● Enables fast scans of all time series for a metric
● …or pass a row key filter for specific time series with
particular tags
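The row key layout can be sketched in a few lines of Python. UIDs are assumed to be 3 bytes wide (OpenTSDB's default), tags are assumed to be sorted by tag key UID, and the UID values (`sys.cpu.user`=1, `host`=1, `web01`=1, `cpu`=2, `0`=2) are read off the example bytes above; the helper names are hypothetical. Note how 1234567890 is normalized down to the hour boundary 1234566000 (0x4995FB70).

```python
import struct

HOUR = 3600

def uid(n: int) -> bytes:
    """Encode an integer as a 3-byte big-endian UID (assumed default width)."""
    return n.to_bytes(3, "big")

def base_time(timestamp: int) -> int:
    """Normalize a Unix timestamp down to its hour boundary."""
    return timestamp - timestamp % HOUR

def row_key(metric_uid: bytes, timestamp: int, tag_uids) -> bytes:
    """metric UID + 4-byte base time + (tagk, tagv) UID pairs sorted by tagk."""
    key = metric_uid + struct.pack(">I", base_time(timestamp))
    for tagk, tagv in sorted(tag_uids):
        key += tagk + tagv
    return key

# sys.cpu.user=1, host=1, web01=1, cpu=2, 0=2 (read off the example bytes)
key = row_key(uid(1), 1234567890, [(uid(2), uid(2)), (uid(1), uid(1))])
```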
Salting
1 Metric
+
Tag Cardinality > 1 Million
+
1 Write per Series, per Second
=
1 Sad Region
Salting
● Hash on the metric and tags
● Modulo into one of 20 buckets
● Prepend bucket ID to key
● Writes now hit 20 regions/servers
sys.cpu.user 1234567890 host=web01
\x00\x00\x00\x01\x49\x95\xFB\x70\x00\x00\x01\x00\x00\x01
sys.cpu.user 1234567890 host=web02
\x01\x00\x00\x01\x49\x95\xFB\x70\x00\x00\x01\x00\x00\x02
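A sketch of the salting step. The slide does not say which hash OpenTSDB uses, so `zlib.crc32` here is an assumption chosen for illustration, and its buckets will not necessarily match the example bytes above. The important property is that the timestamp is left out of the hash, so every hour of a given series lands in the same bucket. The sketch assumes an unsalted key with a 3-byte metric UID and a 4-byte base time, as on the Data Table Schema slide; `salt_key` is a hypothetical name.

```python
import zlib

SALT_BUCKETS = 20   # bucket count from the slide
METRIC_WIDTH = 3    # assumed metric UID width in bytes
TS_WIDTH = 4        # base-time width in bytes

def salt_key(unsalted_key: bytes, buckets: int = SALT_BUCKETS) -> bytes:
    """Prepend a 1-byte bucket ID computed from the metric and tag UIDs only."""
    series_id = unsalted_key[:METRIC_WIDTH] + unsalted_key[METRIC_WIDTH + TS_WIDTH:]
    bucket = zlib.crc32(series_id) % buckets   # illustrative hash, not OpenTSDB's
    return bytes([bucket]) + unsalted_key

web01 = bytes.fromhex("000001" "4995FB70" "000001" "000001")
web01_next_hour = bytes.fromhex("000001" "49960980" "000001" "000001")
web02 = bytes.fromhex("000001" "4995FB70" "000001" "000002")

salted = [salt_key(k) for k in (web01, web01_next_hour, web02)]
```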
Salting
● 1M writes per second to one region
becomes 50K per second to each of 20 regions
= 20 Happy Regions
But wait, there’s more!
● Query with 20 asynchronous scanners...
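On the read side, each salt bucket gets its own scan range. The sketch below assumes a 1-byte salt, a 3-byte metric UID and a 4-byte hour-aligned base time, builds one start/stop row pair per bucket, and merges the per-bucket results by timestamp. `scan_ranges` and `merge_results` are hypothetical helpers; in a real TSD each range would be driven by its own asynchronous scanner.

```python
import heapq
import struct

HOUR = 3600

def scan_ranges(metric_uid: bytes, start: int, end: int, buckets: int = 20):
    """One (start_row, stop_row) pair per salt bucket; stop rows are exclusive."""
    start_base = start - start % HOUR
    stop_base = end - end % HOUR + HOUR      # first hour past the query window
    ranges = []
    for bucket in range(buckets):
        prefix = bytes([bucket]) + metric_uid
        ranges.append((prefix + struct.pack(">I", start_base),
                       prefix + struct.pack(">I", stop_base)))
    return ranges

def merge_results(per_bucket):
    """Merge per-bucket lists of (timestamp, value), each already sorted by timestamp."""
    return list(heapq.merge(*per_bucket))

ranges = scan_ranges(b"\x00\x00\x01", 1234567890, 1234567890)
```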
Salting
Appends
● Row holds one hour of data
o 60 columns @ 1 minute resolution
o 3,600 columns @ 1 second resolution
o 3,600,000 columns @ millisecond resolution
● 3.6M columns, each repeating the row key and timestamp = network overhead
Appends
● Qualifiers encode
○ offset from row base time
○ data type (float or integer)
○ value length (1 to 8 bytes)
● Timestamp = 4 bytes
● Row key = 13 to 55 bytes
● 1 serialized column = 20 to 69 bytes
[ 0b00001111, 0b11000111 ]
12 bits: delta (seconds) | 1 bit: type | 3 bits: value length
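The qualifier layout can be expressed as bit arithmetic. This sketch covers second-resolution, 2-byte qualifiers only and assumes the value length is stored as length minus one, which is how 1 to 8 bytes fit in three bits; the helper names are hypothetical. Decoding the example above yields a delta of 252 seconds, an integer type and an 8-byte value.

```python
FLAG_FLOAT = 0x8    # type bit: set for floating-point values
LENGTH_MASK = 0x7   # low 3 bits: value length minus one
FLAG_BITS = 4       # type bit + length bits

def encode_qualifier(delta_seconds: int, is_float: bool, value_length: int) -> bytes:
    """Pack delta (12 bits), type (1 bit) and length - 1 (3 bits) into 2 bytes."""
    if not 0 <= delta_seconds < 3600:
        raise ValueError("delta must fall inside the row's hour")
    if not 1 <= value_length <= 8:
        raise ValueError("value length must be 1 to 8 bytes")
    flags = (FLAG_FLOAT if is_float else 0) | (value_length - 1)
    return ((delta_seconds << FLAG_BITS) | flags).to_bytes(2, "big")

def decode_qualifier(qualifier: bytes):
    """Return (delta_seconds, is_float, value_length) for a 2-byte qualifier."""
    q = int.from_bytes(qualifier, "big")
    return q >> FLAG_BITS, bool(q & FLAG_FLOAT), (q & LENGTH_MASK) + 1

example = bytes([0b00001111, 0b11000111])
```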
Appends
Compact it into one column!
● Concatenate qualifiers
● Concatenate values
● 1 row key, 1 timestamp
● 1 row from 69K to 14K, 83% savings
Qualifier: [ 0b00000000, 0b00000000, ... 0b00000000, 0b00010000 ]
Value: [ 0b00000001, ... 0b00000002 ]
Key: [ \x00\x00\x00\x01\x49\x95\xFB\x70\x00\x00\x01\x00\x00\x01 ]
Timestamp: [ \x49\x95\xFB\x70 ]
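A sketch of the compaction idea: sort cells by qualifier (which orders them by time offset, since the delta occupies the high bits), then concatenate qualifiers and values. `expand` reverses the process by reading each value length from the low 3 bits of its qualifier. Both helpers are hypothetical, and any extra flag bytes OpenTSDB's on-disk format may carry are ignored.

```python
def compact(cells):
    """Merge (qualifier, value) cells into one (qualifier, value) column."""
    ordered = sorted(cells, key=lambda cell: cell[0])  # high bits hold the delta
    return b"".join(q for q, _ in ordered), b"".join(v for _, v in ordered)

def expand(qualifier: bytes, value: bytes):
    """Split a compacted column back into cells using each qualifier's length bits."""
    cells, pos = [], 0
    for i in range(0, len(qualifier), 2):
        q = qualifier[i:i + 2]
        length = (q[1] & 0x7) + 1       # low 3 bits store length - 1
        cells.append((q, value[pos:pos + length]))
        pos += length
    return cells

# Two 1-byte integer points at offsets 0 s and 1 s, written out of order.
qualifier, value = compact([(b"\x00\x10", b"\x02"), (b"\x00\x00", b"\x01")])
```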
Appends
After each hour, TSDs iterate over the rows written:
● Read row from HBase (via a Get)
● Compact in memory
● Write new column to HBase (via a Put)
● Delete all old columns (via a MultiAction)
Drawbacks:
● Network traversal
● HBase RPC queue
● Cache busting
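The read-compact-write-delete cycle, sketched against a plain dictionary that stands in for an HBase table (a hypothetical `Table` mapping row key to {qualifier: value}). Each numbered step corresponds to one of the operations listed above, which is where the network, RPC-queue and cache costs come from. The sketch assumes the row holds only single-point columns with 2-byte qualifiers.

```python
from typing import Dict

# Hypothetical stand-in for an HBase table: row key -> {qualifier: value}
Table = Dict[bytes, Dict[bytes, bytes]]

def compact_row(table: Table, row_key: bytes) -> int:
    """Compact one row in place; returns how many old columns were deleted."""
    row = table.get(row_key, {})                        # 1) Get the row
    if len(row) < 2:
        return 0                                        # nothing to merge
    ordered = sorted(row.items())                       # order by qualifier/offset
    merged_q = b"".join(q for q, _ in ordered)          # 2) compact in memory
    merged_v = b"".join(v for _, v in ordered)
    row[merged_q] = merged_v                            # 3) Put the new column
    for q, _ in ordered:                                # 4) Delete the old columns
        del row[q]
    return len(ordered)

table = {b"row": {b"\x00\x10": b"\x02", b"\x00\x00": b"\x01"}}
deleted = compact_row(table, b"row")
```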
Appends
Try HBase Appends
● Qualifier special prefix
● Concatenate offsets and values in value array
● Still one key, one timestamp, one column
Qualifier: [ 0b00000100 ]
Value: [ 0b00000000, 0b00000000, 0b00000001, ... 0b00000000, 0b00010000, 0b00000002 ]
(each point = 2-byte offset/type/length qualifier + value)
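A sketch of the append format: each write adds a 2-byte offset/type/length qualifier followed by the value bytes to a single column's value, and a reader walks the array back out. It handles integers only, ignores the special qualifier prefix, and returns points in write order; out-of-order or duplicate writes would need sorting and de-duplication at read time. `append_point` and `read_points` are hypothetical names.

```python
def append_point(cell: bytearray, delta_seconds: int, value: int, length: int = 1) -> None:
    """Append <2-byte qualifier><value bytes> for one integer point to the column value."""
    qualifier = ((delta_seconds << 4) | (length - 1)).to_bytes(2, "big")
    cell.extend(qualifier + value.to_bytes(length, "big", signed=True))

def read_points(cell: bytes):
    """Walk the appended value and decode (delta_seconds, value) pairs in write order."""
    points, pos = [], 0
    while pos < len(cell):
        q = int.from_bytes(cell[pos:pos + 2], "big")
        length = (q & 0x7) + 1
        raw = cell[pos + 2:pos + 2 + length]
        points.append((q >> 4, int.from_bytes(raw, "big", signed=True)))
        pos += 2 + length
    return points

cell = bytearray()
append_point(cell, 0, 1)        # offset 0 s, value 1
append_point(cell, 1, 2)        # offset 1 s, value 2
append_point(cell, 5, -300, 2)  # offset 5 s, 2-byte value
```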
Appends
Reduces RPC count and network traffic but
increases region server CPU usage
[Figure: region server CPU usage, Appends vs. Puts]
Storage Exception Plugin
What to do during...
● Region Splits
● Region Moves
● Server Crashes
● AsyncHBase queues and retries RPCs that hit NSREs (NotServingRegionException)
● Requeue into a message bus: Kafka
● Spool on disk: RocksDB
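OpenTSDB exposes this as a Java plugin; the Python class below only mimics the idea, and all of its names are hypothetical. Failed writes are parked in a bounded `deque` standing in for a Kafka topic or RocksDB spool, then replayed once the region is back, with anything that fails again re-spooled.

```python
from collections import deque

class SpoolingExceptionHandler:
    """Hypothetical handler that parks data points whose writes failed."""

    def __init__(self, max_items: int = 100_000):
        # Bounded spool: when full, the oldest points are silently dropped.
        self.spool = deque(maxlen=max_items)

    def handle_error(self, data_point, error: Exception) -> None:
        """Called when a write fails, e.g. on an NSRE during a split or move."""
        self.spool.append(data_point)

    def replay(self, write) -> int:
        """Retry each spooled point once with `write`; re-spool failures."""
        replayed = 0
        for _ in range(len(self.spool)):
            data_point = self.spool.popleft()
            try:
                write(data_point)
                replayed += 1
            except Exception as exc:
                self.handle_error(data_point, exc)
        return replayed

handler = SpoolingExceptionHandler()
handler.handle_error("dp1", RuntimeError("NSRE"))
handler.handle_error("dp2", RuntimeError("NSRE"))
written = []

def flaky_write(data_point):
    if data_point == "dp2":
        raise RuntimeError("region still splitting")
    written.append(data_point)

replayed = handler.replay(flaky_write)
```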
Downsampling
● Previously, bucket timestamps were based on the first value
● Now they snap to proper modulo buckets
● Missing values are filled with NaN, Null or zeros during emission
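A sketch of modulo-aligned downsampling with a fill policy. Averaging is assumed as the downsampling function, `interval` is in seconds, and `fill` selects NaN, None (null) or 0 for empty buckets; `downsample` is a hypothetical helper. Under the old behavior, the first bucket in the example would have started at the first point's timestamp (1234567890) rather than at the aligned boundary 1234567860.

```python
import math

FILL_VALUES = {"nan": math.nan, "null": None, "zero": 0}

def downsample(points, interval, start, end, fill="nan"):
    """Average (timestamp, value) points into interval-aligned buckets.

    Every bucket from the one containing `start` to the one containing `end`
    is emitted; empty buckets get the value selected by `fill`.
    """
    buckets = {}
    for ts, value in points:
        bucket = ts - ts % interval              # snap to the interval boundary
        buckets.setdefault(bucket, []).append(value)
    out = []
    for bucket in range(start - start % interval, end + 1, interval):
        values = buckets.get(bucket)
        out.append((bucket, sum(values) / len(values) if values else FILL_VALUES[fill]))
    return out

points = [(1234567890, 1), (1234567899, 3), (1234568010, 5)]
zero_filled = downsample(points, 60, 1234567890, 1234568010, fill="zero")
```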
New for OpenTSDB 2.1
● Improved compaction code path
● Last value query
● Meta table based lookup filtering
● Read/write TSD modes
● Preload UIDs
● fsck utility update
● CORS support
New for OpenTSDB 2.2
● Salting
● Appends
● Random UID Assignment
● NaN, Null or Zero fill policy during
downsampling
● Fully async query path
● Query tracking and statistics
● Additional thread, JVM and AsyncHBase stats
OpenTSDB Community
Bosun - Monitoring system based on OpenTSDB from the
folks at Stack Exchange
OpenTSDB Community
Grafana - Kibana and Elasticsearch based front-end for
OpenTSDB, Graphite and InfluxDB
AsyncHBase 1.7
● AsyncHBase is a fully asynchronous, multi-threaded HBase client
● Supports HBase 0.90 to 1.0
● Remains 2x faster than HTable in
PerformanceEvaluation
● Support for scanner filters, META prefetch,
“fail-fast” RPCs
New for 1.7
● Secure RPC support
● Per region client statistics
● RPC timeouts
● Atomic AppendRequest
● Unit tests!
GoHBase
GoHBase
● Prototype for a new pure-Go HBase client
● Apache 2.0 License
● https://github.com/tsuna/gohbase
● Still early, but feel free to help
The Future of OpenTSDB
The Future
● New query language with
support for
o Complex filters
o Cross metric expressions
o Data manipulation / Analysis
● Distributed queries
● Quality/Confidence measures
More Information
Thank you to everyone who has helped test, debug and add to OpenTSDB
2.1 and 2.2 including, but not limited to:
John Tamblin, Jesse Chang, Rajesh Gopalakrishna Pillai, Siddartha Guthikonda, Sean Miller, Aveek Misra,
Ashwin Ramachandrain, Francis Liu, Slawek Ligus, Gabriel Avellaneda, Nick Whitehead
● Contribute at github.com/OpenTSDB/opentsdb
● Website: opentsdb.net
● Documentation: opentsdb.net/docs/build/html
● Mailing List: groups.google.com/group/opentsdb

HBaseCon 2015: OpenTSDB and AsyncHBase Update
