Working with time series data with InfluxDB
Paul Dix
@pauldix
paul@influxdb.com
What is time series data?
Stock trades and quotes
Metrics
Analytics
Events
Sensor data
Two kinds of time series data…
Regular time series
t0 t1 t2 t3 t4 t6 t7
Samples at regular intervals
Irregular time series
t0 t1 t2 t3 t4 t6 t7
Events whenever they come in
Inducing a regular time series from an irregular one
query: select count(customer_id) from events
where time > now() - 1h
group by time(1m), customer_id
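A rough Python sketch of what the query above does conceptually: irregular events are placed into fixed one-minute windows and counted per customer. The `(epoch_seconds, customer_id)` tuples and the function name `count_per_minute` are made up for illustration, the one-hour time filter is left out, and this says nothing about how InfluxDB implements it internally.

```python
from collections import Counter

def count_per_minute(events):
    """Count events per (minute bucket, customer_id).

    `events` is an iterable of (epoch_seconds, customer_id) pairs
    arriving at irregular times (hypothetical input format).
    """
    counts = Counter()
    for ts, customer_id in events:
        bucket = ts - (ts % 60)  # start of the 1-minute window containing ts
        counts[(bucket, customer_id)] += 1
    return counts

# Irregular arrivals become one count per regular window and customer.
events = [(0, "a"), (12, "a"), (59, "b"), (61, "a"), (130, "b")]
counts = count_per_minute(events)
```

Each key in `counts` is a regular one-minute timestamp, so the result is a regular series even though the inputs were not.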
Data that you ask questions about over time
InfluxDB is an open source distributed time series database
* still working on the distributed part
Why would you want a database for time series data?
Scale
Example from DevOps
• 2,000 servers, VMs, containers, or sensor units
• 200 measurements per server/unit
• every 10 seconds
• = 3,456,000,000 distinct points per day
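The arithmetic behind that figure, as a small Python check. The variable names are chosen here for illustration; the numbers come straight from the bullets above.

```python
units = 2_000                  # servers, VMs, containers, or sensor units
measurements_per_unit = 200    # measurements per server/unit
interval_seconds = 10          # one sample every 10 seconds
seconds_per_day = 24 * 60 * 60

# Each measurement on each unit is sampled this many times per day.
samples_per_series_per_day = seconds_per_day // interval_seconds

points_per_day = units * measurements_per_unit * samples_per_series_per_day

# The same load expressed as a sustained write rate.
points_per_second = units * measurements_per_unit // interval_seconds
```

That works out to 8,640 samples per series per day, and a sustained rate of 40,000 points per second.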
Sharding Data
usually requires application level code
Data retention
application level code and sharding
Rollups and
aggregation
InfluxDB features
SQL style query
language
Retention policies
automatically managed data retention
Continuous queries
for rollups and aggregation
HTTP API - 2 endpoints
Write: HTTP POST /write?db=mydb&rp=foo
Read: HTTP GET /query?db=mydb&rp=foo&q=
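A minimal Python sketch of building requests for the two endpoints with the standard library. It only constructs the requests and does not send them. The base URL is assumed to be the `http://localhost:8086` address shown by the CLI banner later in this deck, the helper names are hypothetical, and the write body is left as an opaque placeholder because the slides do not show its format.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE = "http://localhost:8086"  # assumed address, taken from the CLI banner

def write_request(db, rp, payload):
    """Build (but do not send) a POST to /write for a database and retention policy."""
    params = urlencode({"db": db, "rp": rp})
    return Request(f"{BASE}/write?{params}", data=payload, method="POST")

def query_request(db, rp, q):
    """Build (but do not send) a GET to /query with the query text in `q`."""
    params = urlencode({"db": db, "rp": rp, "q": q})
    return Request(f"{BASE}/query?{params}", method="GET")

# Placeholder body: the point format is not covered on these slides.
w = write_request("mydb", "foo", b"<points go here>")
r = query_request("mydb", "foo", "select * from cpu")
```

Passing either object to `urllib.request.urlopen` would perform the call against a running server.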
InfluxDB Schema
• Measurements (e.g. cpu, temperature, event, memory)
• Tags (e.g. region=uswest, host=serverA, sensor=23)
• Fields (e.g. value=23.2, info='this is some extra stuff', present=true)
• Timestamp (nanosecond epoch)
All data is indexed by measurement, tagset, and time
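One way to picture the schema and the indexing statement above, as a Python sketch. The `Point` class, `series_key`, and `build_index` are illustrative names invented here, not InfluxDB's internal structures: a series is identified by its measurement plus its tag set, and points within a series are ordered by time.

```python
from dataclasses import dataclass

@dataclass
class Point:
    measurement: str   # e.g. "cpu"
    tags: dict         # e.g. {"region": "uswest", "host": "serverA"}
    fields: dict       # e.g. {"value": 23.2, "present": True}
    timestamp_ns: int  # nanoseconds since the epoch

def series_key(p):
    """Identify a series by measurement plus its tag set, ignoring tag order."""
    return (p.measurement, tuple(sorted(p.tags.items())))

def build_index(points):
    """Group points by series and keep each series sorted by time."""
    index = {}
    for p in points:
        index.setdefault(series_key(p), []).append(p)
    for pts in index.values():
        pts.sort(key=lambda p: p.timestamp_ns)
    return index

points = [
    Point("cpu", {"host": "serverA", "region": "uswest"}, {"value": 23.2}, 2_000),
    Point("cpu", {"region": "uswest", "host": "serverA"}, {"value": 21.0}, 1_000),
    Point("cpu", {"host": "serverB", "region": "uswest"}, {"value": 40.5}, 1_500),
]
index = build_index(points)
```

The first two points land in the same series because their tag sets are equal; fields play no part in the key.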
Influx CLI
$ ./influx
Connected to http://localhost:8086 version 0.9
InfluxDB shell 0.9
>
Create a database
CREATE DATABASE foo
Create a retention policy
CREATE RETENTION POLICY <rp-name> ON <db-name>
DURATION <duration> REPLICATION <n> [DEFAULT]
CREATE RETENTION POLICY high_precision ON mydb
DURATION 7d REPLICATION 3 DEFAULT
Writes will go into this RP unless
otherwise specified
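To make the effect of `DURATION` concrete, here is a conceptual Python sketch of expiring data older than a retention window. It is only an illustration under assumptions: `parse_duration` and `enforce_retention` are hypothetical helpers, only a few single-letter units are handled, and the database's actual expiry mechanism is not described in these slides.

```python
# Seconds per unit for the duration suffixes handled by this sketch.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}

def parse_duration(text):
    """Convert a duration literal such as '7d' or '10m' to seconds."""
    return int(text[:-1]) * UNIT_SECONDS[text[-1]]

def enforce_retention(points, duration, now):
    """Keep only (epoch_seconds, value) points newer than now - duration."""
    cutoff = now - parse_duration(duration)
    return [(ts, v) for ts, v in points if ts > cutoff]

# With a 7d policy, the point written at t=0 has aged out by t=800000.
kept = enforce_retention([(0, 1.0), (700_000, 2.0)], "7d", now=800_000)
```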
Discovery
Inverted index
of measurements and tags
SHOW MEASUREMENTS
SHOW MEASUREMENTS WHERE host = 'serverA'
SHOW TAG KEYS
SHOW TAG KEYS FROM cpu
SHOW TAG VALUES FROM cpu WITH KEY = 'region'
SHOW SERIES
SHOW SERIES WHERE service = 'redis'
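A toy Python sketch of an inverted index over measurements and tags, to show why discovery questions like the ones above can be answered without scanning the data points. The `SeriesIndex` class and its method names are invented for illustration and are not InfluxDB's implementation.

```python
from collections import defaultdict

class SeriesIndex:
    """Maps measurements and tag key/value pairs back to the series that carry them."""

    def __init__(self):
        self.by_tag = defaultdict(set)          # (tag key, tag value) -> series ids
        self.by_measurement = defaultdict(set)  # measurement -> series ids
        self.series = {}                        # series id -> (measurement, tags)

    def add(self, measurement, tags):
        sid = (measurement, tuple(sorted(tags.items())))
        self.series[sid] = (measurement, dict(tags))
        self.by_measurement[measurement].add(sid)
        for kv in tags.items():
            self.by_tag[kv].add(sid)

    def measurements(self, tag_key=None, tag_value=None):
        """All measurements, or only those with a series carrying the given tag."""
        if tag_key is None:
            return sorted(self.by_measurement)
        matches = self.by_tag.get((tag_key, tag_value), ())
        return sorted({sid[0] for sid in matches})

    def tag_values(self, measurement, tag_key):
        """Distinct values of one tag key across a measurement's series."""
        values = set()
        for sid in self.by_measurement.get(measurement, ()):
            tags = self.series[sid][1]
            if tag_key in tags:
                values.add(tags[tag_key])
        return sorted(values)

idx = SeriesIndex()
idx.add("cpu", {"host": "serverA", "region": "uswest"})
idx.add("cpu", {"host": "serverB", "region": "useast"})
idx.add("memory", {"host": "serverB", "region": "useast"})
```

Here `idx.measurements("host", "serverA")` plays the role of the filtered measurement listing, and `idx.tag_values("cpu", "region")` the role of the tag value listing.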
Queries
SQL-ish
select * from some_series
where time > now() - 1h
Aggregates
select percentile(90, value) from cpu
where time > now() - 1d
group by time(10m)
Aggregates
select percentile(90, value) from cpu
where time > now() - 1d
group by time(10m), region
Group by a tag
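A Python sketch of the grouping that the aggregate above expresses: values are split by time window and by a tag, then one percentile is computed per group. The nearest-rank definition used here is an assumption made for the example; the slides do not say which percentile method InfluxDB uses, and the function names and `(epoch_seconds, region, value)` tuples are hypothetical.

```python
import math
from collections import defaultdict

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def percentile_by_window(points, p, window_s):
    """Group (epoch_seconds, region, value) points by window start and region."""
    groups = defaultdict(list)
    for ts, region, value in points:
        groups[(ts - ts % window_s, region)].append(value)
    return {key: percentile(vals, p) for key, vals in groups.items()}

points = [
    (0, "uswest", 10), (100, "uswest", 30), (200, "uswest", 20),
    (50, "useast", 5),
    (650, "uswest", 7),
]
result = percentile_by_window(points, 90, window_s=600)  # 10-minute windows
```

Each `(window_start, region)` key yields one output row, which is the shape a time-plus-tag grouping produces.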
Where against Regex (field)
select value from some_log_series
where value =~ /.*ERROR.*/ and
time > '2014-03-01' and time < '2014-03-03'
Where against Regex (tag)
select value from some_log_series
where host =~ /.*asdf.*/ and
time > '2014-03-01' and time < '2014-03-03'
group by host
Functions
min
max
percentile
first
last
stddev
mean
count
sum
median
distinct
count(distinct)
more soon: difference, histogram, moving_average
Continuous queries
CREATE CONTINUOUS QUERY "10m_event_count"
ON mydb
BEGIN
SELECT count(value)
INTO "6_months".events
FROM events
GROUP BY time(10m)
END;
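Conceptually, a continuous query like the one above keeps a rollup up to date by repeatedly aggregating recent raw data and writing the result somewhere else. A rough Python sketch of that idea follows, with invented names (`rollup_counts`, `raw_events`, `six_months`) and plain in-memory containers standing in for the source series and the target retention policy. How InfluxDB schedules and executes continuous queries is not covered by the slides.

```python
from collections import Counter

def rollup_counts(source, target, window_s, now):
    """Count source points per completed window and store the counts in target.

    `source` is a list of (epoch_seconds, value) points; `target` maps
    window start -> count. Re-running overwrites counts, so it is idempotent.
    """
    current = now - now % window_s  # window still being filled; skip it
    counts = Counter(ts - ts % window_s for ts, _ in source if ts < current)
    target.update(counts)
    return target

raw_events = [(5, 1.0), (590, 2.0), (601, 3.0), (1250, 4.0)]
six_months = {}  # stands in for the long-lived rollup store
rollup_counts(raw_events, six_months, window_s=600, now=1300)
```

The point at 1250 is not counted yet because its 10-minute window is still open at `now=1300`.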
Other tools
Telegraf
data collection
Chronograf
Grafana
More coming
• Compression
• Clustering
• Custom functions
Thank you!
Paul Dix
@pauldix
paul@influxdb.com
