InfluxDB and time series data

InfluxDB
The time series database
Modern Factory #workshops 
Marcin Szepczyński, July 2016

What is time series data?
• Time series data is a sequence of data points
collected from the same source over a time interval.
• If you plot time series data, one of your axes
will always be time.

What is not time series data?

Regular vs irregular time series

Time series data is good for
• Internet of Things (e.g. sensor data)
• Alerting
• Monitoring
• Real Time Analytics

InfluxDB is the I in the TICK stack
• Telegraf - time series data collector
• InfluxDB - time series database
• Chronograf - time series data visualization
• Kapacitor - time series data processing and
alerting

InfluxDB features
• SQL-like query language
• Schemaless
• Case sensitive
• Data types: string, float64, int64, boolean

Measurement
• A point is a single record (row) in the InfluxDB
data store; the measurement is the „table” that holds it
• Each point has a time (as primary key), tags
(indexed columns) and fields (not indexed
columns)

Inserting
INSERT table_name,tag1=value1,tag2=value2 temp=30.5,value=1.5
• table_name is the measurement name („table”)
• Comma is a separator between measurement and tags
• Comma is a separator between each tag and each field
• Space is a separator between tags and fields
• Tags: tag1=value1, tag2=value2
• Fields: temp=30.5, value=1.5
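The separator rules above can be sketched in a few lines of plain Python. This is a minimal illustration, not an official client: the helper names `format_value` and `to_line` are made up for this example, escaping of spaces and commas inside names is left out, and the value formatting (an `i` suffix for integers, double quotes for strings) is a simplified reading of the line format.

```python
def format_value(value):
    """Render a field value (simplified; escaping rules are mostly omitted)."""
    if isinstance(value, bool):       # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"            # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    return '"' + str(value).replace('"', '\\"') + '"'   # strings are double-quoted


def to_line(measurement, tags, fields):
    """Join measurement, tags and fields using the comma and space separators."""
    tag_part = "".join(f",{key}={val}" for key, val in tags.items())
    field_part = ",".join(f"{key}={format_value(val)}" for key, val in fields.items())
    return f"{measurement}{tag_part} {field_part}"


line = to_line(
    "table_name",
    {"tag1": "value1", "tag2": "value2"},
    {"temp": 30.5, "value": 1.5},
)
print(line)
```

The printed line can be pasted after `INSERT` in the command-line client.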

Querying
• Show databases: 
> SHOW DATABASES
• Select database: 
> USE workshop
• Show measurements („tables”): 
> SHOW MEASUREMENTS
• Simple select all 
> SELECT * FROM measurement_name

Querying (2)
• Select with limit: 
> SELECT * FROM measure LIMIT 10
• Select with offset (used together with LIMIT): 
> SELECT * FROM measure LIMIT 10 OFFSET 10
• Select where clause: 
> SELECT * FROM measure WHERE tag1 = 'value1'
• Select with order clause (ordering is by time): 
> SELECT * FROM measure ORDER BY time DESC

Querying (3)
• Operators: 
= equal to 
<>, != not equal to 
> greater than 
< less than 
=~ matches (regex) 
!~ does not match (regex)

Aggregations - COUNT()
Returns the number of non-null values. 
 
 
> SELECT count(<field>) FROM measure 
 
> SELECT count(cpu) FROM cpu_temp  
WHERE time > '2016-07-04'  
AND time < '2016-07-05'  
GROUP BY time(1h)

Aggregations - MEAN()
Returns the mean (average) value of a single field
(calculates only for non-null values). 
 
 
> SELECT mean(<field>) FROM measure 
 
> SELECT mean(cpu) FROM cpu_temp  
WHERE time > '2016-07-04'  
AND time < '2016-07-05'  
GROUP BY time(1h)

Aggregations - MEDIAN()
Returns the middle value from the sorted values in a
single field (it is similar to PERCENTILE(field, 50)). 
 
 
> SELECT median(<field>) FROM measure 
 
> SELECT median(cpu) FROM cpu_temp  
WHERE time > '2016-07-04'  
AND time < '2016-07-05'  
GROUP BY time(1h)
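Why "similar" and not "identical" is easiest to see with an even number of values. The sketch below is plain Python and rests on two assumptions: the median averages the two middle values, and the percentile uses a nearest-rank rule that always returns one of the stored values. The helper name `percentile_nearest_rank` and the sample readings are made up, and the exact rule used by the database may differ.

```python
import math
import statistics


def percentile_nearest_rank(values, n):
    """Nearest-rank percentile: always returns one of the input values."""
    ordered = sorted(values)
    rank = max(1, math.ceil(n / 100 * len(ordered)))   # 1-based rank
    return ordered[rank - 1]


cpu = [40.0, 42.0, 47.0, 55.0]           # hypothetical readings, even count

median_value = statistics.median(cpu)     # averages the two middle values
p50_value = percentile_nearest_rank(cpu, 50)

print(median_value, p50_value)
```

With an odd number of values both functions pick the same middle element.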

Aggregations - SPREAD()
Returns the difference between minimum and
maximum value of the field. 
 
 
> SELECT spread(<field>) FROM measure 
 
> SELECT spread(cpu) FROM cpu_temp  
WHERE time > '2016-07-04'  
AND time < '2016-07-05'  
GROUP BY time(1h)

Aggregations - SUM()
Returns the sum of all values in a single field. 
 
 
> SELECT sum(<field>) FROM measure 
 
> SELECT sum(cpu) FROM cpu_temp  
WHERE time > '2016-07-04'  
AND time < '2016-07-05'  
GROUP BY time(1h)

Selectors - BOTTOM(N)
Returns the smallest N values in a single field. 
 
 
> SELECT bottom(<field>, <N>) FROM measure 
 
> SELECT bottom(cpu, 5) FROM cpu_temp  
WHERE time > '2016-07-04'  
AND time < '2016-07-05'  
GROUP BY time(1h)

Selectors - FIRST()
Returns the oldest value of a single field. 
 
 
> SELECT first(<field>) FROM measure 
 
> SELECT first(cpu) FROM cpu_temp  
WHERE time > '2016-07-04'  
AND time < '2016-07-05'  
GROUP BY time(1h)

Selectors - LAST()
Returns the newest value of a single field. 
 
 
> SELECT last(<field>) FROM measure 
 
> SELECT last(cpu) FROM cpu_temp  
WHERE time > '2016-07-04'  
AND time < '2016-07-05'  
GROUP BY time(1h)

Selectors - MAX()
Returns the highest value in a single field. 
 
 
> SELECT max(<field>) FROM measure 
 
> SELECT max(cpu) FROM cpu_temp  
WHERE time > '2016-07-04'  
AND time < '2016-07-05'  
GROUP BY time(1h)

Selectors - MIN()
Returns the lowest value in a single field. 
 
 
> SELECT min(<field>) FROM measure 
 
> SELECT min(cpu) FROM cpu_temp  
WHERE time > '2016-07-04'  
AND time < '2016-07-05'  
GROUP BY time(1h)

Selectors - PERCENTILE(N)
Returns the Nth percentile value of the sorted values
of a single field. 
 
 
> SELECT percentile(<field>, <N>) FROM measure 
 
> SELECT percentile(cpu, 95) FROM cpu_temp  
WHERE time > '2016-07-04'  
AND time < '2016-07-05'  
GROUP BY time(1h)

Selectors - TOP(N)
Returns the largest N values in a single field. 
 
 
> SELECT top(<field>, <N>) FROM measure 
 
> SELECT top(cpu, 5) FROM cpu_temp  
WHERE time > '2016-07-04'  
AND time < '2016-07-05'  
GROUP BY time(1h)
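What BOTTOM and TOP select can be mimicked with the Python standard library. This is only an illustration on a hypothetical list of readings; it does not query InfluxDB and ignores timestamps and grouping.

```python
import heapq

cpu = [51.0, 48.5, 63.2, 45.1, 70.4, 55.0, 49.9]   # hypothetical readings

top_3 = heapq.nlargest(3, cpu)       # like top(cpu, 3): largest values first
bottom_3 = heapq.nsmallest(3, cpu)   # like bottom(cpu, 3): smallest values first

print(top_3)
print(bottom_3)
```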

GROUP BY clause
InfluxDB supports the GROUP BY clause with tag values,
time intervals, tag values combined with time intervals,
and GROUP BY with fill().
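A rough picture of grouping by a tag and a time interval with a fill value, written in plain Python. The function name `group_mean`, the `host` tag and the sample points are assumptions made for this sketch. Windows here are aligned to the `start` argument, which may not match how the database aligns its intervals.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone


def group_mean(points, start, end, interval, fill=None):
    """Mean value per (tag, time window); windows without points get `fill`."""
    buckets = defaultdict(list)
    for time, host, value in points:
        index = (time - start) // interval      # which window the point falls in
        buckets[(host, index)].append(value)

    hosts = sorted({host for _, host, _ in points})
    windows = (end - start) // interval
    result = {}
    for host in hosts:
        for index in range(windows):
            values = buckets.get((host, index))
            mean = sum(values) / len(values) if values else fill
            result[(host, start + index * interval)] = mean
    return result


hour = timedelta(hours=1)
start = datetime(2016, 7, 4, tzinfo=timezone.utc)
points = [
    (start + timedelta(minutes=5), "a", 40.0),
    (start + timedelta(minutes=35), "a", 50.0),
    (start + timedelta(hours=2, minutes=10), "a", 60.0),
    (start + timedelta(hours=1), "b", 30.0),
]

result = group_mean(points, start, start + 3 * hour, hour, fill=0)
for (host, window), mean in result.items():
    print(host, window.isoformat(), mean)
```

Every host gets one row per window, and the empty windows carry the fill value instead of being skipped.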

Downsampling
InfluxDB can handle hundreds of thousands of data
points per second. Working with that much data over
a long period of time can create storage concerns. A
natural solution is to downsample the data; keep the
high precision raw data for only a limited time, and
store the lower precision, summarized data for much
longer or forever.
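The effect of downsampling on storage can be shown with a small plain-Python sketch. It assumes raw points arrive every 10 seconds and are summarized as one mean per hour; the function name `downsample` and the sample values are made up, and inside InfluxDB this job would be done by the database itself rather than by client code.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def downsample(points, interval):
    """Replace raw (time, value) points with one mean value per interval."""
    buckets = {}
    for time, value in points:
        window_start = time - (time - EPOCH) % interval   # floor time to the interval
        buckets.setdefault(window_start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


start = datetime(2016, 7, 4, tzinfo=timezone.utc)
# Two hours of hypothetical readings, one every 10 seconds.
raw = [
    (start + timedelta(seconds=10 * i), 20.0 if i < 360 else 30.0)
    for i in range(720)
]

summary = downsample(raw, timedelta(hours=1))
print(len(raw), "raw points ->", len(summary), "summarized points")
```

In this example 720 raw points shrink to 2 summarized points, which is the storage saving the slide describes.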

Data retention
A retention policy is the part of InfluxDB’s data
structure that describes for how long InfluxDB keeps
data and how many copies of those data are stored
in the cluster. A database can have several RPs and
RPs are unique per database.
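The "how long" half of a retention policy can be pictured as a cutoff on the time column. The sketch below is a deliberate simplification in plain Python: `apply_retention` is a made-up name, the replication part (number of copies) is not modeled, and the database expires data in larger chunks rather than point by point.

```python
from datetime import datetime, timedelta, timezone


def apply_retention(points, now, duration):
    """Keep only (time, value) points that are not older than `duration`."""
    cutoff = now - duration
    return [(time, value) for time, value in points if time >= cutoff]


now = datetime(2016, 7, 5, tzinfo=timezone.utc)
points = [
    (now - timedelta(days=10), 41.0),   # older than 7 days, so it expires
    (now - timedelta(days=3), 43.5),
    (now - timedelta(hours=1), 44.0),
]

kept = apply_retention(points, now, timedelta(days=7))
print(len(kept))
```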

More
https://influxdata.com/videos/
https://docs.influxdata.com/influxdb
