InfluxDB Internals
Platform Engineering Team

@ryanbetts / ryan@influxdata.com
How great are databases?
• I like making things with smart, clever, kind people.

• I’ve been working on high-throughput, real-time data for the last 10 years.
• What’s so special about time series?

• Time series database designs

• InfluxDB internals
               RDBMS   NoSQL             TSDB
Correctness    ACID    BASE              BASE
Schema         DDL     DDL / documents   on-write
Writing data   DML     POST/PUT          line protocol
Reading data   SQL     GET + filter      filter, window, group, join
TSDB unique combination
• Ingest: thousands to millions of points per second

• Store: fast accumulating, append-mostly data, lots of
repetition, often with time-to-live

• Query: analytic queries with fast filtering, windowing

• Scale: availability, storage, query
Facebook Gorilla
• TTL eviction

• Columnar compression

• Write availability > query
correctness

• Metric-based schema

• Separate query processing
from access-path
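
One ingredient of Gorilla-style columnar compression is delta-of-delta encoding of timestamps: regularly spaced points collapse to runs of zeros, which then pack into very few bits. A minimal Python sketch of the integer transform only (no bit packing); the function names are hypothetical:

```python
def delta_of_delta(timestamps):
    """Encode timestamps as: first value, then the change in each delta."""
    if not timestamps:
        return []
    encoded = [timestamps[0]]
    prev_delta = 0
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        encoded.append(delta - prev_delta)
        prev_delta = delta
    return encoded


def undo_delta_of_delta(encoded):
    """Invert delta_of_delta and recover the original timestamps."""
    if not encoded:
        return []
    timestamps = [encoded[0]]
    delta = 0
    for dod in encoded[1:]:
        delta += dod
        timestamps.append(timestamps[-1] + delta)
    return timestamps


raw = [1000, 1010, 1020, 1030, 1041]
packed = delta_of_delta(raw)  # mostly zeros when points arrive on a fixed interval
```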
Druid
• Roll-up at ingest

• Columnar storage &
time-based segments

• Indexes on dimension for
fast filtering

• Separation of real time
and historical data nodes
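
Roll-up at ingest can be pictured as collapsing raw events into one row per time bucket and dimension set. A rough Python sketch, assuming a hypothetical `roll_up` helper, a fixed 60-second granularity, and count/sum as the only aggregates:

```python
from collections import defaultdict


def roll_up(events, granularity_s=60):
    """Aggregate (timestamp, dimensions, value) events per bucket and dimension set."""
    rows = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for ts, dims, value in events:
        bucket = ts - ts % granularity_s          # truncate to the bucket start
        key = (bucket, tuple(sorted(dims.items())))
        rows[key]["count"] += 1
        rows[key]["sum"] += value
    return dict(rows)


events = [
    (0, {"host": "a"}, 1.0),
    (30, {"host": "a"}, 2.0),
    (61, {"host": "a"}, 4.0),
]
rolled = roll_up(events)
```

Three raw events become two stored rows; the raw values are no longer recoverable, which is the trade-off of rolling up at ingest.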
Bullet Journals
• Fast event
recording

• Ordered by time

• Indexed by
dimensions

• Weekly / Monthly
roll-up
InfluxDB
1. Write Path

2. Storage

3. Query Path

4. Clustering
InfluxDB: Adding data (1)
curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'cpu_load_short,host=server01,region=us-west value=0.64 1434055562000000000'
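
The body of the request above is one line-protocol point: measurement, comma-separated tags, a space, fields, a space, and a nanosecond timestamp. A minimal Python sketch of building such a line; `to_line` is a hypothetical helper that ignores escaping and field types other than plain numbers:

```python
def to_line(measurement, tags, fields, timestamp_ns):
    """Format one point as line protocol (illustrative only; no escaping)."""
    tag_part = "".join(f",{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in fields.items())
    return f"{measurement}{tag_part} {field_part} {timestamp_ns}"


line = to_line(
    "cpu_load_short",
    {"host": "server01", "region": "us-west"},
    {"value": 0.64},
    1434055562000000000,
)
```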
InfluxDB: Adding data (2)
• fsync( ) batch to WAL
• Add to in-memory cache
• Snapshot cache to TSM
• Add to index
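
A toy Python sketch of the write path steps on this slide. Everything here is an assumption made for illustration: the `ToyShard` class, the file names, the text format, the point-count snapshot trigger, and the order in which the index is updated. It is not InfluxDB's actual code or file format.

```python
import os
import tempfile


class ToyShard:
    """Toy write path: index update, WAL append + fsync, cache, snapshot."""

    def __init__(self, directory, snapshot_after=3):
        self.directory = directory
        self.wal = open(os.path.join(directory, "toy.wal"), "a")
        self.cache = {}              # series key -> list of (timestamp, value)
        self.index = set()           # series keys seen so far
        self.cached_points = 0
        self.snapshot_after = snapshot_after
        self.snapshots = 0

    def write_batch(self, points):
        # Record any new series in the index.
        self.index.update(key for key, _, _ in points)
        # Append the whole batch to the WAL and fsync once per batch.
        for key, ts, value in points:
            self.wal.write(f"{key} {ts} {value}\n")
        self.wal.flush()
        os.fsync(self.wal.fileno())
        # Make the batch visible to queries through the in-memory cache.
        for key, ts, value in points:
            self.cache.setdefault(key, []).append((ts, value))
        self.cached_points += len(points)
        # Snapshot the cache to an immutable sorted file once it is large enough.
        if self.cached_points >= self.snapshot_after:
            self.snapshot()

    def snapshot(self):
        self.snapshots += 1
        path = os.path.join(self.directory, f"{self.snapshots:04d}.snap")
        with open(path, "w") as out:
            for key in sorted(self.cache):
                for ts, value in sorted(self.cache[key]):
                    out.write(f"{key} {ts} {value}\n")
        self.cache.clear()
        self.cached_points = 0

    def close(self):
        self.wal.close()


with tempfile.TemporaryDirectory() as d:
    shard = ToyShard(d, snapshot_after=3)
    shard.write_batch([("cpu,host=a", 2, 0.5), ("cpu,host=a", 1, 0.4)])
    cached_series_before = len(shard.cache)
    shard.write_batch([("cpu,host=a", 3, 0.6)])
    snapshot_files = sorted(f for f in os.listdir(d) if f.endswith(".snap"))
    shard.close()
```

A real engine would also drop the WAL segments covered by a snapshot; the sketch leaves that out.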
InfluxDB: on-disk (filesystem)
CREATE RETENTION POLICY <retention_policy_name> ON
<database_name> DURATION <duration> REPLICATION <n> [SHARD
DURATION <duration>] [DEFAULT]
Database directory /db
Retention Policy directory /db/rp
Shard Group (time bounded) (Logical)
Shard directory /db/rp/Id#
TSM0001.tsm (data file)
TSM0002.tsm (data file)
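
Since shard groups are time bounded, a point's timestamp determines which group (and so which shard directory) it belongs to. A small Python sketch, assuming groups are aligned to multiples of the shard duration counted from the Unix epoch; the helper name and the alignment rule are assumptions, not a statement of InfluxDB's exact behavior:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def shard_group_bounds(ts, shard_duration):
    """Return the [start, end) window of the shard group that covers ts."""
    n = (ts - EPOCH) // shard_duration     # whole durations since the epoch
    start = EPOCH + n * shard_duration
    return start, start + shard_duration


point_time = datetime.fromtimestamp(1434055562, tz=timezone.utc)
start, end = shard_group_bounds(point_time, timedelta(days=1))
```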
[Figure: TSM file layout, showing data blocks followed by the TSM index]
InfluxDB: Adding data (DB)
• fsync( ) batch to WAL
• Add to in-memory cache
• Snapshot to TSM
• Add to index
InfluxDB: Adding data (index)
• Measurement name -> field keys

• Measurement name -> series

• Measurement name -> tag keys -> tag value -> series

• Series -> shards

• (Also sketches of series and measurements for fast
cardinality estimation)
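
The mappings above amount to an inverted index over measurements, tags, and series. A toy Python sketch using plain dictionaries and sets; the `ToyIndex` class and its method names are hypothetical, and the cardinality sketches are left out:

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index mirroring the mappings listed on the slide."""

    def __init__(self):
        self.field_keys = defaultdict(set)   # measurement -> field keys
        self.series = defaultdict(set)       # measurement -> series keys
        self.tag_values = defaultdict(set)   # (measurement, tag key, tag value) -> series keys
        self.shards = defaultdict(set)       # series key -> shard ids

    def add(self, measurement, tags, field_keys, shard_id):
        series_key = measurement + "".join(
            f",{k}={v}" for k, v in sorted(tags.items())
        )
        self.field_keys[measurement].update(field_keys)
        self.series[measurement].add(series_key)
        for k, v in tags.items():
            self.tag_values[(measurement, k, v)].add(series_key)
        self.shards[series_key].add(shard_id)
        return series_key

    def series_where(self, measurement, tag_key, tag_value):
        """Series of a measurement that carry the given tag key/value pair."""
        return set(self.tag_values.get((measurement, tag_key, tag_value), set()))


index = ToyIndex()
index.add("cpu", {"host": "serverA", "region": "us-west"}, ["user", "system"], shard_id=1)
index.add("cpu", {"host": "serverB", "region": "us-west"}, ["user"], shard_id=1)
west = index.series_where("cpu", "region", "us-west")
```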
InfluxDB: TSI
• Roaring bitmaps to shortcut series creation on insert

• Iterators for index
mappings

• Index is per-shard; series id
file is per-database

• Partitioned for lock-splitting
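
Two of the ideas above, sketched in Python under loud assumptions: a plain `set` stands in for a roaring bitmap, the partition count of 8 and the CRC32 hash are arbitrary choices, and the class name is hypothetical. The point is only that an "already seen" check skips series creation on most inserts, and that per-partition locks keep unrelated series from contending.

```python
import threading
import zlib


class PartitionedSeriesIndex:
    """Toy series index split into partitions, each with its own lock."""

    def __init__(self, partitions=8):
        self.locks = [threading.Lock() for _ in range(partitions)]
        self.seen = [set() for _ in range(partitions)]   # stand-in for bitmaps

    def _partition(self, series_key):
        return zlib.crc32(series_key.encode()) % len(self.locks)

    def ensure_series(self, series_key):
        """Return True if the series had to be created, False if it existed."""
        p = self._partition(series_key)
        with self.locks[p]:                 # only this partition is locked
            if series_key in self.seen[p]:
                return False                # fast path: nothing to create
            self.seen[p].add(series_key)
            return True

    def series_count(self):
        return sum(len(s) for s in self.seen)


tsi = PartitionedSeriesIndex()
first = tsi.ensure_series("cpu,host=serverA")
second = tsi.ensure_series("cpu,host=serverA")
```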
InfluxDB: InfluxQL Queries
1. Parse the time range and the expressions for filtering data
2. Look up the shards to access using the list of measurements and the time frame
3. Create the iterators for each shard
4. Merge the shard iterator outputs
select user, system from cpu
where time > now() - 1h and host = 'serverA'
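
The four steps can be sketched in Python with generators as the per-shard iterators and `heapq.merge` as the merge step. The shard layout (dicts with `start`, `end`, and time-sorted `points`) and all function names are assumptions made for this sketch; parsing is skipped and the filter is passed in directly.

```python
import heapq


def shards_for(shards, t_min, t_max):
    """Step 2: keep only shards whose time window overlaps the query range."""
    return [s for s in shards if s["start"] < t_max and s["end"] > t_min]


def shard_iterator(shard, t_min, host):
    """Step 3: lazily yield matching rows from one shard, in time order."""
    for ts, tags, fields in shard["points"]:
        if ts > t_min and tags.get("host") == host:
            yield ts, fields["user"], fields["system"]


def run_query(shards, t_min, t_max, host):
    """Step 4: merge the time-ordered shard iterators into one result."""
    iterators = [shard_iterator(s, t_min, host) for s in shards_for(shards, t_min, t_max)]
    return list(heapq.merge(*iterators, key=lambda row: row[0]))


shards = [
    {"start": 0, "end": 100, "points": [
        (10, {"host": "serverA"}, {"user": 1.0, "system": 0.5}),
        (90, {"host": "serverB"}, {"user": 9.0, "system": 9.0}),
    ]},
    {"start": 100, "end": 200, "points": [
        (120, {"host": "serverA"}, {"user": 2.0, "system": 0.7}),
    ]},
    {"start": 200, "end": 300, "points": [
        (250, {"host": "serverA"}, {"user": 3.0, "system": 0.9}),
    ]},
]
rows = run_query(shards, t_min=50, t_max=200, host="serverA")
```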
InfluxDB: Query with IFQL
1. Stand-alone `ifqld` coordinator nodes

2. Streaming storage iterators that support rate-limits

3. Separation of query planning and query distribution

4. Extensible, functional language

5. Unification of InfluxQL and TICKscript
A brief sidebar on append-mostly databases
No one tells you about:
* Wrong data

* Old (back-filled) data
InfluxDB Clustering
• Strongly consistent meta-cluster (based on Raft)

• User configured replication factor

• Replication and shard aware query planner

• Hinted-Handoff queues on each data node

• (WIP) Anti-entropy consistency repair
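
Hinted handoff in general means queuing writes for a replica that is unreachable and replaying them once it returns. A toy Python sketch of that idea only; the single `ToyCoordinator` object, its method names, and the in-memory queues are assumptions and do not reflect how InfluxDB places or persists its queues:

```python
from collections import deque


class ToyCoordinator:
    """Toy replication with a hinted-handoff queue per target node."""

    def __init__(self, nodes):
        self.up = {n: True for n in nodes}
        self.stored = {n: [] for n in nodes}
        self.hints = {n: deque() for n in nodes}

    def write(self, owners, point):
        for node in owners:
            if self.up[node]:
                self.stored[node].append(point)
            else:
                self.hints[node].append(point)   # hold the write until the node returns

    def recover(self, node):
        self.up[node] = True
        while self.hints[node]:                   # replay in arrival order
            self.stored[node].append(self.hints[node].popleft())


cluster = ToyCoordinator(["n1", "n2"])
cluster.up["n2"] = False
cluster.write(["n1", "n2"], ("cpu,host=a", 1, 0.5))
queued_while_down = len(cluster.hints["n2"])
cluster.recover("n2")
```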
Conclusions
• Time series data has unique storage and query
requirements that impact database design.

• Evolution of InfluxDB:

1. TSI: remove the in-memory size limit on cardinality
2. IFQL: faster feature velocity; safer execution.

3. Anti-entropy repair: easier, more robust scale-out.
