Agenda: New Practitioners Track
Workshop Agenda
8:00 AM – 9:00 AM    Breakfast
9:00 AM – 10:00 AM   Installing the TICK Stack and Your First Query (Noah Crowley)
10:00 AM – 10:50 AM  Chronograf and Dashboarding (David Simmons)
10:50 AM – 11:20 AM  Break
11:20 AM – 12:10 PM  Writing Queries (InfluxQL and TICK) (Noah Crowley)
12:10 PM – 1:10 PM   Lunch
1:10 PM – 2:00 PM    Architecting InfluxEnterprise for Success (Dean Sheehan)
2:00 PM – 2:10 PM    Break
2:10 PM – 3:10 PM    Optimizing the TICK Stack (Dean Sheehan)
3:10 PM – 4:00 PM    Downsampling Data (Michael DeSa)
4:00 PM              Happy Hour
Dean Sheehan
Senior Director, Pre- and Post-Sales
Architecting InfluxEnterprise for Success
• Anatomy of an Influx Enterprise Cluster
• High Availability
• Horizontal Scalability
• Replication & Sharding Considerations
• Deployment Topologies
• Some Latin!
InfluxEnterprise Cluster Architecture
[Diagram: Telegraf agents send writes through a load balancer to Data Nodes 1…n; Chronograf sends queries through the same load balancer; a quorum of three meta nodes handles administrative functions for the cluster.]
InfluxEnterprise Cluster Architecture – Meta Nodes
• Keep state consistent across the cluster:
  • Users
  • Databases
  • Continuous Queries
  • Retention Policies
  • Shard Metadata
  • Cluster Membership (Data Nodes)
• Run multiple meta nodes (an odd number) for High Availability
• Uses the Raft consensus algorithm
• Consensus requires a quorum of (n/2)+1 available nodes, which is why an odd node count is recommended (see the sketch below)
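A minimal arithmetic sketch of that quorum rule, in plain Python; the node counts shown are illustrative:

```python
# Quorum size and failure tolerance for a Raft-style meta-node group.
def quorum(n: int) -> int:
    """Minimum nodes that must be available for consensus: floor(n/2) + 1."""
    return n // 2 + 1

for n in (3, 4, 5, 7):
    print(f"{n} meta nodes: quorum = {quorum(n)}, "
          f"tolerates {n - quorum(n)} node failure(s)")
# 3 and 4 nodes both tolerate only 1 failure, which is why odd counts are preferred.
```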
InfluxEnterprise Cluster Architecture – Data Nodes
• Data nodes hold all the raw time series data and metadata, including (see the line protocol example below):
  • Measurements
  • Tag keys and values
  • Field keys and values
• Data nodes do not participate in consensus
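To make those data-model terms concrete, here is a small sketch of a single point in InfluxDB line protocol, assembled in Python; the measurement, tag, and field names are made up for illustration:

```python
# One point in InfluxDB line protocol:
#   <measurement>,<tag_key>=<tag_value>,... <field_key>=<field_value>,... <timestamp>
measurement = "cpu"                           # measurement name
tags = "host=server01,region=us-west"         # tag keys and values (indexed)
fields = "usage_user=12.5,usage_system=3.1"   # field keys and values (the raw data)
timestamp_ns = 1700000000000000000            # optional nanosecond epoch timestamp

line = f"{measurement},{tags} {fields} {timestamp_ns}"
print(line)
# cpu,host=server01,region=us-west usage_user=12.5,usage_system=3.1 1700000000000000000
```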
InfluxEnterprise Clustering
Meta Nodes
• CPU: 1–2 cores
• RAM: 512 MB – 1 GB
• Disk: 1 HDD of any size
• Can be run in a VM or container
Data Nodes
• CPU: 8+ cores
• RAM: 64+ GB
• Disk: SSD drives for WAL, hinted handoff, and data; > 1000 IOPS recommended
• Network: 10 Gbps NIC
OS & Hosts
• No specific OS tuning required
• Bare metal, virtual machines, or containers
• Treat 70% utilization as the business-as-usual upper limit; leave headroom for:
  • Peak periods
  • The compaction process
  • Node failure and re-entry
High Availability
• The replication factor (RF) is set on the database retention policy (see the example below)
  – RF = 1 is possible but not common
  – RF = 2 or 3 is typical
  – RF > 3 is possible and can be beneficial
• Writes are recorded on RF nodes
  – The writer chooses the consistency level required for write acknowledgement:
    Any, One, Quorum, or All
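A hedged sketch of how these settings might be exercised against the cluster's HTTP API (the host, database name, and retention policy name are placeholders; this assumes the standard InfluxDB 1.x /query and /write endpoints, with the consistency parameter available on InfluxDB Enterprise):

```python
import requests

INFLUX = "http://load-balancer:8086"   # placeholder: cluster load balancer
DB = "telemetry"                       # placeholder database name

# Set the replication factor on a retention policy (RF = 2 here).
requests.post(f"{INFLUX}/query", params={"q": (
    f'CREATE RETENTION POLICY "two_weeks" ON "{DB}" '
    'DURATION 14d REPLICATION 2 DEFAULT'
)}).raise_for_status()

# Write a point and require acknowledgement from a quorum of the RF copies.
point = "cpu,host=server01 usage_user=12.5"
requests.post(
    f"{INFLUX}/write",
    params={"db": DB, "rp": "two_weeks", "consistency": "quorum"},
    data=point,
).raise_for_status()
```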
Hinted Handoff Queue
• Records replica writes that could not be applied immediately
  – e.g. the target node is down or unreachable
• Measure how long it takes to fill the hinted handoff queue when a node goes down:
  that is how long an outage you can ride out without losing data (see the sketch below)
• How long do you want to sleep before you get paged?
• If that window is not long enough, increase the hinted handoff queue size
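A back-of-the-envelope sketch of that outage window, in plain Python; the queue size, throughput, and point size are illustrative assumptions, not measurements:

```python
# Rough estimate of how long the hinted handoff queue can absorb an outage.
# All inputs are illustrative; substitute figures measured from your own cluster.
hh_max_size_bytes = 10 * 1024**3   # assumed hinted handoff queue limit (10 GiB)
points_per_second = 50_000         # write throughput destined for the downed node
bytes_per_point = 200              # average serialized point size, queue overhead included

fill_rate = points_per_second * bytes_per_point   # bytes queued per second
outage_window_s = hh_max_size_bytes / fill_rate
print(f"Queue absorbs roughly {outage_window_s / 60:.0f} minutes of outage")
# ~18 minutes with these numbers; if that is shorter than your paging and response
# time, increase the queue size (or reduce the write volume per node).
```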
Horizontal Scalability
• If RF < the number of nodes, there are resources available for partitioning
• Data is partitioned into shards
  – The shard count is derived from RF and the node count
    • Roughly node count / RF shards, but not exactly
  – Hash(measurement, tags) -> shard (see the sketch below)
  – Each shard is replicated across RF nodes
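A deliberately simplified sketch of hash-based series-to-shard mapping; this is an illustration of the idea, not the exact InfluxDB hashing scheme, and the measurement, tags, and shard count are made up:

```python
import hashlib

def shard_for(measurement: str, tags: dict, shard_count: int) -> int:
    """Map a series (measurement + sorted tag set) onto one of `shard_count` shards."""
    series_key = measurement + "," + ",".join(
        f"{k}={v}" for k, v in sorted(tags.items())
    )
    digest = hashlib.sha256(series_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

# With 6 data nodes and RF = 2, roughly 6/2 = 3 shards per shard group;
# each shard is then replicated onto 2 of the nodes.
print(shard_for("cpu", {"host": "server01", "region": "us-west"}, 3))
print(shard_for("cpu", {"host": "server02", "region": "us-west"}, 3))
```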
Shard counts aren't obvious
You decide what's important!
• Replicate enough to meet your availability requirements
• Over-replicate to improve query performance
  – Expends resources on processing writes to assist, if not favor, reads
Deployment Considerations
• How can I deploy…
Multi DC Replication #1
[Diagram: two data centers, each behind a firewall/load balancer with Data Node 1, Data Node 2, and Telegraf; each Telegraf fans its writes out to both data centers' load balancers.]
Multi DC Replication #2
[Diagram: Telegraf agents publish to a message bus; Data Center 1 and Data Center 2, each behind a firewall/load balancer with Data Node 1 and Data Node 2, consume from the bus and write into their own cluster. A consumer-side sketch follows below.]
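A minimal sketch of the consumer side of that message-bus topology; the bus itself (Kafka, NATS, or RabbitMQ, per the speaker notes) is stood in for by a plain Python iterable, and the endpoint URL and database name are placeholders:

```python
import requests

LOCAL_CLUSTER = "http://local-load-balancer:8086"  # placeholder: this DC's cluster
DB = "telemetry"                                   # placeholder database name

def replay_from_bus(messages):
    """Consume line-protocol payloads from a durable bus and write them locally.

    `messages` stands in for a Kafka/NATS/RabbitMQ consumer; in production each
    data center runs its own consumer against the same durable topic or queue.
    """
    for payload in messages:
        resp = requests.post(f"{LOCAL_CLUSTER}/write",
                             params={"db": DB}, data=payload)
        resp.raise_for_status()   # a real consumer would retry / withhold the ack on failure

# Example with an in-memory stand-in for the bus:
replay_from_bus([
    "cpu,host=server01 usage_user=12.5",
    "mem,host=server01 used_percent=41.7",
])
```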
Multi DC Replication #3
[Diagram: Data Center 1 and Data Center 2, each with its own load balancer, a pool of data nodes, and a Kapacitor instance.]
Hierarchical Relationship
[Diagram: each data center (1 through n) runs Telegraf agents, a local InfluxDB OSS instance, and Kapacitor; these edge instances feed, through a firewall/load balancer, into a central InfluxDB Enterprise cluster of Data Nodes 1…n with a three-node meta node quorum.]
Application Domain Level Sharding
[Diagram: a message bus carries the combined data space {A,B}; Data Center 1 (firewall/load balancer, Data Node 1, Data Node 2, Telegraf) consumes only domain {A}, while Data Center 2 consumes only domain {B}, so each cluster holds a different slice of the application's data. A routing sketch follows below.]
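A minimal sketch of what application-level domain routing could look like on the write path; the cluster URLs, database name, and routing rule are all illustrative assumptions, and (per the speaker notes) queries must follow the same routing as writes:

```python
import requests

# Hypothetical mapping from application domain to the cluster that owns it.
CLUSTERS = {
    "A": "http://dc1-load-balancer:8086",
    "B": "http://dc2-load-balancer:8086",
}
DB = "telemetry"  # placeholder database name

def write_point(domain: str, line_protocol: str) -> None:
    """Route a line-protocol point to the cluster that owns its domain."""
    base = CLUSTERS[domain]
    resp = requests.post(f"{base}/write", params={"db": DB}, data=line_protocol)
    resp.raise_for_status()

# Domain "A" points land in Data Center 1, domain "B" points in Data Center 2.
write_point("A", "trades,instrument=XYZ price=101.3")
write_point("B", "trades,instrument=ABC price=54.2")
```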
Some Latin
• Quis custodiet ipsos custodes? (Who will guard the guards themselves?)
Dogfooding!
[Diagram: a Telegraf agent on each Enterprise data node (Telegraf 1 … Telegraf N on Data Node 1 … Data Node N) reports into a separate InfluxDB OSS instance with Chronograf: use Influx to monitor Influx.]

Editor's Notes

  • #6 Typically 3 meta nodes; could be 5, 7, etc.
  • #7 SSDs for InfluxDB: WAL, hinted handoff queue, and data.
  • #8 We can run (and cluster) on much smaller hardware, but an enterprise cluster typically implies high volumes.
  • #9 But you really do need SSDs.
  • #11 Anti-entropy will get the cluster back in sync if the hinted handoff queue doesn't work out.
  • #13 Shard counts are not something you particularly need to worry about: replication takes precedence, then you want an even distribution of work without a needless, naive shard count.
  • #17 Common message bus options: Kafka, NATS, RabbitMQ. Durable queue subscriptions. The benefit over Telegraf fan-out is separation of concerns (don't conceptually overload a Telegraf with both collection and knowledge of the cluster).
  • #18 Optional use of a message bus. Be careful with multi-master setups (best where there is clear write ownership but shared reads). Subscriptions aren't durable.
  • #20 Different clusters for different areas of the data space. Influx sharding isn't configurable. Reads (queries) need to follow the same path as writes, which is not always easy, and aggregation across clusters happens in the client rather than in the engine. Useful in specific situations (e.g. hot instruments being traded); not easily flexible, but at the limit it might be the only way forward.
  • #21 "Who will guard the guards themselves?"
  • #22 Use Influx to monitor Influx.