Hear from InfluxData’s Field CTO as he provides an overview of the InfluxDB Time Series Engine. Learn more about the Time Structured Merge Tree (TSM) and how it relates to the API, Flux, and Tasks. In the session, he will also provide a sneak peek into the future of the InfluxDB Time Series Engine.
Dean Sheehan
EMEA Field CTO, InfluxData
As Field CTO, Dean is responsible for ensuring the successful
communication and deployment of InfluxData’s solutions throughout the
world. Based in the UK, Dean is also leading the expansion of
InfluxData’s business throughout Europe. He has more than 25 years of
experience in the technology industry covering consulting, product
development, product management and solution deployment throughout
the retail, financial and telecom industries—with significant expertise in
distributed systems, transactional systems and data center automation.
Dean has a Bachelor’s Degree in Computer Science, and an MBA from
Cambridge University.
InfluxDB Time Series Engine
Overview
4. Agenda
1. Intro to the InfluxDB Time Series Engine
2. TSM and the API
3. TSM and Flux
4. TSM and Tasks
5. The future of the InfluxDB Time Series Engine
5. Intro to the InfluxDB Time Series Engine
• At the core of the InfluxDB time series database is the Time
Structured Merge Tree (TSM) storage engine (& format)
• Purpose-built for storing time series data
• Battle tested over many years by a large community of users in a
multitude of scenarios
• Designed to continuously ingest large volumes of new data points
whilst also running real-time queries
6. Performance
Series Creation Performance on m4.2xlarge
Cardinality   1.3.9 (inmem)   1.5.0 (inmem)   1.5.0 (tsi1)
1M            140K s/sec      140K s/sec      188K s/sec
2M            134K s/sec      138K s/sec      186K s/sec
4M            119K s/sec      130K s/sec      164K s/sec
8M            103K s/sec      108K s/sec      127K s/sec
16M           88K s/sec       88K s/sec       96K s/sec

Series Creation Performance on Threadripper
Cardinality   1.3.9 (inmem)   1.5.0 (inmem)   1.5.0 (tsi1)
1M            166K s/sec      195K s/sec      406K s/sec
2M            152K s/sec      179K s/sec      385K s/sec
4M            138K s/sec      162K s/sec      326K s/sec
8M            133K s/sec      144K s/sec      278K s/sec
16M           103K s/sec      136K s/sec      213K s/sec
• Seriously fast ingest
• It has improved over time
• It has improved with TSI1 over INMEM
• Not sensitive to how much data is stored
  • TB or PB makes little difference
• Sensitive to how many unique series (cardinality) are being recorded
• but wait…
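To make "cardinality" concrete, here is a minimal sketch that counts unique series in a handful of line protocol points. It assumes a series is identified by measurement plus tag set (some InfluxDB versions also include the field key) and that no characters are escaped; `series_key` and `cardinality` are hypothetical helper names, not part of any InfluxDB API.

```python
def series_key(line):
    """Return the series key (measurement plus sorted tag set) of one line.

    Assumption: simplified line protocol with no escaped spaces or commas.
    """
    head = line.split(" ", 1)[0]          # e.g. "cpu,host=a,region=eu"
    measurement, *tags = head.split(",")
    return ",".join([measurement] + sorted(tags))


def cardinality(lines):
    """Count the unique series keys across the given lines."""
    return len({series_key(line) for line in lines})


points = [
    "cpu,host=a,region=eu usage=0.5 1000",
    "cpu,host=a,region=eu usage=0.7 2000",   # same series, new timestamp
    "cpu,region=eu,host=a usage=0.9 3000",   # same series, tags reordered
    "cpu,host=b,region=eu usage=0.1 1000",   # new host, so a new series
    "mem,host=a,region=eu used=42i 1000",    # new measurement, so a new series
]
print(cardinality(points))  # 3
```

Five points produce only three series: adding more points to an existing series grows the data volume, while adding a new tag value grows the cardinality.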
7. Time Structured Merge-Tree (TSM)
• Draws on Log Structured Merge-Tree structure and algorithms
• Organized around time, series and fields (columnar format)
• Data blocks go through compaction and compression stages as
they become colder (less likely to be written to)
• Different compression algorithms for different column types,
selected dynamically
• Columnar format is very compression friendly
• Proprietary binary format
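As an illustration of why a time-organized columnar layout compresses well, the sketch below applies delta encoding followed by run-length encoding to a column of timestamps. This is a generic technique shown under simplifying assumptions, not the actual TSM encoder; the function names are hypothetical.

```python
def delta_rle_encode(timestamps):
    """Encode sorted integer timestamps as (first, [(delta, run_length), ...])."""
    if not timestamps:
        return None, []
    runs = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        delta = cur - prev
        if runs and runs[-1][0] == delta:
            runs[-1][1] += 1          # extend the current run of equal deltas
        else:
            runs.append([delta, 1])   # start a new run
    return timestamps[0], [tuple(r) for r in runs]


def delta_rle_decode(first, runs):
    """Rebuild the original timestamps from the encoded form."""
    if first is None:
        return []
    out = [first]
    for delta, count in runs:
        for _ in range(count):
            out.append(out[-1] + delta)
    return out


ts = [1000, 1010, 1020, 1030, 1040, 1100, 1110]
first, runs = delta_rle_encode(ts)
print(first, runs)  # 1000 [(10, 4), (60, 1), (10, 1)]
```

Regularly sampled series collapse into very few runs, which is one reason a timestamp column may deserve a different encoder than a column of floats or strings.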
8. The API and TSM: One API to rule them all
• InfluxDB API exposes the data in the storage layer to users
• ingest (line protocol), query (InfluxQL & Flux), process (Flux Tasks)
• The API is consistent between InfluxDB OSS, Enterprise self-managed
clusters and our Cloud Service
• Move between them as needed
• We have users on our cloud service that then need to support an air-gapped
customer
• We have customers that are building new ventures on single board computers and
have visions of aggregating in a central location
• Move or blend, even synchronise, as needed according to your changing needs.
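Because ingest uses line protocol on every edition, the same client-side formatting works against OSS, Enterprise and Cloud. The sketch below builds one line protocol record from Python values. The helper names (`to_line`, `format_field`, `_escape`) are hypothetical, and actually sending the line to a write endpoint is left out.

```python
def _escape(text, chars):
    """Backslash-escape each character of `chars` that appears in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field(value):
    """Render a field value using line protocol type conventions."""
    if isinstance(value, bool):       # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"            # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted; backslashes are escaped before quotes.
    return '"' + _escape(str(value), '\\"') + '"'


def to_line(measurement, tags, fields, timestamp_ns):
    """Build one record: measurement,tags fields timestamp."""
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(tags[key], ",= ")
    body = ",".join(
        _escape(key, ",= ") + "=" + format_field(val) for key, val in fields.items()
    )
    return f"{head} {body} {timestamp_ns}"


line = to_line(
    "cpu",
    {"region": "eu", "host": "server 01"},
    {"usage": 0.64, "cores": 8},
    1700000000000000000,
)
print(line)  # cpu,host=server\ 01,region=eu usage=0.64,cores=8i 1700000000000000000
```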
10. Flux and the TSM engine
• Flux is a functional language (i.e. it can run queries, apply analytical
transformations and perform actions, e.g. http.post())
• Powerful, flexible, easy to write, easy to read…
• ‘Pushdowns’ push computational work down towards the
storage speeding up queries
• Flux isn’t just for querying the data held in TSM. Flux allows you
to codify background processes that can do all manner of things
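The idea behind a pushdown can be shown with a toy model: the same filter either runs in the query layer after every row has been fetched, or travels down to the storage layer so that only matching rows come back. `ToyStorage` is a hypothetical in-memory stand-in, not how Flux or the TSM engine is implemented.

```python
class ToyStorage:
    """Hypothetical in-memory stand-in for a storage engine."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_returned = 0   # rows that crossed the storage boundary

    def scan(self, predicate=None):
        """Return rows; if a predicate is given, filter inside the storage layer."""
        selected = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_returned += len(selected)
        return selected


def is_h1(row):
    return row["host"] == "h1"


rows = [{"host": f"h{i % 4}", "value": i} for i in range(1000)]

# Without pushdown: the query layer pulls every row, then filters.
naive = ToyStorage(rows)
result_naive = [r for r in naive.scan() if is_h1(r)]

# With pushdown: the filter is evaluated by the storage layer.
pushed = ToyStorage(rows)
result_pushed = pushed.scan(predicate=is_h1)

print(naive.rows_returned, pushed.rows_returned)  # 1000 250
```

Both paths give the same result, but the pushed-down version moves a quarter of the rows, which is where the query speed-up comes from.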
11. Tasks and the TSM: Your Heavy Lifting Friend
• Perform transformations & operations on raw data (Downsampling & Precomputing)
• Monitor and look for conditions to trigger actions
• Headless, automated & scheduled
[Diagram: Raw Data → Transformed or Downsampled Data, or Actionable insights]
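Downsampling, the most common job handed to a task, can be sketched as grouping raw points into fixed time windows and keeping one aggregate per window. The example below is a plain Python illustration of that idea with a hypothetical `downsample_mean` helper; a real task would express the same logic in Flux and run on a schedule.

```python
from collections import defaultdict


def downsample_mean(points, window):
    """Group (timestamp, value) points into fixed windows and average each.

    Returns a list of (window_start, mean) pairs sorted by window start.
    """
    buckets = defaultdict(list)
    for ts, value in points:
        buckets[ts - ts % window].append(value)   # align to the window start
    return [(start, sum(vals) / len(vals)) for start, vals in sorted(buckets.items())]


raw = [(0, 1.0), (10, 3.0), (59, 5.0), (60, 10.0), (119, 20.0), (130, 7.0)]
print(downsample_mean(raw, 60))  # [(0, 3.0), (60, 15.0), (120, 7.0)]
```

Six raw points become three downsampled ones; at production volumes the same reduction keeps long-range queries cheap while the raw data ages out.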
12. The Next Generation InfluxDB Time Series Engine
• Openness: persist using Apache Parquet files in Object Storage
• Speed: in-memory columnar using Apache Arrow
• Access: polyglot language support
  • Native SQL - enabling the installed ecosystem and tools
  • Flux - flexibility & extensibility (Flux can do more than query)
  • InfluxQL - allowing existing workloads to move forward and benefit
• Scale: unlock ludicrous cardinality