Paul Dix
InfluxData – CTO & co-founder
paul@influxdata.com
@pauldix
InfluxDB IOx - a new columnar
time series database (update)
Progress
• New Team Members!
• Read Buffer progress
• Mutable Buffer & Read Buffer connections
• Arrow Flight API
• Replication, multiple IOx servers doc
API Decisions
• Management API will be gRPC
– CLI for common tasks
• Write
– InfluxDB 2.0 Line Protocol
– JSON objects (events!)
– Protobuf?
• Query
– HTTP (csv, json, display)
– Arrow Flight
– Postgres?
What’s Next?
• Management API
• Parquet Persistence to Object Store
• Recovery from Object Store
• Replication
• Subscriptions
• Official Builds & Documentation (now late March)
Edd Robinson
Engineer @ InfluxData
edd@influxdata.com
@e-dard 🐙
@eddrobinson 🐦
An Intro to the InfluxDB IOx
Read Buffer: a read-optimised
in-memory execution engine
Me
● Software engineer at InfluxData.
● Worked on InfluxDB for ~4y: storage engine, write path, indexing.
● Working on IOx (and with Rust!) for just over a year.
What are we working towards?
● Unlimited Data:
○ Object Storage, compression
● Unlimited Cardinality:
○ Data organisation, no large
indexes.
● 🚀 Analytical Queries:
○ in-memory, columnar
data-layout, lots of fanciness
This talk is about...
A sub-system in IOx called the Read Buffer, a new query execution engine.
● Works on data held in memory and on the heap. No IO at read time.
● Data is immutable.
● Lots of wholesome column-store goodness:
○ 📊
○ 🗜
○ ⇶
○ ❓
○ ❓
Wider Goals
We want to have excellent support for different time-series
use-cases
● Events
● Observability trifecta (logging, tracing, metrics)
● Large analytical workloads
We already have a time-series database?
Quick Refresher
InfluxDB Happy Place
~67GB
InfluxDB Sad 🐼
~77 MB
👎
So...
● mmap
IOx Bets
Why columnar is the way to go
● Analytical workloads usually only need
projections of the dataset.
● Increase flexibility in data organisation.
● Improve data relevance.
● Reduce footprint through compression.
● Mechanical sympathy - CPUs love arrays.
Forrest Smith - blog
Why columnar is the way to go
Memory Bandwidth: benchmark
● This example is synthetic (but indicative!)
● Data throughput from memory to CPU has an
impact on performance.
● CPU cache is significantly faster than main memory
Why columnar is the way to go
L1 Cache
L2/L3 Cache
Main Memory
If you want to make the most use of your memory
bandwidth:
● process less data.
● process more relevant data.
Columnar representations help with both of these
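A minimal sketch of the "process less data" point, assuming a hypothetical three-field weather row (the `Row` and `Columns` types are illustrative and are not IOx types). Summing one field over rows walks every field of every row, while the columnar version walks a single contiguous array.

```rust
use std::mem::size_of;

/// Row-oriented layout: all fields of one row sit next to each other.
struct Row {
    temperature: f64,
    humidity: f64,
    timestamp: i64,
}

/// Column-oriented layout: each field is one contiguous array.
#[allow(dead_code)]
struct Columns {
    temperature: Vec<f64>,
    humidity: Vec<f64>,
    timestamp: Vec<i64>,
}

/// Pivot rows into columns (hypothetical helper for this sketch).
fn to_columns(rows: &[Row]) -> Columns {
    Columns {
        temperature: rows.iter().map(|r| r.temperature).collect(),
        humidity: rows.iter().map(|r| r.humidity).collect(),
        timestamp: rows.iter().map(|r| r.timestamp).collect(),
    }
}

/// Strides over whole rows, pulling unused fields through the cache.
fn sum_temperature_rows(rows: &[Row]) -> f64 {
    rows.iter().map(|r| r.temperature).sum()
}

/// Reads only the bytes the query needs.
fn sum_temperature_columns(cols: &Columns) -> f64 {
    cols.temperature.iter().sum()
}

fn main() {
    let rows: Vec<Row> = (0..1_000_000)
        .map(|i| Row {
            temperature: (i % 40) as f64,
            humidity: 50.0,
            timestamp: i as i64,
        })
        .collect();
    let cols = to_columns(&rows);

    // Same answer either way; the difference is how much memory is walked.
    assert_eq!(sum_temperature_rows(&rows), sum_temperature_columns(&cols));
    println!("row scan walks    {} bytes", rows.len() * size_of::<Row>());
    println!("column scan walks {} bytes", cols.temperature.len() * size_of::<f64>());
}
```

The two sums are identical; only the number of bytes pulled from main memory differs, which is the effect the benchmark slide points at.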
🤿 Dive into the Read Buffer
● Data organisation;
● Data representation;
● Read execution (late materialisation);
● Early numbers!
● Future improvements.
IOx Write Path
● WAL: replication and recovery
● Mutable Buffer: query written data
● Object Store: for durability
● Read Buffer: optimised read-only view
of written data.
IOx Read Path
Query Engine
SQL Frontend
Flux Frontend
InfluxQL Frontend
Mutable Buffer
Read Buffer
Object Storage
Reader
Data Model
● Data organised by database.
● Databases are collections of partitions (Partition Key).
● Partitions contain chunks (Chunk ID).
● Chunks contain tables (Table name).
● Tables contain Row Groups (same schema; filter entire tables).
● Row Groups contain columnar data (skip Row Group).
Data Model
(thanks @alamb)
weather,location=us-east temperature=82,humidity=67 1465839830100400200
weather,location=us-midwest temperature=82,humidity=65 1465839830100400200
weather,location=us-west temperature=70,humidity=54 1465839830100400200
weather,location=us-east temperature=83,humidity=69 1465839830200400200
weather,location=us-midwest temperature=87,humidity=78 1465839830200400200
weather,location=us-west temperature=72,humidity=56 1465839830200400200
weather,location=us-east temperature=84,humidity=67 1465839830300400200
weather,location=us-midwest temperature=90,humidity=82 1465839830300400200
weather,location=us-west temperature=71,humidity=57 1465839830300400200
Row Group in Table: weather
location      temperature  humidity  timestamp
"us-east"     82           67        2016-06-13T17:43:50.1004002Z
"us-midwest"  82           65        2016-06-13T17:43:50.1004002Z
"us-west"     70           54        2016-06-13T17:43:50.1004002Z
"us-east"     83           69        2016-06-13T17:43:50.2004002Z
"us-midwest"  87           78        2016-06-13T17:43:50.2004002Z
"us-west"     72           56        2016-06-13T17:43:50.2004002Z
"us-east"     84           67        2016-06-13T17:43:50.3004002Z
"us-midwest"  90           82        2016-06-13T17:43:50.3004002Z
"us-west"     71           57        2016-06-13T17:43:50.3004002Z
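A rough sketch of how line protocol rows like the ones above could be pivoted into per-column arrays. The `RowGroup` struct and `append_line` function are hypothetical names for this example, not the IOx API, and the parser assumes no escaping, float-only fields, and that every line carries the same tags and fields.

```rust
use std::collections::BTreeMap;

/// One vector per column; row i of the table is element i of every vector.
#[derive(Debug, Default)]
struct RowGroup {
    tags: BTreeMap<String, Vec<String>>,
    fields: BTreeMap<String, Vec<f64>>,
    time: Vec<i64>,
}

/// Append one line of the form `measurement,tag=v field=1,field=2 timestamp`.
/// Returns None on malformed input; a real parser would also roll back the
/// partially appended values, which this sketch does not do.
fn append_line(group: &mut RowGroup, line: &str) -> Option<()> {
    let mut parts = line.split_whitespace();
    let series = parts.next()?;
    let fields = parts.next()?;
    let timestamp = parts.next()?;

    // skip(1) drops the measurement name, which identifies the table.
    for pair in series.split(',').skip(1) {
        let (key, value) = pair.split_once('=')?;
        group.tags.entry(key.to_string()).or_default().push(value.to_string());
    }
    for pair in fields.split(',') {
        let (key, value) = pair.split_once('=')?;
        let value = value.parse::<f64>().ok()?;
        group.fields.entry(key.to_string()).or_default().push(value);
    }
    group.time.push(timestamp.parse::<i64>().ok()?);
    Some(())
}

fn main() {
    let lines = [
        "weather,location=us-east temperature=82,humidity=67 1465839830100400200",
        "weather,location=us-midwest temperature=82,humidity=65 1465839830100400200",
        "weather,location=us-west temperature=70,humidity=54 1465839830100400200",
    ];
    let mut group = RowGroup::default();
    for line in lines.iter() {
        append_line(&mut group, line).expect("well-formed line");
    }
    println!("{:#?}", group);
}
```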
Supported Data Types
Logical Data Types
● String (utf-8 valid strings)
● Float (double-precision float)
(all of them 😉)
● Integer (signed integers)
● Unsigned (unsigned integers)
● Boolean
● Binary (arbitrary bytes)
Semantic Column Types
● InfluxDB Tag ➟ String
● InfluxDB Field ➟ Most
● InfluxDB Timestamp ➟ I64
● IOx Column ➟ Anything
Interesting Supported Features
Tailored for time-series:
● scans, grouped aggregates, windowed aggregates, schema
exploration (tables, columns, values).
● Table/row group pruning.
● Predicate pushdown.
● Comparison operators with a constant on tag columns
(<, <=, >, >=, =, !=)
● Aggregates on any column(s)
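One way row group pruning can work, sketched under the assumption that each row group keeps the min and max of its time column. The `RowGroup` type, `may_contain`, and `sum_counter` are illustrative names, not the Read Buffer's actual interface.

```rust
/// A row group with a time column, a counter column, and the
/// pre-computed (min, max) range of the time column.
struct RowGroup {
    time: Vec<i64>,
    counter: Vec<i64>,
    time_range: (i64, i64),
}

impl RowGroup {
    /// Returns None for empty or mismatched columns.
    fn new(time: Vec<i64>, counter: Vec<i64>) -> Option<Self> {
        if time.len() != counter.len() {
            return None;
        }
        let min = *time.iter().min()?;
        let max = *time.iter().max()?;
        Some(RowGroup { time, counter, time_range: (min, max) })
    }

    /// Can any row satisfy `start <= time < end`? Answered from metadata only.
    fn may_contain(&self, start: i64, end: i64) -> bool {
        let (min, max) = self.time_range;
        max >= start && min < end
    }
}

/// SUM(counter) for `start <= time < end`.
/// Also returns how many row groups actually had to be scanned.
fn sum_counter(groups: &[RowGroup], start: i64, end: i64) -> (i64, usize) {
    let mut sum = 0i64;
    let mut scanned = 0usize;
    for group in groups.iter().filter(|g| g.may_contain(start, end)) {
        scanned += 1;
        for (t, c) in group.time.iter().zip(group.counter.iter()) {
            if *t >= start && *t < end {
                sum += *c;
            }
        }
    }
    (sum, scanned)
}

fn main() {
    let groups = vec![
        RowGroup::new(vec![0, 1, 2], vec![1, 1, 1]).unwrap(),
        RowGroup::new(vec![10, 11, 12], vec![5, 5, 5]).unwrap(),
    ];
    let (sum, scanned) = sum_counter(&groups, 10, 12);
    println!("sum = {}, row groups scanned = {} of {}", sum, scanned, groups.len());
}
```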
Storing Data in the Read Buffer
➡
Columnar Compression Spectrum
Lots ‘o Compression
💯 Smaller Footprint
👎 High processing cost
No Compression
👎 Larger footprint
💯 ~Zero processing cost
Vec<T>
Choice can depend on data location
And Medium $$$
Petabytes
$0.03/GB
Gigabytes
$10/GB??
Terabytes
$0.1/GB
Read Buffer Compression Schemes
Dictionary Encoding
● Good for high cardinality tag
columns.
● Column order is not a factor in
compression.
● Constant time access. 🚀
● Key: Operate directly on
compressed data. 🚀
Read Buffer Compression Schemes
Filtering Dictionary Encoding
WHERE “region” = ‘east’
x = 0
{0, 2, 7, 15}
WHERE “region” > ‘north’
x > 1
{1, 3, 5, 8, 9, 10,
11, 12, 14}
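The filtering idea above can be sketched as follows, assuming an order-preserving dictionary (sorted, so that comparing encoded ids gives the same answer as comparing strings). `DictColumn` and its methods are hypothetical names for this example; the real Read Buffer encodings differ in detail.

```rust
/// Dictionary-encoded string column.
struct DictColumn {
    dictionary: Vec<String>, // sorted and unique; index = encoded id
    ids: Vec<u32>,           // one encoded id per row
}

impl DictColumn {
    fn encode(values: &[&str]) -> Self {
        let mut dictionary: Vec<String> = values.iter().map(|&v| v.to_string()).collect();
        dictionary.sort();
        dictionary.dedup();
        let ids: Vec<u32> = values
            .iter()
            .map(|&v| {
                dictionary
                    .binary_search_by(|d| d.as_str().cmp(v))
                    .expect("every value was added to the dictionary") as u32
            })
            .collect();
        DictColumn { dictionary, ids }
    }

    /// WHERE col = literal: one dictionary lookup, then integer comparisons.
    fn rows_eq(&self, literal: &str) -> Vec<usize> {
        match self.dictionary.binary_search_by(|d| d.as_str().cmp(literal)) {
            Ok(id) => self.rows_where(|x| x == id as u32),
            Err(_) => Vec::new(), // literal is not in the column at all
        }
    }

    /// WHERE col > literal: valid on ids because the dictionary is sorted.
    fn rows_gt(&self, literal: &str) -> Vec<usize> {
        let first = self.dictionary.partition_point(|d| d.as_str() <= literal) as u32;
        self.rows_where(|x| x >= first)
    }

    /// Scan the encoded ids only; no string is touched here.
    fn rows_where(&self, keep: impl Fn(u32) -> bool) -> Vec<usize> {
        (0..self.ids.len()).filter(|&row| keep(self.ids[row])).collect()
    }
}

fn main() {
    let region = DictColumn::encode(&["east", "west", "east", "north", "south"]);
    println!("dictionary: {:?}", region.dictionary);
    println!("region = 'east'  -> rows {:?}", region.rows_eq("east"));
    println!("region > 'north' -> rows {:?}", region.rows_gt("north"));
}
```

With `east` encoded as 0 and `north` as 1, the two predicates become `x = 0` and `x > 1`, matching the rewrite shown on the slide.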
Read Buffer Compression Schemes
“RLE” - Run-Length Encoding
● Incredible compression when there are lots
of “runs”.
● Works best on heavily sorted
columns.
● Not as consumable*
● Pre-computed bitsets 🚀
● Can operate on compressed
data. 🚀
Read Buffer Compression Schemes
“RLE” - Run-Length Encoding
WHERE “region” = ‘east’
x = 0
WHERE “region” > ‘north’
x > 1
{9, 10, 11, 12, 13,
14, 15}
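A small sketch of filtering directly on run-length encoded data, assuming the column values have already been mapped to small integer ids. `RleColumn` is a hypothetical type for this example; it stores plain (value, run length) pairs and does not model the pre-computed bitsets mentioned above.

```rust
use std::ops::Range;

/// Run-length encoded column of integer ids.
struct RleColumn {
    runs: Vec<(u32, u32)>, // (encoded value, run length)
}

impl RleColumn {
    fn encode(ids: &[u32]) -> Self {
        let mut runs: Vec<(u32, u32)> = Vec::new();
        for &id in ids {
            if let Some(last) = runs.last_mut() {
                if last.0 == id {
                    last.1 += 1; // extend the current run
                    continue;
                }
            }
            runs.push((id, 1)); // start a new run
        }
        RleColumn { runs }
    }

    /// Evaluate a predicate once per run (not once per row) and
    /// return the matching rows as half-open row ranges.
    fn rows_where(&self, keep: impl Fn(u32) -> bool) -> Vec<Range<u32>> {
        let mut out: Vec<Range<u32>> = Vec::new();
        let mut start = 0u32;
        for &(value, len) in &self.runs {
            let end = start + len;
            if keep(value) {
                // Merge with the previous range when the runs are adjacent.
                let merged = match out.last_mut() {
                    Some(prev) if prev.end == start => {
                        prev.end = end;
                        true
                    }
                    _ => false,
                };
                if !merged {
                    out.push(start..end);
                }
            }
            start = end;
        }
        out
    }
}

fn main() {
    // A sorted column of 16 rows collapses to four runs.
    let col = RleColumn::encode(&[0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3]);
    println!("runs: {:?}", col.runs);
    println!("x > 1 -> {:?}", col.rows_where(|x| x > 1));
}
```

On a sorted column the predicate is evaluated four times instead of sixteen, and `x > 1` comes back as one contiguous range covering rows 9 to 15.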
Which Dictionary Encoding?
WHERE “region” = ‘east’
● 10M rows in column.
● Cardinality 10,000.
● Single thread.
● SIMD intrinsics on Dictionary Encoding.
● RLE is on another level: “cheating”...
Billions of rows/second processed
Encoding  Vec<String>  Dictionary  RLE
Time      59ms         2.2ms       420ns
Size      380MB        ~40MB       ~40MB
Which Dictionary Encoding?
WHERE “span_id” = ‘123djk7GHs99wj’
● 10 million rows in column.
● Cardinality 10 million.
● Single thread.
● SIMD intrinsics on Dictionary Encoding.
Billions of rows/second processed
Encoding  Vec<String>  Dictionary  RLE
Time      60ms         2.2ms       580ns
Size      380MB        ~420MB      ~1GB
Which Dictionary Encoding?
“I need rows [2, 33, 55, 111, 3343]”
10,000,000 row column
Encoding Cardinality 10K
(materialise 1000 rows near end)
Cardinality 10M
(materialise 1 row near end)
Vec<String>
Dictionary μ
RLE μ
Which Dictionary Encoding?
● filtering
● materialisation
Numerical Column Encodings
Supported Logical types: i64, u64, f64
{u8, i8,.., u64, i64}*
&[i64]: (48 B) [123, 198, 1, 33, 133, 224] ➠ &[u8]: (6 B) [..]
&[i64]: (48 B) [-18, 2, 0, 220, 2, 26] ➠ &[i16]: (12 B) [..]
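The narrowing shown above can be sketched like this: try progressively wider physical types and keep the first one that holds every value. The `Packed` enum and the `pack` function are hypothetical names, and this sketch covers only i64 input (the u64 and f64 logical types are left out).

```rust
use std::convert::TryFrom;
use std::mem::size_of;

/// Physical representations a logical i64 column might be stored in.
#[derive(Debug, PartialEq)]
enum Packed {
    U8(Vec<u8>),
    I8(Vec<i8>),
    U16(Vec<u16>),
    I16(Vec<i16>),
    U32(Vec<u32>),
    I32(Vec<i32>),
    I64(Vec<i64>),
}

impl Packed {
    fn size_bytes(&self) -> usize {
        match self {
            Packed::U8(v) => v.len() * size_of::<u8>(),
            Packed::I8(v) => v.len() * size_of::<i8>(),
            Packed::U16(v) => v.len() * size_of::<u16>(),
            Packed::I16(v) => v.len() * size_of::<i16>(),
            Packed::U32(v) => v.len() * size_of::<u32>(),
            Packed::I32(v) => v.len() * size_of::<i32>(),
            Packed::I64(v) => v.len() * size_of::<i64>(),
        }
    }
}

/// Convert every value to T, or return None if any value does not fit.
fn narrow<T: TryFrom<i64>>(values: &[i64]) -> Option<Vec<T>> {
    values.iter().map(|&v| T::try_from(v).ok()).collect()
}

/// Pick the smallest physical type that can hold all values.
fn pack(values: &[i64]) -> Packed {
    if let Some(v) = narrow::<u8>(values) { return Packed::U8(v); }
    if let Some(v) = narrow::<i8>(values) { return Packed::I8(v); }
    if let Some(v) = narrow::<u16>(values) { return Packed::U16(v); }
    if let Some(v) = narrow::<i16>(values) { return Packed::I16(v); }
    if let Some(v) = narrow::<u32>(values) { return Packed::U32(v); }
    if let Some(v) = narrow::<i32>(values) { return Packed::I32(v); }
    Packed::I64(values.to_vec())
}

fn main() {
    for values in [[123i64, 198, 1, 33, 133, 224], [-18, 2, 0, 220, 2, 26]].iter() {
        let packed = pack(values);
        println!("{:?} ({} B)", packed, packed.size_bytes());
    }
}
```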
Read Execution
SELECT “host”, “counter”, “time”
FROM “cpu”
WHERE “env” = ‘prod’ AND
“path” = ‘/write’ AND
“counter” > 200 AND
“time” >= x AND “time” < y;
Late Materialisation - Scanning
SELECT “host”, “counter”, “time” FROM “cpu” WHERE “env” = ‘prod’ AND “path” = ‘/write’ AND “counter” > 200 AND “time” >= x AND “time” < y;
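A simplified sketch of late materialisation for the scan above: each predicate runs against its own column and only narrows a list of candidate row ids, and the projected columns are read last, for surviving rows only. The `Cpu` struct, `refine`, and `scan` are hypothetical names, and plain `Vec`s stand in for the encoded columns.

```rust
/// The "cpu" table as plain, unencoded columns.
struct Cpu {
    host: Vec<String>,
    env: Vec<String>,
    path: Vec<String>,
    counter: Vec<i64>,
    time: Vec<i64>,
}

/// Helper for building string columns in examples.
fn strings(values: &[&str]) -> Vec<String> {
    values.iter().map(|v| v.to_string()).collect()
}

/// Keep only the candidate rows whose value in `column` passes `keep`.
fn refine<T>(rows: Vec<usize>, column: &[T], keep: impl Fn(&T) -> bool) -> Vec<usize> {
    rows.into_iter().filter(|&row| keep(&column[row])).collect()
}

/// Returns (host, counter, time) for rows matching every predicate.
fn scan(t: &Cpu, start: i64, end: i64) -> Vec<(String, i64, i64)> {
    // 1. Filter: only predicate columns are touched, one at a time.
    let mut rows: Vec<usize> = (0..t.time.len()).collect();
    rows = refine(rows, &t.time, |&ts| ts >= start && ts < end);
    rows = refine(rows, &t.env, |e| e.as_str() == "prod");
    rows = refine(rows, &t.path, |p| p.as_str() == "/write");
    rows = refine(rows, &t.counter, |&c| c > 200);

    // 2. Materialise: read the projected columns for surviving rows only.
    rows.into_iter()
        .map(|row| (t.host[row].clone(), t.counter[row], t.time[row]))
        .collect()
}

fn main() {
    let cpu = Cpu {
        host: strings(&["a", "b", "c", "d"]),
        env: strings(&["prod", "prod", "dev", "prod"]),
        path: strings(&["/write", "/query", "/write", "/write"]),
        counter: vec![250, 300, 400, 100],
        time: vec![10, 11, 12, 13],
    };
    println!("{:?}", scan(&cpu, 0, 100));
}
```

No `host` value is read until the final step, which is the point of materialising late.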
Late Materialisation - Grouping
SELECT SUM(“counter”) FROM “cpu” WHERE “path” = ‘/query’ AND “time” >= x AND “time” < y GROUP BY “region”;
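For the grouped query, one possible shape (a sketch, not the Read Buffer's actual operator) is to aggregate on the encoded group ids and decode them to strings only once per surviving group. The `Cpu` struct and `sum_by_region` are hypothetical, and the region column is assumed to be dictionary encoded.

```rust
/// The "cpu" table with a dictionary-encoded region column.
struct Cpu {
    region_dictionary: Vec<String>, // index = encoded region id
    region: Vec<u32>,               // encoded region id per row
    path: Vec<String>,
    counter: Vec<i64>,
    time: Vec<i64>,
}

/// SUM(counter) WHERE path = '/query' AND start <= time < end GROUP BY region.
fn sum_by_region(t: &Cpu, start: i64, end: i64) -> Vec<(String, i64)> {
    // One accumulator per dictionary entry; None means no matching rows.
    let mut sums: Vec<Option<i64>> = vec![None; t.region_dictionary.len()];

    for row in 0..t.time.len() {
        if t.path[row] == "/query" && t.time[row] >= start && t.time[row] < end {
            // Group on the integer id; no string hashing or comparison per row.
            let slot = &mut sums[t.region[row] as usize];
            *slot = Some(slot.unwrap_or(0) + t.counter[row]);
        }
    }

    // Late materialisation: decode an id to its string once per group.
    sums.iter()
        .enumerate()
        .filter_map(|(id, sum)| sum.map(|s| (t.region_dictionary[id].clone(), s)))
        .collect()
}

fn main() {
    let cpu = Cpu {
        region_dictionary: vec!["east".to_string(), "west".to_string()],
        region: vec![0, 1, 0, 1],
        path: vec!["/query", "/query", "/write", "/query"]
            .into_iter()
            .map(String::from)
            .collect(),
        counter: vec![1, 2, 4, 8],
        time: vec![1, 2, 3, 4],
    };
    println!("{:?}", sum_by_region(&cpu, 0, 10));
}
```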
♥
Let’s look at some initial numbers
● span_id
Synthetic High Cardinality Tracing use-case
Column Name Cardinality Encoding
How much space do we need?
“Needle in a Haystack”
SELECT * FROM “traces” WHERE “trace_id” = ‘H7whivfl’;
1 M 1 ms 1.2 ms
10 M 1.1 ms 2.5 ms
60 M 1.3 ms 15.7 ms
Aggregating over high-cardinality
SELECT SUM(duration) FROM “traces” GROUP BY “trace_id”;
1 M 30 s (~10 GB RAM) 45 ms (8 MB)
10 M 18 min (140 GB RAM) 498 ms (150 MB)
60 M D.N.F (OOM) 4.3 s (900 MB)
Schema Exploration
SHOW TAG KEYS WHERE “cluster” = ‘cluster-2-2-3’
AND time >= x AND time < y;
1 M 15 ms 12 μs
10 M 150 ms 47 μs
60 M 1.6 s 120 μs
Future Work
Lots more to do in Read Buffer land!
● Data-type support.
● More supported predicates, e.g., regex, LIKE, OR.
● More columnar encodings (e.g., time-series specific field encodings).
● Deletes support! (Proposal written up)
● Complete implementation of all physical operations.
● Performance - predicate caching, buffer pooling etc.
● Concurrent execution.
Thank You
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022InfluxData
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022InfluxData
 
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...InfluxData
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022InfluxData
 

More from InfluxData (20)

Announcing InfluxDB Clustered
Announcing InfluxDB ClusteredAnnouncing InfluxDB Clustered
Announcing InfluxDB Clustered
 
Best Practices for Leveraging the Apache Arrow Ecosystem
Best Practices for Leveraging the Apache Arrow EcosystemBest Practices for Leveraging the Apache Arrow Ecosystem
Best Practices for Leveraging the Apache Arrow Ecosystem
 
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
How Bevi Uses InfluxDB and Grafana to Improve Predictive Maintenance and Redu...
 
Power Your Predictive Analytics with InfluxDB
Power Your Predictive Analytics with InfluxDBPower Your Predictive Analytics with InfluxDB
Power Your Predictive Analytics with InfluxDB
 
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
How Teréga Replaces Legacy Data Historians with InfluxDB, AWS and IO-Base
 
Build an Edge-to-Cloud Solution with the MING Stack
Build an Edge-to-Cloud Solution with the MING StackBuild an Edge-to-Cloud Solution with the MING Stack
Build an Edge-to-Cloud Solution with the MING Stack
 
Meet the Founders: An Open Discussion About Rewriting Using Rust
Meet the Founders: An Open Discussion About Rewriting Using RustMeet the Founders: An Open Discussion About Rewriting Using Rust
Meet the Founders: An Open Discussion About Rewriting Using Rust
 
Introducing InfluxDB Cloud Dedicated
Introducing InfluxDB Cloud DedicatedIntroducing InfluxDB Cloud Dedicated
Introducing InfluxDB Cloud Dedicated
 
Gain Better Observability with OpenTelemetry and InfluxDB
Gain Better Observability with OpenTelemetry and InfluxDB Gain Better Observability with OpenTelemetry and InfluxDB
Gain Better Observability with OpenTelemetry and InfluxDB
 
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
How a Heat Treating Plant Ensures Tight Process Control and Exceptional Quali...
 
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
How Delft University's Engineering Students Make Their EV Formula-Style Race ...How Delft University's Engineering Students Make Their EV Formula-Style Race ...
How Delft University's Engineering Students Make Their EV Formula-Style Race ...
 
Introducing InfluxDB’s New Time Series Database Storage Engine
Introducing InfluxDB’s New Time Series Database Storage EngineIntroducing InfluxDB’s New Time Series Database Storage Engine
Introducing InfluxDB’s New Time Series Database Storage Engine
 
Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena Start Automating InfluxDB Deployments at the Edge with balena
Start Automating InfluxDB Deployments at the Edge with balena
 
Understanding InfluxDB’s New Storage Engine
Understanding InfluxDB’s New Storage EngineUnderstanding InfluxDB’s New Storage Engine
Understanding InfluxDB’s New Storage Engine
 
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDBStreamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
Streamline and Scale Out Data Pipelines with Kubernetes, Telegraf, and InfluxDB
 
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
Ward Bowman [PTC] | ThingWorx Long-Term Data Storage with InfluxDB | InfluxDa...
 
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
Scott Anderson [InfluxData] | New & Upcoming Flux Features | InfluxDays 2022
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts | InfluxDays 2022
 
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
Steinkamp, Clifford [InfluxData] | Welcome to InfluxDays 2022 - Day 2 | Influ...
 
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
Steinkamp, Clifford [InfluxData] | Closing Thoughts Day 1 | InfluxDays 2022
 

Recently uploaded

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 

Recently uploaded (20)

Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 

InfluxDB IOx Tech Talks: Intro to the InfluxDB IOx Read Buffer - A Read-Optimized In-Memory Query Execution Engine

  • 1. Paul Dix InfluxData – CTO & co-founder paul@influxdata.com @pauldix InfluxDB IOx - a new columnar time series database (update)
  • 2. Progress • New Team Members! • Read Buffer progress • Mutable Buffer & Read Buffer connections • Arrow Flight API • Replication, multiple IOx servers doc
  • 3. API Decisions • Management API will be gRPC – CLI for common tasks • Write – InfluxDB 2.0 Line Protocol – JSON objects (events!) – Protobuf? • Query – HTTP (csv, json, display) – Arrow Flight – Postgres?
  • 4. What’s Next? • Management API • Parquet Persistence to Object Store • Recovery from Object Store • Replication • Subscriptions • Official Builds & Documentation (now late March)
  • 5. Edd Robinson Engineer @ InfluxData edd@influxdata.com @e-dard 🐙 @eddrobinson 🐦 An Intro to the InfluxDB IOx Read Buffer: a read-optimised in-memory execution engine
  • 6. Me ● Software engineer at InfluxData. ● Worked on InfluxDB for ~4y: storage engine, write path, indexing. Working on IOx (and with Rust!) for just over a year.
  • 7. What are we working towards? ● Unlimited Data: ○ Object Storage, compression ● Unlimited Cardinality: ○ Data organisation, no large indexes. ● 🚀 Analytical Queries: ○ in-memory, columnar data-layout, lots of fanciness
  • 8. This talk is about... A sub-system in IOx called the Read Buffer, a new query execution engine. ● Work on data held in-memory and on-heap. No IO at read-time ● Data is immutable. ● Lots of wholesome column-store goodness: ○ 📊 ○ 🗜 ○ ⇶ ○ ❓ ○ ❓
  • 9. Wider Goals We want to have excellent support for different time-series use-cases ● Events ● Observability trifecta (logging, tracing, metrics) ● Large analytical workloads
  • 10. We already have a time-series database?
  • 16. Why columnar is the way to go ● Analytical workloads usually only need projections of dataset. ● Increase flexibility in data organisation. ● Improve data relevance. ● Reduce footprint through compression. ● Mechanical sympathy - CPUs love arrays. Forrest Smith - blog
  • 17. Why columnar is the way to go Memory Bandwidth: benchmark ● This example is synthetic (but indicative!) ● Data throughput from memory to CPU has an impact on performance. ● CPU cache is significantly faster than main memory
  • 18. Why columnar is the way to go L1 Cache L2/L3 Cache Main Memory Memory Bandwidth: benchmark ● This example is synthetic (but indicative)! ● Data throughput from memory to CPU has an impact on performance. ● CPU cache is significantly faster than main memory If you want to make the most use of your memory bandwidth: ● process less data. ● process more relevant data. Columnar representations help with both of these
  • 19. 🤿 Dive into the Read Buffer ● Data organisation; ● Data representation; ● Read execution (late materialisation); ● Early numbers! ● Future improvements.
  • 20. IOx Write Path ● WAL: replication and recovery ● Mutable Buffer: query written data ● Object Store: for durability ● Read Buffer: optimised read-only view of written data.
  • 21. IOx Read Path Query Engine SQL Frontend Flux Frontend InfluxQL Frontend Mutable Buffer Read Buffer Object Storage Reader
  • 22. IOx Read Path Query Engine SQL Frontend Flux Frontend … Frontend Mutable Buffer Read Buffer Object Storage Reader
  • 24. Data Model Databases are collections of partitions Partition Key
  • 27. Data Model Tables contain Row Groups Same Schema Filter entire tables
  • 28. Data Model Row Groups contain columnar data Skip Row Group
  • 29. Data Model (thanks @alamb)
      Line protocol input:
        weather,location=us-east temperature=82,humidity=67 1465839830100400200
        weather,location=us-midwest temperature=82,humidity=65 1465839830100400200
        weather,location=us-west temperature=70,humidity=54 1465839830100400200
        weather,location=us-east temperature=83,humidity=69 1465839830200400200
        weather,location=us-midwest temperature=87,humidity=78 1465839830200400200
        weather,location=us-west temperature=72,humidity=56 1465839830200400200
        weather,location=us-east temperature=84,humidity=67 1465839830300400200
        weather,location=us-midwest temperature=90,humidity=82 1465839830400400200
        weather,location=us-west temperature=71,humidity=57 1465839830400400200
      Row Group in Table: weather
        location: "us-east" "us-midwest" "us-west" "us-east" "us-midwest" "us-west" "us-east" "us-midwest" "us-west"
        temperature: 82 82 70 83 87 72 84 90 71
        humidity: 67 65 54 69 78 56 67 82 57
        timestamp: 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.3004002Z 2016-06-13T17:43:50.3004002Z 2016-06-13T17:43:50.3004002Z
  • 30. Supported Data Types Logical Data Types ● String (utf-8 valid strings) ● Float (double-precision float) (all of them 😉) ● Integer (signed integers) ● Unsigned (unsigned integers) ● Boolean ● Binary (arbitrary bytes) Semantic Column Types ● InfluxDB Tag ➟ String ● InfluxDB Field ➟ Most ● InfluxDB Timestamp ➟ I64 ● IOx Column ➟ Anything
  • 31. Interesting Supported Features Tailored for time-series: ● scans, grouped aggregates, windowed aggregates, schema exploration (tables, columns, values). ● Table/row group pruning. ● Predicate pushdown. ● Comparison operators with a constant on tag columns (<, <=, >, >=, =, !=) ● Aggregates on any column(s)
  • 32. Storing Data in the Read Buffer ➡
  • 33. Columnar Compression Spectrum Lots ‘o Compression 💯 Smaller Footprint 👎 High processing cost No Compression 👎 Larger footprint 💯 ~Zero processing cost
  • 34. Columnar Compression Spectrum Lots ‘o Compression Smaller Footprint High processing cost No Compression Larger footprint ~Zero processing cost Vec<T>
  • 35. Choice can depend on data location
  • 37. Read Buffer Compression Schemes Dictionary Encoding ● Good for high cardinality tag columns. ● Column order is not a factor in compression. ● Constant time access. 🚀 ● Key: Operate directly on compressed data. 🚀
  • 38. Read Buffer Compression Schemes Filtering Dictionary Encoding WHERE “region” = ‘east’ x = 0 {0, 2, 7, 15} WHERE “region” > ‘north’ x > 1 {1, 3, 5, 8, 9, 10, 11, 12, 14}
  • 39. Read Buffer Compression Schemes “RLE” - Run-Length Encoding ● Incredible compression when lots of “runs”. ● Works best on heavily sorted columns. ● Not as consumable* ● Pre-computed bitsets 🚀 ● Can operate on compressed data. 🚀
  • 40. Read Buffer Compression Schemes “RLE” - Run-Length Encoding WHERE “region” = ‘east’ x = 0 WHERE “region” > ‘north’ x > 1 {9, 10, 11, 12, 13, 14, 15}
  • 41. Which Dictionary Encoding? WHERE “region” = ‘east’ ● 10M rows in column ● Cardinality 10,000 ● Single thread Billions rows/second processed
  • 42. Which Dictionary Encoding? WHERE “region” = ‘east’ ● 10M rows in column. ● Cardinality 10,000. ● Single thread. ● SIMD intrinsics on Dictionary Encoding. ● RLE is on another level: “cheating”... Billions rows/second processed RLE 59ms 2.2ms 420ns 380MB ~40MB ~40MB
  • 43. Which Dictionary Encoding? WHERE “span_id” = ‘123djk7GHs99wj’ ● 10 million rows in column. ● Cardinality 10 million. ● Single thread. ● SIMD intrinsics on Dictionary Encoding. Billions rows/second processed RLE 60ms 2.2ms 380MB ~420MB 580ns ~1GB
  • 44. Which Dictionary Encoding? “I need rows [2, 33, 55, 111, 3343]” 10,000,000 row column Encoding Cardinality 10K (materialise 1000 rows near end) Cardinality 10M (materialise 1 row near end) Vec<String> Dictionary μ RLE μ
  • 45. Which Dictionary Encoding? ● ● filtering ● materialisation
  • 46. Numerical Column Encodings Supported Logical types: i64, u64, f64 {u8, i8,.., u64, i64}* &[i64]: (48 B) [123, 198, 1, 33, 133, 224] ➠ &[u8]: (6 B) [..] &[i64]: (48 B) [-18, 2, 0, 220, 2, 26] ➠ &[i16]: (12 B) [..]
  • 48. Read Execution SELECT “host”, “counter”, “time” FROM “cpu” WHERE “env” = ‘prod’ AND “path” = ‘/write’ AND “counter” > 200 AND “time” >= x AND “time” < y; ● ● ● ●
  • 49. Late Materialisation - Scanning SELECT “host”, “counter”, “time” FROM “cpu” WHERE “env” = ‘prod’ AND “path” = ‘/write’ AND “counter” > 200 AND “time” >= x AND “time” < y;
  • 50. Late Materialisation - Grouping SELECT SUM(“counter”) FROM “cpu” WHERE “path” = ‘/query’ AND “time” >= x AND “time” < y GROUP BY “region”; ♥
  • 51. Let’s look at some initial numbers
  • 52. ● ● span_id ● ● ● Synthetic High Cardinality Tracing use-case Column Name Cardinality Encoding
  • 53. How much space do we need? ● ● ●
  • 54. How much space do we need? ● ● ●
  • 55. 1 M 1 ms 1.2 ms 10 M 1.1 ms 2.5 ms 60 M 1.3 ms 15.7 ms SELECT * FROM “traces” WHERE “trace_id” = ‘H7whivfl’; ● ● 🤔 ● 💪 ● “Needle in a Haystack”
  • 56. SELECT SUM(duration) FROM “traces” GROUP BY “trace_id”; ● ● ● Aggregating over high-cardinality 1 M 30 s (~10 GB RAM) 45 ms (8 MB) 10 M 18 min (140 GB RAM) 498 ms (150 MB) 60 M D.N.F (OOM) 4.3 s (900MB)
  • 57. SHOW TAG KEYS WHERE “cluster” = ‘cluster-2-2-3’ AND time >= x AND time < y ; Schema Exploration 1 M 15 ms 12 μs 10 M 150 ms 47 μs 60 M 1.6 s 120 μs
  • 58. Future Work Lots more to do in Read Buffer land! ● Data-type support. ● More supported predicates, e.g., regex, LIKE, OR. ● More columnar encodings (e.g., time-series specific field encodings) ● Deletes support! (Proposal written up) ● Complete implementation of all physical operations. ● Performance - predicate caching, buffer pooling etc. ● Concurrent execution.
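
Slides 37 and 38 describe filtering a dictionary-encoded column by comparing integer ids rather than strings. The following is a minimal sketch of that idea, not the Read Buffer's actual implementation. The type `DictColumn` and its methods are hypothetical names, and the sketch assumes the dictionary is kept sorted so that id order matches string order.

```rust
use std::collections::BTreeSet;

/// Hypothetical dictionary-encoded string column (illustration only).
pub struct DictColumn {
    dictionary: Vec<String>, // distinct values, sorted, so id order == string order
    ids: Vec<u32>,           // ids[row] is an index into `dictionary`
}

impl DictColumn {
    /// Build the sorted dictionary, then map every row to its id.
    pub fn from_values(values: &[&str]) -> Self {
        let distinct: BTreeSet<&str> = values.iter().copied().collect();
        let dictionary: Vec<String> = distinct.into_iter().map(String::from).collect();
        let ids = values
            .iter()
            .map(|&v| {
                dictionary
                    .binary_search_by(|d| d.as_str().cmp(v))
                    .expect("every value is in the dictionary") as u32
            })
            .collect();
        DictColumn { dictionary, ids }
    }

    /// Rows where value == needle: one dictionary lookup, then integer compares.
    pub fn rows_eq(&self, needle: &str) -> Vec<usize> {
        match self.dictionary.binary_search_by(|d| d.as_str().cmp(needle)) {
            Ok(id) => self.rows_where(|x| x == id as u32),
            Err(_) => Vec::new(), // value absent from the dictionary: no row can match
        }
    }

    /// Rows where value > needle: find the first id above the needle, then scan ids.
    pub fn rows_gt(&self, needle: &str) -> Vec<usize> {
        let first = self.dictionary.partition_point(|d| d.as_str() <= needle) as u32;
        self.rows_where(|x| x >= first)
    }

    /// Materialise the string for a single row (constant time).
    pub fn value(&self, row: usize) -> &str {
        &self.dictionary[self.ids[row] as usize]
    }

    fn rows_where(&self, pred: impl Fn(u32) -> bool) -> Vec<usize> {
        self.ids
            .iter()
            .enumerate()
            .filter(|&(_, &id)| pred(id))
            .map(|(row, _)| row)
            .collect()
    }
}
```

With the column `["east", "west", "east", "north", "south", "west"]`, the predicate `= 'east'` becomes `id == 0` and `> 'north'` becomes `id >= 2`, so no string comparison happens per row.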
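
Slides 39 and 40 note that run-length encoding works best on sorted columns and can be filtered without decompressing. A rough sketch of why, under the assumption that a column is stored as plain `(value, run length)` pairs: the predicate is evaluated once per run, and each match yields a whole range of row ids. `RleColumn` and its methods are hypothetical names, and the pre-computed bitsets mentioned on slide 39 are not modelled here.

```rust
use std::ops::Range;

/// Hypothetical run-length encoded column: one (value, run length) pair per run.
pub struct RleColumn<T> {
    runs: Vec<(T, u32)>,
}

impl<T: PartialEq + Clone> RleColumn<T> {
    /// Collapse consecutive equal values into runs.
    pub fn encode(values: &[T]) -> Self {
        let mut runs: Vec<(T, u32)> = Vec::new();
        for v in values {
            if let Some((last, len)) = runs.last_mut() {
                if *last == *v {
                    *len += 1; // extend the current run
                    continue;
                }
            }
            runs.push((v.clone(), 1)); // start a new run
        }
        RleColumn { runs }
    }

    pub fn num_runs(&self) -> usize {
        self.runs.len()
    }

    /// Evaluate `pred` once per run and return the matching row ranges.
    pub fn row_ranges_where(&self, pred: impl Fn(&T) -> bool) -> Vec<Range<usize>> {
        let mut out = Vec::new();
        let mut start = 0usize;
        for (value, len) in &self.runs {
            let end = start + *len as usize;
            if pred(value) {
                out.push(start..end);
            }
            start = end;
        }
        out
    }

    /// Expand the runs back into one value per row.
    pub fn decode(&self) -> Vec<T> {
        self.runs
            .iter()
            .flat_map(|(v, len)| std::iter::repeat(v.clone()).take(*len as usize))
            .collect()
    }
}
```

The cost of a filter in this sketch depends on the number of runs rather than the number of rows, which is consistent with the slides' point that sorted, low-cardinality columns benefit most.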
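
Slide 46 shows logical `i64` columns being stored in narrower physical types when the value range allows it (48 B down to 6 B or 12 B in its two examples). Below is a small sketch of that selection step. The enum `PackedI64` is a hypothetical name, and only three of the widths listed on the slide are covered to keep the example short.

```rust
/// Hypothetical physical storage for a logical i64 column (illustration only).
#[derive(Debug, PartialEq)]
pub enum PackedI64 {
    U8(Vec<u8>),
    I16(Vec<i16>),
    I64(Vec<i64>),
}

impl PackedI64 {
    /// Pick the narrowest supported type that can hold every value.
    pub fn pack(values: &[i64]) -> Self {
        let min = values.iter().copied().min().unwrap_or(0);
        let max = values.iter().copied().max().unwrap_or(0);
        if min >= 0 && max <= i64::from(u8::MAX) {
            // The range check above guarantees these casts are lossless.
            PackedI64::U8(values.iter().map(|&v| v as u8).collect())
        } else if min >= i64::from(i16::MIN) && max <= i64::from(i16::MAX) {
            PackedI64::I16(values.iter().map(|&v| v as i16).collect())
        } else {
            PackedI64::I64(values.to_vec())
        }
    }

    /// Bytes used by the value buffer alone.
    pub fn size_bytes(&self) -> usize {
        match self {
            PackedI64::U8(v) => v.len(),
            PackedI64::I16(v) => v.len() * 2,
            PackedI64::I64(v) => v.len() * 8,
        }
    }

    /// Read a row back as the logical i64 value.
    pub fn get(&self, row: usize) -> i64 {
        match self {
            PackedI64::U8(v) => i64::from(v[row]),
            PackedI64::I16(v) => i64::from(v[row]),
            PackedI64::I64(v) => v[row],
        }
    }
}
```

Packing the slide's two example slices reproduces its sizes: `[123, 198, 1, 33, 133, 224]` fits in `u8` (6 bytes) and `[-18, 2, 0, 220, 2, 26]` needs `i16` (12 bytes).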