The Impossible Dream:
Easy-to-Use, Super Fast Software and
Simple Implementation
Nga Tran
Staff Engineer, InfluxData
November 10, 2021
● InfluxData - Staff Engineer
● Vertica RDBMS
Users: We want a Database that is ...
● Easy to Use
● Fast (Great Performance)
Developers: How do we build it?
Can it be simple, or does it have to be complex?
There must be trade-offs
Need a complicated black box to meet
easy-to-use & fast requirements
Outline
   Easy-to-Use & Fast for Users       Complicated Implementation for Developers
1  Flexible Writing Schema (No DDL)   Need Schema Merging at Reading Time
2  Fast Read                          Prune Non-Covered Data Chunks (Predicate Push-Down)
3  Able to Load Duplicated Data       Need Data Deduplication & Compaction Operations
4  Quick Data Deletion                No deletion right away but need data elimination at read time or in the background
1. Flexible Writing Schema (No DDL)
→ Schema Merging at Reading Time
Flexible Writing Schema
Timeseries Data Model
weather,location=us-east temperature=82,humidity=67 1465839830100400200
weather,location=us-midwest temperature=82,humidity=65 1465839830100400200
weather,location=us-west temperature=70,humidity=54 1465839830100400200
weather,location=us-east temperature=83,humidity=69 1465839830200400200
weather,location=us-midwest temperature=87,humidity=78 1465839830200400200
weather,location=us-west temperature=72,humidity=56 1465839830200400200
weather,location=us-east temperature=84,humidity=67 1465839830300400200
weather,location=us-midwest temperature=90,humidity=82 1465839830400400200
weather,location=us-west temperature=71,humidity=57 1465839830400400200
location      temperature  humidity  timestamp
"us-east"     82           67        2016-06-13T17:43:50.1004002Z
"us-midwest"  82           65        2016-06-13T17:43:50.1004002Z
"us-west"     70           54        2016-06-13T17:43:50.1004002Z
"us-east"     83           69        2016-06-13T17:43:50.2004002Z
"us-midwest"  87           78        2016-06-13T17:43:50.2004002Z
"us-west"     72           56        2016-06-13T17:43:50.2004002Z
"us-east"     84           67        2016-06-13T17:43:50.3004002Z
"us-midwest"  90           82        2016-06-13T17:43:50.4004002Z
"us-west"     71           57        2016-06-13T17:43:50.4004002Z
● No need to pre-define Schema (Table and its Columns)
● Loading Data can include different Tables & Columns
Flexible Writing Schema
Loading Data
Load 1:
weather,location=east temp=82,humidity=67 1465839830100400200
weather,location=west temp=70 1465839830100400200
weather,location=east temp=82,humidity=69
host,state=MA,city=Boston cpu=10 1465839830200400200
host,state=MA,city=Andover cpu=12 1465839830400400200
weather,location=midwest temp=70,humidity=57 1465839830400400200
Load 2:
host,state=MA,city=Boston disk=100 1465839830200500200
host,state=NY,city=New\ York disk=200 1465839830400600200
IOx Storage: Table Chunks(*)
Weather’s Chunk 1
location  temp  humidity  timestamp
east      82    67        2016-06-13T17:43:50.1004002Z
west      70    NULL      2016-06-13T17:43:50.1004002Z
east      82    69        2016-06-13T17:43:50.5000000Z
midwest   70    57        2016-06-13T17:43:50.4004002Z
Host’s Chunk 1
state  city     cpu  timestamp
MA     Boston   10   2016-06-13T17:43:50.2004002Z
MA     Andover  12   2016-06-13T17:43:50.4004002Z
Host’s Chunk 2
state  city      disk  timestamp
MA     Boston    100   2016-06-13T17:43:50.5004002Z
NY     New York  200   2016-06-13T17:43:50.6004002Z
(*) Chunk Types: Mutable Buffer, Read Buffer, Object Store (see previous talks)
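The loading step on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not IOx code (IOx is written in Rust): `parse_line`, `load`, and the dictionary-based chunk layout are hypothetical names, the parser handles only unescaped tags and numeric fields, and `default_ts` stands in for whatever timestamp the server assigns to a line that arrives without one.

```python
from collections import defaultdict

def parse_line(line, default_ts):
    """Parse one simplified line-protocol line into (table, row)."""
    parts = line.split(" ")
    head, field_part = parts[0], parts[1]
    timestamp = int(parts[2]) if len(parts) > 2 else default_ts
    table, *tag_pairs = head.split(",")
    row = {}
    for pair in tag_pairs:                  # tag values stay strings
        key, value = pair.split("=")
        row[key] = value
    for pair in field_part.split(","):      # field values are numeric here
        key, value = pair.split("=")
        row[key] = float(value)
    row["timestamp"] = timestamp
    return table, row

def load(lines, default_ts):
    """Build one chunk per table; its schema is the union of the columns seen."""
    rows_by_table = defaultdict(list)
    for line in lines:
        table, row = parse_line(line, default_ts)
        rows_by_table[table].append(row)
    chunks = {}
    for table, rows in rows_by_table.items():
        columns = []
        for row in rows:
            for column in row:
                if column != "timestamp" and column not in columns:
                    columns.append(column)
        chunks[table] = {"columns": columns + ["timestamp"], "rows": rows}
    return chunks

LOAD_1 = [
    "weather,location=east temp=82,humidity=67 1465839830100400200",
    "weather,location=west temp=70 1465839830100400200",
    "weather,location=east temp=82,humidity=69",
    "host,state=MA,city=Boston cpu=10 1465839830200400200",
    "host,state=MA,city=Andover cpu=12 1465839830400400200",
    "weather,location=midwest temp=70,humidity=57 1465839830400400200",
]
chunks = load(LOAD_1, default_ts=1465839830500000000)
print(chunks["weather"]["columns"])
print(chunks["host"]["columns"])
```

No table or column is declared up front: both appear as a side effect of the data that arrives, which is what "No DDL" means for the user.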
Flexible Writing Schema → Schema Merging at Read Time
IOx Storage: Different Host Chunk Schema
Host’s Chunk 1
state  city     cpu  timestamp
MA     Boston   10   2016-06-13T17:43:50.2004002Z
MA     Andover  12   2016-06-13T17:43:50.4004002Z
Host’s Chunk 2
state  city      disk  timestamp
MA     Boston    100   2016-06-13T17:43:50.5004002Z
NY     New York  200   2016-06-13T17:43:50.6004002Z
User issues: Read everything from Host
→ IOx merges Chunk Schema at Scan Step
Host’s Chunk 1
state  city     cpu  disk  timestamp
MA     Boston   10   NULL  2016-06-13T17:43:50.2004002Z
MA     Andover  12   NULL  2016-06-13T17:43:50.4004002Z
Host’s Chunk 2
state  city      cpu   disk  timestamp
MA     Boston    NULL  100   2016-06-13T17:43:50.5004002Z
NY     New York  NULL  200   2016-06-13T17:43:50.6004002Z
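A minimal Python sketch of the merge shown above, assuming each chunk is a dictionary holding its own column list and rows. The names `merge_schemas` and `scan` are hypothetical and do not come from the IOx code base; the sketch only shows the idea of taking the union of the chunk schemas and padding each chunk with NULLs (`None`) while it is scanned.

```python
def merge_schemas(chunks):
    """Union of the chunk columns in first-seen order, timestamp kept last."""
    merged = []
    for chunk in chunks:
        for column in chunk["columns"]:
            if column != "timestamp" and column not in merged:
                merged.append(column)
    return merged + ["timestamp"]

def scan(chunk, schema):
    """Yield the chunk's rows padded to the merged schema (missing -> None)."""
    for row in chunk["rows"]:
        yield {column: row.get(column) for column in schema}

host_chunk_1 = {
    "columns": ["state", "city", "cpu", "timestamp"],
    "rows": [
        {"state": "MA", "city": "Boston", "cpu": 10, "timestamp": "2016-06-13T17:43:50.2004002Z"},
        {"state": "MA", "city": "Andover", "cpu": 12, "timestamp": "2016-06-13T17:43:50.4004002Z"},
    ],
}
host_chunk_2 = {
    "columns": ["state", "city", "disk", "timestamp"],
    "rows": [
        {"state": "MA", "city": "Boston", "disk": 100, "timestamp": "2016-06-13T17:43:50.5004002Z"},
        {"state": "NY", "city": "New York", "disk": 200, "timestamp": "2016-06-13T17:43:50.6004002Z"},
    ],
}

merged_schema = merge_schemas([host_chunk_1, host_chunk_2])
merged_rows = [row for chunk in (host_chunk_1, host_chunk_2)
               for row in scan(chunk, merged_schema)]
print(merged_schema)
```

The stored chunks are never rewritten; the padding happens only in the rows handed to the rest of the query plan.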
2. Fast Read
→ Prune Non-Covered Data Chunks
(Predicate Push-Down)
Fast Read → Prune Non-Covered Data Chunks
IOx Storage: Different Host Chunk Schema
Host’s Chunk 1
state  city     cpu  timestamp
MA     Boston   10   2016-06-13T17:43:50.2004002Z
MA     Andover  12   2016-06-13T17:43:50.4004002Z
Host’s Chunk 2
state  city      disk  timestamp
MA     Boston    100   2016-06-13T17:43:50.5004002Z
NY     New York  200   2016-06-13T17:43:50.6004002Z
User issues: Read everything from Host with “disk > 100”
→ IOx prunes Chunk 1: it has no “disk” column, so none of its rows can satisfy “disk > 100”
Host’s Chunk 2
state  city      cpu   disk  timestamp
MA     Boston    NULL  100   2016-06-13T17:43:50.5004002Z
NY     New York  NULL  200   2016-06-13T17:43:50.6004002Z
→ IOx then applies the predicate “disk > 100” to the remaining rows and returns 1 row
state  city      cpu   disk  timestamp
NY     New York  NULL  200   2016-06-13T17:43:50.6004002Z
Fast Read → Prune Non-Covered Data Chunks
Chunk Scan without Pruning
FilterExec (disk > 100)
  UnionExec
    IOxReadFilterNode (chunk_id = 1)
    IOxReadFilterNode (chunk_id = 2)
Chunk Scan with Pruning
FilterExec (disk > 100)
  IOxReadFilterNode (chunk_id = 2)
Previous IOx Talk: Query Processing in InfluxDB IOx
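The difference between the two plans can be sketched as follows. This is an illustrative Python model under simple assumptions, not the IOx implementation: `prune` and `read` are hypothetical helpers, and the only pruning rule modelled is the one from the slides, namely that a chunk without the predicate's column cannot contain a matching row.

```python
def prune(chunks, column):
    """Keep only chunks that store `column`; the others cannot match."""
    return [chunk for chunk in chunks if column in chunk["columns"]]

def read(chunks, column, threshold):
    """Scan the surviving chunks and keep rows where column > threshold."""
    result = []
    for chunk in prune(chunks, column):
        for row in chunk["rows"]:
            value = row.get(column)
            if value is not None and value > threshold:
                result.append(row)
    return result

host_chunks = [
    {"id": 1, "columns": ["state", "city", "cpu", "timestamp"],
     "rows": [{"state": "MA", "city": "Boston", "cpu": 10},
              {"state": "MA", "city": "Andover", "cpu": 12}]},
    {"id": 2, "columns": ["state", "city", "disk", "timestamp"],
     "rows": [{"state": "MA", "city": "Boston", "disk": 100},
              {"state": "NY", "city": "New York", "disk": 200}]},
]

scanned_ids = [chunk["id"] for chunk in prune(host_chunks, "disk")]
matching = read(host_chunks, "disk", 100)
print(scanned_ids, matching)
```

Only chunk 2 is scanned, and the row-level filter still runs afterwards, because pruning only rules chunks out and never proves that every remaining row matches.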
3. Able to Load Duplicated Data
→ Deduplicate & Compact Operators
Able to Load Duplicated Data
Loading Data: Same tag values are duplicates
Load 1:
weather,location=east temp=82,humidity=67 1465839830100400200
weather,location=west temp=70 1465839830100400200
weather,location=east temp=82,humidity=69
host,state=MA,city=Boston cpu=10 1465839830200400200
host,state=MA,city=Andover cpu=12 1465839830400400200
weather,location=midwest temp=70,humidity=57 1465839830400400200
Load 2:
host,state=MA,city=Boston disk=100 1465839830200500200
host,state=NY,city=New\ York disk=200 1465839830400600200
IOx Storage: Table Chunks
Weather’s Chunk 1
location  temp  humidity  timestamp
east      82    67        2016-06-13T17:43:50.1004002Z
west      70    NULL      2016-06-13T17:43:50.1004002Z
east      82    69        2016-06-13T17:43:50.5000000Z
midwest   70    57        2016-06-13T17:43:50.4004002Z
Host’s Chunk 1
state  city     cpu  timestamp
MA     Boston   10   2016-06-13T17:43:50.2004002Z
MA     Andover  12   2016-06-13T17:43:50.4004002Z
Host’s Chunk 2
state  city      disk  timestamp
MA     Boston    100   2016-06-13T17:43:50.5004002Z
NY     New York  200   2016-06-13T17:43:50.6004002Z
Able to Load Duplicated Data → Deduplicate at Read Time
IOx Storage: Table Chunks
Weather’s Chunk 1
location  temp  humidity  timestamp
east      82    67        2016-06-13T17:43:50.1004002Z
west      70    NULL      2016-06-13T17:43:50.1004002Z
east      82    69        2016-06-13T17:43:50.5000000Z
midwest   70    57        2016-06-13T17:43:50.4004002Z
Host’s Chunk 1
state  city     cpu  timestamp
MA     Boston   10   2016-06-13T17:43:50.2004002Z
MA     Andover  12   2016-06-13T17:43:50.4004002Z
Host’s Chunk 2
state  city      disk  timestamp
MA     Boston    100   2016-06-13T17:43:50.5004002Z
NY     New York  200   2016-06-13T17:43:50.6004002Z
User issues: Read Weather Data
→ 3 rows returned
location  temp  humidity  timestamp
east      82    69        2016-06-13T17:43:50.5004002Z
west      70    NULL      2016-06-13T17:43:50.1004002Z
midwest   70    57        2016-06-13T17:43:50.4004002Z
User issues: Read Host Data
→ 3 rows returned
state  city      cpu   disk  timestamp
MA     Boston    10    100   2016-06-13T17:43:50.5004002Z
MA     Andover   12    NULL  2016-06-13T17:43:50.4004002Z
NY     New York  NULL  200   2016-06-13T17:43:50.6004002Z
Able to Load Duplicated Data → Deduplicate at Read Time
Chunk Scan without Deduplication
UnionExec
  IOxReadFilterNode (chunk_id = 1)
  IOxReadFilterNode (chunk_id = 2)
Chunk Scan with Deduplication (*)
DeduplicateExec
  SortPreservingMerge
    UnionExec
      SortExec (optional), Sort_key: tags
        IOxReadFilterNode (chunk_id = 1)
      SortExec (optional), Sort_key: tags
        IOxReadFilterNode (chunk_id = 2)
(*) Previous IOx Talk: Query Processing in InfluxDB IOx
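The sort, merge, and deduplicate steps of this plan can be approximated in Python. The sketch below follows the simplification used in the slides, where rows with the same tag values count as duplicates, and assumes that a later non-NULL value overwrites an earlier one. `deduplicate` is a hypothetical helper, and the input rows are assumed to be already padded to the merged schema.

```python
from itertools import groupby

def deduplicate(rows, tags):
    """Collapse rows that share the same tag values into one row each."""
    def key(row):
        return tuple(row[tag] for tag in tags)

    ordered = sorted(rows, key=key)          # stable: load order kept per key
    result = []
    for _, group in groupby(ordered, key=key):
        merged = None
        for row in group:
            if merged is None:
                merged = dict(row)
            else:                            # later non-NULL values win
                merged.update({c: v for c, v in row.items() if v is not None})
        result.append(merged)
    return result

host_rows = [
    {"state": "MA", "city": "Boston", "cpu": 10, "disk": None, "timestamp": "2016-06-13T17:43:50.2004002Z"},
    {"state": "MA", "city": "Andover", "cpu": 12, "disk": None, "timestamp": "2016-06-13T17:43:50.4004002Z"},
    {"state": "MA", "city": "Boston", "cpu": None, "disk": 100, "timestamp": "2016-06-13T17:43:50.5004002Z"},
    {"state": "NY", "city": "New York", "cpu": None, "disk": 200, "timestamp": "2016-06-13T17:43:50.6004002Z"},
]

deduplicated = deduplicate(host_rows, tags=("state", "city"))
for row in deduplicated:
    print(row)
```

Sorting first is what lets duplicates be found by comparing neighbouring rows only, which mirrors why the plan places the sort and the merge below DeduplicateExec.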
Able to Load Duplicated Data → Compact from time to time
IOx Storage: before compaction
Weather’s Chunk 1
location  temp  humidity  timestamp
east      82    67        2016-06-13T17:43:50.1004002Z
west      70    NULL      2016-06-13T17:43:50.1004002Z
east      82    69        2016-06-13T17:43:50.5000000Z
midwest   70    57        2016-06-13T17:43:50.4004002Z
Host’s Chunk 1
state  city     cpu  timestamp
MA     Boston   10   2016-06-13T17:43:50.2004002Z
MA     Andover  12   2016-06-13T17:43:50.4004002Z
Host’s Chunk 2
state  city      disk  timestamp
MA     Boston    100   2016-06-13T17:43:50.5004002Z
NY     New York  200   2016-06-13T17:43:50.6004002Z
IOx Storage: after compaction
Weather’s Chunk
location  temp  humidity  timestamp
east      82    69        2016-06-13T17:43:50.5004002Z
west      70    NULL      2016-06-13T17:43:50.1004002Z
midwest   70    57        2016-06-13T17:43:50.4004002Z
Host’s Chunk
state  city      cpu   disk  timestamp
MA     Boston    10    100   2016-06-13T17:43:50.5004002Z
MA     Andover   12    NULL  2016-06-13T17:43:50.4004002Z
NY     New York  NULL  200   2016-06-13T17:43:50.6004002Z
Able to Load Duplicated Data → Compact from time to time
● Compaction Operation ≃
Deduplication Operation (= Chunk Scan) +
Create a new chunk to store deduplicated data +
Drop old chunks
● Compaction runs in the background based on compaction policy
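The three parts of the compaction operation listed above can be sketched together in Python. This is a toy model and not IOx code: `store` is a hypothetical in-memory mapping from table name to a list of chunks, `compact` is a hypothetical helper, and rows with the same tag values are treated as duplicates, as in the slides.

```python
def compact(store, table, tags):
    """Replace all chunks of `table` with one deduplicated chunk."""
    merged = {}
    for chunk in store[table]:               # chunk scan, oldest chunk first
        for row in chunk:
            key = tuple(row[tag] for tag in tags)
            known = {c: v for c, v in row.items() if v is not None}
            merged.setdefault(key, {}).update(known)
    # Store the new chunk; the old chunks are dropped by the reassignment.
    store[table] = [list(merged.values())]

store = {
    "host": [
        [   # Host's Chunk 1
            {"state": "MA", "city": "Boston", "cpu": 10, "timestamp": "2016-06-13T17:43:50.2004002Z"},
            {"state": "MA", "city": "Andover", "cpu": 12, "timestamp": "2016-06-13T17:43:50.4004002Z"},
        ],
        [   # Host's Chunk 2
            {"state": "MA", "city": "Boston", "disk": 100, "timestamp": "2016-06-13T17:43:50.5004002Z"},
            {"state": "NY", "city": "New York", "disk": 200, "timestamp": "2016-06-13T17:43:50.6004002Z"},
        ],
    ]
}

compact(store, "host", tags=("state", "city"))
print(len(store["host"]), store["host"][0])
```

After compaction a read of the table scans a single chunk that holds no duplicates, so the sort and deduplicate work at read time shrinks accordingly.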
4. Quick Data Deletion
→ No Online Deletion
→ Eliminate data at Read Time &
Actual Deletion during Compaction
Quick Data Deletion
● User issues a Delete
○ Nothing is deleted right away (a classic technique in analytic/big data systems)
○ The Delete Predicate is stored as a Tombstone
● At Read time (Chunk Scan)
○ The Tombstone is applied at the Scan step so that deleted data is not returned
● During compaction
○ The newly created chunk is built from the result of the chunk scan, so it does not include the deleted data
Quick Data Deletion → Add Tombstone at Delete Time
IOx Storage: before delete
Weather’s Chunk 1
location  temp  humidity  timestamp
east      82    67        2016-06-13T17:43:50.1004002Z
west      70    NULL      2016-06-13T17:43:50.1004002Z
east      82    69        2016-06-13T17:43:50.5000000Z
midwest   70    57        2016-06-13T17:43:50.4004002Z
Host’s Chunk 1
state  city     cpu  timestamp
MA     Boston   10   2016-06-13T17:43:50.2004002Z
MA     Andover  12   2016-06-13T17:43:50.4004002Z
Host’s Chunk 2
state  city      disk  timestamp
MA     Boston    100   2016-06-13T17:43:50.5004002Z
NY     New York  200   2016-06-13T17:43:50.6004002Z
IOx Storage: after delete from host “city = Boston”
Weather’s Chunk 1
location  temp  humidity  timestamp
east      82    67        2016-06-13T17:43:50.1004002Z
west      70    NULL      2016-06-13T17:43:50.1004002Z
east      82    69        2016-06-13T17:43:50.5000000Z
midwest   70    57        2016-06-13T17:43:50.4004002Z
Host’s Chunk 1
state  city     cpu  timestamp
MA     Boston   10   2016-06-13T17:43:50.2004002Z
MA     Andover  12   2016-06-13T17:43:50.4004002Z
Tombstone: “city = Boston”
Host’s Chunk 2
state  city      disk  timestamp
MA     Boston    100   2016-06-13T17:43:50.5004002Z
NY     New York  200   2016-06-13T17:43:50.6004002Z
Tombstone: “city = Boston”
Quick Data Deletion → Eliminate Data at Read Time
IOx Storage: Host Chunks with Tombstones
Host’s Chunk 1
state  city     cpu  timestamp
MA     Boston   10   2016-06-13T17:43:50.2004002Z
MA     Andover  12   2016-06-13T17:43:50.4004002Z
Tombstone: “city = Boston”
Host’s Chunk 2
state  city      disk  timestamp
MA     Boston    100   2016-06-13T17:43:50.5004002Z
NY     New York  200   2016-06-13T17:43:50.6004002Z
Tombstone: “city = Boston”
User issues: Read everything from Host
→ IOx applies tombstones to eliminate data
→ 2 rows returned
state  city      cpu   disk  timestamp
MA     Andover   12    NULL  2016-06-13T17:43:50.4004002Z
NY     New York  NULL  200   2016-06-13T17:43:50.6004002Z
Quick Data Deletion → Eliminate Data at Read Time
Chunk Scan without Delete
DeduplicateExec
  SortPreservingMerge
    UnionExec
      SortExec (optional), Sort_key: tags
        IOxReadFilterNode (chunk_id = 1)
      SortExec (optional), Sort_key: tags
        IOxReadFilterNode (chunk_id = 2)
Chunk Scan with Delete (*) (city = Boston)
DeduplicateExec
  SortPreservingMerge
    UnionExec
      SortExec (optional), Sort_key: tags
        FilterExec (city = Boston)
          IOxReadFilterNode (chunk_id = 1)
      SortExec (optional), Sort_key: tags
        FilterExec (city = Boston)
          IOxReadFilterNode (chunk_id = 2)
(*) Previous IOx Talk: Query Processing in InfluxDB IOx
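A small Python sketch of the tombstone mechanism, under stated assumptions: `delete` and `scan` are hypothetical helpers, a tombstone is modelled as a single `(column, value)` equality predicate, and each chunk keeps its own tombstone list as in the slides. It is meant only to show that a delete records a predicate and that the scan is what hides the rows.

```python
def delete(chunks, column, value):
    """Record a tombstone on every existing chunk; no row is touched."""
    for chunk in chunks:
        chunk["tombstones"].append((column, value))

def scan(chunk):
    """Yield the rows of a chunk that no tombstone matches."""
    for row in chunk["rows"]:
        if not any(row.get(col) == val for col, val in chunk["tombstones"]):
            yield row

host_chunks_with_deletes = [
    {"tombstones": [],
     "rows": [{"state": "MA", "city": "Boston", "cpu": 10},
              {"state": "MA", "city": "Andover", "cpu": 12}]},
    {"tombstones": [],
     "rows": [{"state": "MA", "city": "Boston", "disk": 100},
              {"state": "NY", "city": "New York", "disk": 200}]},
]

delete(host_chunks_with_deletes, "city", "Boston")
visible = [row for chunk in host_chunks_with_deletes for row in scan(chunk)]
stored = sum(len(chunk["rows"]) for chunk in host_chunks_with_deletes)
print(visible, stored)
```

The delete itself costs one list append per chunk, which is why it returns quickly; the stored rows only disappear once compaction writes the scan result into a new chunk.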
Quick Data Deletion → Compact from time to time
IOx Storage: before compaction
Weather’s Chunk 1
location  temp  humidity  timestamp
east      82    67        2016-06-13T17:43:50.1004002Z
west      70    NULL      2016-06-13T17:43:50.1004002Z
east      82    69        2016-06-13T17:43:50.5000000Z
midwest   70    57        2016-06-13T17:43:50.4004002Z
Host’s Chunk 1
state  city     cpu  timestamp
MA     Boston   10   2016-06-13T17:43:50.2004002Z
MA     Andover  12   2016-06-13T17:43:50.4004002Z
Tombstone: “city = Boston”
Host’s Chunk 2
state  city      disk  timestamp
MA     Boston    100   2016-06-13T17:43:50.5004002Z
NY     New York  200   2016-06-13T17:43:50.6004002Z
Tombstone: “city = Boston”
IOx Storage: after compaction (deduplication + delete)
Weather’s Chunk
location  temp  humidity  timestamp
east      82    69        2016-06-13T17:43:50.5004002Z
west      70    NULL      2016-06-13T17:43:50.1004002Z
midwest   70    57        2016-06-13T17:43:50.4004002Z
Host’s Chunk
state  city      cpu   disk  timestamp
MA     Andover   12    NULL  2016-06-13T17:43:50.4004002Z
NY     New York  NULL  200   2016-06-13T17:43:50.6004002Z
Summary:
   Easy-to-Use & Fast for Users       Complicated Implementation for Developers
1  Flexible Writing Schema (No DDL)   Need Schema Merging at Reading Time
2  Fast Read                          Prune Non-Covered Data Chunks (Predicate Push-Down)
3  Able to Load Duplicated Data       Need Data Deduplication & Compaction Operations
4  Quick Data Deletion                No deletion right away but need data elimination at read time or in the background
Simplicity is the Ultimate Sophistication
But InfluxData is committed to bringing Simplicity to Users
Thank You

More Related Content

What's hot

Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsAlluxio, Inc.
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceDatabricks
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in RustAndrew Lamb
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Cloudera, Inc.
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache CassandraPatrick McFadin
 
Improving Apache Spark's Reliability with DataSourceV2
Improving Apache Spark's Reliability with DataSourceV2Improving Apache Spark's Reliability with DataSourceV2
Improving Apache Spark's Reliability with DataSourceV2Databricks
 
InfluxDB Roadmap: What’s New and What’s Coming
InfluxDB Roadmap: What’s New and What’s ComingInfluxDB Roadmap: What’s New and What’s Coming
InfluxDB Roadmap: What’s New and What’s ComingInfluxData
 
Inside the InfluxDB storage engine
Inside the InfluxDB storage engineInside the InfluxDB storage engine
Inside the InfluxDB storage engineInfluxData
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Databricks
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Databricks
 
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Spark Summit
 
ceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortNAVER D2
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkDatabricks
 
Optimizing Performance in Rust for Low-Latency Database Drivers
Optimizing Performance in Rust for Low-Latency Database DriversOptimizing Performance in Rust for Low-Latency Database Drivers
Optimizing Performance in Rust for Low-Latency Database DriversScyllaDB
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Julien Le Dem
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...HostedbyConfluent
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark Summit
 

What's hot (20)

Apache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic DatasetsApache Iceberg - A Table Format for Hige Analytic Datasets
Apache Iceberg - A Table Format for Hige Analytic Datasets
 
Cosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle ServiceCosco: An Efficient Facebook-Scale Shuffle Service
Cosco: An Efficient Facebook-Scale Shuffle Service
 
Introduction to DataFusion An Embeddable Query Engine Written in Rust
Introduction to DataFusion  An Embeddable Query Engine Written in RustIntroduction to DataFusion  An Embeddable Query Engine Written in Rust
Introduction to DataFusion An Embeddable Query Engine Written in Rust
 
Log Structured Merge Tree
Log Structured Merge TreeLog Structured Merge Tree
Log Structured Merge Tree
 
Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0Efficient Data Storage for Analytics with Apache Parquet 2.0
Efficient Data Storage for Analytics with Apache Parquet 2.0
 
Storing time series data with Apache Cassandra
Storing time series data with Apache CassandraStoring time series data with Apache Cassandra
Storing time series data with Apache Cassandra
 
Improving Apache Spark's Reliability with DataSourceV2
Improving Apache Spark's Reliability with DataSourceV2Improving Apache Spark's Reliability with DataSourceV2
Improving Apache Spark's Reliability with DataSourceV2
 
InfluxDB Roadmap: What’s New and What’s Coming
InfluxDB Roadmap: What’s New and What’s ComingInfluxDB Roadmap: What’s New and What’s Coming
InfluxDB Roadmap: What’s New and What’s Coming
 
Inside the InfluxDB storage engine
Inside the InfluxDB storage engineInside the InfluxDB storage engine
Inside the InfluxDB storage engine
 
MyRocks Deep Dive
MyRocks Deep DiveMyRocks Deep Dive
MyRocks Deep Dive
 
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
Improving SparkSQL Performance by 30%: How We Optimize Parquet Pushdown and P...
 
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
Deep Dive into Spark SQL with Advanced Performance Tuning with Xiao Li & Wenc...
 
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
Deep Dive into Project Tungsten: Bringing Spark Closer to Bare Metal-(Josh Ro...
 
ceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-shortceph optimization on ssd ilsoo byun-short
ceph optimization on ssd ilsoo byun-short
 
Optimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache SparkOptimizing Delta/Parquet Data Lakes for Apache Spark
Optimizing Delta/Parquet Data Lakes for Apache Spark
 
Optimizing Performance in Rust for Low-Latency Database Drivers
Optimizing Performance in Rust for Low-Latency Database DriversOptimizing Performance in Rust for Low-Latency Database Drivers
Optimizing Performance in Rust for Low-Latency Database Drivers
 
Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013Parquet Strata/Hadoop World, New York 2013
Parquet Strata/Hadoop World, New York 2013
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
 
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
Spark + Parquet In Depth: Spark Summit East Talk by Emily Curtin and Robbie S...
 

Easy-to-use, super-fast software

  • 1. The Impossible Dream: Easy-to-Use, Super Fast Software and Simple Implementation Nga Tran Staff Engineer, InfluxData November 10, 2021
  • 2. ● InfluxData - Staff Engineer ● Vertica RDBMS
  • 3. Users: We want a database that is easy to use and fast (great performance)
  • 4. Developers: How do we build it? Can it be simple, or does it have to be complex?
  • 5. There must be trade-offs: we need a complicated black box to meet the easy-to-use and fast requirements
  • 6. Outline
       Easy-to-Use & Fast for Users → Complicated Implementation for Developers
       1. Flexible Writing Schema (No DDL) → Need Schema Merging at Reading Time
       2. Fast Read → Prune Non-Covered Data Chunks (Predicate Push-Down)
       3. Able to Load Duplicated Data → Need Data Deduplication & Compaction Operations
       4. Quick Data Deletion → No deletion right away, but data elimination is needed at read time or in the background
  • 7. 1. Flexible Writing Schema (No DDL) → Schema Merging at Reading Time
  • 9. Timeseries Data Model
       Input (line protocol):
         weather,location=us-east temperature=82,humidity=67 1465839830100400200
         weather,location=us-midwest temperature=82,humidity=65 1465839830100400200
         weather,location=us-west temperature=70,humidity=54 1465839830100400200
         weather,location=us-east temperature=83,humidity=69 1465839830200400200
         weather,location=us-midwest temperature=87,humidity=78 1465839830200400200
         weather,location=us-west temperature=72,humidity=56 1465839830200400200
         weather,location=us-east temperature=84,humidity=67 1465839830300400200
         weather,location=us-midwest temperature=90,humidity=82 1465839830400400200
         weather,location=us-west temperature=71,humidity=57 1465839830400400200
       As a table:
         location      temperature  humidity  timestamp
         "us-east"     82           67        2016-06-13T17:43:50.1004002Z
         "us-midwest"  82           65        2016-06-13T17:43:50.1004002Z
         "us-west"     70           54        2016-06-13T17:43:50.1004002Z
         "us-east"     83           69        2016-06-13T17:43:50.2004002Z
         "us-midwest"  87           78        2016-06-13T17:43:50.2004002Z
         "us-west"     72           56        2016-06-13T17:43:50.2004002Z
         "us-east"     84           67        2016-06-13T17:43:50.3004002Z
         "us-midwest"  90           82        2016-06-13T17:43:50.3004002Z
         "us-west"     71           57        2016-06-13T17:43:50.3004002Z
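The mapping from line protocol to columns on slide 9 can be sketched in a few lines of Python. This is a simplified illustration, not the IOx parser: the function names (`parse_line`, `format_ns`, `to_columns`) are hypothetical, escaping and quoted string fields are not handled, all lines are assumed to belong to one table with the same columns, and unsuffixed field values are read as floats.

```python
from datetime import datetime, timezone


def format_ns(ns):
    """Render a nanosecond Unix timestamp as an RFC 3339 string."""
    secs, nanos = divmod(ns, 1_000_000_000)
    base = datetime.fromtimestamp(secs, tz=timezone.utc).strftime("%Y-%m-%dT%H:%M:%S")
    frac = f"{nanos:09d}".rstrip("0")  # drop trailing zeros, as on the slide
    return f"{base}.{frac}Z" if frac else f"{base}Z"


def parse_line(line):
    """Split 'table,tag=v,... field=v,... timestamp' into (table, row dict).

    Assumption: no escaped spaces or commas and no string fields.
    """
    head, fields, ts = line.split(" ")
    table, *tag_pairs = head.split(",")
    row = {}
    for pair in tag_pairs:
        key, value = pair.split("=")
        row[key] = value  # tags are always strings
    for pair in fields.split(","):
        key, value = pair.split("=")
        row[key] = float(value)  # unsuffixed numbers are floats
    row["timestamp"] = format_ns(int(ts))
    return table, row


def to_columns(lines):
    """Pivot parsed rows into one list per column."""
    columns = {}
    for line in lines:
        _, row = parse_line(line)
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


lines = [
    "weather,location=us-east temperature=82,humidity=67 1465839830100400200",
    "weather,location=us-midwest temperature=82,humidity=65 1465839830100400200",
    "weather,location=us-west temperature=70,humidity=54 1465839830100400200",
]
columns = to_columns(lines)
print(columns["timestamp"][0])  # 2016-06-13T17:43:50.1004002Z
```

Tags (`location`) and fields (`temperature`, `humidity`) both end up as ordinary columns, alongside the timestamp.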
  • 10. Flexible Writing Schema
       ● No need to pre-define a schema (a table and its columns)
       ● Loaded data can include different tables and columns
  • 15. Flexible Writing Schema
    Loading Data
    Load 1:
      weather,location=east temp=82,humidity=67 1465839830100400200
      weather,location=west temp=70 1465839830100400200
      weather,location=east temp=82,humidity=69
      host,state=MA,city=Boston cpu=10 1465839830200400200
      host,state=MA,city=Andover cpu=12 1465839830400400200
      weather,location=midwest temp=70,humidity=57 1465839830400400200
    Load 2:
      host,state=MA,city=Boston disk=100 1465839830200500200
      host,state=NY,city=New York disk=200 1465839830400600200
    IOx Storage: Table Chunks (*)
    Weather’s Chunk 1
      location  temp  humidity  timestamp
      east      82    67        2016-06-13T17:43:50.1004002Z
      west      70    -         2016-06-13T17:43:50.1004002Z
      east      82    69        2016-06-13T17:43:50.5000000Z
      midwest   70    57        2016-06-13T17:43:50.4004002Z
    Host’s Chunk 1
      state  city     cpu  timestamp
      MA     Boston   10   2016-06-13T17:43:50.2004002Z
      MA     Andover  12   2016-06-13T17:43:50.4004002Z
    Host’s Chunk 2
      state  city      disk  timestamp
      MA     Boston    100   2016-06-13T17:43:50.5004002Z
      NY     New York  200   2016-06-13T17:43:50.6004002Z
    (*) Chunk Types: Mutable Buffer, Read Buffer, Object Store (see previous talks)
  • 16. Flexible Writing Schema Host’s Chunk 1 state city cpu timestamp MA MA Boston Andover 10 12 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.4004002Z Host’s Chunk 2 state city disk timestamp MA NY Boston New York 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.6004002Z IOx Storage: Different Host Chunk Schema
  • 17. Flexible Writing Schema → Schema Merging at Read Time Host’s Chunk 1 state city cpu timestamp MA MA Boston Andover 10 12 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.4004002Z Host’s Chunk 2 state city disk timestamp MA NY Boston New York 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.6004002Z IOx Storage: Different Host Chunk Schema User issues: Read everything from Host
  • 18. Flexible Writing Schema → Schema Merging at Read Time
    IOx Storage: Different Host Chunk Schema
    Host’s Chunk 1
      state  city     cpu  timestamp
      MA     Boston   10   2016-06-13T17:43:50.2004002Z
      MA     Andover  12   2016-06-13T17:43:50.4004002Z
    Host’s Chunk 2
      state  city      disk  timestamp
      MA     Boston    100   2016-06-13T17:43:50.5004002Z
      NY     New York  200   2016-06-13T17:43:50.6004002Z
    User issues: Read everything from Host
    → IOx merges Chunk Schema at Scan Step
    Host’s Chunk 1
      state  city     cpu  disk  timestamp
      MA     Boston   10   -     2016-06-13T17:43:50.2004002Z
      MA     Andover  12   -     2016-06-13T17:43:50.4004002Z
    Host’s Chunk 2
      state  city      cpu  disk  timestamp
      MA     Boston    -    100   2016-06-13T17:43:50.5004002Z
      NY     New York  -    200   2016-06-13T17:43:50.6004002Z
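The schema merge on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not IOx code: the helper names `merge_schemas` and `scan_chunk` are hypothetical, chunks are modeled as plain lists of dictionaries, and placing `timestamp` last is an assumption taken from the column order shown on the slides.

```python
from typing import Any

Row = dict[str, Any]


def merge_schemas(schemas: list[list[str]]) -> list[str]:
    """Union the column names of all chunks in first-seen order, timestamp last."""
    merged: list[str] = []
    for schema in schemas:
        for column in schema:
            if column != "timestamp" and column not in merged:
                merged.append(column)
    merged.append("timestamp")
    return merged


def scan_chunk(rows: list[Row], schema: list[str]) -> list[Row]:
    """Project each row onto the merged schema; missing columns become None."""
    return [{column: row.get(column) for column in schema} for row in rows]


# Host chunks as shown on the slide.
host_chunk_1: list[Row] = [
    {"state": "MA", "city": "Boston", "cpu": 10, "timestamp": "2016-06-13T17:43:50.2004002Z"},
    {"state": "MA", "city": "Andover", "cpu": 12, "timestamp": "2016-06-13T17:43:50.4004002Z"},
]
host_chunk_2: list[Row] = [
    {"state": "MA", "city": "Boston", "disk": 100, "timestamp": "2016-06-13T17:43:50.5004002Z"},
    {"state": "NY", "city": "New York", "disk": 200, "timestamp": "2016-06-13T17:43:50.6004002Z"},
]

merged_schema = merge_schemas([list(host_chunk_1[0]), list(host_chunk_2[0])])
scanned = scan_chunk(host_chunk_1, merged_schema) + scan_chunk(host_chunk_2, merged_schema)
```

After the scan, every row carries both `cpu` and `disk`, with `None` standing in for the column its chunk never stored.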
  • 19. 2. Fast Read → Prune Non-Covered Data Chunks (Predicate Push-Down)
  • 20. Fast Read Host’s Chunk 1 state city cpu timestamp MA MA Boston Andover 10 12 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.4004002Z Host’s Chunk 2 state city disk timestamp MA NY Boston New York 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.6004002Z IOx Storage: Different Host Chunk Schema User issues: Read everything from Host with “disk > 100”
  • 21. Fast Read → Prune Non-Covered Data Chunks
    IOx Storage: Different Host Chunk Schema
    Host’s Chunk 1
      state  city     cpu  timestamp
      MA     Boston   10   2016-06-13T17:43:50.2004002Z
      MA     Andover  12   2016-06-13T17:43:50.4004002Z
    Host’s Chunk 2
      state  city      disk  timestamp
      MA     Boston    100   2016-06-13T17:43:50.5004002Z
      NY     New York  200   2016-06-13T17:43:50.6004002Z
    User issues: Read everything from Host with “disk > 100”
    → IOx applies the predicate “disk > 100” to prune chunks that cannot contain matching “disk” data (here, Chunk 1)
    Host’s Chunk 2
      state  city      cpu  disk  timestamp
      MA     Boston    -    100   2016-06-13T17:43:50.5004002Z
      NY     New York  -    200   2016-06-13T17:43:50.6004002Z
  • 22. Fast Read → Prune Non-Covered Data Chunks
    IOx Storage: Different Host Chunk Schema
    Host’s Chunk 1
      state  city     cpu  timestamp
      MA     Boston   10   2016-06-13T17:43:50.2004002Z
      MA     Andover  12   2016-06-13T17:43:50.4004002Z
    Host’s Chunk 2
      state  city      disk  timestamp
      MA     Boston    100   2016-06-13T17:43:50.5004002Z
      NY     New York  200   2016-06-13T17:43:50.6004002Z
    User issues: Read everything from Host with “disk > 100”
    → IOx applies the predicate “disk > 100” to prune chunks that cannot contain matching “disk” data (here, Chunk 1)
    → IOx then applies the same predicate to the rows of the remaining chunk and returns 1 row
    Host’s Chunk 2
      state  city      cpu  disk  timestamp
      NY     New York  -    200   2016-06-13T17:43:50.6004002Z
  • 23. Fast Read → Prune Non-Covered Data Chunks
    [Query plan diagrams]
    Chunk Scan without Pruning, nodes shown: IOxReadFilterNode (chunk_id = 1), IOxReadFilterNode (chunk_id = 2), UnionExec, FilterExec (Disk > 100)
    Chunk Scan with Pruning, nodes shown: IOxReadFilterNode (chunk_id = 2), FilterExec (Disk > 100)
    Previous IOx Talk: Query Processing in InfluxDB IOx
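A minimal Python sketch of the two-step read above: first prune whole chunks, then filter the surviving rows. It is not the IOx implementation. The `Chunk` class and the functions `prune_chunks` and `read_greater_than` are hypothetical, and the per-column maximum is computed from the rows here for brevity, whereas a real engine would more plausibly keep such statistics alongside each chunk (an assumption, not something the slides state).

```python
from dataclasses import dataclass
from typing import Any, Optional

Row = dict[str, Any]


@dataclass
class Chunk:
    chunk_id: int
    rows: list[Row]

    def column_max(self, column: str) -> Optional[Any]:
        """Largest non-null value of a column, or None if the chunk has no such data."""
        values = [row[column] for row in self.rows if row.get(column) is not None]
        return max(values) if values else None


def prune_chunks(chunks: list[Chunk], column: str, threshold: Any) -> list[Chunk]:
    """Keep only chunks that could hold a row with column > threshold."""
    kept = []
    for chunk in chunks:
        col_max = chunk.column_max(column)
        if col_max is not None and col_max > threshold:
            kept.append(chunk)
    return kept


def read_greater_than(chunks: list[Chunk], column: str, threshold: Any) -> list[Row]:
    """Prune chunks first, then apply the predicate row by row to what is left."""
    result = []
    for chunk in prune_chunks(chunks, column, threshold):
        for row in chunk.rows:
            value = row.get(column)
            if value is not None and value > threshold:
                result.append(row)
    return result


host_chunks = [
    Chunk(1, [
        {"state": "MA", "city": "Boston", "cpu": 10, "timestamp": "2016-06-13T17:43:50.2004002Z"},
        {"state": "MA", "city": "Andover", "cpu": 12, "timestamp": "2016-06-13T17:43:50.4004002Z"},
    ]),
    Chunk(2, [
        {"state": "MA", "city": "Boston", "disk": 100, "timestamp": "2016-06-13T17:43:50.5004002Z"},
        {"state": "NY", "city": "New York", "disk": 200, "timestamp": "2016-06-13T17:43:50.6004002Z"},
    ]),
]

surviving = prune_chunks(host_chunks, "disk", 100)
rows = read_greater_than(host_chunks, "disk", 100)
```

Chunk 1 has no `disk` values at all, so it is never scanned; only the New York row of Chunk 2 passes the row-level filter.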
  • 24. 3. Able to Load Duplicated Data → Deduplicate & Compact Operators
  • 25. Able to Load Duplicated Data Load 1: weather,location=east temp=82,humidity=67 1465839830100400200 weather,location=west temp=70 1465839830100400200 weather,location=east temp=82,humidity=69 host,state=MA,city=Boston cpu=10 1465839830200400200 host,state=MA,city=Andover cpu=12 1465839830400400200 weather,location=midwest temp=70,humidity=57 1465839830400400200 Loading Data: Same tag values are duplicates location temp humidity timestamp east west east midwest 82 70 82 70 67 69 57 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.5000000Z 2016-06-13T17:43:50.4004002Z Weather’s Chunk 1 Host’s Chunk 1 state city cpu timestamp MA MA Boston Andover 10 12 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.4004002Z IOx Storage: Table Chunks Load 2: host,state=MA,city=Boston disk=100 1465839830200500200 host,state=NY,city=New York disk=200 1465839830400600200 Host’s Chunk 2 state city disk timestamp MA NY Boston New York 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.6004002Z
  • 26. location temp humidity timestamp east West east midwest 82 70 82 70 67 69 57 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.5000000Z 2016-06-13T17:43:50.4004002Z Able to Load Duplicated Data → Deduplicate at Read Time Weather’s Chunk 1 Host’s Chunk 1 state city cpu timestamp MA MA Boston Andover 10 12 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.4004002Z Host’s Chunk 2 state city disk timestamp MA NY Boston New York 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.6004002Z IOx Storage: Table Chunks
  • 27. location temp humidity timestamp east West east midwest 82 70 82 70 67 69 57 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.5000000Z 2016-06-13T17:43:50.4004002Z Able to Load Duplicated Data → Deduplicate at Read Time Weather’s Chunk 1 Host’s Chunk 1 state city cpu timestamp MA MA Boston Andover 10 12 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.4004002Z Host’s Chunk 2 state city disk timestamp MA NY Boston New York 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.6004002Z IOx Storage: Table Chunks User issues: Read Weather Data
  • 28. location temp humidity timestamp east West east midwest 82 70 82 70 67 69 57 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.5000000Z 2016-06-13T17:43:50.4004002Z Able to Load Duplicated Data → Deduplicate at Read Time Weather’s Chunk 1 Host’s Chunk 1 state city cpu timestamp MA MA Boston Andover 10 12 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.4004002Z Host’s Chunk 2 state city disk timestamp MA NY Boston New York 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.6004002Z IOx Storage: Table Chunks User issues: Read Weather Data → 3 rows returned location temp humidity timestamp east west midwest 82 70 70 69 57 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.4004002Z
  • 29. location temp humidity timestamp east West east midwest 82 70 82 70 67 69 57 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.5000000Z 2016-06-13T17:43:50.4004002Z Able to Load Duplicated Data → Deduplicate at Read Time Weather’s Chunk 1 Host’s Chunk 1 state city cpu timestamp MA MA Boston Andover 10 12 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.4004002Z Host’s Chunk 2 state city disk timestamp MA NY Boston New York 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.6004002Z IOx Storage: Table Chunks User issues: Read Weather Data → 3 rows returned location temp humidity timestamp east west midwest 82 70 70 69 57 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.4004002Z User issues: Read Host Data
  • 30. Able to Load Duplicated Data → Deduplicate at Read Time
    IOx Storage: Table Chunks
    Weather’s Chunk 1
      location  temp  humidity  timestamp
      east      82    67        2016-06-13T17:43:50.1004002Z
      west      70    -         2016-06-13T17:43:50.1004002Z
      east      82    69        2016-06-13T17:43:50.5000000Z
      midwest   70    57        2016-06-13T17:43:50.4004002Z
    Host’s Chunk 1
      state  city     cpu  timestamp
      MA     Boston   10   2016-06-13T17:43:50.2004002Z
      MA     Andover  12   2016-06-13T17:43:50.4004002Z
    Host’s Chunk 2
      state  city      disk  timestamp
      MA     Boston    100   2016-06-13T17:43:50.5004002Z
      NY     New York  200   2016-06-13T17:43:50.6004002Z
    User issues: Read Weather Data → 3 rows returned
      location  temp  humidity  timestamp
      east      82    69        2016-06-13T17:43:50.5004002Z
      west      70    -         2016-06-13T17:43:50.1004002Z
      midwest   70    57        2016-06-13T17:43:50.4004002Z
    User issues: Read Host Data → 3 rows returned
      state  city      cpu  disk  timestamp
      MA     Boston    10   100   2016-06-13T17:43:50.5004002Z
      MA     Andover   12   -     2016-06-13T17:43:50.4004002Z
      NY     New York  -    200   2016-06-13T17:43:50.6004002Z
  • 31. Able to Load Duplicated Data → Deduplicate at Read Time
    [Query plan diagrams]
    Chunk Scan without Deduplication, nodes shown: IOxReadFilterNode (chunk_id = 1), IOxReadFilterNode (chunk_id = 2), UnionExec
    Chunk Scan with Deduplication (*), nodes shown: IOxReadFilterNode (chunk_id = 1), IOxReadFilterNode (chunk_id = 2), two SortExec (optional, Sort_key: tags), UnionExec, SortPreservingMerge, DeduplicateExec
    (*) Previous IOx Talk: Query Processing in InfluxDB IOx
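Read-time deduplication can be sketched in Python as follows. The sketch follows the slides' simplification that rows with the same tag values are duplicates and that the most recently loaded value of each field wins. The function names and the `TAGS` tuple are hypothetical; the sort-then-fold shape only loosely mirrors the sort, merge, and deduplicate nodes named on this slide.

```python
from itertools import groupby
from typing import Any

Row = dict[str, Any]

# Tag columns of the host table; rows with equal tag values count as duplicates.
TAGS = ("state", "city")


def tag_key(row: Row) -> tuple:
    return tuple(row[tag] for tag in TAGS)


def deduplicate(chunks: list[list[Row]]) -> list[Row]:
    """Union all chunks, sort by tags, then fold each group of duplicates into one row."""
    unioned = [row for chunk in chunks for row in chunk]
    # sorted() is stable, so rows with equal tags stay in load order.
    ordered = sorted(unioned, key=tag_key)
    result = []
    for _, group in groupby(ordered, key=tag_key):
        merged: Row = {}
        for row in group:
            # Later rows overwrite earlier ones, but only where they carry a value.
            merged.update({k: v for k, v in row.items() if v is not None})
        result.append(merged)
    return result


host_chunk_1 = [
    {"state": "MA", "city": "Boston", "cpu": 10, "timestamp": "2016-06-13T17:43:50.2004002Z"},
    {"state": "MA", "city": "Andover", "cpu": 12, "timestamp": "2016-06-13T17:43:50.4004002Z"},
]
host_chunk_2 = [
    {"state": "MA", "city": "Boston", "disk": 100, "timestamp": "2016-06-13T17:43:50.5004002Z"},
    {"state": "NY", "city": "New York", "disk": 200, "timestamp": "2016-06-13T17:43:50.6004002Z"},
]

deduped = deduplicate([host_chunk_1, host_chunk_2])
```

The two Boston rows collapse into one row that keeps `cpu` from the first load and takes `disk` and `timestamp` from the second, giving the three host rows shown on the previous slide (in tag-sorted order here).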
  • 32. Able to Load Duplicated Data → Compact from time to time IOx Storage: before compaction location temp humidity timestamp east West east midwest 82 70 82 70 67 69 57 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.5000000Z 2016-06-13T17:43:50.4004002Z Weather’s Chunk 1 Host’s Chunk 1 state city cpu timestamp MA MA Boston Andover 10 12 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.4004002Z Host’s Chunk 2 state city disk timestamp MA NY Boston New York 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.6004002Z
  • 33. Able to Load Duplicated Data → Compact from time to time IOx Storage: before compaction location temp humidity timestamp east west midwest 82 70 70 69 57 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.4004002Z state city cpu disk timestamp MA MA NY Boston Andover New York 10 12 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.4004002Z 2016-06-13T17:43:50.6004002Z IOx Storage: after compaction Weather’s Chunk Host’s Chunk location temp humidity timestamp east West east midwest 82 70 82 70 67 69 57 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.5000000Z 2016-06-13T17:43:50.4004002Z Weather’s Chunk 1 Host’s Chunk 1 state city cpu timestamp MA MA Boston Andover 10 12 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.4004002Z Host’s Chunk 2 state city disk timestamp MA NY Boston New York 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.6004002Z
  • 34. Able to Load Duplicated Data → Compact from time to time
    ● Compaction Operation ≃ Deduplication Operation (= Chunk Scan)
      + Create a new chunk to store deduplicated data
      + Drop old chunks
    ● Compaction runs in the background based on the compaction policy
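The three compaction steps listed above (deduplicating scan, new chunk, drop old chunks) can be sketched in Python. `TableStore` and `dedup_scan` are hypothetical names, the deduplication rule is the slides' simplified "same tag values" rule, and compaction is triggered by hand here rather than by a background policy.

```python
from typing import Any

Row = dict[str, Any]


def dedup_scan(chunks: list[list[Row]], tags: tuple) -> list[Row]:
    """Scan chunks in load order, merging rows that share the same tag values."""
    merged: dict[tuple, Row] = {}
    for chunk in chunks:
        for row in chunk:
            key = tuple(row[tag] for tag in tags)
            # Later loads overwrite earlier field values for the same tags.
            merged.setdefault(key, {}).update(row)
    return list(merged.values())


class TableStore:
    """Holds the chunks of one table (illustrative only)."""

    def __init__(self, tags: tuple) -> None:
        self.tags = tags
        self.chunks: list[list[Row]] = []

    def load(self, rows: list[Row]) -> None:
        # Each load becomes its own chunk; duplicates are accepted as-is.
        self.chunks.append(list(rows))

    def compact(self) -> None:
        # 1. Deduplicating chunk scan, 2. store the result as a new chunk,
        # 3. drop the old chunks by replacing the chunk list.
        new_chunk = dedup_scan(self.chunks, self.tags)
        self.chunks = [new_chunk]


host = TableStore(tags=("state", "city"))
host.load([
    {"state": "MA", "city": "Boston", "cpu": 10, "timestamp": "2016-06-13T17:43:50.2004002Z"},
    {"state": "MA", "city": "Andover", "cpu": 12, "timestamp": "2016-06-13T17:43:50.4004002Z"},
])
host.load([
    {"state": "MA", "city": "Boston", "disk": 100, "timestamp": "2016-06-13T17:43:50.5004002Z"},
    {"state": "NY", "city": "New York", "disk": 200, "timestamp": "2016-06-13T17:43:50.6004002Z"},
])
chunks_before = len(host.chunks)
host.compact()
```

Because compaction reuses the same scan that serves reads, a query returns the same rows before and after; only the amount of work per read changes.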
  • 35. 4. Quick Data Deletion → No Online Deletion → Eliminate data at Read Time & Actual Deletion during Compaction
  • 36. Quick Data Deletion
    ● User issues a Delete
      ○ Nothing is deleted right away (a classic technique in analytic/big data systems)
      ○ The Delete Predicate is stored as a Tombstone
    ● At Read time (Chunk Scan)
      ○ The Tombstone is applied at the Scan step so that deleted data is not returned
    ● During compaction
      ○ The newly created chunk does not include deleted data, because it is the result of a chunk scan
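The tombstone flow described on this slide can be sketched in Python. Everything here is illustrative: the `Table` class is hypothetical, a tombstone is reduced to a `(column, value)` equality predicate, deduplication is left out to keep the example short, and discarding tombstones once compaction has rewritten the data is an assumption rather than something the slides state.

```python
from typing import Any

Row = dict[str, Any]
# A tombstone is modeled as (column, value): rows where column == value count as deleted.
Tombstone = tuple[str, Any]


class Table:
    def __init__(self) -> None:
        self.chunks: list[list[Row]] = []
        self.tombstones: list[Tombstone] = []

    def load(self, rows: list[Row]) -> None:
        self.chunks.append(list(rows))

    def delete(self, column: str, value: Any) -> None:
        # Nothing is removed from storage; only the predicate is recorded.
        self.tombstones.append((column, value))

    def _is_deleted(self, row: Row) -> bool:
        return any(row.get(column) == value for column, value in self.tombstones)

    def scan(self) -> list[Row]:
        # Tombstones are applied while scanning, so deleted rows are never returned.
        return [row for chunk in self.chunks for row in chunk if not self._is_deleted(row)]

    def compact(self) -> None:
        # The new chunk is the scan result, so deleted rows are physically gone.
        self.chunks = [self.scan()]
        self.tombstones = []  # assumption: tombstones are no longer needed afterwards


host = Table()
host.load([
    {"state": "MA", "city": "Boston", "cpu": 10, "timestamp": "2016-06-13T17:43:50.2004002Z"},
    {"state": "MA", "city": "Andover", "cpu": 12, "timestamp": "2016-06-13T17:43:50.4004002Z"},
])
host.load([
    {"state": "MA", "city": "Boston", "disk": 100, "timestamp": "2016-06-13T17:43:50.5004002Z"},
    {"state": "NY", "city": "New York", "disk": 200, "timestamp": "2016-06-13T17:43:50.6004002Z"},
])

host.delete("city", "Boston")
stored_after_delete = sum(len(chunk) for chunk in host.chunks)
visible = host.scan()
host.compact()
```

The delete itself is a constant-time append, which is what makes it quick for the user; the cost is moved to every later scan until compaction rewrites the chunks.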
  • 37. Quick Data Deletion → Add Tombstone at Delete Time IOx Storage: before delete location temp humidity timestamp east West east midwest 82 70 82 70 67 69 57 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.5000000Z 2016-06-13T17:43:50.4004002Z Weather’s Chunk 1 Host’s Chunk 1 state city cpu timestamp MA MA Boston Andover 10 12 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.4004002Z Host’s Chunk 2 state city disk timestamp MA NY Boston New York 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.6004002Z
  • 38. Quick Data Deletion → Add Tombstone at Delete Time IOx Storage: before delete IOx Storage: after delete from host “city = Boston” location temp humidity timestamp east West east midwest 82 70 82 70 67 69 57 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.5000000Z 2016-06-13T17:43:50.4004002Z Weather’s Chunk 1 Host’s Chunk 1 state city cpu timestamp MA MA Boston Andover 10 12 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.4004002Z Host’s Chunk 2 state city disk timestamp MA NY Boston New York 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.6004002Z location temp humidity timestamp east West east midwest 82 70 82 70 67 69 57 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.5000000Z 2016-06-13T17:43:50.4004002Z Weather’s Chunk 1 Host’s Chunk 1 state city cpu timestamp MA MA Boston Andover 10 12 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.4004002Z Host’s Chunk 2 state city disk timestamp MA NY Boston New York 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.6004002Z Tombstone: “city = Boston” Tombstone: “city = Boston”
  • 39. Quick Data Deletion → Eliminate Data at Read Time IOx Storage: Host Chunks with Tombstones Host’s Chunk 1 state city cpu timestamp MA MA Boston Andover 10 12 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.4004002Z Host’s Chunk 2 state city disk timestamp MA NY Boston New York 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.6004002Z Tombstone: “city = Boston” Tombstone: “city = Boston”
  • 40. Quick Data Deletion → Eliminate Data at Read Time
    IOx Storage: Host Chunks with Tombstones
    Host’s Chunk 1 (Tombstone: “city = Boston”)
      state  city     cpu  timestamp
      MA     Boston   10   2016-06-13T17:43:50.2004002Z
      MA     Andover  12   2016-06-13T17:43:50.4004002Z
    Host’s Chunk 2 (Tombstone: “city = Boston”)
      state  city      disk  timestamp
      MA     Boston    100   2016-06-13T17:43:50.5004002Z
      NY     New York  200   2016-06-13T17:43:50.6004002Z
    User issues: Read everything from Host
    → IOx applies tombstones to eliminate data → 2 rows returned
      state  city      cpu  disk  timestamp
      MA     Andover   12   -     2016-06-13T17:43:50.4004002Z
      NY     New York  -    200   2016-06-13T17:43:50.6004002Z
  • 41. Quick Data Deletion → Eliminate Data at Read Time
    [Query plan diagrams]
    Chunk Scan without Delete, nodes shown: IOxReadFilterNode (chunk_id = 1), IOxReadFilterNode (chunk_id = 2), two SortExec (optional, Sort_key: tags), UnionExec, SortPreservingMerge, DeduplicateExec
    Chunk Scan with Delete (*) (city = Boston), nodes shown: the same nodes plus two FilterExec (city = Boston)
    (*) Previous IOx Talk: Query Processing in InfluxDB IOx
  • 42. Quick Data Deletion → Compact from time to time IOx Storage: before compaction location temp humidity timestamp east West east midwest 82 70 82 70 67 69 57 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.5000000Z 2016-06-13T17:43:50.4004002Z Weather’s Chunk 1 Host’s Chunk 1 state city cpu timestamp MA MA Boston Andover 10 12 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.4004002Z Host’s Chunk 2 state city disk timestamp MA NY Boston New York 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.6004002Z Tombstone: “city = Boston” Tombstone: “city = Boston”
  • 43. Quick Data Deletion → Compact from time to time IOx Storage: before compaction location temp humidity timestamp east west midwest 82 70 70 69 57 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.4004002Z state city cpu disk timestamp MA NY Andover New York 12 200 2016-06-13T17:43:50.4004002Z 2016-06-13T17:43:50.6004002Z IOx Storage: after compaction (deduplication + delete) Weather’s Chunk Host’s Chunk location temp humidity timestamp east West east midwest 82 70 82 70 67 69 57 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.1004002Z 2016-06-13T17:43:50.5000000Z 2016-06-13T17:43:50.4004002Z Weather’s Chunk 1 Host’s Chunk 1 state city cpu timestamp MA MA Boston Andover 10 12 2016-06-13T17:43:50.2004002Z 2016-06-13T17:43:50.4004002Z Host’s Chunk 2 state city disk timestamp MA NY Boston New York 100 200 2016-06-13T17:43:50.5004002Z 2016-06-13T17:43:50.6004002Z Tombstone: “city = Boston” Tombstone: “city = Boston”
  • 44. Summary
    Easy-to-Use & Fast for Users → Complicated Implementation for Developers
    1. Flexible Writing Schema (No DDL) → Need Schema Merging at Reading Time
    2. Fast Read → Prune Non-Covered Data Chunks (Predicate Push-Down)
    3. Able to Load Duplicated Data → Need Data Deduplication & Compaction Operations
    4. Quick Data Deletion → No deletion right away, but data must be eliminated at read time or in the background
    Simplicity is the Ultimate Sophistication
    But InfluxData is committed to bringing Simplicity to Users