TileDB webinars
The TileDB Embedded Storage Engine
Founder & CEO of TileDB, Inc.
Dr. Stavros Papadopoulos
Who is this webinar for?
Those wanting to learn about data storage fundamentals
Layout, compression, IO, etc.
Those looking to efficiently store/access any kind of data to/from anywhere
Dataframes, genomics, LiDAR, SAR, weather, and more, with a single engine
Those tired of managing custom, inefficient data formats
Formats not supporting fast updates, indexing, versioning, cloud performance
Disclaimer
I am the exclusive recipient of complaints
Email me at: stavros@tiledb.com
All the credit for our amazing work goes to our powerful team
Check it out at https://tiledb.com/about
Deep roots at the intersection of HPC, databases and data science
Traction with telecoms, pharmas, hospitals and other scientific organizations
40 members with expertise across all applications and domains
Who we are
TileDB got spun out from MIT and Intel Labs in 2017
WHERE IT ALL STARTED
Raised over $20M, we are very well capitalized
INVESTORS
What is TileDB Embedded?
An embeddable C library that stores and accesses multi-dimensional arrays
Dense array Sparse array
It implements very fast array slicing across dimensions
Superior performance
Built in C++
Fully-parallelized
Columnar format
Multiple compressors
R-trees for sparse arrays
TileDB Embedded at a Glance
https://github.com/TileDBInc/TileDB
Open source:
Rapid updates & data versioning
Immutable writes
Lock-free
Parallel reader / writer model
Time traveling
Schema evolution
TileDB Embedded at a Glance
https://github.com/TileDBInc/TileDB
Open source:
Extreme interoperability
Numerous APIs
Numerous integrations
All backends
Optimized for the cloud
Immutable writes
Parallel IO
Minimization of requests
TileDB Embedded at a Glance
APIs & tool Integrations with zero-copy where possible
TileDB Embedded
Open-source interoperable storage with a universal open-spec array format
● Parallel IO, rapid reads & writes
● Columnar, cloud-optimized
● Data versioning & time traveling
Agenda
Why arrays?
The basics
Advanced internal mechanics
Examples
Work in progress
Comparison to other formats and engines
Docs at docs.tiledb.com
Byte 0 1 ...
Regardless of what kind of data you have, it is laid out in a 1D storage medium
Why Arrays?
Algorithm as a task graph, where each task may slice the data
Regardless of what kind of algorithm you run, the algorithm involves a set of slices
Why Arrays?
Performance is absolutely dictated by the slice result locality on the 1D medium
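To make the locality point concrete, here is a minimal sketch that counts how many separate contiguous runs a slice touches on the 1D medium under two layouts. The 4x4 array, the column slice, and the helper names are illustrative assumptions, not part of TileDB.

```python
def contiguous_runs(positions):
    """Count the separate contiguous runs among a set of 1D positions."""
    ordered = sorted(positions)
    if not ordered:
        return 0
    runs = 1
    for prev, cur in zip(ordered, ordered[1:]):
        if cur != prev + 1:
            runs += 1
    return runs


def row_major(r, c, n_cols):
    # Position of cell (r, c) when rows are stored one after another.
    return r * n_cols + c


def col_major(r, c, n_rows):
    # Position of cell (r, c) when columns are stored one after another.
    return c * n_rows + r


n = 4  # hypothetical 4x4 array
column_slice = [(r, 0) for r in range(n)]  # slice the first column

rm = contiguous_runs(row_major(r, c, n) for r, c in column_slice)
cm = contiguous_runs(col_major(r, c, n) for r, c in column_slice)
print(rm, cm)
```

The same slice needs four separate reads under the row-major layout and a single read under the column-major one, which is the kind of difference the layout choice is meant to control.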
Why Arrays?
Arrays provide a flexible way to map/slice any-dimensional (ND) data to/from a 1D layout
Giving different “importance” to different dimensions (order and tiling)
Choosing whether dimension coordinates should be materialized or not (dense vs. sparse)
Considering compression, encryption and other filters (tiling)
Abstracting all the engineering magic that it takes to make everything very fast (engine)
Unifying the data model for all application domains! (universality)
Building indices for fast search (e.g., R-trees)
Arrays Subsume Dataframes
Sparse array
Dataframe
Dense vector
Arrays Are Universal
What else can be modeled as an array
LiDAR 3D sparse)
SAR 2D or 3D dense)
Population genomics (3D sparse)
Single-cell genomics (2D dense or sparse)
Biomedical imaging (2D or 3D dense)
Even flat files!!! 1D dense)
Time series (ND dense or sparse)
Weather (2D or 3D dense)
Graphs (2D sparse)
Video (3D dense)
Key-values (1D or ND sparse)
The Basics
dense_array1
├── __t2_t2_uuid1_v
│   ├── __fragment_metadata.tdb
│   └── a0.tdb
├── __t2_t2_uuid1.ok
├── __lock.tdb
├── __meta
└── __schema
    └── __t1_t1_uuid2
A Simple 2D Dense Array
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
fragment
schema
attribute data
A Simple 2D Sparse Array
sparse_array1
├── __t2_t2_uuid2_v
│   ├── __fragment_metadata.tdb
│   ├── a0.tdb
│   ├── d0.tdb
│   └── d1.tdb
├── __t2_t2_uuid2_v.ok
├── __lock.tdb
├── __meta
└── __schema
    └── __t1_t2_uuid1
1 2
3
4 5
6
fragment
schema
attribute data
coordinates
Groups
dense_group
├── __tiledb_group.tdb
└── nested_group
    ├── __tiledb_group.tdb
    └── dense_array1
        ├── __lock.tdb
        ├── __meta
        └── __schema
Groups provide an easy way to hierarchically organize arrays
Array Metadata
dense_array1
├── __t2_t2_uuid2_v
│   ├── __fragment_metadata.tdb
│   └── a0.tdb
├── __t2_t2_uuid2_v.ok
├── __lock.tdb
├── __meta
│   └── __t3_t3_uuid3
└── __schema
    └── __t1_t2_uuid1
You can attach any number of (key, value) pairs to an array
The key must be a string, and the value can be anything
metadata go here
Multiple Attributes
dense_array1
├── __t2_t2_uuid1_v
│   ├── __fragment_metadata.tdb
│   ├── a0.tdb
│   └── a1.tdb
├── __t2_t2_uuid1.ok
├── __lock.tdb
├── __meta
└── __schema
    └── __t1_t1_uuid2
1,a 2,b 3,c 4,d
5,e 6,f 7,g 8,h
9,i 10,j 11,k 12,l
13,m 14,n 15,o 16,p
You can store more than one value in each cell, even of different types
TileDB has a “columnar” format that allows you to efficiently subselect on attributes
attribute data
Var-length Attributes
dense_array3
├── __t2_t2_uuid1_v
│   ├── __fragment_metadata.tdb
│   ├── a0.tdb
│   └── a0_var.tdb
├── __t2_t2_uuid1.ok
├── __lock.tdb
├── __meta
└── __schema
    └── __t1_t1_uuid2
TileDB supports storing variable-length values in a cell (of any data type)
a bb ccc dddd
e ff ggg hhhh
i jj kkk lll
m nn ooo pppp
offsets
var-length data
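The "offsets" and "var-length data" labels describe a common two-buffer encoding: one buffer holds all values back to back, the other holds the starting byte of each cell. A minimal sketch of that idea follows; the function names are hypothetical and this is not the TileDB on-disk format.

```python
def encode_var(values):
    """Pack strings into one data buffer plus one starting offset per cell."""
    data = bytearray()
    offsets = []
    for value in values:
        offsets.append(len(data))  # where this cell's bytes begin
        data += value.encode("utf-8")
    return offsets, bytes(data)


def decode_var(offsets, data):
    """Recover the cell values; each cell ends where the next one starts."""
    ends = offsets[1:] + [len(data)]
    return [data[start:end].decode("utf-8") for start, end in zip(offsets, ends)]


cells = ["a", "bb", "ccc", "dddd"]  # first row of the slide's example
offsets, data = encode_var(cells)
print(offsets, data)
print(decode_var(offsets, data))
```

Storing only start offsets keeps the offsets buffer fixed-width, so cell i can be located with one lookup even though the values differ in length.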
Var-length Dimensions
sparse_array4
├── __t2_t2_uuid1_v
│   ├── __fragment_metadata.tdb
│   ├── a0.tdb
│   ├── d0.tdb
│   └── d0_var.tdb
├── __t2_t2_uuid1.ok
├── __lock.tdb
├── __meta
└── __schema
    └── __t1_t1_uuid2
You can also have var-length dimensions and slice naturally using string ranges
Applicable only to sparse arrays
offsets
var-length data
a bb ccc dddd e ff
1 2 3 4 5 6
unbounded domain
infinite gaps
Heterogeneous Dimensions
[Figure: 2D sparse array with an infinite string dimension and an infinite float32 dimension; example cell with value 4 at coordinates (“dddd”, 0.4)]
Sparse arrays allow you to have dimensions of different types
The following 2D array allows efficient slicing on a string and a float32 dimension
Arrays as Dataframes
An array is essentially a dataframe
where dimensions are special (they are “indexed”)
What About Cloud Object Stores?
array_name → {s3,azure,gcs,tiledb}://path/array_name
Everything demonstrated works as is on the cloud
Tiling & Layout
Tiling | Dense Arrays
Without tiling, a slice fetches the whole array from storage
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
With space tiles (defined by the space tile extents), a slice fetches only a portion of the array (a tile)
A space tile is the atomic unit of IO
Cell Layout | Dense Arrays
Three parameters define the values layout on storage, called the global order
Space tile extents
Tile order/layout (row-major or column-major)
Cell order/layout (row-major or column-major)
2x2 space tiles, row-major tile order, row-major cell order
2x2 space tiles, col-major tile order, row-major cell order
4x2 space tiles, row-major tile order, col-major cell order
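The three parameters above can be turned into a small sketch that computes the global order of a 2D dense array. This is an illustration under simple assumptions (2D only, extents that divide the domain evenly), with a hypothetical function name; it is not TileDB code.

```python
def global_order(shape, tile_extents, tile_order="row", cell_order="row"):
    """Return the (row, col) coordinates in the order they are laid out."""
    n_rows, n_cols = shape
    t_rows, t_cols = tile_extents
    if n_rows % t_rows or n_cols % t_cols:
        raise ValueError("this sketch assumes the extents divide the domain evenly")

    def ordered_pairs(n_r, n_c, order):
        if order == "row":
            return [(r, c) for r in range(n_r) for c in range(n_c)]
        if order == "col":
            return [(r, c) for c in range(n_c) for r in range(n_r)]
        raise ValueError("order must be 'row' or 'col'")

    cells = []
    # Outer loop walks the space tiles, inner loop walks the cells in a tile.
    for tr, tc in ordered_pairs(n_rows // t_rows, n_cols // t_cols, tile_order):
        for r, c in ordered_pairs(t_rows, t_cols, cell_order):
            cells.append((tr * t_rows + r, tc * t_cols + c))
    return cells


array = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]


def layout(tile_extents, tile_order, cell_order):
    coords = global_order((4, 4), tile_extents, tile_order, cell_order)
    return [array[r][c] for r, c in coords]


row_row = layout((2, 2), "row", "row")
col_row = layout((2, 2), "col", "row")
row_col = layout((4, 2), "row", "col")
print(row_row)
print(col_row)
print(row_col)
```

Each printed list is the sequence in which the 16 values would appear on the 1D medium for one of the three configurations on the slide.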
Tiling & Cell Layout | Sparse Arrays
Sparse arrays store only non-empty cells
Grouping non-empty cells with space tiles would be inefficient (due to potential skew)
The atomic unit of IO in sparse arrays is the data tile, of fixed (user-defined) capacity
First impose a global order similar to dense arrays, then group based on capacity
2x2 space tiles (space tile extents), col-major tile order, row-major cell order, capacity 2
2x2 space tiles (space tile extents), col-major tile order, row-major cell order, capacity 4
data tile
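A minimal sketch of the two-step idea for sparse arrays: sort the non-empty cells by a global order, then cut the sorted list into data tiles of fixed capacity. The coordinates below are hypothetical (0-based, in a 4x4 domain) and the function names are illustrative assumptions.

```python
def global_key(coord, tile_extents, tile_order="col", cell_order="row"):
    """Sort key that places a 2D coordinate in the global order."""
    r, c = coord
    t_rows, t_cols = tile_extents
    tr, tc = r // t_rows, c // t_cols  # which space tile the cell falls in
    ir, ic = r % t_rows, c % t_cols    # position inside that space tile
    tile_key = (tc, tr) if tile_order == "col" else (tr, tc)
    cell_key = (ic, ir) if cell_order == "col" else (ir, ic)
    return tile_key + cell_key


def data_tiles(coords, tile_extents, capacity, tile_order="col", cell_order="row"):
    """Group the non-empty cells into data tiles of at most `capacity` cells."""
    ordered = sorted(
        coords, key=lambda p: global_key(p, tile_extents, tile_order, cell_order)
    )
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


# Hypothetical non-empty cells.
non_empty = [(0, 0), (0, 1), (1, 3), (2, 0), (2, 1), (3, 2)]

tiles_cap2 = data_tiles(non_empty, (2, 2), capacity=2)
tiles_cap4 = data_tiles(non_empty, (2, 2), capacity=4)
print(tiles_cap2)
print(tiles_cap4)
```

Because grouping happens after sorting, every data tile holds the same number of cells (except possibly the last), regardless of how skewed the non-empty cells are across space tiles.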
Hilbert Order | Sparse Arrays
Space tiles greatly affect the cell layout in sparse arrays
Sometimes it is very difficult to define a good space tiling (especially with floats and strings)
For such cases, the Hilbert order is ideal (no tile extents and order)
For floats we discretize the domain into buckets based on the number of dimensions
For strings we assign a number of bits per dimension and then use the string prefixes as numbers
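As an illustration of ordering float coordinates along a Hilbert curve, the sketch below discretizes each coordinate into buckets and sorts the points by their Hilbert distance. It uses the classic 2D textbook algorithm and a hypothetical bucket count of 8 per dimension; it is not TileDB's implementation.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n x n grid (n a power of two) to its Hilbert distance."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def bucket(value, lo, hi, n):
    """Discretize a float in [lo, hi] into one of n equal-width buckets."""
    i = int((value - lo) / (hi - lo) * n)
    return min(max(i, 0), n - 1)


BUCKETS = 8  # assumption: 8 buckets per dimension, chosen for illustration only
points = [(0.9, 0.1), (0.1, 0.1), (0.1, 0.9), (0.9, 0.9), (0.5, 0.5)]


def hilbert_key(p):
    return hilbert_index(
        BUCKETS, bucket(p[0], 0.0, 1.0, BUCKETS), bucket(p[1], 0.0, 1.0, BUCKETS)
    )


ordered = sorted(points, key=hilbert_key)
print(ordered)
```

The useful property is that cells that are consecutive on the curve are always neighbors in 2D, so points close together on disk tend to be close together in space without choosing any tile extents.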
Tile Filters
TileDB allows a wide range of filters to be applied to each tile prior to its storage
Compressors (gzip, zstd, bzip2, …)
Checksums
Encryption
The atomic unit of filtering is the chunk (typically equal to the L1 cache size)
TileDB applies the filters across chunks in parallel in a pipeline
chunk
tile
zstd
AES256
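The chunked filter pipeline can be sketched with the standard library alone. In this sketch zlib stands in for the compressors named on the slide, a CRC32 stands in for the checksum filter, encryption is left out, and the 64 KB chunk size is an arbitrary assumption rather than a TileDB default.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 64 * 1024  # assumption: illustrative chunk size only


def filter_chunk(chunk):
    """Compress one chunk, then checksum the compressed bytes."""
    compressed = zlib.compress(chunk)
    return zlib.crc32(compressed), compressed


def unfilter_chunk(item):
    """Verify the checksum, then decompress (filters run in reverse on read)."""
    checksum, compressed = item
    if zlib.crc32(compressed) != checksum:
        raise ValueError("checksum mismatch")
    return zlib.decompress(compressed)


def filter_tile(tile, chunk_size=CHUNK_SIZE):
    """Split a tile into chunks and filter the chunks in parallel."""
    chunks = [tile[i:i + chunk_size] for i in range(0, len(tile), chunk_size)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(filter_chunk, chunks))


def unfilter_tile(filtered):
    """Reverse the pipeline and reassemble the original tile bytes."""
    with ThreadPoolExecutor() as pool:
        return b"".join(pool.map(unfilter_chunk, filtered))


tile = bytes(range(256)) * 1000  # 256,000 bytes of sample tile data
filtered = filter_tile(tile)
restored = unfilter_tile(filtered)
print(len(filtered), restored == tile)
```

Working chunk by chunk is what lets independent chunks be processed by different threads, and it keeps each unit of work small.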
Advanced
Internal Mechanics
Versioning and Time Traveling
In TileDB, every write is immutable
Each (batch) write creates a timestamped fragment
With fragments, TileDB implements
versioning and time traveling
Versioning and Time Traveling | Dense Arrays
Write at t1:
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
Write at t2:
100 200
500 600
Read at [0,t1]:
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
Read at [0,t2]:
100 200 3 4
500 600 7 8
9 10 11 12
13 14 15 16
Read at (t1,t2]:
100 200 - -
500 600 - -
- - - -
- - - -
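The dense example on this slide can be reproduced with a toy model in which each write is an immutable, timestamped fragment and a read overlays the fragments that fall inside the requested time interval. Integer timestamps (t1 = 1, t2 = 2) and the function name are assumptions for illustration.

```python
def read_dense(fragments, shape, t_start, t_end, fill="-"):
    """Overlay all fragments with t_start <= timestamp <= t_end, oldest first."""
    rows, cols = shape
    result = [[fill] * cols for _ in range(rows)]
    for timestamp, cells in sorted(fragments, key=lambda f: f[0]):
        if t_start <= timestamp <= t_end:
            for (r, c), value in cells.items():
                result[r][c] = value  # later fragments overwrite earlier ones
    return result


T1, T2 = 1, 2  # assumption: integer timestamps standing in for t1 and t2

# Write at t1: the full 4x4 array holding 1..16.
write_t1 = {(r, c): r * 4 + c + 1 for r in range(4) for c in range(4)}
# Write at t2: four cells in the top-left corner.
write_t2 = {(0, 0): 100, (0, 1): 200, (1, 0): 500, (1, 1): 600}

fragments = [(T1, write_t1), (T2, write_t2)]  # fragments are never modified

at_t1 = read_dense(fragments, (4, 4), 0, T1)
at_t2 = read_dense(fragments, (4, 4), 0, T2)
only_t2 = read_dense(fragments, (4, 4), T1 + 1, T2)  # the (t1, t2] interval
for row in at_t2:
    print(row)
```

Since nothing is ever overwritten on disk, reading an older interval simply ignores the newer fragments.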
Versioning and Time Traveling | Sparse Arrays
Write at t1:
1 2
3
4 5
6
Write at t2 (at the coordinates of cells 1 and 4):
100
40
Read at [0,t1]:
1 2 3 4 5 6
Read at (t1,t2]:
100 40
Read at [0,t2] when no dups allowed:
100 2
3
40 5
6
Read at [0,t2] when dups are allowed:
1 100 2 3 4 40 5 6 (1/100 and 4/40 are dups)
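A toy model of the sparse case, showing how the same two fragments give different results depending on whether duplicates are allowed. The coordinates are hypothetical, timestamps are integers (t1 = 1, t2 = 2), and the function name is an assumption for illustration.

```python
def read_sparse(fragments, t_start, t_end, allow_dups=False):
    """Return (coordinate, value) pairs from fragments inside the time interval."""
    selected = sorted(
        (f for f in fragments if t_start <= f[0] <= t_end), key=lambda f: f[0]
    )
    if allow_dups:
        # Every written cell is returned, even when coordinates repeat.
        return sorted(
            (coord, value) for _, cells in selected for coord, value in cells.items()
        )
    latest = {}
    for _, cells in selected:
        latest.update(cells)  # the most recent write for a coordinate wins
    return sorted(latest.items())


T1, T2 = 1, 2
# Hypothetical coordinates for the six cells written at t1.
write_t1 = {(0, 0): 1, (0, 1): 2, (1, 1): 3, (2, 0): 4, (2, 2): 5, (3, 3): 6}
# The write at t2 lands on the coordinates of cells 1 and 4.
write_t2 = {(0, 0): 100, (2, 0): 40}
fragments = [(T1, write_t1), (T2, write_t2)]

no_dups = [value for _, value in read_sparse(fragments, 0, T2)]
with_dups = [value for _, value in read_sparse(fragments, 0, T2, allow_dups=True)]
print(no_dups)
print(with_dups)
```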
Indexing
TileDB has a three-level indexing approach
Fragment timestamps (in the fragment names) for time traveling
Non-empty domain in each fragment’s metadata
Either simple offset arithmetic (dense) or R-trees (sparse)
Algorithm
1. Get list of fragment names (with .ok), e.g. t1_t1_uuid1_v, t2_t2_uuid2_v, ...
2. Ignore fragments with timestamp not in time traveling interval
3. Ignore fragments with non-empty domain not overlapping slice (read from each fragment's __fragment_metadata.tdb)
4a. Ignore dense tiles via implicit positional indexing, or
4b. Ignore sparse tiles from the R-tree that do not overlap the slice
A slicing query would just traverse the tree top-down, visiting only nodes/MBRs that intersect the slice
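Steps 1 to 3 of the algorithm can be sketched as a filter over a directory listing. The sketch assumes fragment names of the form `__<start>_<end>_<uuid>_<version>` with integer timestamps (following the pattern on the slides), a `.ok` marker per committed fragment, and non-empty domains supplied as a plain dictionary; all names and data structures here are illustrative.

```python
def overlaps(a, b):
    """True if two hyper-rectangles (inclusive ranges per dimension) intersect."""
    return all(lo1 <= hi2 and lo2 <= hi1 for (lo1, hi1), (lo2, hi2) in zip(a, b))


def select_fragments(listing, non_empty_domains, t_start, t_end, slice_ranges):
    # Step 1: keep only fragments that have a matching ".ok" marker.
    committed = [name[: -len(".ok")] for name in listing if name.endswith(".ok")]
    selected = []
    for name in committed:
        start, end = (int(part) for part in name.lstrip("_").split("_")[:2])
        # Step 2: skip fragments outside the time traveling interval.
        if start < t_start or end > t_end:
            continue
        # Step 3: skip fragments whose non-empty domain misses the slice.
        if not overlaps(non_empty_domains[name], slice_ranges):
            continue
        selected.append(name)
    return selected


listing = [
    "__1_1_uuid1_v", "__1_1_uuid1_v.ok",
    "__2_2_uuid2_v", "__2_2_uuid2_v.ok",
    "__3_3_uuid3_v",  # no .ok marker, so this write is treated as uncommitted
]
non_empty_domains = {
    "__1_1_uuid1_v": ((1, 4), (1, 4)),
    "__2_2_uuid2_v": ((1, 2), (1, 2)),
    "__3_3_uuid3_v": ((1, 4), (1, 4)),
}

corner = select_fragments(listing, non_empty_domains, 0, 2, ((3, 4), (3, 4)))
top_left = select_fragments(listing, non_empty_domains, 0, 2, ((1, 1), (1, 1)))
print(corner)
print(top_left)
```

Only the fragments that survive all three checks need their tiles inspected in step 4, which is why keeping the non-empty domains cheap to read matters.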
Indexing
Given the non-empty domain, the space tile extents and the tile order, we can easily find that this slice overlaps the second and fourth tile
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
Dense: 2x2 space tiles, row-major tile order
Sparse: 2x2 space tiles, col-major tile order, row-major cell order, capacity 2; data tiles bounded by MBR1, MBR2, MBR3, MBR4
R-tree (stored in fragment metadata): MBR1 MBR2 MBR3 MBR4
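Both tile-pruning strategies can be sketched briefly. The dense case uses only arithmetic on the domain start and tile extents; the sparse case checks each data tile's minimum bounding rectangle (MBR). For simplicity the MBRs are scanned as a flat list, which is a simplification of a hierarchical R-tree, and all names and coordinates are hypothetical.

```python
def dense_tiles_for_slice(slice_ranges, domain_start, tile_extents, tiles_per_dim):
    """Return 0-based tile positions (row-major tile order) overlapping a 2D slice."""
    (r_lo, r_hi), (c_lo, c_hi) = slice_ranges
    r0, c0 = domain_start
    t_rows, t_cols = tile_extents
    _, n_tc = tiles_per_dim
    tile_ids = []
    for tr in range((r_lo - r0) // t_rows, (r_hi - r0) // t_rows + 1):
        for tc in range((c_lo - c0) // t_cols, (c_hi - c0) // t_cols + 1):
            tile_ids.append(tr * n_tc + tc)  # position of the tile on storage
    return tile_ids


def mbr(coords):
    """Minimum bounding rectangle of the coordinates in one data tile."""
    return tuple((min(c[d] for c in coords), max(c[d] for c in coords))
                 for d in range(len(coords[0])))


def overlaps(a, b):
    return all(lo1 <= hi2 and lo2 <= hi1 for (lo1, hi1), (lo2, hi2) in zip(a, b))


def sparse_tiles_for_slice(data_tiles, slice_ranges):
    """Return indices of data tiles whose MBR intersects the slice."""
    return [i for i, tile in enumerate(data_tiles) if overlaps(mbr(tile), slice_ranges)]


# Dense: 4x4 domain starting at (1, 1), 2x2 space tiles, slice rows 2-3, cols 3-4.
dense_hits = dense_tiles_for_slice(((2, 3), (3, 4)), (1, 1), (2, 2), (2, 2))

# Sparse: three hypothetical data tiles of capacity 2 (0-based coordinates).
tiles = [[(0, 0), (0, 1)], [(2, 0), (2, 1)], [(1, 3), (3, 2)]]
sparse_hits = sparse_tiles_for_slice(tiles, ((1, 2), (1, 2)))
print(dense_hits, sparse_hits)
```

In the dense case the result is positions 1 and 3, that is, the second and fourth tile. In the sparse case the third data tile is reported even though none of its cells lies in the slice: an MBR check can yield false positives, which are filtered out once the tile is actually read.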
Consolidation & Vacuuming
Numerous fragments can lead to performance degradation (loss of locality, expensive listing)
TileDB supports two levels of consolidation
Fragment metadata (group the non-empty domains in a single place)
Fragments (better preserve data locality)
Old fragments are preserved after consolidation (for time traveling)
TileDB can vacuum old fragments to save space and boost listing
Time traveling will not work on vacuumed fragments
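A toy model of fragment consolidation and vacuuming, under heavy simplifying assumptions: a fragment is a dictionary with a timestamp range and its cells, consolidation merges everything into one new fragment spanning the whole range, and vacuuming drops fragments whose range is covered by a wider one. The structure and function names are hypothetical.

```python
def consolidate(fragments):
    """Add one merged fragment covering all inputs; the originals are kept."""
    ordered = sorted(fragments, key=lambda f: f["end"])
    merged_cells = {}
    for fragment in ordered:
        merged_cells.update(fragment["cells"])  # newer writes win
    merged = {
        "start": min(f["start"] for f in fragments),
        "end": max(f["end"] for f in fragments),
        "cells": merged_cells,
    }
    return fragments + [merged]


def vacuum(fragments):
    """Drop fragments whose timestamp range is covered by a wider fragment."""
    def covered(f):
        return any(
            g["start"] <= f["start"] and f["end"] <= g["end"]
            and (g["start"], g["end"]) != (f["start"], f["end"])
            for g in fragments
        )
    return [f for f in fragments if not covered(f)]


f1 = {"start": 1, "end": 1, "cells": {(0, 0): 1, (0, 1): 2}}
f2 = {"start": 2, "end": 2, "cells": {(0, 0): 100}}

after_consolidation = consolidate([f1, f2])
after_vacuum = vacuum(after_consolidation)
print(len(after_consolidation), len(after_vacuum))
print(after_vacuum[0])
```

After consolidation all three fragments exist, so a read at timestamp 1 can still be answered from `f1`. After vacuuming only the merged fragment remains, which is why time traveling to the older state no longer works.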
Attribute Filter Push-Down
TileDB supports pushing attribute filter conditions down to the engine
That typically boosts performance
Much less data gets copied around
More L1-cache conscious
More opportunities for parallelism and vectorization
Schema Evolution
TileDB supports schema evolution (since v2.4)
Adding an attribute
Dropping an attribute
More schema evolution features are coming up
Full versioning and time traveling is supported
Notes on Writing
Lots of flexibility in writing in different orders, different domain subarrays, etc.
Support for lock-free, parallel writing
Tips for performance:
Each tile should be 100KB - 1MB
Each fragment should be 1 - 2GB
Fragments should not “interleave”
Run fragment metadata consolidation (especially on cloud object stores)
No support for deletions and updates yet (coming up soon)
Notes on Reading
TileDB is eventually consistent
Support for parallel writers, parallel readers (all lock-free)
Support for reads in different layouts
Support for “streaming reads” (incomplete queries)
Tips for performance:
Allocate sufficient space for the result buffers (minimize incomplete queries)
Tune written layout based on the read layout (application dependent)
Push down coordinate and attribute filter conditions
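The idea behind "streaming reads" (incomplete queries) can be sketched as a loop that fills a fixed-size result buffer and reports whether more results remain. The status strings and the function name below are illustrative assumptions, not the TileDB API.

```python
def read_in_batches(cells, buffer_capacity):
    """Yield (batch, status) pairs; status says whether results remain."""
    if buffer_capacity <= 0:
        raise ValueError("buffer_capacity must be positive")
    for start in range(0, len(cells), buffer_capacity):
        batch = cells[start:start + buffer_capacity]
        done = start + buffer_capacity >= len(cells)
        yield batch, ("COMPLETE" if done else "INCOMPLETE")


result_cells = list(range(1, 17))  # the 16 cells of a 4x4 slice
batches = list(read_in_batches(result_cells, buffer_capacity=6))
for batch, status in batches:
    print(status, batch)
```

A buffer of 6 cells needs three submissions to return 16 results, while a buffer of 16 or more needs one, which is why allocating sufficient buffer space reduces the number of incomplete queries.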
Work In Progress
Coming Up
More schema evolution features
Support for deletes and updates
Git-like versioning
ACID via modularizing locking
More tile filters (e.g., sum, min, max)
RLE and dictionary compression on strings
Computations on compressed data
Linear Algebra operations
More SQL push down (e.g., group by)
Graph algorithms
TileDB vs. Others
High-level Comparisons
vs. HDF5
TileDB is cloud-native
TileDB has support for sparse arrays
vs. Zarr
TileDB is built in C++ and is more interoperable
TileDB has support for sparse arrays
TileDB has support for versioning and time traveling
TileDB has support for versioning and time traveling
High-level Comparisons
vs. Parquet
TileDB is multi-dimensional and supports more flexible layouts
TileDB has support for dense arrays
vs. Delta Lake
TileDB does not rely on Spark, Presto or other subsystems
TileDB has support for dense arrays
TileDB has support for versioning and time traveling
TileDB does not support deletes, updates and full ACID (yet)
TileDB is natively multi-dimensional and supports more flexible layouts
The Universal Database
Thank you
WE ARE HIRING
Apply at tiledb.workable.com

 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...Best best suvichar in gujarati english meaning of this sentence as Silk road ...
Best best suvichar in gujarati english meaning of this sentence as Silk road ...
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
The Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series DatabaseThe Building Blocks of QuestDB, a Time Series Database
The Building Blocks of QuestDB, a Time Series Database
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
一比一原版(Bradford毕业证书)布拉德福德大学毕业证如何办理
 

The TileDB Embedded Storage Engine

  • 1. TileDB webinars: The TileDB Embedded Storage Engine. Dr. Stavros Papadopoulos, Founder & CEO of TileDB, Inc.
  • 2. Who is this webinar for? Those wanting to learn about data storage fundamentals (layout, compression, IO, etc.). Those looking to efficiently store/access any kind of data to/from anywhere (dataframes, genomics, LiDAR, SAR, weather, and more, with a single engine). Those tired of managing custom, inefficient data formats (formats not supporting fast updates, indexing, versioning, cloud performance).
  • 3. Disclaimer: I am the exclusive recipient of complaints (email me at stavros@tiledb.com). All the credit for our amazing work goes to our powerful team (check it out at https://tiledb.com/about).
  • 4. Who we are: Deep roots at the intersection of HPC, databases and data science. Traction with telecoms, pharmas, hospitals and other scientific organizations. 40 members with expertise across all applications and domains. Where it all started: TileDB was spun out of MIT and Intel Labs in 2017. Investors: raised over $20M, we are very well capitalized.
  • 5. What is TileDB Embedded? An embeddable C++ library that stores and accesses multi-dimensional arrays (dense arrays and sparse arrays). It implements very fast array slicing across dimensions.
  • 6. TileDB Embedded at a Glance (open source: https://github.com/TileDB-Inc/TileDB). Superior performance: built in C++, fully parallelized, columnar format, multiple compressors, R-trees for sparse arrays. Rapid updates & data versioning: immutable writes, lock-free, parallel reader / writer model, time traveling, schema evolution.
  • 7. TileDB Embedded at a Glance (open source: https://github.com/TileDB-Inc/TileDB). Extreme interoperability: numerous APIs, numerous integrations, all backends. Optimized for the cloud: immutable writes, parallel IO, minimization of requests.
  • 8. TileDB Embedded at a Glance: APIs & tool integrations, with zero-copy where possible. TileDB Embedded is open-source interoperable storage with a universal open-spec array format: ● Parallel IO, rapid reads & writes ● Columnar, cloud-optimized ● Data versioning & time traveling
  • 9. Agenda: Why arrays? The basics. Advanced internal mechanics. Examples. Work in progress. Comparison to other formats and engines. Docs at docs.tiledb.com
  • 10. Why Arrays? Regardless of what kind of data you have, it is laid out in a 1D storage medium (byte 0, 1, ...). Regardless of what kind of algorithm you run, the algorithm involves a set of slices (the algorithm is a task graph, where each task may slice).
  • 11. Why Arrays? Performance is absolutely dictated by the slice result locality on the 1D medium.
  • 12. Why Arrays? Arrays provide a flexible way to map/slice any-dimensional (ND) data to/from a 1D layout: giving different “importance” to different dimensions (order and tiling); choosing whether dimension coordinates should be materialized or not (dense vs. sparse); considering compression, encryption and other filters (tiling); building indices for fast search (e.g., R-trees); abstracting all the engineering magic that it takes to make everything very fast (engine); unifying the data model for all application domains! (universality)
  • 13. Arrays Subsume Dataframes (figure labels: sparse array, dataframe, dense vector)
  • 14. Arrays Are Universal. What else can be modeled as an array? LiDAR (3D sparse); SAR (2D or 3D dense); population genomics (3D sparse); single-cell genomics (2D dense or sparse); biomedical imaging (2D or 3D dense); time series (ND dense or sparse); weather (2D or 3D dense); graphs (2D sparse); video (3D dense); key-values (1D or ND sparse); even flat files!!! (1D dense)
  • 16. A Simple 2D Dense Array. Cell values in the example: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16. On-disk layout (fragment: __t2_t2_uuid1_v; attribute data: a0.tdb; schema: __schema):
        dense_array1
        ├── __t2_t2_uuid1_v
        │   ├── __fragment_metadata.tdb
        │   └── a0.tdb
        ├── __t2_t2_uuid1.ok
        ├── __lock.tdb
        ├── __meta
        └── __schema
            └── __t1_t1_uuid2
  • 17. A Simple 2D Sparse Array. Non-empty cell values in the example: 1 2 3 4 5 6. On-disk layout (fragment: __t2_t2_uuid2_v; attribute data: a0.tdb; coordinates: d0.tdb and d1.tdb; schema: __schema):
        sparse_array1
        ├── __t2_t2_uuid2_v
        │   ├── __fragment_metadata.tdb
        │   ├── a0.tdb
        │   ├── d0.tdb
        │   └── d1.tdb
        ├── __t2_t2_uuid2_v.ok
        ├── __lock.tdb
        ├── __meta
        └── __schema
            └── __t1_t2_uuid1
  • 18. Groups. Groups provide an easy way to hierarchically organize arrays:
        dense_group
        ├── __tiledb_group.tdb
        └── nested_group
            ├── __tiledb_group.tdb
            └── dense_array1
                ├── __lock.tdb
                ├── __meta
                └── __schema
  • 19. Array Metadata. You can attach any number of (key, value) pairs to an array. The key must be a string, and the value can be anything. The metadata go under __meta:
        dense_array1
        ├── __t2_t2_uuid2_v
        │   ├── __fragment_metadata.tdb
        │   └── a0.tdb
        ├── __t2_t2_uuid2_v.ok
        ├── __lock.tdb
        ├── __meta
        │   └── __t3_t3_uuid3
        └── __schema
            └── __t1_t2_uuid1
  • 20. Multiple Attributes. You can store more than one value in each cell, even of different types. TileDB has a “columnar” format that allows you to efficiently subselect on attributes. Cell values in the example: 1,a 2,b 3,c 4,d 5,e 6,f 7,g 8,h 9,i 10,j 11,k 12,l 13,m 14,n 15,o 16,p. On-disk layout (attribute data: a0.tdb and a1.tdb):
        dense_array1
        ├── __t2_t2_uuid1_v
        │   ├── __fragment_metadata.tdb
        │   ├── a0.tdb
        │   └── a1.tdb
        ├── __t2_t2_uuid1.ok
        ├── __lock.tdb
        ├── __meta
        └── __schema
            └── __t1_t1_uuid2
  • 21. Var-length Attributes. TileDB supports storing variable-length values in a cell (of any data type). Cell values in the example: a bb ccc dddd e ff ggg hhhh i jj kkk lll m nn ooo pppp. On-disk layout (offsets: a0.tdb; var-length data: a0_var.tdb):
        dense_array3
        ├── __t2_t2_uuid1_v
        │   ├── __fragment_metadata.tdb
        │   ├── a0.tdb
        │   └── a0_var.tdb
        ├── __t2_t2_uuid1.ok
        ├── __lock.tdb
        ├── __meta
        └── __schema
            └── __t1_t1_uuid2
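The offsets/data split on this slide can be illustrated with a minimal sketch. It assumes that each offset is the starting byte of its cell inside one contiguous data buffer; the helper names `pack_var_cells` and `unpack_var_cell` are hypothetical and are not part of the TileDB API.

```python
def pack_var_cells(values):
    """Pack strings into (offsets, data): offsets[i] is the starting byte of cell i."""
    offsets, data = [], bytearray()
    for value in values:
        offsets.append(len(data))      # where this cell starts in the data buffer
        data += value.encode("utf-8")  # append the cell's bytes back to back
    return offsets, bytes(data)


def unpack_var_cell(offsets, data, i):
    """Recover cell i: it ends where the next cell starts (or at the end of the buffer)."""
    start = offsets[i]
    end = offsets[i + 1] if i + 1 < len(offsets) else len(data)
    return data[start:end].decode("utf-8")


# First four cells of the slide's example.
offsets, data = pack_var_cells(["a", "bb", "ccc", "dddd"])
```

In this sketch the offsets play the role of the fixed-size file (a0.tdb on the slide) and the byte buffer plays the role of the var-length file (a0_var.tdb).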
  • 22. Var-length Dimensions. You can also have var-length dimensions and slice naturally using string ranges. Applicable only to sparse arrays. The example has an unbounded domain with infinite gaps, string coordinates a bb ccc dddd e ff, and cell values 1 2 3 4 5 6. On-disk layout (offsets: d0.tdb; var-length data: d0_var.tdb):
        sparse_array4
        ├── __t2_t2_uuid1_v
        │   ├── __fragment_metadata.tdb
        │   ├── a0.tdb
        │   ├── d0.tdb
        │   └── d0_var.tdb
        ├── __t2_t2_uuid1.ok
        ├── __lock.tdb
        ├── __meta
        └── __schema
            └── __t1_t1_uuid2
  • 23. Heterogeneous Dimensions. Sparse arrays allow you to have dimensions of different types. The following 2D array allows efficient slicing on a string and a float32 dimension. (Figure: an infinite string dimension and an infinite float32 dimension; labels 4, 1.0, 0.0, “dddd”, 0.4.)
  • 24. Arrays as Dataframes An array is essentially a dataframe where dimensions are special (they are “indexed”)
  • 25. What About Cloud Object Stores? Everything demonstrated works as is on the cloud: array_name → {s3,azure,gcs,tiledb}://path/array_name
  • 27. Tiling | Dense Arrays. A space tile is the atomic unit of IO. Depending on the space tile extents, a slice either fetches the whole array from storage or fetches only a portion of the array (a tile). (Figure: two grids of cells 1 to 16 with different space tile extents.)
  • 28. Cell Layout | Dense Arrays. Three parameters define the layout of values on storage, called the global order: space tile extents; tile order/layout (row-major or column-major); cell order/layout (row-major or column-major). Figure examples: (1) row-major tile order, row-major cell order, 2x2 space tiles; (2) col-major tile order, row-major cell order, 2x2 space tiles; (3) row-major tile order, col-major cell order, 4x2 space tiles.
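A small sketch of how the three parameters combine into a global order. It assumes a 4x4 array holding the values 1 to 16 in row-major reading order and tile extents that divide the domain evenly; `global_order` is a hypothetical helper written for illustration, not a TileDB function.

```python
def global_order(shape, tile, tile_order="row", cell_order="row"):
    """List (row, col) coordinates in the order they would be laid out on storage."""
    rows, cols = shape
    t_rows, t_cols = tile
    # Assumption: the tile extents divide the domain evenly.
    n_tile_rows, n_tile_cols = rows // t_rows, cols // t_cols

    def walk(n_r, n_c, order):
        # Enumerate an n_r x n_c grid in row-major or column-major order.
        if order == "row":
            return [(r, c) for r in range(n_r) for c in range(n_c)]
        return [(r, c) for c in range(n_c) for r in range(n_r)]

    cells = []
    for tr, tc in walk(n_tile_rows, n_tile_cols, tile_order):  # visit the tiles
        for r, c in walk(t_rows, t_cols, cell_order):          # then cells in a tile
            cells.append((tr * t_rows + r, tc * t_cols + c))
    return cells


grid = [[4 * r + c + 1 for c in range(4)] for r in range(4)]  # values 1..16


def layout(**params):
    return [grid[r][c] for r, c in global_order((4, 4), **params)]


row_row = layout(tile=(2, 2), tile_order="row", cell_order="row")
col_row = layout(tile=(2, 2), tile_order="col", cell_order="row")
```

With these assumptions `row_row` starts with 1, 2, 5, 6 (the first tile), and changing only the tile order moves whole tiles around without changing the order of cells inside each tile.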
  • 29. Tiling & Cell Layout | Sparse Arrays. Sparse arrays store only non-empty cells. Grouping non-empty cells with space tiles would be inefficient (due to potential skew). The atomic unit of IO in sparse arrays is the data tile, of fixed (user-defined) capacity. First impose a global order similar to dense arrays, then group based on capacity. Figure examples: (1) col-major tile order, row-major cell order, 2x2 space tiles, capacity 2; (2) col-major tile order, row-major cell order, 2x2 space tiles, capacity 4.
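The "sort by global order, then group by capacity" step can be sketched as follows. The coordinates are made up for illustration (the slide's exact cell positions are not recoverable from the transcript), and `sort_key`, `data_tiles` and `mbr` are hypothetical helpers rather than TileDB calls.

```python
def sort_key(coord, tile=(2, 2), tile_order="col", cell_order="row"):
    """Global-order key: position of the space tile first, then position inside it."""
    r, c = coord
    tr, tc = r // tile[0], c // tile[1]  # which space tile the cell falls in
    ir, ic = r % tile[0], c % tile[1]    # where the cell sits inside that tile
    tile_key = (tc, tr) if tile_order == "col" else (tr, tc)
    cell_key = (ic, ir) if cell_order == "col" else (ir, ic)
    return tile_key + cell_key


def data_tiles(coords, capacity, **params):
    """Sort the non-empty cells in global order and cut them into tiles of `capacity` cells."""
    ordered = sorted(coords, key=lambda coord: sort_key(coord, **params))
    return [ordered[i:i + capacity] for i in range(0, len(ordered), capacity)]


def mbr(tile_coords):
    """Minimum bounding rectangle of one data tile, as ((row_lo, row_hi), (col_lo, col_hi))."""
    rows = [r for r, _ in tile_coords]
    cols = [c for _, c in tile_coords]
    return (min(rows), max(rows)), (min(cols), max(cols))


# Hypothetical non-empty cells of a 4x4 sparse array.
coords = [(0, 0), (0, 1), (1, 3), (2, 1), (3, 2), (3, 3)]
tiles_cap2 = data_tiles(coords, capacity=2)
tiles_cap4 = data_tiles(coords, capacity=4)
```

A larger capacity gives fewer, larger data tiles, and their bounding rectangles grow accordingly, which is the trade-off the two figure examples contrast.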
  • 30. Hilbert Order | Sparse Arrays. Space tiles greatly affect the cell layout in sparse arrays. Sometimes it is very difficult to define a good space tiling (especially with floats and strings). For such cases, the Hilbert order is ideal (no tile extents and order). For floats, we discretize the domain into buckets based on the number of dimensions. For strings, we assign a number of bits per dimension and then use the string prefixes as numbers.
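To make the idea concrete, here is a sketch of the textbook 2D Hilbert-curve index together with a simple bucketing of float coordinates. It is not TileDB's implementation: the bucketing rule, the bucket count and the helper names `hilbert_index` and `discretize` are assumptions chosen for illustration.

```python
def hilbert_index(n, x, y):
    """Distance of cell (x, y) along the Hilbert curve of an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:  # rotate/flip the quadrant so the curve stays continuous
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def discretize(value, lo, hi, n_buckets):
    """Map a float in [lo, hi] to a bucket id in [0, n_buckets - 1] (assumed rule)."""
    bucket = int((value - lo) / (hi - lo) * n_buckets)
    return min(bucket, n_buckets - 1)


# Order hypothetical float coordinates in [0, 1] x [0, 1] along the curve.
points = [(0.9, 0.1), (0.1, 0.1), (0.4, 0.8), (0.6, 0.6)]
ordered = sorted(
    points,
    key=lambda p: hilbert_index(4, discretize(p[0], 0.0, 1.0, 4), discretize(p[1], 0.0, 1.0, 4)),
)
```

The property that matters for locality is that consecutive positions on the curve are always neighboring cells, so nearby coordinates tend to land in the same data tile.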
  • 31. Tile Filters. TileDB allows a wide range of filters to be applied to each tile prior to its storage: compressors (gzip, zstd, bzip2, …), checksums, encryption. The atomic unit of filtering is the chunk (typically equal to the L1 cache size). TileDB applies the filters across chunks in parallel in a pipeline (figure: a tile split into chunks, each passing through zstd and then AES256).
  • 33. Versioning and Time Traveling. In TileDB, every write is immutable. Each (batch) write creates a timestamped fragment. With fragments, TileDB implements versioning and time traveling.
  • 34. Versioning and Time Traveling | Dense Arrays
        Write at t1: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
        Write at t2: 100 200 500 600 (overwriting cells 1, 2, 5 and 6)
        Read at [0,t1]: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
        Read at [0,t2]: 100 200 3 4 500 600 7 8 9 10 11 12 13 14 15 16
        Read at (t1,t2]: 100 200 - - 500 600 - - - - - - - - - -
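The reads on this slide can be reproduced with a toy model in which each write is an immutable fragment and a read overlays, oldest first, the fragments whose timestamp falls in the requested interval. This is only a sketch: `read_dense` is a hypothetical helper, timestamps are plain integers, and the positions of the t2 write are inferred from which values disappear in the slide's reads.

```python
def read_dense(fragments, start, end, n_cells=16, empty="-"):
    """Overlay all fragments with start <= timestamp <= end, oldest first."""
    result = [empty] * n_cells
    for timestamp, cells in sorted(fragments, key=lambda frag: frag[0]):
        if start <= timestamp <= end:
            for position, value in cells.items():
                result[position] = value  # newer fragments overwrite older values
    return result


T1, T2 = 1, 2
fragments = [
    (T1, {i: i + 1 for i in range(16)}),     # write at t1: the full array, values 1..16
    (T2, {0: 100, 1: 200, 4: 500, 5: 600}),  # write at t2: four cells updated
]

at_t1 = read_dense(fragments, 0, T1)         # array as of t1
at_t2 = read_dense(fragments, 0, T2)         # array as of t2
only_t2 = read_dense(fragments, T1 + 1, T2)  # only what was written after t1
```

Because fragments are never modified, reading an older version just means skipping the newer fragments; nothing has to be copied or rolled back.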
  • 35. Versioning and Time Traveling | Sparse Arrays (when no dups are allowed)
        Write at t1: 1 2 3 4 5 6
        Write at t2: 100 (a new cell) and 40 (the cell holding 4)
        Read at [0,t1]: 1 2 3 4 5 6
        Read at [0,t2]: 1 100 2 3 40 5 6
        Read at (t1,t2]: 100 40
  • 36. Versioning and Time Traveling | Sparse Arrays (when dups are allowed)
        Write at t1: 1 2 3 4 5 6
        Write at t2: 100 (a new cell) and 40 (the cell holding 4)
        Read at [0,t1]: 1 2 3 4 5 6
        Read at [0,t2]: 1 100 2 3 4 40 5 6 (4 and 40 are dups)
        Read at (t1,t2]: 100 40
  • 37. Indexing. TileDB has a three-level indexing approach: fragment timestamps (in the fragment names) for time traveling; non-empty domain in each fragment’s metadata; either simple offset arithmetic (dense) or R-trees (sparse). Algorithm: 1. Get the list of fragment names (with .ok), e.g., t1_t1_uuid1_v, t2_t2_uuid2_v, ... 2. Ignore fragments with timestamp not in the time traveling interval. 3. Ignore fragments with non-empty domain not overlapping the slice (read from each __fragment_metadata.tdb). 4a. Ignore dense tiles via implicit positional indexing, or 4b. Ignore sparse tiles from the R-tree that do not overlap the slice.
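Steps 1 to 3 of the algorithm amount to cheap pruning before any tile is touched. The sketch below models them in memory; the `Fragment` class, the integer timestamps in the names, and the rule that a fragment must lie fully inside the time interval are assumptions made for illustration, not TileDB internals.

```python
from dataclasses import dataclass
from typing import Tuple

Range = Tuple[int, int]


@dataclass
class Fragment:
    name: str                              # "__<t_start>_<t_end>_<uuid>_<version>"
    non_empty_domain: Tuple[Range, Range]  # inclusive (rows, cols) ranges
    committed: bool = True                 # True when the matching ".ok" file exists


def timestamps(name):
    """Parse the two timestamps encoded at the start of a fragment name."""
    parts = name.lstrip("_").split("_")
    return int(parts[0]), int(parts[1])


def overlaps(a, b):
    """True if two hyper-rectangles with inclusive ranges intersect in every dimension."""
    return all(lo1 <= hi2 and lo2 <= hi1 for (lo1, hi1), (lo2, hi2) in zip(a, b))


def relevant_fragments(fragments, t_start, t_end, query):
    keep = []
    for frag in fragments:
        if not frag.committed:                          # step 1: only committed fragments
            continue
        f_start, f_end = timestamps(frag.name)
        if f_start < t_start or f_end > t_end:          # step 2: time traveling interval
            continue
        if not overlaps(frag.non_empty_domain, query):  # step 3: non-empty domain
            continue
        keep.append(frag.name)
    return keep


fragments = [
    Fragment("__1_1_uuid1_v", ((0, 3), (0, 3))),
    Fragment("__2_2_uuid2_v", ((0, 1), (0, 1))),
    Fragment("__3_3_uuid3_v", ((0, 3), (0, 3)), committed=False),
]
```

Only the fragments that survive all three checks need their tiles inspected in step 4, which is why keeping the fragment count and the metadata listing small matters on object stores.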
  • 38. Indexing. Dense arrays: given the non-empty domain, the space tile extents and the tile order, we can easily find that this slice overlaps the second and fourth tile (figure: cells 1 to 16, row-major tile order, 2x2 space tiles). Sparse arrays: the R-tree (stored in fragment metadata) indexes MBR1, MBR2, MBR3 and MBR4 (figure: col-major tile order, row-major cell order, 2x2 space tiles, capacity 2). A slicing query would just traverse the tree top-down, visiting only nodes/MBRs that intersect the slice.
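The top-down traversal can be sketched with a tiny hand-built tree. The MBR values below are invented for illustration (the slide's figure coordinates are not recoverable), and `Node`, `intersects` and `search` are hypothetical names, not TileDB's R-tree code.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Range = Tuple[int, int]
MBR = Tuple[Range, Range]  # inclusive (rows, cols) ranges


@dataclass
class Node:
    mbr: MBR
    children: List["Node"] = field(default_factory=list)
    tile_id: Optional[int] = None  # set on leaves: the data tile this MBR bounds


def intersects(a, b):
    return all(lo1 <= hi2 and lo2 <= hi1 for (lo1, hi1), (lo2, hi2) in zip(a, b))


def search(node, query, hits=None):
    """Collect the data tiles whose MBR intersects the query, pruning whole subtrees."""
    if hits is None:
        hits = []
    if not intersects(node.mbr, query):
        return hits  # nothing below this node can overlap the slice
    if node.tile_id is not None:
        hits.append(node.tile_id)
    for child in node.children:
        search(child, query, hits)
    return hits


# Four leaf MBRs grouped under two internal nodes and one root.
leaves = [
    Node(((0, 0), (0, 1)), tile_id=1),
    Node(((1, 2), (1, 3)), tile_id=2),
    Node(((3, 3), (2, 3)), tile_id=3),
    Node(((2, 3), (0, 0)), tile_id=4),
]
root = Node(((0, 3), (0, 3)), children=[
    Node(((0, 2), (0, 3)), children=leaves[:2]),
    Node(((2, 3), (0, 3)), children=leaves[2:]),
])
```

For example, `search(root, ((0, 1), (0, 1)))` descends only into the first internal node, so the data tiles under the second one are never read.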
  • 39. Consolidation & Vacuuming. Numerous fragments can lead to performance degradation (loss of locality, expensive listing). TileDB supports two levels of consolidation: fragment metadata (group the non-empty domains in a single place) and fragments (better preserve data locality). Old fragments are preserved after consolidation (for time traveling). TileDB can vacuum old fragments to save space and boost listing. Time traveling will not work on vacuumed fragments.
  • 40. Attribute Filter Push-Down. TileDB supports pushing attribute filter conditions down to the engine. That typically boosts performance: much less data gets copied around; more L1-cache conscious; more opportunities for parallelism and vectorization.
  • 41. Schema Evolution. TileDB supports schema evolution (since v2.4): adding an attribute; dropping an attribute. More schema evolution features are coming up. Full versioning and time traveling is supported.
  • 42. Notes on Writing. Lots of flexibility in writing in different orders, different domain subarrays, etc. Support for lock-free, parallel writing. Tips for performance: each tile should be 100KB to 1MB; each fragment should be 1 to 2GB; fragments should not “interleave”; run fragment metadata consolidation (especially on cloud object stores). No support for deletions and updates yet (coming up soon).
  • 43. Notes on Reading. TileDB is eventually consistent. Support for parallel writers, parallel readers (all lock-free). Support for reads in different layouts. Support for “streaming reads” (incomplete queries). Tips for performance: allocate sufficient space for the result buffers (minimize incomplete queries); tune the written layout based on the read layout (application dependent); push down coordinate and attribute filter conditions.
  • 45. Coming Up: more schema evolution features; support for deletes and updates; Git-like versioning; ACID via modularizing locking; more tile filters (e.g., sum, min, max); RLE and dictionary compression on strings; computations on compressed data; linear algebra operations; more SQL push down (e.g., group by); graph algorithms.
  • 47. High-level Comparisons. vs. HDF5: TileDB is cloud-native; TileDB has support for sparse arrays; TileDB has support for versioning and time traveling. vs. Zarr: TileDB is built in C++ and is more interoperable; TileDB has support for sparse arrays; TileDB has support for versioning and time traveling.
  • 48. High-level Comparisons. vs. Parquet: TileDB is multi-dimensional and supports more flexible layouts; TileDB has support for dense arrays; TileDB has support for versioning and time traveling. vs. Delta Lake: TileDB does not rely on Spark, Presto or other subsystem; TileDB has support for dense arrays; TileDB is natively multi-dimensional and supports more flexible layouts; TileDB does not support deletes, updates and full ACID (yet).
  • 49. The Universal Database. Thank you. We are hiring: apply at tiledb.workable.com