The TileDB Embedded Storage Engine
1. TileDB webinars
The TileDB Embedded Storage Engine
Dr. Stavros Papadopoulos, Founder & CEO of TileDB, Inc.
2. Who is this webinar for?
Those wanting to learn about data storage fundamentals
Layout, compression, IO, etc.
Those looking to efficiently store/access any kind of data to/from anywhere
Dataframes, genomics, LiDAR, SAR, weather, and more, with a single engine
Those tired of managing custom, inefficient data formats
Formats not supporting fast updates, indexing, versioning, cloud performance
3. Disclaimer
I am the exclusive recipient of complaints
Email me at: stavros@tiledb.com
All the credit for our amazing work goes to our powerful team
Check it out at https://tiledb.com/about
4. Who we are
WHERE IT ALL STARTED
TileDB was spun out of MIT and Intel Labs in 2017
Deep roots at the intersection of HPC, databases, and data science
40 members with expertise across all applications and domains
Traction with telecoms, pharmas, hospitals, and other scientific organizations
INVESTORS
Having raised over $20M, we are very well capitalized
5. What is TileDB Embedded?
An embeddable C library that stores and accesses multi-dimensional arrays
(figure: a dense array and a sparse array)
It implements very fast array slicing across dimensions
6. TileDB Embedded at a Glance
Open source: https://github.com/TileDB-Inc/TileDB
Superior performance
Built in C
Fully parallelized
Columnar format
Multiple compressors
R-trees for sparse arrays
Rapid updates & data versioning
Immutable writes
Lock-free parallel reader/writer model
Time traveling
Schema evolution
7. TileDB Embedded at a Glance
Open source: https://github.com/TileDB-Inc/TileDB
Extreme interoperability
Numerous APIs
Numerous integrations
All backends
Optimized for the cloud
Immutable writes
Parallel IO
Minimization of requests
8. TileDB Embedded at a Glance
APIs & tool integrations with zero-copy where possible
TileDB Embedded: open-source, interoperable storage with a universal open-spec array format
● Parallel IO, rapid reads & writes
● Columnar, cloud-optimized
● Data versioning & time traveling
9. Agenda
Why arrays?
The basics
Advanced internal mechanics
Examples
Work in progress
Comparison to other formats and engines
Docs at docs.tiledb.com
10. Why Arrays?
Regardless of what kind of data you have, it is laid out on a 1D storage medium (byte 0, byte 1, ...)
Regardless of what kind of algorithm you run, the algorithm involves a set of slices (an algorithm is a task graph where each task may slice the data)
11. Why Arrays?
Performance is absolutely dictated by the locality of the slice results on the 1D medium
12. Why Arrays?
Arrays provide a flexible way to map/slice any-dimensional (ND) data to/from a 1D layout
Giving different “importance” to different dimensions (order and tiling)
Choosing whether dimension coordinates should be materialized or not (dense vs. sparse)
Considering compression, encryption and other filters (tiling)
Abstracting all the engineering magic that it takes to make everything very fast (engine)
Unifying the data model for all application domains! (universality)
Building indices for fast search (e.g., R-trees)
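The locality point above can be illustrated with a minimal Python sketch (a conceptual model, not TileDB code): in a row-major 1D layout, a row slice is one contiguous range, while a column slice scatters across the medium.

```python
# A 2D array stored row-major on a 1D medium: a row slice is one
# contiguous range, while a column slice touches many scattered offsets.
rows, cols = 4, 4

def offset(i, j):
    return i * cols + j                       # row-major ND -> 1D mapping

row_slice = [offset(1, j) for j in range(cols)]   # contiguous: [4, 5, 6, 7]
col_slice = [offset(i, 1) for i in range(rows)]   # scattered: [1, 5, 9, 13]
print(row_slice, col_slice)
```

Giving a different "importance" to the dimensions (e.g., column-major order, or tiling) changes which slices end up contiguous.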
14. Arrays Are Universal
What else can be modeled as an array?
LiDAR (3D sparse)
SAR (2D or 3D dense)
Population genomics (3D sparse)
Single-cell genomics (2D dense or sparse)
Biomedical imaging (2D or 3D dense)
Even flat files! (1D dense)
Time series (ND dense or sparse)
Weather (2D or 3D dense)
Graphs (2D sparse)
Video (3D dense)
Key-values (1D or ND sparse)
19. Array Metadata
You can attach any number of (key, value) pairs to an array
The key must be a string; the value can be anything
dense_array1
├── __t2_t2_uuid2_v
│   ├── __fragment_metadata.tdb
│   └── a0.tdb
├── __t2_t2_uuid2_v.ok
├── __lock.tdb
├── __meta
│   └── __t3_t3_uuid3        ← array metadata goes here
└── __schema
    └── __t1_t2_uuid1
20. Multiple Attributes
You can store more than one value in each cell, even of different types
TileDB has a “columnar” format that allows you to efficiently subselect on attributes
dense_array1
├── __t2_t2_uuid1_v
│   ├── __fragment_metadata.tdb
│   ├── a0.tdb               ← attribute data
│   └── a1.tdb               ← attribute data
├── __t2_t2_uuid1.ok
├── __lock.tdb
├── __meta
└── __schema
    └── __t1_t1_uuid2
Example cell values (two attributes per cell):
1,a   2,b   3,c   4,d
5,e   6,f   7,g   8,h
9,i   10,j  11,k  12,l
13,m  14,n  15,o  16,p
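The columnar idea can be sketched in a few lines of Python (a conceptual model; in the real format each attribute lives in its own .tdb file):

```python
# Columnar storage sketch: split cells column-wise, one buffer per
# attribute, so selecting one attribute never touches the others.
cells = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]

# "Write": one buffer per attribute (would become a0.tdb and a1.tdb).
a0 = [v for v, _ in cells]
a1 = [s for _, s in cells]

# "Read" of attribute a1 only: the a0 buffer is never loaded.
print(a1)   # ['a', 'b', 'c', 'd']
```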
21. Var-length Attributes
TileDB supports storing variable-length values in a cell (of any data type)
dense_array3
├── __t2_t2_uuid1_v
│   ├── __fragment_metadata.tdb
│   ├── a0.tdb               ← offsets
│   └── a0_var.tdb           ← var-length data
├── __t2_t2_uuid1.ok
├── __lock.tdb
├── __meta
└── __schema
    └── __t1_t1_uuid2
Example cell values:
a    bb   ccc  dddd
e    ff   ggg  hhhh
i    jj   kkk  llll
m    nn   ooo  pppp
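The two-buffer scheme can be sketched as follows (a conceptual model of the offsets and var-data buffers; the real on-disk encoding differs in details such as integer widths):

```python
# Var-length attribute sketch: values are concatenated into one buffer
# (a0_var.tdb) and a parallel offsets buffer (a0.tdb) marks where each
# cell's value starts.
values = ["a", "bb", "ccc", "dddd"]

offsets, var_data, pos = [], "", 0
for v in values:
    offsets.append(pos)          # start of this cell's value
    var_data += v
    pos += len(v)

def cell(k):
    """Read back cell k by slicing between consecutive offsets."""
    end = offsets[k + 1] if k + 1 < len(offsets) else len(var_data)
    return var_data[offsets[k]:end]

print(offsets)    # [0, 1, 3, 6]
print(cell(2))    # ccc
```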
22. Var-length Dimensions
You can also have var-length dimensions and slice naturally using string ranges
Applicable only to sparse arrays (string domains are unbounded, with potentially infinite gaps)
sparse_array4
├── __t2_t2_uuid1_v
│   ├── __fragment_metadata.tdb
│   ├── a0.tdb
│   ├── d0.tdb               ← offsets
│   └── d0_var.tdb           ← var-length data
├── __t2_t2_uuid1.ok
├── __lock.tdb
├── __meta
└── __schema
    └── __t1_t1_uuid2
Example string coordinates a, bb, ccc, dddd, e, ff with attribute values 1-6
27. Tiling | Dense Arrays
A space tile is the atomic unit of IO
Without tiling, a slice fetches the whole array from storage
With space tiles (defined by the space tile extents), a slice fetches only a portion of the array (the overlapping tiles)
(figure: a 4×4 array, untiled vs. partitioned into 2×2 space tiles)
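The pruning effect of tiling can be sketched in plain Python (a conceptual model, not TileDB code): only the space tiles a slice overlaps need to be fetched.

```python
def overlapping_tiles(slice_rows, slice_cols, tile_rows, tile_cols, tiles_per_row):
    """Row-major indexes of the space tiles a 2D slice overlaps.
    slice_rows/slice_cols are inclusive (start, end) cell ranges."""
    r0, r1 = slice_rows
    c0, c1 = slice_cols
    tiles = []
    for ti in range(r0 // tile_rows, r1 // tile_rows + 1):
        for tj in range(c0 // tile_cols, c1 // tile_cols + 1):
            tiles.append(ti * tiles_per_row + tj)
    return tiles

# 4x4 array, 2x2 tiles (2 tiles per row of tiles): the center slice
# touches all four tiles, but the top-right slice touches only tile 1.
print(overlapping_tiles((1, 2), (1, 2), 2, 2, 2))   # [0, 1, 2, 3]
print(overlapping_tiles((0, 1), (2, 3), 2, 2, 2))   # [1]
```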
28. Cell Layout | Dense Arrays
Three parameters define the layout of the values on storage, called the global order:
Space tile extents
Tile order/layout (row-major or column-major)
Cell order/layout (row-major or column-major)
Examples (figure):
row-major tile order, row-major cell order, 2×2 space tiles
col-major tile order, row-major cell order, 2×2 space tiles
row-major tile order, col-major cell order, 4×2 space tiles
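The interplay of the three parameters can be sketched in plain Python (a conceptual model of the global order, not TileDB's implementation):

```python
def global_position(i, j, extents, tile_order, cell_order, shape):
    """Position of cell (i, j) in the global order, given the space tile
    extents and the tile/cell orders ('row' or 'col')."""
    tr, tc = extents
    n_tile_rows, n_tile_cols = shape[0] // tr, shape[1] // tc
    ti, tj = i // tr, j // tc          # which tile the cell falls in
    ci, cj = i % tr, j % tc            # position within the tile
    tile_idx = ti * n_tile_cols + tj if tile_order == "row" else tj * n_tile_rows + ti
    cell_idx = ci * tc + cj if cell_order == "row" else cj * tr + ci
    return tile_idx * (tr * tc) + cell_idx

# 4x4 array, 2x2 tiles: the same cell lands at different positions
# depending on the tile order.
print(global_position(0, 2, (2, 2), "row", "row", (4, 4)))   # 4
print(global_position(0, 2, (2, 2), "col", "row", (4, 4)))   # 8
```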
29. Tiling & Cell Layout | Sparse Arrays
Sparse arrays store only the non-empty cells
Grouping non-empty cells with space tiles would be inefficient (due to potential skew)
The atomic unit of IO in sparse arrays is the data tile, of fixed (user-defined) capacity
First impose a global order similar to dense arrays, then group the cells into data tiles based on the capacity
Examples (figure): col-major tile order, row-major cell order, 2×2 space tiles, with capacity 2 vs. capacity 4
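The two-step procedure (order globally, then cut by capacity) can be sketched as follows (a conceptual model, not TileDB code; the sort key below assumes 2×2 space tiles with row-major tile and cell orders):

```python
# Sparse data tile sketch: sort the non-empty cells in the global order,
# then cut the sorted list into data tiles of fixed capacity.
nonempty = [(0, 1), (1, 0), (1, 3), (2, 2), (3, 0), (3, 3)]   # (row, col)

def global_key(cell):
    i, j = cell
    return (i // 2, j // 2, i, j)   # tile coords first, then cell coords

ordered = sorted(nonempty, key=global_key)
capacity = 2
data_tiles = [ordered[k:k + capacity] for k in range(0, len(ordered), capacity)]
print(data_tiles)   # three data tiles of two cells each
```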
30. Hilbert Order | Sparse Arrays
Space tiles greatly affect the cell layout in sparse arrays
Sometimes it is very difficult to define a good space tiling (especially with floats and strings)
For such cases, the Hilbert order is ideal (no tile extents or tile order needed)
For floats, we discretize the domain into buckets based on the number of dimensions
For strings, we assign a number of bits per dimension and then use the string prefixes as numbers
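For reference, the classic 2D Hilbert-index computation looks as follows (the standard bit-manipulation algorithm, shown as a plain Python sketch; TileDB's internal implementation may differ):

```python
def hilbert_index(n, x, y):
    """Map point (x, y) on an n x n grid (n a power of two) to its
    position along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate the quadrant so the next level sees a canonical orientation.
        if ry == 0:
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

# Consecutive curve positions stay spatially close, which is what makes
# the order useful for grouping sparse cells without explicit space tiles.
print([hilbert_index(4, x, y) for x, y in [(0, 0), (1, 0), (1, 1), (0, 1)]])
```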
31. Tile Filters
TileDB allows a wide range of filters to be applied to each tile prior to its storage:
Compressors (gzip, zstd, bzip2, …)
Checksums
Encryption
The atomic unit of filtering is the chunk (typically equal to the L1 cache size)
TileDB applies the filters to the chunks in parallel, in a pipeline
(figure: a tile is split into chunks, each passing through e.g. zstd and then AES-256)
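A rough illustration of the chunked, parallel filtering idea (using zlib as a stand-in filter; the chunk size and filter choice here are illustrative, not TileDB's):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

CHUNK = 32 * 1024   # stand-in for an L1-cache-sized chunk

def filter_tile(tile):
    """Split a tile into chunks and run the filter (here: zlib
    compression) on all chunks in parallel."""
    chunks = [tile[i:i + CHUNK] for i in range(0, len(tile), CHUNK)]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(zlib.compress, chunks))

def unfilter_tile(filtered):
    """Reverse the pipeline: decompress each chunk and reassemble."""
    return b"".join(zlib.decompress(c) for c in filtered)

tile = bytes(range(256)) * 1024          # 256 KiB dummy tile
assert unfilter_tile(filter_tile(tile)) == tile
```

Filtering per chunk (rather than per tile) keeps each filter's working set cache-resident and exposes parallelism.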
33. Versioning and Time Traveling
In TileDB, every write is immutable
Each (batch) write creates a timestamped fragment
With fragments, TileDB implements versioning and time traveling
35. Versioning and Time Traveling | Sparse Arrays
(figure, when no duplicates are allowed)
Write at t1: cells with values 1, 2, 3, 4, 5, 6
Write at t2: 100 and 40 at the coordinates of 1 and 4
Read at [0, t1]: 1, 2, 3, 4, 5, 6
Read at (t1, t2]: 100, 40
Read at [0, t2]: 100, 2, 3, 40, 5, 6 (the t2 writes override the t1 values)
36. Versioning and Time Traveling | Sparse Arrays
(figure, when duplicates are allowed)
Write at t1: cells with values 1, 2, 3, 4, 5, 6
Write at t2: 100 and 40 at the coordinates of 1 and 4
Read at [0, t1]: 1, 2, 3, 4, 5, 6
Read at (t1, t2]: 100, 40
Read at [0, t2]: 1, 100, 2, 3, 4, 40, 5, 6 (both the old and the new values are returned as dups)
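The no-duplicates merge semantics can be sketched as follows, modeling fragments as timestamped dictionaries (a conceptual model, not the on-disk format):

```python
# Each write is an immutable, timestamped fragment; a read at a time
# interval merges the qualifying fragments, and the most recent write
# per coordinate wins (when duplicates are not allowed).
fragments = [
    (1, {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}),   # write at t1
    (2, {(0, 0): 100, (1, 1): 40}),                       # write at t2
]

def read(t_start, t_end):
    result = {}
    for t, cells in sorted(fragments):        # merge older fragments first
        if t_start <= t <= t_end:
            result.update(cells)              # newer writes override
    return result

print(read(0, 1))   # the array as of t1
print(read(0, 2))   # (0,0) -> 100 and (1,1) -> 40 override the t1 values
print(read(2, 2))   # only the t2 fragment
```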
37. Indexing
TileDB has a three-level indexing approach:
Fragment timestamps (in the fragment names) for time traveling
Non-empty domain in each fragment’s metadata
Either simple offset arithmetic (dense) or R-trees (sparse)
Algorithm:
1. Get the list of fragment names (those with a .ok file), e.g. __t1_t1_uuid1_v, __t2_t2_uuid2_v, ...
2. Ignore fragments whose timestamps are not in the time traveling interval
3. Ignore fragments whose non-empty domain (in __fragment_metadata.tdb) does not overlap the slice
4a. Ignore dense tiles via implicit positional indexing, or
4b. Ignore sparse tiles whose R-tree MBRs do not overlap the slice
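Steps 3 and 4b reduce to simple range-overlap tests; a minimal sketch (illustrative domain and MBR values, not TileDB code):

```python
def overlaps(a, b):
    """2D range overlap; ranges are ((r0, r1), (c0, c1)), inclusive."""
    return a[0][0] <= b[0][1] and b[0][0] <= a[0][1] and \
           a[1][0] <= b[1][1] and b[1][0] <= a[1][1]

slice_ = ((0, 1), (2, 3))                      # rows 0-1, cols 2-3
fragment_domain = ((0, 3), (0, 3))             # the fragment's non-empty domain
mbrs = [((0, 1), (0, 1)),                      # data tile MBRs
        ((0, 1), (2, 3)),
        ((2, 3), (0, 3))]

# Step 3: keep the fragment only if its non-empty domain overlaps the slice.
if overlaps(fragment_domain, slice_):
    # Step 4b: keep only the data tiles whose MBRs overlap the slice.
    hits = [k for k, mbr in enumerate(mbrs) if overlaps(mbr, slice_)]
    print(hits)   # only MBR index 1 overlaps the slice
```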
38. Indexing
Dense: given the non-empty domain, the space tile extents, and the tile order, we can easily find that a slice overlaps, e.g., the second and fourth tiles (figure: 4×4 array, row-major tile order, 2×2 space tiles)
Sparse: an R-tree (stored in the fragment metadata) indexes the data tiles’ minimum bounding rectangles MBR1-MBR4 (figure: col-major tile order, row-major cell order, 2×2 space tiles, capacity 2)
A slicing query would just traverse the R-tree top-down, visiting only the nodes/MBRs that intersect the slice
39. Consolidation & Vacuuming
Numerous fragments can lead to performance degradation (loss of locality, expensive listing)
TileDB supports two levels of consolidation:
Fragment metadata (group the non-empty domains in a single place)
Fragments (better preserve data locality)
Old fragments are preserved after consolidation (for time traveling)
TileDB can vacuum old fragments to save space and speed up listing
Time traveling will not work on vacuumed fragments
40. Attribute Filter Push-Down
TileDB supports pushing attribute filter conditions down to the engine
That typically boosts performance
Much less data gets copied around
More L1-cache conscious
More opportunities for parallelism and vectorization
41. Schema Evolution
TileDB supports schema evolution (since v2.4):
Adding an attribute
Dropping an attribute
More schema evolution features are coming up
Full versioning and time traveling are supported
42. Notes on Writing
Lots of flexibility in writing: different orders, different domain subarrays, etc.
Support for lock-free, parallel writing
Tips for performance:
Each tile should be 100KB-1MB
Each fragment should be 1-2GB
Fragments should not “interleave”
Run fragment metadata consolidation (especially on cloud object stores)
No support for deletions and updates yet (coming up soon)
43. Notes on Reading
TileDB is eventually consistent
Support for parallel writers, parallel readers (all lock-free)
Support for reads in different layouts
Support for “streaming reads” (incomplete queries)
Tips for performance:
Allocate sufficient space for the result buffers (minimize incomplete queries)
Tune written layout based on the read layout (application dependent)
Push down coordinate and attribute filter conditions
45. Coming Up
More schema evolution features
Support for deletes and updates
Git-like versioning
ACID via modularizing locking
More tile filters (e.g., sum, min, max)
RLE and dictionary compression on strings
Computations on compressed data
Linear Algebra operations
More SQL push down (e.g., group by)
Graph algorithms
47. High-level Comparisons
vs. HDF5
TileDB is cloud-native
TileDB has support for sparse arrays
vs. Zarr
TileDB is built in C and is more interoperable
TileDB has support for sparse arrays
TileDB has support for versioning and time traveling
48. High-level Comparisons
vs. Parquet
TileDB is multi-dimensional and supports more flexible layouts
TileDB has support for dense arrays
vs. Delta Lake
TileDB does not rely on Spark, Presto or other subsystem
TileDB has support for dense arrays
TileDB has support for versioning and time traveling
TileDB does not support deletes, updates and full ACID (yet)
TileDB is natively multi-dimensional and supports more flexible layouts