AIS data management and time series analytics on TileDB Cloud (Webinar, Feb 3, 2022)

TileDB webinars, February 3, 2022
AIS data management & time-series analytics on TileDB Cloud
Dr. Stavros Papadopoulos, Founder & CEO of TileDB, Inc.

Who we are
Where it all started: TileDB was spun out of MIT and Intel Labs in 2017.
Deep roots at the intersection of HPC, databases and data science.
Traction with telecoms, pharmas, hospitals and other scientific organizations.
45+ members with expertise across all applications and domains.
Investors: we have raised over $20M and are very well capitalized.

Data Economics
Production: What format does the data get produced in, and where does it get stored?
Distribution: Who has access to the data, what is the means of access, and how is it monetized?
Consumption: How can tools compute on the data, and where does the computation happen?

The Problem | Data Economics is Flawed
Production: data is produced in inefficient formats.
Distribution: secure sharing is an afterthought.
Consumption: all data management solutions focus here (how tools can compute on the data, and where the computation happens).

The Problem
Data in some custom format (.las, .cog, .csv) sits in some cloud bucket or marketplace.
Org #1 through Org #N: each must download, wrangle and build its own analytics infrastructure.
Result: a very high total cost of ownership (TCO), plus a burden on the data vendor for extra services.

Enter TileDB
Data in a universal, analysis-ready format, managed by a universal data management platform
User / group #1 through User / group #N: any tool, any scale, no wrangling
Secure governance & collaboration
Scalable, serverless compute
Data & code sharing & monetization
Pay-as-you-go, consumer pays
Extreme interoperability
No infra hassles

The Secret Sauce | The Data Model
Store everything as dense or sparse multi-dimensional arrays.
(Figure: a dense array and a sparse array.)
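The difference between the two array kinds can be sketched with a toy model (plain Python, not TileDB's API): a dense array materializes every cell of its domain and locates a cell by arithmetic on its coordinates, while a sparse array stores only the non-empty cells together with their coordinates. The class names are hypothetical.

```python
# Toy model of the two array kinds (hypothetical classes, not TileDB's API).

class DenseArray2D:
    """Every cell of a rows x cols domain is materialized, in row-major order."""

    def __init__(self, rows, cols, fill=0):
        self.rows, self.cols = rows, cols
        self.cells = [fill] * (rows * cols)  # one contiguous buffer

    def __setitem__(self, key, value):
        r, c = key
        self.cells[r * self.cols + c] = value  # position computed from coordinates

    def __getitem__(self, key):
        r, c = key
        return self.cells[r * self.cols + c]


class SparseArray2D:
    """Only non-empty cells are stored, each with its explicit coordinates."""

    def __init__(self):
        self.cells = {}  # (row, col) -> value

    def __setitem__(self, key, value):
        self.cells[key] = value

    def __getitem__(self, key):
        return self.cells.get(key)  # None for an empty cell


dense = DenseArray2D(1000, 1000)
dense[2, 3] = 7
sparse = SparseArray2D()
sparse[2, 3] = 7
print(len(dense.cells), len(sparse.cells))  # 1000000 1
```

Storing one value costs a million slots in the dense model and one entry in the sparse one, which is why sparse data such as LiDAR points or AIS position reports is modeled with sparse arrays.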

Arrays Subsume Dataframes
(Figure panels: Sparse array, Dataframe, Dense vector.)
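One way to read this slide, sketched with hypothetical helpers and made-up AIS-like rows: a dataframe becomes a sparse array when some columns (here MMSI and timestamp) are chosen as dimensions and the rest become attributes, or it becomes a set of 1D dense vectors, one per column, indexed by row position. This illustrates the idea only; it is not TileDB's dataframe API.

```python
# Hypothetical helpers showing two array views of the same dataframe.

def to_sparse_array(rows, dims):
    """Chosen columns become dimensions (coordinates); the rest become attributes."""
    cells = {}
    for row in rows:
        coords = tuple(row[d] for d in dims)
        # In this toy, a later row with the same coordinates overwrites an earlier one
        cells[coords] = {k: v for k, v in row.items() if k not in dims}
    # Sort cells by coordinates so slicing on the dimensions can use binary search
    return dict(sorted(cells.items()))


def to_dense_vectors(rows):
    """Each column becomes a 1D dense vector indexed by row position."""
    return {col: [row[col] for row in rows] for col in rows[0]}


# Made-up sample rows (MMSI, Unix timestamp, position)
rows = [
    {"mmsi": 477553000, "ts": 1643880000, "lat": 47.58, "lon": -122.35},
    {"mmsi": 235012345, "ts": 1643880060, "lat": 51.50, "lon": -0.12},
    {"mmsi": 477553000, "ts": 1643880120, "lat": 47.59, "lon": -122.34},
]
sparse = to_sparse_array(rows, dims=("mmsi", "ts"))
dense = to_dense_vectors(rows)
print(list(sparse))   # [(235012345, 1643880060), (477553000, 1643880000), (477553000, 1643880120)]
print(dense["mmsi"])  # [477553000, 235012345, 477553000]
```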

The Secret Sauce | The Data Model
What can be modeled as an array:
LiDAR (3D sparse)
SAR (2D or 3D dense)
Population genomics (3D sparse)
Single-cell genomics (2D dense or sparse)
Biomedical imaging (2D or 3D dense)
Time series (ND dense or sparse)
Weather (2D or 3D dense)
Graphs (2D sparse)
Video (3D dense)
Key-values (1D or ND sparse)
Tables (1D dense or ND sparse)
Even flat files! (1D dense)
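For AIS, a natural instance of "time series (ND sparse)" is a 2D sparse array with dimensions (timestamp, MMSI) and position attributes. The sketch below assumes made-up position reports and a hypothetical slice_time_series helper; it shows why keeping cells sorted on the dimensions helps, since a time-range slice becomes a binary search instead of a full scan.

```python
import bisect

# Made-up position reports as cells of a 2D sparse array with dimensions
# (ts, mmsi) and a (lat, lon) attribute, kept sorted by coordinates.
cells = sorted([
    ((1643880000, 477553000), (47.58, -122.35)),
    ((1643880060, 235012345), (51.50, -0.12)),
    ((1643880120, 477553000), (47.59, -122.34)),
    ((1643883600, 477553000), (47.70, -122.30)),
])
ts_index = [coords[0] for coords, _ in cells]  # first dimension, ascending


def slice_time_series(t_start, t_end, mmsi_lo, mmsi_hi):
    """Cells with t_start <= ts <= t_end and mmsi_lo <= mmsi <= mmsi_hi."""
    lo = bisect.bisect_left(ts_index, t_start)  # binary search on the time dimension
    hi = bisect.bisect_right(ts_index, t_end)
    # Within the time range, filter on the second dimension
    return [(c, v) for c, v in cells[lo:hi] if mmsi_lo <= c[1] <= mmsi_hi]


for (ts, mmsi), (lat, lon) in slice_time_series(1643880000, 1643880120, 477553000, 477553000):
    print(ts, mmsi, lat, lon)
```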

How we built a Universal Database
TileDB Cloud: unified data management and easy serverless compute at global scale
❏ Access control and logging
❏ Serverless SQL, UDFs, task graphs
❏ Jupyter notebooks and dashboards
Efficient APIs & tool integrations via zero-copy techniques
TileDB Embedded: open-source interoperable storage with a universal open-spec array format
❏ Parallel IO, rapid reads & writes
❏ Columnar, cloud-optimized
❏ Data versioning & time traveling

TileDB Embedded
Open source: https://github.com/TileDB-Inc/TileDB
Superior performance
Built in C++
Fully-parallelized
Columnar format
Multiple compressors
R-trees for sparse arrays
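How an index such as an R-tree speeds up sparse reads can be illustrated with a one-level simplification: cells are grouped into tiles, each tile records its minimum bounding rectangle (MBR), and a query skips every tile whose MBR does not overlap the query region. A real R-tree nests MBRs hierarchically; the tile size, the synthetic points and the helper names here are assumptions for illustration, not TileDB internals.

```python
# One-level simplification of an R-tree (hypothetical helpers).

def build_tiles(points, tile_size):
    """Group sorted points into tiles and record each tile's MBR."""
    pts = sorted(points)
    tiles = []
    for i in range(0, len(pts), tile_size):
        chunk = pts[i:i + tile_size]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        tiles.append(((min(xs), max(xs), min(ys), max(ys)), chunk))
    return tiles


def overlaps(mbr, q):
    x0, x1, y0, y1 = mbr
    qx0, qx1, qy0, qy1 = q
    return x0 <= qx1 and qx0 <= x1 and y0 <= qy1 and qy0 <= y1


def range_query(tiles, q):
    """Return matching points and the number of tiles actually read."""
    results, tiles_read = [], 0
    for mbr, chunk in tiles:
        if not overlaps(mbr, q):
            continue  # pruned: this tile is never fetched
        tiles_read += 1
        results += [p for p in chunk if q[0] <= p[0] <= q[1] and q[2] <= p[1] <= q[3]]
    return results, tiles_read


points = [(x, (x * 7) % 100) for x in range(100)]  # synthetic 2D points
tiles = build_tiles(points, tile_size=10)
found, read = range_query(tiles, (20, 29, 0, 99))
print(len(found), read)  # 10 1
```

Only one of the ten tiles is read for this query; the other nine are rejected by a cheap rectangle comparison.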
Rapid updates & data versioning
Immutable writes
Lock-free
Parallel reader / writer model
Time traveling
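The combination of immutable writes and time traveling can be sketched under a simplified model: each write produces a new timestamped batch of cells (loosely analogous to a TileDB fragment), nothing is modified in place, and a read "at" time t merges all batches written up to t, with newer values winning. VersionedArray is a hypothetical class; this is neither TileDB's on-disk format nor its API.

```python
# Simplified model of immutable, timestamped writes with time-travel reads.

class VersionedArray:
    def __init__(self):
        self.fragments = []  # append-only list of (timestamp, {coords: value})

    def write(self, timestamp, cells):
        # Store a copy; existing fragments are never modified afterwards
        self.fragments.append((timestamp, dict(cells)))

    def read(self, at=None):
        """Merge fragments with timestamp <= at (all if at is None), newest wins."""
        view = {}
        for ts, cells in sorted(self.fragments, key=lambda f: f[0]):
            if at is not None and ts > at:
                break
            view.update(cells)  # later fragments overwrite earlier values
        return view


arr = VersionedArray()
arr.write(1, {(0, 0): "a", (0, 1): "b"})
arr.write(2, {(0, 1): "B"})
print(arr.read())      # {(0, 0): 'a', (0, 1): 'B'}
print(arr.read(at=1))  # {(0, 0): 'a', (0, 1): 'b'}
```

Because an update only appends a new fragment, older versions stay readable, and a writer never changes data that a reader is already looking at.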

Extreme interoperability
Numerous APIs
Numerous integrations
All backends
Optimized for the cloud
Immutable writes
Parallel IO
Minimization of requests
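"Minimization of requests" matters because every request to a cloud object store carries fixed latency and cost. One generic technique, shown here as an assumption rather than a description of TileDB's internals, is to coalesce nearby byte ranges into a single ranged read; coalesce_ranges and the max_gap threshold are hypothetical.

```python
# Merge byte ranges that overlap, touch, or are separated by at most
# max_gap bytes, so several small reads become one ranged GET.

def coalesce_ranges(ranges, max_gap):
    """ranges: list of (offset, length); returns a merged (offset, length) list."""
    merged = []
    for off, length in sorted(ranges):
        end = off + length
        if merged and off - (merged[-1][0] + merged[-1][1]) <= max_gap:
            start, prev_len = merged[-1]
            # Extend the previous range to cover this one (handles containment too)
            merged[-1] = (start, max(start + prev_len, end) - start)
        else:
            merged.append((off, length))
    return merged


wanted = [(0, 100), (100, 50), (160, 40), (10_000, 100)]
print(coalesce_ranges(wanted, max_gap=16))  # [(0, 200), (10000, 100)]
```

Four reads become two; the trade-off is that a small gap (10 bytes here) is fetched but discarded, which is usually cheaper than an extra round trip.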

TileDB Cloud
Universal data (.las, .cog, .vcf, .csv)
Universal storage, universal tooling, universal scale
Management. Collaboration. Scalability.

TileDB Cloud
Built to work anywhere: works as SaaS (https://cloud.tiledb.com) and on premises; currently on AWS, soon on any cloud
It is completely serverless: slicing, SQL, UDFs, task graphs
Can launch Jupyter notebooks: on-demand JupyterHub instances
It is geo-aware: compute is sent to the data
It is secure: authentication, compliance, etc.
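A task graph, as listed above, can be pictured as a DAG of functions whose outputs feed downstream functions. The sketch below runs such a graph locally in dependency order with Python's standard graphlib; it illustrates the concept only, is not the TileDB Cloud task graph API, and run_task_graph is a hypothetical name.

```python
from graphlib import TopologicalSorter

# Local, sequential illustration of a task graph (not the TileDB Cloud API).

def run_task_graph(tasks, deps):
    """tasks: name -> function taking upstream results as positional args
    (in the order listed in deps[name]); deps: name -> list of upstream names."""
    graph = {name: deps.get(name, []) for name in tasks}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        inputs = [results[d] for d in graph[name]]  # upstream outputs, in order
        results[name] = tasks[name](*inputs)
    return results


data = list(range(1, 101))
tasks = {
    "part1": lambda: sum(data[:50]),  # stand-ins for UDFs over array slices
    "part2": lambda: sum(data[50:]),
    "total": lambda a, b: a + b,      # reduce step
}
deps = {"total": ["part1", "part2"]}
print(run_task_graph(tasks, deps)["total"])  # 5050
```

A serverless engine would additionally run independent tasks (part1 and part2 here) in parallel, close to the data; this sketch executes them one after another.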

TileDB Cloud
Everything is an array! All types of data (even flat files), Jupyter notebooks, UDFs and task graphs, ML models, and dashboards (e.g., R Shiny apps)
Everything is shareable at global scale: access control inside and outside your organization; make any data and code public; discover any public data and code (central catalog)
Everything is monetizable: full marketplace (via Stripe)
Everything is logged: full auditability (data, code, any action)

AIS capabilities on TileDB Cloud
Data is analysis-ready, no more CSV downloads
A built-in marketplace, no infrastructure costs
Time-series analysis at extreme scale
Fusion of AIS data with other sources (e.g., SAR)
Numerous APIs and tool integrations
Visualization with popular tools and dashboards

The Universal Database
Thank you

Spire Maritime
Enabling the Data Advantage: Hosted Data Platform

Covering the Earth 24/7: Global data and analytics

The Evolution of Spire Maritime’s Data Services
The Early Years (before 2013)
• AIS messages delivered via proxy/SFTP in raw NMEA or CSV formats
• Customer 100% responsible for data storage, synthesis of position and static messages, indexing, manipulation, etc.
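In this early model, the customer's first job was decoding raw NMEA. The sketch below assumes the publicly documented AIVDM 6-bit payload armoring; it handles only single-fragment sentences, does not verify the checksum, and its function names are hypothetical. The sample sentence is a generic decoding example, not Spire data.

```python
# Minimal AIVDM header decoder (single-fragment sentences, no checksum check).

def payload_bits(payload):
    """De-armor the 6-bit ASCII payload into a string of '0'/'1' characters."""
    bits = []
    for ch in payload:
        v = ord(ch) - 48
        if v > 40:
            v -= 8  # the character set skips the 8 characters 'X' through '_'
        bits.append(format(v, "06b"))
    return "".join(bits)


def decode_header(sentence):
    """Return (message type, MMSI) from an !AIVDM sentence."""
    fields = sentence.split(",")
    bits = payload_bits(fields[5])  # the sixth field holds the armored payload
    msg_type = int(bits[0:6], 2)
    mmsi = int(bits[8:38], 2)       # bits 6-7 are the repeat indicator
    return msg_type, mmsi


print(decode_header("!AIVDM,1,1,,B,177KQJ5000G?tO`K>RA1wUbN0TKH,0*5C"))  # (1, 477553000)
```

Message type 1 is a position report; the remaining fields (position, speed, course, timestamp) are extracted the same way, at fixed bit offsets.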
2013
• Geospatial Web Services (GWS) introduced
• Easy to query vessel-based information
• Removes the complications associated with real-time synthesis of position and static messages
• Key fields indexed to provide rapid query responses
• Data delivered in an industry-standard schema for easier storage and manipulation
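What "synthesis of position and static messages" involves can be sketched as an as-of join: position reports and static messages arrive separately, so each position report is matched to the latest static record for the same MMSI received at or before it. This is one plausible reading, not Spire's or GWS's actual implementation; synthesize is a hypothetical helper and the field names and records are made up.

```python
import bisect

# Hypothetical as-of join of position reports with static records.

def synthesize(positions, statics):
    # Group static records per vessel, ordered by time
    by_mmsi = {}
    for rec in sorted(statics, key=lambda r: r["ts"]):
        by_mmsi.setdefault(rec["mmsi"], []).append(rec)
    ts_by_mmsi = {m: [r["ts"] for r in recs] for m, recs in by_mmsi.items()}

    merged = []
    for p in positions:
        recs = by_mmsi.get(p["mmsi"], [])
        # Index of the last static record with ts <= p["ts"], or -1 if none
        i = bisect.bisect_right(ts_by_mmsi.get(p["mmsi"], []), p["ts"]) - 1
        name = recs[i]["name"] if i >= 0 else None
        merged.append({**p, "name": name})
    return merged


# Made-up messages for one vessel whose reported name changes
statics = [
    {"mmsi": 477553000, "ts": 100, "name": "OLD NAME"},
    {"mmsi": 477553000, "ts": 300, "name": "NEW NAME"},
]
positions = [
    {"mmsi": 477553000, "ts": 50, "lat": 47.58, "lon": -122.35},
    {"mmsi": 477553000, "ts": 200, "lat": 47.59, "lon": -122.34},
    {"mmsi": 477553000, "ts": 400, "lat": 47.60, "lon": -122.33},
]
for row in synthesize(positions, statics):
    print(row["ts"], row["name"])
```

Doing this join correctly and continuously, for every vessel, is the kind of work GWS and the Hosted Data Platform take off the customer's hands.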
2021
• Hosted Data Platform introduced (TileDB)
• Maintains all the benefits of historical GWS content, but removes the complexity and lowers the expense customers face when storing and computing against the data
• Enables immediate access to interrogate Spire Maritime’s historical data using complex queries that would typically require a fully configured database to run
• Spire Maritime’s AIS data is updated daily in the TileDB platform

Hosted Data Platform Use Cases
• Customers who believe they are spending too much money on storage and compute time based on their Spire Maritime data subscription
• Customers who only want to ask questions of the data: they don’t need or want to store archive data locally, and can focus on answering real-world questions from the moment access to the platform is granted
• Customers who lack the skill set to create the databases needed to interrogate the data in a fast and efficient way
