Presto: Fast SQL-on-Anything Across Data Lakes, DBMS, and NoSQL Data Stores

Kamil Bajda-Pawlikowski
Co-founder and CTO
Data Orchestration Summit 2020

What is Presto?
Community-driven open source project
High performance MPP SQL engine
• Interactive ANSI SQL queries
• Proven scalability
• High concurrency
Deploy Anywhere
• Kubernetes
• Cloud (AWS, Azure, GCP)
• On premises
Separation of compute & storage
• Scale storage & compute independently
• SQL-on-anything
• Federated queries

About Starburst
• Named Open Source Startup to Watch 2020
• 600% Growth YoY
• 100+ Enterprise Customers
• NPS Score 80+
Our Platform
• ANSI SQL MPP Query Engine
• High Concurrency
• Massive Scale
• Enterprise Grade Security
• On-Prem or Cloud
• Rapid Time to Insights
• Low Cost of Ownership
• 24x7 Expert Support

Starburst Customers
• Tech
• Retail
• Media & Telco
• Finance & Insurance
• Healthcare & Pharma
• Other

Why Delta Lake?
▪ ACID properties over data lake
▪ Open source table format
▪ Stored as Parquet files
▪ Object storage support
▪ Schema evolution
▪ Time travel feature
▪ Metadata & statistics
▪ Data skipping & z-ordering
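The ACID and time travel bullets both rest on the Delta transaction log: an ordered series of JSON commit files under _delta_log/, each holding one action per line. The following is a minimal sketch of that idea, not the actual reader. The commits are hand-written and simplified (real "add" actions carry more fields), and replay_log and EXAMPLE_COMMITS are hypothetical names introduced only for illustration.

```python
import json

# Hypothetical, simplified commits. In a real table each entry would be a file
# such as _delta_log/00000000000000000000.json with one JSON action per line.
EXAMPLE_COMMITS = [
    '{"add": {"path": "part-0.parquet"}}\n{"add": {"path": "part-1.parquet"}}',
    '{"remove": {"path": "part-0.parquet"}}\n{"add": {"path": "part-2.parquet"}}',
]


def replay_log(commits, as_of_version=None):
    """Return the set of live data files after replaying commits in order.

    Passing as_of_version stops the replay early, which is the essence of
    time travel: an older snapshot is just a shorter prefix of the log.
    """
    live_files = set()
    for version, commit in enumerate(commits):
        if as_of_version is not None and version > as_of_version:
            break
        for line in commit.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                live_files.add(action["add"]["path"])
            elif "remove" in action:
                live_files.discard(action["remove"]["path"])
    return live_files


if __name__ == "__main__":
    print(sorted(replay_log(EXAMPLE_COMMITS)))                   # latest snapshot
    print(sorted(replay_log(EXAMPLE_COMMITS, as_of_version=0)))  # snapshot at version 0
```

Because a commit either appears in the log or does not, a reader always sees a consistent set of Parquet files, even while a writer is replacing part-0.parquet.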

Native Presto Delta Lake Reader
Supports data skipping & dynamic filtering
Optimizes query using file statistics
Supports reading the Delta transaction log
Native connector written from scratch
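To make "optimizes query using file statistics" concrete, here is a rough sketch of data skipping under stated assumptions. Delta "add" actions can carry a stats JSON string with per-column minValues and maxValues; a reader can skip any file whose range cannot match the predicate. The function files_to_scan and the sample actions are hypothetical and only handle a single `column >= lower_bound` predicate; the connector's real planning logic is not shown in the slides.

```python
import json

# Hypothetical "add" actions, trimmed to the fields this sketch needs.
EXAMPLE_ADDS = [
    {
        "path": "part-0.parquet",
        "stats": json.dumps({
            "numRecords": 100,
            "minValues": {"event_date": "2020-01-01"},
            "maxValues": {"event_date": "2020-03-31"},
        }),
    },
    {
        "path": "part-1.parquet",
        "stats": json.dumps({
            "numRecords": 80,
            "minValues": {"event_date": "2020-04-01"},
            "maxValues": {"event_date": "2020-06-30"},
        }),
    },
    {"path": "part-2.parquet"},  # no statistics recorded
]


def files_to_scan(add_actions, column, lower_bound):
    """Return paths that may contain rows with column >= lower_bound."""
    selected = []
    for add in add_actions:
        stats = json.loads(add["stats"]) if add.get("stats") else {}
        max_values = stats.get("maxValues", {})
        if column not in max_values:
            # Without statistics the file cannot be ruled out, so keep it.
            selected.append(add["path"])
        elif max_values[column] >= lower_bound:
            selected.append(add["path"])
        # Otherwise every row is below the bound and the file is skipped.
    return selected


if __name__ == "__main__":
    # ISO-8601 date strings compare correctly as plain strings.
    print(files_to_scan(EXAMPLE_ADDS, "event_date", "2020-05-01"))
```

Skipping is conservative by design: a file is dropped only when its statistics prove that no row can match.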

Query-time Data Federation
● Single point of access to numerous data sources
● Query Delta Lake and federate with legacy databases as well as many NoSQL data stores
● Enforce table, column and row level policies to ensure maximum data security
● Mask column data for different groups and users
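The last two bullets can be pictured with a small conceptual sketch. It is not Starburst's policy engine or its configuration format; apply_policies, the group names, and the sample rows are all hypothetical. It only shows the order of operations one would expect: filter rows by policy first, then mask protected columns for users outside the allowed group.

```python
def apply_policies(rows, user_groups, row_filter, column_masks):
    """Apply a row-level filter, then column masks, for one user's groups.

    column_masks maps a column name to (group_allowed_to_see_raw, mask_fn).
    """
    result = []
    for row in rows:
        if not row_filter(row, user_groups):
            continue  # row-level policy hides this row entirely
        visible = dict(row)  # copy so the source rows stay untouched
        for column, (allowed_group, mask_fn) in column_masks.items():
            if column in visible and allowed_group not in user_groups:
                visible[column] = mask_fn(visible[column])
        result.append(visible)
    return result


# Hypothetical data and policies, for illustration only.
CUSTOMERS = [
    {"id": 1, "region": "US", "email": "a@example.com"},
    {"id": 2, "region": "EU", "email": "b@example.com"},
]


def us_only_unless_global(row, groups):
    return row["region"] == "US" or "global_analysts" in groups


MASKS = {"email": ("pii_readers", lambda value: "***")}

if __name__ == "__main__":
    print(apply_policies(CUSTOMERS, {"us_analysts"}, us_only_unless_global, MASKS))
    print(apply_policies(CUSTOMERS, {"global_analysts", "pii_readers"},
                         us_only_unless_global, MASKS))
```

Enforcing both steps in the query engine means every BI or SQL tool that connects sees the same restricted view, whichever source the data comes from.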

Data Consumption & Analytics
BI Reporting Tools
SQL Query Tools
• Connect using a variety of BI and SQL tools including Looker, Tableau, Power BI and DBeaver
• JDBC, ODBC and many libraries including Python, R and Java
SELECT c.id, COUNT(*), SUM(e.active_seconds)
FROM delta.iot.events e
JOIN snowflake.sales.customer c ON (e.customer_id = c.id)
WHERE e.event_date >= current_date
  AND c.region = 'US'
  AND c.id IN
    (SELECT l.customer_id
     FROM elastic.web.logs l
     WHERE l.visit_date >= date '2020-01-01')
GROUP BY c.id;
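The query above joins three catalogs in one statement. As a rough way to see what it computes without a Presto cluster, the sketch below stands in SQLite's ATTACH DATABASE for Presto catalogs. This is only an analogy: SQLite supports two-part names, so delta.iot.events becomes delta.events, dates are ISO strings, current_date is replaced by a parameter for repeatability, and all sample rows are invented.

```python
import sqlite3


def build_demo_connection():
    """Create three attached in-memory databases that play the role of catalogs."""
    conn = sqlite3.connect(":memory:")
    for catalog in ("delta", "snowflake", "elastic"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {catalog}")
    conn.execute(
        "CREATE TABLE delta.events "
        "(customer_id INTEGER, event_date TEXT, active_seconds INTEGER)"
    )
    conn.execute("CREATE TABLE snowflake.customer (id INTEGER, region TEXT)")
    conn.execute("CREATE TABLE elastic.logs (customer_id INTEGER, visit_date TEXT)")
    conn.executemany(
        "INSERT INTO delta.events VALUES (?, ?, ?)",
        [
            (1, "2020-11-10", 30),
            (1, "2020-11-10", 45),
            (2, "2020-11-10", 20),   # customer 2 is not in the US
            (3, "2020-11-10", 99),   # customer 3 has no web visit in 2020
            (1, "2020-11-01", 500),  # older than the "today" cutoff
        ],
    )
    conn.executemany(
        "INSERT INTO snowflake.customer VALUES (?, ?)",
        [(1, "US"), (2, "EU"), (3, "US")],
    )
    conn.executemany(
        "INSERT INTO elastic.logs VALUES (?, ?)",
        [(1, "2020-03-15"), (2, "2020-04-01"), (3, "2019-12-31")],
    )
    return conn


FEDERATED_QUERY = """
SELECT c.id, COUNT(*), SUM(e.active_seconds)
FROM delta.events e
JOIN snowflake.customer c ON e.customer_id = c.id
WHERE e.event_date >= :today
  AND c.region = 'US'
  AND c.id IN (SELECT l.customer_id
               FROM elastic.logs l
               WHERE l.visit_date >= '2020-01-01')
GROUP BY c.id
"""


def run_demo(today="2020-11-10"):
    """Run the emulated federated query and return (id, count, total_seconds) rows."""
    conn = build_demo_connection()
    try:
        return conn.execute(FEDERATED_QUERY, {"today": today}).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(run_demo())
```

With this sample data only customer 1 survives all three filters. In Presto the same shape of query runs against the real systems, with each connector reading from its own source.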

Thank You
Try Presto: www.starburstdata.com
