Powering Interactive BI Analytics with Presto and Delta Lake

Kamil Bajda-Pawlikowski
Co-founder/CTO @ Starburst

Agenda
▪ Presto & Starburst
▪ Delta Lake Integration
▪ Data Platform Architecture
▪ Use Cases

What is Presto?
High performance MPP SQL engine
•Interactive ANSI SQL queries
•Proven scalability
•High concurrency
Separation of compute & storage
•Scale storage & compute independently
•SQL-on-anything
•Federated queries
Community-driven open source project
Deploy Anywhere
•Kubernetes
•Cloud
•On premises

Presto Users
Facebook: 10,000+ nodes, 1000s of users
Uber: 2,000+ nodes, 160K+ queries daily
LinkedIn: 500+ nodes, 200K+ queries daily
Lyft: 400+ nodes, 100K+ queries daily

Starburst Enterprise Presto
Performance
•From petabytes to exabytes: query data from disparate sources using SQL, with high concurrency
•Control your price/performance with the latest cost-based optimizer
•Caching available for frequently accessed data
Connectivity
•30+ supported enterprise connectors
•High performance parallel connectors for Oracle, Teradata, Snowflake and more
Security
•Kerberos & LDAP integration
•Global Security for fine-grained Access Control
•Data encryption
•Data masking
•Query auditing
Management
•Configuration
•Autoscaling
•High availability
•Monitoring
•Deploy anywhere
Support
•The largest team of Presto experts in the world
•Fully-tested, stable releases, curated by the Presto creators
•Hot fixes & security patches
•24x7 support, 365 days a year; we’ve got your back

Why are we excited about Delta?
▪ ACID properties over data lake
▪ Open source table format
▪ Stored as Parquet files
▪ Object storage support
▪ Schema evolution
▪ Time travel feature
▪ Metadata & statistics
▪ Data skipping & z-ordering
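
As a rough illustration of the "Metadata & statistics" and "Data skipping" bullets, the sketch below shows how per-file min/max statistics let a reader avoid opening files that cannot match a range predicate. The `FileStats` class, the file names, and the `order_id` column are hypothetical and chosen only for this example; it is not the actual reader implementation.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Min/max statistics for one column of one data file (hypothetical layout)."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files, lower, upper):
    """Keep only files whose [min, max] range overlaps the predicate range [lower, upper]."""
    return [f.path for f in files if f.max_value >= lower and f.min_value <= upper]


# Hypothetical statistics for an order_id column spread over three Parquet files.
files = [
    FileStats("part-0000.parquet", 1, 100),
    FileStats("part-0001.parquet", 101, 200),
    FileStats("part-0002.parquet", 201, 300),
]

# Predicate: WHERE order_id BETWEEN 150 AND 180
selected = files_to_scan(files, 150, 180)
print(selected)  # ['part-0001.parquet']
```

Skipping works best when each file covers a narrow value range. Z-ordering helps here by clustering related values into the same files.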

Native Presto Delta Lake Reader
Supports data skipping
Optimizes query using file statistics
Supports reading the Delta transaction log
Native connector written from scratch
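
To make "reading the Delta transaction log" concrete, here is a simplified sketch of log replay. Each commit is a list of JSON actions, and replaying `add` and `remove` actions in order yields the set of data files that make up the table at a given version. The same replay, stopped early, is the idea behind time travel. The in-memory `commits` list and the file names are assumptions for illustration. A real log has more action types, more fields per action, and checkpoints, all of which this sketch ignores.

```python
import json


def active_files(commits, as_of_version=None):
    """Replay commits in order and return the data files live at the given version."""
    live = set()
    for version, lines in enumerate(commits):
        if as_of_version is not None and version > as_of_version:
            break
        for line in lines:
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


# Hypothetical two-commit log: version 0 writes two files,
# version 1 replaces one of them (for example after an update).
commits = [
    ['{"add": {"path": "part-0000.parquet"}}', '{"add": {"path": "part-0001.parquet"}}'],
    ['{"remove": {"path": "part-0000.parquet"}}', '{"add": {"path": "part-0002.parquet"}}'],
]

print(sorted(active_files(commits)))                   # ['part-0001.parquet', 'part-0002.parquet']
print(sorted(active_files(commits, as_of_version=0)))  # ['part-0000.parquet', 'part-0001.parquet']
```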

Native Delta Lake Reader Performance
Standard TPC-H benchmark:
▪ 2x average speedup across 22 queries
▪ 6x best query speedup
Feedback from customers:
▪ “What we have here is game changing for our industry. Especially now that the native Delta reader works as fast as it does. We have people lining up to now use this data.”
▪ “We have queries that were running in 10 minutes that are now running in 47 seconds.”
Try now: https://docs.starburstdata.com/latest/connector/starburst-delta-lake.html

Starburst Platform
Data Scientists, Data Analysts, Finance, Marketers
The Data Consumption Layer
Existing analytics tools
Data Masking, Global Security, Column + Row-level permissions, Query Auditing, Fine-grained access control, Data Encryption
Data Lakes, Relational Databases, NoSQL Stores, Publish/Subscribe, Azure Event Hub

Different Technologies In Your Toolbelt
ETL
SQL
Streaming Ingestion
Machine Learning
Delta Lake
Management
High Concurrency SQL
BI Reporting/Analytics
Federated Queries
Your Storage

Data Ingestion and Analytics Ecosystem

Starburst & Delta Lake – Use Case
Using a combination of Databricks and Starburst Presto to bring a full data ingestion and analytical environment to life

● Real-time ingestion of event data into Delta tables
● Customer and inventory data ingested every hour
● Modified customer information merged into Delta Lake table
● Data marts created using streaming and batch data

● Single point of access to numerous data sources
● Query Delta Lake and federate with legacy databases as well as many NoSQL data stores
● Enforce table, column and row level policies to ensure maximum data security
● Mask column data for different groups and users
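
The "query Delta Lake and federate with legacy databases" point can be sketched as a single SQL statement that joins tables from two catalogs, using Presto's `catalog.schema.table` naming. The catalog names (`delta`, `oracle`), schemas, tables, and columns below are hypothetical. The small helper only lists which catalogs a query touches and is not part of any Presto or Starburst API.

```python
import re

# Hypothetical catalogs, schemas, tables, and columns, for illustration only.
FEDERATED_QUERY = """
SELECT c.region, sum(e.amount) AS total_amount
FROM delta.sales.events AS e
JOIN oracle.crm.customers AS c
  ON e.customer_id = c.customer_id
GROUP BY c.region
ORDER BY total_amount DESC
"""


def catalogs_referenced(sql):
    """Return the catalog part of each catalog.schema.table name after FROM or JOIN."""
    return re.findall(r"\b(?:FROM|JOIN)\s+(\w+)\.\w+\.\w+", sql, flags=re.IGNORECASE)


print(catalogs_referenced(FEDERATED_QUERY))  # ['delta', 'oracle']
```

Because both tables are addressed through the same engine, the join runs as one query rather than as two exports stitched together by the client.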

Starburst & Delta Lake – Use Case
BI Reporting Tools
SQL Query Tools
DEMO TIME!
• Connect using a variety of BI and SQL tools including Looker, Tableau, Power BI and DBeaver
• JDBC, ODBC and many libraries including Python, R and Java
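
For the Python route mentioned above, a minimal sketch using the open source `presto-python-client` package (module `prestodb`) might look like the following. The host, port, user, catalog, and schema defaults are placeholders and must be replaced with values for a real cluster. The package is imported inside the function so the definition loads even where the client is not installed.

```python
def fetch_rows(sql, host="localhost", port=8080, user="analyst",
               catalog="delta", schema="default"):
    """Run one query through the Presto Python client and return all rows.

    All connection defaults are placeholders, not real endpoints.
    """
    import prestodb  # provided by the presto-python-client package

    conn = prestodb.dbapi.connect(
        host=host, port=port, user=user, catalog=catalog, schema=schema,
    )
    try:
        cur = conn.cursor()
        cur.execute(sql)
        return cur.fetchall()
    finally:
        conn.close()
```

Against a running cluster, a call such as `fetch_rows("SELECT 1")` would return the result rows as a list.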

Thank You!
Try Presto with Delta: www.starburstdata.com/delta-lake-reader

Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.
