Powering Interactive BI Analytics with Presto and Delta Lake


Presto, an open source distributed SQL engine, is widely recognized for its low-latency queries, high concurrency, and native ability to query multiple data sources.

  1. Powering Interactive BI Analytics with Presto and Delta Lake. Kamil Bajda-Pawlikowski, Co-founder/CTO @ Starburst
  2. Agenda
     - Presto & Starburst
     - Delta Lake Integration
     - Data Platform Architecture
     - Use Cases
  3. Presto & Starburst
  4. What is Presto?
     - High-performance MPP SQL engine: interactive ANSI SQL queries, proven scalability, high concurrency
     - Separation of compute & storage: scale storage and compute independently; SQL-on-anything; federated queries
     - Community-driven open source project
     - Deploy anywhere: Kubernetes, cloud, on premises
  5. Presto Users
     - Facebook: 10,000+ nodes, 1,000s of users
     - Uber: 2,000+ nodes, 160K+ queries daily
     - LinkedIn: 500+ nodes, 200K+ queries daily
     - Lyft: 400+ nodes, 100K+ queries daily
  6. Starburst Enterprise Presto
     - Performance: from petabytes to exabytes, query data from disparate sources using SQL with high concurrency; control your price/performance with the latest cost-based optimizer; caching available for frequently accessed data
     - Connectivity: 30+ supported enterprise connectors; high-performance parallel connectors for Oracle, Teradata, Snowflake and more
     - Security: Kerberos & LDAP integration; global security for fine-grained access control; data encryption; data masking; query auditing
     - Management: configuration; autoscaling; high availability; monitoring; deploy anywhere
     - Support: the largest team of Presto experts in the world; fully tested, stable releases curated by the Presto creators; hot fixes & security patches; 24x7, 365-day support (we’ve got your back)
  7. Delta Lake Integration
  8. Why are we excited about Delta?
     - ACID properties over the data lake
     - Open source table format
     - Stored as Parquet files
     - Object storage support
     - Schema evolution
     - Time travel
     - Metadata & statistics
     - Data skipping & Z-ordering
  9. Native Presto Delta Lake Reader
     - Supports data skipping
     - Optimizes queries using file statistics
     - Supports reading the Delta transaction log
     - Native connector written from scratch
  10. Native Delta Lake Reader Performance
      - Standard TPC-H benchmark: 2x average speedup across 22 queries; 6x speedup on the best query
      - Feedback from customers:
        - "What we have here is game changing for our industry. Especially now that the native Delta reader works as fast as it does. We have people lining up to now use this data."
        - "We have queries that were running in 10 minutes that are now running in 47 seconds."
      - Try now: https://docs.starburstdata.com/latest/connector/starburst-delta-lake.html
  11. Data Platform Architecture
  12. Starburst Platform
      - The data consumption layer: data scientists, data analysts, finance, and marketers, working in their existing analytics tools
      - Global security: fine-grained access control, column- and row-level permissions, data masking, data encryption, query auditing
      - Data lakes, relational databases, NoSQL stores, and publish/subscribe systems such as Azure Event Hub
  13. Different Technologies in Your Toolbelt: ETL, SQL, streaming ingestion, machine learning, Delta Lake management, high-concurrency SQL, BI reporting/analytics, federated queries, your storage
  14. Data Ingestion and Analytics Ecosystem
  15. Deployment Architecture
  16. Use Cases
  17. Starburst & Delta Lake – Use Case: using a combination of Databricks and Starburst Presto to bring a full data ingestion and analytics environment to life
  18. Starburst & Delta Lake – Use Case
      - Real-time ingestion of event data into Delta tables
      - Customer and inventory data ingested every hour
      - Modified customer information merged into a Delta Lake table
      - Data marts created from streaming and batch data
  19. Starburst & Delta Lake – Use Case
      - Single point of access to numerous data sources
      - Query Delta Lake and federate with legacy databases as well as many NoSQL data stores
      - Enforce table-, column- and row-level policies to ensure maximum data security
      - Mask column data for different groups and users
  20. Starburst & Delta Lake – Use Case: BI Reporting Tools and SQL Query Tools (DEMO TIME!)
      - Connect using a variety of BI and SQL tools, including Looker, Tableau, Power BI and DBeaver
      - JDBC, ODBC and many client libraries, including Python, R and Java
  21. Thank You! Try Presto with Delta: www.starburstdata.com/delta-lake-reader
  22. Feedback: Your feedback is important to us. Don’t forget to rate and review the sessions.
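
Slide 8 lists time travel, and slide 9 says the native reader reads the Delta transaction log. That log lives in the table's `_delta_log/` directory as numbered JSON commit files (plus periodic Parquet checkpoints), each holding actions such as `add` and `remove`. The following is a minimal Python sketch of replaying such a log to find the live data files, assuming the commits are already loaded as in-memory strings and ignoring checkpoints and non-file actions. `active_files` and `commit` are hypothetical helpers for illustration, not the Starburst connector's code:

```python
import json
from typing import Dict, List, Optional


def active_files(commits: List[str], version: Optional[int] = None) -> Dict[str, dict]:
    """Replay Delta-style commits and return the live data files.

    commits[i] holds the newline-delimited JSON actions of version i, standing in
    for the contents of one commit file in _delta_log/. Passing `version` stops
    the replay early, which is how time travel rebuilds an older snapshot.
    """
    last = len(commits) - 1 if version is None else version
    live: Dict[str, dict] = {}
    for commit_text in commits[: last + 1]:
        for line in commit_text.splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                # A new data file becomes part of the table.
                live[action["add"]["path"]] = action["add"]
            elif "remove" in action:
                # A logically deleted file drops out of the snapshot.
                live.pop(action["remove"]["path"], None)
            # Other actions (metaData, protocol, commitInfo, ...) are ignored here.
    return live


def commit(*actions: dict) -> str:
    """Serialize actions as one newline-delimited JSON commit (hypothetical helper)."""
    return "\n".join(json.dumps(a) for a in actions)


log = [
    commit({"add": {"path": "part-000.parquet"}}, {"add": {"path": "part-001.parquet"}}),
    commit({"remove": {"path": "part-000.parquet"}}, {"add": {"path": "part-002.parquet"}}),
]

print(sorted(active_files(log)))             # ['part-001.parquet', 'part-002.parquet']
print(sorted(active_files(log, version=0)))  # ['part-000.parquet', 'part-001.parquet']
```

Reading the log directly, rather than listing the storage directory, is what lets a reader see a consistent snapshot even while other writers add and remove files.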
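
Slide 9 also credits data skipping driven by file statistics. Delta records per-file statistics (such as `minValues` and `maxValues`, serialized as a JSON string in each `add` action's `stats` field), so a reader can skip files whose value range cannot match a filter. The following is a minimal sketch of min/max pruning for a range predicate, assuming the statistics have already been read from the log. `files_to_scan` is a hypothetical helper; a real engine also uses null counts, other predicate shapes, and partition values:

```python
import json
from typing import Dict, List


def files_to_scan(files: Dict[str, dict], column: str, lo: float, hi: float) -> List[str]:
    """Min/max data skipping for the predicate `lo <= column <= hi`.

    Each value in `files` is a Delta-style `add` action whose optional "stats"
    field is a JSON string with minValues/maxValues per column.
    """
    keep: List[str] = []
    for path in sorted(files):
        raw = files[path].get("stats")
        stats = json.loads(raw) if raw else {}
        mins = stats.get("minValues", {})
        maxs = stats.get("maxValues", {})
        if column not in mins or column not in maxs:
            # No statistics for this column: the file cannot be ruled out.
            keep.append(path)
        elif maxs[column] >= lo and mins[column] <= hi:
            # The file's [min, max] range overlaps the predicate range.
            keep.append(path)
    return keep


files = {
    "a.parquet": {"path": "a.parquet",
                  "stats": json.dumps({"numRecords": 100, "minValues": {"price": 1}, "maxValues": {"price": 50}})},
    "b.parquet": {"path": "b.parquet",
                  "stats": json.dumps({"numRecords": 100, "minValues": {"price": 60}, "maxValues": {"price": 90}})},
    "c.parquet": {"path": "c.parquet"},  # written without statistics
}

print(files_to_scan(files, "price", 55, 70))  # ['b.parquet', 'c.parquet']
```

Files without statistics are always kept, because skipping is only safe when the statistics prove a file holds no matching rows.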
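
Slide 18 describes merging modified customer information into a Delta Lake table. On Databricks this is typically done with a `MERGE INTO` statement. The hypothetical in-memory helper below only illustrates the upsert semantics (update rows whose key matches, insert rows with new keys). Delta itself applies such a change by writing new data files and recording `add`/`remove` actions in the transaction log, not by editing rows in place:

```python
from typing import Dict, List


def merge_upsert(target: List[dict], source: List[dict], key: str) -> List[dict]:
    """Apply MERGE-style upsert semantics keyed on `key` (hypothetical helper).

    WHEN MATCHED THEN UPDATE: a source row overwrites the columns it carries.
    WHEN NOT MATCHED THEN INSERT: a source row with a new key is appended.
    Assumes `key` is unique within both inputs.
    """
    merged: Dict[object, dict] = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in merged:
            merged[row[key]].update(row)
        else:
            merged[row[key]] = dict(row)
    # Dicts keep insertion order: existing rows first, then newly inserted ones.
    return list(merged.values())


customers = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]
changes = [{"id": 2, "email": "b@new.example.com"}, {"id": 3, "email": "c@example.com"}]
print(merge_upsert(customers, changes, key="id"))
```

Copying rows with `dict(row)` leaves the input lists unchanged, loosely mirroring how a Delta commit leaves earlier snapshots readable.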
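
Slide 19 mentions row-level policies and column masking for different groups and users. Starburst configures these in its own access-control layer. The sketch below is only a hypothetical Python illustration of the effect (filter the rows a group may see, then mask restricted columns) and does not reflect Starburst's policy syntax or API. The `Policy` class, the group names, and the mask string are all assumptions:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class Policy:
    row_filter: Callable[[dict], bool]                     # which rows the group may see
    masked_columns: Set[str] = field(default_factory=set)  # columns replaced by a mask


# Hypothetical policies for illustration only.
POLICIES: Dict[str, Policy] = {
    "analyst": Policy(row_filter=lambda r: r["region"] == "EU", masked_columns={"email"}),
    "admin": Policy(row_filter=lambda r: True),
}


def apply_policy(rows: List[dict], group: str, mask: str = "****") -> List[dict]:
    """Return the rows visible to `group`, with masked columns replaced."""
    policy = POLICIES.get(group)
    if policy is None:
        return []  # default deny for groups without a policy
    return [
        {col: (mask if col in policy.masked_columns else val) for col, val in row.items()}
        for row in rows
        if policy.row_filter(row)
    ]


customer_rows = [
    {"id": 1, "region": "EU", "email": "a@example.com"},
    {"id": 2, "region": "US", "email": "b@example.com"},
]
print(apply_policy(customer_rows, "analyst"))  # [{'id': 1, 'region': 'EU', 'email': '****'}]
```

Denying access by default for unknown groups is a deliberate choice in this sketch: a missing policy should hide data rather than expose it.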

Uploaded by kbajda, Jun. 30, 2020
