Past, Present and Future
of Presto on Cloud
07/15/2018
Copyright 2017 © Qubole
Agenda
Past
• Presto Adoption
Present
• Presto at Qubole
• Key Use Cases
Future
• Area of Focus
• OS collaborations
Past | Looking Back at Presto Adoption
YoY Growth
6x growth in Compute hours
~ 2 Million compute hours per month
Presto surging in Growth
255% growth in Users
365% growth in Commands
101% growth in Throughput
YoY % growth (January 2017 to 2018)
Current State | Presto at Qubole
QDS Big Data Activation Platform
TCO
Interfaces
Intelligence
Auto Scaling
Spot Node
Alerts
Notebook
Analyze
REST API
ODBC/JDBC
BI Tools / Clients
Insights
Recommendations
Connectors (cross-data-source queries)
MySQL
SQL Server
Oracle
Redshift
Kinesis
Presto on Qubole
Key Use Cases
• Ad hoc Analytics
• BI Dashboard, Reporting
• Batch Workloads
• Exploratory Analytics
[Chart: use cases positioned by Expected Response Time and Typical Data Volume; axis labels Low and High]
Area of Focus
Area of Focus - Past, Current and Future Work
• Cluster Management and TCO
• Performance
• Security
Completed Work
● Self Start
● Auto Terminate
● Workload Aware Auto Scaling
● Spot Node Integration
Cluster Automation | Cloud Management and TCO
Autoscaling Savings | Cloud Management and TCO
Auto Terminate savings: 48%
Autoscaling savings: 39%
Spot Node savings: 13%
In 2017, spot instances accounted for 54% of all Amazon EC2 compute hours used,
saving an estimated $230 million in Amazon EC2 costs.
Spot instances per cluster for Presto in 2017 (chart legend: Spot Nodes, On-Demand Nodes)
29%
4.1X increase
Current Work
● Spot Node Loss:
Retry queries when a spot node is lost.
● Workflow Manager:
Predictive load management across clusters, using the
Cost Model to compute resource usage.
Future Work
● Predictive AutoScaling using Cost Model
Cluster Automation | Cloud Management and TCO
Memory Cost Model | Performance
Completed Work
● Memory Cost Model
For non-skewed data, the cost model's memory estimate is
within a factor of 2 of actual usage in the worst case.
Evaluation of Cost model on TPC-DS benchmark (scale 10000)
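The "within a factor of 2" claim can be read as a bound on the ratio between estimated and actual memory. Below is a minimal sketch of such a check. It assumes the bound is symmetric (over- and under-estimates both count); the function name and the sample numbers are hypothetical and are not taken from the Qubole evaluation.

```python
def within_factor(estimated: float, actual: float, factor: float = 2.0) -> bool:
    """Check whether an estimate is within `factor` of the actual value.

    Assumption: the bound is symmetric, so both over- and
    under-estimates are measured the same way.
    """
    if estimated <= 0 or actual <= 0:
        raise ValueError("memory values must be positive")
    # Take the larger of the two ratios so the direction of the error
    # does not matter.
    ratio = max(estimated / actual, actual / estimated)
    return ratio <= factor


# Hypothetical per-query peak memory in GB: (estimated, actual).
samples = [(12.0, 8.0), (5.0, 9.5), (30.0, 14.0)]

# Queries whose estimate falls outside the factor-of-2 bound.
violations = [pair for pair in samples if not within_factor(*pair)]
```

With these made-up samples, only the last pair (30 GB estimated against 14 GB actual) falls outside the bound.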
Dynamic Filtering and Join Reorder | Performance
Completed Work
● Dynamic Filtering and Join Reorder
Evaluation of Dynamic Filtering and Join Reordering on TPC-DS benchmark (scale 3000)
3.2X reduction in Geomean
Up to 14X performance improvement observed
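For readers unfamiliar with the two figures above, here is a minimal sketch of how a geomean reduction and a best-case ("up to") speedup are typically computed from per-query runtimes. It assumes the geomean is taken over query runtimes; the runtimes are hypothetical and are not the TPC-DS results from this evaluation.

```python
import math


def geomean(values):
    """Geometric mean of positive numbers, computed in log space."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("geomean needs a non-empty list of positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical per-query runtimes in seconds (same queries, same order).
baseline = [8.0, 2.0, 4.0]
optimized = [2.0, 2.0, 1.0]

# Reduction in geomean: ratio of the two geometric means.
reduction = geomean(baseline) / geomean(optimized)

# Per-query speedups; the "up to" figure is the largest of these.
speedups = [b / o for b, o in zip(baseline, optimized)]
best_speedup = max(speedups)
```

The ratio of the geomeans equals the geomean of the per-query speedups, so a single very fast or very slow query moves the summary figure less than it would move an arithmetic mean.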
Rubix | Performance
Completed Work
● Rubix - Cache Engine
Open sourced for Presto and Spark
Current and Future Work | Performance
Current Work
● Fast Copy – Auto Framework for Materialized Views
● Join Distribution
Future Work
● Histograms for improving Cost Model
● CPU Efficiency
Security
Completed Work
● HIPAA compliant
● Internode SSL, Dual IAM Role, VPC, Qubole ACLs
● Hive Authorization
Future Work
● Ranger support
Compliance: HIPAA, GDPR ready
Infrastructure: Dual IAM Roles, Qubole ACLs, VPC Support, Internode SSL
Physical Data Access: S3 Authentication
Logical Data Access: Hive Authorization, Ranger Support
OS Collaborations
● Presto Lens – A tool for admins to help tune Presto
● CBO – Improve Cost Model
● Cloud Specific
● S3 Optimizations like S3 Select
● Performance benchmarks for Cloud
● Integration of Product tests with S3
● Workload Management
● Failure Recovery
Questions?
Contact me: amoghm@qubole.com
Helpful Links
- Engineering Blog
https://www.qubole.com/blog/tag/presto/
- AutoScaling
https://qubole-eng.quora.com/Industry's-First-Auto-Scaling-Presto-Clusters
- Rubix
https://github.com/qubole/rubix
- Dynamic Filtering/Join Reordering
https://www.qubole.com/blog/sql-join-optimizations-qubole-presto/
- Memory Cost-Model
https://www.qubole.com/blog/memory-cost-model-qubole-presto/

Presto Summit 2018 - 10 - Qubole
