
Presto Summit 2018 - 10 - Qubole


Past, Present & Future of Presto on the Cloud (Amogh Margoor, Qubole)
Presto Summit 2018 (https://www.starburstdata.com/technical-blog/presto-summit-2018-recap/)

Published in: Data & Analytics


  1. Past, Present and Future of Presto on Cloud (07/15/2018)
  2. Agenda. Past: Presto Adoption. Present: Presto at Qubole; Key Use Cases. Future: Area of Focus; OS collaborations. (Copyright 2017 © Qubole)
  3. Past | Looking Back at Presto Adoption
  4. YoY Growth: 6x growth in compute hours; ~2 million compute hours per month.
  5. Presto surging in growth. YoY % growth (January 2017 to 2018): 255% growth in users; 365% growth in commands; 101% growth in throughput.
  6. Current State | Presto at Qubole
  7. Presto on Qubole (QDS Big Data Activation Platform). TCO: Auto Scaling, Spot Node, Alerts. Interfaces: Notebook, Analyze, REST API, ODBC/JDBC, BI Tools / Clients. Intelligence: Insights, Recommendations. Connectors (cross data sources query): MySQL, SQL Server, Oracle, Redshift, Kinesis.
  8. Key Use Cases: Ad hoc Analytics; BI Dashboard, Reporting; Batch Workloads; Exploratory Analytics. [Chart: use cases positioned by Expected Response Time and Typical Data Volume, each from Low to High.]
  9. Area of Focus
  10. Area of Focus - Past, Current and Future Work: Cluster Management and TCO; Performance; Security.
  11. Cluster Automation | Cloud Management and TCO. Completed Work: Self Start; Auto Terminate; Workload Aware Auto Scaling; Spot Node Integration.
  12. Autoscaling Savings | Cloud Management and TCO. Auto Terminate savings: 48%. Autoscaling savings: 39%. Spot Node savings: 13%.
  13. Cluster Automation | Cloud Management and TCO. In 2017, 54% of all Amazon EC2 compute hours used were spot instances, resulting in an estimated $230 million in savings of Amazon EC2 costs. [Chart: Spot instances per cluster for Presto in 2017, Spot Nodes vs. On-Demand Nodes: 29%, 4.1X increase.] Current Work: Spot Node Loss (retries of queries on spot node loss); WorkFlow Manager (predictive load managing across clusters, using the Cost Model to compute resource usage). Future Work: Predictive AutoScaling using the Cost Model.
  14. Memory Cost Model | Performance. Completed Work: Memory Cost Model. For non-skewed data, the cost model's memory estimate is within a factor of 2 of actual usage in the worst case. Evaluation of the cost model on the TPC-DS benchmark (scale 10000).
  15. Dynamic Filtering and Join Reorder | Performance. Completed Work: Dynamic Filtering and Join Reorder. Evaluation of Dynamic Filtering and Join Reordering on the TPC-DS benchmark (scale 3000): 3.2X reduction in geomean; up to 14X performance improvement observed.
  16. Rubix | Performance. Completed Work: Rubix, a cache engine open sourced for Presto and Spark.
  17. Current and Future Work | Performance. Current Work: Fast Copy (Auto Framework for Materialized Views); Join Distribution. Future Work: Histograms for improving the Cost Model; CPU Efficiency.
  18. Security. Completed Work: HIPAA compliant; Internode SSL, Dual IAM Role, VPC, Qubole ACLs; Hive Authorization. Future Work: Ranger support. Compliance: HIPAA, GDPR ready. Infrastructure: Dual IAM Roles, Qubole ACLs, VPC Support, Internode SSL. Physical Data Access: S3 Authentication. Logical Data Access: Hive Authorization, Ranger Support.
  19. OS Collaborations: Presto Lens (a tool for admins to help tune Presto); CBO (improve Cost Model); Cloud Specific; S3 optimizations like S3 Select; Performance benchmarks for Cloud; Integration of Product tests with S3; Workload Management; Failure Recovery.
  20. Questions? Contact me: amoghm@qubole.com
  21. Helpful Links. Engineering Blog: https://www.qubole.com/blog/tag/presto/ ; AutoScaling: https://qubole-eng.quora.com/Industry's-First-Auto-Scaling-Presto-Clusters ; Rubix: https://github.com/qubole/rubix ; Dynamic Filtering/Join Reordering: https://www.qubole.com/blog/sql-join-optimizations-qubole-presto/ ; Memory Cost-Model: https://www.qubole.com/blog/memory-cost-model-qubole-presto/
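
Slides 14 and 15 quote two evaluation figures: a memory estimate "within a factor of 2" of actual usage, and a "reduction in geomean" on TPC-DS. The slides do not show how these were computed, so the following is only a minimal sketch of how such metrics are commonly derived. It assumes the geomean is taken over per-query runtimes; the function names and the sample numbers are hypothetical and are not Qubole's data or code.

```python
import math


def geomean(values):
    """Geometric mean of positive numbers, computed in log space."""
    if not values:
        raise ValueError("need at least one value")
    if any(v <= 0 for v in values):
        raise ValueError("values must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def geomean_reduction(baseline_secs, optimized_secs):
    """Ratio of baseline geomean runtime to optimized geomean runtime."""
    return geomean(baseline_secs) / geomean(optimized_secs)


def within_factor(estimate, actual, factor=2.0):
    """True if estimate lies between actual / factor and actual * factor."""
    ratio = estimate / actual
    return 1.0 / factor <= ratio <= factor


# Hypothetical per-query runtimes in seconds (illustration only).
baseline = [8.0, 27.0, 64.0]
optimized = [1.0, 8.0, 27.0]

print(f"geomean reduction: {geomean_reduction(baseline, optimized):.2f}x")
print("estimate within 2x:", within_factor(estimate=150.0, actual=100.0))
```

A geometric mean is the usual choice for benchmark suites because a single long-running query does not dominate the summary the way it would with an arithmetic mean, which is also why the geomean figure and the best per-query improvement can differ widely.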
