Presto SQL at Wayfair meetup presentation


Presto SQL at Wayfair meetup in Boston

Presto has become the essential tool for data scientists and analysts at Wayfair, even though it is relatively new here: it was implemented about a year ago and has been actively used for the past 6 months. Attend this session to understand why Wayfair decided to implement Presto, how we architected our cluster, the configuration choices we made, some common issues we faced, and where we are heading next. Breaking down our problems and our approach with Presto will help describe some of the challenges we face and provide color on the decisions we’ve made.

Speaker Bio:
Krishna Ravishankar is a DevOps Engineer working on the BigData and Messaging Platforms at Wayfair. He joined the team about 2 years ago with a little knowledge of how distributed systems work, and is now one of the leads for some of the platforms (Presto, Kafka) the team owns. Krishna was involved in bringing Presto into Wayfair: he designed and implemented a production Presto cluster that is now used by hundreds of users running ad-hoc queries. He is currently involved in building a new read/write cluster using the Starburst distribution.

Published in: Data & Analytics

Presto SQL at Wayfair meetup presentation

  1. Presto at Wayfair. Krishna Ravishankar
  2. Agenda: Problem Statement • Why use Presto? • Presto at Wayfair • Presto Clients • Presto adoption • Moving towards • Monitoring • Q/A
  3. Problem Statement
  4. Remedies: 1. Optimize Hive queries 2. Setting up queues to prioritize batch jobs 3. Throttle users to 2 ad-hoc queries 4. Move jobs from Hive to Spark 5. Conduct SME training sessions for both Hive and Spark
  5. Why Presto? ● It’s VERY fast! ● It saves Hadoop resources: by using Presto, you enable more development work to be done as other teams test their PySpark pipelines on the cluster ● Unlike Spark, which requires more expertise and setup, Presto is quick to set up ● You can combine data sources in different places (SQL and Hive data in one place)
  6. Presto at Wayfair
  7. Presto at Wayfair: Presto ad-hoc (read-only cluster) • 301 VMs (8*64) with 1 coordinator and 300 workers • Total available memory: 20 TB • Total CPU available: 2800 vcores • Presto CLI
  8. Adoption - before: 140K queries, 80K queries, 40%
  9. Adoption - after
  10. Query Throttling ● SELECT only ● 2 queries per user ● 2 queued queries per user ● Increased the time limit from 5 to 10 mins ● Avg execution time: 51 sec
  11. Moving towards Starburst
  12. OSS Presto vs Starburst (chart labels: Starburst, Presto open source). Note: 1. CBO was turned on manually on OSS Presto 2. Starburst has CBO turned on by default 3. CBO improved query performance by 3-10X
  13. Monitoring Presto
  14. What’s Next? 1. Migrating jobs 2. Upgrading our existing Presto cluster to use the Starburst distribution 3. BigQuery vs Presto
  15. THANK YOU. Questions? Krishna Ravishankar, DevOps Engineer
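
The query-throttling rules on slide 10 (SELECT only, 2 running and 2 queued queries per user, a 10-minute time limit) can be illustrated with a small admission-policy sketch. This is not how Presto or the Wayfair cluster enforces these limits; the slides do not say which mechanism was used. The class `AdHocThrottle`, its methods, and the returned status strings are hypothetical names chosen for illustration only.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Limits taken from slide 10; everything else here is an assumption.
MAX_RUNNING_PER_USER = 2        # "2 queries per user"
MAX_QUEUED_PER_USER = 2         # "2 queued queries per user"
MAX_RUNTIME_SECONDS = 10 * 60   # time limit raised "from 5 to 10 mins"


@dataclass
class UserState:
    running: list = field(default_factory=list)
    queued: list = field(default_factory=list)


class AdHocThrottle:
    """Hypothetical per-user admission policy (illustration only)."""

    def __init__(self):
        self.users = defaultdict(UserState)

    def submit(self, user, sql):
        # Read-only cluster: accept SELECT statements only.
        # Simplification: a real check would also allow e.g. WITH ... SELECT.
        if not sql.lstrip().upper().startswith("SELECT"):
            return "REJECTED_NOT_SELECT"
        state = self.users[user]
        if len(state.running) < MAX_RUNNING_PER_USER:
            state.running.append(sql)
            return "RUNNING"
        if len(state.queued) < MAX_QUEUED_PER_USER:
            state.queued.append(sql)
            return "QUEUED"
        return "REJECTED_QUEUE_FULL"

    def finish(self, user, sql):
        # Free a running slot and promote the oldest queued query, if any.
        state = self.users[user]
        state.running.remove(sql)
        if state.queued:
            state.running.append(state.queued.pop(0))


def exceeds_time_limit(elapsed_seconds):
    # True when a query has run longer than the 10-minute limit.
    return elapsed_seconds > MAX_RUNTIME_SECONDS
```

In this sketch a user's third and fourth concurrent queries are queued, a fifth is rejected, and the limits apply per user, so one heavy user cannot occupy slots that belong to others.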