Presto SQL at Wayfair meetup in Boston
Presto has become the essential tool for data scientists and analysts at Wayfair, even though it is relatively new here: it was implemented about a year ago and has been in active use for the past 6 months. Attend this session to understand why Wayfair decided to implement Presto, how we architected our cluster, the configuration choices we made, some common issues we faced, and where we are heading next. Breaking down our problems and our approach with Presto will help describe some of the challenges we face and provide color on the decisions we’ve made.
Krishna Ravishankar is a DevOps Engineer working on the BigData and Messaging Platforms at Wayfair. He joined the team about 2 years ago with only a little knowledge of how distributed systems work; now Krishna is one of the leads for some of the platforms the team owns (Presto, Kafka). Krishna was involved in bringing Presto into Wayfair: he designed and implemented a production Presto cluster that is now used by hundreds of users running ad-hoc queries. He is currently building a new read/write cluster using the Starburst distro.
1. Optimize Hive queries
2. Set up queues to prioritize batch jobs
3. Throttle users to 2 ad-hoc queries
4. Move jobs from Hive to Spark
5. Conduct SME training session for both Hive and
Why Presto?
● It’s VERY fast!
● It saves Hadoop resources: by using Presto, you enable more development work to
be done as other teams test their PySpark pipelines on the cluster
● Unlike Spark, which requires more expertise and setup, Presto is quick to set up.
● You can combine data sources in different places (SQL and Hive data in one place)
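
To make the last point concrete, here is a minimal sketch of a query that joins Hive data with data in a SQL database in a single Presto statement. Presto addresses tables as catalog.schema.table, which is what lets one query span two systems. The catalog names (hive, sqlserver), the table names, and the columns (event_count, customer_name) are hypothetical, chosen only for illustration; they are not Wayfair's actual setup.

```python
def federated_query(hive_table: str, sql_table: str, join_key: str) -> str:
    """Build a Presto query that joins a Hive table to a SQL Server table.

    Assumes two catalogs named `hive` and `sqlserver` are configured on the
    cluster (hypothetical names), and that both tables share `join_key`.
    """
    return (
        f"SELECT h.{join_key}, h.event_count, s.customer_name\n"
        f"FROM hive.{hive_table} AS h\n"
        f"JOIN sqlserver.{sql_table} AS s\n"
        f"  ON h.{join_key} = s.{join_key}"
    )


# Hypothetical schema.table names for each side of the join.
query = federated_query("web.daily_events", "dbo.customers", "customer_id")
print(query)
```

The resulting string can be submitted through whichever Presto client is already in use; the point is only that both sources appear in the same FROM/JOIN clause.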