Journey and evolution of Presto@Grab

About the Speaker
Rahul Penti
2017 - Graduated from IIT Kharagpur
2017 - Lead SDE (Data) @ Juspay
2018 - Data Engineer @ Grab

Redshift
● 8 × ds2.8xlarge nodes
● Workload Management
● Peak-hour performance
● SQL Optimisation
● Access Control

● 125k+ Tables
● Over 6 million partitions
● 500+ Unique Users

Presto
● Enables a Multi-Cluster Architecture
● In-Memory, Parallel-Processing SQL Query Engine
● ANSI SQL
● Open Source
● Workload Management

Current Scenario
● 40 Presto Clusters
● Master Node: r5a
● Worker Nodes: r4.8xlarge
● Version: 0.193
● Serving ad hoc SQL queries and ETL jobs
● 200k+ Queries

Query Sources
● Python / R
● IDEs: JDBC
● Tableau: ODBC Driver
● Holistics
● Internal Platforms

Datagateway
● Single entry point for all queries
● Authentication and authorisation service
● Grants access to schemas, tables and clusters
● Integrated with Presto's API endpoints
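The gateway's job can be sketched as "authorise, then route". The following is a minimal, hypothetical Python sketch, not Grab's implementation: the Grants structure, the DataGateway class, the user, schema and cluster names and the coordinator URL are all illustrative assumptions. It checks cluster access first, then table access (a schema-level grant covers every table in that schema), and only then returns the Presto coordinator to forward the query to.

```python
from dataclasses import dataclass, field


@dataclass
class Grants:
    """Hypothetical per-user grants held by the gateway."""
    clusters: set[str] = field(default_factory=set)
    schemas: set[str] = field(default_factory=set)  # schema-level grants
    tables: set[str] = field(default_factory=set)   # "schema.table" grants


class DataGateway:
    """Single entry point: authorise a request, then route it to a Presto cluster."""

    def __init__(self, grants: dict[str, Grants], cluster_urls: dict[str, str]):
        self.grants = grants
        self.cluster_urls = cluster_urls

    def route(self, user: str, cluster: str, tables: list[str]) -> str:
        """Return the coordinator URL to forward to, or raise PermissionError."""
        g = self.grants.get(user)
        if g is None:
            raise PermissionError(f"unknown user {user!r}")
        if cluster not in g.clusters:
            raise PermissionError(f"{user} has no access to cluster {cluster!r}")
        for table in tables:
            schema = table.split(".", 1)[0]
            # A schema-level grant covers every table inside that schema.
            if schema not in g.schemas and table not in g.tables:
                raise PermissionError(f"{user} has no access to table {table!r}")
        return self.cluster_urls[cluster]


if __name__ == "__main__":
    gw = DataGateway(
        grants={"alice": Grants(clusters={"adhoc"}, schemas={"finance"}, tables={"ops.trips"})},
        cluster_urls={"adhoc": "http://adhoc-coordinator:8080"},  # hypothetical URL
    )
    print(gw.route("alice", "adhoc", ["finance.payments", "ops.trips"]))
```

In this sketch the table list is passed in by the caller; a real gateway would have to derive it from the SQL text or from Presto itself.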

Issues on EMR
● Configuration Changes
● Cluster Administration

Qubole Migration
● Spot Nodes Usage
● Auto Scaling
● Easier Cluster Administration

Benchmarking
Two types of tests:
● Functional Tests
● Performance Tests
In the performance tests, we simulated cluster workloads to measure the performance improvements.

Resource Groups
● Initially used queue configs to apply concurrency limits based on the query source
● Currently enforce concurrency limits with resource groups, with a separate resource allocation for exploratory queries
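Presto's file-based resource group manager reads a JSON file of nested groups (each with soft memory and hard concurrency limits) plus an ordered list of selectors; the first selector whose regex conditions match routes the query into a group. The sketch below builds such a config as a Python dict and mimics the first-match selector rule. The group names, limits and source patterns (airflow, tableau, holistics) are hypothetical examples, not Grab's actual values; enabling the file manager also needs a resource-groups.properties file pointing at the JSON, which is not shown.

```python
import json
import re
from typing import Optional

# Hypothetical resource-groups.json content: one root group split into
# ETL, dashboard and exploratory sub-groups with their own limits.
CONFIG = {
    "rootGroups": [
        {
            "name": "global",
            "softMemoryLimit": "100%",
            "hardConcurrencyLimit": 60,
            "maxQueued": 1000,
            "subGroups": [
                {"name": "etl", "softMemoryLimit": "50%",
                 "hardConcurrencyLimit": 30, "maxQueued": 500},
                {"name": "dashboards", "softMemoryLimit": "30%",
                 "hardConcurrencyLimit": 20, "maxQueued": 300},
                {"name": "exploratory", "softMemoryLimit": "20%",
                 "hardConcurrencyLimit": 10, "maxQueued": 100},
            ],
        }
    ],
    # Evaluated top to bottom; the first match wins.
    "selectors": [
        {"source": "airflow.*", "group": "global.etl"},
        {"source": "(tableau|holistics).*", "group": "global.dashboards"},
        {"group": "global.exploratory"},  # catch-all: no conditions
    ],
}


def select_group(source: str, user: str = "") -> Optional[str]:
    """Return the group of the first selector whose regexes fully match."""
    for sel in CONFIG["selectors"]:
        if "source" in sel and not re.fullmatch(sel["source"], source):
            continue
        if "user" in sel and not re.fullmatch(sel["user"], user):
            continue
        return sel["group"]
    return None


if __name__ == "__main__":
    print(json.dumps(CONFIG, indent=2))  # content for the resource groups JSON file
```

Ending the selector list with an unconditional catch-all is one way to give exploratory queries their own allocation without letting them fall through unassigned.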

Task Writers
● Number of concurrent writer threads per query per worker
● Default: 1
● Current: 8
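Because the setting is per query per worker, its effect scales with cluster size. A back-of-envelope sketch, assuming the Presto property is task.writer-count and using a hypothetical 20-worker cluster, shows the upper bound on writer threads a single write query can use cluster-wide at the default value versus the current one.

```python
def max_writer_threads(workers: int, writer_count: int) -> int:
    """Upper bound on writer threads across the cluster for one write query."""
    if workers < 1 or writer_count < 1:
        raise ValueError("workers and writer_count must be positive")
    return workers * writer_count


def render_property(writer_count: int) -> str:
    """config.properties line for the chosen value (property name assumed)."""
    return f"task.writer-count={writer_count}"


if __name__ == "__main__":
    workers = 20  # hypothetical cluster size
    for count in (1, 8):  # default vs current, per the slide
        print(render_property(count), "->", max_writer_threads(workers, count))
```

With 20 workers this gives 20 writer threads at the default and 160 at the current value; if each writer thread produces its own output file, the number of files written per query grows by the same factor.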

Client Timeout
● Duration after which a query times out if the client has not polled it
● The query is then regarded as abandoned
● Default: 2 mins
● Current: 10 mins (30 mins)
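The abandonment rule compares the time since the client last polled a query against the client timeout (we believe the Presto property is query.client.timeout; check it for your version). The sketch below is a simplified model of that check, not Presto's code; RunningQuery and abandoned are hypothetical names, and the 10-minute and 2-minute values are the current and default settings from the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

CLIENT_TIMEOUT = timedelta(minutes=10)  # "Current" value from the slide


@dataclass
class RunningQuery:
    query_id: str
    last_heartbeat: datetime  # last time the client polled for results


def abandoned(queries: list[RunningQuery], now: datetime,
              timeout: timedelta = CLIENT_TIMEOUT) -> list[str]:
    """IDs of queries whose client has been silent for longer than `timeout`."""
    return [q.query_id for q in queries if now - q.last_heartbeat > timeout]


if __name__ == "__main__":
    now = datetime(2019, 1, 1, 12, 0)
    queries = [
        RunningQuery("q1", now - timedelta(minutes=5)),
        RunningQuery("q2", now - timedelta(minutes=15)),
    ]
    print(abandoned(queries, now))                        # ['q2']
    print(abandoned(queries, now, timedelta(minutes=2)))  # ['q1', 'q2']
```

A client that pauses for five minutes between polls keeps its query alive under the 10-minute setting, but would lose it under the 2-minute default.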

Query History
● Number of completed queries retained by the Presto UI for lookback
● Default: 100
● Current: 900 (90k)
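The history limit behaves like a bounded buffer: once it is full, each newly completed query pushes out the oldest one. The sketch below models only that count-based limit (in Presto this is query.max-history, default 100); it ignores the age-based expiry Presto also applies, and QueryHistory is a hypothetical name.

```python
from collections import deque


class QueryHistory:
    """Keeps only the most recent `max_history` completed queries."""

    def __init__(self, max_history: int = 100):  # 100 mirrors the slide's default
        self._entries = deque(maxlen=max_history)

    def record(self, query_id: str) -> None:
        # Once the deque is full, appending silently drops the oldest entry.
        self._entries.append(query_id)

    def lookback(self) -> list[str]:
        return list(self._entries)


if __name__ == "__main__":
    history = QueryHistory(max_history=3)
    for i in range(5):
        history.record(f"q{i}")
    print(history.lookback())  # ['q2', 'q3', 'q4']
```

On a busy cluster a limit of 100 can cover only a few minutes of traffic, which is why raising it widens the lookback window in the UI.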

Event Listener
● Supports custom plugins that are invoked on
○ Query creation
○ Query completion
○ Split completion
● Enables us to log every query submitted to the cluster, along with important metrics that we use to fine-tune the configuration further
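Presto's event listener plugins implement a Java SPI with queryCreated, queryCompleted and splitCompleted callbacks. The following is a Python analogue of a query-logging listener, written only to show the shape of such a plugin. The event fields are simplified, hypothetical stand-ins for Presto's richer event objects, and the listener collects JSON lines in memory where a real plugin would ship them to a log store.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class QueryCompletedEvent:
    """Simplified stand-in for Presto's query-completed event."""
    query_id: str
    user: str
    source: str
    sql: str
    wall_time_ms: int
    peak_memory_bytes: int
    state: str  # e.g. "FINISHED" or "FAILED"


class QueryLogListener:
    """Collects one JSON line per event for later analysis."""

    def __init__(self):
        self.lines: list[str] = []

    def query_created(self, query_id: str, user: str, sql: str) -> None:
        self.lines.append(json.dumps(
            {"event": "created", "query_id": query_id, "user": user, "sql": sql}))

    def query_completed(self, event: QueryCompletedEvent) -> None:
        # Completion carries the metrics used to tune the configuration.
        self.lines.append(json.dumps({"event": "completed", **asdict(event)}))

    def split_completed(self, query_id: str, cpu_time_ms: int) -> None:
        self.lines.append(json.dumps(
            {"event": "split", "query_id": query_id, "cpu_time_ms": cpu_time_ms}))
```

For example, after query_created("q1", "alice", "SELECT 1") and a matching query_completed call, the listener holds two JSON lines that can be joined on query_id.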

Presto Test Suite
● Internal platform used to analyse query performance
● Enables us to quantify the impact of various config changes
● Simulates the cluster workload
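Quantifying a config change comes down to replaying the same workload before and after the change and comparing latency distributions rather than single runs. This minimal sketch compares nearest-rank percentiles of two latency samples. The function names and the sample numbers are made up for illustration; they are not results from Grab's test suite.

```python
import math


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile, for p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def compare(baseline: list[float], candidate: list[float],
            percentiles: tuple = (50, 95)) -> dict[str, float]:
    """Percent change per percentile; negative means the candidate is faster."""
    report = {}
    for p in percentiles:
        b, c = percentile(baseline, p), percentile(candidate, p)
        report[f"p{p}"] = round((c - b) / b * 100, 1)
    return report


if __name__ == "__main__":
    baseline = [10, 12, 14, 20, 40]   # illustrative query latencies (s)
    candidate = [8, 10, 11, 16, 30]
    print(compare(baseline, candidate))  # {'p50': -21.4, 'p95': -25.0}
```

Reporting tail percentiles alongside the median matters here because a config change can help typical queries while hurting the slowest ones.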

Query Analysis
● Helps us identify table usage patterns
● Tracks table lineage and join patterns
● Recreates the query plan to understand query execution
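A first pass at table usage and join patterns can be mined from the logged SQL text. The sketch below is deliberately naive and hypothetical: it uses a regex, which is not a SQL parser and will miss or misread CTE names, subqueries and quoted identifiers. It also approximates "join patterns" as pairs of tables referenced in the same query. A production pipeline would rely on a real parser or on Presto's own query plan.

```python
import re
from collections import Counter
from itertools import combinations

# Table names that follow FROM or JOIN, optionally catalog/schema-qualified.
TABLE_REF = re.compile(r"\b(?:from|join)\s+([a-z_]\w*(?:\.[a-z_]\w*){0,2})",
                       re.IGNORECASE)


def tables_in(sql: str) -> list[str]:
    return [t.lower() for t in TABLE_REF.findall(sql)]


def usage_stats(queries: list[str]) -> tuple[Counter, Counter]:
    """Count how often each table is used and which tables appear together."""
    table_counts: Counter = Counter()
    join_pairs: Counter = Counter()
    for sql in queries:
        tables = sorted(set(tables_in(sql)))
        table_counts.update(tables)
        join_pairs.update(combinations(tables, 2))  # co-occurring table pairs
    return table_counts, join_pairs


if __name__ == "__main__":
    logged = [
        "SELECT * FROM ops.trips t JOIN ops.drivers d ON t.driver_id = d.id",
        "select count(*) from ops.trips",
    ]
    counts, pairs = usage_stats(logged)
    print(counts.most_common(), pairs.most_common())
```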

● Combine clusters but ensure workload isolation
● Support custom configurations
