Presto @ Uber
Uber’s Mission
Transportation as reliable as running water,
everywhere, for everyone
400+ Cities 69 Countries
And growing ...
Agenda
● Data Platform @ Uber
● SQL on Hadoop
● Presto
● Parquet
● Roadmap
Data@Uber in 2015
● Data producers: Kafka, Schemaless, MySQL, Postgres
● Data lands in the Hadoop Distributed File System (HDFS)
● ETL jobs load data from HDFS into several commercial database clusters
● Data consumers: ad hoc queries, reports, machine learning jobs
● Petabyte-scale Hadoop cluster spanning hundreds of nodes
● ~10 TB of data ingested into Hadoop daily
● ~500 raw datasets
Pain Points
● Data is not queryable until it is loaded into the commercial DB
● A single commercial DB cluster is limited to ~32 nodes
● Data in the commercial DB <<< data in HDFS
● Hive on the PB-scale Hadoop warehouse is **SLOW**
SQL On Hadoop
● Data from Kafka, Schemaless, and MySQL/Postgres lands in the Hadoop Distributed File System (HDFS)
● Batch jobs run through Hive
● Interactive queries run through Presto and Janus
● Applications sit on top of both paths
Solution
● Data is not queryable until it is loaded into the commercial DB
○ Run SQL directly on Hadoop
● A single commercial DB cluster is limited to ~32 nodes
○ SQL on Hadoop scales to thousands of machines
● Data in the commercial DB <<< data in HDFS
○ HDFS holds all the data
● Hive on the PB-scale Hadoop warehouse is **SLOW**
○ Try Presto
What is Presto
Distributed SQL engine for Hadoop
Fast
Scalable
ANSI SQL
Open Source
Extensible
Background
● Facebook internal users wanted to run SQL on Hadoop
● Hive in production since 2008
● Needed a faster SQL engine
● 2013: Presto in production at Facebook
● 2014: Presto in production at Netflix
● 2016: Presto in production at Uber
● Presto + Hive = SQL on Hadoop
(Image: F-22 and F-35 fighter aircraft)
How Presto Works
● Client submits a query to the Coordinator
● Coordinator: Parser → Optimizer → Fragmenter → Scheduler
● Workers: Table Scan over Parquet files on the file system → Partial Aggregation
● One Worker runs the Final Aggregation and streams results back to the Client
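The scan → partial aggregation → final aggregation flow above can be sketched in a few lines of Python. This is a toy illustration of the scatter/gather pattern, not Presto's actual operators; the grouped count and the `city` column are made up for the example.

```python
from collections import Counter

def partial_aggregation(rows):
    """Each worker pre-aggregates over its own table-scan split."""
    counts = Counter()
    for row in rows:
        counts[row["city"]] += 1
    return counts

def final_aggregation(partials):
    """One worker merges all partial results into the final answer."""
    total = Counter()
    for p in partials:
        total.update(p)
    return dict(total)

# Two workers, each scanning one split of the table:
split1 = [{"city": "SF"}, {"city": "NYC"}, {"city": "SF"}]
split2 = [{"city": "NYC"}, {"city": "SF"}]

partials = [partial_aggregation(s) for s in (split1, split2)]
assert final_aggregation(partials) == {"SF": 3, "NYC": 2}
```

Because each worker only ships its small pre-aggregated map rather than raw rows, the final aggregation step stays cheap even when the scanned data is large.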
Why Presto is Fast
● Data stays in memory during execution
● Pipelining and streaming
● Columnar storage & execution
● Bytecode generation
○ Inline virtual function calls
○ Inline constants
○ Rewrite inner loops
○ Rewrite type-specific branches
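To make "columnar storage & execution" concrete, here is a toy contrast between row-at-a-time and columnar evaluation. Presto's real operators work on column `Block` objects with generated bytecode; this sketch only shows the access-pattern difference, with made-up columns `a` and `b`.

```python
rows = [{"a": 1, "b": 10}, {"a": 2, "b": 20}, {"a": 3, "b": 30}]

# Row-oriented: every row object is touched, even for unused fields.
def sum_a_rows(rows):
    return sum(row["a"] for row in rows)

# Columnar: each column is stored contiguously; an operator reads
# only the column it needs and runs a tight loop over one array.
columns = {"a": [1, 2, 3], "b": [10, 20, 30]}

def sum_a_columnar(columns):
    total = 0
    for v in columns["a"]:   # tight inner loop over a single column
        total += v
    return total

assert sum_a_rows(rows) == sum_a_columnar(columns) == 6
```

The columnar loop touches only column `a`, which is also what makes the inner-loop rewriting and inlining listed above pay off.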
How Presto Manages Resources
● CPU Management
○ Priority queues
○ Short-running queries get higher priority
● Memory Management
○ Max memory per query per node
○ A query that hits its memory limit fails; the Presto process keeps running
● Concurrency Management
○ Queueing: per-user cap on concurrent running queries
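The per-user concurrency cap can be sketched as a small admission queue. This is a hypothetical illustration of the queueing rule above, not Presto's resource-group implementation; class and method names are invented.

```python
class QueryQueue:
    def __init__(self, max_concurrent_per_user):
        self.max_per_user = max_concurrent_per_user
        self.running = {}   # user -> number of running queries
        self.waiting = []   # queued (user, query) pairs, FIFO

    def submit(self, user, query):
        """Admit immediately if the user is under the cap, else queue."""
        if self.running.get(user, 0) < self.max_per_user:
            self.running[user] = self.running.get(user, 0) + 1
            return "running"
        self.waiting.append((user, query))
        return "queued"

    def finish(self, user):
        """Release a slot, then admit the first eligible waiting query."""
        self.running[user] -= 1
        for i, (u, q) in enumerate(self.waiting):
            if self.running.get(u, 0) < self.max_per_user:
                self.waiting.pop(i)
                self.running[u] = self.running.get(u, 0) + 1
                return q
        return None

q = QueryQueue(max_concurrent_per_user=2)
assert q.submit("alice", "q1") == "running"
assert q.submit("alice", "q2") == "running"
assert q.submit("alice", "q3") == "queued"   # over alice's cap
assert q.submit("bob", "q4") == "running"    # bob has his own budget
assert q.finish("alice") == "q3"             # freed slot admits q3
```

The key property is isolation: one user saturating their own budget cannot starve other users' queries.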
Limitations
● No fault tolerance
○ Applications have to retry if a query fails
● Joins that do not fit in memory
○ The join fails
○ The Presto worker process continues serving other queries
○ Fall back to Hive for such joins
● The Coordinator is a single point of failure
Deployment
● ~200-node Presto cluster
● ~30K queries per day
● Serving ad hoc SQL queries
● Serving real-time applications
In Summary

|  | Commercial Database | Presto | SparkSQL | Hive |
| --- | --- | --- | --- | --- |
| Performance | Fast | Fast | Not as fast as Presto | Not fast |
| Open Source | No | Yes | Yes | Yes |
| Warehouse Size | 100s of TB | PB scale | PB scale | PB scale |
| SQL Support | ANSI SQL | ANSI SQL | HiveQL | HiveQL |
| Nested Schema | No | Yes | Yes | Yes |
| User Defined Functions | Has its own UDFs; third-party GeoSpatial functions available | Has its own built-in functions; GeoSpatial functions implemented | Supports UDFs; third-party GeoSpatial functions available | Supports UDFs; third-party GeoSpatial functions available |
| Memory | Query rejected if it requests more than the memory cap | Cannot handle huge joins if the hash bucket hits the memory cap | Spills to disk for big joins | Spills to disk for big joins |
Parquet
Parquet Improvements
Query: SELECT A, B FROM T WHERE C = 10;
● Predicate Pushdown
○ Column C stats [ min: 5, max: 8 ] → 10 is outside the range → skip this row group
● Dictionary Pushdown
○ Column C stats [ min: 5, max: 20 ] → 10 is in range, but the dictionary page [ 5, 9, 12, 17, 20 ] does not contain 10 → skip this row group
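The two skipping rules can be combined into a single check per row group. This is a simplified sketch of the idea, not the Parquet reader's actual API; field names are illustrative.

```python
def can_skip_row_group(stats, dictionary, value):
    """Decide whether a row group can be skipped for `column = value`."""
    # Predicate pushdown: value outside the [min, max] stats -> skip.
    if value < stats["min"] or value > stats["max"]:
        return True
    # Dictionary pushdown: stats overlap, but the dictionary page lists
    # every distinct value in the row group -- if absent, skip anyway.
    if dictionary is not None and value not in dictionary:
        return True
    return False

# Row group 1: stats [min: 5, max: 8] -> 10 is out of range, skipped.
assert can_skip_row_group({"min": 5, "max": 8}, None, 10)

# Row group 2: stats [min: 5, max: 20] cover 10, but the dictionary
# page [5, 9, 12, 17, 20] has no 10 -> still skipped.
assert can_skip_row_group({"min": 5, "max": 20}, [5, 9, 12, 17, 20], 10)

# A row group whose dictionary does contain 10 must be read.
assert not can_skip_row_group({"min": 5, "max": 20}, [5, 9, 10, 12], 10)
```

Dictionary pushdown is strictly stronger than min/max pruning for equality predicates, since the dictionary enumerates the exact values present.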
Parquet Improvements
Query: SELECT A, B FROM T WHERE C = 10;
● Lazy Reads
○ Read column C first
○ No need to read A and B at all if no row matches C = 10
● Columnar Reads
○ Build Presto blocks for each column
○ Not reading row by row
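The lazy-read idea above can be sketched with in-memory columns: evaluate the filter column first, and only materialize the projected columns for matching positions. This is a toy model; Presto and Parquet apply it per row group, and the column contents here are made up.

```python
columns = {
    "A": [1, 2, 3, 4],
    "B": ["w", "x", "y", "z"],
    "C": [5, 9, 12, 17],
}

def lazy_select(columns, value):
    """SELECT A, B FROM T WHERE C = value, reading C first."""
    matches = [i for i, c in enumerate(columns["C"]) if c == value]
    if not matches:
        return []          # columns A and B are never read at all
    # Columnar read: materialize only the matching positions.
    return [(columns["A"][i], columns["B"][i]) for i in matches]

assert lazy_select(columns, 10) == []        # no match -> A, B untouched
assert lazy_select(columns, 9) == [(2, "x")]
```

When the predicate is selective, this avoids decoding most of the projected columns, which is where the speedup comes from.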
Roadmap
● Schema Evolution
● GeoSpatial SQL support
● Parquet Performance Improvements:
○ Nested Column Pruning
○ Predicate Pushdown & Dictionary Pushdown
○ Lazy Reads & Columnar Reads
Thanks
