Level 101 for Presto: What is PrestoDB?

Level 101 for Presto
SQL on Everything
Part 1 of the Tech Talk Series for Presto
What is PrestoDB?
What’s the difference?
Beinan Wang
Sr. Software Engineer, Twitter
Dipti Borkar
Co-Founder & CPO, Ahana

Presto 101 Outline
● What is Presto?
● How are we using Presto?
● What made Presto different?
○ Scalable architecture
○ Flexible Connectors
○ Performance
● The life of a query

What is Presto?
● Distributed SQL query engine
○ ANSI SQL on Hadoop, Kafka, Druid etc.
○ Designed to be interactive
○ Access to petabytes of data
● Open source, hosted on GitHub
○ https://github.com/prestodb
● Open question:
○ Is Presto a database?

How are we using Presto?
● Ad hoc queries
● BI tools
● Dashboards
● A/B testing
● ETL / scheduled jobs
● Online service *

What made Presto different?
● Scalable architecture
● Pluggable Connectors
● Performance

Scalable architecture
● Two roles -- coordinator and worker
● Easy to scale up and scale down
○ Scale up to 1000 workers*
○ Suited to web-scale companies
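A toy model of the two roles may help: the coordinator plans the work and hands out splits (chunks of input data), and workers only execute what they are given, so scaling out means adding workers. This is a minimal sketch, assuming a hypothetical round-robin assignment; the class and method names are illustrative only, and real Presto split scheduling is more involved.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Worker:
    """A worker node: it only executes the splits it is handed."""
    name: str
    assigned: List[str] = field(default_factory=list)


@dataclass
class Coordinator:
    """Plans work and hands splits (chunks of input data) to workers."""
    workers: List[Worker] = field(default_factory=list)

    def add_worker(self, worker: Worker) -> None:
        # Scaling up: a new worker becomes eligible at the next schedule.
        self.workers.append(worker)

    def remove_worker(self, name: str) -> None:
        # Scaling down: the worker is no longer given new splits.
        self.workers = [w for w in self.workers if w.name != name]

    def schedule(self, splits: List[str]) -> Dict[str, List[str]]:
        # Hypothetical round-robin assignment; real scheduling is more involved.
        if not self.workers:
            raise RuntimeError("no workers available")
        for w in self.workers:
            w.assigned = []
        for i, split in enumerate(splits):
            self.workers[i % len(self.workers)].assigned.append(split)
        return {w.name: w.assigned for w in self.workers}


coordinator = Coordinator([Worker("worker-0"), Worker("worker-1")])
print(coordinator.schedule(["split-0", "split-1", "split-2", "split-3"]))
# {'worker-0': ['split-0', 'split-2'], 'worker-1': ['split-1', 'split-3']}
coordinator.add_worker(Worker("worker-2"))  # scale up; scheduling logic is unchanged
print(coordinator.schedule(["split-0", "split-1", "split-2", "split-3"]))
```

Keeping workers stateless with respect to planning is what makes adding or removing them cheap in this model.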

Presto Connector Data Model
● Connector: Driver for a data source.
○ Example: HDFS, Cassandra, Kafka, SQL Server
● Catalog: Contains schemas from a data source, served by a connector.
● Schema: Namespace to organize tables.
● Table: Set of unordered rows organized into typed columns.
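In Presto, a table is addressed by a fully qualified name of the form catalog.schema.table, and the catalog determines which connector handles it. The sketch below mirrors that hierarchy under stated assumptions: the `Connector` interface, `InMemoryConnector`, and `Catalogs` are hypothetical illustrations, not Presto's actual Java SPI.

```python
from typing import Dict, List, Protocol, Tuple


class Connector(Protocol):
    """Hypothetical driver interface for one kind of data source."""

    def list_schemas(self) -> List[str]: ...
    def list_tables(self, schema: str) -> List[str]: ...


class InMemoryConnector:
    """Toy connector backed by a dict: {schema: {table: rows}}."""

    def __init__(self, data: Dict[str, Dict[str, list]]) -> None:
        self._data = data

    def list_schemas(self) -> List[str]:
        return sorted(self._data)

    def list_tables(self, schema: str) -> List[str]:
        return sorted(self._data.get(schema, {}))


class Catalogs:
    """Maps catalog names to the connector that serves them."""

    def __init__(self) -> None:
        self._catalogs: Dict[str, Connector] = {}

    def register(self, name: str, connector: Connector) -> None:
        self._catalogs[name] = connector

    def resolve(self, qualified_name: str) -> Tuple[Connector, str, str]:
        # Expects exactly three parts: catalog.schema.table.
        catalog, schema, table = qualified_name.split(".")
        connector = self._catalogs[catalog]
        if schema not in connector.list_schemas():
            raise KeyError(f"schema {catalog}.{schema} does not exist")
        if table not in connector.list_tables(schema):
            raise KeyError(f"table {qualified_name} does not exist")
        return connector, schema, table


catalogs = Catalogs()
catalogs.register("memory", InMemoryConnector({"tpch": {"orders": [], "lineitem": []}}))
_, schema, table = catalogs.resolve("memory.tpch.orders")
print(schema, table)  # tpch orders
```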

Presto Hive Connector -- Access Control

Presto Hive Connector -- Data File Types
● Supported File Types
○ ORC
○ Parquet
○ Avro
○ RCFile
○ SequenceFile
○ JSON
○ Text
● No data ingestion needed: files are queried where they already live

Why Presto is Fast
● In-Memory processing
● Pull model
● Columnar storage and execution
● Bytecode generation
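The pull model can be pictured with Python generators: each operator asks its upstream for the next page only when its own consumer asks for one. Pages are modeled here as dicts of column lists, echoing the columnar bullet. This is a simplified analogy under assumed names (`table_scan`, `limit`, the `Page` alias), not Presto's Java operator interface; it shows how a LIMIT can stop the scan early.

```python
from typing import Dict, Iterator, List

# A page is a columnar batch: column name -> values for that column.
Page = Dict[str, List]


def table_scan(pages: List[Page], stats: Dict[str, int]) -> Iterator[Page]:
    """Source operator: produces the next page only when a consumer pulls."""
    for page in pages:
        stats["pages_read"] += 1
        yield page


def limit(source: Iterator[Page], n: int) -> Iterator[Page]:
    """Pulls pages until n rows are produced, then stops pulling."""
    remaining = n
    if remaining <= 0:
        return
    for page in source:
        # Assumes every page has at least one column.
        rows = len(next(iter(page.values())))
        take = min(rows, remaining)
        yield {col: values[:take] for col, values in page.items()}
        remaining -= take
        if remaining == 0:
            return  # later pages are never requested from the scan


stats = {"pages_read": 0}
pages = [{"orderkey": [2 * i, 2 * i + 1]} for i in range(1000)]
result = list(limit(table_scan(pages, stats), 3))
print(result)               # [{'orderkey': [0, 1]}, {'orderkey': [2]}]
print(stats["pages_read"])  # 2
```

Only 2 of the 1000 pages are read, because nothing downstream pulls a third.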

The Life of a Query -- Simple Scan
SELECT *
FROM orders
WHERE discount = 0
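For this simple scan, each worker reads its splits and filters them page by page. Only the `discount` column is inspected to decide which rows survive, and because of `SELECT *`, every column is kept for those rows. A minimal sketch, assuming pages are dicts of column lists (a hypothetical representation) and invented sample data:

```python
from typing import Dict, Iterable, Iterator, List

Page = Dict[str, List]  # column name -> values for that column


def scan_filter(pages: Iterable[Page], column: str, value) -> Iterator[Page]:
    """Evaluates `column = value` on one column, then keeps matching positions."""
    for page in pages:
        # Build the selection from the predicate column alone.
        positions = [i for i, v in enumerate(page[column]) if v == value]
        if positions:
            # SELECT *: copy every column at the selected positions.
            yield {col: [vals[i] for i in positions] for col, vals in page.items()}


pages = [
    {"orderkey": [1, 2, 3], "discount": [0, 0.05, 0]},
    {"orderkey": [4, 5], "discount": [0.1, 0.02]},
]
for page in scan_filter(pages, "discount", 0):
    print(page)  # {'orderkey': [1, 3], 'discount': [0, 0]}
```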

The Life of a Query -- Join and Aggregation
SELECT
orders.orderkey, SUM(tax)
FROM orders
LEFT JOIN lineitem
ON orders.orderkey = lineitem.orderkey
WHERE discount = 0
GROUP BY orders.orderkey
This example is from Presto: SQL on Everything https://research.fb.com/publications/presto-sql-on-everything/
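The operators this query needs can be sketched in a single process: a hash join that builds a hash table from one input (here the right side, `lineitem`) and probes it with `orders`, followed by a hash aggregation for `GROUP BY` and `SUM`. In Presto these operators run distributed across workers with data exchanges between stages; the function name, row format, and sample data below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def left_join_then_aggregate(orders: List[Dict], lineitem: List[Dict]) -> Dict[int, float]:
    # Build phase: hash the right (build) side on the join key.
    build: Dict[int, List[Dict]] = defaultdict(list)
    for row in lineitem:
        build[row["orderkey"]].append(row)

    # Probe phase: stream the left side. A LEFT JOIN keeps unmatched
    # orders, with NULL (None) for the right-side columns.
    joined = []
    for order in orders:
        matches = build.get(order["orderkey"])
        if matches:
            for item in matches:
                joined.append((order["orderkey"], item["discount"], item["tax"]))
        else:
            joined.append((order["orderkey"], None, None))

    # WHERE discount = 0: NULL never equals 0, so unmatched orders drop out.
    filtered = [r for r in joined if r[1] == 0]

    # GROUP BY orders.orderkey with SUM(tax), as a hash aggregation.
    sums: Dict[int, float] = defaultdict(float)
    for orderkey, _discount, tax in filtered:
        sums[orderkey] += tax
    return dict(sums)


orders = [{"orderkey": 1}, {"orderkey": 2}, {"orderkey": 3}]
lineitem = [
    {"orderkey": 1, "discount": 0, "tax": 0.5},
    {"orderkey": 1, "discount": 0, "tax": 0.25},
    {"orderkey": 1, "discount": 0.1, "tax": 1.0},
    {"orderkey": 2, "discount": 0.05, "tax": 2.0},
]
print(left_join_then_aggregate(orders, lineitem))  # {1: 0.75}
```

The build side is held entirely in memory, which is why the choice of which input to build on matters.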

Logical Plan -- do NOT join two big tables
This example is from Presto: SQL on Everything https://research.fb.com/publications/presto-sql-on-everything/
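One way to see why joining two big tables hurts: the join's hash table must fit in each worker's memory. Presto can broadcast the build side (every worker holds a full copy) or partition it by join key (each worker holds a share). The back-of-the-envelope sketch below compares the two against a per-node budget; the function names, the bytes-per-row estimate, and the limit are hypothetical, and it ignores key skew and spilling.

```python
from typing import Optional


def build_bytes_per_node(build_rows: int, bytes_per_row: int,
                         workers: int, strategy: str) -> int:
    """Rough memory each worker needs for the join's hash table."""
    total = build_rows * bytes_per_row
    if strategy == "broadcast":
        # Every worker holds a full copy of the build side.
        return total
    if strategy == "partitioned":
        # Build rows are split by join key; assumes an even key distribution.
        return -(-total // workers)  # ceiling division
    raise ValueError(f"unknown strategy: {strategy}")


def pick_strategy(build_rows: int, bytes_per_row: int,
                  workers: int, node_limit: int) -> Optional[str]:
    """Prefers broadcast for small build sides; None means nothing fits."""
    for strategy in ("broadcast", "partitioned"):
        if build_bytes_per_node(build_rows, bytes_per_row, workers, strategy) <= node_limit:
            return strategy
    return None


LIMIT = 1_000_000  # hypothetical per-node budget in bytes
print(pick_strategy(1_000, 100, 20, LIMIT))       # broadcast
print(pick_strategy(100_000, 100, 20, LIMIT))     # partitioned
print(pick_strategy(10_000_000, 100, 20, LIMIT))  # None
```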

Limitations
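● Memory limitation: query state such as join hash tables must fit within configured memory limits
● Fault tolerance: a failed worker or task fails the whole query, which must be rerun
● Single point of failure: the coordinator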
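The memory limitation can be illustrated with a per-query memory tracker: operators reserve memory as they build state, and a reservation past the cap fails the query. Presto enforces configurable caps (for example the `query.max-memory-per-node` setting); the `MemoryPool` class and `ExceededMemoryLimit` exception below are a hypothetical simplification that ignores spilling to disk.

```python
from typing import Dict


class ExceededMemoryLimit(Exception):
    """Raised when a query tries to reserve more memory than allowed."""


class MemoryPool:
    """Tracks per-query reservations on one worker against a fixed cap."""

    def __init__(self, max_bytes_per_query: int) -> None:
        self.max_bytes_per_query = max_bytes_per_query
        self.reserved: Dict[str, int] = {}  # query id -> bytes reserved

    def reserve(self, query_id: str, nbytes: int) -> None:
        current = self.reserved.get(query_id, 0)
        if current + nbytes > self.max_bytes_per_query:
            # No spilling in this sketch: the query simply fails.
            raise ExceededMemoryLimit(
                f"{query_id} needs {current + nbytes} bytes, "
                f"limit is {self.max_bytes_per_query}"
            )
        self.reserved[query_id] = current + nbytes

    def free(self, query_id: str, nbytes: int) -> None:
        self.reserved[query_id] = max(0, self.reserved.get(query_id, 0) - nbytes)


pool = MemoryPool(max_bytes_per_query=100)
pool.reserve("q1", 60)      # e.g. a hash table being built
try:
    pool.reserve("q1", 50)  # 110 > 100, so the query fails
except ExceededMemoryLimit as err:
    print(err)  # q1 needs 110 bytes, limit is 100
```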

Time for a demo!
Local Setup
Query TPC-DS
Cloud Setup
Query S3 / Parquet

Docker Sandbox for Presto
https://hub.docker.com/r/ahanaio/prestodb-sandbox

AWS Sandbox AMI for Presto
https://ahana.io/tutorials/aws-sandbox/

Join the Presto Community
● Request a new feature or file a bug: github.com/prestodb/presto
● Slack: prestodb.slack.com
● Twitter: @prestodb
Stay up-to-date with Ahana
● URL: ahana.io
● Twitter: @ahanaio
