SQL for 
Everything 
Presto: Distributed SQL Query Engine 
Masahiro Nakagawa 
Nov 6, 2014 
Cloudera World Tokyo
Who are you? 
> Masahiro Nakagawa 
> github/twitter: @repeatedly 
> Ingress: Blue 
> Treasure Data, Inc. 
> Senior Software Engineer 
> Fluentd / td-agent developer 
> I love OSS :) 
> D language - Phobos committer 
> Fluentd - Main maintainer 
> MessagePack / RPC - D and Python (RPC only) 
> The organizer of Presto Source Code Reading 
> etc…
SQL on Hadoop?
SQL Players on Hadoop 
This color indicates a commercial product 
> Batch (latency: minutes - hours) 
> Hive 
> Spark SQL 
> Short Batch / Low latency (latency: seconds - minutes) 
> Presto 
> Impala 
> Drill 
> HAWQ 
> Actian 
> etc… 
> Stream (latency: immediate) 
> Norikra 
> StreamSQL
SQL Players on Hadoop 
> Batch / Short Batch / Low latency - Red Ocean 
> Hive 
> Spark SQL 
> Presto 
> Impala 
> Drill 
> HAWQ 
> Actian 
> etc… 
> Stream - Blue Ocean? 
> Norikra 
> StreamSQL
Presto 
http://prestodb.io/
Presto overview 
> Open sourced by Facebook 
> https://github.com/facebook/presto 
• GitHub is the primary repository 
> written in Java 
> latest version is 0.81 
> Built-in useful features 
> Connectors 
> Machine Learning 
> Window function 
> Approximate query 
> etc…
What’s Presto? 
A distributed SQL query engine 
for interactive data analysis 
against GBs to PBs of data.
What problems does it solve? 
> We couldn’t visualize data in HDFS directly 
using dashboards or BI tools 
> because Hive is too slow (not interactive) 
> or ODBC connectivity is unavailable/unstable 
> We needed to store daily-batch results to an 
interactive DB for quick response 
(PostgreSQL, Redshift, etc.) 
> Interactive DB costs more & less scalable 
> Some data are not stored in HDFS 
> We need to copy the data into HDFS to analyze
[Diagram] Before: HDFS → Hive (daily/hourly batch) → PostgreSQL, etc. → Dashboard (interactive query). 
After: HDFS → Hive (daily/hourly batch), with Presto answering interactive queries from the Dashboard directly.
Data analysis platform 
[Diagram] Presto provides SQL on any data sets: HDFS (via Hive, with daily/hourly batch), Cassandra, MySQL, commercial DBs. Interactive queries feed dashboards and BI tools: 
✓ IBM Cognos 
✓ Tableau 
✓ ...
Presto’s deployment 
> Facebook 
> Multiple geographical regions 
> scaled to 1,000 nodes 
> actively used by 1,000+ employees 
> processing 1PB/day 
> Netflix, Dropbox, Treasure Data, Airbnb, 
Qubole, LINE, GREE, Scaleout, etc 
> Presto as a Service 
> Treasure Data, Qubole
PostgreSQL gateway for Presto 
> A PostgreSQL protocol gateway based on 
PostgreSQL’s stable ODBC / JDBC drivers 
> Developed by Sadayuki Furuhashi 
https://github.com/treasure-data/prestogres
Distributed architecture 
[Diagram] Client → Coordinator (Connector Plugin) → Worker / Worker / Worker → Storage / Metadata; all servers register with the Discovery Service.
What are Connectors? 
> Access to storage and metadata 
> provide table schema to coordinators 
> provide table rows to workers 
> Connectors are pluggable to Presto 
> written in Java 
> Implementations: 
> Hive(CDH, HDP, Community), Cassandra, 
MySQL, JDBC, Kafka, etc… 
> Or your own connector 
• Treasure Data has its own connector
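A connector is configured per catalog with a properties file; a minimal sketch (the metastore host is a placeholder, and connector names vary by Hive distribution) might look like:

```properties
# etc/catalog/hive.properties (hypothetical host)
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore.example.com:9083
```

Tables then become addressable as catalog.schema.table in queries.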
Multiple connectors in a query 
[Diagram] Client → Coordinator → Worker / Worker / Worker. The Hive Connector reads HDFS / Metastore, the Cassandra Connector reads Cassandra, and other connectors reach other data sources. The Discovery Service finds servers in a cluster.
Distributed architecture 
> 3 types of servers: 
> Coordinator, worker, discovery service 
> Get data/metadata through connector 
plugins. 
> Presto is NOT a database 
> Presto provides SQL to existing data stores 
> Client protocol is HTTP + JSON 
> Language bindings: 
Ruby, Python, PHP, Java (JDBC), R, Node.js...
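The HTTP + JSON protocol can be sketched in a few lines of Python; this is a rough sketch, not an official client — the coordinator URL is a placeholder, and collect_rows is our own helper for merging result pages:

```python
import json
import urllib.request

def collect_rows(pages):
    """Merge the 'data' arrays from a sequence of result-page dicts."""
    rows = []
    for page in pages:
        rows.extend(page.get("data", []))
    return rows

def run_query(coordinator, sql, user="demo"):
    """POST a statement, then follow nextUri until the query finishes."""
    req = urllib.request.Request(
        coordinator + "/v1/statement",
        data=sql.encode(),
        headers={"X-Presto-User": user},  # identifies the querying user
    )
    pages = []
    with urllib.request.urlopen(req) as resp:
        page = json.load(resp)
    pages.append(page)
    # Each response page may carry a nextUri to poll for more results.
    while "nextUri" in page:
        with urllib.request.urlopen(page["nextUri"]) as resp:
            page = json.load(resp)
        pages.append(page)
    return collect_rows(pages)

# e.g. run_query("http://coordinator.example.com:8080", "SELECT 1")
```

Because the protocol is plain HTTP + JSON, bindings in any language reduce to this POST-then-poll loop.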
Presto’s execution model 
> Presto is NOT MapReduce 
> Uses its own execution engine 
> Presto’s query plan is based on DAG 
> more like Apache Tez / Spark or 
traditional MPP databases 
> Impala and Drill use a similar model
Query Planner 
SQL: 
SELECT 
  name, 
  count(*) AS c 
FROM impressions 
GROUP BY name 
Table schema: 
impressions ( 
  name varchar, 
  time bigint 
) 
Logical query plan: 
Table scan (name:varchar) → GROUP BY (name, count(*)) → Output (name, c) 
Distributed query plan: 
Table scan → Partial aggr → Sink → Exchange → Final aggr → Sink → Exchange → Output
Query Planner - Stages 
Stage-2: Table scan → Partial aggr → Sink (pipelined aggregation) 
↓ inter-worker data transfer 
Stage-1: Exchange → Final aggr → Sink 
↓ inter-worker data transfer 
Stage-0: Exchange → Output
Execution Planner 
+ Node list (✓ 2 workers) 
Worker 1 and Worker 2 each run a Stage-2 task (Table scan → Partial aggr → Sink) and a Stage-1 task (Exchange → Final aggr → Sink); a single Stage-0 task (Exchange → Output) collects the results.
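The partial/final split can be illustrated with the sample query (SELECT name, count(*) FROM impressions GROUP BY name); a toy sketch, not Presto code — each "worker" counts the rows in its own split, and the final stage merges the partial counts:

```python
from collections import Counter

def partial_aggr(rows):
    """Stage-2: each worker counts names within its own split."""
    return Counter(name for name, _time in rows)

def final_aggr(partials):
    """Stage-1: merge the partial counts exchanged from the workers."""
    total = Counter()
    for p in partials:
        total += p
    return dict(total)

# Two hypothetical splits of the impressions table (name, time):
worker1 = [("a", 1), ("b", 2), ("a", 3)]
worker2 = [("b", 4), ("c", 5)]
result = final_aggr([partial_aggr(worker1), partial_aggr(worker2)])
# result: {"a": 2, "b": 2, "c": 1}
```

The partial step shrinks the data before the exchange, so only one row per group crosses the network.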
MapReduce vs. Presto 
MapReduce: map tasks write to disk, reduce tasks read from disk, and each stage waits for the previous one to finish; intermediate data is written to disk between stages. 
Presto: tasks stream results to each other with memory-to-memory data transfer. 
All stages are pipelined: 
✓ No wait time 
✓ No disk IO 
✓ No fault-tolerance 
✓ Data chunk must fit in memory
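The pipelined, memory-to-memory model can be mimicked with generators; a toy sketch (not Presto internals) where each stage pulls rows from the previous one as they are produced, with no intermediate materialization:

```python
def table_scan(rows):
    # Source stage: stream rows one at a time.
    for row in rows:
        yield row

def filter_stage(rows):
    # Downstream stage consumes each row as soon as it is produced,
    # so there is no wait between stages and nothing hits disk.
    for row in rows:
        if row % 2 == 0:
            yield row

pipeline = filter_stage(table_scan(range(10)))
result = list(pipeline)
# result: [0, 2, 4, 6, 8]
```

The trade-off mirrors the slide: nothing is checkpointed, so a failed stage means re-running the whole query.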
Demo
Presto Meetup 
The first half of 2015
Cloud service for the entire data pipeline, 
including Presto 
Check: treasuredata.com
