SlideShare a Scribd company logo
1 of 61
Download to read offline
Sadayuki Furuhashi
Founder & Software Architect
Treasure Data, inc.
PrestoInteractive SQL Query Engine for Big Data
Hadoop Conference in Japan 2014
A little about me...
> Sadayuki Furuhashi
> github/twitter: @frsyuki
> Treasure Data, Inc.
> Founder & Software Architect
> Open-source hacker
> MessagePack - efficient object serializer
> Fluentd - data collection tool
> ServerEngine - Ruby framework to build multiprocess servers
> LS4 - distributed object storage system
> kumofs - distributed key-value data store
0. Background + Intro
What’s Presto?
A distributed SQL query engine
for interactive data analisys
against GBs to PBs of data.
Presto’s history
> 2012 Fall: Project started at Facebook
> Designed for interactive query
> with speed of commercial data warehouse
> and scalability to the size of Facebook
> 2013 Winter: Open sourced!
> 30+ contributes in 6 months
> including people from outside of Facebook
What’s the problems to solve?
> We couldn’t visualize data in HDFS directly
using dashboards or BI tools
> because Hive is too slow (not interactive)
> or ODBC connectivity is unavailable/unstable
> We needed to store daily-batch results to an
interactive DB for quick response
(PostgreSQL, Redshift, etc.)
> Interactive DB costs more and less scalable by far
> Some data are not stored in HDFS
> We need to copy the data into HDFS to analyze
What’s the problems to solve?
> We couldn’t visualize data in HDFS directly
using dashboards or BI tools
> because Hive is too slow (not interactive)
> or ODBC connectivity is unavailable/unstable
> We needed to store daily-batch results to an
interactive DB for quick response
(PostgreSQL, Redshift, etc.)
> Interactive DB costs more and less scalable by far
> Some data are not stored in HDFS
> We need to copy the data into HDFS to analyze
What’s the problems to solve?
> We couldn’t visualize data in HDFS directly
using dashboards or BI tools
> because Hive is too slow (not interactive)
> or ODBC connectivity is unavailable/unstable
> We needed to store daily-batch results to an
interactive DB for quick response
(PostgreSQL, Redshift, etc.)
> Interactive DB costs more and less scalable by far
> Some data are not stored in HDFS
> We need to copy the data into HDFS to analyze
What’s the problems to solve?
> We couldn’t visualize data in HDFS directly
using dashboards or BI tools
> because Hive is too slow (not interactive)
> or ODBC connectivity is unavailable/unstable
> We needed to store daily-batch results to an
interactive DB for quick response
(PostgreSQL, Redshift, etc.)
> Interactive DB costs more and less scalable by far
> Some data are not stored in HDFS
> We need to copy the data into HDFS to analyze
HDFS
Hive
PostgreSQL, etc.
Daily/Hourly Batch
Interactive query
Commercial
BI Tools
Batch analysis platform Visualization platform
Dashboard
HDFS
Hive
PostgreSQL, etc.
Daily/Hourly Batch
Interactive query
✓ Less scalable
✓ Extra cost
Commercial
BI Tools
Dashboard
✓ More work to manage
2 platforms
✓ Can’t query against
“live”data directly
Batch analysis platform Visualization platform
HDFS
Hive Dashboard
Presto
PostgreSQL, etc.
Daily/Hourly Batch
HDFS
Hive
Dashboard
Daily/Hourly Batch
Interactive query
Interactive query
Presto
HDFS
Hive
Dashboard
Daily/Hourly Batch
Interactive query
Cassandra MySQL Commertial DBs
SQL on any data sets
Presto
HDFS
Hive
Dashboard
Daily/Hourly Batch
Interactive query
Cassandra MySQL Commertial DBs
SQL on any data sets Commercial
BI Tools
✓ IBM Cognos
✓ Tableau
✓ ...
Data analysis platform
dashboard on chart.io: https://chartio.com/
What can Presto do?
> Query interactively (in milli-seconds to minues)
> MapReduce and Hive are still necessary for ETL
> Query using commercial BI tools or dashboards
> Reliable ODBC/JDBC connectivity
> Query across multiple data sources such as
Hive, HBase, Cassandra, or even commertial DBs
> Plugin mechanism
> Integrate batch analisys + visualization
into a single data analysis platform
Presto’s deployment
> Facebook
> Multiple geographical regions
> scaled to 1,000 nodes
> actively used by 1,000+ employees
> who run 30,000+ queries every day
> processing 1PB/day
> Netflix, Dropbox, Treasure Data, Airbnb, Qubole
> Presto as a Service
Today’s talk
1. Distributed architecture
2. Data visualization - Demo
3. Query Execution - Presto vs. MapReduce
4. Monitoring & Configuration
5. Roadmap - the future
1. Distributed architecture
Client
Coordinator Connector
Plugin
Worker
Worker
Worker
Storage / Metadata
Discovery Service
Client
Coordinator Connector
Plugin
Worker
Worker
Worker
Storage / Metadata
Discovery Service
1. find servers in a cluster
Client
Coordinator Connector
Plugin
Worker
Worker
Worker
Storage / Metadata
Discovery Service
2. Client sends a query
using HTTP
Client
Coordinator Connector
Plugin
Worker
Worker
Worker
Storage / Metadata
Discovery Service
3. Coordinator builds
a query plan
Connector plugin
provides metadata
(table schema, etc.)
Client
Coordinator Connector
Plugin
Worker
Worker
Worker
Storage / Metadata
Discovery Service
4. Coordinator sends
tasks to workers
Client
Coordinator Connector
Plugin
Worker
Worker
Worker
Storage / Metadata
Discovery Service
5. Workers read data
through connector plugin
Client
Coordinator Connector
Plugin
Worker
Worker
Worker
Storage / Metadata
Discovery Service
6. Workers run tasks
in memory
Coordinator Connector
Plugin
Worker
Worker
Worker
Storage / Metadata
Discovery Service
7. Client gets the result
from a worker
Client
Client
Coordinator Connector
Plugin
Worker
Worker
Worker
Storage / Metadata
Discovery Service
What’s Connectors?
> Connectors are plugins to Presto
> written in Java
> Access to storage and metadata
> provide table schema to coordinators
> provide table rows to workers
> Implementations:
> Hive connector
> Cassandra connector
> MySQL through JDBC connector (prerelease)
> Or your own connector
Client
Coordinator Hive
Connector
Worker
Worker
Worker
HDFS,
Hive Metastore
Discovery Service
find servers in a cluster
Hive connector
Client
Coordinator Cassandra
Connector
Worker
Worker
Worker
Cassandra
Discovery Service
find servers in a cluster
Cassandra connector
Client
Coordinator
other
connectors
...
Worker
Worker
Worker
Cassandra
Discovery Service
find servers in a cluster
Hive
Connector
HDFS / Metastore
Multiple connectors in a query
Cassandra
Connector
Other data sources...
1. Distributed architecture
> 3 type of servers:
> Coordinator, worker, discovery service
> Get data/metadata through connector plugins.
> Presto is NOT a database
> Presto provides SQL to existent data stores
> Client protocol is HTTP + JSON
> Language bindings:
Ruby, Python, PHP, Java (JDBC), R, Node.JS...
Client
Coordinator Connector
Plugin
Worker
Worker
Worker
Storage / Metadata
Discovery Service
Coordinator
Coordinator HA
2. Data visualization
The problems to use BI tools
> BI tools need ODBC or JDBC connectivity
> Tableau, IBM Cognos, QlickView, Chart.IO, ...
> JasperSoft, Pentaho, MotionBoard, ...
> ODBC/JDBC is VERY COMPLICATED
> Matured implementation needs LONG time
A solution: PostgreSQL protocol
> Creating a PostgreSQL protocol gateway
> Using PostgreSQL’s stable ODBC / JDBC driver
https://github.com/treasure-data/prestogres
How Prestogres works?
2. select run_presto_as_temp_table(
‘presto_result’,‘SELECT COUNT(1) FROM tbl1’);
pgpool-II
+ patchclient
1. SELECT COUNT(1) FROM tbl1
4. SELECT * FROM presto_result;
PostgreSQL
3.“run_persto_as_temp_table”function
runs query on Presto
Coordinator
Demo
2. Data visualization with Presto
> Data visualization tools need ODBC/JDBC driver
> but implemetation takes LONG time
> A solution is to use PostgreSQL protocol
> and use PostgreSQL’s ODBC/JDBC driver
> Prestogres is already confirmed to work with
some commertial BI tools
3. Query Execution
Presto’s execution model
> Presto is NOT MapReduce
> Presto’s query plan is based on DAG
> more like Apache Tez or traditional MPP
databases
How query runs?
> Coordinator
> SQL Parser
> Query Planner
> Execution planner
> Workers
> Task execution scheduler
SQL
SQL Parser
AST
Logical
Planner
Distributed
Planner
Logical
Query Plan
Execution
Planner
Discovery Server
Connector
Distributed
Query Plan Execution Plan
Optimizer
NodeManager
✓ node list
✓ table schema
Metadata
SQL
SQL Parser
SQL
Distributed
Planner
Logical
Query Plan
Execution
Planner
Discovery Service
Connector
Query Plan Execution Plan
Optimizer
NodeManager
✓ node list
✓ table schema
Metadata
(today’s talk)
Query
Planner
Query Planner
SELECT
name,
count(*) AS c
FROM impressions
GROUP BY name
SQL
impressions (
name varchar
time bigint
)
Table schema
Table scan
(name:varchar)
GROUP BY
(name, count(*))
Output
(name, c)
+
Sink
Final aggregation
Exchange
Sink
Partial aggregation
Table scan
Output
Exchange
Logical query plan
Distributed query plan
Query Planner - Stages
Sink
Final aggregation
Exchange
Sink
Partial aggregation
Table scan
Output
Exchange
inter-worker
data transfer
pipelined
aggregation
inter-worker
data transfer
Stage-0
Stage-1
Stage-2
Sink
Partial aggregation
Table scan
Sink
Partial aggregation
Table scan
Execution Planner
+ Node list
✓ 2 workers
Sink
Final aggregation
Exchange
Output
Exchange
Sink
Final aggregation
Exchange
Sink
Final aggregation
Exchange
Sink
Partial aggregation
Table scan
Output
Exchange
Worker 1 Worker 2
Execution Planner - Tasks
Sink
Final aggregation
Exchange
Sink
Partial aggregation
Table scan
Sink
Final aggregation
Exchange
Sink
Partial aggregation
Table scan
Task
1 task / worker / stage
✓ All tasks in parallel
Output
Exchange
Worker 1 Worker 2
Execution Planner - Split
Sink
Final aggregation
Exchange
Sink
Partial aggregation
Table scan
Sink
Final aggregation
Exchange
Sink
Partial aggregation
Table scan
Output
Exchange
Split
many splits / task
= many threads / worker
(table scan)
1 split / task
= 1 thread / worker
Worker 1 Worker 2
1 split / worker
= 1 thread / worker
All stages are pipe-lined
✓ No wait time
✓ No fault-tolerance
MapReduce vs. Presto
MapReduce Presto
map map
reduce reduce
task task
task task
task
task
memory-to-memory
data transfer
✓ No disk IO
✓ Data chunk must
fit in memory
task
disk
map map
reduce reduce
disk
disk
Write data
to disk
Wait between
stages
3. Query Execution
> SQL is converted into stages, tasks and splits
> All tasks run in parallel
> No wait time between stages (pipelined)
> If one task fails, all tasks fail at once (query fails)
> Memory-to-memory data transfer
> No disk IO
> If aggregated data doesn’t fit in memory,
query fails
• Note: query dies but worker doesn’t die.
Memory consumption of all queries is fully managed
4. Monitoring & Configuration
Monitoring
> Web UI
> basic query status check
> JMX HTTP API
> GET /v1/jmx/mbean[/{objectName}]
• com.facebook.presto.execution:name=TaskManager
• com.facebook.presto.execution:name=QueryManager
• com.facebook.presto.execution:name=NodeScheduler
> Event notification (remote logging)
> POST http://remote.server/v2/event
• query start, query complete, split complete
Configuration
> Execution planning (for coordinator)
> query.initial-hash-partitions
• max number of hash buckets (=tasks) of a GROUP BY
(default: 8)
> node-scheduler.min-candidates
• max number of workers to run a stage in parallel
(default: 10)
> node-scheduler.include-coordinator
• whether run tasks only on workers or include coordinator
> query.schedule-split-batch-size
• number of splits of a stage to start at once
Configuration
> Task execution (for workers)
> task.cpu-timer-enabled
• enable detailed statistics (causes some overhead)
(default: true)
> task.max-memory
• memory limit of a task especially for hash tables used by
GROUP BY and JOIN operations (default: 256MB)
• enlarge if you get“Task exceeded max memory size”error
> task.shard.max-threads
• max number of threads of a worker to run active splits
(default: number of CPU cores * 4)
5. Roadmap
A report of Presto Meetup 2014
http://www.slideshare.net/dain1/presto-meetup-20140514-34731104
"Presto, Past, Present, and Future" by Dain Sundstrom at Facebook
Presto’s future
> Huge JOIN and GROUP BY
> Spill to disk
> Task recovery
> CREATE VIEW (※implemented)
> Native store (※implemented)
> Fast data store in Presto workers
> to cache hot data
> Authentication and permissions
Presto’s future
> DDL/DML statements
> CREATE TABLE with partitioning
> DELETE and INSERT
> Plugin repository
> CLI plugin manager
> JOIN and aggregation pushdown
> Custom optimizers
Links
> Web site & document
> http://prestodb.io
> Mailing list
> https://groups.google.com/group/presto-users
> Github
> https://github.com/facebook/presto
> Guidelines for contribution
> https://github.com/facebook/presto/blob/master/CONTRIBUTING.md
Check: www.treasuredata.com
Cloud service for the entire data pipeline,
including Presto. We’re hiring!

More Related Content

What's hot

Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
HostedbyConfluent
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
DataWorks Summit
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Databricks
 

What's hot (20)

[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
 
Apache Hudi: The Path Forward
Apache Hudi: The Path ForwardApache Hudi: The Path Forward
Apache Hudi: The Path Forward
 
Building an open data platform with apache iceberg
Building an open data platform with apache icebergBuilding an open data platform with apache iceberg
Building an open data platform with apache iceberg
 
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
A Deep Dive into Stateful Stream Processing in Structured Streaming with Tath...
 
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta LakeSimplify CDC Pipeline with Spark Streaming SQL and Delta Lake
Simplify CDC Pipeline with Spark Streaming SQL and Delta Lake
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and HudiHow to build a streaming Lakehouse with Flink, Kafka, and Hudi
How to build a streaming Lakehouse with Flink, Kafka, and Hudi
 
Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived Hadoop Strata Talk - Uber, your hadoop has arrived
Hadoop Strata Talk - Uber, your hadoop has arrived
 
Large Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured StreamingLarge Scale Lakehouse Implementation Using Structured Streaming
Large Scale Lakehouse Implementation Using Structured Streaming
 
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
InfluxDB IOx Tech Talks: Query Engine Design and the Rust-Based DataFusion in...
 
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
Hudi: Large-Scale, Near Real-Time Pipelines at Uber with Nishith Agarwal and ...
 
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
Streaming Data Lakes using Kafka Connect + Apache Hudi | Vinoth Chandar, Apac...
 
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
Designing Apache Hudi for Incremental Processing With Vinoth Chandar and Etha...
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterpriseUsing Spark Streaming and NiFi for the next generation of ETL in the enterprise
Using Spark Streaming and NiFi for the next generation of ETL in the enterprise
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDruid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
 
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
Designing ETL Pipelines with Structured Streaming and Delta Lake—How to Archi...
 
Designing Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things RightDesigning Structured Streaming Pipelines—How to Architect Things Right
Designing Structured Streaming Pipelines—How to Architect Things Right
 
Unified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache FlinkUnified Stream and Batch Processing with Apache Flink
Unified Stream and Batch Processing with Apache Flink
 

Similar to Presto - Hadoop Conference Japan 2014

Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
Sadayuki Furuhashi
 
Big data-at-detik
Big data-at-detikBig data-at-detik
Big data-at-detik
k4ndar
 

Similar to Presto - Hadoop Conference Japan 2014 (20)

SQL on Hadoop in Taiwan
SQL on Hadoop in TaiwanSQL on Hadoop in Taiwan
SQL on Hadoop in Taiwan
 
SQL for Everything at CWT2014
SQL for Everything at CWT2014SQL for Everything at CWT2014
SQL for Everything at CWT2014
 
Presto+MySQLで分散SQL
Presto+MySQLで分散SQLPresto+MySQLで分散SQL
Presto+MySQLで分散SQL
 
Treasure Data and OSS
Treasure Data and OSSTreasure Data and OSS
Treasure Data and OSS
 
Prestogres internals
Prestogres internalsPrestogres internals
Prestogres internals
 
Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
Prestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for PrestoPrestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for Presto
 
Big Data and Hadoop
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
 
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
 
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
 
Big data-at-detik
Big data-at-detikBig data-at-detik
Big data-at-detik
 
Boston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the EnterpriseBoston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the Enterprise
 
Fluentd - RubyKansai 65
Fluentd - RubyKansai 65Fluentd - RubyKansai 65
Fluentd - RubyKansai 65
 
Prashanth Kumar_Hadoop_NEW
Prashanth Kumar_Hadoop_NEWPrashanth Kumar_Hadoop_NEW
Prashanth Kumar_Hadoop_NEW
 
Presto
PrestoPresto
Presto
 
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
 
Hadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to TezHadoop from Hive with Stinger to Tez
Hadoop from Hive with Stinger to Tez
 
Hadoop Big Data A big picture
Hadoop Big Data A big pictureHadoop Big Data A big picture
Hadoop Big Data A big picture
 
Netflix Data Engineering @ Uber Engineering Meetup
Netflix Data Engineering @ Uber Engineering MeetupNetflix Data Engineering @ Uber Engineering Meetup
Netflix Data Engineering @ Uber Engineering Meetup
 
Automating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics Pipelines
 

More from Sadayuki Furuhashi

What's new in v11 - Fluentd Casual Talks #3 #fluentdcasual
What's new in v11 - Fluentd Casual Talks #3 #fluentdcasualWhat's new in v11 - Fluentd Casual Talks #3 #fluentdcasual
What's new in v11 - Fluentd Casual Talks #3 #fluentdcasual
Sadayuki Furuhashi
 

More from Sadayuki Furuhashi (20)

Scripting Embulk Plugins
Scripting Embulk PluginsScripting Embulk Plugins
Scripting Embulk Plugins
 
Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019
Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019
Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019
 
Making KVS 10x Scalable
Making KVS 10x ScalableMaking KVS 10x Scalable
Making KVS 10x Scalable
 
Digdagによる大規模データ処理の自動化とエラー処理
Digdagによる大規模データ処理の自動化とエラー処理Digdagによる大規模データ処理の自動化とエラー処理
Digdagによる大規模データ処理の自動化とエラー処理
 
Fluentd at Bay Area Kubernetes Meetup
Fluentd at Bay Area Kubernetes MeetupFluentd at Bay Area Kubernetes Meetup
Fluentd at Bay Area Kubernetes Meetup
 
DigdagはなぜYAMLなのか?
DigdagはなぜYAMLなのか?DigdagはなぜYAMLなのか?
DigdagはなぜYAMLなのか?
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container Era
 
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11
 
Fighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with EmbulkFighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with Embulk
 
Embulk - 進化するバルクデータローダ
Embulk - 進化するバルクデータローダEmbulk - 進化するバルクデータローダ
Embulk - 進化するバルクデータローダ
 
Plugin-based software design with Ruby and RubyGems
Plugin-based software design with Ruby and RubyGemsPlugin-based software design with Ruby and RubyGems
Plugin-based software design with Ruby and RubyGems
 
Embuk internals
Embuk internalsEmbuk internals
Embuk internals
 
Embulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loader
 
Fluentd - Set Up Once, Collect More
Fluentd - Set Up Once, Collect MoreFluentd - Set Up Once, Collect More
Fluentd - Set Up Once, Collect More
 
What's new in v11 - Fluentd Casual Talks #3 #fluentdcasual
What's new in v11 - Fluentd Casual Talks #3 #fluentdcasualWhat's new in v11 - Fluentd Casual Talks #3 #fluentdcasual
What's new in v11 - Fluentd Casual Talks #3 #fluentdcasual
 
How we use Fluentd in Treasure Data
How we use Fluentd in Treasure DataHow we use Fluentd in Treasure Data
How we use Fluentd in Treasure Data
 
Fluentd meetup at Slideshare
Fluentd meetup at SlideshareFluentd meetup at Slideshare
Fluentd meetup at Slideshare
 
How to collect Big Data into Hadoop
How to collect Big Data into HadoopHow to collect Big Data into Hadoop
How to collect Big Data into Hadoop
 
Fluentd meetup
Fluentd meetupFluentd meetup
Fluentd meetup
 
upload test 1
upload test 1upload test 1
upload test 1
 

Recently uploaded

Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
raffaeleoman
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
amilabibi1
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
Kayode Fayemi
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
Kayode Fayemi
 

Recently uploaded (18)

My Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle BaileyMy Presentation "In Your Hands" by Halle Bailey
My Presentation "In Your Hands" by Halle Bailey
 
ICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdfICT role in 21st century education and it's challenges.pdf
ICT role in 21st century education and it's challenges.pdf
 
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptxChiulli_Aurora_Oman_Raffaele_Beowulf.pptx
Chiulli_Aurora_Oman_Raffaele_Beowulf.pptx
 
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
Aesthetic Colaba Mumbai Cst Call girls 📞 7738631006 Grant road Call Girls ❤️-...
 
Digital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of DrupalDigital collaboration with Microsoft 365 as extension of Drupal
Digital collaboration with Microsoft 365 as extension of Drupal
 
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verifiedSector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
Sector 62, Noida Call girls :8448380779 Noida Escorts | 100% verified
 
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdfThe workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
The workplace ecosystem of the future 24.4.2024 Fabritius_share ii.pdf
 
Dreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio IIIDreaming Music Video Treatment _ Project & Portfolio III
Dreaming Music Video Treatment _ Project & Portfolio III
 
Causes of poverty in France presentation.pptx
Causes of poverty in France presentation.pptxCauses of poverty in France presentation.pptx
Causes of poverty in France presentation.pptx
 
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
Bring back lost lover in USA, Canada ,Uk ,Australia ,London Lost Love Spell C...
 
Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510Thirunelveli call girls Tamil escorts 7877702510
Thirunelveli call girls Tamil escorts 7877702510
 
If this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New NigeriaIf this Giant Must Walk: A Manifesto for a New Nigeria
If this Giant Must Walk: A Manifesto for a New Nigeria
 
Dreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video TreatmentDreaming Marissa Sánchez Music Video Treatment
Dreaming Marissa Sánchez Music Video Treatment
 
Report Writing Webinar Training
Report Writing Webinar TrainingReport Writing Webinar Training
Report Writing Webinar Training
 
lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.lONG QUESTION ANSWER PAKISTAN STUDIES10.
lONG QUESTION ANSWER PAKISTAN STUDIES10.
 
Uncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac FolorunsoUncommon Grace The Autobiography of Isaac Folorunso
Uncommon Grace The Autobiography of Isaac Folorunso
 
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdfAWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
AWS Data Engineer Associate (DEA-C01) Exam Dumps 2024.pdf
 
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
Busty Desi⚡Call Girls in Sector 51 Noida Escorts >༒8448380779 Escort Service-...
 

Presto - Hadoop Conference Japan 2014

  • 1. Sadayuki Furuhashi Founder & Software Architect Treasure Data, inc. PrestoInteractive SQL Query Engine for Big Data Hadoop Conference in Japan 2014
  • 2. A little about me... > Sadayuki Furuhashi > github/twitter: @frsyuki > Treasure Data, Inc. > Founder & Software Architect > Open-source hacker > MessagePack - efficient object serializer > Fluentd - data collection tool > ServerEngine - Ruby framework to build multiprocess servers > LS4 - distributed object storage system > kumofs - distributed key-value data store
  • 4. What’s Presto? A distributed SQL query engine for interactive data analisys against GBs to PBs of data.
  • 5. Presto’s history > 2012 Fall: Project started at Facebook > Designed for interactive query > with speed of commercial data warehouse > and scalability to the size of Facebook > 2013 Winter: Open sourced! > 30+ contributes in 6 months > including people from outside of Facebook
  • 6. What’s the problems to solve? > We couldn’t visualize data in HDFS directly using dashboards or BI tools > because Hive is too slow (not interactive) > or ODBC connectivity is unavailable/unstable > We needed to store daily-batch results to an interactive DB for quick response (PostgreSQL, Redshift, etc.) > Interactive DB costs more and less scalable by far > Some data are not stored in HDFS > We need to copy the data into HDFS to analyze
  • 7. What’s the problems to solve? > We couldn’t visualize data in HDFS directly using dashboards or BI tools > because Hive is too slow (not interactive) > or ODBC connectivity is unavailable/unstable > We needed to store daily-batch results to an interactive DB for quick response (PostgreSQL, Redshift, etc.) > Interactive DB costs more and less scalable by far > Some data are not stored in HDFS > We need to copy the data into HDFS to analyze
  • 8. What’s the problems to solve? > We couldn’t visualize data in HDFS directly using dashboards or BI tools > because Hive is too slow (not interactive) > or ODBC connectivity is unavailable/unstable > We needed to store daily-batch results to an interactive DB for quick response (PostgreSQL, Redshift, etc.) > Interactive DB costs more and less scalable by far > Some data are not stored in HDFS > We need to copy the data into HDFS to analyze
  • 9. What’s the problems to solve? > We couldn’t visualize data in HDFS directly using dashboards or BI tools > because Hive is too slow (not interactive) > or ODBC connectivity is unavailable/unstable > We needed to store daily-batch results to an interactive DB for quick response (PostgreSQL, Redshift, etc.) > Interactive DB costs more and less scalable by far > Some data are not stored in HDFS > We need to copy the data into HDFS to analyze
  • 10. HDFS Hive PostgreSQL, etc. Daily/Hourly Batch Interactive query Commercial BI Tools Batch analysis platform Visualization platform Dashboard
  • 11. HDFS Hive PostgreSQL, etc. Daily/Hourly Batch Interactive query ✓ Less scalable ✓ Extra cost Commercial BI Tools Dashboard ✓ More work to manage 2 platforms ✓ Can’t query against “live”data directly Batch analysis platform Visualization platform
  • 12. HDFS Hive Dashboard Presto PostgreSQL, etc. Daily/Hourly Batch HDFS Hive Dashboard Daily/Hourly Batch Interactive query Interactive query
  • 14. Presto HDFS Hive Dashboard Daily/Hourly Batch Interactive query Cassandra MySQL Commertial DBs SQL on any data sets Commercial BI Tools ✓ IBM Cognos ✓ Tableau ✓ ... Data analysis platform
  • 15. dashboard on chart.io: https://chartio.com/
  • 16. What can Presto do? > Query interactively (in milli-seconds to minues) > MapReduce and Hive are still necessary for ETL > Query using commercial BI tools or dashboards > Reliable ODBC/JDBC connectivity > Query across multiple data sources such as Hive, HBase, Cassandra, or even commertial DBs > Plugin mechanism > Integrate batch analisys + visualization into a single data analysis platform
  • 17. Presto’s deployment > Facebook > Multiple geographical regions > scaled to 1,000 nodes > actively used by 1,000+ employees > who run 30,000+ queries every day > processing 1PB/day > Netflix, Dropbox, Treasure Data, Airbnb, Qubole > Presto as a Service
  • 18. Today’s talk 1. Distributed architecture 2. Data visualization - Demo 3. Query Execution - Presto vs. MapReduce 4. Monitoring & Configuration 5. Roadmap - the future
  • 21. Client Coordinator Connector Plugin Worker Worker Worker Storage / Metadata Discovery Service 1. find servers in a cluster
  • 22. Client Coordinator Connector Plugin Worker Worker Worker Storage / Metadata Discovery Service 2. Client sends a query using HTTP
  • 23. Client Coordinator Connector Plugin Worker Worker Worker Storage / Metadata Discovery Service 3. Coordinator builds a query plan Connector plugin provides metadata (table schema, etc.)
  • 24. Client Coordinator Connector Plugin Worker Worker Worker Storage / Metadata Discovery Service 4. Coordinator sends tasks to workers
  • 25. Client Coordinator Connector Plugin Worker Worker Worker Storage / Metadata Discovery Service 5. Workers read data through connector plugin
  • 26. Client Coordinator Connector Plugin Worker Worker Worker Storage / Metadata Discovery Service 6. Workers run tasks in memory
  • 27. Coordinator Connector Plugin Worker Worker Worker Storage / Metadata Discovery Service 7. Client gets the result from a worker Client
  • 29. What’s Connectors? > Connectors are plugins to Presto > written in Java > Access to storage and metadata > provide table schema to coordinators > provide table rows to workers > Implementations: > Hive connector > Cassandra connector > MySQL through JDBC connector (prerelease) > Or your own connector
  • 32. Client Coordinator other connectors ... Worker Worker Worker Cassandra Discovery Service find servers in a cluster Hive Connector HDFS / Metastore Multiple connectors in a query Cassandra Connector Other data sources...
  • 33. 1. Distributed architecture > 3 type of servers: > Coordinator, worker, discovery service > Get data/metadata through connector plugins. > Presto is NOT a database > Presto provides SQL to existent data stores > Client protocol is HTTP + JSON > Language bindings: Ruby, Python, PHP, Java (JDBC), R, Node.JS...
  • 34. Client Coordinator Connector Plugin Worker Worker Worker Storage / Metadata Discovery Service Coordinator Coordinator HA
  • 36. The problems to use BI tools > BI tools need ODBC or JDBC connectivity > Tableau, IBM Cognos, QlickView, Chart.IO, ... > JasperSoft, Pentaho, MotionBoard, ... > ODBC/JDBC is VERY COMPLICATED > Matured implementation needs LONG time
  • 37. A solution: PostgreSQL protocol > Creating a PostgreSQL protocol gateway > Using PostgreSQL’s stable ODBC / JDBC driver https://github.com/treasure-data/prestogres
  • 38. How Prestogres works? 2. select run_presto_as_temp_table( ‘presto_result’,‘SELECT COUNT(1) FROM tbl1’); pgpool-II + patchclient 1. SELECT COUNT(1) FROM tbl1 4. SELECT * FROM presto_result; PostgreSQL 3.“run_persto_as_temp_table”function runs query on Presto Coordinator
  • 39. Demo
  • 40. 2. Data visualization with Presto > Data visualization tools need ODBC/JDBC driver > but implemetation takes LONG time > A solution is to use PostgreSQL protocol > and use PostgreSQL’s ODBC/JDBC driver > Prestogres is already confirmed to work with some commertial BI tools
  • 42. Presto’s execution model > Presto is NOT MapReduce > Presto’s query plan is based on DAG > more like Apache Tez or traditional MPP databases
  • 43. How query runs? > Coordinator > SQL Parser > Query Planner > Execution planner > Workers > Task execution scheduler
  • 44. SQL SQL Parser AST Logical Planner Distributed Planner Logical Query Plan Execution Planner Discovery Server Connector Distributed Query Plan Execution Plan Optimizer NodeManager ✓ node list ✓ table schema Metadata
  • 45. SQL SQL Parser SQL Distributed Planner Logical Query Plan Execution Planner Discovery Service Connector Query Plan Execution Plan Optimizer NodeManager ✓ node list ✓ table schema Metadata (today’s talk) Query Planner
  • 46. Query Planner SELECT name, count(*) AS c FROM impressions GROUP BY name SQL impressions ( name varchar time bigint ) Table schema Table scan (name:varchar) GROUP BY (name, count(*)) Output (name, c) + Sink Final aggregation Exchange Sink Partial aggregation Table scan Output Exchange Logical query plan Distributed query plan
  • 47. Query Planner - Stages Sink Final aggregation Exchange Sink Partial aggregation Table scan Output Exchange inter-worker data transfer pipelined aggregation inter-worker data transfer Stage-0 Stage-1 Stage-2
  • 48. Sink Partial aggregation Table scan Sink Partial aggregation Table scan Execution Planner + Node list ✓ 2 workers Sink Final aggregation Exchange Output Exchange Sink Final aggregation Exchange Sink Final aggregation Exchange Sink Partial aggregation Table scan Output Exchange Worker 1 Worker 2
  • 49. Execution Planner - Tasks Sink Final aggregation Exchange Sink Partial aggregation Table scan Sink Final aggregation Exchange Sink Partial aggregation Table scan Task 1 task / worker / stage ✓ All tasks in parallel Output Exchange Worker 1 Worker 2
  • 50. Execution Planner - Split Sink Final aggregation Exchange Sink Partial aggregation Table scan Sink Final aggregation Exchange Sink Partial aggregation Table scan Output Exchange Split many splits / task = many threads / worker (table scan) 1 split / task = 1 thread / worker Worker 1 Worker 2 1 split / worker = 1 thread / worker
  • 51. All stages are pipe-lined ✓ No wait time ✓ No fault-tolerance MapReduce vs. Presto MapReduce Presto map map reduce reduce task task task task task task memory-to-memory data transfer ✓ No disk IO ✓ Data chunk must fit in memory task disk map map reduce reduce disk disk Write data to disk Wait between stages
  • 52. 3. Query Execution > SQL is converted into stages, tasks and splits > All tasks run in parallel > No wait time between stages (pipelined) > If one task fails, all tasks fail at once (query fails) > Memory-to-memory data transfer > No disk IO > If aggregated data doesn’t fit in memory, query fails • Note: query dies but worker doesn’t die. Memory consumption of all queries is fully managed
  • 53. 4. Monitoring & Configuration
  • 54. Monitoring > Web UI > basic query status check > JMX HTTP API > GET /v1/jmx/mbean[/{objectName}] • com.facebook.presto.execution:name=TaskManager • com.facebook.presto.execution:name=QueryManager • com.facebook.presto.execution:name=NodeScheduler > Event notification (remote logging) > POST http://remote.server/v2/event • query start, query complete, split complete
  • 55. Configuration > Execution planning (for coordinator) > query.initial-hash-partitions • max number of hash buckets (=tasks) of a GROUP BY (default: 8) > node-scheduler.min-candidates • max number of workers to run a stage in parallel (default: 10) > node-scheduler.include-coordinator • whether run tasks only on workers or include coordinator > query.schedule-split-batch-size • number of splits of a stage to start at once
  • 56. Configuration > Task execution (for workers) > task.cpu-timer-enabled • enable detailed statistics (causes some overhead) (default: true) > task.max-memory • memory limit of a task especially for hash tables used by GROUP BY and JOIN operations (default: 256MB) • enlarge if you get“Task exceeded max memory size”error > task.shard.max-threads • max number of threads of a worker to run active splits (default: number of CPU cores * 4)
  • 57. 5. Roadmap A report of Presto Meetup 2014 http://www.slideshare.net/dain1/presto-meetup-20140514-34731104 "Presto, Past, Present, and Future" by Dain Sundstrom at Facebook
  • 58. Presto’s future > Huge JOIN and GROUP BY > Spill to disk > Task recovery > CREATE VIEW (※implemented) > Native store (※implemented) > Fast data store in Presto workers > to cache hot data > Authentication and permissions
  • 59. Presto’s future > DDL/DML statements > CREATE TABLE with partitioning > DELETE and INSERT > Plugin repository > CLI plugin manager > JOIN and aggregation pushdown > Custom optimizers
  • 60. Links > Web site & document > http://prestodb.io > Mailing list > https://groups.google.com/group/presto-users > Github > https://github.com/facebook/presto > Guidelines for contribution > https://github.com/facebook/presto/blob/master/CONTRIBUTING.md
  • 61. Check: www.treasuredata.com Cloud service for the entire data pipeline, including Presto. We’re hiring!