Presto - Hadoop Conference Japan 2014

Sadayuki Furuhashi
Founder & Software Architect
Treasure Data, Inc.
Presto: Interactive SQL Query Engine for Big Data
Hadoop Conference in Japan 2014
A little about me...
> Sadayuki Furuhashi
> github/twitter: @frsyuki
> Treasure Data, Inc.
> Founder & Software Architect
> Open-source hacker
> MessagePack - efficient object serializer
> Fluentd - data collection tool
> ServerEngine - Ruby framework to build multiprocess servers
> LS4 - distributed object storage system
> kumofs - distributed key-value data store
0. Background + Intro
What’s Presto?
A distributed SQL query engine
for interactive data analysis
against GBs to PBs of data.
Presto’s history
> 2012 Fall: Project started at Facebook
> Designed for interactive query
> with speed of commercial data warehouse
> and scalability to the size of Facebook
> 2013 Winter: Open sourced!
> 30+ contributors in 6 months
> including people from outside of Facebook
What are the problems to solve?
> We couldn’t visualize data in HDFS directly
using dashboards or BI tools
> because Hive is too slow (not interactive)
> or ODBC connectivity is unavailable/unstable
> We needed to store daily-batch results to an
interactive DB for quick response
(PostgreSQL, Redshift, etc.)
> Interactive DBs cost more and are far less scalable
> Some data is not stored in HDFS
> We need to copy the data into HDFS to analyze it
[Diagram: before Presto. Batch analysis platform: HDFS and Hive (daily/hourly batch). Visualization platform: PostgreSQL, etc. (interactive query) serving the dashboard and commercial BI tools.]
[Diagram: problems with this setup. ✓ Less scalable ✓ Extra cost ✓ More work to manage 2 platforms ✓ Can’t query against “live” data directly]
[Diagram: with Presto. HDFS and Hive still run the daily/hourly batch; Presto serves interactive queries to the dashboard in place of PostgreSQL, etc.]
[Diagram: Presto as a data analysis platform. SQL on any data sets: HDFS/Hive, Cassandra, MySQL, commercial DBs. Interactive queries from the dashboard and commercial BI tools (✓ IBM Cognos ✓ Tableau ✓ ...).]
dashboard on chart.io: https://chartio.com/
What can Presto do?
> Query interactively (in milliseconds to minutes)
> MapReduce and Hive are still necessary for ETL
> Query using commercial BI tools or dashboards
> Reliable ODBC/JDBC connectivity
> Query across multiple data sources such as
Hive, HBase, Cassandra, or even commercial DBs
> Plugin mechanism
> Integrate batch analysis + visualization
into a single data analysis platform
Presto’s deployment
> Facebook
> Multiple geographical regions
> scaled to 1,000 nodes
> actively used by 1,000+ employees
> who run 30,000+ queries every day
> processing 1PB/day
> Netflix, Dropbox, Treasure Data, Airbnb, Qubole
> Presto as a Service
Today’s talk
1. Distributed architecture
2. Data visualization - Demo
3. Query Execution - Presto vs. MapReduce
4. Monitoring & Configuration
5. Roadmap - the future
1. Distributed architecture
[Diagram: Client, Coordinator (with Connector Plugin), three Workers, Storage / Metadata, Discovery Service. The same diagram is repeated for each step below.]
> 1. Discovery service: find servers in a cluster
> 2. Client sends a query using HTTP
> 3. Coordinator builds a query plan; the connector plugin provides metadata (table schema, etc.)
> 4. Coordinator sends tasks to workers
> 5. Workers read data through the connector plugin
> 6. Workers run tasks in memory
> 7. Client gets the result from a worker
What are Connectors?
> Connectors are plugins to Presto
> written in Java
> Access to storage and metadata
> provide table schema to coordinators
> provide table rows to workers
> Implementations:
> Hive connector
> Cassandra connector
> MySQL through JDBC connector (prerelease)
> Or your own connector
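The two responsibilities listed above (schema for the coordinator, rows for the workers) can be illustrated with a small sketch. This is only an analogy in Python: real connectors are written in Java against Presto's plugin interfaces, and the class and method names below (InMemoryConnector, get_table_schema, get_splits, read_split) are hypothetical.

```python
from typing import Dict, Iterator, List, Tuple

Schema = List[Tuple[str, str]]  # (column name, type name)


class InMemoryConnector:
    """Hypothetical connector backed by Python lists instead of real storage."""

    def __init__(self, tables: Dict[str, Tuple[Schema, List[tuple]]]):
        self._tables = tables

    def get_table_schema(self, table: str) -> Schema:
        # Metadata side: what a coordinator would ask for while planning.
        return self._tables[table][0]

    def get_splits(self, table: str, split_size: int) -> List[range]:
        # Divide the table into row ranges that can be read independently.
        row_count = len(self._tables[table][1])
        return [range(start, min(start + split_size, row_count))
                for start in range(0, row_count, split_size)]

    def read_split(self, table: str, split: range) -> Iterator[tuple]:
        # Data side: what a worker would call to read the rows of one split.
        rows = self._tables[table][1]
        for index in split:
            yield rows[index]
```

A caller would first ask for the schema and the splits, then hand each split to a different worker to read.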
[Diagram: Hive connector. The coordinator and workers use the Hive connector to access HDFS and the Hive Metastore; the discovery service finds servers in a cluster.]
[Diagram: Cassandra connector. The coordinator and workers use the Cassandra connector to access Cassandra.]
[Diagram: multiple connectors in a query. Hive connector (HDFS / Metastore), Cassandra connector (Cassandra), and other connectors (other data sources...).]
1. Distributed architecture
> 3 types of servers:
> Coordinator, worker, discovery service
> Get data/metadata through connector plugins.
> Presto is NOT a database
> Presto provides SQL on top of existing data stores
> Client protocol is HTTP + JSON
> Language bindings:
Ruby, Python, PHP, Java (JDBC), R, Node.JS...
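As a rough sketch of what "HTTP + JSON" means for a client, the function below submits a statement and follows result pages until none are left. The endpoint path (/v1/statement), the X-Presto-User header, and the nextUri / data / error fields are assumptions based on the Presto REST protocol and are not stated on the slides; the transport is passed in so the paging logic can be exercised without a cluster.

```python
import json
from typing import Callable, Iterator, Optional

# transport(method, url, headers, body) -> decoded JSON object (a dict).
Transport = Callable[[str, str, dict, Optional[str]], dict]


def run_query(coordinator: str, sql: str, user: str,
              transport: Transport) -> Iterator[list]:
    """Submit a query and yield result rows page by page."""
    # Assumption: the user is identified with the X-Presto-User header.
    headers = {"X-Presto-User": user}
    # Assumption: statements are submitted with POST /v1/statement.
    response = transport("POST", coordinator + "/v1/statement", headers, sql)
    while True:
        if "error" in response:
            raise RuntimeError(response["error"].get("message", "query failed"))
        # Each page may carry a batch of rows under "data".
        for row in response.get("data", []):
            yield row
        next_uri = response.get("nextUri")
        if next_uri is None:
            return  # no further page: the query has finished
        response = transport("GET", next_uri, headers, None)


def urllib_transport(method, url, headers, body):
    """One possible transport, using only the standard library."""
    import urllib.request
    data = body.encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, headers=headers,
                                     method=method)
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)
```

With a running coordinator, something like list(run_query("http://localhost:8080", "SELECT 1", "me", urllib_transport)) would collect the rows; the host and port here are placeholders.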
[Diagram: Coordinator HA. The same architecture (Client, Coordinator with Connector Plugin, Workers, Storage / Metadata, Discovery Service) with an additional Coordinator.]
2. Data visualization
The problems with using BI tools
> BI tools need ODBC or JDBC connectivity
> Tableau, IBM Cognos, QlikView, Chart.IO, ...
> JasperSoft, Pentaho, MotionBoard, ...
> ODBC/JDBC is VERY COMPLICATED
> A mature implementation takes a LONG time
A solution: PostgreSQL protocol
> Creating a PostgreSQL protocol gateway
> Using PostgreSQL’s stable ODBC / JDBC driver
https://github.com/treasure-data/prestogres
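How does Prestogres work?
[Diagram: client, pgpool-II + patch, PostgreSQL, Presto Coordinator]
> 1. The client sends: SELECT COUNT(1) FROM tbl1
> 2. pgpool-II + patch rewrites it to:
select run_presto_as_temp_table(
'presto_result', 'SELECT COUNT(1) FROM tbl1');
> 3. The "run_presto_as_temp_table" function runs the query on Presto (through the Coordinator)
> 4. The result is read back with: SELECT * FROM presto_result;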
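The rewrite in the steps above can be sketched as a plain string transformation. This is an illustration only, not the code of the pgpool-II patch; the function name rewrite_for_prestogres is hypothetical, and doubling single quotes is the standard SQL way to embed a quote in a string literal.

```python
def rewrite_for_prestogres(query: str, result_table: str = "presto_result") -> list:
    """Return the two statements that replace the client's original query."""
    # Double single quotes so the query can sit inside a SQL string literal.
    escaped = query.replace("'", "''")
    return [
        # Runs the query on Presto and stores the rows in a temporary table.
        f"select run_presto_as_temp_table('{result_table}', '{escaped}');",
        # Reads the stored rows back through PostgreSQL.
        f"SELECT * FROM {result_table};",
    ]
```

Because the client only ever talks the PostgreSQL protocol, PostgreSQL's existing ODBC / JDBC drivers can be used unchanged.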
Demo
2. Data visualization with Presto
> Data visualization tools need ODBC/JDBC driver
> but implementation takes a LONG time
> A solution is to use PostgreSQL protocol
> and use PostgreSQL’s ODBC/JDBC driver
> Prestogres is already confirmed to work with
some commercial BI tools
3. Query Execution
Presto’s execution model
> Presto is NOT MapReduce
> Presto’s query plan is based on a DAG
> more like Apache Tez or traditional MPP
databases
How does a query run?
> Coordinator
> SQL Parser
> Query Planner
> Execution planner
> Workers
> Task execution scheduler
[Diagram: SQL → SQL Parser → AST → Logical Planner → Logical Query Plan → Optimizer → Distributed Planner → Distributed Query Plan → Execution Planner → Execution Plan]
> Metadata (✓ table schema) comes from the Connector
> NodeManager (✓ node list) gets its information from the Discovery Service
> The planning steps are shown together as the "Query Planner" (today’s talk)
Query Planner
SQL:
SELECT
name,
count(*) AS c
FROM impressions
GROUP BY name
Table schema:
impressions (
name varchar
time bigint
)
Logical query plan:
Table scan (name:varchar) → GROUP BY (name, count(*)) → Output (name, c)
Distributed query plan:
Table scan → Partial aggregation → Sink
Exchange → Final aggregation → Sink
Exchange → Output
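To make the partial / final aggregation split concrete, here is a minimal sketch of the GROUP BY count from the example query. The function names and the sample rows are made up for illustration, and the exchange between the two steps is reduced to passing a list.

```python
from collections import Counter


def partial_aggregation(rows):
    """Count names within one chunk of the table (runs next to the table scan)."""
    counts = Counter()
    for name, _time in rows:
        counts[name] += 1
    return counts


def final_aggregation(partials):
    """Merge the partial counts received from every partial aggregation."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts together
    return dict(total)


# Two chunks of the hypothetical impressions(name, time) table.
chunk_1 = [("ad_a", 1), ("ad_b", 2), ("ad_a", 3)]
chunk_2 = [("ad_b", 4), ("ad_a", 5)]

result = final_aggregation([partial_aggregation(chunk_1),
                            partial_aggregation(chunk_2)])
print(result)
```

The point of the split is that only the small per-chunk counts cross the network, not the scanned rows.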
Query Planner - Stages
> Stage-2: Table scan → Partial aggregation → Sink
> (inter-worker data transfer)
> Stage-1: Exchange → Final aggregation → Sink (pipelined aggregation)
> (inter-worker data transfer)
> Stage-0: Exchange → Output
Execution Planner
[Diagram: distributed query plan + node list (✓ 2 workers) = execution plan. Worker 1 and Worker 2 each run "Table scan → Partial aggregation → Sink" and "Exchange → Final aggregation → Sink"; "Exchange → Output" appears once.]
Execution Planner - Tasks
[Diagram: the execution plan on Worker 1 and Worker 2, with each "Table scan → Partial aggregation → Sink", "Exchange → Final aggregation → Sink", and "Exchange → Output" box marked as a task.]
> 1 task / worker / stage
✓ All tasks in parallel
Execution Planner - Split
[Diagram: the same tasks on Worker 1 and Worker 2, annotated with splits.]
> many splits / task = many threads / worker (table scan)
> 1 split / task = 1 thread / worker
> 1 split / worker = 1 thread / worker
> All stages are pipelined
✓ No wait time
✓ No fault-tolerance
MapReduce vs. Presto
[Diagram: MapReduce (map and reduce stages with a disk between them) next to Presto (tasks connected directly)]
> MapReduce
> Write data to disk
> Wait between stages
> Presto
> memory-to-memory data transfer
✓ No disk IO
✓ Data chunk must fit in memory
3. Query Execution
> SQL is converted into stages, tasks and splits
> All tasks run in parallel
> No wait time between stages (pipelined)
> If one task fails, all tasks fail at once (query fails)
> Memory-to-memory data transfer
> No disk IO
> If aggregated data doesn’t fit in memory,
query fails
• Note: query dies but worker doesn’t die.
Memory consumption of all queries is fully managed
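The difference between pipelined and staged execution can be shown with Python generators. This is an analogy, not Presto's implementation: the two "stages" (a scan and a filter) and all names are invented for the sketch, and the log records the order in which work happens.

```python
def table_scan(rows, log):
    """First stage: emit rows one at a time."""
    for row in rows:
        log.append(f"scan {row}")
        yield row


def keep_even(rows, log):
    """Second stage: consume rows as soon as the first stage produces them."""
    for row in rows:
        if row % 2 == 0:
            log.append(f"filter {row}")
            yield row


def run_pipelined(rows):
    # Stages are chained directly: no wait, nothing materialized in between.
    log = []
    output = list(keep_even(table_scan(rows, log), log))
    return output, log


def run_staged(rows):
    # The first stage finishes completely before the second one starts,
    # as when intermediate data is written out between stages.
    log = []
    scanned = list(table_scan(rows, log))
    output = list(keep_even(scanned, log))
    return output, log
```

In the pipelined run the log interleaves scan and filter entries, while in the staged run every scan entry precedes the first filter entry. The flip side, as the slide notes, is that a failure anywhere in the chain fails the whole query.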
4. Monitoring & Configuration
Monitoring
> Web UI
> basic query status check
> JMX HTTP API
> GET /v1/jmx/mbean[/{objectName}]
• com.facebook.presto.execution:name=TaskManager
• com.facebook.presto.execution:name=QueryManager
• com.facebook.presto.execution:name=NodeScheduler
> Event notification (remote logging)
> POST http://remote.server/v2/event
• query start, query complete, split complete
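A small helper for building the JMX URLs listed above might look like the following. The base URL (host and port) is a placeholder, and leaving ":", "=" and "," unescaped in the path is an assumption about what the server accepts.

```python
from urllib.parse import quote


def jmx_mbean_url(base_url: str, object_name: str = "") -> str:
    """Build the URL for GET /v1/jmx/mbean[/{objectName}]."""
    url = base_url.rstrip("/") + "/v1/jmx/mbean"
    if object_name:
        # Keep the characters that normally appear in JMX object names.
        url += "/" + quote(object_name, safe=":=,")
    return url


# Hypothetical coordinator address; the object names come from the slide.
for name in ("TaskManager", "QueryManager", "NodeScheduler"):
    print(jmx_mbean_url("http://localhost:8080",
                        f"com.facebook.presto.execution:name={name}"))
```

The resulting URLs can be fetched with any HTTP client and polled from an existing monitoring system.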
Configuration
> Execution planning (for coordinator)
> query.initial-hash-partitions
• max number of hash buckets (=tasks) of a GROUP BY
(default: 8)
> node-scheduler.min-candidates
• max number of workers to run a stage in parallel
(default: 10)
> node-scheduler.include-coordinator
• whether to run tasks only on workers or also on the coordinator
> query.schedule-split-batch-size
• number of splits of a stage to start at once
Configuration
> Task execution (for workers)
> task.cpu-timer-enabled
• enable detailed statistics (causes some overhead)
(default: true)
> task.max-memory
• memory limit of a task especially for hash tables used by
GROUP BY and JOIN operations (default: 256MB)
• enlarge if you get a “Task exceeded max memory size” error
> task.shard.max-threads
• max number of threads of a worker to run active splits
(default: number of CPU cores * 4)
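The settings above can be collected into a properties file. The sketch below renders the defaults quoted on the slides as key=value lines; the helper names are hypothetical, the key=value file format is an assumption, and whether each property exists under the same name in a given Presto version should be checked against that version's documentation.

```python
import os


def default_settings(cpu_cores=None):
    """Defaults as quoted on the slides (2014); they may differ by version."""
    cores = cpu_cores or os.cpu_count() or 1
    return {
        "query.initial-hash-partitions": "8",
        "node-scheduler.min-candidates": "10",
        "task.cpu-timer-enabled": "true",
        "task.max-memory": "256MB",
        # The slide gives this default as number of CPU cores * 4.
        "task.shard.max-threads": str(cores * 4),
    }


def render_properties(settings):
    """Render settings as sorted key=value lines."""
    lines = [f"{key}={value}" for key, value in sorted(settings.items())]
    return "\n".join(lines) + "\n"


# Example: raise the task memory limit after a
# "Task exceeded max memory size" error (1GB is an arbitrary value).
settings = default_settings(cpu_cores=8)
settings["task.max-memory"] = "1GB"
print(render_properties(settings))
```

Keeping the defaults in one place makes it easy to see which values were deliberately overridden.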
5. Roadmap
A report of Presto Meetup 2014
http://www.slideshare.net/dain1/presto-meetup-20140514-34731104
"Presto, Past, Present, and Future" by Dain Sundstrom at Facebook
Presto’s future
> Huge JOIN and GROUP BY
> Spill to disk
> Task recovery
> CREATE VIEW (※implemented)
> Native store (※implemented)
> Fast data store in Presto workers
> to cache hot data
> Authentication and permissions
Presto’s future
> DDL/DML statements
> CREATE TABLE with partitioning
> DELETE and INSERT
> Plugin repository
> CLI plugin manager
> JOIN and aggregation pushdown
> Custom optimizers
Links
> Web site & document
> http://prestodb.io
> Mailing list
> https://groups.google.com/group/presto-users
> Github
> https://github.com/facebook/presto
> Guidelines for contribution
> https://github.com/facebook/presto/blob/master/CONTRIBUTING.md
Check: www.treasuredata.com
Cloud service for the entire data pipeline,
including Presto. We’re hiring!
1 of 61

Recommended

Presto+MySQLで分散SQL by
Presto+MySQLで分散SQLPresto+MySQLで分散SQL
Presto+MySQLで分散SQLSadayuki Furuhashi
6.2K views24 slides
Prestogres internals by
Prestogres internalsPrestogres internals
Prestogres internalsSadayuki Furuhashi
9.1K views47 slides
Logging for Production Systems in The Container Era by
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container EraSadayuki Furuhashi
1.4K views46 slides
Presto in my_use_case by
Presto in my_use_casePresto in my_use_case
Presto in my_use_casewyukawa
6.2K views12 slides
Tale of ISUCON and Its Bench Tools by
Tale of ISUCON and Its Bench ToolsTale of ISUCON and Its Bench Tools
Tale of ISUCON and Its Bench ToolsSATOSHI TAGOMORI
5.7K views28 slides
Overview of data analytics service: Treasure Data Service by
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceSATOSHI TAGOMORI
2.6K views60 slides

More Related Content

What's hot

Fighting Against Chaotically Separated Values with Embulk by
Fighting Against Chaotically Separated Values with EmbulkFighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with EmbulkSadayuki Furuhashi
2.1K views45 slides
Fluentd at Bay Area Kubernetes Meetup by
Fluentd at Bay Area Kubernetes MeetupFluentd at Bay Area Kubernetes Meetup
Fluentd at Bay Area Kubernetes MeetupSadayuki Furuhashi
1.2K views31 slides
Data Analytics Service Company and Its Ruby Usage by
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby UsageSATOSHI TAGOMORI
21K views51 slides
Planet-scale Data Ingestion Pipeline: Bigdam by
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: BigdamSATOSHI TAGOMORI
6.3K views46 slides
How to Make Norikra Perfect by
How to Make Norikra PerfectHow to Make Norikra Perfect
How to Make Norikra PerfectSATOSHI TAGOMORI
4.7K views29 slides
Automating Workflows for Analytics Pipelines by
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics PipelinesSadayuki Furuhashi
1.9K views28 slides

What's hot(20)

Fighting Against Chaotically Separated Values with Embulk by Sadayuki Furuhashi
Fighting Against Chaotically Separated Values with EmbulkFighting Against Chaotically Separated Values with Embulk
Fighting Against Chaotically Separated Values with Embulk
Sadayuki Furuhashi2.1K views
Data Analytics Service Company and Its Ruby Usage by SATOSHI TAGOMORI
Data Analytics Service Company and Its Ruby UsageData Analytics Service Company and Its Ruby Usage
Data Analytics Service Company and Its Ruby Usage
SATOSHI TAGOMORI21K views
Planet-scale Data Ingestion Pipeline: Bigdam by SATOSHI TAGOMORI
Planet-scale Data Ingestion Pipeline: BigdamPlanet-scale Data Ingestion Pipeline: Bigdam
Planet-scale Data Ingestion Pipeline: Bigdam
SATOSHI TAGOMORI6.3K views
Automating Workflows for Analytics Pipelines by Sadayuki Furuhashi
Automating Workflows for Analytics PipelinesAutomating Workflows for Analytics Pipelines
Automating Workflows for Analytics Pipelines
Sadayuki Furuhashi1.9K views
Fluentd - Flexible, Stable, Scalable by Shu Ting Tseng
Fluentd - Flexible, Stable, ScalableFluentd - Flexible, Stable, Scalable
Fluentd - Flexible, Stable, Scalable
Shu Ting Tseng14.2K views
Ruby and Distributed Storage Systems by SATOSHI TAGOMORI
Ruby and Distributed Storage SystemsRuby and Distributed Storage Systems
Ruby and Distributed Storage Systems
SATOSHI TAGOMORI13.3K views
Fluentd and Docker - running fluentd within a docker container by Treasure Data, Inc.
Fluentd and Docker - running fluentd within a docker containerFluentd and Docker - running fluentd within a docker container
Fluentd and Docker - running fluentd within a docker container
Treasure Data, Inc.5.6K views
Presto at Facebook - Presto Meetup @ Boston (10/6/2015) by Martin Traverso
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Presto at Facebook - Presto Meetup @ Boston (10/6/2015)
Martin Traverso3.5K views
Empowering developers to deploy their own data stores by Tomas Doran
Empowering developers to deploy their own data storesEmpowering developers to deploy their own data stores
Empowering developers to deploy their own data stores
Tomas Doran23.1K views
Why we love pgpool-II and why we hate it! by PGConf APAC
Why we love pgpool-II and why we hate it!Why we love pgpool-II and why we hate it!
Why we love pgpool-II and why we hate it!
PGConf APAC6K views
Natural Language Query and Conversational Interface to Apache Spark by Databricks
Natural Language Query and Conversational Interface to Apache SparkNatural Language Query and Conversational Interface to Apache Spark
Natural Language Query and Conversational Interface to Apache Spark
Databricks217 views
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El... by PROIDEA
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
DOD 2016 - Rafał Kuć - Building a Resilient Log Aggregation Pipeline Using El...
PROIDEA243 views
Technologies for Data Analytics Platform by N Masahiro
Technologies for Data Analytics PlatformTechnologies for Data Analytics Platform
Technologies for Data Analytics Platform
N Masahiro9.2K views
Digdagによる大規模データ処理の自動化とエラー処理 by Sadayuki Furuhashi
Digdagによる大規模データ処理の自動化とエラー処理Digdagによる大規模データ処理の自動化とエラー処理
Digdagによる大規模データ処理の自動化とエラー処理
Sadayuki Furuhashi23.7K views
Open Source Software, Distributed Systems, Database as a Cloud Service by SATOSHI TAGOMORI
Open Source Software, Distributed Systems, Database as a Cloud ServiceOpen Source Software, Distributed Systems, Database as a Cloud Service
Open Source Software, Distributed Systems, Database as a Cloud Service
SATOSHI TAGOMORI5.4K views

Similar to Presto - Hadoop Conference Japan 2014

SQL on Hadoop in Taiwan by
SQL on Hadoop in TaiwanSQL on Hadoop in Taiwan
SQL on Hadoop in TaiwanTreasure Data, Inc.
9.4K views64 slides
SQL for Everything at CWT2014 by
SQL for Everything at CWT2014SQL for Everything at CWT2014
SQL for Everything at CWT2014N Masahiro
1.7K views29 slides
Treasure Data and OSS by
Treasure Data and OSSTreasure Data and OSS
Treasure Data and OSSN Masahiro
6K views39 slides
Understanding Presto - Presto meetup @ Tokyo #1 by
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
14.1K views39 slides
Prestogres, ODBC & JDBC connectivity for Presto by
Prestogres, ODBC & JDBC connectivity for PrestoPrestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for PrestoSadayuki Furuhashi
6.3K views18 slides
Big Data and Hadoop by
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoopch adnan
78 views25 slides

Similar to Presto - Hadoop Conference Japan 2014(20)

SQL for Everything at CWT2014 by N Masahiro
SQL for Everything at CWT2014SQL for Everything at CWT2014
SQL for Everything at CWT2014
N Masahiro1.7K views
Treasure Data and OSS by N Masahiro
Treasure Data and OSSTreasure Data and OSS
Treasure Data and OSS
N Masahiro6K views
Understanding Presto - Presto meetup @ Tokyo #1 by Sadayuki Furuhashi
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
Sadayuki Furuhashi14.1K views
Prestogres, ODBC & JDBC connectivity for Presto by Sadayuki Furuhashi
Prestogres, ODBC & JDBC connectivity for PrestoPrestogres, ODBC & JDBC connectivity for Presto
Prestogres, ODBC & JDBC connectivity for Presto
Sadayuki Furuhashi6.3K views
Big Data and Hadoop by ch adnan
Big Data and HadoopBig Data and Hadoop
Big Data and Hadoop
ch adnan78 views
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P... by viirya
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
Speed up Interactive Analytic Queries over Existing Big Data on Hadoop with P...
viirya3.1K views
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey by Alluxio, Inc.
Modernizing Global Shared Data Analytics Platform and our Alluxio JourneyModernizing Global Shared Data Analytics Platform and our Alluxio Journey
Modernizing Global Shared Data Analytics Platform and our Alluxio Journey
Alluxio, Inc.344 views
Big data-at-detik by k4ndar
Big data-at-detikBig data-at-detik
Big data-at-detik
k4ndar1.6K views
Boston Hadoop Meetup: Presto for the Enterprise by Matt Fuller
Boston Hadoop Meetup: Presto for the EnterpriseBoston Hadoop Meetup: Presto for the Enterprise
Boston Hadoop Meetup: Presto for the Enterprise
Matt Fuller2.9K views
Fluentd - RubyKansai 65 by N Masahiro
Fluentd - RubyKansai 65Fluentd - RubyKansai 65
Fluentd - RubyKansai 65
N Masahiro2.9K views
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da... by MongoDB
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
MongoDB1.6K views
Hadoop Big Data A big picture by J S Jodha
Hadoop Big Data A big pictureHadoop Big Data A big picture
Hadoop Big Data A big picture
J S Jodha727 views
Netflix Data Engineering @ Uber Engineering Meetup by Blake Irvine
Netflix Data Engineering @ Uber Engineering MeetupNetflix Data Engineering @ Uber Engineering Meetup
Netflix Data Engineering @ Uber Engineering Meetup
Blake Irvine4.7K views
Finding the needles in the haystack. An Overview of Analyzing Big Data with H... by Chris Baglieri
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Finding the needles in the haystack. An Overview of Analyzing Big Data with H...
Chris Baglieri1.3K views
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da... by MongoDB
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
Lightning Talk: Why and How to Integrate MongoDB and NoSQL into Hadoop Big Da...
MongoDB1K views

More from Sadayuki Furuhashi

Scripting Embulk Plugins by
Scripting Embulk PluginsScripting Embulk Plugins
Scripting Embulk PluginsSadayuki Furuhashi
2.6K views14 slides
Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019 by
Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019
Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019Sadayuki Furuhashi
4.7K views63 slides
Making KVS 10x Scalable by
Making KVS 10x ScalableMaking KVS 10x Scalable
Making KVS 10x ScalableSadayuki Furuhashi
3.4K views39 slides
DigdagはなぜYAMLなのか? by
DigdagはなぜYAMLなのか?DigdagはなぜYAMLなのか?
DigdagはなぜYAMLなのか?Sadayuki Furuhashi
5.5K views35 slides
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11 by
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11Sadayuki Furuhashi
71.1K views49 slides
Embulk - 進化するバルクデータローダ by
Embulk - 進化するバルクデータローダEmbulk - 進化するバルクデータローダ
Embulk - 進化するバルクデータローダSadayuki Furuhashi
4.9K views35 slides

More from Sadayuki Furuhashi(20)

Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019 by Sadayuki Furuhashi
Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019
Performance Optimization Techniques of MessagePack-Ruby - RubyKaigi 2019
Sadayuki Furuhashi4.7K views
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11 by Sadayuki Furuhashi
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11
分散ワークフローエンジン『Digdag』の実装 at Tokyo RubyKaigi #11
Sadayuki Furuhashi71.1K views
Embulk - 進化するバルクデータローダ by Sadayuki Furuhashi
Embulk - 進化するバルクデータローダEmbulk - 進化するバルクデータローダ
Embulk - 進化するバルクデータローダ
Sadayuki Furuhashi4.9K views
Plugin-based software design with Ruby and RubyGems by Sadayuki Furuhashi
Plugin-based software design with Ruby and RubyGemsPlugin-based software design with Ruby and RubyGems
Plugin-based software design with Ruby and RubyGems
Sadayuki Furuhashi7.9K views
Embulk, an open-source plugin-based parallel bulk data loader by Sadayuki Furuhashi
Embulk, an open-source plugin-based parallel bulk data loaderEmbulk, an open-source plugin-based parallel bulk data loader
Embulk, an open-source plugin-based parallel bulk data loader
Sadayuki Furuhashi226.2K views
What's new in v11 - Fluentd Casual Talks #3 #fluentdcasual by Sadayuki Furuhashi
What's new in v11 - Fluentd Casual Talks #3 #fluentdcasualWhat's new in v11 - Fluentd Casual Talks #3 #fluentdcasual
What's new in v11 - Fluentd Casual Talks #3 #fluentdcasual
Sadayuki Furuhashi9.2K views
Programming Tools and Techniques #369 - The MessagePack Project by Sadayuki Furuhashi
Programming Tools and Techniques #369 - The MessagePack ProjectProgramming Tools and Techniques #369 - The MessagePack Project
Programming Tools and Techniques #369 - The MessagePack Project
NoSQL afternoon in Japan Kumofs & MessagePack by Sadayuki Furuhashi
NoSQL afternoon in Japan Kumofs & MessagePackNoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePack
Sadayuki Furuhashi2.4K views

Recently uploaded

OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) ru... by
OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) ru...OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) ru...
OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) ru...NETWAYS
7 views20 slides
231121 SP slides - PAS workshop November 2023.pdf by
231121 SP slides - PAS workshop November 2023.pdf231121 SP slides - PAS workshop November 2023.pdf
231121 SP slides - PAS workshop November 2023.pdfPAS_Team
150 views15 slides
Timeahead Agency Pitch Deck.pdf by
Timeahead Agency Pitch Deck.pdfTimeahead Agency Pitch Deck.pdf
Timeahead Agency Pitch Deck.pdfHabib-ur- Rehman
9 views13 slides
OSMC 2023 | Will ChatGPT Take Over My Job? by Philipp Krenn by
OSMC 2023 | Will ChatGPT Take Over My Job? by Philipp KrennOSMC 2023 | Will ChatGPT Take Over My Job? by Philipp Krenn
OSMC 2023 | Will ChatGPT Take Over My Job? by Philipp KrennNETWAYS
22 views25 slides
SOA PPT ON SEA TURTLES.pptx by
SOA PPT ON SEA TURTLES.pptxSOA PPT ON SEA TURTLES.pptx
SOA PPT ON SEA TURTLES.pptxEuniceOseiYeboah
9 views18 slides
Helko van den Brom - VSL by
Helko van den Brom - VSLHelko van den Brom - VSL
Helko van den Brom - VSLDutch Power
63 views18 slides

Recently uploaded(20)

OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) ru... by NETWAYS
OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) ru...OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) ru...
OSMC 2023 | IGNITE: Metrics, Margins, Mutiny – How to make your SREs (not) ru...
NETWAYS7 views
231121 SP slides - PAS workshop November 2023.pdf by PAS_Team
231121 SP slides - PAS workshop November 2023.pdf231121 SP slides - PAS workshop November 2023.pdf
231121 SP slides - PAS workshop November 2023.pdf
PAS_Team150 views
OSMC 2023 | Will ChatGPT Take Over My Job? by Philipp Krenn by NETWAYS
OSMC 2023 | Will ChatGPT Take Over My Job? by Philipp KrennOSMC 2023 | Will ChatGPT Take Over My Job? by Philipp Krenn
OSMC 2023 | Will ChatGPT Take Over My Job? by Philipp Krenn
NETWAYS22 views
Helko van den Brom - VSL by Dutch Power
Helko van den Brom - VSLHelko van den Brom - VSL
Helko van den Brom - VSL
Dutch Power63 views
Christan van Dorst - Hyteps by Dutch Power
Christan van Dorst - HytepsChristan van Dorst - Hyteps
Christan van Dorst - Hyteps
Dutch Power65 views
Synthetic Biology.pptx by ShubNoor4
Synthetic Biology.pptxSynthetic Biology.pptx
Synthetic Biology.pptx
ShubNoor45 views
Roozbeh Torkzadeh - TU Eindhoven by Dutch Power
Roozbeh Torkzadeh - TU EindhovenRoozbeh Torkzadeh - TU Eindhoven
Roozbeh Torkzadeh - TU Eindhoven
Dutch Power62 views
Post-event report intro session-1.docx by RohitRathi59
Post-event report intro session-1.docxPost-event report intro session-1.docx
Post-event report intro session-1.docx
RohitRathi5912 views
BLogSite (Web Programming) (1).pdf by Fiverr
BLogSite (Web Programming) (1).pdfBLogSite (Web Programming) (1).pdf
BLogSite (Web Programming) (1).pdf
Fiverr11 views
Gym Members Community.pptx by nasserbf1987
Gym Members Community.pptxGym Members Community.pptx
Gym Members Community.pptx
nasserbf19876 views
OSMC | SNMP Monitoring at scale by Rocco Pezzani & Thomas Gelf by NETWAYS
OSMC | SNMP Monitoring at scale by Rocco Pezzani & Thomas Gelf OSMC | SNMP Monitoring at scale by Rocco Pezzani & Thomas Gelf
OSMC | SNMP Monitoring at scale by Rocco Pezzani & Thomas Gelf
NETWAYS11 views

Presto - Hadoop Conference Japan 2014

  • 1. Sadayuki Furuhashi Founder & Software Architect Treasure Data, inc. PrestoInteractive SQL Query Engine for Big Data Hadoop Conference in Japan 2014
  • 2. A little about me... > Sadayuki Furuhashi > github/twitter: @frsyuki > Treasure Data, Inc. > Founder & Software Architect > Open-source hacker > MessagePack - efficient object serializer > Fluentd - data collection tool > ServerEngine - Ruby framework to build multiprocess servers > LS4 - distributed object storage system > kumofs - distributed key-value data store
  • 4. What’s Presto? A distributed SQL query engine for interactive data analisys against GBs to PBs of data.
  • 5. Presto’s history > 2012 Fall: Project started at Facebook > Designed for interactive query > with speed of commercial data warehouse > and scalability to the size of Facebook > 2013 Winter: Open sourced! > 30+ contributes in 6 months > including people from outside of Facebook
  • 6. What’s the problems to solve? > We couldn’t visualize data in HDFS directly using dashboards or BI tools > because Hive is too slow (not interactive) > or ODBC connectivity is unavailable/unstable > We needed to store daily-batch results to an interactive DB for quick response (PostgreSQL, Redshift, etc.) > Interactive DB costs more and less scalable by far > Some data are not stored in HDFS > We need to copy the data into HDFS to analyze
  • 10. [Diagram] Batch analysis platform: HDFS and Hive (daily/hourly batch). Visualization platform: PostgreSQL, etc., serving interactive queries from dashboards and commercial BI tools.
  • 11. [Diagram] The same two-platform setup with its drawbacks. PostgreSQL, etc.: ✓ Less scalable ✓ Extra cost. Overall: ✓ More work to manage 2 platforms ✓ Can’t query against “live” data directly.
  • 12. [Diagram] Before: HDFS and Hive (daily/hourly batch) feed PostgreSQL, etc., which serves interactive queries from the dashboard. With Presto: the dashboard sends interactive queries to Presto, which reads HDFS and Hive directly.
  • 14. [Diagram] Data analysis platform: Presto provides SQL on any data sets (HDFS and Hive with daily/hourly batch, Cassandra, MySQL, commercial DBs) and serves interactive queries from dashboards and commercial BI tools (✓ IBM Cognos ✓ Tableau ✓ ...).
  • 15. dashboard on chart.io: https://chartio.com/
  • 16. What can Presto do? > Query interactively (in milliseconds to minutes) > MapReduce and Hive are still necessary for ETL > Query using commercial BI tools or dashboards > Reliable ODBC/JDBC connectivity > Query across multiple data sources such as Hive, HBase, Cassandra, or even commercial DBs > Plugin mechanism > Integrate batch analysis + visualization into a single data analysis platform
  • 17. Presto’s deployment > Facebook > Multiple geographical regions > scaled to 1,000 nodes > actively used by 1,000+ employees > who run 30,000+ queries every day > processing 1PB/day > Netflix, Dropbox, Treasure Data, Airbnb, Qubole > Presto as a Service
  • 18. Today’s talk 1. Distributed architecture 2. Data visualization - Demo 3. Query Execution - Presto vs. MapReduce 4. Monitoring & Configuration 5. Roadmap - the future
  • 21. [Architecture diagram: Client, Coordinator, Connector Plugin, Workers, Storage / Metadata, Discovery Service] Step 1: Discovery Service finds servers in a cluster
  • 22. Step 2: Client sends a query using HTTP
  • 23. Step 3: Coordinator builds a query plan. Connector plugin provides metadata (table schema, etc.)
  • 24. Step 4: Coordinator sends tasks to workers
  • 25. Step 5: Workers read data through connector plugin
  • 26. Step 6: Workers run tasks in memory
  • 27. Step 7: Client gets the result from a worker
  • 29. What are connectors? > Connectors are plugins to Presto > written in Java > Access to storage and metadata > provide table schema to coordinators > provide table rows to workers > Implementations: > Hive connector > Cassandra connector > MySQL through JDBC connector (prerelease) > Or your own connector
  • 32. Multiple connectors in a query. [Diagram] Client, Coordinator, Workers, and Discovery Service (find servers in a cluster); Hive Connector → HDFS / Metastore; Cassandra Connector → Cassandra; other connectors → other data sources...
  • 33. 1. Distributed architecture > 3 types of servers: > Coordinator, worker, discovery service > Get data/metadata through connector plugins. > Presto is NOT a database > Presto provides SQL to existing data stores > Client protocol is HTTP + JSON > Language bindings: Ruby, Python, PHP, Java (JDBC), R, Node.JS...
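Since the client protocol is HTTP + JSON, a client can be sketched in a few lines. The following is a minimal sketch, not the official client: it assumes the coordinator accepts the SQL text as a POST to `/v1/statement`, identifies the session with `X-Presto-User`, `X-Presto-Catalog`, and `X-Presto-Schema` headers, and returns JSON pages linked by a `nextUri` field. Those names should be checked against the Presto version in use. The `transport` parameter is a hypothetical hook so the paging logic can be exercised without a running cluster.

```python
import json
import urllib.request


def urllib_transport(method, url, headers, body=None):
    """Send one HTTP request and return the parsed JSON response."""
    req = urllib.request.Request(url, data=body, headers=headers, method=method)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def run_query(coordinator, sql, user, catalog, schema, transport=urllib_transport):
    """Yield result rows of a query, following nextUri until it disappears.

    Endpoint path and header names are assumptions about the protocol,
    not taken from the slides.
    """
    headers = {
        "X-Presto-User": user,
        "X-Presto-Catalog": catalog,
        "X-Presto-Schema": schema,
    }
    url = coordinator.rstrip("/") + "/v1/statement"
    page = transport("POST", url, headers, sql.encode("utf-8"))
    while True:
        if "error" in page:
            raise RuntimeError(page["error"].get("message", "query failed"))
        # A page may carry no rows yet while the query is still starting.
        for row in page.get("data") or []:
            yield row
        next_uri = page.get("nextUri")
        if next_uri is None:
            return
        page = transport("GET", next_uri, headers, None)
```

Usage would look like `for row in run_query("http://coordinator:8080", "SELECT 1", "alice", "hive", "default"): print(row)`, where the host, port, user, catalog, and schema are placeholders.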
  • 34. Coordinator HA. [Diagram] The same architecture (Client, Connector Plugin, Workers, Storage / Metadata, Discovery Service) with two Coordinators.
  • 36. The problems with using BI tools > BI tools need ODBC or JDBC connectivity > Tableau, IBM Cognos, QlikView, Chart.IO, ... > JasperSoft, Pentaho, MotionBoard, ... > ODBC/JDBC is VERY COMPLICATED > A mature implementation takes a LONG time
  • 37. A solution: PostgreSQL protocol > Creating a PostgreSQL protocol gateway > Using PostgreSQL’s stable ODBC / JDBC driver https://github.com/treasure-data/prestogres
  • 38. How does Prestogres work? 1. The client sends SELECT COUNT(1) FROM tbl1 to pgpool-II + patch. 2. pgpool-II rewrites it to select run_presto_as_temp_table('presto_result', 'SELECT COUNT(1) FROM tbl1'); and sends it to PostgreSQL. 3. The “run_presto_as_temp_table” function runs the query on the Presto Coordinator. 4. SELECT * FROM presto_result; then returns the rows from PostgreSQL to the client.
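The rewrite step on this slide can be illustrated as follows. This is a sketch of the idea only, not Prestogres's actual code: `wrap_for_presto` is a hypothetical helper, and the only names taken from the slide are `run_presto_as_temp_table` and `presto_result`.

```python
from typing import List


def wrap_for_presto(sql: str, temp_table: str = "presto_result") -> List[str]:
    """Return the two statements a gateway would send to PostgreSQL.

    The first asks the run_presto_as_temp_table function to run the original
    query on Presto and store the rows in a temporary table; the second reads
    that table back through the ordinary PostgreSQL protocol.
    """
    # Double any single quotes so the query survives as a SQL string literal.
    escaped = sql.replace("'", "''")
    return [
        f"select run_presto_as_temp_table('{temp_table}', '{escaped}');",
        f"SELECT * FROM {temp_table};",
    ]
```

Because the client only ever sees a PostgreSQL session, the stable PostgreSQL ODBC/JDBC drivers can be used unchanged.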
  • 39. Demo
  • 40. 2. Data visualization with Presto > Data visualization tools need an ODBC/JDBC driver > but implementation takes a LONG time > A solution is to use the PostgreSQL protocol > and use PostgreSQL’s ODBC/JDBC driver > Prestogres is already confirmed to work with some commercial BI tools
  • 42. Presto’s execution model > Presto is NOT MapReduce > Presto’s query plan is based on a DAG > more like Apache Tez or traditional MPP databases
  • 43. How does a query run? > Coordinator > SQL Parser > Query Planner > Execution planner > Workers > Task execution scheduler
  • 44. [Diagram] SQL → SQL Parser → AST → Logical Planner → Logical Query Plan → Distributed Planner → Distributed Query Plan → Execution Planner → Execution Plan. Also shown: Optimizer; Metadata (✓ table schema, from the Connector); NodeManager (✓ node list, from the Discovery Server).
  • 45. [Diagram] The same pipeline in simplified form, with the Query Planner part (from SQL to the query plan) marked as today’s talk.
  • 46. Query Planner. SQL: SELECT name, count(*) AS c FROM impressions GROUP BY name. Table schema: impressions (name varchar, time bigint). Logical query plan: Table scan (name:varchar) → GROUP BY (name, count(*)) → Output (name, c). Distributed query plan: Table scan → Partial aggregation → Sink → Exchange → Final aggregation → Sink → Exchange → Output.
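The split of the GROUP BY into partial and final aggregation can be sketched in plain Python. This is a single-process illustration of the technique, not Presto's operators: the function names are made up for the example, and the exchange is modeled as hash partitioning on the grouping key so that every partial count for one name reaches the same final aggregator.

```python
from collections import Counter, defaultdict


def partial_aggregation(rows):
    """Per worker: count local rows of (name, time) by name."""
    return Counter(name for name, _time in rows)


def exchange(partials, num_partitions):
    """Route each partial count to the partition chosen by hash(name)."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for partial in partials:
        for name, count in partial.items():
            partitions[hash(name) % num_partitions][name].append(count)
    return partitions


def final_aggregation(partition):
    """Per partition: sum the partial counts received for each name."""
    return {name: sum(counts) for name, counts in partition.items()}


# Rows of impressions(name, time) as scanned by two workers.
worker1_rows = [("a", 1), ("b", 2), ("a", 3)]
worker2_rows = [("b", 4), ("c", 5)]

partials = [partial_aggregation(worker1_rows), partial_aggregation(worker2_rows)]
result = {}
for partition in exchange(partials, num_partitions=2):
    result.update(final_aggregation(partition))
```

Here `result` holds one count per name, the same answer a single-node `GROUP BY name` would give, while only the small partial counts cross the exchange.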
  • 47. Query Planner - Stages. Stage-2: Table scan → Partial aggregation → Sink. Stage-1: Exchange → Final aggregation → Sink. Stage-0: Exchange → Output. Stages are connected by inter-worker data transfer, with pipelined aggregation.
  • 48. Execution Planner + Node list (✓ 2 workers). [Diagram] Worker 1 and Worker 2 each run Table scan → Partial aggregation → Sink and Exchange → Final aggregation → Sink; the Exchange → Output stage collects the result.
  • 49. Execution Planner - Tasks. [Diagram] The same plan on Worker 1 and Worker 2, with each box a task: 1 task / worker / stage ✓ All tasks in parallel
  • 50. Execution Planner - Split. [Diagram] The same plan on Worker 1 and Worker 2, annotated: many splits / task = many threads / worker (table scan); 1 split / task = 1 thread / worker; 1 split / worker = 1 thread / worker
  • 51. MapReduce vs. Presto. MapReduce: map and reduce tasks write data to disk and wait between stages. Presto: all stages are pipelined (✓ No wait time ✓ No fault-tolerance), with memory-to-memory data transfer between tasks (✓ No disk IO ✓ Data chunk must fit in memory).
  • 52. 3. Query Execution > SQL is converted into stages, tasks and splits > All tasks run in parallel > No wait time between stages (pipelined) > If one task fails, all tasks fail at once (query fails) > Memory-to-memory data transfer > No disk IO > If aggregated data doesn’t fit in memory, query fails • Note: query dies but worker doesn’t die. Memory consumption of all queries is fully managed
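The difference between pipelined and staged execution can be shown with a single-process analogy using Python generators. This is an illustration of the idea only, not of how Presto moves data between workers, and every name in it is made up for the example.

```python
def scan(rows, log):
    """Upstream stage: produce rows one at a time."""
    for row in rows:
        log.append(f"scan {row}")
        yield row


def output(rows, log):
    """Downstream stage: consume rows as they arrive."""
    for row in rows:
        log.append(f"output {row}")
        yield row


def run_pipelined(rows):
    """Presto-like: the downstream stage starts before upstream finishes."""
    log = []
    list(output(scan(rows, log), log))
    return log


def run_staged(rows):
    """MapReduce-like: materialize the whole upstream stage first."""
    log = []
    intermediate = list(scan(rows, log))  # stands in for writing to disk
    list(output(intermediate, log))
    return log
```

With rows `["r1", "r2"]`, the pipelined log interleaves scan and output per row, while the staged log shows every scan before the first output. The same coupling explains the failure model: with no materialized intermediate data there is nothing to restart from, so one failed task fails the query.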
  • 53. 4. Monitoring & Configuration
  • 54. Monitoring > Web UI > basic query status check > JMX HTTP API > GET /v1/jmx/mbean[/{objectName}] • com.facebook.presto.execution:name=TaskManager • com.facebook.presto.execution:name=QueryManager • com.facebook.presto.execution:name=NodeScheduler > Event notification (remote logging) > POST http://remote.server/v2/event • query start, query complete, split complete
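The JMX HTTP API on this slide can be polled with the standard library alone. The sketch below assumes the endpoint returns JSON and that the coordinator address is supplied by the caller; the helper names are hypothetical, and only the `/v1/jmx/mbean[/{objectName}]` path and the MBean names come from the slide.

```python
import json
import urllib.parse
import urllib.request


def jmx_mbean_url(server, object_name=None):
    """Build the URL for GET /v1/jmx/mbean[/{objectName}]."""
    base = server.rstrip("/") + "/v1/jmx/mbean"
    if object_name is None:
        return base
    # Keep the ':' '=' ',' separators of a JMX object name readable in the path.
    return base + "/" + urllib.parse.quote(object_name, safe=":=,")


def fetch_mbean(server, object_name=None):
    """Fetch one MBean (or the full list) and parse it as JSON (assumed format)."""
    with urllib.request.urlopen(jmx_mbean_url(server, object_name)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_mbean("http://coordinator:8080", "com.facebook.presto.execution:name=QueryManager")` would request the query manager's attributes, where the host and port are placeholders.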
  • 55. Configuration > Execution planning (for coordinator) > query.initial-hash-partitions • max number of hash buckets (=tasks) of a GROUP BY (default: 8) > node-scheduler.min-candidates • max number of workers to run a stage in parallel (default: 10) > node-scheduler.include-coordinator • whether to run tasks only on workers or also on the coordinator > query.schedule-split-batch-size • number of splits of a stage to start at once
  • 56. Configuration > Task execution (for workers) > task.cpu-timer-enabled • enable detailed statistics (causes some overhead) (default: true) > task.max-memory • memory limit of a task, especially for hash tables used by GROUP BY and JOIN operations (default: 256MB) • enlarge if you get a “Task exceeded max memory size” error > task.shard.max-threads • max number of threads of a worker to run active splits (default: number of CPU cores * 4)
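As a sketch of how these settings might be written out, the helper below renders a dictionary as Java-style `key=value` properties lines. The property names and the defaults 8, 10, and true come from the slides; the `false` for `node-scheduler.include-coordinator` and the raised `task.max-memory` of 1GB are illustrative values only, and the assumption that they belong in a per-node properties file such as `etc/config.properties` should be checked against the documentation for the version in use.

```python
def render_properties(settings):
    """Render a dict as .properties text, one key=value pair per line."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


# Execution planning (coordinator side), using the defaults from the slide.
coordinator_settings = {
    "query.initial-hash-partitions": 8,
    "node-scheduler.min-candidates": 10,
    "node-scheduler.include-coordinator": "false",  # illustrative value
}

# Task execution (worker side); task.max-memory raised from the 256MB
# default, as the slide suggests when a task exceeds its memory limit.
worker_settings = {
    "task.cpu-timer-enabled": "true",
    "task.max-memory": "1GB",  # illustrative value
}

coordinator_text = render_properties(coordinator_settings)
worker_text = render_properties(worker_settings)
```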
  • 57. 5. Roadmap A report of Presto Meetup 2014 http://www.slideshare.net/dain1/presto-meetup-20140514-34731104 "Presto, Past, Present, and Future" by Dain Sundstrom at Facebook
  • 58. Presto’s future > Huge JOIN and GROUP BY > Spill to disk > Task recovery > CREATE VIEW (※implemented) > Native store (※implemented) > Fast data store in Presto workers > to cache hot data > Authentication and permissions
  • 59. Presto’s future > DDL/DML statements > CREATE TABLE with partitioning > DELETE and INSERT > Plugin repository > CLI plugin manager > JOIN and aggregation pushdown > Custom optimizers
  • 60. Links > Web site & document > http://prestodb.io > Mailing list > https://groups.google.com/group/presto-users > Github > https://github.com/facebook/presto > Guidelines for contribution > https://github.com/facebook/presto/blob/master/CONTRIBUTING.md
  • 61. Check: www.treasuredata.com Cloud service for the entire data pipeline, including Presto. We’re hiring!