Presto in Treasure Data
(Updated)
Mitsunori Komatsu
• Mitsunori Komatsu, 

Software engineer @ Treasure Data.
• Presto, Hive, PlazmaDB, td-android-sdk, 

td-ios-sdk, Mobile SDK backend,

embedded-sdk
• github:komamitsu,

msgpack-java committer, Presto contributor,
etc…
About me
Today’s talk
• What's Presto?
• Pros & Cons
• Architecture
• Recent updates
• Who uses Presto?
• How do we use Presto?
What’s Presto?
Fast
• Distributed SQL query engine (MPP)
• Low latency and good performance
• No disk IO
• Pipelined execution (not Map Reduce)
• Compile a query plan down to byte code
• Off heap memory
• Suitable for ad-hoc query
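Pipelined execution can be illustrated with a toy Python sketch (illustrative only, not Presto internals): each operator pulls rows from the one below it, so rows stream through the whole plan without intermediate results being materialized to disk between stages, unlike MapReduce.

```python
# Toy sketch of pipelined operators (not Presto code).
# Each operator is a generator: rows flow scan -> filter -> project
# one at a time, with no intermediate materialization between stages.

def table_scan(rows):
    for row in rows:              # source operator
        yield row

def filter_op(rows, predicate):
    for row in rows:              # pass matching rows through immediately
        if predicate(row):
            yield row

def project_op(rows, columns):
    for row in rows:              # keep only the requested columns
        yield {c: row[c] for c in columns}

orders = [
    {"custkey": 1, "orderpriority": "1-URGENT"},
    {"custkey": 2, "orderpriority": "3-MEDIUM"},
    {"custkey": 3, "orderpriority": "1-URGENT"},
]

pipeline = project_op(
    filter_op(table_scan(orders), lambda r: r["orderpriority"] == "1-URGENT"),
    ["custkey"],
)
result = list(pipeline)
print(result)  # [{'custkey': 1}, {'custkey': 3}]
```

Because every operator is lazy, the first output row can be produced before the scan has finished, which is what makes this style suitable for low-latency, ad-hoc queries.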
Pluggable
• Pluggable backends (“connectors”)
• Cassandra / Hive / JMX / Kafka / MySQL /
PostgreSQL / System / TPCH
• We can add a new connector by
extending the SPI
• Treasure Data has developed a
connector to access our storage
What kind of SQL
• Supports ANSI SQL (Not HiveQL)
• Easier to use than HiveQL
• Structural type: Map, Array, JSON, Row
• Window functions
• Approximate queries
• http://blinkdb.org/
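Approximate aggregation trades a little accuracy for a lot of memory. A minimal sketch of the idea behind functions like approx_distinct (this toy uses a k-minimum-values estimator, not Presto's actual HyperLogLog implementation):

```python
import hashlib

def approx_distinct(values, k=64):
    """K-minimum-values estimator: hash values into [0, 1) and keep the k
    smallest hashes; if the k-th smallest is h_k, the number of distinct
    values is roughly (k - 1) / h_k."""
    hashes = sorted(
        {int(hashlib.md5(str(v).encode()).hexdigest(), 16) / 2**128
         for v in values}
    )
    if len(hashes) < k:
        return len(hashes)        # small sets: count is exact
    return int((k - 1) / hashes[k - 1])

est = approx_distinct(range(10_000))
print(est)  # close to 10000 (roughly within 1/sqrt(k) relative error)
```

Only k hash values are kept regardless of input size, which is why such estimators fit in fixed memory.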
Limitations
• Fails with huge JOIN or DISTINCT
• In memory only (broadcast / distributed JOIN)
• No grace / hybrid hash join
• No fault tolerance
• Coordinator is SPOF
• No “cost based” optimization
• No authentication / authorization
• No native ODBC => Prestogres
https://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/
Architectural overview
https://prestodb.io/overview.html
With Hive connector
Query plan
Output[nationkey, _col1] => [nationkey:bigint, count:bigint]

- _col1 := count
Exchange[GATHER] => nationkey:bigint, count:bigint
Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]

- count := "count"("count_15")
Exchange[REPARTITION] => nationkey:bigint, count_15:bigint
Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint]
- count_15 := "count"("expr")
Project => [nationkey:bigint, expr:bigint]
- expr := 1
InnerJoin[("custkey" = "custkey_0")] =>
[custkey:bigint, custkey_0:bigint, nationkey:bigint]
Project => [custkey:bigint]
Filter[("orderpriority" = '1-URGENT')] =>
[custkey:bigint, orderpriority:varchar]
TableScan[tpch:tpch:orders:sf0.01, original constraint=

('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]

- custkey := tpch:custkey:1

- orderpriority := tpch:orderpriority:5
Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint
TableScan[tpch:tpch:customer:sf0.01, original constraint=true] =>
[custkey_0:bigint, nationkey:bigint]

- custkey_0 := tpch:custkey:0

- nationkey := tpch:nationkey:3
select

c.nationkey,

count(1)

from orders o
join customer c

on o.custkey = c.custkey
where
o.orderpriority = '1-URGENT'
group by c.nationkey
Stage 1
Stage 2
Stage 0
Query, stage, task and split
Query
Task 0.0
Split
Task 1.0
Split
Task 1.1 Task 1.2
Split Split Split
Task 2.0
Split
Task 2.1 Task 2.2
Split Split Split Split Split Split Split
Split
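The containment above can be sketched as plain data (a hypothetical model, just to show the hierarchy): a query fans out into stages, each stage into tasks (one per worker), and each task processes some number of splits.

```python
from dataclasses import dataclass, field

# Hypothetical containment model: Query -> Stages -> Tasks -> Splits.

@dataclass
class Task:
    splits: list = field(default_factory=list)

@dataclass
class Stage:
    tasks: list = field(default_factory=list)

@dataclass
class Query:
    stages: list = field(default_factory=list)

    def total_splits(self):
        return sum(len(t.splits) for s in self.stages for t in s.tasks)

query = Query(stages=[
    Stage(tasks=[Task(splits=["s0"])]),                             # stage 0: gather
    Stage(tasks=[Task(splits=["s1"]), Task(splits=["s2", "s3"])]),  # stage 1
    Stage(tasks=[Task(splits=["s4", "s5", "s6"])]),                 # stage 2: scan
])
print(query.total_splits())  # 7
```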
For example…
TableScan
(FROM)
Aggregation
(GROUP BY)
Output
@worker#2 @worker#3 @worker#0
What does each component do?
Presto Cli
Coordinator
- Parse Query
- Analyze Query
- Create Query Plan
- Execute Query
- Contains Stages
- Execute Stages
- Contains Tasks
- Issue Tasks
Discovery Service
Worker
Worker
- Execute Tasks
- Convert Query to Java
Bytecode (Operator)
- Execute Operator
Connector
- MetaData
- Table, Column…
- SplitManager
- Split, …
Connector
- RecordSetProvider
- RecordSet
- RecordCursor
- Read Storage
Connector
Storage
Worker
Connector
External

Metadata?
Multi connectors
Presto MySQL
- test.users
PostgreSQL
- public.accesses
- public.codes
Raptor
- default.

user_codes
create table raptor.default.user_codes
as
select c.text, u.name, count(1) as count
from postgres.public.accesses a
join mysql.test.users u
on cast(a.user as bigint) = u.id
join postgres.public.codes c
on a.code = c.code
where a.time < 1200000000
connector.name=mysql
connection-url=jdbc:mysql://127.0.0.1:3306
connection-user=root
- etc/catalog/mysql.properties
connector.name=postgresql
connection-url=jdbc:postgresql://127.0.0.1:5432/postgres
connection-user=komamitsu
- etc/catalog/postgres.properties
connector.name=raptor
metadata.db.type=h2
metadata.db.filename=var/data/db/MetaStore
- etc/catalog/raptor.properties
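Conceptually, the coordinator plans one query across all the configured catalogs, and each connector fetches only its own rows. A toy sketch of that federation (in-memory lists standing in for the MySQL and PostgreSQL tables, with the cast(a.user as bigint) = u.id condition from the query above):

```python
# Toy federation sketch: two "catalogs" backed by in-memory rows.
mysql_users = [{"id": 1, "name": "alice"}, {"id": 2, "name": "bob"}]
pg_accesses = [
    {"user": "1", "code": 200, "time": 1100000000},
    {"user": "2", "code": 404, "time": 1300000000},  # filtered out by time
]

# The time predicate is evaluated by the postgres "connector"...
scanned = [a for a in pg_accesses if a["time"] < 1200000000]

# ...while the cross-catalog join runs in the engine, not in either database.
users_by_id = {u["id"]: u["name"] for u in mysql_users}
result = [
    {"name": users_by_id[int(a["user"])], "code": a["code"]}
    for a in scanned
    if int(a["user"]) in users_by_id
]
print(result)  # [{'name': 'alice', 'code': 200}]
```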
Multi connectors
:
2015-08-30T22:29:28.882+0900 DEBUG 20150830_132930_00032_btyir.
4.0-0-50 com.facebook.presto.plugin.jdbc.JdbcRecordCursor
Executing: SELECT `id`, `name` FROM `test`.`users`
:
2015-08-30T22:29:28.856+0900 DEBUG 20150830_132930_00032_btyir.
5.0-0-56 com.facebook.presto.plugin.jdbc.JdbcRecordCursor
Executing: SELECT "text", "code" FROM "public"."codes"
:
2015-08-30T22:30:09.294+0900 DEBUG 20150830_132930_00032_btyir.
3.0-2-70 com.facebook.presto.plugin.jdbc.JdbcRecordCursor
Executing: SELECT "user", "code", "time" FROM "public"."accesses" WHERE
(("time" < 1200000000))
:
- log message
Condition pushdown
Join pushdown isn’t supported
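The log above shows the effect: the time predicate is appended to the SQL the JDBC connector sends to the source, while the join itself runs in Presto. A hypothetical sketch of how such pushdown SQL could be assembled (not the actual connector code):

```python
def build_pushdown_sql(table, columns, predicates):
    """Build the SELECT a JDBC-style connector might send, appending
    any predicates the source can evaluate (condition pushdown)."""
    cols = ", ".join(f'"{c}"' for c in columns)
    sql = f"SELECT {cols} FROM {table}"
    if predicates:
        sql += " WHERE " + " AND ".join(f"({p})" for p in predicates)
    return sql

sql = build_pushdown_sql(
    '"public"."accesses"',
    ["user", "code", "time"],
    ['("time" < 1200000000)'],
)
print(sql)
# SELECT "user", "code", "time" FROM "public"."accesses" WHERE (("time" < 1200000000))
```

Pushing the filter to the source shrinks the data shipped to Presto; a join, by contrast, cannot be pushed down, so both sides are scanned and joined in the engine.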
Recent updates (0.108~0.117)
New functions:
normalize(), from_iso8601_timestamp(),
from_iso8601_date(), to_iso8601()
slice(), md5(), array_min(), array_max(), histogram()
element_at(), url_encode(), url_decode()
sha1(), sha256(), sha512()
multimap_agg(), checksum()
Teradata compatibility functions:
index(), char2hexint(), to_char(),
to_date(), to_timestamp()
Recent updates (0.108~0.117)
Cluster Resource Management.
- query.max-memory
- query.max-memory-per-node
- resources.reserved-system-memory
“Big Query” option was removed
Semi-joins are hash-partitioned if distributed_join is turned on.
Add support for partial cast from JSON.
For example, json can be cast to array<json>, map<varchar, json>, etc.
Use JSON_PARSE() and JSON_FORMAT() instead of CAST
Add query_max_run_time session property and query.max-run-time
config.
Queries fail after the specified duration.
optimizer.optimize-hash-generation and distributed-joins-enabled are
both enabled by default now.
Who uses Presto?
• Facebook
http://www.slideshare.net/dain1/presto-meetup-2015
• Dropbox
Who uses Presto?
• Airbnb
Who uses Presto?
• Qubole
• SaaS
• Treasure Data
• SaaS
• Teradata
• commercial support
Who uses Presto?
As a service…
Today’s talk
• What's Presto?
• How do we use Presto?
• What’s Treasure Data?
• System architecture
• How we manage Presto
What’s Treasure Data?
and more…
• Collect, store and analyze data
• With multi-tenancy
• With a schema-less structure
• On the cloud, but mitigates outages
Founded in 2011 in the U.S.
Over 85 employees
Time to Value
Send query result 
Result Push
Acquire
 Analyze
Store
Plazma DB
Flexible, Scalable,
Columnar Storage
Web Log
App Log
Sensor
CRM
ERP
RDBMS
Treasure Agent(Server)
SDK(JS, Android, iOS, Unity)
Streaming Collector
Batch /
Reliability
Ad-hoc /

Low latency
KPIs
KPI Dashboard
BI Tools
Other Products
RDBMS, Google Docs,
AWS S3, FTP Server, etc.
Metric Insights 
Tableau, 
Motion Board etc. 
POS
REST API
ODBC / JDBC
SQL, Pig 
Bulk Uploader
Embulk,

TD Toolbelt
SQL-based query
@AWS or @IDCF
Connectivity
Economy & Flexibility Simple & Supported
Collect! Store! Analyze!
Integrations?
Treasure Data in figures
Some of our customers
System architecture
Components in Treasure Data
worker queue
(MySQL)
api server
td worker
process
plazmadb
(PostgreSQL +
S3/RiakCS)
select user_id,
count(1) from
…
Presto
coordinator
Presto
worker
Presto
worker
Presto
worker
Presto
worker
result bucket
(S3/RiakCS) Retry failed query
if needed
Authentication /
Authorization
Columnar file format.
Schema-less.
td-presto
connector
Schema on read
time code method user_id
2015-06-01 10:07:11 200 GET
2015-06-01 10:10:12 “200” GET
2015-06-01 10:10:20 200 GET
2015-06-01 10:11:30 200 POST
2015-06-01 10:20:45 200 GET
2015-06-01 10:33:50 400 GET 206
2015-06-01 10:40:11 200 GET 852
2015-06-01 10:51:32 200 PUT 1223
2015-06-01 10:58:02 200 GET 5118
2015-06-01 11:02:11 404 GET 12
2015-06-01 11:14:27 200 GET 3447
access_logs table
A user added a new
column “user_id” to the
imported data
The user can select
this column just by
adding it to the schema
(without rebuilding
the table)
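A minimal sketch of schema-on-read (illustrative, not PlazmaDB's implementation): rows are stored as-is, and the schema is applied only at read time, so adding a column never rewrites the stored data.

```python
# Rows are stored schema-less; the schema is applied when reading.
stored_rows = [
    {"time": 1433125631, "code": 200, "method": "GET"},                  # old row
    {"time": 1433127230, "code": 400, "method": "GET", "user_id": 206},  # new row
]

def read(rows, schema):
    # Missing columns simply come back as NULL; no table rebuild needed.
    return [{col: row.get(col) for col in schema} for row in rows]

# Before "user_id" is added to the schema, it is invisible:
print(read(stored_rows, ["time", "code"]))
# After adding it, the new column is queryable immediately:
rows = read(stored_rows, ["time", "code", "user_id"])
print([r["user_id"] for r in rows])  # [None, 206]
```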
Columnar file format
access_logs table
time
code
method
user_id
Columnar file
format
This query accesses
only code column


select code,
count(1) from tbl
group by code
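The column-at-a-time layout can be sketched like this (a toy model, not the actual file format): a group-by on code touches only the code column's values, leaving the other columns unread.

```python
from collections import Counter

# Toy columnar layout: one array per column instead of one dict per row.
columns = {
    "time":   [1, 2, 3, 4, 5, 6],
    "code":   [200, 200, 400, 200, 404, 200],
    "method": ["GET", "GET", "GET", "POST", "GET", "PUT"],
}

# select code, count(1) from tbl group by code
# -> only columns["code"] is read; time and method stay untouched on disk.
counts = Counter(columns["code"])
print(dict(counts))  # {200: 4, 400: 1, 404: 1}
```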
td-presto connector
• MessagePack v0.7
• off heap
• avoiding “TypeProfile”
• Async IO with Jetty-client
• Scheduling & Resource
management
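The connector fetches split data with non-blocking IO (Jetty's async HTTP client on the Java side). The same idea in a Python asyncio sketch, as a conceptual analogy only: many fetches in flight at once instead of one blocking read per split.

```python
import asyncio

async def fetch_split(split_id):
    # Stand-in for a non-blocking HTTP read of one split's data.
    await asyncio.sleep(0)            # yield control instead of blocking
    return f"rows-of-split-{split_id}"

async def fetch_all(split_ids):
    # All fetches are in flight concurrently, like async HTTP requests.
    return await asyncio.gather(*(fetch_split(s) for s in split_ids))

results = asyncio.run(fetch_all([0, 1, 2]))
print(results)  # ['rows-of-split-0', 'rows-of-split-1', 'rows-of-split-2']
```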
How we manage Presto
• Blue-Green Deployment
• Stress test tool
• Monitoring with DataDog
Blue-Green Deployment
worker queue
(MySQL)
api server
td worker
process
plazmadb
(PostgreSQL +
S3/RiakCS)
select user_id,
count(1) from
…
Presto
coordinator
result bucket
(S3)
Presto
coordinator
Presto
worker
Presto
worker
Presto
worker
Presto
worker
Presto
worker
Presto
worker
Presto
worker
Presto
worker
production
rc
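A minimal sketch of the blue-green switch (hypothetical names, not the actual deployment code): queries are routed by cluster label, so flipping the labels promotes the rc cluster with no downtime.

```python
# Hypothetical router: queries go to whichever cluster is labeled "production".
clusters = {"blue": "presto-blue:8080", "green": "presto-green:8080"}
labels = {"production": "blue", "rc": "green"}

def route(query_target="production"):
    return clusters[labels[query_target]]

assert route() == "presto-blue:8080"       # all traffic on the current cluster
# After the rc cluster passes its tests, flip the labels:
labels["production"], labels["rc"] = labels["rc"], labels["production"]
print(route())  # presto-green:8080 -- promoted with no downtime
```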
rc: Test, Test, Test!
production: Release the stable cluster
No downtime
Stress test tool
• Collect queries that have ever caused issues.
• Add a new query just by adding an entry.
• Issue the query, get the result and
compute a digest of it automatically.

• We can send all the queries, including very
heavy ones (around 6000 stages), to Presto
- job_id: 28889999
  result: 227d16d801a9a43148c2b7149ce4657c
- job_id: 28889999
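The digest idea can be sketched as follows (a hypothetical helper, not the actual tool): hash the sorted result rows so a rerun on a release-candidate cluster can be compared against the recorded digest.

```python
import hashlib

def result_digest(rows):
    """Hash rows order-insensitively, so a plan change that reorders
    output does not register as a false regression."""
    canon = "\n".join(sorted(",".join(map(str, r)) for r in rows))
    return hashlib.md5(canon.encode()).hexdigest()

baseline = result_digest([(1, "GET"), (2, "POST")])
rerun    = result_digest([(2, "POST"), (1, "GET")])   # same rows, other order
print(baseline == rerun)  # True
```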
Monitoring with DataDog
Presto coordinator
Presto
worker
Presto
worker
Presto
worker
Presto
worker
Presto
process
td-agent
in_presto_metrics/v1/jmx/mbean
/v1/query
/v1/node
out_metricsense
DataDog
Monitoring with DataDog
Query stalled time
- Most important for us.
- It triggers alert calls to us…
- It is mainly increased by td-presto
connector problems, most of them
race condition issues.
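A toy sketch of such a metric (hypothetical fields, not the real /v1/query payload): the longest time any running query has gone without making progress, which is what the alert watches.

```python
def max_stalled_seconds(queries, now):
    """Longest time a RUNNING query has gone without progress."""
    return max(
        (now - q["last_progress_at"] for q in queries if q["state"] == "RUNNING"),
        default=0,
    )

now = 1_000_000
queries = [
    {"state": "RUNNING",  "last_progress_at": now - 30},
    {"state": "RUNNING",  "last_progress_at": now - 300},  # stalled 5 min
    {"state": "FINISHED", "last_progress_at": now - 900},  # ignored
]
print(max_stalled_seconds(queries, now))  # 300
```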
How many queries processed
More than 20,000 queries / day
How many queries processed
Most queries finish within 1min
Analytics Infrastructure, Simplified in the Cloud
We’re hiring! https://jobs.lever.co/treasure-data

[Pgday.Seoul 2018] 이기종 DB에서 PostgreSQL로의 Migration을 위한 DB2PG
 
Kafka Connect - debezium
Kafka Connect - debeziumKafka Connect - debezium
Kafka Connect - debezium
 
Déploiement ELK en conditions réelles
Déploiement ELK en conditions réellesDéploiement ELK en conditions réelles
Déploiement ELK en conditions réelles
 
Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0Airflow Best Practises & Roadmap to Airflow 2.0
Airflow Best Practises & Roadmap to Airflow 2.0
 
Real-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFiReal-Time Data Flows with Apache NiFi
Real-Time Data Flows with Apache NiFi
 

Viewers also liked

Bq sushi(BigQuery lessons learned)
Bq sushi(BigQuery lessons learned)Bq sushi(BigQuery lessons learned)
Bq sushi(BigQuery lessons learned)
(shibao)芝尾 (kouichiro)幸一郎
 
Tuning Tips
Tuning TipsTuning Tips
Tuning Tips
Jun Shimizu
 
SQLチューニング勉強会資料
SQLチューニング勉強会資料SQLチューニング勉強会資料
SQLチューニング勉強会資料
Shinnosuke Akita
 
障害とオペミスに備える! ~Oracle Databaseのバックアップを考えよう~
障害とオペミスに備える! ~Oracle Databaseのバックアップを考えよう~障害とオペミスに備える! ~Oracle Databaseのバックアップを考えよう~
障害とオペミスに備える! ~Oracle Databaseのバックアップを考えよう~
Shinnosuke Akita
 
[db tech showcase Sapporo 2015] B15:ビッグデータ/クラウドにデータ連携自由自在 (オンプレミス ↔ クラウド ↔ クラ...
[db tech showcase Sapporo 2015] B15:ビッグデータ/クラウドにデータ連携自由自在 (オンプレミス ↔ クラウド ↔ クラ...[db tech showcase Sapporo 2015] B15:ビッグデータ/クラウドにデータ連携自由自在 (オンプレミス ↔ クラウド ↔ クラ...
[db tech showcase Sapporo 2015] B15:ビッグデータ/クラウドにデータ連携自由自在 (オンプレミス ↔ クラウド ↔ クラ...
Insight Technology, Inc.
 
[B31] LOGMinerってレプリケーションソフトで使われているけどどうなってる? by Toshiya Morita
[B31] LOGMinerってレプリケーションソフトで使われているけどどうなってる? by Toshiya Morita[B31] LOGMinerってレプリケーションソフトで使われているけどどうなってる? by Toshiya Morita
[B31] LOGMinerってレプリケーションソフトで使われているけどどうなってる? by Toshiya MoritaInsight Technology, Inc.
 
DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜
DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜
DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜
Michitoshi Yoshida
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016
kbajda
 
SQL Developerって必要ですか? 株式会社コーソル 河野 敏彦
SQL Developerって必要ですか? 株式会社コーソル 河野 敏彦SQL Developerって必要ですか? 株式会社コーソル 河野 敏彦
SQL Developerって必要ですか? 株式会社コーソル 河野 敏彦
CO-Sol for Community
 
監査ログをもっと身近に!〜統合監査のすすめ〜
監査ログをもっと身近に!〜統合監査のすすめ〜監査ログをもっと身近に!〜統合監査のすすめ〜
監査ログをもっと身近に!〜統合監査のすすめ〜
Michitoshi Yoshida
 
35歳でDBAになった私がデータベースを壊して学んだこと
35歳でDBAになった私がデータベースを壊して学んだこと35歳でDBAになった私がデータベースを壊して学んだこと
35歳でDBAになった私がデータベースを壊して学んだこと
Shinnosuke Akita
 
おじさん二人が語る OOW デビューのススメ! Oracle OpenWorld 2016参加報告 [検閲版] 株式会社コーソル 杉本 篤信, 河野 敏彦
おじさん二人が語る OOW デビューのススメ! Oracle OpenWorld 2016参加報告 [検閲版] 株式会社コーソル 杉本 篤信, 河野 敏彦 おじさん二人が語る OOW デビューのススメ! Oracle OpenWorld 2016参加報告 [検閲版] 株式会社コーソル 杉本 篤信, 河野 敏彦
おじさん二人が語る OOW デビューのススメ! Oracle OpenWorld 2016参加報告 [検閲版] 株式会社コーソル 杉本 篤信, 河野 敏彦
CO-Sol for Community
 
バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い - Database Lounge Tokyo #2
バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い - Database Lounge Tokyo #2バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い - Database Lounge Tokyo #2
バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い - Database Lounge Tokyo #2
Ryota Watabe
 
Standard Edition 2でも使えるOracle Database 12c Release 2オススメ新機能
Standard Edition 2でも使えるOracle Database 12c Release 2オススメ新機能Standard Edition 2でも使えるOracle Database 12c Release 2オススメ新機能
Standard Edition 2でも使えるOracle Database 12c Release 2オススメ新機能
Ryota Watabe
 
進化したのはサーバだけじゃない!〜DBA の毎日をもっと豊かにするユーティリティのすすめ〜
進化したのはサーバだけじゃない!〜DBA の毎日をもっと豊かにするユーティリティのすすめ〜進化したのはサーバだけじゃない!〜DBA の毎日をもっと豊かにするユーティリティのすすめ〜
進化したのはサーバだけじゃない!〜DBA の毎日をもっと豊かにするユーティリティのすすめ〜
Michitoshi Yoshida
 
Oracle SQL Developerを使い倒そう! 株式会社コーソル 守田 典男
Oracle SQL Developerを使い倒そう! 株式会社コーソル 守田 典男Oracle SQL Developerを使い倒そう! 株式会社コーソル 守田 典男
Oracle SQL Developerを使い倒そう! 株式会社コーソル 守田 典男
CO-Sol for Community
 
バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い
バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違いバックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い
バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い
Ryota Watabe
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving Cars
LinkedIn
 
UX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and ArchivesUX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and Archives
Ned Potter
 
Designing Teams for Emerging Challenges
Designing Teams for Emerging ChallengesDesigning Teams for Emerging Challenges
Designing Teams for Emerging Challenges
Aaron Irizarry
 

Viewers also liked (20)

Bq sushi(BigQuery lessons learned)
Bq sushi(BigQuery lessons learned)Bq sushi(BigQuery lessons learned)
Bq sushi(BigQuery lessons learned)
 
Tuning Tips
Tuning TipsTuning Tips
Tuning Tips
 
SQLチューニング勉強会資料
SQLチューニング勉強会資料SQLチューニング勉強会資料
SQLチューニング勉強会資料
 
障害とオペミスに備える! ~Oracle Databaseのバックアップを考えよう~
障害とオペミスに備える! ~Oracle Databaseのバックアップを考えよう~障害とオペミスに備える! ~Oracle Databaseのバックアップを考えよう~
障害とオペミスに備える! ~Oracle Databaseのバックアップを考えよう~
 
[db tech showcase Sapporo 2015] B15:ビッグデータ/クラウドにデータ連携自由自在 (オンプレミス ↔ クラウド ↔ クラ...
[db tech showcase Sapporo 2015] B15:ビッグデータ/クラウドにデータ連携自由自在 (オンプレミス ↔ クラウド ↔ クラ...[db tech showcase Sapporo 2015] B15:ビッグデータ/クラウドにデータ連携自由自在 (オンプレミス ↔ クラウド ↔ クラ...
[db tech showcase Sapporo 2015] B15:ビッグデータ/クラウドにデータ連携自由自在 (オンプレミス ↔ クラウド ↔ クラ...
 
[B31] LOGMinerってレプリケーションソフトで使われているけどどうなってる? by Toshiya Morita
[B31] LOGMinerってレプリケーションソフトで使われているけどどうなってる? by Toshiya Morita[B31] LOGMinerってレプリケーションソフトで使われているけどどうなってる? by Toshiya Morita
[B31] LOGMinerってレプリケーションソフトで使われているけどどうなってる? by Toshiya Morita
 
DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜
DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜
DBA だってもっと効率化したい!〜最近の自動化事情とOracle Database〜
 
Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016Presto at Hadoop Summit 2016
Presto at Hadoop Summit 2016
 
SQL Developerって必要ですか? 株式会社コーソル 河野 敏彦
SQL Developerって必要ですか? 株式会社コーソル 河野 敏彦SQL Developerって必要ですか? 株式会社コーソル 河野 敏彦
SQL Developerって必要ですか? 株式会社コーソル 河野 敏彦
 
監査ログをもっと身近に!〜統合監査のすすめ〜
監査ログをもっと身近に!〜統合監査のすすめ〜監査ログをもっと身近に!〜統合監査のすすめ〜
監査ログをもっと身近に!〜統合監査のすすめ〜
 
35歳でDBAになった私がデータベースを壊して学んだこと
35歳でDBAになった私がデータベースを壊して学んだこと35歳でDBAになった私がデータベースを壊して学んだこと
35歳でDBAになった私がデータベースを壊して学んだこと
 
おじさん二人が語る OOW デビューのススメ! Oracle OpenWorld 2016参加報告 [検閲版] 株式会社コーソル 杉本 篤信, 河野 敏彦
おじさん二人が語る OOW デビューのススメ! Oracle OpenWorld 2016参加報告 [検閲版] 株式会社コーソル 杉本 篤信, 河野 敏彦 おじさん二人が語る OOW デビューのススメ! Oracle OpenWorld 2016参加報告 [検閲版] 株式会社コーソル 杉本 篤信, 河野 敏彦
おじさん二人が語る OOW デビューのススメ! Oracle OpenWorld 2016参加報告 [検閲版] 株式会社コーソル 杉本 篤信, 河野 敏彦
 
バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い - Database Lounge Tokyo #2
バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い - Database Lounge Tokyo #2バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い - Database Lounge Tokyo #2
バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い - Database Lounge Tokyo #2
 
Standard Edition 2でも使えるOracle Database 12c Release 2オススメ新機能
Standard Edition 2でも使えるOracle Database 12c Release 2オススメ新機能Standard Edition 2でも使えるOracle Database 12c Release 2オススメ新機能
Standard Edition 2でも使えるOracle Database 12c Release 2オススメ新機能
 
進化したのはサーバだけじゃない!〜DBA の毎日をもっと豊かにするユーティリティのすすめ〜
進化したのはサーバだけじゃない!〜DBA の毎日をもっと豊かにするユーティリティのすすめ〜進化したのはサーバだけじゃない!〜DBA の毎日をもっと豊かにするユーティリティのすすめ〜
進化したのはサーバだけじゃない!〜DBA の毎日をもっと豊かにするユーティリティのすすめ〜
 
Oracle SQL Developerを使い倒そう! 株式会社コーソル 守田 典男
Oracle SQL Developerを使い倒そう! 株式会社コーソル 守田 典男Oracle SQL Developerを使い倒そう! 株式会社コーソル 守田 典男
Oracle SQL Developerを使い倒そう! 株式会社コーソル 守田 典男
 
バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い
バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違いバックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い
バックアップと障害復旧から考えるOracle Database, MySQL, PostgreSQLの違い
 
Study: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving CarsStudy: The Future of VR, AR and Self-Driving Cars
Study: The Future of VR, AR and Self-Driving Cars
 
UX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and ArchivesUX, ethnography and possibilities: for Libraries, Museums and Archives
UX, ethnography and possibilities: for Libraries, Museums and Archives
 
Designing Teams for Emerging Challenges
Designing Teams for Emerging ChallengesDesigning Teams for Emerging Challenges
Designing Teams for Emerging Challenges
 

Similar to Presto in Treasure Data (presented at db tech showcase Sapporo 2015)

Presto in Treasure Data
Presto in Treasure DataPresto in Treasure Data
Presto in Treasure Data
Mitsunori Komatsu
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
Taro L. Saito
 
Timeseries - data visualization in Grafana
Timeseries - data visualization in GrafanaTimeseries - data visualization in Grafana
Timeseries - data visualization in Grafana
OCoderFest
 
Architecture for scalable Angular applications
Architecture for scalable Angular applicationsArchitecture for scalable Angular applications
Architecture for scalable Angular applications
Paweł Żurowski
 
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & AggregationWebinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
MongoDB
 
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Spark Summit
 
Write Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfWrite Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdf
Eric Xiao
 
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit
 
Altitude San Francisco 2018: Logging at the Edge
Altitude San Francisco 2018: Logging at the Edge Altitude San Francisco 2018: Logging at the Edge
Altitude San Francisco 2018: Logging at the Edge
Fastly
 
MongoDB Analytics
MongoDB AnalyticsMongoDB Analytics
MongoDB Analyticsdatablend
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Altinity Ltd
 
A miało być tak... bez wycieków
A miało być tak... bez wyciekówA miało być tak... bez wycieków
A miało być tak... bez wycieków
Konrad Kokosa
 
Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020
Mayank Shrivastava
 
Apache Kylin - Balance between space and time - Hadoop Summit 2015
Apache Kylin -  Balance between space and time - Hadoop Summit 2015Apache Kylin -  Balance between space and time - Hadoop Summit 2015
Apache Kylin - Balance between space and time - Hadoop Summit 2015
Debashis Saha
 
Why you should be using structured logs
Why you should be using structured logsWhy you should be using structured logs
Why you should be using structured logs
Stefan Krawczyk
 
Social media analytics using Azure Technologies
Social media analytics using Azure TechnologiesSocial media analytics using Azure Technologies
Social media analytics using Azure Technologies
Koray Kocabas
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
Jonathan Katz
 
Solving the n + 1 query problem
Solving the n + 1 query problemSolving the n + 1 query problem
Solving the n + 1 query problem
Sebastien Pelletier
 
1403 app dev series - session 5 - analytics
1403   app dev series - session 5 - analytics1403   app dev series - session 5 - analytics
1403 app dev series - session 5 - analyticsMongoDB
 
Distributed Real-Time Stream Processing: Why and How 2.0
Distributed Real-Time Stream Processing:  Why and How 2.0Distributed Real-Time Stream Processing:  Why and How 2.0
Distributed Real-Time Stream Processing: Why and How 2.0
Petr Zapletal
 

Similar to Presto in Treasure Data (presented at db tech showcase Sapporo 2015) (20)

Presto in Treasure Data
Presto in Treasure DataPresto in Treasure Data
Presto in Treasure Data
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
 
Timeseries - data visualization in Grafana
Timeseries - data visualization in GrafanaTimeseries - data visualization in Grafana
Timeseries - data visualization in Grafana
 
Architecture for scalable Angular applications
Architecture for scalable Angular applicationsArchitecture for scalable Angular applications
Architecture for scalable Angular applications
 
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & AggregationWebinar: Applikationsentwicklung mit MongoDB: Teil 5: Reporting & Aggregation
Webinar: Applikationsentwicklung mit MongoDB : Teil 5: Reporting & Aggregation
 
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
Horizontally Scalable Relational Databases with Spark: Spark Summit East talk...
 
Write Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdfWrite Faster SQL with Trino.pdf
Write Faster SQL with Trino.pdf
 
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
Accumulo Summit 2015: Building Aggregation Systems on Accumulo [Leveraging Ac...
 
Altitude San Francisco 2018: Logging at the Edge
Altitude San Francisco 2018: Logging at the Edge Altitude San Francisco 2018: Logging at the Edge
Altitude San Francisco 2018: Logging at the Edge
 
MongoDB Analytics
MongoDB AnalyticsMongoDB Analytics
MongoDB Analytics
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
A miało być tak... bez wycieków
A miało być tak... bez wyciekówA miało być tak... bez wycieków
A miało być tak... bez wycieków
 
Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020Apache Pinot Meetup Sept02, 2020
Apache Pinot Meetup Sept02, 2020
 
Apache Kylin - Balance between space and time - Hadoop Summit 2015
Apache Kylin -  Balance between space and time - Hadoop Summit 2015Apache Kylin -  Balance between space and time - Hadoop Summit 2015
Apache Kylin - Balance between space and time - Hadoop Summit 2015
 
Why you should be using structured logs
Why you should be using structured logsWhy you should be using structured logs
Why you should be using structured logs
 
Social media analytics using Azure Technologies
Social media analytics using Azure TechnologiesSocial media analytics using Azure Technologies
Social media analytics using Azure Technologies
 
Developing and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDWDeveloping and Deploying Apps with the Postgres FDW
Developing and Deploying Apps with the Postgres FDW
 
Solving the n + 1 query problem
Solving the n + 1 query problemSolving the n + 1 query problem
Solving the n + 1 query problem
 
1403 app dev series - session 5 - analytics
1403   app dev series - session 5 - analytics1403   app dev series - session 5 - analytics
1403 app dev series - session 5 - analytics
 
Distributed Real-Time Stream Processing: Why and How 2.0
Distributed Real-Time Stream Processing:  Why and How 2.0Distributed Real-Time Stream Processing:  Why and How 2.0
Distributed Real-Time Stream Processing: Why and How 2.0
 

Recently uploaded

Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
wottaspaceseo
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Mind IT Systems
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
Georgi Kodinov
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
e20449
 
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi ArabiaTop 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
Yara Milbes
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
Roshan Dwivedi
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
Google
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
Globus
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
Matt Welsh
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
Donna Lenk
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
abdulrafaychaudhry
 

Recently uploaded (20)

Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
Custom Healthcare Software for Managing Chronic Conditions and Remote Patient...
 
2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx2024 RoOUG Security model for the cloud.pptx
2024 RoOUG Security model for the cloud.pptx
 
Graphic Design Crash Course for beginners
Graphic Design Crash Course for beginnersGraphic Design Crash Course for beginners
Graphic Design Crash Course for beginners
 
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi ArabiaTop 7 Unique WhatsApp API Benefits | Saudi Arabia
Top 7 Unique WhatsApp API Benefits | Saudi Arabia
 
Launch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in MinutesLaunch Your Streaming Platforms in Minutes
Launch Your Streaming Platforms in Minutes
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing SuiteAI Pilot Review: The World’s First Virtual Assistant Marketing Suite
AI Pilot Review: The World’s First Virtual Assistant Marketing Suite
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"Navigating the Metaverse: A Journey into Virtual Evolution"
Navigating the Metaverse: A Journey into Virtual Evolution"
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Lecture 1 Introduction to games development
Lecture 1 Introduction to games developmentLecture 1 Introduction to games development
Lecture 1 Introduction to games development
 

Presto in Treasure Data (presented at db tech showcase Sapporo 2015)

  • 1. Presto in Treasure Data (Updated) Mitsunori Komatsu
  • 2. • Mitsunori Komatsu, 
 Software engineer @ Treasure Data. • Presto, Hive, PlazmaDB, td-android-sdk, 
 td-ios-sdk, Mobile SDK backend,
 embedded-sdk • github:komamitsu,
 msgpack-java committer, Presto contributor, etc… About me
  • 3. Today’s talk • What's Presto? • Pros & Cons • Architecture • Recent updates • Who uses Presto? • How do we use Presto?
  • 5. Fast • Distributed SQL query engine (MPP) • Low latency and good performance • No disk IO • Pipelined execution (not MapReduce) • Compiles a query plan down to bytecode • Off-heap memory • Suitable for ad-hoc queries
  • 6. Pluggable • Pluggable backends (“connectors”) • Cassandra / Hive / JMX / Kafka / MySQL / PostgreSQL / System / TPCH • We can add a new connector by extending the SPI • Treasure Data has developed a connector to access our storage
  • 7. What kind of SQL • Supports ANSI SQL (not HiveQL) • Easier to use than HiveQL • Structural types: Map, Array, JSON, Row • Window functions • Approximate queries • http://blinkdb.org/
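To illustrate the window-function and approximate-query support mentioned above, here is a small sketch against the TPC-H `customer` table used later in this deck (illustrative queries, not taken from the slides):

```
-- Window function: rank customers within each nation by account balance
select nationkey, name,
       rank() over (partition by nationkey order by acctbal desc) as rnk
from customer;

-- Approximate query: estimated distinct customer count per nation,
-- cheaper than an exact count(distinct custkey)
select nationkey, approx_distinct(custkey) as approx_customers
from customer
group by nationkey;
```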
  • 8. Limitations • Fails with huge JOIN or DISTINCT • In memory only (broadcast / distributed JOIN) • No grace / hybrid hash join • No fault tolerance • Coordinator is SPOF • No “cost based” optimization • No authentication / authorization • No native ODBC => Prestogres
  • 9. Limitations • Fails with huge JOIN or DISTINCT • In memory only (broadcast / distributed JOIN) • No grace / hybrid hash join • No fault tolerance • Coordinator is SPOF • No “cost based” optimization • No authentication / authorization • No native ODBC => Prestogres https://highlyscalable.wordpress.com/2013/08/20/in-stream-big-data-processing/
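The "fails with huge JOIN" limitation follows from how a broadcast hash join works: the entire build side is materialized in memory on every worker, and without a grace/hybrid hash join there is no spilling to disk. A toy sketch of the idea (not Presto's implementation):

```python
def broadcast_hash_join(probe_rows, build_rows, probe_key, build_key):
    # The whole build side becomes an in-memory hash table on each
    # worker; if it is too large, the join simply runs out of memory.
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
    # Stream the probe side against the hash table.
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            yield {**row, **match}

orders = [{"custkey": 1, "o": "A"}, {"custkey": 2, "o": "B"}]
customers = [{"custkey": 1, "nationkey": 10}]
print(list(broadcast_hash_join(orders, customers, "custkey", "custkey")))
# [{'custkey': 1, 'o': 'A', 'nationkey': 10}]
```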
  • 11. Query plan

Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
 - _col1 := count
Exchange[GATHER] => nationkey:bigint, count:bigint
Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
 - count := "count"("count_15")
Exchange[REPARTITION] => nationkey:bigint, count_15:bigint
Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint]
 - count_15 := "count"("expr")
Project => [nationkey:bigint, expr:bigint]
 - expr := 1
InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint]
Project => [custkey:bigint]
Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar]
TableScan[tpch:tpch:orders:sf0.01, original constraint=('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
 - custkey := tpch:custkey:1
 - orderpriority := tpch:orderpriority:5
Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint
TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
 - custkey_0 := tpch:custkey:0
 - nationkey := tpch:nationkey:3

select
 c.nationkey,
 count(1)
from orders o join customer c
 on o.custkey = c.custkey
where o.orderpriority = '1-URGENT'
group by c.nationkey
  • 12. Query, stage, task and split — [diagram] For example, a query is divided into stages (TableScan (FROM) → Aggregation (GROUP BY) → Output); each stage runs as parallel tasks on workers (@worker#0, @worker#2, @worker#3: Task 0.0, Task 1.0–1.2, Task 2.0–2.2), and each task processes one or more splits.
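The query → stage → task → split hierarchy on this slide can be modeled with a toy data structure (illustrative only, not Presto's actual classes):

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Task:
    # A task runs on one worker and processes one or more splits
    # (chunks of input data, e.g. file ranges from a connector).
    worker: str
    splits: List[str] = field(default_factory=list)

@dataclass
class Stage:
    # A stage is one step of the plan, executed as parallel tasks.
    name: str
    tasks: List[Task] = field(default_factory=list)

@dataclass
class Query:
    stages: List[Stage] = field(default_factory=list)

    def total_splits(self) -> int:
        return sum(len(t.splits) for s in self.stages for t in s.tasks)

# Mirror the slide: the scan stage has many splits, the aggregation
# stage repartitions them, and stage 0 gathers the final output.
query = Query(stages=[
    Stage("Output",    [Task("worker#0", ["s0"])]),
    Stage("GROUP BY",  [Task("worker#2", ["s1", "s2"]), Task("worker#3", ["s3"])]),
    Stage("TableScan", [Task("worker#2", ["s4", "s5"]), Task("worker#3", ["s6", "s7"])]),
])
print(query.total_splits())  # 8
```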
  • 13. Query plan

Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
 - _col1 := count
Exchange[GATHER] => nationkey:bigint, count:bigint
Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
 - count := "count"("count_15")
Exchange[REPARTITION] => nationkey:bigint, count_15:bigint
Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint]
 - count_15 := "count"("expr")
Project => [nationkey:bigint, expr:bigint]
 - expr := 1
InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint]
Project => [custkey:bigint]
Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar]
TableScan[tpch:tpch:orders:sf0.01, original constraint=('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
 - custkey := tpch:custkey:1
 - orderpriority := tpch:orderpriority:5
Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint
TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
 - custkey_0 := tpch:custkey:0
 - nationkey := tpch:nationkey:3

select
 c.nationkey,
 count(1)
from orders o join customer c
 on o.custkey = c.custkey
where o.orderpriority = '1-URGENT'
group by c.nationkey
  • 14. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
 - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
 - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=
 ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
 - custkey := tpch:custkey:1
 - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
 - custkey_0 := tpch:custkey:0
 - nationkey := tpch:nationkey:3 select
 c.nationkey,
 count(1)
 from orders o join customer c
 on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey Stage 3
  • 15. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
 - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
 - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=
 ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
 - custkey := tpch:custkey:1
 - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
 - custkey_0 := tpch:custkey:0
 - nationkey := tpch:nationkey:3 select
 c.nationkey,
 count(1)
 from orders o join customer c
 on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey Stage 3 Stage 2
  • 16. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
 - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
 - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=
 ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
 - custkey := tpch:custkey:1
 - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
 - custkey_0 := tpch:custkey:0
 - nationkey := tpch:nationkey:3 select
 c.nationkey,
 count(1)
 from orders o join customer c
 on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey Stage 3 Stage 2 Stage 1
  • 17. Query plan Output[nationkey, _col1] => [nationkey:bigint, count:bigint]
 - _col1 := count Exchange[GATHER] => nationkey:bigint, count:bigint Aggregate(FINAL)[nationkey] => [nationkey:bigint, count:bigint]
 - count := "count"("count_15") Exchange[REPARTITION] => nationkey:bigint, count_15:bigint Aggregate(PARTIAL)[nationkey] => [nationkey:bigint, count_15:bigint] - count_15 := "count"("expr") Project => [nationkey:bigint, expr:bigint] - expr := 1 InnerJoin[("custkey" = "custkey_0")] => [custkey:bigint, custkey_0:bigint, nationkey:bigint] Project => [custkey:bigint] Filter[("orderpriority" = '1-URGENT')] => [custkey:bigint, orderpriority:varchar] TableScan[tpch:tpch:orders:sf0.01, original constraint=
 ('1-URGENT' = "orderpriority")] => [custkey:bigint, orderpriority:varchar]
 - custkey := tpch:custkey:1
 - orderpriority := tpch:orderpriority:5 Exchange[REPLICATE] => custkey_0:bigint, nationkey:bigint TableScan[tpch:tpch:customer:sf0.01, original constraint=true] => [custkey_0:bigint, nationkey:bigint]
 - custkey_0 := tpch:custkey:0
 - nationkey := tpch:nationkey:3 select
 c.nationkey,
 count(1)
 from orders o join customer c
 on o.custkey = c.custkey where o.orderpriority = '1-URGENT' group by c.nationkey Stage 3 Stage 2 Stage 1 Stage 0
 • 18. What does each component do?
Presto CLI → Coordinator → Workers, with a Discovery Service in between and connectors on both coordinator and workers.
- Coordinator: parses and analyzes the query, creates the query plan, and executes the query (a query contains stages, a stage contains tasks), issuing tasks to workers.
- Workers: execute tasks, converting the query to Java bytecode (operators) and executing the operators.
- Connector (coordinator side): MetaData (table, column, …) and SplitManager (splits, …); the metadata may live in an external system.
- Connector (worker side): RecordSetProvider → RecordSet → RecordCursor, which reads the storage.
 • 23. Multi connectors
One Presto cluster can query several backends at once, e.g. MySQL (test.users), PostgreSQL (public.accesses, public.codes), and Raptor (default.user_codes):

create table raptor.default.user_codes as
select c.text, u.name, count(1) as count
from postgres.public.accesses a
join mysql.test.users u on cast(a.user as bigint) = u.id
join postgres.public.codes c on a.code = c.code
where a.time < 1200000000
 • 25. Multi connectors: condition pushdown
2015-08-30T22:29:28.882+0900 DEBUG 20150830_132930_00032_btyir.4.0-0-50 com.facebook.presto.plugin.jdbc.JdbcRecordCursor Executing: SELECT `id`, `name` FROM `test`.`users`
2015-08-30T22:29:28.856+0900 DEBUG 20150830_132930_00032_btyir.5.0-0-56 com.facebook.presto.plugin.jdbc.JdbcRecordCursor Executing: SELECT "text", "code" FROM "public"."codes"
2015-08-30T22:30:09.294+0900 DEBUG 20150830_132930_00032_btyir.3.0-2-70 com.facebook.presto.plugin.jdbc.JdbcRecordCursor Executing: SELECT "user", "code", "time" FROM "public"."accesses" WHERE (("time" < 1200000000))
As the log messages show, the filter condition is pushed down to the backend; join pushdown isn't supported.
 • 26. Recent updates (0.108~0.117)
New functions: normalize(), from_iso8601_timestamp(), from_iso8601_date(), to_iso8601(), slice(), md5(), array_min(), array_max(), histogram(), element_at(), url_encode(), url_decode(), sha1(), sha256(), sha512(), multimap_agg(), checksum()
Teradata compatibility functions: index(), char2hexint(), to_char(), to_date(), to_timestamp()
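A few of the new functions in use (a sketch; the access_logs table and its code column are hypothetical):

```sql
-- element_at returns NULL for an out-of-range index instead of failing
select element_at(array[10, 20], 3);

-- url_encode escapes a value for safe use in a URL
select url_encode('a b&c');

-- histogram builds a map from each code value to its number of occurrences
select histogram(code) from access_logs;
```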
 • 27. Recent updates (0.108~0.117)
Cluster resource management:
- query.max-memory
- query.max-memory-per-node
- resources.reserved-system-memory
The "big query" option was removed.
Semi-joins are hash-partitioned if distributed_join is turned on.
Added support for partial casts from JSON; for example, JSON can be cast to array<json>, map<varchar, json>, etc. Use JSON_PARSE() and JSON_FORMAT() instead of CAST.
Added the query_max_run_time session property and the query.max-run-time config; queries fail after the specified duration.
optimizer.optimize-hash-generation and distributed-joins-enabled are both enabled by default now.
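The session-level settings mentioned above can be toggled per connection from the CLI (a sketch; the duration value is illustrative):

```sql
-- Fail this session's queries after 10 minutes
SET SESSION query_max_run_time = '10m';

-- Use hash-partitioned (distributed) joins instead of broadcast joins
SET SESSION distributed_join = true;
```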
  • 29. Who uses Presto? • Facebook http://www.slideshare.net/dain1/presto-meetup-2015
 • 32. Who uses Presto? As a service…
• Qubole (SaaS)
• Treasure Data (SaaS)
• Teradata (commercial support)
  • 33. Today’s talk • What's Presto? • How do we use Presto? • What’s Treasure Data? • System architecture • How we manage Presto
 • 35. What’s Treasure Data? and more… Founded in 2011 in the U.S. Over 85 employees.
• Collect, store, and analyze data
• With multi-tenancy
• With a schema-less structure
• On the cloud, but mitigating outages
 • 40. Time to Value: Collect! Store! Analyze!
Acquire → Store → Analyze → Send query result (result push).
- Collect (connectivity): streaming (Treasure Agent on servers; SDKs for JS, Android, iOS, Unity; streaming collector) and batch (bulk uploaders: Embulk, TD Toolbelt) from web logs, app logs, sensors, CRM, ERP, RDBMS, POS, and more.
- Store (economy & flexibility): Plazma DB, flexible, scalable, columnar storage, @AWS or @IDCF.
- Analyze (simple & supported): SQL-based queries (SQL, Pig) via REST API and ODBC/JDBC; batch for reliability, ad-hoc for low latency.
- Push results to RDBMS, Google Docs, AWS S3, FTP servers, KPI dashboards (Metric Insights), BI tools (Tableau, Motion Board, etc.), and other products.
  • 47. Treasure Data in figures
  • 48. Some of our customers
 • 50. Components in Treasure Data
api server → worker queue (MySQL) → td worker process → Presto coordinator → Presto workers → result bucket (S3/RiakCS), e.g. for "select user_id, count(1) from …".
- api server: authentication / authorization
- td worker process: retries failed queries if needed
- plazmadb (PostgreSQL + S3/RiakCS): columnar file format, schema-less; read by the workers through the td-presto connector
 • 58. Schema on read
access_logs table:
time                 code   method  user_id
2015-06-01 10:07:11  200    GET
2015-06-01 10:10:12  "200"  GET
2015-06-01 10:10:20  200    GET
2015-06-01 10:11:30  200    POST
2015-06-01 10:20:45  200    GET
2015-06-01 10:33:50  400    GET     206
2015-06-01 10:40:11  200    GET     852
2015-06-01 10:51:32  200    PUT     1223
2015-06-01 10:58:02  200    GET     5118
2015-06-01 11:02:11  404    GET     12
2015-06-01 11:14:27  200    GET     3447
A user added a new column, "user_id", to the imported data. The user can select this column just by adding it to the schema, without rebuilding the table.
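After the schema change, the new column is queryable right away; rows imported before the change simply yield NULL for it (a sketch against the access_logs table above):

```sql
-- Only rows imported after user_id was added have a value
select time, method, user_id
from access_logs
where user_id is not null;
```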
 • 59. Columnar file format
The same access_logs table is stored column by column (time, code, method, user_id), so a query such as

select code, count(1) from tbl group by code

reads only the code column.
 • 60. td-presto connector
• MessagePack v0.7
• Off-heap memory
• Avoiding "TypeProfile"
• Async IO with Jetty client
• Scheduling & resource management
  • 61. How we manage Presto • Blue-Green Deployment • Stress test tool • Monitoring with DataDog
 • 62.–64. Blue-Green Deployment
Two Presto clusters, each with its own coordinator and workers, sit behind the shared api server, worker queue (MySQL), td worker process, plazmadb, and result bucket (S3): the production cluster and a release-candidate (rc) cluster. Test! Test! Test! against the rc cluster, then release the stable cluster as the new production, with no downtime.
 • 65. Stress test tool
• Collects queries that have ever caused issues.
• A new query is added just by adding an entry like the one below.
• The tool issues the query, fetches the result, and records a calculated digest automatically.
• We can send all the queries, including very heavy ones (around 6000 stages), to Presto.
- job_id: 28889999
- result: 227d16d801a9a43148c2b7149ce4657c
- job_id: 28889999
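A result digest can also be computed inside Presto itself with the checksum() aggregate mentioned earlier; it is order-insensitive, so two runs of the same query can be compared without sorting the output (a sketch, not the tool's actual digest format; access_logs and its columns are hypothetical):

```sql
-- One order-insensitive digest per result column
select checksum(user_id), checksum(code)
from access_logs;
```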
 • 69. Monitoring with DataDog
td-agent runs alongside each Presto process (coordinator and workers), polling the in_presto_metrics endpoints (/v1/jmx/mbean, /v1/query, /v1/node) and forwarding them through out_metricsense to DataDog.
 • 70. Monitoring with DataDog
Query stalled time:
- The most important metric for us; it triggers alert calls to us…
- It is mainly increased by td-presto connector problems, most of them race-condition issues.
  • 71. How many queries processed More than 20000 queries / day
  • 72. How many queries processed Most queries finish within 1min
  • 73. Analytics Infrastructure, Simplified in the Cloud We’re hiring! https://jobs.lever.co/treasure-data