Presto At Arm Treasure Data - 2019 Updates

Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved.
Kai Sasaki, Taro L. Saito
Arm Treasure Data
June 11th, 2019
Presto Conference Tokyo 2019

About Me: Kai Sasaki
● Kai Sasaki (@Lewuathe)
● https://www.lewuathe.com/
● Software Engineer at Arm
● Member of the MPP team, which maintains the Presto clusters and the surrounding ecosystem
● OSS Contributor
○ Presto, Hadoop, Spark, TensorFlow

About Us

● 400+ customers
● Founded in 2011
● Raised $54M
● Acquired by Arm / Softbank in 2018

Treasure Data = Unified Data Platform

The Architecture of Treasure Data
● Data Collection: 2 million records / sec.
● Cloud Storage: 130 trillion records
● Distributed Data Processing: 1 billion rows processed / sec.
[Figure: architecture diagram. Labels: Logs, Device Data, Batch Data, PlazmaDB, Table Schema, Jobs, Job Management, SQL Editor, Scheduler, Workflows, Machine Learning. Legend: Treasure Data OSS, Third Party OSS]

How We Use Presto

Presto Usage (2019)
● 3x more usage since 2017
● 3,500+ users (400+ customer accounts)
● 600,000+ queries / day
● 100 trillion rows processed / day
(= 1.2 billion rows processed / sec.)
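The per-second figure follows from dividing the daily total by the number of seconds in a day. A quick check, using only the numbers on this slide:

```python
SECONDS_PER_DAY = 24 * 60 * 60          # 86,400 seconds

rows_per_day = 100 * 10**12             # 100 trillion rows processed / day
queries_per_day = 600_000               # 600,000 queries / day

rows_per_sec = rows_per_day / SECONDS_PER_DAY
queries_per_sec = queries_per_day / SECONDS_PER_DAY

print(f"{rows_per_sec / 1e9:.2f} billion rows / sec.")
print(f"{queries_per_sec:.1f} queries / sec.")
```

The first value comes out just under 1.2 billion, which the slide rounds up.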

PlazmaDB: MessagePack DBMS
● Fluentd -> MessagePack -> Arm Treasure Data
● Generating table schema from the input MessagePack data
■ No need to worry about schema changes: adding columns and promoting column types are managed by the service
● Applies schema-on-read to provide table data to Presto, Hive, Spark, etc.
● Storage Format:
● Our internal MessagePack Columnar Format (MPC1) for schema-on-read
[Figure: schema-on-read flow. Labels: Schema-free Data, Update Schema, Table Schema, Generate Reader Set, Int Column Reader, String Column Reader, Table Reader, Data Collection, Distributed Data Processing]
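The schema-on-read idea can be sketched as follows. This is a minimal illustration, not the MPC1 reader: plain Python dicts stand in for decoded MessagePack records, and the names `infer_schema`, `read_table`, `read_int` and `read_string`, as well as the rule that conflicting types are promoted to string, are assumptions made for the example.

```python
from typing import Any, Callable, Dict, List

def read_int(value: Any) -> Any:
    """Hypothetical int column reader: coerce to int, or None if impossible."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None

def read_string(value: Any) -> Any:
    """Hypothetical string column reader: coerce to str, keep None as None."""
    return None if value is None else str(value)

READERS: Dict[str, Callable[[Any], Any]] = {"int": read_int, "string": read_string}

def infer_schema(records: List[Dict[str, Any]]) -> Dict[str, str]:
    """Generate a table schema from schema-free records.

    A column seen with conflicting types is promoted to string (assumed rule).
    """
    schema: Dict[str, str] = {}
    for record in records:
        for name, value in record.items():
            is_int = isinstance(value, int) and not isinstance(value, bool)
            observed = "int" if is_int else "string"
            if schema.get(name, observed) != observed:
                schema[name] = "string"
            else:
                schema[name] = observed
    return schema

def read_table(records: List[Dict[str, Any]], schema: Dict[str, str]) -> List[Dict[str, Any]]:
    """Apply the schema at read time: one column reader per schema column."""
    reader_set = {name: READERS[col_type] for name, col_type in schema.items()}
    return [
        {name: reader(record.get(name)) for name, reader in reader_set.items()}
        for record in records
    ]

records = [
    {"time": 1, "code": 200},
    {"time": 2, "code": "404", "path": "/a"},
]
schema = infer_schema(records)
rows = read_table(records, schema)
```

Here the stored records are never rewritten: the added `path` column and the promoted `code` column only affect how rows are read.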

Treasure Data Storage Architecture
● Real-Time vs. Archive Storage
● Real-time storage provides access to recent data
● Optimized partitions are stored in Archive Storage by MapReduce jobs (LogMergeJob)

TD_INTERVAL UDF
● Supports human-friendly time window specifications
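The slide does not show the UDF's syntax, so the following is only a sketch of what resolving a human-friendly window into a concrete time range can look like. The interval strings (`-1d`, `-3h`), the alignment to the start of the current day or hour, and the function names are all assumptions for illustration; the supported-functions page linked later in this deck is the authority on TD_INTERVAL itself.

```python
import re
from datetime import datetime, timedelta, timezone

UNITS = {"h": timedelta(hours=1), "d": timedelta(days=1)}

def resolve_interval(interval: str, now: datetime):
    """Turn a string such as '-1d' into a [start, end) range of Unix timestamps.

    Assumed semantics: '-Nd' is the N whole days before the start of the
    current day, '-Nh' the N whole hours before the start of the current hour.
    """
    match = re.fullmatch(r"-(\d+)([hd])", interval)
    if match is None:
        raise ValueError(f"unsupported interval: {interval!r}")
    count, unit = int(match.group(1)), match.group(2)
    if unit == "d":
        end = now.replace(hour=0, minute=0, second=0, microsecond=0)
    else:
        end = now.replace(minute=0, second=0, microsecond=0)
    start = end - count * UNITS[unit]
    return int(start.timestamp()), int(end.timestamp())

def in_interval(ts: int, interval: str, now: datetime) -> bool:
    """Filter predicate: is the record timestamp inside the window?"""
    start, end = resolve_interval(interval, now)
    return start <= ts < end

now = datetime(2019, 6, 11, 15, 30, tzinfo=timezone.utc)
start, end = resolve_interval("-1d", now)
```

With these assumptions, `-1d` evaluated on the afternoon of June 11th covers all of June 10th (UTC).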

PlazmaDB Partition Indexes
● Q: How can we get a list of partition files?
● Limitations of the S3 API:
■ LIST operation of S3 files is quite slow
■ No range filtering of S3 files
○ Time range queries are not supported
● PlazmaDB Partition Indexes
● Manages indexes to partition files on S3
● Implemented on top of PostgreSQL
■ SQL + PL/Python functions
● Use GiST indexes (B-tree) to support time range filtering
■ dataset id, (partition start_time, end_time)
● Support transactional partition insertion + deletion for a single table
■ INSERT INTO, DELETE are atomic operations in TD
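The lookup that the partition index answers can be sketched in a few lines. This is an in-memory illustration only: the real index lives in PostgreSQL, and the `Partition` fields, the half-open time ranges, and the `replace` method are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, List

@dataclass(frozen=True)
class Partition:
    dataset_id: int
    start_time: int   # inclusive, Unix seconds
    end_time: int     # exclusive, Unix seconds
    s3_key: str       # hypothetical object key of the partition file

class PartitionIndex:
    """Toy index; a linear scan stands in for the GiST index lookup."""

    def __init__(self) -> None:
        self._partitions: List[Partition] = []

    def replace(self, removed: Iterable[Partition], added: Iterable[Partition]) -> None:
        """Delete and insert partitions in one step, so that readers never
        observe a half-applied change (mimics a transaction)."""
        removed_set = set(removed)
        updated = [p for p in self._partitions if p not in removed_set]
        updated.extend(added)
        self._partitions = updated   # single assignment publishes the new state

    def find(self, dataset_id: int, query_start: int, query_end: int) -> List[Partition]:
        """Return partitions of one dataset whose time range overlaps the query."""
        return [
            p for p in self._partitions
            if p.dataset_id == dataset_id
            and p.start_time < query_end
            and query_start < p.end_time
        ]

index = PartitionIndex()
p1 = Partition(1, 0, 3600, "a")
p2 = Partition(1, 3600, 7200, "b")
p3 = Partition(2, 0, 3600, "c")
index.replace([], [p1, p2, p3])
hits = index.find(dataset_id=1, query_start=1800, query_end=3600)
```

A query only needs the file list returned by `find`, which avoids both the slow S3 LIST call and the lack of range filtering.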

Extension to Presto
● No major changes have been made to the Presto master branch
● Fork: https://github.com/treasure-data/presto (almost no diff)
● This is a strategy for catching up with the latest master.
● Adding extension modules in a different internal repository (td-presto)
● td-presto-server
■ Extending presto-server main to inject our own modules
■ Adding a split-resource manager for throttling query resource usage
● td-presto-plugin
■ Metadata Management
○ A bridge to TD API (table metadata API)
○ PlazmaDB: Partition indexes
■ MPC1 file reader
○ S3 I/O request manager for pipelining a lot of S3 GET requests
■ Treasure Data specific UDFs
○ https://support.treasuredata.com/hc/en-us/articles/360001450828-Supported-Presto-and-TD-Functions
● td-presto-stella
■ Partition maintenance module implemented as Presto connectors
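The S3 I/O request manager is described only as pipelining many S3 GET requests. As a rough illustration of that idea (not the td-presto implementation), the sketch below keeps a bounded number of requests in flight and returns the results in request order. `fetch_range` is a hypothetical stand-in for a ranged S3 GET, and `max_in_flight` is an assumed tuning parameter.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

def fetch_range(key: str, offset: int, length: int) -> bytes:
    """Hypothetical stand-in for a ranged S3 GET; returns fake bytes."""
    return f"{key}[{offset}:{offset + length}]".encode()

def pipelined_read(requests: List[Tuple[str, int, int]], max_in_flight: int = 8) -> List[bytes]:
    """Issue all requests up front, with at most max_in_flight running at once,
    and collect the responses in the order the requests were given."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        futures = [pool.submit(fetch_range, *request) for request in requests]
        return [future.result() for future in futures]

chunks = pipelined_read([("part-0001", 0, 4), ("part-0001", 4, 4), ("part-0002", 0, 8)])
```

Capping the number of in-flight requests is also one simple way to throttle resource usage per query, which is the role the slide gives to the split-resource manager.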

Stella Plugin
● A Presto plugin for maintaining fragmented partitions
● Too small partitions
● Too large partitions
● Use Presto to merge/split partitions
● Guidelines
■ less than 1M records / partition
■ 250MB / partition
● Using CTAS statement for merging partitions:
■ CREATE TABLE stella (account_id = xxx, database = xxx, table = xxx, max_file_size=xxx, max_time_range=xxx)
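One way to read the guidelines above is as limits for a merge planner. The sketch below groups time-ordered small partitions greedily until the next one would break a limit. The greedy strategy, the `(name, records, bytes)` tuple layout, and treating 250MB as an upper bound are assumptions for illustration; the slide does not describe how Stella actually chooses what to merge.

```python
from typing import List, Tuple

MAX_RECORDS = 1_000_000            # guideline: less than 1M records / partition
MAX_BYTES = 250 * 1024 * 1024      # guideline: 250MB / partition (assumed upper bound)

def plan_merges(partitions: List[Tuple[str, int, int]]) -> List[List[str]]:
    """Group (name, record_count, size_bytes) tuples, already sorted by time,
    into merge groups that stay within both guidelines."""
    groups: List[List[str]] = []
    current: List[str] = []
    records = size = 0
    for name, n_records, n_bytes in partitions:
        too_many = records + n_records >= MAX_RECORDS
        too_big = size + n_bytes > MAX_BYTES
        if current and (too_many or too_big):
            groups.append(current)          # close the group before it overflows
            current, records, size = [], 0, 0
        current.append(name)
        records += n_records
        size += n_bytes
    if current:
        groups.append(current)
    return groups

MB = 1024 * 1024
plan = plan_merges([
    ("p1", 400_000, 100 * MB),
    ("p2", 400_000, 100 * MB),
    ("p3", 400_000, 100 * MB),
    ("p4", 10, 1 * MB),
])
```

Each resulting group would correspond to one merge statement. A partition that is already over the limits ends up in a group of its own; splitting it is outside this sketch.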

Ecosystems Around Presto

Prestobase: Presto Gateway (api-presto)
● Prestobase is a proxy gateway to Presto clusters that supports standard Presto clients (e.g., presto-cli, JDBC, ODBC)
● Written in Scala

td-spark: Apache Spark Driver for Treasure Data
● td-spark lets Spark applications use TD tables as a data source
● Supporting both read and write modes extends TD to further use cases
Python
$ pip install td-pyspark
or
$ docker pull armtd/td-spark-pyspark
Scala
Add td-spark-assembly.jar to the Spark class path.
https://support.treasuredata.com/hc/en-us/articles/360000716627-Apache-Spark-Driver-td-spark-Release

Internal Optimization

Data-Driven System Optimization
● TD is one of the biggest users of TD
● Query logs
● Collecting all Presto query logs since 2015
● Query statements, performance statistics, logs, etc.
● Logs are our valuable assets
● To understand user activities and enable data-driven decisions
[Figure: data-driven optimization loop. Labels: User Query, Logs, Collect Query Logs, Analyze Query Logs, Machine Learning, Query Optimization, Optimize System]

Checking Query Correctness And Performance
● Upgrading Query Engine Versions
● Need to check customer query compatibility, performance degradation, etc.
● Testing all 500,000 queries / day (= 15M queries / month) is impractical
● Use ML techniques to effectively reduce the problem size
● Simulate all possible customer query patterns to check compatibility
● Compute checksums of queries and record performance results in TD
[Figure: 15,000,000 user queries → clustering into query signatures (100,000 query patterns) → minimize into 100,000 small queries → simulate queries → simulation results and stats]
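The clustering step reduces many concrete queries to a much smaller set of patterns. The slides do not say how signatures are computed, so the sketch below uses a deliberately simple assumption: two queries share a signature when they are identical after lowercasing, collapsing whitespace, and replacing literals with `?`. The function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

def query_signature(sql: str) -> str:
    """Normalize a SQL string so that queries differing only in literal
    values, letter case, or whitespace map to the same signature."""
    text = sql.strip().lower()
    text = re.sub(r"'(?:[^']|'')*'", "?", text)       # string literals
    text = re.sub(r"\b\d+(?:\.\d+)?\b", "?", text)    # numeric literals
    text = re.sub(r"\s+", " ", text)                  # whitespace runs
    return text

def cluster_queries(queries: Iterable[str]) -> Dict[str, List[str]]:
    """Group queries by signature; each key is one query pattern."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for sql in queries:
        clusters[query_signature(sql)].append(sql)
    return dict(clusters)

clusters = cluster_queries([
    "SELECT * FROM logs WHERE code = 200",
    "select *  from logs where code = 404",
    "SELECT count(*) FROM users WHERE name = 'bob'",
])
```

One representative per cluster can then be minimized and replayed against the new engine version, instead of replaying every query.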

Query Metric Analysis
● Resource Usage Prediction
○ Based on the historical metric data
● Further optimization driven by the prediction results
● Working with an internship student from UCB

Presto Resource Manager
● Collecting system metrics in a robust manner is challenging
● Unified metric collector for Presto
○ Cluster metric management
○ Query routing optimization

Challenges

Challenge: Optimizing Query Workload As A Whole
● 2000 query patterns (5000 queries in a day)
● A real example in our production workload
● How can we improve the entire data processing?

Detecting Redundant Data Processing
● Redundancy In Queries
○ Same table scans, joins, aggregations, UDF processing, etc.
● Related work:
○ Selecting Subexpressions to Materialize At Datacenter Scale (Microsoft, VLDB 2018)
■ Extract best common sub-expressions from query graphs
■ Linear programming
● Challenges
● Updating sub-expression caches
■ Cache invalidation
● Combine cached results + query results for time series data
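Detecting redundancy starts with finding sub-expressions that occur in more than one query. The sketch below does only that counting step, on query plans written as nested tuples. The plan representation and function names are assumptions for illustration, and it does not attempt the linear-programming selection described in the related work.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Tuple

Plan = Tuple  # a plan node is (operator, argument_or_child, ...)

def subexpressions(plan: Plan) -> Iterator[Plan]:
    """Yield the plan itself and every nested sub-plan."""
    yield plan
    for child in plan[1:]:
        if isinstance(child, tuple):
            yield from subexpressions(child)

def shared_subexpressions(plans: Iterable[Plan]) -> Dict[Plan, int]:
    """Map each sub-expression to the number of queries containing it,
    keeping only those that appear in more than one query."""
    counts: Counter = Counter()
    for plan in plans:
        counts.update(set(subexpressions(plan)))   # count once per query
    return {expr: n for expr, n in counts.items() if n > 1}

scan_logs = ("scan", "logs")
errors = ("filter", "code >= 500", scan_logs)
q1 = ("aggregate", "count(*)", errors)
q2 = ("join", errors, ("scan", "users"))
shared = shared_subexpressions([q1, q2])
```

In this example the filtered scan of `logs` is shared by both queries, which makes it a candidate for caching. Choosing which candidates are worth materializing, and invalidating them when data changes, are the harder problems listed above.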

Challenge: Maximizing Machine Resource Utilization
● Uneven CPU usage due to regional differences
● US (upper)
■ Global customers
● Tokyo (lower)
■ Only Japan customers
● Semi-scheduled auto-scaling
● Using past stats + runtime metrics
● Distributing workloads to off-peak times
● Early data processing
● Optimizing query scheduling

Idea: Using Presto As A Backend of Other Query Engines
● Presto is efficient for table scans, filters, and aggregations
● Can’t we use the power of Presto to accelerate other query engines?
[Figure labels: Primary Query Engine, Secondary Query Engine]

Idea: Presto-Presto Connector
● Launch multiple Presto clusters with different configurations
● Presto 1
● Caching common sub-expressions (e.g., Materialized Views or in-memory storage)
● Presto 2
● Delegate sub-expression processing to the upstream Presto cluster
● Challenge:
● Extracting appropriate sub-queries to run at the upstream Presto
[Figure labels: With Sub-Query Cache, Reuse Pre-Computed Results]

Missing: Binary Protocol
● Presto currently sends query results in JSON format (v1 protocol)
● Need a faster data transfer method
○ Idea: Using MessagePack for binary data representation
○ Support parallel query results transfer
[Figure labels: v1 JSON (slow), Binary Data Transfer]
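To make the contrast with JSON concrete, the sketch below encodes one page of result rows both ways. The `pack` function is a hand-written encoder for a small subset of the MessagePack format (nil, booleans, integers, 64-bit floats, short strings, arrays) so that the example needs no third-party library. It is an illustration of the binary representation only, not a proposed Presto protocol, and the sample rows are made up.

```python
import json
import struct

def pack(value) -> bytes:
    """Encode a value using a subset of the MessagePack format."""
    if value is None:
        return b"\xc0"                                       # nil
    if isinstance(value, bool):
        return b"\xc3" if value else b"\xc2"                 # true / false
    if isinstance(value, int):
        if 0 <= value <= 0x7F:
            return struct.pack("B", value)                   # positive fixint
        if 0 <= value <= 0xFF:
            return b"\xcc" + struct.pack("B", value)         # uint 8
        if 0 <= value <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", value)        # uint 16
        if 0 <= value <= 0xFFFFFFFF:
            return b"\xce" + struct.pack(">I", value)        # uint 32
        return b"\xd3" + struct.pack(">q", value)            # int 64
    if isinstance(value, float):
        return b"\xcb" + struct.pack(">d", value)            # float 64
    if isinstance(value, str):
        data = value.encode("utf-8")
        if len(data) <= 31:
            return bytes([0xA0 | len(data)]) + data          # fixstr
        if len(data) <= 0xFF:
            return b"\xd9" + bytes([len(data)]) + data       # str 8
        raise ValueError("string too long for this sketch")
    if isinstance(value, (list, tuple)):
        if len(value) <= 15:
            head = bytes([0x90 | len(value)])                # fixarray
        elif len(value) <= 0xFFFF:
            head = b"\xdc" + struct.pack(">H", len(value))   # array 16
        else:
            raise ValueError("array too long for this sketch")
        return head + b"".join(pack(item) for item in value)
    raise TypeError(f"unsupported type: {type(value).__name__}")

# Made-up result page: (time, method, status code, elapsed seconds)
rows = [[1560211200 + i, "GET", 200, 0.25] for i in range(100)]
binary_page = pack(rows)
json_page = json.dumps(rows, separators=(",", ":")).encode("utf-8")
print(len(binary_page), len(json_page))
```

In the binary form, numbers are stored as fixed-width values behind a one-byte type tag, so a client can decode them without parsing decimal text. Independent pages like this one could also be fetched in parallel, which is the second idea on the slide.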

Join Arm Treasure Data Team!
● Solving Challenges in Real-World Data Processing
● Building a scalable data processing platform on the cloud
■ Enabled 400+ companies to use Presto for their data processing
● Stream data ingestion systems
■ PlazmaDB indexes, physical storage optimization
● Advanced analytics with Presto, Hive, Spark, and their workflows
● CDP (Customer Data Platform)
● Building a data platform for managing our customers’ customer data
● Helping non-engineers (e.g., marketers, executives) manage their own data

Confidential © Arm 2017
Thank You!
Danke!
Merci!
谢谢!
ありがとう!
Gracias!
Kiitos!
