
Presto At Arm Treasure Data - 2019 Updates

Presentation at Presto Conference Tokyo 2019

- Arm Treasure Data
- Plazma DB Indexes
- Real-time, Archive Storages
- Schema-on-read data processing
- Physical partition maintenance via presto-stella plugin

Published in: Technology

  1. Presto At Arm Treasure Data: 2019 Updates
     Kai Sasaki, Taro L. Saito (Arm Treasure Data)
     Presto Conference Tokyo 2019, June 11th, 2019
     Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved.
  2. About Me: Kai Sasaki
     ● Kai Sasaki (@Lewuathe)
     ● https://www.lewuathe.com/
     ● Software Engineer at Arm
     ● Member of the MPP team, which maintains the Presto clusters and the surrounding ecosystem
     ● OSS Contributor
       ○ Presto, Hadoop, Spark, TensorFlow
  3. About Us
  4. Arm Treasure Data
     ● 400+ Customers
     ● Founded in 2011
     ● Raised $54M
     ● Security
     ● Acquired by Arm / Softbank in 2018
  5. Treasure Data = Unified Data Platform
  6. The Architecture of Treasure Data
     [Diagram: Data Collection (logs, device data, batch data; 2 million records / sec.), Cloud Storage (PlazmaDB, table schema; 130 trillion records), Distributed Data Processing (1 billion rows processed / sec.), and Job Management (jobs, SQL editor, scheduler, workflows, machine learning). Legend: Treasure Data OSS, Third Party OSS.]
  7. How We Use Presto
  8. Presto Usage (2019)
     ● 3x more usage since 2017
     ● 3,500~ users (400+ customer accounts)
     ● 600,000~ queries / day
     ● 100 trillion rows processed / day (= 1.2 billion rows processed / sec.)
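The per-second figure on this slide follows from dividing the daily total by the number of seconds in a day. A minimal sketch of that arithmetic (the helper name `per_second` is made up for illustration):

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def per_second(per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


rows_rate = per_second(100e12)    # 100 trillion rows / day
query_rate = per_second(600_000)  # 600,000 queries / day

print(f"{rows_rate / 1e9:.2f} billion rows / sec")
print(f"{query_rate:.1f} queries / sec")
```

The average comes out at roughly 1.16 billion rows per second, which the slide rounds to 1.2 billion, and about 7 queries per second.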
  9. PlazmaDB: MessagePack DBMS
     ● Fluentd -> MessagePack -> Arm Treasure Data
     ● Generates the table schema from the input MessagePack data
       ■ No need to worry about schema changes: adding columns and widening column types are managed by the service
     ● Applies schema-on-read to provide table data for Presto/Hive/Spark, etc.
     ● Storage format:
       ■ Our internal MessagePack Columnar Format (MPC1) for schema-on-read
     [Diagram labels: Schema-free Data, Update Schema, Table Schema, Generate Reader Set, Int Column Reader, String Column Reader, Table Reader, Data Collection, Distributed Data Processing]
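The schema-on-read idea above can be sketched in a few lines: the schema is derived from the records as they arrive, column types are widened when a wider value shows up, and values are cast only at read time. This is an illustrative sketch, not PlazmaDB's implementation; the type order `int -> double -> string` and all function names are assumptions, and plain Python dicts stand in for decoded MessagePack records.

```python
from typing import Any, Dict, List, Optional

# Assumed widening order: a column may only move to the right.
TYPE_ORDER = ["int", "double", "string"]
CASTS = {"int": int, "double": float, "string": str}


def infer_type(value: Any) -> str:
    """Map a Python value to one of the assumed column types."""
    if isinstance(value, int) and not isinstance(value, bool):
        return "int"
    if isinstance(value, float):
        return "double"
    return "string"


def widen(a: str, b: str) -> str:
    """Return the wider of two column types."""
    return TYPE_ORDER[max(TYPE_ORDER.index(a), TYPE_ORDER.index(b))]


def update_schema(schema: Dict[str, str], record: Dict[str, Any]) -> Dict[str, str]:
    """Add new columns and widen existing ones based on one record."""
    for column, value in record.items():
        if value is None:
            continue
        observed = infer_type(value)
        schema[column] = widen(schema[column], observed) if column in schema else observed
    return schema


def read_column(records: List[Dict[str, Any]], column: str, column_type: str) -> List[Optional[Any]]:
    """Schema-on-read: cast stored values to the current column type at read time."""
    cast = CASTS[column_type]
    return [cast(r[column]) if r.get(column) is not None else None for r in records]


records = [
    {"time": 1, "user": "a"},
    {"time": 2, "user": "b", "score": 3},
    {"time": 3, "score": 4.5},  # "score" widens from int to double
]
schema: Dict[str, str] = {}
for record in records:
    update_schema(schema, record)

print(schema)
print(read_column(records, "score", schema["score"]))
```

Older records are never rewritten here: the record that stored `score` as an integer is simply read back as a double once the column has been widened.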
  10. Treasure Data Storage Architecture
     ● Real-Time vs Archive Storages
     ● Real-time storage provides access to recent data
     ● MapReduce jobs (LogMergeJob) store optimized partitions into Archive Storage
  11. TD_INTERVAL UDF
     ● Supports human-friendly time window specifications
  12. PlazmaDB Partition Indexes
     ● Q: How can we get a list of partition files?
     ● Limitations of the S3 API:
       ■ LIST operation of S3 files is quite slow
       ■ No range filtering of S3 files
         ○ Time range queries are not supported
     ● PlazmaDB Partition Indexes
     ● Manages indexes to partition files on S3
     ● Implemented on top of PostgreSQL
       ■ SQL + PL/Python functions
     ● Use GiST indexes (B-tree) to support time range filtering
       ■ dataset id, (partition start_time, end_time)
     ● Support transactional partition insertion + deletion for a single table
       ■ INSERT INTO, DELETE are atomic operations in TD
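The lookup such an index has to answer is: for a given dataset, which partition files overlap the query's time range? The sketch below shows only that overlap predicate over an in-memory list. It is a linear scan, not a GiST index, and the `Partition` fields, the half-open `[start_time, end_time)` convention, and the file paths are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Partition:
    dataset_id: int
    start_time: int  # inclusive, Unix seconds (assumed convention)
    end_time: int    # exclusive
    path: str        # hypothetical object key of the partition file


def find_partitions(index: List[Partition], dataset_id: int,
                    query_start: int, query_end: int) -> List[Partition]:
    """Return partitions of one dataset overlapping [query_start, query_end)."""
    return [
        p for p in index
        if p.dataset_id == dataset_id
        and p.start_time < query_end   # partition starts before the query ends
        and p.end_time > query_start   # partition ends after the query starts
    ]


index = [
    Partition(1, 0, 3600, "ds1/part-0"),
    Partition(1, 3600, 7200, "ds1/part-1"),
    Partition(1, 7200, 10800, "ds1/part-2"),
    Partition(2, 0, 3600, "ds2/part-0"),
]

hits = find_partitions(index, dataset_id=1, query_start=3000, query_end=4000)
print([p.path for p in hits])
```

Only the partitions whose time ranges intersect the query window are returned, so the query engine never needs to list or open the remaining files.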
  13. Extension to Presto
     ● No major change has been made to the Presto master branch
     ● Fork: https://github.com/treasure-data/presto (almost no diff)
     ● This is a strategy for catching up with the latest master.
     ● Adding extension modules in a different internal repository (td-presto)
     ● td-presto-server
       ■ Extending presto-server main to inject our own modules
       ■ Adding a split-resource manager for throttling query resource usage
     ● td-presto-plugin
       ■ Metadata Management
         ○ A bridge to TD API (table metadata API)
         ○ PlazmaDB: Partition indexes
       ■ MPC1 file reader
         ○ S3 I/O request manager for pipelining a lot of S3 GET requests
       ■ Treasure Data specific UDFs
         ○ https://support.treasuredata.com/hc/en-us/articles/360001450828-Supported-Presto-and-TD-Functions
     ● td-presto-stella
       ■ Partition maintenance module implemented as Presto connectors
  14. Stella Plugin
     ● A Presto plugin for maintaining fragmented partitions
     ● Too small partitions
     ● Too large partitions
     ● Use Presto to merge/split partitions
     ● Guidelines
       ■ less than 1M records / partition
       ■ 250MB / partition
     ● Using CTAS statement for merging partitions:
       ■ CREATE TABLE stella (account_id = xxx, database = xxx, table = xxx, max_file_size=xxx, max_time_range=xxx)
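To make the merge guidelines concrete, here is a minimal planning sketch: it walks partitions in time order and groups neighbours until adding the next one would exceed either guideline. This is not the Stella plugin's algorithm. Treating both guidelines as upper bounds, the greedy grouping, and the names `plan_merges`, `MAX_RECORDS`, and `MAX_BYTES` are all assumptions.

```python
from typing import List, Tuple

MAX_RECORDS = 1_000_000        # "less than 1M records / partition"
MAX_BYTES = 250 * 1024 * 1024  # "250MB / partition"

# A partition is described here as (record_count, size_in_bytes).
PartitionStats = Tuple[int, int]


def plan_merges(partitions: List[PartitionStats]) -> List[List[PartitionStats]]:
    """Greedily group time-ordered partitions without exceeding either limit."""
    groups: List[List[PartitionStats]] = []
    current: List[PartitionStats] = []
    records = size = 0
    for part_records, part_bytes in partitions:
        over_limit = (records + part_records > MAX_RECORDS
                      or size + part_bytes > MAX_BYTES)
        if current and over_limit:
            groups.append(current)  # close the current group
            current, records, size = [], 0, 0
        current.append((part_records, part_bytes))
        records += part_records
        size += part_bytes
    if current:
        groups.append(current)
    return groups


MB = 1024 * 1024
small_partitions = [(400_000, 100 * MB), (400_000, 100 * MB),
                    (400_000, 100 * MB), (50_000, 10 * MB)]
merge_plan = plan_merges(small_partitions)
print([len(group) for group in merge_plan])
```

Each resulting group would correspond to one merged output partition. A partition that already exceeds a limit ends up alone in its own group and would need splitting instead, which this sketch does not cover.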
  15. Ecosystems Around Presto
  16. Prestobase: Presto Gateway (api-presto)
     ● Prestobase is a proxy gateway to Presto clusters to support standard Presto clients (e.g., presto-cli, JDBC, ODBC, etc.)
     ● Written in Scala
  17. td-spark: Apache Spark Driver for Treasure Data
     ● td-spark provides a way to use a TD table as a data source of a Spark application
     ● Supporting both read and write modes extends TD to further use cases
     ● Python: $ pip install td-pyspark
       or: $ docker pull armtd/td-spark-pyspark
     ● Scala: add td-spark-assembly.jar to the Spark class path
     ● https://support.treasuredata.com/hc/en-us/articles/360000716627-Apache-Spark-Driver-td-spark-Release
  18. Internal Optimization
  19. Data-Driven System Optimization
     ● TD is one of the biggest users of TD
     ● Query logs
     ● Collecting all Presto query logs since 2015
     ● Query statements, performance statistics, logs, etc.
     ● Logs are our valuable assets
     ● To understand user activities and enable data-driven decisions
     [Diagram labels: User Query, Logs, Collect Query Logs, Analyze Query Logs, Machine Learning, Query Optimization, Optimize System]
  20. Checking Query Correctness And Performance
     ● Upgrading Query Engine Versions
     ● Need to check customer query compatibility, performance degradation, etc.
     ● Testing all 500,000 queries / day (= 15M queries / month) is impractical
     ● Use ML techniques to effectively reduce the problem size
     ● Simulate all possible customer query patterns to check the compatibility
     ● Compute checksums of queries, record performance results to TD
     [Diagram: 15,000,000 user queries -> clustering -> 100,000 query patterns (query signatures) -> minimize -> 100,000 small queries -> simulate queries -> simulation results and stats]
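One simple way to collapse many concrete queries into a much smaller set of patterns is to replace literals with placeholders and group queries by the resulting signature. The sketch below uses that plain text normalization as a stand-in for the ML-based clustering on the slide; the regular expressions and the function names are assumptions, and a real implementation would more likely work on parsed SQL.

```python
import re
from collections import defaultdict
from typing import Dict, List


def query_signature(sql: str) -> str:
    """Normalize a query so that queries differing only in literals match."""
    s = sql.strip().lower()
    s = re.sub(r"'(?:[^']|'')*'", "?", s)     # string literals -> ?
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)  # numeric literals -> ?
    s = re.sub(r"\s+", " ", s)                # collapse whitespace
    return s


def group_by_signature(queries: List[str]) -> Dict[str, List[str]]:
    """Group queries sharing a signature; one representative per group can be tested."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for q in queries:
        groups[query_signature(q)].append(q)
    return dict(groups)


queries = [
    "SELECT COUNT(*) FROM logs WHERE user_id = 42",
    "select count(*)  from logs where user_id = 7",
    "SELECT name FROM users WHERE country = 'JP'",
    "SELECT name FROM users WHERE country = 'US'",
]
patterns = group_by_signature(queries)
for signature, members in patterns.items():
    print(len(members), signature)
```

Here four queries reduce to two patterns, so a compatibility check would only need to run one representative of each.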
  21. Query Metric Analysis
     ● Resource Usage Prediction
       ○ Based on the historical metric data
     ● Further optimization leveraging the prediction results
     ● Working with an intern from UCB
  22. Presto Resource Manager
     ● Collecting system metrics in a robust manner is challenging
     ● Unified metric collector for Presto
       ○ Cluster metric management
       ○ Query routing optimization
  23. Challenges
  24. Challenge: Optimizing Query Workload As A Whole
     ● 2000 query patterns (5000 queries in a day)
     ● A real example in our production workload
     ● How can we improve the entire data processing?
  25. Detecting Redundant Data Processing
     ● Redundancy In Queries
       ○ Same table scans, joins, aggregations, UDF processing, etc.
     ● Related work:
       ○ Selecting Subexpressions to Materialize At Datacenter Scale (Microsoft, VLDB 2018)
         ■ Extract best common sub-expressions from query graphs
         ■ Linear programming
     ● Challenges
     ● Updating sub-expression caches
       ■ Cache invalidation
     ● Combine cached results + query results for time series data
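A first step toward finding redundant work is to count how many queries contain each sub-expression. The sketch below does only that counting over toy query plans written as nested tuples. It is not the selection method of the cited VLDB paper (which uses linear programming), and the plan representation and function names are assumptions.

```python
from collections import Counter
from typing import Dict, Iterator, List, Tuple

# A toy plan node: (operator, argument, optional child plan).
Plan = Tuple


def subexpressions(plan: Plan) -> Iterator[Plan]:
    """Yield the plan itself and every nested sub-plan."""
    yield plan
    for child in plan[1:]:
        if isinstance(child, tuple):
            yield from subexpressions(child)


def shared_subexpressions(plans: List[Plan], min_queries: int = 2) -> Dict[Plan, int]:
    """Count, per sub-expression, how many distinct queries contain it."""
    counts: Counter = Counter()
    for plan in plans:
        counts.update(set(subexpressions(plan)))  # count once per query
    return {expr: n for expr, n in counts.items() if n >= min_queries}


scan_users = ("scan", "users")
jp_users = ("filter", "country = 'JP'", scan_users)
plans = [
    ("aggregate", "count", jp_users),
    ("aggregate", "sum", jp_users),
    scan_users,
]
shared = shared_subexpressions(plans)
for expr, n in sorted(shared.items(), key=lambda kv: -kv[1]):
    print(n, expr)
```

The scan is shared by all three queries and the filtered scan by two, which makes them candidates for caching. Choosing which candidates are worth materializing, and when to invalidate them, is the hard part the slide lists under challenges.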
  26. Challenge: Maximizing Machine Resource Utilization
     ● Uneven CPU usage due to regional differences
     ● US (upper)
       ■ Global customers
     ● Tokyo (lower)
       ■ Only Japan customers
     ● Semi-scheduled auto-scaling
     ● Using past stats + runtime metrics
     ● Distributing workloads to off-peak times
     ● Early data processing
     ● Optimizing query scheduling
  27. Idea: Using Presto As A Backend of Other Query Engines
     ● Presto is efficient for table scans, filters, and aggregations
     ● Can't we use the power of Presto to accelerate other query engines?
     [Diagram labels: Primary Query Engine, Secondary Query Engine]
  28. Idea: Presto-Presto Connector
     ● Launch multiple Presto clusters with different configurations
     ● Presto 1
     ● Caching common sub-expressions (e.g., Materialized Views or in-memory storage)
     ● Presto 2
     ● Delegate sub-expression processing to the upstream Presto cluster
     ● Challenge:
     ● Extracting appropriate sub-queries to run at the upstream Presto
     [Diagram labels: With Sub-Query Cache, Reuse Pre-Computed Results]
  29. Missing: Binary Protocol
     ● Current Presto sends query results in JSON format (v1 protocol)
     ● Need a faster data transfer method
       ○ Idea: Using MessagePack for binary data representation
       ○ Support parallel query results transfer
     [Diagram: v1 JSON (slow) vs. binary data transfer]
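To illustrate why a binary representation can help, the sketch below encodes the same column of integers once as JSON text and once as fixed-width binary, then compares payload sizes. It uses Python's standard `struct` module as a stand-in and is not MessagePack or any Presto protocol; the length-prefixed layout and the function names are assumptions.

```python
import json
import struct
from typing import List


def encode_json(values: List[int]) -> bytes:
    """Text encoding, as in a JSON result payload."""
    return json.dumps(values).encode("utf-8")


def encode_binary(values: List[int]) -> bytes:
    """Assumed layout: uint32 count, then little-endian int64 values."""
    return struct.pack(f"<I{len(values)}q", len(values), *values)


def decode_binary(payload: bytes) -> List[int]:
    """Inverse of encode_binary."""
    (count,) = struct.unpack_from("<I", payload, 0)
    return list(struct.unpack_from(f"<{count}q", payload, 4))


# A column of millisecond timestamps, a common shape for time series data.
timestamps = [1_560_211_200_000 + i for i in range(1000)]

json_payload = encode_json(timestamps)
binary_payload = encode_binary(timestamps)

print(len(json_payload), "bytes as JSON")
print(len(binary_payload), "bytes as binary")
```

For this column the binary payload is roughly half the size of the JSON text, and decoding it avoids number parsing. Actual gains depend on the data: small integers or short strings can be just as compact in JSON.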
  30. Join Arm Treasure Data Team!
     ● Solving challenges in real-world data processing
     ● Building a scalable data processing platform on the cloud
       ■ Enabled 400+ companies to use Presto for their data processing
     ● Stream data ingestion systems
       ■ PlazmaDB indexes, physical storage optimization
     ● Advanced analytics with Presto, Hive, Spark, and their workflows
     ● CDP (Customer Data Platform)
     ● Building a data platform for managing our customers' customer data
     ● Supporting non-engineers (e.g., marketers, executives) in managing their own data
  31. Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos!
     Confidential © Arm 2017
