SlideShare a Scribd company logo

Presto At Arm Treasure Data - 2019 Updates

Taro L. Saito
Taro L. Saito
Taro L. SaitoPh.D., Software Engineer at Treasure Data

Presentation at Presto Conference Tokyo 2019 - Arm Treasure Data - Plazma DB Indexes - Real-time, Archive Storages - Schema-on-read data processing - Physical partition maintenance via presto-stella plugin

Presto At Arm Treasure Data - 2019 Updates

1 of 31
Download to read offline
Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved.
Kai Sasaki, Taro L. Saito
Arm Treasure Data
June 11th, 2019
Presto Conference Tokyo 2019
Presto At Arm Treasure Data
2019 Updates
1
Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved.
About Me: Kai Sasaki
2
● Kai Sasaki (@Lewuathe)
● https://www.lewuathe.com/
● Software Engineer in Arm
● MPP team to maintain Presto cluster
and around ecosystems
● OSS Contributor
○ Presto, Hadoop, Spark, TensorFlow
Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved.
About Us
400+
Customers
Founded in
2011
Raised
$54M
Security
Acquired by Arm / Softbank
2018
Arm Treasure Data
Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved.
Treasure Data = Unified Data Platform
5
Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved.
The Architecture of Treasure Data
6
DataLogs
Device
Data
Batch
Data
PlazmaDB
Table Schema
Data Collection Cloud Storage Distributed Data Processing
2 million records / sec. 130 trillion records 1 billion rows processed / sec.
Jobs
Job Management
SQL Editor
Scheduler
Workflows
Machine
Learning
Treasure Data OSS
Third Party OSS
Ad

Recommended

Data Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaData Ingest Self Service and Management using Nifi and Kafka
Data Ingest Self Service and Management using Nifi and KafkaDataWorks Summit
 
Serving BERT Models in Production with TorchServe
Serving BERT Models in Production with TorchServeServing BERT Models in Production with TorchServe
Serving BERT Models in Production with TorchServeNidhin Pattaniyil
 
A Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiA Thorough Comparison of Delta Lake, Iceberg and Hudi
A Thorough Comparison of Delta Lake, Iceberg and HudiDatabricks
 
Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Iceberg: A modern table format for big data (Strata NY 2018)
Iceberg: A modern table format for big data (Strata NY 2018)Ryan Blue
 
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...
Best Practices for ETL with Apache NiFi on Kubernetes - Albert Lewandowski, G...GetInData
 
Technical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASTechnical Introduction to PostgreSQL and PPAS
Technical Introduction to PostgreSQL and PPASAshnikbiz
 
Building a modern data warehouse
Building a modern data warehouseBuilding a modern data warehouse
Building a modern data warehouseJames Serra
 

More Related Content

What's hot

Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Sadayuki Furuhashi
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lakeJames Serra
 
OLTP+OLAP=HTAP
 OLTP+OLAP=HTAP OLTP+OLAP=HTAP
OLTP+OLAP=HTAPEDB
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueAmazon Web Services
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaCloudera, Inc.
 
Performance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsPerformance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsDatabricks
 
Patroni - HA PostgreSQL made easy
Patroni - HA PostgreSQL made easyPatroni - HA PostgreSQL made easy
Patroni - HA PostgreSQL made easyAlexander Kukushkin
 
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark ClustersTensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark ClustersDataWorks Summit
 
Practical Memory Tuning for PostgreSQL
Practical Memory Tuning for PostgreSQLPractical Memory Tuning for PostgreSQL
Practical Memory Tuning for PostgreSQLGrant McAlister
 
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLWebinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLSeveralnines
 
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...Altinity Ltd
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureDatabricks
 
Data pipelines observability: OpenLineage & Marquez
Data pipelines observability:  OpenLineage & MarquezData pipelines observability:  OpenLineage & Marquez
Data pipelines observability: OpenLineage & MarquezJulien Le Dem
 
Lessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkLessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkSergey Petrunya
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfAltinity Ltd
 
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOAltinity Ltd
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 
Reading The Source Code of Presto
Reading The Source Code of PrestoReading The Source Code of Presto
Reading The Source Code of PrestoTaro L. Saito
 

What's hot (20)

Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1Understanding Presto - Presto meetup @ Tokyo #1
Understanding Presto - Presto meetup @ Tokyo #1
 
Big data architectures and the data lake
Big data architectures and the data lakeBig data architectures and the data lake
Big data architectures and the data lake
 
OLTP+OLAP=HTAP
 OLTP+OLAP=HTAP OLTP+OLAP=HTAP
OLTP+OLAP=HTAP
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS Glue
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
 
NoSQL
NoSQLNoSQL
NoSQL
 
Performance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsPerformance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud Environments
 
Patroni - HA PostgreSQL made easy
Patroni - HA PostgreSQL made easyPatroni - HA PostgreSQL made easy
Patroni - HA PostgreSQL made easy
 
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark ClustersTensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters
TensorFlowOnSpark: Scalable TensorFlow Learning on Spark Clusters
 
Practical Memory Tuning for PostgreSQL
Practical Memory Tuning for PostgreSQLPractical Memory Tuning for PostgreSQL
Practical Memory Tuning for PostgreSQL
 
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQLWebinar slides: An Introduction to Performance Monitoring for PostgreSQL
Webinar slides: An Introduction to Performance Monitoring for PostgreSQL
 
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
Introduction to the Mysteries of ClickHouse Replication, By Robert Hodges and...
 
Architect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh ArchitectureArchitect’s Open-Source Guide for a Data Mesh Architecture
Architect’s Open-Source Guide for a Data Mesh Architecture
 
Data pipelines observability: OpenLineage & Marquez
Data pipelines observability:  OpenLineage & MarquezData pipelines observability:  OpenLineage & Marquez
Data pipelines observability: OpenLineage & Marquez
 
Lessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmarkLessons for the optimizer from running the TPC-DS benchmark
Lessons for the optimizer from running the TPC-DS benchmark
 
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdfDeep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
Deep Dive on ClickHouse Sharding and Replication-2202-09-22.pdf
 
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTOClickHouse Introduction by Alexander Zaitsev, Altinity CTO
ClickHouse Introduction by Alexander Zaitsev, Altinity CTO
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 
Vue d'ensemble Dremio
Vue d'ensemble DremioVue d'ensemble Dremio
Vue d'ensemble Dremio
 
Reading The Source Code of Presto
Reading The Source Code of PrestoReading The Source Code of Presto
Reading The Source Code of Presto
 

Similar to Presto At Arm Treasure Data - 2019 Updates

Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020Taro L. Saito
 
Managing Machine Learning workflows on Treasure Data
Managing Machine Learning workflows on Treasure DataManaging Machine Learning workflows on Treasure Data
Managing Machine Learning workflows on Treasure DataAki Ariga
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSKimmo Kantojärvi
 
How To Use Scala At Work - Airframe In Action at Arm Treasure Data
How To Use Scala At Work - Airframe In Action at Arm Treasure DataHow To Use Scala At Work - Airframe In Action at Arm Treasure Data
How To Use Scala At Work - Airframe In Action at Arm Treasure DataTaro L. Saito
 
times ten in-memory database for extreme performance
times ten in-memory database for extreme performancetimes ten in-memory database for extreme performance
times ten in-memory database for extreme performanceOracle Korea
 
td-spark internals: Extending Spark with Airframe - Spark Meetup Tokyo #3 2020
td-spark internals: Extending Spark with Airframe - Spark Meetup Tokyo #3 2020td-spark internals: Extending Spark with Airframe - Spark Meetup Tokyo #3 2020
td-spark internals: Extending Spark with Airframe - Spark Meetup Tokyo #3 2020Taro L. Saito
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futuremarkgrover
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesKarthik Murugesan
 
How to Upgrade Major Version of Your Production PostgreSQL
How to Upgrade Major Version of Your Production PostgreSQLHow to Upgrade Major Version of Your Production PostgreSQL
How to Upgrade Major Version of Your Production PostgreSQLKeisuke Suzuki
 
How YugaByte DB Implements Distributed PostgreSQL
How YugaByte DB Implements Distributed PostgreSQLHow YugaByte DB Implements Distributed PostgreSQL
How YugaByte DB Implements Distributed PostgreSQLYugabyte
 
Times ten 18.1_overview_meetup
Times ten 18.1_overview_meetupTimes ten 18.1_overview_meetup
Times ten 18.1_overview_meetupByung Ho Lee
 
Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018
Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018
Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018Taro L. Saito
 
Amazon Aurora: Deep Dive - SRV308 - Chicago AWS Summit
Amazon Aurora: Deep Dive - SRV308 - Chicago AWS SummitAmazon Aurora: Deep Dive - SRV308 - Chicago AWS Summit
Amazon Aurora: Deep Dive - SRV308 - Chicago AWS SummitAmazon Web Services
 
Airframe Meetup #3: 2019 Updates & AirSpec
Airframe Meetup #3: 2019 Updates & AirSpecAirframe Meetup #3: 2019 Updates & AirSpec
Airframe Meetup #3: 2019 Updates & AirSpecTaro L. Saito
 
MySQL 8.0: What Is New in Optimizer and Executor?
MySQL 8.0: What Is New in Optimizer and Executor?MySQL 8.0: What Is New in Optimizer and Executor?
MySQL 8.0: What Is New in Optimizer and Executor?Norvald Ryeng
 
Tiered Data Sets in Amazon Redshift (ANT321) - AWS re:Invent 2018
Tiered Data Sets in Amazon Redshift (ANT321) - AWS re:Invent 2018Tiered Data Sets in Amazon Redshift (ANT321) - AWS re:Invent 2018
Tiered Data Sets in Amazon Redshift (ANT321) - AWS re:Invent 2018Amazon Web Services
 
Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerFederico Palladoro
 

Similar to Presto At Arm Treasure Data - 2019 Updates (20)

Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
 
Managing Machine Learning workflows on Treasure Data
Managing Machine Learning workflows on Treasure DataManaging Machine Learning workflows on Treasure Data
Managing Machine Learning workflows on Treasure Data
 
Make your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWSMake your data fly - Building data platform in AWS
Make your data fly - Building data platform in AWS
 
How To Use Scala At Work - Airframe In Action at Arm Treasure Data
How To Use Scala At Work - Airframe In Action at Arm Treasure DataHow To Use Scala At Work - Airframe In Action at Arm Treasure Data
How To Use Scala At Work - Airframe In Action at Arm Treasure Data
 
201810 td tech_talk
201810 td tech_talk201810 td tech_talk
201810 td tech_talk
 
times ten in-memory database for extreme performance
times ten in-memory database for extreme performancetimes ten in-memory database for extreme performance
times ten in-memory database for extreme performance
 
Greenplum Architecture
Greenplum ArchitectureGreenplum Architecture
Greenplum Architecture
 
td-spark internals: Extending Spark with Airframe - Spark Meetup Tokyo #3 2020
td-spark internals: Extending Spark with Airframe - Spark Meetup Tokyo #3 2020td-spark internals: Extending Spark with Airframe - Spark Meetup Tokyo #3 2020
td-spark internals: Extending Spark with Airframe - Spark Meetup Tokyo #3 2020
 
The Lyft data platform: Now and in the future
The Lyft data platform: Now and in the futureThe Lyft data platform: Now and in the future
The Lyft data platform: Now and in the future
 
Lyft data Platform - 2019 slides
Lyft data Platform - 2019 slidesLyft data Platform - 2019 slides
Lyft data Platform - 2019 slides
 
How to Upgrade Major Version of Your Production PostgreSQL
How to Upgrade Major Version of Your Production PostgreSQLHow to Upgrade Major Version of Your Production PostgreSQL
How to Upgrade Major Version of Your Production PostgreSQL
 
How YugaByte DB Implements Distributed PostgreSQL
How YugaByte DB Implements Distributed PostgreSQLHow YugaByte DB Implements Distributed PostgreSQL
How YugaByte DB Implements Distributed PostgreSQL
 
Times ten 18.1_overview_meetup
Times ten 18.1_overview_meetupTimes ten 18.1_overview_meetup
Times ten 18.1_overview_meetup
 
Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018
Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018
Airframe: Lightweight Building Blocks for Scala - Scale By The Bay 2018
 
Amazon Aurora: Deep Dive - SRV308 - Chicago AWS Summit
Amazon Aurora: Deep Dive - SRV308 - Chicago AWS SummitAmazon Aurora: Deep Dive - SRV308 - Chicago AWS Summit
Amazon Aurora: Deep Dive - SRV308 - Chicago AWS Summit
 
Airframe Meetup #3: 2019 Updates & AirSpec
Airframe Meetup #3: 2019 Updates & AirSpecAirframe Meetup #3: 2019 Updates & AirSpec
Airframe Meetup #3: 2019 Updates & AirSpec
 
PostgreSQL
PostgreSQL PostgreSQL
PostgreSQL
 
MySQL 8.0: What Is New in Optimizer and Executor?
MySQL 8.0: What Is New in Optimizer and Executor?MySQL 8.0: What Is New in Optimizer and Executor?
MySQL 8.0: What Is New in Optimizer and Executor?
 
Tiered Data Sets in Amazon Redshift (ANT321) - AWS re:Invent 2018
Tiered Data Sets in Amazon Redshift (ANT321) - AWS re:Invent 2018Tiered Data Sets in Amazon Redshift (ANT321) - AWS re:Invent 2018
Tiered Data Sets in Amazon Redshift (ANT321) - AWS re:Invent 2018
 
Big data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on dockerBig data Argentina meetup 2020-09: Intro to presto on docker
Big data Argentina meetup 2020-09: Intro to presto on docker
 

More from Taro L. Saito

Unifying Frontend and Backend Development with Scala - ScalaCon 2021
Unifying Frontend and Backend Development with Scala - ScalaCon 2021Unifying Frontend and Backend Development with Scala - ScalaCon 2021
Unifying Frontend and Backend Development with Scala - ScalaCon 2021Taro L. Saito
 
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020Taro L. Saito
 
Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17
Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17
Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17Taro L. Saito
 
Tips For Maintaining OSS Projects
Tips For Maintaining OSS ProjectsTips For Maintaining OSS Projects
Tips For Maintaining OSS ProjectsTaro L. Saito
 
Learning Silicon Valley Culture
Learning Silicon Valley CultureLearning Silicon Valley Culture
Learning Silicon Valley CultureTaro L. Saito
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure DataTaro L. Saito
 
Scala at Treasure Data
Scala at Treasure DataScala at Treasure Data
Scala at Treasure DataTaro L. Saito
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure DataTaro L. Saito
 
Workflow Hacks #1 - dots. Tokyo
Workflow Hacks #1 - dots. TokyoWorkflow Hacks #1 - dots. Tokyo
Workflow Hacks #1 - dots. TokyoTaro L. Saito
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Taro L. Saito
 
Presto As A Service - Treasure DataでのPresto運用事例
Presto As A Service - Treasure DataでのPresto運用事例Presto As A Service - Treasure DataでのPresto運用事例
Presto As A Service - Treasure DataでのPresto運用事例Taro L. Saito
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringTaro L. Saito
 
Treasure Dataを支える技術 - MessagePack編
Treasure Dataを支える技術 - MessagePack編Treasure Dataを支える技術 - MessagePack編
Treasure Dataを支える技術 - MessagePack編Taro L. Saito
 
Weaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
Weaving Dataflows with Silk - ScalaMatsuri 2014, TokyoWeaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
Weaving Dataflows with Silk - ScalaMatsuri 2014, TokyoTaro L. Saito
 
Spark Internals - Hadoop Source Code Reading #16 in Japan
Spark Internals - Hadoop Source Code Reading #16 in JapanSpark Internals - Hadoop Source Code Reading #16 in Japan
Spark Internals - Hadoop Source Code Reading #16 in JapanTaro L. Saito
 
Streaming Distributed Data Processing with Silk #deim2014
Streaming Distributed Data Processing with Silk #deim2014Streaming Distributed Data Processing with Silk #deim2014
Streaming Distributed Data Processing with Silk #deim2014Taro L. Saito
 
Silkによる並列分散ワークフロープログラミング
Silkによる並列分散ワークフロープログラミングSilkによる並列分散ワークフロープログラミング
Silkによる並列分散ワークフロープログラミングTaro L. Saito
 
2011年度 生物データベース論 2日目 木構造データ
2011年度 生物データベース論 2日目 木構造データ2011年度 生物データベース論 2日目 木構造データ
2011年度 生物データベース論 2日目 木構造データTaro L. Saito
 

More from Taro L. Saito (20)

Unifying Frontend and Backend Development with Scala - ScalaCon 2021
Unifying Frontend and Backend Development with Scala - ScalaCon 2021Unifying Frontend and Backend Development with Scala - ScalaCon 2021
Unifying Frontend and Backend Development with Scala - ScalaCon 2021
 
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020
 
Airframe RPC
Airframe RPCAirframe RPC
Airframe RPC
 
Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17
Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17
Airframe: Lightweight Building Blocks for Scala @ TD Tech Talk 2018-10-17
 
Tips For Maintaining OSS Projects
Tips For Maintaining OSS ProjectsTips For Maintaining OSS Projects
Tips For Maintaining OSS Projects
 
Learning Silicon Valley Culture
Learning Silicon Valley CultureLearning Silicon Valley Culture
Learning Silicon Valley Culture
 
Presto At Treasure Data
Presto At Treasure DataPresto At Treasure Data
Presto At Treasure Data
 
Scala at Treasure Data
Scala at Treasure DataScala at Treasure Data
Scala at Treasure Data
 
Introduction to Presto at Treasure Data
Introduction to Presto at Treasure DataIntroduction to Presto at Treasure Data
Introduction to Presto at Treasure Data
 
Workflow Hacks #1 - dots. Tokyo
Workflow Hacks #1 - dots. TokyoWorkflow Hacks #1 - dots. Tokyo
Workflow Hacks #1 - dots. Tokyo
 
Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015Presto @ Treasure Data - Presto Meetup Boston 2015
Presto @ Treasure Data - Presto Meetup Boston 2015
 
Presto As A Service - Treasure DataでのPresto運用事例
Presto As A Service - Treasure DataでのPresto運用事例Presto As A Service - Treasure DataでのPresto運用事例
Presto As A Service - Treasure DataでのPresto運用事例
 
JNuma Library
JNuma LibraryJNuma Library
JNuma Library
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoring
 
Treasure Dataを支える技術 - MessagePack編
Treasure Dataを支える技術 - MessagePack編Treasure Dataを支える技術 - MessagePack編
Treasure Dataを支える技術 - MessagePack編
 
Weaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
Weaving Dataflows with Silk - ScalaMatsuri 2014, TokyoWeaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
Weaving Dataflows with Silk - ScalaMatsuri 2014, Tokyo
 
Spark Internals - Hadoop Source Code Reading #16 in Japan
Spark Internals - Hadoop Source Code Reading #16 in JapanSpark Internals - Hadoop Source Code Reading #16 in Japan
Spark Internals - Hadoop Source Code Reading #16 in Japan
 
Streaming Distributed Data Processing with Silk #deim2014
Streaming Distributed Data Processing with Silk #deim2014Streaming Distributed Data Processing with Silk #deim2014
Streaming Distributed Data Processing with Silk #deim2014
 
Silkによる並列分散ワークフロープログラミング
Silkによる並列分散ワークフロープログラミングSilkによる並列分散ワークフロープログラミング
Silkによる並列分散ワークフロープログラミング
 
2011年度 生物データベース論 2日目 木構造データ
2011年度 生物データベース論 2日目 木構造データ2011年度 生物データベース論 2日目 木構造データ
2011年度 生物データベース論 2日目 木構造データ
 

Recently uploaded

"Journey of Aspiration: Unveiling the Path to Becoming a Technocrat and Entre...
"Journey of Aspiration: Unveiling the Path to Becoming a Technocrat and Entre..."Journey of Aspiration: Unveiling the Path to Becoming a Technocrat and Entre...
"Journey of Aspiration: Unveiling the Path to Becoming a Technocrat and Entre...shaiyuvasv
 
Enhancing SaaS Performance: A Hands-on Workshop for Partners
Enhancing SaaS Performance: A Hands-on Workshop for PartnersEnhancing SaaS Performance: A Hands-on Workshop for Partners
Enhancing SaaS Performance: A Hands-on Workshop for PartnersThousandEyes
 
From eSIMs to iSIMs: It’s Inside the Manufacturing
From eSIMs to iSIMs: It’s Inside the ManufacturingFrom eSIMs to iSIMs: It’s Inside the Manufacturing
From eSIMs to iSIMs: It’s Inside the ManufacturingSoracom Global, Inc.
 
Heltun_HE-RS01_User_Manual_B9AH.pdf
Heltun_HE-RS01_User_Manual_B9AH.pdfHeltun_HE-RS01_User_Manual_B9AH.pdf
Heltun_HE-RS01_User_Manual_B9AH.pdfMarielaL5
 
Bringing nullability into existing code - dammit is not the answer.pptx
Bringing nullability into existing code - dammit is not the answer.pptxBringing nullability into existing code - dammit is not the answer.pptx
Bringing nullability into existing code - dammit is not the answer.pptxMaarten Balliauw
 
2024 February Patch Tuesday
2024 February Patch Tuesday2024 February Patch Tuesday
2024 February Patch TuesdayIvanti
 
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERNRonnelBaroc
 
Quinto Z-Wave Heltun_HE-RS01_User_Manual_B9AH.pdf
Quinto Z-Wave Heltun_HE-RS01_User_Manual_B9AH.pdfQuinto Z-Wave Heltun_HE-RS01_User_Manual_B9AH.pdf
Quinto Z-Wave Heltun_HE-RS01_User_Manual_B9AH.pdfDomotica daVinci
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolProduct School
 
Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...
Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...
Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...Adrian Sanabria
 
Artificial-Intelligence-in-Marketing-Data.pdf
Artificial-Intelligence-in-Marketing-Data.pdfArtificial-Intelligence-in-Marketing-Data.pdf
Artificial-Intelligence-in-Marketing-Data.pdfIsidro Navarro
 
AWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS Chicago
 
Curtain Module Manual Zigbee Neo CS01-1C.pdf
Curtain Module Manual Zigbee Neo CS01-1C.pdfCurtain Module Manual Zigbee Neo CS01-1C.pdf
Curtain Module Manual Zigbee Neo CS01-1C.pdfDomotica daVinci
 
Q1 Memory Fabric Forum: SMART CXL Product Lineup
Q1 Memory Fabric Forum: SMART CXL Product LineupQ1 Memory Fabric Forum: SMART CXL Product Lineup
Q1 Memory Fabric Forum: SMART CXL Product LineupMemory Fabric Forum
 
DNA LIGASE BIOTECHNOLOGY BIOLOGY STUDY OF LIFE
DNA LIGASE BIOTECHNOLOGY BIOLOGY STUDY OF LIFEDNA LIGASE BIOTECHNOLOGY BIOLOGY STUDY OF LIFE
DNA LIGASE BIOTECHNOLOGY BIOLOGY STUDY OF LIFEandreiandasan
 
Introduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVAIntroduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVARobert McDermott
 
Breaking Barriers & Leveraging the Latest Developments in AI Technology
Breaking Barriers & Leveraging the Latest Developments in AI TechnologyBreaking Barriers & Leveraging the Latest Developments in AI Technology
Breaking Barriers & Leveraging the Latest Developments in AI TechnologySafe Software
 
Zi-Stick UBS Dongle ZIgbee from Aeotec manual
Zi-Stick UBS Dongle ZIgbee from  Aeotec manualZi-Stick UBS Dongle ZIgbee from  Aeotec manual
Zi-Stick UBS Dongle ZIgbee from Aeotec manualDomotica daVinci
 
Z-Wave Fan coil Thermostat Heltun_HE-HT01_User_Manual.pdf
Z-Wave Fan coil Thermostat Heltun_HE-HT01_User_Manual.pdfZ-Wave Fan coil Thermostat Heltun_HE-HT01_User_Manual.pdf
Z-Wave Fan coil Thermostat Heltun_HE-HT01_User_Manual.pdfDomotica daVinci
 
Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)
Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)
Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)Memory Fabric Forum
 

Recently uploaded (20)

"Journey of Aspiration: Unveiling the Path to Becoming a Technocrat and Entre...
"Journey of Aspiration: Unveiling the Path to Becoming a Technocrat and Entre..."Journey of Aspiration: Unveiling the Path to Becoming a Technocrat and Entre...
"Journey of Aspiration: Unveiling the Path to Becoming a Technocrat and Entre...
 
Enhancing SaaS Performance: A Hands-on Workshop for Partners
Enhancing SaaS Performance: A Hands-on Workshop for PartnersEnhancing SaaS Performance: A Hands-on Workshop for Partners
Enhancing SaaS Performance: A Hands-on Workshop for Partners
 
From eSIMs to iSIMs: It’s Inside the Manufacturing
From eSIMs to iSIMs: It’s Inside the ManufacturingFrom eSIMs to iSIMs: It’s Inside the Manufacturing
From eSIMs to iSIMs: It’s Inside the Manufacturing
 
Heltun_HE-RS01_User_Manual_B9AH.pdf
Heltun_HE-RS01_User_Manual_B9AH.pdfHeltun_HE-RS01_User_Manual_B9AH.pdf
Heltun_HE-RS01_User_Manual_B9AH.pdf
 
Bringing nullability into existing code - dammit is not the answer.pptx
Bringing nullability into existing code - dammit is not the answer.pptxBringing nullability into existing code - dammit is not the answer.pptx
Bringing nullability into existing code - dammit is not the answer.pptx
 
2024 February Patch Tuesday
2024 February Patch Tuesday2024 February Patch Tuesday
2024 February Patch Tuesday
 
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
21ST CENTURY LITERACY FROM TRADITIONAL TO MODERN
 
Quinto Z-Wave Heltun_HE-RS01_User_Manual_B9AH.pdf
Quinto Z-Wave Heltun_HE-RS01_User_Manual_B9AH.pdfQuinto Z-Wave Heltun_HE-RS01_User_Manual_B9AH.pdf
Quinto Z-Wave Heltun_HE-RS01_User_Manual_B9AH.pdf
 
The Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product SchoolThe Future of Product, by Founder & CEO, Product School
The Future of Product, by Founder & CEO, Product School
 
Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...
Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...
Early Tech Adoption: Foolish or Pragmatic? - 17th ISACA South Florida WOW Con...
 
Artificial-Intelligence-in-Marketing-Data.pdf
Artificial-Intelligence-in-Marketing-Data.pdfArtificial-Intelligence-in-Marketing-Data.pdf
Artificial-Intelligence-in-Marketing-Data.pdf
 
AWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user groupAWS reInvent 2023 recaps from Chicago AWS user group
AWS reInvent 2023 recaps from Chicago AWS user group
 
Curtain Module Manual Zigbee Neo CS01-1C.pdf
Curtain Module Manual Zigbee Neo CS01-1C.pdfCurtain Module Manual Zigbee Neo CS01-1C.pdf
Curtain Module Manual Zigbee Neo CS01-1C.pdf
 
Q1 Memory Fabric Forum: SMART CXL Product Lineup
Q1 Memory Fabric Forum: SMART CXL Product LineupQ1 Memory Fabric Forum: SMART CXL Product Lineup
Q1 Memory Fabric Forum: SMART CXL Product Lineup
 
DNA LIGASE BIOTECHNOLOGY BIOLOGY STUDY OF LIFE
DNA LIGASE BIOTECHNOLOGY BIOLOGY STUDY OF LIFEDNA LIGASE BIOTECHNOLOGY BIOLOGY STUDY OF LIFE
DNA LIGASE BIOTECHNOLOGY BIOLOGY STUDY OF LIFE
 
Introduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVAIntroduction to Multimodal LLMs with LLaVA
Introduction to Multimodal LLMs with LLaVA
 
Breaking Barriers & Leveraging the Latest Developments in AI Technology
Breaking Barriers & Leveraging the Latest Developments in AI TechnologyBreaking Barriers & Leveraging the Latest Developments in AI Technology
Breaking Barriers & Leveraging the Latest Developments in AI Technology
 
Zi-Stick UBS Dongle ZIgbee from Aeotec manual
Zi-Stick UBS Dongle ZIgbee from  Aeotec manualZi-Stick UBS Dongle ZIgbee from  Aeotec manual
Zi-Stick UBS Dongle ZIgbee from Aeotec manual
 
Z-Wave Fan coil Thermostat Heltun_HE-HT01_User_Manual.pdf
Z-Wave Fan coil Thermostat Heltun_HE-HT01_User_Manual.pdfZ-Wave Fan coil Thermostat Heltun_HE-HT01_User_Manual.pdf
Z-Wave Fan coil Thermostat Heltun_HE-HT01_User_Manual.pdf
 
Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)
Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)
Q1 Memory Fabric Forum: Intel Enabling Compute Express Link (CXL)
 

Presto At Arm Treasure Data - 2019 Updates

  • 1. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Kai Sasaki, Taro L. Saito Arm Treasure Data June 11th, 2019 Presto Conference Tokyo 2019 Presto At Arm Treasure Data 2019 Updates 1
  • 2. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. About Me: Kai Sasaki 2 ● Kai Sasaki (@Lewuathe) ● https://www.lewuathe.com/ ● Software Engineer in Arm ● MPP team to maintain Presto cluster and around ecosystems ● OSS Contributor ○ Presto, Hadoop, Spark, TensorFlow
  • 3. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. About Us
  • 5. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Treasure Data = Unified Data Platform 5
  • 6. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. The Architecture of Treasure Data 6 DataLogs Device Data Batch Data PlazmaDB Table Schema Data Collection Cloud Storage Distributed Data Processing 2 million records / sec. 130 trillion records 1 billion rows processed / sec. Jobs Job Management SQL Editor Scheduler Workflows Machine Learning Treasure Data OSS Third Party OSS
  • 7. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. How We Use Presto
  • 8. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Presto Usage (2019) ● 3x more usage since 2017 8 3,500 ~ users (400+ customer accounts) 600,000~ Queries / Day 100 Trillion Rows Processed / Day (= 1.2 billion rows processed / sec.)
  • 9. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. PlazmaDB: MessagePack DBMS ● Fluentd -> MessagePack -> Arm Treasure Data ● Generating table schema from the input MessagePack data ■ No need to worry about changing schema as adding columns or escalating column types are managed by the service ● Apply schema–on-read for providing table data for Presto/Hive/Spark, etc. ● Storage Format: ● Our internal MessagePack Columnar Format (MPC1) for schema-on-read Table Schema Int Column Reader String Column Reader Update Schema Generate Reader Set Table Reader Schema-free Data 9 Data Collection Distributed Data Processing
  • 10. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Treasure Data Storage Architecture ● Real-Time vs Archive Storages ● Provide an access to the recent data in real-time storage ● Store optimized partitions into Archive Storage by using MapReduce jobs (LogMergeJob) 10
  • 11. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. TD_INTERVAL UDF ● Support human-friendly time window support 11
  • 12. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. PlazmaDB Partition Indexes ● Q: How can we get a list of partition files? ● Limitations of the S3 API: ■ LIST operation of S3 files is quite slow ■ No range filtering of S3 files ○ Time range queries are not supported ● PlazmaDB Partition Indexes ● Manages indexes to partition files on S3 ● Implemented on top of PostgreSQL ■ SQL + PL/Python functions ● Use GiST indexes (B-tree) to support time range filtering ■ dataset id, (partition start_time, end_time) ● Support transactional partition insertion + deletion for a single table ■ INSERT INTO, DELETE are atomic operations in TD 12
  • 13. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Extension to Presto ● No major change has been made to Presto master branch ● Fork: https://github.com/treasure-data/presto (almost no diff) ● This is a strategy for catching up with the latest master. ● Adding extension modules in a different internal repository (td-presto) ● td-presto-server ■ Extending presto-server main to inject our own modules ■ Adding a split-resource manager for throttling query resource usage ● td-presto-plugin ■ Metadata Management ○ A bridge to TD API (table metadata API) ○ PlazmaDB: Partition indexes ■ MPC1 file reader ○ S3 I/O request manager for pipelining a lot of S3 GET requests ■ Treasure Data specific UDFs ○ https://support.treasuredata.com/hc/en-us/articles/360001450828-Supported-Presto-and-TD-Functions ● td-presto-stella ■ Partition maintenance module implemented as Presto connectors 13
  • 14. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Stella Plugin ● A Presto plugin for maintaining fragmented partitions ● Too small partitions ● Too large partitions ● Use Presto to merge/split partitions ● Guidelines ■ less than 1M records / partition ■ 250MB / partition ● Using CTAS statement for merging partitions: ■ CREATE TABLE stella (account_id = xxx, database = xxx, table = xxx, max_file_size=xxx, max_time_range=xxx) 14
  • 15. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Ecosystems Around Presto
  • 16. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Prestobase: Presto Gateway (api-presto) 16 ● Prestobase is a proxy gateway to Presto clusters to support standard presto clients (e.g., presto-cli, jdbc, odbc, etc.) ● Written in Scala
  • 17. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. td-spark: Apache Spark Driver for Treasure Data 17 ● td-spark provides a way to use TD table as a datasource of Spark application ● Supporting both read/write mode makes TD extended to further use cases Python $ pip install td-pyspark Or $ docker pull armtd/td-spark-pyspark Scala Add td-spark-assembly.jar in the Spark class path. https://support.treasuredata.com/hc/en-us/articles/360000716627-Apache-Spark-Driver-td-sp ark-Release
  • 18. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Internal Optimization
  • 19. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Data-Driven System Optimization ● TD is one of the biggest users of TD ● Query logs ● Collecting all Presto query logs since 2015 ● Query statements, performance statistics, logs, etc. ● Logs are our valuable assets ● To understand user activities and enable data-driven decisions 19 Logs User Query Collect Query Logs Analyze Query Logs Machine Learning Query Optimization Optimize System
  • 20. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Checking Query Correctness And Performance ● Upgrading Query Engine Versions ● Need to check customer query compatibilities, performance degradation, etc. ● Testing all 500,000 query / day = 15M queries / month is impractical ● Use ML techniques to effectively reduce the problem size ● Simulate all possible customer query patterns to check the compatibility ● Compute checksums of queries, record performance results to TD 20 User QueryUser QueryUser QueryUser QueryUser QueryUser Query 15,000,000 queries clustering Query SigQuery SigQuery SigQuery Sig minimize Small QuerySmall QuerySmall QuerySmall Query 100,000 query patterns 100,000 small queries simulate queries simulation results and stats
  • 21. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Query Metric Analysis ● Resource Usage Prediction ○ Based on the historical metric data ● Further optimization leveraged by prediction result ● Working with Internship Student from UCB 21
  • 22. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Presto Resource Manager ● Collecting the system metric in robust manner is challenging ● Unified metric collector of Presto ○ Cluster metric management ○ Query routing optimization 22
  • 23. Copyright 1995-2018 Arm Limited (or its affiliates). All rights reserved. Challenges
  • 24. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Challenge: Optimizing Query Workload As A Whole ● 2000 query patterns (5000 queries in a day) ● A real example in our production workload ● How can we improve the entire data processing? 24
  • 25. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Detecting Redundant Data Processing ● Redundancy In Queries ○ Same table scans, joins, aggregations, UDF processing, etc. ● Related work: ○ Selecting Subexpressions to Materialize At Datacenter Scale (Microsoft, VLDB 2018) ■ Extract best common sub- expressions from query graphs ■ Linear programming ● Challenges ● Updating sub-expression caches ■ Cache invalidation ● Combine cached results + query results for time series data 25
  • 26. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Challenge: Maximizing Machine Resource Utilization ● Uneven CPU usage due to regional differences ● US (upper) ■ Global customers ● Tokyo (lower) ■ Only Japan customers ● Semi-scheduled auto-scaling ● Using past stats + runtime metrics ● Distributing workloads to off-peak times ● Early data processing ● Optimizing query scheduling 26
  • 27. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Idea: Using Presto As A Backend of Other Query Engines ● Presto is efficient for table scans, filter, aggregations ● Can’t we use the power of Presto for accelerating other query engines? 27 Primary Query EngineSecondary Query Engine
  • 28. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Idea: Presto-Presto Connector ● Launch multiple Presto clusters with different configurations ● Presto 1 ● Caching common sub-expressions (e.g., Materialized Views or in-memory storage) ● Presto 2 ● Delegate sub-expression processing to the upstream Presto cluster ● Challenge: ● Extracting appropriate sub-queries to run at the upstream Presto 28 With Sub-Query CacheReuse Pre-Computed Results
  • 29. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Missing: Binary Protocol ● Current Presto sends query results in JSON format (v1 protocol) ● Need a faster data transfer method ○ Idea: Using MessagePack for binary data representation ○ Support parallel query results transfer 29 v1 JSON (slow) Binary Data Transfer
  • 30. Copyright 1995-2019 Arm Limited (or its affiliates). All rights reserved. Join Arm Treasure Data Team! ● Solving Challenges In The Real World Data Processing ● Building a scalable data processing platform on the cloud ■ Enabled 400+ companies to use Presto for their data processing ● Stream data ingestion systems ■ PlazmaDB indexes, physical storage optimization ● Advanced analytics with Presto, Hive, Spark, and their workflows. ● CDP (Customer Data Platform) ● Building data platform for managing our customers’ customer data ● Supporting non-engineers (e.g., marketers, executives) to manage their own data 30
  • 31. Confidential © Arm 2017Confidential © Arm 2017Confidential © Arm 2017 Thank You! Danke! Merci! 谢谢! ありがとう! Gracias! Kiitos! 31