T R E A S U R E D A T A
Presto At Treasure Data
Presto Meetup @ Tokyo - June 15, 2017
Taro L. Saito - GitHub:@xerial
Ph.D., Software Engineer at Treasure Data, Inc.
1
Presto Usage at Treasure Data (2017)
Processing 15 Trillion Rows / Day (= 173 Million Rows / sec.)
150,000+ Queries / Day
1,500+ Users
Hosting Presto as a service for 3 years
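The per-second figure follows directly from the daily total. A minimal arithmetic check, using only the numbers quoted on this slide:

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Convert a daily total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


rows_per_sec = per_second(15e12)        # 15 trillion rows / day
queries_per_sec = per_second(150_000)   # 150,000 queries / day

print(f"{rows_per_sec / 1e6:.1f} million rows/sec")  # prints "173.6 million rows/sec"
print(f"{queries_per_sec:.2f} queries/sec")          # prints "1.74 queries/sec"
```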
2
Configurations
• Hosted on AWS (us-east), AWS Tokyo, IDCF (Japan)
• Multi-Tenancy Clusters
• PlazmaDB
• Storage: Amazon S3 or RiakCS
• S3 file indexes: PostgreSQL
• Storage format: Columnar Message Pack (MPC)
• MessagePack: Self-describing, typed format.
• Compact: 10x compression ratio from the original input data (JSON)
• 200GB JVM memory per node
• To support a variety of query usage patterns
• Estimating required memory in advance is difficult
• To avoid the WAITING_FOR_MEMORY state, which blocks the entire query processing
• In small-memory configurations, major GCs were quite frequent
3
Challenges
• Major Complaint
• Presto is slower than usual
• Only 20% of 150,000 queries are using our scheduling feature
• However, 85% of queries are actually scheduled by user scripts or third-party tools
• How can we know the expected performance?
• (Implicit) Service Level Objectives (SLOs)
4
Understanding Implicit SLOs
• We usually looked into slow queries to figure out the performance bottlenecks.
• However, analyzing SQL takes a long time
• Because we need to understand the meaning of the data.
• Understanding a hundred lines of SQL is painful
• Created Presto Query Tuning Guides:
• Presto Query FAQs: https://docs.treasuredata.com/articles/presto-query-faq
• Performance Expectations
• Scheduled queries: We can estimate SLOs from historical stats
• Scheduled, but submitted from third-party tools or user scripts
• How do we know the expected performance?
• We need to internalize customers’ knowledge of query performance
5
Our Approach: Data-Driven Improvement
• Bad:
• Collecting stdout/stderr logs of Presto
• Good:
• Collecting logs in a format queryable with Presto
• Collecting Query Event Logs to Treasure Data
• Presto Event Listener -> fluentd -> Treasure Data
• Treasure Data
• schema-less: Schema can be automatically generated from the data
• As we add new fields to the event, the schema evolves automatically
• We have been collecting every single query log since the beginning of the Presto service
[Figure: Query Logs -> Store -> Analyze (SQL) -> Improve & Optimize]
6
Query Event Logs
• Query Completion
• queryId, user id, session parameters, etc.
• Query stats: running time, total rows, bytes, splits, CPU time, etc.
• SQL statement
• Split Completion
• Running time, Processed rows, bytes, etc.
• S3 GET access count, read bytes
• Table Scan
• Accessed tables names, column sets
• Accessed time ranges (e.g., queries looking at data of past 1 hour, 7 days, etc.)
• Filtering conditions (predicate)
7
Clustering Queries with Query Signature
• Finding Implicit SLOs
• Need to classify 85% of scheduled queries
• Extracting Query Signatures
• Simplify complex SQL expressions into a tiny SQL representation
• Reusing ANTLR parser of Presto
• Query Signature Example:
• S[Cnt](J(T1,G(S[Cnt](T2))))
• SELECT count(a),... FROM T1 JOIN (SELECT count(b),... FROM T2 GROUP BY x)
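The signature idea can be sketched on a toy query tree. This is only an illustration: the real implementation reuses Presto's ANTLR parser, whereas the `Table`, `Select`, and `Join` classes and the `signature` function below are hypothetical stand-ins invented for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Union


@dataclass
class Table:
    name: str


@dataclass
class Join:
    left: "Node"
    right: "Node"


@dataclass
class Select:
    source: "Node"
    aggregates: List[str] = field(default_factory=list)  # short tags, e.g. ["Cnt"]
    group_by: bool = False


Node = Union[Table, Join, Select]


def signature(node: Node, tables: Optional[Dict[str, int]] = None) -> str:
    """Reduce a query tree to a compact signature string."""
    if tables is None:
        tables = {}
    if isinstance(node, Table):
        # Replace concrete table names with T1, T2, ... in order of appearance.
        index = tables.setdefault(node.name, len(tables) + 1)
        return f"T{index}"
    if isinstance(node, Join):
        left = signature(node.left, tables)
        right = signature(node.right, tables)
        return f"J({left},{right})"
    # Select: keep only the aggregate tags, drop column names and literals.
    tags = f"[{','.join(node.aggregates)}]" if node.aggregates else ""
    sig = f"S{tags}({signature(node.source, tables)})"
    return f"G({sig})" if node.group_by else sig


# SELECT count(a),... FROM orders JOIN (SELECT count(b),... FROM events GROUP BY x)
query = Select(
    source=Join(
        Table("orders"),
        Select(source=Table("events"), aggregates=["Cnt"], group_by=True),
    ),
    aggregates=["Cnt"],
)
example_signature = signature(query)
print(example_signature)  # prints "S[Cnt](J(T1,G(S[Cnt](T2))))"
```

Queries that differ only in table names, column names, or constants map to the same string, so their running times can be grouped together.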
8
Implicit SLOs
• Collect the historical query running times
• Queries that have the same query signature
• Median absolute deviation (MAD): the median of |running time - median|
• CoV: Coefficient of variation = MAD / median
• If CoV > 1, the query running time tends to vary
• If CoV < 1, median of historical running time is useful for query running time estimation.
• SLO violation:
• If query is running longer than median + MAD
• Customer feels query is slower than usual
• However, query might be processing much more data than usual
• Normalization based on the processing data size is also necessary
9
Typical Performance Bottlenecks
• Huge Queries
• Frequent S3 access, wide table scans
• Single-node operators
• order by, window function, count(distinct x), processing skewed data, etc.
• Ill-performing worker nodes
• Heavy load on a single worker node
• Insufficient pool memory
• Major/full GCs
• We are using min.error-duration = 2m, but GC pause can be longer
• Too much resource usage
• A single query occupies the entire cluster
• e.g., A query with hundreds of query stages!
10
Split Resource Manager
• Problem: A single query can occupy the entire cluster's resources
• But Presto has limited performance control
• Only CPU time, memory usage, and concurrent query (CQ) limits
• No throttling or boosting
• Created Split Resource Manager
• Limiting the max runnable splits for each customer
• Using a custom RemoteTask class, which adds a wait if no splits are available
• => Efficient Use of Multi-Tenancy Cluster
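The per-customer limit can be pictured as a counting gate in front of split scheduling. The sketch below is not the actual RemoteTask customization; `SplitLimiter` and its methods are hypothetical names, and it only shows the "wait if no splits are available" behavior.

```python
import threading
from collections import defaultdict
from typing import Optional


class SplitLimiter:
    """Caps the number of concurrently running splits per customer."""

    def __init__(self, max_splits_per_customer: int) -> None:
        self._max = max_splits_per_customer
        self._running = defaultdict(int)
        self._cond = threading.Condition()

    def acquire(self, customer: str, timeout: Optional[float] = None) -> bool:
        """Wait until the customer has a free split slot; False on timeout."""
        with self._cond:
            ok = self._cond.wait_for(
                lambda: self._running[customer] < self._max, timeout
            )
            if ok:
                self._running[customer] += 1
            return ok

    def release(self, customer: str) -> None:
        """Return a slot and wake up any waiting schedulers."""
        with self._cond:
            self._running[customer] -= 1
            self._cond.notify_all()


limiter = SplitLimiter(max_splits_per_customer=2)
print(limiter.acquire("customer-a"))                # prints "True"
print(limiter.acquire("customer-a"))                # prints "True"
print(limiter.acquire("customer-a", timeout=0.01))  # prints "False" (limit reached)
print(limiter.acquire("customer-b", timeout=0.01))  # prints "True" (separate budget)
```

Because each customer has its own counter, one heavy customer waits on its own budget without blocking splits from other tenants.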
11
Presto Ops Robot
• Problem: Insufficient memory of a worker
• Queries using that worker node enter WAITING_FOR_MEMORY state
• Report JMX metrics -> fluentd -> DataDog -> Trigger Alert -> Presto Ops Robot
• Presto Ops Robot
• Sending graceful shutdown command (POST SHUTTING_DOWN message to /v1/status)
• or kill memory consuming queries in the worker node
• Restarting worker JVM process
• At least once a week, to avoid any issues when running JVM for a long time
• Resetting any effect caused by unknown bugs
• Useful for cleaning up untracked memory (e.g., ANTLR objects, etc.)
12
S3 Access Performance
• Problem: Slow Table Scan
• S3 GET request has constant latency
• 30ms ~ 50ms latency regardless of the read size (up to 8KB read)
• Request retry on 500 (Internal Error) or 503 (Slow Down) is also necessary
• Reading small header part of S3 objects can be the majority of query processing time
• Columnar format: header + column blocks
• IO Manager:
• Need to send as many S3 GET requests as possible
• 1 split = multiple S3 objects
• Pipelining S3 GET requests and column reads
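With a roughly constant per-request latency, issuing many GET requests concurrently hides most of the wait. The sketch below uses a fake fetch function instead of a real S3 client; `fake_fetch`, `get_with_retry`, and `read_headers` are hypothetical names, and the 40 ms sleep only simulates the latency mentioned above.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

RETRYABLE = {500, 503}  # status codes worth retrying


def get_with_retry(fetch, key, max_attempts=3, backoff_sec=0.01):
    """Call fetch(key) -> (status, body), retrying retryable errors with backoff."""
    for attempt in range(max_attempts):
        status, body = fetch(key)
        if status not in RETRYABLE:
            break
        time.sleep(backoff_sec * (2 ** attempt))
    return status, body


def read_headers(fetch, keys, parallelism=16):
    """Issue many GET requests concurrently; results keep the order of keys."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(lambda k: get_with_retry(fetch, k), keys))


_seen = set()
_lock = threading.Lock()


def fake_fetch(key):
    """Stand-in for an S3 GET: ~40 ms latency, one simulated 503 for 'part-3'."""
    time.sleep(0.04)
    with _lock:
        first_call = key not in _seen
        _seen.add(key)
    if key == "part-3" and first_call:
        return 503, b""
    return 200, f"header-of-{key}".encode()


keys = [f"part-{i}" for i in range(8)]
results = read_headers(fake_fetch, keys)
print([status for status, _ in results])
```

Run sequentially, eight requests would take at least 8 x 40 ms; with the thread pool the total stays close to the latency of a single request plus the one retry.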
13
Presto Stella: Plazma Storage Optimizer
• Problem:
• Some queries read 1 million partitions, so the S3 latency overhead is quite high
• Data from mobile applications often has a wide range of time values.
• Presto Stella Connector
• Using Presto for optimizing physical storage partitions
• Input records: File list on S3
• Table writer stage: Merges fragmented partitions, and uploads them to S3
• Commit: Update S3 file indexes on PostgreSQL (in an atomic transaction)
• Performance Improvement
• e.g. 10,000 partitions (30 sec.) -> 20 partitions (1.5 sec.)
• 20x performance improvement
• Use Cases
• Maintain fragmented user-defined partitions
• 1-hour partitioning -> more flexible time range partitioning
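The merge step can be illustrated with a greedy pass over partitions sorted by time. This is a simplified sketch, not the Stella connector itself: the `Partition` class, `merge_partitions`, and the size units are assumptions made for this example, and the S3 upload and PostgreSQL index commit are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Partition:
    start: int  # inclusive unix time
    end: int    # exclusive unix time
    size: int   # size in arbitrary units


def merge_partitions(parts: List[Partition], target_size: int) -> List[Partition]:
    """Greedily merge time-adjacent partitions until each reaches target_size."""
    merged: List[Partition] = []
    for p in sorted(parts, key=lambda part: part.start):
        last = merged[-1] if merged else None
        if last is not None and last.size + p.size <= target_size:
            # Extend the previous merged partition to cover this fragment.
            merged[-1] = Partition(last.start, max(last.end, p.end), last.size + p.size)
        else:
            merged.append(p)
    return merged


# 10,000 one-minute fragments of size 1, merged with a target size of 500.
fragments = [Partition(i * 60, (i + 1) * 60, 1) for i in range(10_000)]
compacted = merge_partitions(fragments, target_size=500)
print(len(fragments), "->", len(compacted))  # prints "10000 -> 20"
```

Fewer, larger partitions mean fewer S3 GET requests per scan, which is where the latency saving comes from.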
14
Transitions of Database Usages
15
New Directions Explored By Presto
• Traditional Database Usage
• Required Database Administrator (DBA)
• DBA designs the schema and queries
• DBA tunes query performance
• After Presto
• Schema is designed by data providers
• 1st-party data (user’s customer data)
• 3rd party data sources
• Analysts or Marketers explore the data with Presto
• Don’t know the schema in advance
• Convenient and low-latency access is necessary
• SQL can be inefficient at first
• SQL can become more sophisticated while exploring data, but not always
16
Prestobase Proxy: Low-Latency Access to Presto
• Needed more interactive experiences of Presto
• Prestobase Proxy: Gateway to Presto Coordinator
• Talks Presto Protocol (/v1/statement/…)
• Written in Scala.
• Runs on Docker
• Based on Finagle (RPC/HTTP server framework developed by Twitter)
• Features
• Can work with standard presto clients (e.g., presto-cli, presto-jdbc, presto-odbc, etc.)
• Increased connectivity to BI tools: Tableau, Datorama, ChartIO, Looker, etc.
• Authentication (API key)
• Rewriting nextUri (internal IP address -> external host name)
• BI-tool specific query filters
• etc.
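The nextUri rewriting can be sketched as a small transformation on the JSON response of /v1/statement. The host names, the query id, and the `rewrite_next_uri` function below are made up for illustration; the actual proxy is written in Scala on Finagle.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_next_uri(response: dict, external_host: str, scheme: str = "https") -> dict:
    """Replace the internal coordinator address in nextUri with the public host."""
    next_uri = response.get("nextUri")
    if next_uri is None:
        return response  # final page of results: nothing to rewrite
    parts = urlsplit(next_uri)
    rewritten = urlunsplit(
        (scheme, external_host, parts.path, parts.query, parts.fragment)
    )
    return {**response, "nextUri": rewritten}


# Hypothetical response from an internal coordinator.
internal = {
    "id": "20170615_000000_00001_abcde",
    "nextUri": "http://10.0.1.23:8080/v1/statement/20170615_000000_00001_abcde/1",
}
external = rewrite_next_uri(internal, "presto.example.com")
print(external["nextUri"])
# prints "https://presto.example.com/v1/statement/20170615_000000_00001_abcde/1"
```

Without this step, a client outside the private network would try to follow a URL that points at an unreachable internal IP address.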
17
Customizing Prestobase Filters
• Prestobase Proxy: Gateway to access Presto
• Adding TD specific binding
• Finagle filters -> Injecting TD Specific filters
• Using Airframe, a dependency injection library for Scala
18
Airframe
• http://wvlet.org/airframe
• Three-step DI in Scala
• Bind
• Design
• Build
• Built-in life cycle manager
• Session start/shutdown
• examples:
• Open/close Presto connection
• Shutting down Presto server
• etc.
• Session
• Manage singletons and binding rules
19
VCR Record/Replay for Testing Presto
• Launching Presto requires a lot of memory (e.g., 2GB or more)
• Often crashes CI service containers (TravisCI, CircleCI, etc.)
• Recording Presto responses (prestobase-vcr)
• with sqlite-jdbc: https://github.com/xerial/sqlite-jdbc
• DB file for each test suite
• Enabled small-memory footprint testing
• Can run many Presto tests in CI
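The record/replay idea can be shown with Python's built-in sqlite3 module. This is a rough sketch of the technique, not prestobase-vcr itself: `QueryVcr`, the table layout, and `fake_presto` are assumptions made for this example.

```python
import json
import sqlite3


class QueryVcr:
    """Records query responses in SQLite and replays them on later runs."""

    def __init__(self, db_path: str, run_query) -> None:
        self._db = sqlite3.connect(db_path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS recordings "
            "(statement TEXT PRIMARY KEY, response TEXT)"
        )
        self._run_query = run_query

    def query(self, statement: str) -> dict:
        row = self._db.execute(
            "SELECT response FROM recordings WHERE statement = ?", (statement,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # replay: no Presto server needed
        response = self._run_query(statement)  # record: hit the real backend once
        self._db.execute(
            "INSERT INTO recordings (statement, response) VALUES (?, ?)",
            (statement, json.dumps(response)),
        )
        self._db.commit()
        return response


calls = []


def fake_presto(statement: str) -> dict:
    """Stand-in for a real Presto call; counts how often it is invoked."""
    calls.append(statement)
    return {"columns": ["_col0"], "data": [[1]]}


vcr = QueryVcr(":memory:", fake_presto)
first = vcr.query("SELECT 1")
second = vcr.query("SELECT 1")
print(first == second, len(calls))  # prints "True 1"
```

In a CI setup the database would be a file per test suite, recorded once against a real cluster and committed alongside the tests.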
20
Optimizing QueryResults Transfer in Prestobase
• Accept: application/x-msgpack
• HTTP header
• Returning Presto query result rows in MessagePack format
• QueryResults object
• Contains Array<Array<Object>> => MessagePack (compact binary)
• Encoding QueryResults objects using MessagePack/Jackson
• https://github.com/msgpack/msgpack-java
• Presto client doesn’t need to parse the row part
• 1.5x ~ 2.0x performance improvement for streaming query results
21
Prestobase Modules
• prestobase-proxy
• Proxy server to access Presto with authentication
• prestobase-agent
• Agent for running Presto queries and storing their results
• prestobase-vcr
• For recording/replaying Presto responses
• prestobase-codec
• MessagePack codec of Presto query responses
• prestobase-hq (headquarter)
• Presto usage analysis pipelines, SLO monitoring, etc.
• prestobase-conductor
• Multi Presto cluster management tool
• td-prestobase
• Treasure Data specific bindings of prestobase
• TD Authentication, job logging/monitoring
• BI tool specific filters (Tableau, Looker, etc.)
22
Bridging Gaps Between SQL and Programming Language
• Traditional Approach
• O/R mapper: app developers design objects and schemas, then generate SQL
• New Approach: SQL First
• Need to manage various SQL results inside Programming Language
• prestobase-hq
• Need to manage hundreds of SQLs and their results
• SLO analysis, query performance analysis, etc.
• But How?
23
sbt-sql: https://github.com/xerial/sbt-sql
• Scala SBT plugin for generating model classes from SQL files
• src/main/sql/presto/*.sql (Presto Queries)
• Using SQL as a function
• Read Presto SQL Results as Objects
• Enabled managing SQL queries in GitHub
• Type-safe data analysis in prestobase-hq
24
Big Challenge: Splitting Huge Queries
• Table Scan Log Analysis
• Revealed that most customers are scanning the same data over and over
• Optimizing SQL is not the major concern.
• Analyzing data has higher priority
• Splitting a huge query into scheduled hourly/daily jobs
• digdag: Open-source workflow engine
• http://digdag.io
• YAML-based task definition
• Scheduling, run Presto queries
• Easy to use
25
Time Range Primitives
• TD_TIME_RANGE(time, '2017-06-15', '2017-06-16', 'PDT')
• Most frequently used UDF, but inconvenient
• Use short description of relative time ranges
• 1d (1 day)
• 7d (7 days)
• 1h (1 hour)
• 1w (1 week)
• 1M (1 month)
• today, yesterday, lastWeek, thisWeek, etc.
• Recent data access
• 1dU (1 day until now) => TD_TIME_RANGE(time, '2017-06-15', null, 'JST') open range
• Splitting ranges
• 1w.splitIntoDays
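One possible reading of these primitives, sketched in Python. The exact semantics are assumptions: here "Nd" means the N most recent complete days, the "U" suffix means "from the start of the current unit, open-ended until now" (matching the 1dU example above), and months are omitted because their length varies. All function names are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

UNITS = {"h": timedelta(hours=1), "d": timedelta(days=1), "w": timedelta(weeks=1)}


def truncate(t: datetime, unit: str) -> datetime:
    """Truncate to the hour for 'h', otherwise to the start of the day."""
    if unit == "h":
        return t.replace(minute=0, second=0, microsecond=0)
    return t.replace(hour=0, minute=0, second=0, microsecond=0)


def parse_range(spec: str, now: datetime):
    """Parse specs such as '1h', '7d', '1w', '1dU' into (start, end); end=None is open."""
    m = re.fullmatch(r"(\d+)([hdw])(U?)", spec)
    if m is None:
        raise ValueError(f"unsupported time range: {spec}")
    n, unit, until_now = int(m.group(1)), m.group(2), m.group(3) == "U"
    boundary = truncate(now, unit)
    if until_now:
        return boundary - (n - 1) * UNITS[unit], None
    return boundary - n * UNITS[unit], boundary


def split_into_days(start: datetime, end: datetime):
    """Split [start, end) into consecutive ranges of at most one day."""
    ranges, t = [], start
    while t < end:
        nxt = min(t + timedelta(days=1), end)
        ranges.append((t, nxt))
        t = nxt
    return ranges


def to_td_time_range(start: datetime, end, tz: str = "UTC") -> str:
    """Render a range in the TD_TIME_RANGE form shown on this slide."""
    fmt = "%Y-%m-%d %H:%M:%S"
    end_sql = "null" if end is None else f"'{end.strftime(fmt)}'"
    return f"TD_TIME_RANGE(time, '{start.strftime(fmt)}', {end_sql}, '{tz}')"


now = datetime(2017, 6, 15, 10, 30, tzinfo=timezone.utc)
print(to_td_time_range(*parse_range("1dU", now)))
# prints "TD_TIME_RANGE(time, '2017-06-15 00:00:00', null, 'UTC')"
days = split_into_days(*parse_range("1w", now))
print(len(days))  # prints "7"
```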
26
MessageFrame (In Design)
• Next-generation Tabular Data Format
• Hybrid layout:
• row-oriented: for streaming. Quick write
• column-oriented: better compression & fast read
• Specification Layers
• Layer-0 (basic specs: Keep it simple, stupid)
• Data type: MessagePack
• Compression codec: raw, delta, gzip, (snappy, zstd? etc.)
• Column metadata: min/max/sum values of columns
• Layer-1 (advanced compression)
• Layer-N should be convertible to Layer-0
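Since MessageFrame is still in design, the following is only a guess at what two of the Layer-0 pieces could look like: a delta codec that round-trips back to raw values, and per-column min/max/sum metadata. None of these function names come from an actual specification.

```python
from typing import Dict, List


def delta_encode(values: List[int]) -> List[int]:
    """Store the first value as-is, then the difference from the previous value."""
    deltas, prev = [], 0
    for v in values:
        deltas.append(v - prev)
        prev = v
    return deltas


def delta_decode(deltas: List[int]) -> List[int]:
    """Invert delta_encode by accumulating the differences."""
    values, acc = [], 0
    for d in deltas:
        acc += d
        values.append(acc)
    return values


def column_metadata(values: List[int]) -> Dict[str, int]:
    """Summary statistics that allow skipping or pre-aggregating a column block."""
    return {"min": min(values), "max": max(values), "sum": sum(values)}


# Hypothetical timestamp column: nearly sorted values compress well as deltas.
timestamps = [1497484800, 1497484801, 1497484803, 1497484810]
encoded = delta_encode(timestamps)
print(encoded)                              # prints "[1497484800, 1, 2, 7]"
print(delta_decode(encoded) == timestamps)  # prints "True"
print(column_metadata(timestamps)["max"])   # prints "1497484810"
```

The round trip is the point of the example: an encoded column can always be converted back to its raw form.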
27
Summary
• Managing Implicit SLOs
• Data-oriented approach: Presto -> Fluentd -> Treasure Data -> Presto
• SQL clustering -> Find a bottleneck -> Optimize it!
• Optimization approaches
• Split usage control, Presto Ops Robot, Stella partition optimizer
• Low-latency access by Prestobase
• Workflow
• On-going Work
• Physical storage optimization (Stella)
• Huge query optimization
• Incremental Processing Support
• DigDag workflow
• MessageFrame
28
https://www.treasuredata.com/company/careers/
29

More Related Content

What's hot

Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm Chandler Huang
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Jim Mlodgenski
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
Tathastu.ai
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringTaro L. Saito
 
SQL on everything, in memory
SQL on everything, in memorySQL on everything, in memory
SQL on everything, in memory
Julian Hyde
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
SANG WON PARK
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
Mike Dirolf
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
Databricks
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
Ryan Blue
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
Bill Liu
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Databricks
 
DataFusion-and-Arrow_Supercharge-Your-Data-Analytical-Tool-with-a-Rusty-Query...
DataFusion-and-Arrow_Supercharge-Your-Data-Analytical-Tool-with-a-Rusty-Query...DataFusion-and-Arrow_Supercharge-Your-Data-Analytical-Tool-with-a-Rusty-Query...
DataFusion-and-Arrow_Supercharge-Your-Data-Analytical-Tool-with-a-Rusty-Query...
aiuy
 
Apache kafka performance(throughput) - without data loss and guaranteeing dat...
Apache kafka performance(throughput) - without data loss and guaranteeing dat...Apache kafka performance(throughput) - without data loss and guaranteeing dat...
Apache kafka performance(throughput) - without data loss and guaranteeing dat...
SANG WON PARK
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
Altinity Ltd
 
Map reduce vs spark
Map reduce vs sparkMap reduce vs spark
Map reduce vs spark
Tudor Lapusan
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic Introduction
Mayur Rathod
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
Databricks
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to RedisDvir Volk
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
NAVER D2
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
StampedeCon
 

What's hot (20)

Introduction to Storm
Introduction to Storm Introduction to Storm
Introduction to Storm
 
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQLTop 10 Mistakes When Migrating From Oracle to PostgreSQL
Top 10 Mistakes When Migrating From Oracle to PostgreSQL
 
Building robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and DebeziumBuilding robust CDC pipeline with Apache Hudi and Debezium
Building robust CDC pipeline with Apache Hudi and Debezium
 
Presto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoringPresto as a Service - Tips for operation and monitoring
Presto as a Service - Tips for operation and monitoring
 
SQL on everything, in memory
SQL on everything, in memorySQL on everything, in memory
SQL on everything, in memory
 
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
Apache kafka 모니터링을 위한 Metrics 이해 및 최적화 방안
 
Introduction to MongoDB
Introduction to MongoDBIntroduction to MongoDB
Introduction to MongoDB
 
Understanding and Improving Code Generation
Understanding and Improving Code GenerationUnderstanding and Improving Code Generation
Understanding and Improving Code Generation
 
Parquet performance tuning: the missing guide
Parquet performance tuning: the missing guideParquet performance tuning: the missing guide
Parquet performance tuning: the missing guide
 
Building large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudiBuilding large scale transactional data lake using apache hudi
Building large scale transactional data lake using apache hudi
 
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F... Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
Scalable Monitoring Using Prometheus with Apache Spark Clusters with Diane F...
 
DataFusion-and-Arrow_Supercharge-Your-Data-Analytical-Tool-with-a-Rusty-Query...
DataFusion-and-Arrow_Supercharge-Your-Data-Analytical-Tool-with-a-Rusty-Query...DataFusion-and-Arrow_Supercharge-Your-Data-Analytical-Tool-with-a-Rusty-Query...
DataFusion-and-Arrow_Supercharge-Your-Data-Analytical-Tool-with-a-Rusty-Query...
 
Apache kafka performance(throughput) - without data loss and guaranteeing dat...
Apache kafka performance(throughput) - without data loss and guaranteeing dat...Apache kafka performance(throughput) - without data loss and guaranteeing dat...
Apache kafka performance(throughput) - without data loss and guaranteeing dat...
 
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander ZaitsevClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
ClickHouse in Real Life. Case Studies and Best Practices, by Alexander Zaitsev
 
Map reduce vs spark
Map reduce vs sparkMap reduce vs spark
Map reduce vs spark
 
ElasticSearch Basic Introduction
ElasticSearch Basic IntroductionElasticSearch Basic Introduction
ElasticSearch Basic Introduction
 
The Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization OpportunitiesThe Parquet Format and Performance Optimization Opportunities
The Parquet Format and Performance Optimization Opportunities
 
Introduction to Redis
Introduction to RedisIntroduction to Redis
Introduction to Redis
 
[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기[245] presto 내부구조 파헤치기
[245] presto 내부구조 파헤치기
 
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
Choosing an HDFS data storage format- Avro vs. Parquet and more - StampedeCon...
 

Similar to Presto At Treasure Data

Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
Treasure Data, Inc.
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @Facebook
Treasure Data, Inc.
 
SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1sqlserver.co.il
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data Service
SATOSHI TAGOMORI
 
SQL Server Wait Types Everyone Should Know
SQL Server Wait Types Everyone Should KnowSQL Server Wait Types Everyone Should Know
SQL Server Wait Types Everyone Should Know
Dean Richards
 
(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance
BIOVIA
 
OSDC 2015: Tudor Golubenco | Application Performance Management with Packetbe...
OSDC 2015: Tudor Golubenco | Application Performance Management with Packetbe...OSDC 2015: Tudor Golubenco | Application Performance Management with Packetbe...
OSDC 2015: Tudor Golubenco | Application Performance Management with Packetbe...
NETWAYS
 
QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...
javier ramirez
 
(ATS4-PLAT08) Server Pool Management
(ATS4-PLAT08) Server Pool Management(ATS4-PLAT08) Server Pool Management
(ATS4-PLAT08) Server Pool Management
BIOVIA
 
Monitoring MongoDB’s Engines in the Wild
Monitoring MongoDB’s Engines in the WildMonitoring MongoDB’s Engines in the Wild
Monitoring MongoDB’s Engines in the WildTim Vaillancourt
 
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive MetastoreOracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
DataWorks Summit
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACKristofferson A
 
Optimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud StorageOptimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud Storage
Kai Sasaki
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
javier ramirez
 
Fastest Servlets in the West
Fastest Servlets in the WestFastest Servlets in the West
Fastest Servlets in the West
Stuart (Pid) Williams
 
Boost the Performance of SharePoint Today!
Boost the Performance of SharePoint Today!Boost the Performance of SharePoint Today!
Boost the Performance of SharePoint Today!
Brian Culver
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
MongoDB
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data Analytics
Keeyong Han
 
computer networking
computer networkingcomputer networking
computer networking
seyvan rahimi
 
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte DataProblems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Jignesh Shah
 

Similar to Presto At Treasure Data (20)

Internals of Presto Service
Internals of Presto ServiceInternals of Presto Service
Internals of Presto Service
 
Presto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @FacebookPresto meetup 2015-03-19 @Facebook
Presto meetup 2015-03-19 @Facebook
 
SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1SQL Explore 2012: P&T Part 1
SQL Explore 2012: P&T Part 1
 
Overview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data ServiceOverview of data analytics service: Treasure Data Service
Overview of data analytics service: Treasure Data Service
 
SQL Server Wait Types Everyone Should Know
SQL Server Wait Types Everyone Should KnowSQL Server Wait Types Everyone Should Know
SQL Server Wait Types Everyone Should Know
 
(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance(ATS6-PLAT06) Maximizing AEP Performance
(ATS6-PLAT06) Maximizing AEP Performance
 
OSDC 2015: Tudor Golubenco | Application Performance Management with Packetbe...
OSDC 2015: Tudor Golubenco | Application Performance Management with Packetbe...OSDC 2015: Tudor Golubenco | Application Performance Management with Packetbe...
OSDC 2015: Tudor Golubenco | Application Performance Management with Packetbe...
 
QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...QuestDB: ingesting a million time series per second on a single instance. Big...
QuestDB: ingesting a million time series per second on a single instance. Big...
 
(ATS4-PLAT08) Server Pool Management
(ATS4-PLAT08) Server Pool Management(ATS4-PLAT08) Server Pool Management
(ATS4-PLAT08) Server Pool Management
 
Monitoring MongoDB’s Engines in the Wild
Monitoring MongoDB’s Engines in the WildMonitoring MongoDB’s Engines in the Wild
Monitoring MongoDB’s Engines in the Wild
 
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive MetastoreOracleStore: A Highly Performant RawStore Implementation for Hive Metastore
OracleStore: A Highly Performant RawStore Implementation for Hive Metastore
 
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RACPerformance Scenario: Diagnosing and resolving sudden slow down on two node RAC
Performance Scenario: Diagnosing and resolving sudden slow down on two node RAC
 
Optimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud StorageOptimizing Presto Connector on Cloud Storage
Optimizing Presto Connector on Cloud Storage
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
Fastest Servlets in the West
Fastest Servlets in the WestFastest Servlets in the West
Fastest Servlets in the West
 
Boost the Performance of SharePoint Today!
Boost the Performance of SharePoint Today!Boost the Performance of SharePoint Today!
Boost the Performance of SharePoint Today!
 
Hardware Provisioning
Hardware ProvisioningHardware Provisioning
Hardware Provisioning
 
AWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data AnalyticsAWS Redshift Introduction - Big Data Analytics
AWS Redshift Introduction - Big Data Analytics
 
computer networking
computer networkingcomputer networking
computer networking
 
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte DataProblems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
Problems with PostgreSQL on Multi-core Systems with MultiTerabyte Data
 

More from Taro L. Saito

Unifying Frontend and Backend Development with Scala - ScalaCon 2021
Unifying Frontend and Backend Development with Scala - ScalaCon 2021Unifying Frontend and Backend Development with Scala - ScalaCon 2021
Unifying Frontend and Backend Development with Scala - ScalaCon 2021
Taro L. Saito
 
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Journey of Migrating 1 Million Presto Queries - Presto Webinar 2020
Taro L. Saito
 
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020
Scala for Everything: From Frontend to Backend Applications - Scala Matsuri 2020
Taro L. Saito
 
Airframe RPC
Airframe RPCAirframe RPC
Airframe RPC
Taro L. Saito
 
td-spark internals: Extending Spark with Airframe - Spark Meetup Tokyo #3 2020
Presto At Treasure Data

  • 1. T R E A S U R E D A T A Presto At Treasure Data Presto Meetup @ Tokyo - June 15, 2017 Taro L. Saito - GitHub:@xerial Ph.D., Software Engineer at Treasure Data, Inc. 1
  • 2. Presto Usage at Treasure Data (2017) • Processing 15 Trillion Rows / Day (= 173 Million Rows / sec.) • 150,000~ Queries / Day • 1,500~ Users • Hosting Presto as a service for 3 years 2
  • 3. Configurations • Hosted on AWS (us-east), AWS Tokyo, IDCF (Japan) • Multi-Tenancy Clusters • PlazmaDB • Storage: Amazon S3 or RiakCS • S3 file indexes: PostgreSQL • Storage format: Columnar Message Pack (MPC) • MessagePack: a self-describing, typed format • Compact: 10x compression ratio from the original input data (JSON) • 200GB JVM memory per node • To support a wide variety of query workloads • Estimating required memory in advance is difficult • To avoid the WAITING_FOR_MEMORY state, which blocks the entire query processing • In small-memory configurations, major GCs were quite frequent 3
  • 4. Challenges • Major Complaint • Presto is slower than usual • Only 20% of 150,000 queries are using our scheduling feature • However, 85% of queries are actually scheduled by user scripts or third-party tools • How can we know the expected performance? • (Implicit) Service Level Objectives (SLOs) 4
  • 5. Understanding Implicit SLOs • We usually looked into slow queries to figure out the performance bottlenecks. • However, analyzing SQL takes a long time • Because we need to understand the meaning of the data. • Understanding a hundred lines of SQL is painful • Created Presto Query Tuning Guides: • Presto Query FAQs: https://docs.treasuredata.com/articles/presto-query-faq • Performance Expectations • Scheduled queries: We can estimate SLOs from historical stats • Scheduled, but submitted from third-party tools or user scripts • How do we know the expected performance? • We need to internalize customers' knowledge of query performance 5
  • 6. Our Approach: Data-Driven Improvement • Bad: • Collecting stdout/stderr logs of Presto • Good: • Collecting logs in a queryable format with Presto • Collecting Query Event Logs to Treasure Data • Presto Event Listener -> fluentd -> Treasure Data • Treasure Data • schema-less: Schema can be automatically generated from the data • As we add new fields to the event, the schema evolves automatically • We have been collecting every single query log since the beginning of the Presto service • (Diagram labels: Query Logs, Store, Analyze SQL, Improve & Optimize) 6
  • 7. Query Event Logs • Query Completion • queryId, user id, session parameters, etc. • Query stats: running time, total rows, bytes, splits, CPU time, etc. • SQL statement • Split Completion • Running time, Processed rows, bytes, etc. • S3 GET access count, read bytes • Table Scan • Accessed tables names, column sets • Accessed time ranges (e.g., queries looking at data of past 1 hour, 7 days, etc.) • Filtering conditions (predicate) 7
  • 8. Clustering Queries with Query Signature • Finding Implicit SLOs • Need to classify 85% of scheduled queries • Extracting Query Signatures • Simplify complex SQL expressions into a tiny SQL representation • Reusing ANTLR parser of Presto • Query Signature Example: • S[Cnt](J(T1,G(S[Cnt](T2)))) • SELECT count(a),... FROM T1 JOIN (SELECT count(b),... FROM T2 GROUP BY x) 8
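The signature idea on slide 8 can be illustrated with a minimal sketch. The node classes below (`Table`, `Select`, `GroupBy`, `Join`) and the `signature` function are hypothetical stand-ins for a parsed query tree; the slides only say that Presto's ANTLR parser is reused, so this is not the actual implementation. Table names are replaced by T1, T2, ... in order of first appearance, and column names and literals are dropped.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical, simplified query-tree nodes (not Presto's AST classes).
@dataclass
class Table:
    name: str


@dataclass
class Select:
    source: object
    aggregates: List[str]  # e.g. ["Cnt"] for count(...)


@dataclass
class GroupBy:
    source: object


@dataclass
class Join:
    left: object
    right: object


def signature(node, names=None):
    """Render a compact signature string for a query tree."""
    if names is None:
        names = {}
    if isinstance(node, Table):
        # Anonymize table names as T1, T2, ... by first appearance.
        if node.name not in names:
            names[node.name] = f"T{len(names) + 1}"
        return names[node.name]
    if isinstance(node, Select):
        tag = f"S[{','.join(node.aggregates)}]" if node.aggregates else "S"
        return f"{tag}({signature(node.source, names)})"
    if isinstance(node, GroupBy):
        return f"G({signature(node.source, names)})"
    if isinstance(node, Join):
        left = signature(node.left, names)
        right = signature(node.right, names)
        return f"J({left},{right})"
    raise TypeError(f"unsupported node: {node!r}")


# SELECT count(a),... FROM orders JOIN (SELECT count(b),... FROM items GROUP BY x)
query = Select(
    Join(Table("orders"), GroupBy(Select(Table("items"), ["Cnt"]))),
    ["Cnt"],
)
print(signature(query))
```

Queries that differ only in table names, columns, or constants map to the same string, which is what makes the signature usable as a clustering key.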
  • 9. Implicit SLOs • Collect the historical query running times • Queries that have the same query signature • Median absolute deviation (MAD): the median of |running time - median| • CoV: Coefficient of variation = MAD / median • If CoV > 1, the query running time tends to vary • If CoV < 1, the median of historical running times is useful for estimating the query running time • SLO violation: • If a query is running longer than median + MAD • Customer feels the query is slower than usual • However, the query might be processing much more data than usual • Normalization based on the processed data size is also necessary 9
  • 10. Typical Performance Bottlenecks • Huge Queries • Frequent S3 access, wide table scans • Single-node operators • order by, window function, count(distinct x), processing skewed data, etc. • Ill-performing worker nodes • Heavy load on a single worker node • Insufficient pool memory • Major/full GCs • We are using min.error-duration = 2m, but GC pause can be longer • Too much resource usage • A single query occupies the entire cluster • e.g., A query with hundreds of query stages! 10
  • 11. Split Resource Manager • Problem: A single query can occupy the entire cluster resource • But Presto has limited performance control • Only for CPU time, memory usage, and concurrent queries (CQ) limits • No throttling nor boosting • Created Split Resource Manager • Limiting the max runnable splits for each customer • Using a custom RemoteTask class, which adds a wait if no splits are available • => Efficient Use of Multi-Tenancy Cluster 11
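The per-customer limit on runnable splits from slide 11 can be sketched as a small counting gate. `SplitLimiter` and its methods are hypothetical names for illustration only; the real Split Resource Manager lives inside a custom Presto RemoteTask class whose details the slides do not show.

```python
import threading


class SplitLimiter:
    """Caps the number of concurrently running splits per customer."""

    def __init__(self, max_running_per_customer):
        self.max_running = max_running_per_customer
        self.running = {}  # customer -> number of running splits
        self.cond = threading.Condition()

    def acquire(self, customer, timeout=None):
        """Wait until the customer is under its cap, then take a slot.

        Returns False if no slot became free within `timeout` seconds.
        """
        with self.cond:
            has_slot = self.cond.wait_for(
                lambda: self.running.get(customer, 0) < self.max_running,
                timeout,
            )
            if not has_slot:
                return False
            self.running[customer] = self.running.get(customer, 0) + 1
            return True

    def release(self, customer):
        """Give a slot back and wake up waiting tasks."""
        with self.cond:
            if self.running.get(customer, 0) > 0:
                self.running[customer] -= 1
            self.cond.notify_all()


limiter = SplitLimiter(max_running_per_customer=2)
limiter.acquire("customer-a")
limiter.acquire("customer-a")
print(limiter.acquire("customer-a", timeout=0))  # over the cap
print(limiter.acquire("customer-b", timeout=0))  # other customers are unaffected
```

A customer that hits its cap waits instead of failing, so one huge query slows down without starving the other tenants of the cluster.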
  • 12. Presto Ops Robot • Problem: Insufficient memory of a worker • Queries using that worker node enter WAITING_FOR_MEMORY state • Report JMX metrics -> fluentd -> DataDog -> Trigger Alert -> Presto Ops Robot • Presto Ops Robot • Sending graceful shutdown command (POST SHUTTING_DOWN message to /v1/status) • or killing memory-consuming queries on the worker node • Restarting worker JVM process • At least once a week, to avoid any issues caused by running the JVM for a long time • Resetting any effect caused by unknown bugs • Useful for cleaning up untracked memory (e.g., ANTLR objects, etc.) 12
  • 13. S3 Access Performance • Problem: Slow Table Scan • S3 GET request has constant latency • 30ms ~ 50ms latency regardless of the read size (up to 8KB read) • Request retry on 500 (unavailable) or 503 (Slowdown) is also necessary • Reading small header part of S3 objects can be the majority of query processing time • Columnar format: header + column blocks • IO Manager: • Need to send as many S3 GET requests as possible • 1 split = multiple S3 objects • Pipelining S3 GET requests and column reads 13
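Since each S3 GET has a roughly constant latency regardless of read size, issuing requests concurrently hides most of that latency. The sketch below contrasts sequential and pipelined reads using a thread pool. `fetch` is a stand-in that only sleeps to simulate a GET; no real S3 client is used, and `max_in_flight` is an illustrative parameter rather than a setting from the talk.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(key, latency=0.03):
    """Stand-in for one S3 GET with a fixed ~30 ms latency."""
    time.sleep(latency)
    return f"data:{key}"


def read_sequential(keys, get=fetch):
    """One request at a time: total time is about len(keys) * latency."""
    return [get(k) for k in keys]


def read_pipelined(keys, get=fetch, max_in_flight=16):
    """Keep up to `max_in_flight` requests running; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(get, keys))


keys = [f"part-{i}" for i in range(16)]

start = time.perf_counter()
read_sequential(keys)
print(f"sequential: {time.perf_counter() - start:.2f}s")

start = time.perf_counter()
read_pipelined(keys)
print(f"pipelined:  {time.perf_counter() - start:.2f}s")
```

With 16 objects per split, the sequential read pays the latency 16 times while the pipelined read pays it roughly once, which is why a split covering multiple S3 objects benefits from an IO manager that keeps many requests in flight.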
  • 14. Presto Stella: Plazma Storage Optimizer • Problem: • Some queries read 1 million partitions <- S3 latency overhead is quite high • Data from mobile applications often has a wide range of time values. • Presto Stella Connector • Using Presto for optimizing physical storage partitions • Input records: File list on S3 • Table writer stage: Merges fragmented partitions, and uploads them to S3 • Commit: Update S3 file indexes on PostgreSQL (in an atomic transaction) • Performance Improvement • e.g. 10,000 partitions (30 sec.) -> 20 partitions (1.5 sec.) • 20x performance improvement • Use Cases • Maintain fragmented user-defined partitions • 1-hour partitioning -> more flexible time range partitioning 14
  • 16. New Directions Explored By Presto • Traditional Database Usage • Required Database Administrator (DBA) • DBA designs the schema and queries • DBA tunes query performance • After Presto • Schema is designed by data providers • 1st-party data (user’s customer data) • 3rd-party data sources • Analysts or Marketers explore the data with Presto • Don’t know the schema in advance • Convenient and low-latency access is necessary • SQL can be inefficient at first • While exploring data, SQL can become more sophisticated, but not always 16
  • 17. Prestobase Proxy: Low-Latency Access to Presto • Needed more interactive experiences of Presto • Prestobase Proxy: Gateway to Presto Coordinator • Talks Presto Protocol (/v1/statement/…) • Written in Scala. • Runs on Docker • Based on Finagle (HTTP server written by Twitter) • Features • Can work with standard presto clients (e.g., presto-cli, presto-jdbc, presto-odbc, etc.) • Increased connectivity to BI tools: Tableau, Datorama, ChartIO, Looker, etc. • Authentication (API key) • Rewriting nextUri (internal IP address -> external host name) • BI-tool specific query filters • etc. 17
  • 18. Customizing Prestobase Filters • Prestobase Proxy: Gateway to access Presto • Adding TD-specific bindings • Finagle filters -> Injecting TD-specific filters • Using Airframe, a dependency injection library for Scala 18
  • 19. Airframe • http://wvlet.org/airframe • Three step DI in Scala • Bind • Design • Build • Built-in life cycle manager • Session start/shutdown • examples: • Open/close Presto connection • Shutting down Presto server • etc. • Session • Manage singletons and binding rules 19
  • 20. VCR Record/Replay for Testing Presto • Launching Presto requires a lot of memory (e.g., 2GB or more) • Often crashes CI service containers (TravisCI, CircleCI, etc.) • Recording Presto responses (prestobase-vcr) • with sqlite-jdbc: https://github.com/xerial/sqlite-jdbc • DB file for each test suite • Enabled small-memory footprint testing • Can run many Presto tests in CI 20
  • 21. Optimizing QueryResults Transfer in Prestobase • Accept: application/x-msgpack • HTTP header • Returning Presto query result rows in MessagePack format • QueryResults object • Contains Array<Array<Object>> => MessagePack (compact binary) • Encoding QueryResults objects using MessagePack/Jackson • https://github.com/msgpack/msgpack-java • Presto client doesn’t need to parse the row part • 1.5x ~ 2.0x performance improvement for streaming query results 21
  • 22. Prestobase Modules • prestobase-proxy • Proxy server to access Presto with authentication • prestobase-agent • Agent for running Presto queries and storing their results • prestobase-vcr • For recording/replaying Presto responses • prestobase-codec • MessagePack codec of Presto query responses • prestobase-hq (headquarter) • Presto usage analysis pipelines, SLO monitoring, etc. • prestobase-conductor • Multi Presto cluster management tool • td-prestobase • Treasure Data specific bindings of prestobase • TD Authentication, job logging/monitoring • BI tool specific filters (Tableau, Looker, etc.) 22
  • 23. Bridging Gaps Between SQL and Programming Language • Traditional Approach • OR-Mapper: app developers design objects and schema, then generate SQL • New Approach: SQL First • Need to manage various SQL results inside the programming language • prestobase-hq • Need to manage hundreds of SQL queries and their results • SLO analysis, query performance analysis, etc. • But How? 23
  • 24. sbt-sql: https://github.com/xerial/sbt-sql • Scala SBT plugin for generating model classes from SQL files • src/main/sql/presto/*.sql (Presto Queries) • Using SQL as a function • Read Presto SQL Results as Objects • Enabled managing SQL queries in GitHub • Type-safe data analysis in prestobase-hq 24
  • 25. Big Challenge: Splitting Huge Queries • Table Scan Log Analysis • Revealed that most customers are scanning the same data over and over • Optimizing SQL is not their major concern. • Analyzing data has higher priority • Splitting a huge query into scheduled hourly/daily jobs • digdag: Open-source workflow engine • http://digdag.io • YAML-based task definition • Scheduling, run Presto queries • Easy to use 25
  • 26. Time Range Primitives • TD_TIME_RANGE(time, '2017-06-15', '2017-06-16', 'PDT') • Most frequently used UDF, but inconvenient • Use short descriptions of relative time ranges • 1d (1 day) • 7d (7 days) • 1h (1 hour) • 1w (1 week) • 1M (1 month) • today, yesterday, lastWeek, thisWeek, etc. • Recent data access • 1dU (1 day until now) => TD_TIME_RANGE(time, '2017-06-15', null, 'JST') open range • Splitting ranges • 1w.splitIntoDays 26
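The short time-range notation on slide 26 could be resolved into concrete bounds roughly as follows. This is a sketch under loud assumptions, since the slides do not define the exact semantics: here `Nd`, `Nh`, and `Nw` are taken to mean a window of N units that ends at the end of the current hour or day, a week is treated as 7 days aligned to day boundaries, the `U` suffix drops the upper bound (an open range), months and named ranges such as `yesterday` are left out, and time zones are ignored. `resolve` and `split_into_days` are hypothetical names.

```python
import re
from datetime import datetime, timedelta

# Length of one unit; a week is assumed to be 7 days.
UNIT_LENGTH = {
    "h": timedelta(hours=1),
    "d": timedelta(days=1),
    "w": timedelta(days=7),
}


def resolve(spec, now):
    """Turn a spec like '1d', '7d', '1h', '1w' or '1dU' into (start, end).

    `end` is None for an open range (the 'U' suffix).
    """
    m = re.fullmatch(r"(\d+)([hdw])(U?)", spec)
    if m is None:
        raise ValueError(f"unsupported time range: {spec!r}")
    count, unit, until_now = int(m.group(1)), m.group(2), m.group(3) == "U"
    if unit == "h":
        floor = now.replace(minute=0, second=0, microsecond=0)
        step = timedelta(hours=1)
    else:
        floor = now.replace(hour=0, minute=0, second=0, microsecond=0)
        step = timedelta(days=1)
    end = floor + step  # end of the current hour or day
    start = end - count * UNIT_LENGTH[unit]
    return start, (None if until_now else end)


def split_into_days(start, end):
    """Split [start, end) into consecutive ranges of at most one day."""
    ranges = []
    cursor = start
    while cursor < end:
        nxt = min(cursor + timedelta(days=1), end)
        ranges.append((cursor, nxt))
        cursor = nxt
    return ranges


now = datetime(2017, 6, 15, 10, 30)
print(resolve("1d", now))
print(resolve("1dU", now))
print(len(split_into_days(*resolve("1w", now))))
```

Under these assumptions, `1d` evaluated on 2017-06-15 yields the range from 2017-06-15 to 2017-06-16 and `1dU` yields an open range starting on 2017-06-15, matching the date patterns in the two TD_TIME_RANGE examples on the slide.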
  • 27. MessageFrame (In Design) • Next-generation Tabular Data Format • Hybrid layout: • row-oriented: for streaming. Quick write • column-oriented: better compression & fast read • Specification Layers • Layer-0 (basic specs: Keep it simple stupid) • Data type: MessagePack • Compression codec: raw, delta, gzip, (snappy, zstd? etc.) • Column metadata: min/max/sum values of columns • Layer-1 (advanced compression) • Layer-N should be convertible to Layer-0 27
  • 28. Summary • Managing Implicit SLOs • Data-oriented approach: Presto -> Fluentd -> Treasure Data -> Presto • SQL clustering -> Find a bottleneck -> Optimize it! • Optimization approaches • Split usage control, Presto Ops Robot, Stella partition optimizer • Low-latency access by Prestobase • Workflow • On-going Work • Physical storage optimization (Stella) • Huge query optimization • Incremental Processing Support • DigDag workflow • MessageFrame 28 https://www.treasuredata.com/company/careers/
  • 29. T R E A S U R E D A T A 29