RMIT Classification: Trusted
Steering Query Optimizers: A Practical Take on Big Data Workloads
Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, Alekh Jindal
Microsoft, MIT & Intel Lab.
SIGMOD’21
Presented by Hai Lan
Outline
• Background
  • Optimizer & learned methods on optimizer
  • Bao (SIGMOD’21)
• To-be-shared work
  • Scope optimizer & workload
  • Motivation & Goal
  • Method
• Discussions
19/8/21 Group meeting 2
Background
Background – Optimizer
Life of a Query: Query -> Parser -> AST -> Query Rewrite -> AST’ -> Optimizer -> Phy. Plan -> Executor -> Results
Inside the Optimizer: Logical Opt. -> Log. Plan -> Physical Opt. -> Phy. Plan
Two Representative Arch.: Volcano, Cascades
Background – Keys in Optimizer
• Cardinality Estimation
  • A structure to store the table statistics, e.g., sample, histogram, sketch.
  • Evaluation model, e.g., evaluate on sample, assumptions when using histogram.
• Cost Model
  • Predefined parameters, which are related to the physical operators and the running environment.
• Plan Enumeration (Join Order)
  • Large join query
The root of all evil, the Achilles Heel of query optimization, is the
estimation of the size of intermediate results, known as cardinalities.
-- Guy Lohman
• Learned model to estimate the cardinality.
  • Query-driven [1, 2, 3]
  • Data-driven [4, 5, 6]
  • Hybrid [7]
• Learned model to get the cost [8, 9].
• Reinforcement learning methods to obtain the join order [10, 11].
1. Andreas Kipf et al.: Learned Cardinalities: Estimating Correlated Joins with Deep Learning. CIDR 2019
2. Anshuman Dutt et al.: Selectivity Estimation for Range Predicates using Lightweight Models. Proc. VLDB Endow. 12(9): 1044-1057 (2019)
3. Chenggang Wu et al.: Towards a Learning Optimizer for Shared Clouds. Proc. VLDB Endow. 12(3): 210-222 (2018)
4. Zongheng Yang et al.: Deep Unsupervised Cardinality Estimation. Proc. VLDB Endow. 13(3): 279-292 (2019)
5. Benjamin Hilprecht et al.: DeepDB: Learn from Data, not from Queries! Proc. VLDB Endow. 13(7): 992-1005 (2020)
6. Rong Zhu et al.: FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. Proc. VLDB Endow. 14(9): 1489-1502 (2021)
7. Peizhi Wu, Gao Cong: A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation. SIGMOD Conference 2021: 2009-2022
8. Ji Sun, Guoliang Li: An End-to-End Learning-based Cost Estimator. Proc. VLDB Endow. 13(3): 307-319 (2019)
9. Tarique Siddiqui et al.: Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings. SIGMOD Conference 2020: 99-113
10. Sanjay Krishnan et al.: Learning to Optimize Join Queries With Deep Reinforcement Learning. CoRR abs/1808.03196 (2018)
11. Xiang Yu, Guoliang Li, Chengliang Chai, Nan Tang: Reinforcement Learning with Tree-LSTM for Join Order Selection. ICDE 2020: 1297-1308
Background – Bao [1] (Bandit Optimizer)
Motivations.
• Because cardinality estimation is inaccurate, the wrong physical operators may be selected.
• Databases support hints [2] to specify some operators.
Bao’s Work.
• It automatically and adaptively determines the right hint set to use for an incoming query.
  • 48 hint sets (from 2^6 combinations)
• Instead of the optimizer's `cost`, users can specify a metric, such as the running time used in the paper.
1. Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, Tim Kraska: Bao: Making Learned Query Optimization Practical. SIGMOD Conference 2021: 1275-1288
2. Here, a `hint` is not the same as a hint in TiDB or MySQL.
Background – Bao
Method.
• Train a predictive model for the metric.
• When a query arrives, select the plan with the lowest predicted value of the metric.
Pros & Cons.
• Pros
  • Handles dynamic situations
  • Integrates with a real system
  • Short training time
• Cons
  • It cannot specify hints at the subplan level.
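The selection step in the method above can be sketched as follows. This is a minimal illustration under stated assumptions, not Bao's implementation: `choose_hint_set`, `plan_for`, and `predict_metric` are hypothetical names, a "plan" is reduced to a tuple of operator names, and the "model" is a lookup table of made-up running times.

```python
from typing import Callable, FrozenSet, Hashable, Iterable, Tuple

HintSet = FrozenSet[str]


def choose_hint_set(
    query: str,
    hint_sets: Iterable[HintSet],
    plan_for: Callable[[str, HintSet], Hashable],
    predict_metric: Callable[[Hashable], float],
) -> Tuple[HintSet, Hashable, float]:
    """Return (hint set, plan, predicted metric) with the lowest prediction."""
    best = None
    for hint_set in hint_sets:
        plan = plan_for(query, hint_set)   # optimizer run under this hint set
        predicted = predict_metric(plan)   # model's estimate, e.g. running time
        if best is None or predicted < best[2]:
            best = (hint_set, plan, predicted)
    if best is None:
        raise ValueError("at least one hint set is required")
    return best


# Toy stand-ins (hypothetical): three hint sets and invented running times.
HINT_SETS = [
    frozenset({"hash_join", "seq_scan"}),
    frozenset({"merge_join", "seq_scan"}),
    frozenset({"nested_loop", "index_scan"}),
]
TOY_RUNTIME = {
    ("hash_join", "seq_scan"): 4.0,
    ("merge_join", "seq_scan"): 2.5,
    ("index_scan", "nested_loop"): 9.0,
}


def toy_plan_for(query: str, hint_set: HintSet) -> Tuple[str, ...]:
    # A real system would call the optimizer; here the plan is just the hints.
    return tuple(sorted(hint_set))


def toy_predict(plan: Tuple[str, ...]) -> float:
    return TOY_RUNTIME[plan]


chosen, plan, predicted = choose_hint_set(
    "SELECT ...", HINT_SETS, toy_plan_for, toy_predict
)
print(sorted(chosen), predicted)
```

The loop costs one optimizer call and one model evaluation per hint set, which is workable for a few dozen hint sets but not for the much larger rule-configuration space discussed later in the talk.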
Steer Scope Optimizer
Scope Overview
Scope Optimizer.
• Belongs to the Cascades family.
• 256 rules in total
  • Required rules, e.g., EnforceExchange
  • Implementation rules, e.g., HashJoinImpl1
  • On-by-default rules, e.g., various rewrite rules
  • Off-by-default rules, e.g., CorrelatedJoinOnUnion
[Slide annotation on the rule categories: "Default Rules"]
Workload in Scope.
• Recurrent jobs: the same template with different variables.
• Short- and long-running jobs.
  • 10% of jobs last over 5 min but consume 90% of containers.
• Metrics
  • Runtime
  • CPU time
  • Total I/O time
Motivations & Goal
Motivations.
• Because cardinality estimation is inaccurate, the wrong rules may be selected.
• Hints can specify which rules to use.
Relationship with Bao.
• Directly apply Bao to Scope?
  • Hint -> Rule; Hint Set -> Rule configuration
• However …
  • Far more rules (200+ vs. 6) -> too many possible rule configurations
  • Large workload -> long running times & hundreds of operator nodes.
Goal.
• Output an alternative rule configuration that is better for optimizing this particular job under a given metric.
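To make the "200+ vs. 6" point concrete, the arithmetic below counts on/off combinations. It assumes every rule is an independent switch and uses 200 as a lower bound for "200+"; `num_configurations` is a hypothetical helper, not part of Scope or Bao.

```python
def num_configurations(num_switchable_rules: int) -> int:
    """Number of on/off combinations for independently switchable rules."""
    return 2 ** num_switchable_rules


bao_space = num_configurations(6)      # 6 hints, as in Bao
scope_space = num_configurations(200)  # lower bound for "200+" rules

print(bao_space)                                   # 64
print(len(str(scope_space)), "decimal digits")     # size of the Scope space
```

Exhaustively compiling, let alone executing, every configuration is therefore out of reach, which motivates the sampling-based search later in the talk.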
Rule Signature & Job Span
Rule Signature.
• The rule signature is a bit vector specifying which rules directly contribute to the final query plan produced by the optimizer.
• The default rule signature is the rule signature of a query optimized using the default rule configuration.
Job Span.
• Given a job, its span contains all non-required rules which, if enabled or disabled, can affect the final query plan.
• Heuristics are used to generate the span.
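The two definitions above can be sketched as small data structures. This is an illustration only: the rule list is a four-rule toy (with `SomeRewriteRule` as an invented placeholder), and `can_affect_plan` is a hypothetical oracle standing in for the span heuristics, which the slides do not spell out.

```python
from typing import Callable, Iterable, Sequence, Set, Tuple


def rule_signature(all_rules: Sequence[str], rules_in_plan: Iterable[str]) -> Tuple[int, ...]:
    """Bit vector: 1 if the rule contributed to the final plan, else 0."""
    used = set(rules_in_plan)
    unknown = used - set(all_rules)
    if unknown:
        raise ValueError(f"unknown rules: {sorted(unknown)}")
    return tuple(1 if rule in used else 0 for rule in all_rules)


def job_span(
    all_rules: Sequence[str],
    required: Set[str],
    can_affect_plan: Callable[[str], bool],
) -> Set[str]:
    """Non-required rules whose enabling or disabling can change the plan."""
    return {r for r in all_rules if r not in required and can_affect_plan(r)}


# Toy rule list; the order fixes the bit positions of the signature.
ALL_RULES = ["EnforceExchange", "HashJoinImpl1", "SomeRewriteRule", "CorrelatedJoinOnUnion"]
REQUIRED = {"EnforceExchange"}

# Default rule signature: the signature observed under the default configuration.
default_signature = rule_signature(ALL_RULES, ["HashJoinImpl1", "EnforceExchange"])

# Hypothetical oracle: pretend every rule except SomeRewriteRule matters here.
span = job_span(ALL_RULES, REQUIRED, lambda rule: rule != "SomeRewriteRule")

print(default_signature)
print(sorted(span))
```

Using a tuple for the signature keeps it hashable, so it can serve directly as a dictionary key when jobs are grouped by signature.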
Which rules to try?
Randomized Configuration Search.
• Enable all the rules that are not in the span of the given job.
• For each rule category, independently sample a subset of rules from the job span. Disable these rules, and enable all others. This gives us a new rule configuration.
• If the rule configuration has not been seen before, add it to the candidate list. Repeat until 𝑀 configurations are generated (𝑀 = 1000).
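The randomized search above can be sketched as follows. Several details are assumptions rather than the paper's procedure: a configuration is modelled as the set of enabled rules, a subset is sampled by first drawing its size uniformly, and `max_attempts` is a hypothetical guard so the loop stops when the span is too small to yield 𝑀 distinct configurations.

```python
import random
from typing import Dict, FrozenSet, List, Optional, Sequence, Set


def random_configurations(
    all_rules: Sequence[str],
    category_of: Dict[str, str],
    span: Set[str],
    m: int,
    seed: int = 0,
    max_attempts: Optional[int] = None,
) -> List[FrozenSet[str]]:
    """Sample up to m distinct configurations (sets of enabled rules)."""
    rng = random.Random(seed)
    if max_attempts is None:
        max_attempts = 20 * m

    # Group the span's rules by category so each category is sampled independently.
    by_category: Dict[str, List[str]] = {}
    for rule in sorted(span):
        by_category.setdefault(category_of[rule], []).append(rule)

    seen: Set[FrozenSet[str]] = set()
    candidates: List[FrozenSet[str]] = []
    attempts = 0
    while len(candidates) < m and attempts < max_attempts:
        attempts += 1
        disabled: Set[str] = set()
        for rules in by_category.values():
            size = rng.randint(0, len(rules))        # assumed sampling scheme
            disabled.update(rng.sample(rules, size))
        # Rules outside the span are never disabled, so they stay enabled.
        config = frozenset(r for r in all_rules if r not in disabled)
        if config not in seen:                       # keep unseen configurations only
            seen.add(config)
            candidates.append(config)
    return candidates


ALL_RULES = ["EnforceExchange", "SomeRewriteRule", "HashJoinImpl1",
             "MergeJoinImplX", "CorrelatedJoinOnUnion"]
CATEGORY = {
    "EnforceExchange": "required",
    "SomeRewriteRule": "on_by_default",
    "HashJoinImpl1": "implementation",
    "MergeJoinImplX": "implementation",      # invented rule name
    "CorrelatedJoinOnUnion": "off_by_default",
}
SPAN = {"HashJoinImpl1", "MergeJoinImplX", "CorrelatedJoinOnUnion"}

configs = random_configurations(ALL_RULES, CATEGORY, SPAN, m=5)
print(len(configs))
```

With a span of three rules there are only 2^3 = 8 distinct configurations, so asking for 𝑀 = 1000 on this toy input would stop at the attempt limit instead of looping forever.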
Which jobs to try?
Choose Jobs & Configurations to Execute.
• Select Jobs.
  • Jobs whose recompiled plans have clearly lower costs under the default cost model.
  • Jobs with low cost but high runtimes under the default configuration (the cost model is wrong).
• Select Configurations.
  • Select the 10 cheapest (by the cost model) alternative rule configurations and execute them.
[Figure: results on Workload B, compared to the default configuration]
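The two selection steps above can be sketched as follows. The slides give no thresholds, so `margin` and the median-based test in `worth_executing` are hypothetical choices for "clearly lower" and "low cost, high runtime"; only the "10 cheapest" cut-off comes from the talk.

```python
import heapq
from typing import Callable, Hashable, Iterable, List


def cheapest_configurations(
    candidates: Iterable[Hashable],
    estimated_cost: Callable[[Hashable], float],
    k: int = 10,
) -> List[Hashable]:
    """The k candidates with the lowest cost-model estimate, cheapest first."""
    return heapq.nsmallest(k, candidates, key=estimated_cost)


def worth_executing(
    default_cost: float,
    best_alternative_cost: float,
    default_runtime: float,
    median_cost: float,
    median_runtime: float,
    margin: float = 0.8,  # hypothetical meaning of "clearly lower"
) -> bool:
    """Job filter: a clearly cheaper alternative, or a suspicious cost estimate."""
    clearly_cheaper = best_alternative_cost < margin * default_cost
    cost_model_suspect = default_cost < median_cost and default_runtime > median_runtime
    return clearly_cheaper or cost_model_suspect


# Toy data: 15 candidate configurations with distinct made-up costs 0..14.
COSTS = {f"conf{i}": float((i * 7) % 15) for i in range(15)}
top10 = cheapest_configurations(COSTS, COSTS.__getitem__)

print(top10[0], max(COSTS[c] for c in top10))
print(worth_executing(100.0, 70.0, 10.0, 50.0, 20.0))
```

Ranking by estimated cost keeps the number of expensive executions fixed per job, at the price of trusting the very cost model that the second job filter treats as unreliable.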
Different metrics
Other metrics sometimes see regressions.
Not all metrics can be improved together.
Potentially adopt a different model for each one.
Extrapolating to other jobs
Idea.
• Use the rule signature as the level of granularity across which the same set of rule configurations could be useful.
• Rule signature job group
  • The set of jobs whose default rule signatures map to the same bit vector.
Methods.
• Case 1: simply apply a previously seen rule configuration.
• Case 2: find a set of interesting configurations for each job group and adopt a model to choose one at compile time.
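Job groups and Case 1 can be sketched with plain dictionaries. The job identifiers, signatures, and configuration names below are invented, and falling back to the default configuration for an unseen signature is an assumption of this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Signature = Tuple[int, ...]


def group_by_default_signature(default_signatures: Dict[str, Signature]) -> Dict[Signature, List[str]]:
    """Map each default rule signature to the jobs that share it."""
    groups: Dict[Signature, List[str]] = defaultdict(list)
    for job_id, signature in default_signatures.items():
        groups[signature].append(job_id)
    return dict(groups)


def configuration_for(
    signature: Signature,
    best_seen: Dict[Signature, str],
    default_configuration: str = "default",
) -> str:
    """Case 1: reuse a configuration previously seen to work for this group."""
    return best_seen.get(signature, default_configuration)


# Invented jobs and their default rule signatures.
DEFAULT_SIGNATURES = {
    "job_a": (1, 1, 0, 0),
    "job_b": (1, 0, 1, 0),
    "job_c": (1, 1, 0, 0),
}
groups = group_by_default_signature(DEFAULT_SIGNATURES)

# Invented result of earlier experiments: one good configuration per group.
BEST_SEEN = {(1, 1, 0, 0): "conf_17"}

print(groups[(1, 1, 0, 0)])
print(configuration_for((1, 1, 0, 0), BEST_SEEN), configuration_for((1, 0, 1, 0), BEST_SEEN))
```

Case 2 replaces the single lookup value with a short list of candidate configurations per group and lets a model pick among them for each incoming job.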
Learning Rule Configurations
Training Set.
• Select 𝑆 rule signatures from the workload.
• Collect the jobs whose default rule signature maps to these rule signatures.
• Obtain 𝐾 candidate configurations for each job group.
• Sample 𝑀 jobs from all the jobs mapping to these job groups.
• Execute each of the 𝐾 configurations for every job.
• Each sample is a tuple (Job, RuleConf, Running Time).
Learning Problem.
• Treat the dataset of samples in each job group as an independent learning problem.
• The goal is to select one of the 𝐾 candidate configurations for a given query.
• Use supervised learning to estimate the running time of a query under a configuration.
Featurization.
• Job-level features, e.g., input cardinality size, hash of template.
• Rule configuration features, e.g., cost of plan, bit vector of RuleDiff.
• Query graph features, e.g., operators’ id, cost.
Learned Models.
• For each job group, a fully connected neural network with one hidden layer of size 1024.
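The per-group model and the final choice among the 𝐾 candidates can be sketched in plain Python. Only the shape (one hidden layer of size 1024) comes from the slides; the ReLU activation, the random untrained weights, the reading of RuleDiff as "rules whose on/off state differs from the default configuration", and all feature values are assumptions, and training on (Job, RuleConf, Running Time) samples is omitted.

```python
import math
import random
from typing import FrozenSet, List, Sequence, Tuple

Config = FrozenSet[str]


class OneHiddenLayerNet:
    """Fully connected net with one hidden layer; weights are untrained here."""

    def __init__(self, n_inputs: int, hidden: int = 1024, seed: int = 0) -> None:
        rng = random.Random(seed)
        s1, s2 = 1.0 / math.sqrt(n_inputs), 1.0 / math.sqrt(hidden)
        self.w1 = [[rng.uniform(-s1, s1) for _ in range(n_inputs)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.uniform(-s2, s2) for _ in range(hidden)]
        self.b2 = 0.0

    def predict(self, x: Sequence[float]) -> float:
        # ReLU hidden layer (assumed activation), then a linear output unit.
        hidden = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


def featurize(job_features: Sequence[float], plan_cost: float, config: Config,
              default_config: Config, all_rules: Sequence[str]) -> List[float]:
    """Job-level features + plan cost + RuleDiff bit vector (assumed meaning)."""
    rule_diff = [1.0 if (r in config) != (r in default_config) else 0.0 for r in all_rules]
    return list(job_features) + [plan_cost] + rule_diff


def choose_configuration(model: OneHiddenLayerNet, job_features: Sequence[float],
                         candidates: Sequence[Tuple[Config, float]],
                         default_config: Config, all_rules: Sequence[str]) -> Tuple[Config, float]:
    """Return the candidate with the lowest predicted running time."""
    scored = [(model.predict(featurize(job_features, cost, cfg, default_config, all_rules)), i)
              for i, (cfg, cost) in enumerate(candidates)]
    best_prediction, best_index = min(scored)
    return candidates[best_index][0], best_prediction


ALL_RULES = ["HashJoinImpl1", "SomeRewriteRule", "CorrelatedJoinOnUnion"]
DEFAULT = frozenset({"HashJoinImpl1", "SomeRewriteRule"})
CANDIDATES = [(DEFAULT, 0.9), (frozenset({"HashJoinImpl1"}), 0.7), (frozenset(ALL_RULES), 0.8)]
JOB_FEATURES = [0.5, 0.1]  # invented: scaled input cardinality, template-hash bucket

model = OneHiddenLayerNet(n_inputs=len(JOB_FEATURES) + 1 + len(ALL_RULES))
chosen, predicted = choose_configuration(model, JOB_FEATURES, CANDIDATES, DEFAULT, ALL_RULES)
print(sorted(chosen), predicted)
```

Because the weights are random, the printed choice carries no meaning; the sketch only shows how one model per job group turns a job plus 𝐾 candidates into a single selected configuration.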
Discussion
Summary.
• How to choose the right rule configuration for an incoming query.
• Proposes the rule signature, the job span, and several heuristic algorithms to obtain the candidate rule configurations.
• Adopts a learned model to choose the rule configuration for each job group.
Future work.
• Methods to generate the job span and interesting rule configurations.
• Use feedback from the execution results to guide future iterations of the configuration search.
• Other configurable options in Scope.
Discussion.
• Papers.
• Methods.
• Model for each group.
• Parameters.
Q & A

 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 

[Paper Reading] Steering Query Optimizers: A Practical Take on Big Data Workloads

  • 1. RMIT Classification: Trusted Steering Query Optimizers: A Practical Take on Big Data Workloads Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, Alekh Jindal Microsoft, MIT & Intel Lab. SIGMOD’21 Presented by Hai Lan
  • 2. Outline (19/8/21 Group meeting) • Background • Optimizer & learned methods on the optimizer • Bao (SIGMOD’21) • To-be-shared work • Scope optimizer & workload • Motivation & goal • Method • Discussions
  • 4. Background – Optimizer. Life of a query: Query → Parser → AST → Query Rewrite → AST’ → Optimizer → Phy. Plan → Executor → Results. Inside the optimizer: Logical Opt. → Log. Plan → Physical Opt. → Phy. Plan.
  • 5. Background – Optimizer. Life of a query (as on slide 4). Two representative architectures: Volcano.
  • 6. Background – Optimizer. Life of a query (as on slide 4). Two representative architectures: Volcano, Cascades.
  • 7. Background – Keys in Optimizer. Three keys: Cardinality Estimation, Plan Enumeration (Join Order), Cost Model. Cardinality Estimation: a structure to store the table statistics, e.g., sample, histogram, sketch; an evaluation model, e.g., evaluating on a sample, or the assumptions made when using a histogram. Cost Model: predefined parameters related to the physical operators and the running environment. Plan Enumeration (Join Order): large join query.
  • 8. Background – Keys in Optimizer. Same content as slide 7, plus a quote: "The root of all evil, the Achilles Heel of query optimization, is the estimation of the size of intermediate results, known as cardinalities." (Guy Lohman)
  • 9. Learned methods on the optimizer. Learned model to estimate the cardinality: query-driven [1,2,3], data-driven [4,5,6], hybrid [7]. Learned model to get the cost [8,9]. Reinforcement learning methods to obtain the join order [10,11]. References:
    1. Andreas Kipf et al.: Learned Cardinalities: Estimating Correlated Joins with Deep Learning. CIDR 2019
    2. Anshuman Dutt et al.: Selectivity Estimation for Range Predicates using Lightweight Models. Proc. VLDB Endow. 12(9): 1044-1057 (2019)
    3. Chenggang Wu et al.: Towards a Learning Optimizer for Shared Clouds. Proc. VLDB Endow. 12(3): 210-222 (2018)
    4. Zongheng Yang et al.: Deep Unsupervised Cardinality Estimation. Proc. VLDB Endow. 13(3): 279-292 (2019)
    5. Benjamin Hilprecht et al.: DeepDB: Learn from Data, not from Queries! Proc. VLDB Endow. 13(7): 992-1005 (2020)
    6. Rong Zhu et al.: FLAT: Fast, Lightweight and Accurate Method for Cardinality Estimation. Proc. VLDB Endow. 14(9): 1489-1502 (2021)
    7. Peizhi Wu, Gao Cong: A Unified Deep Model of Learning from both Data and Queries for Cardinality Estimation. SIGMOD Conference 2021: 2009-2022
    8. Ji Sun, Guoliang Li: An End-to-End Learning-based Cost Estimator. Proc. VLDB Endow. 13(3): 307-319 (2019)
    9. Tarique Siddiqui et al.: Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings. SIGMOD Conference 2020: 99-113
    10. Sanjay Krishnan et al.: Learning to Optimize Join Queries With Deep Reinforcement Learning. CoRR abs/1808.03196 (2018)
    11. Xiang Yu, Guoliang Li, Chengliang Chai, Nan Tang: Reinforcement Learning with Tree-LSTM for Join Order Selection. ICDE 2020: 1297-1308
  • 10. Background – Bao [1] (Bandit Optimizer). Motivations. • Because cardinality estimation is inaccurate, the wrong physical operators may be selected. • Databases support hints [2] to specify some operators. Notes: [1] Ryan Marcus, Parimarjan Negi, Hongzi Mao, Nesime Tatbul, Mohammad Alizadeh, Tim Kraska: Bao: Making Learned Query Optimization Practical. SIGMOD Conference 2021: 1275-1288. [2] Here `hint` does not mean the same thing as a hint in TiDB or MySQL.
  • 11. Background – Bao [1] (Bandit Optimizer). Motivations as on slide 10. Bao’s Work. • It automatically and adaptively determines the right hint set to use for an incoming query. • Instead of the optimizer's `cost`, users can specify a metric, such as the running time used in the paper. 48 (26) hint sets. Notes [1] and [2] as on slide 10.
  • 12. Background – Bao. Method. • Train a predictive model for the metric. • When a query arrives, select the plan with the lowest cost under the metric.
  • 13. Background – Bao. Method as on slide 12. Pros & Cons. • Pros • Dynamic situations • Integrates with a real system • Training time • Cons • It cannot specify hints for a subplan.
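
The selection step on slides 12 and 13 can be sketched roughly as follows. This is a minimal illustration, not Bao's implementation: `plan_for` and `predict_metric` are hypothetical callables standing in for the database optimizer and the trained predictive model, and how the model is trained is left out entirely.

```python
from typing import Any, Callable, FrozenSet, Iterable, Tuple


def choose_hint_set(
    query: Any,
    hint_sets: Iterable[FrozenSet[str]],
    plan_for: Callable[[Any, FrozenSet[str]], Any],
    predict_metric: Callable[[Any], float],
) -> Tuple[FrozenSet[str], Any, float]:
    """Return (hint set, plan, predicted metric) with the lowest prediction.

    plan_for and predict_metric are assumptions of this sketch: the first
    asks the optimizer for a plan under a hint set, the second is the
    learned model that predicts the chosen metric (e.g. running time).
    """
    best = None
    for hints in hint_sets:
        plan = plan_for(query, hints)        # one candidate plan per hint set
        predicted = predict_metric(plan)     # model's estimate for that plan
        if best is None or predicted < best[2]:
            best = (hints, plan, predicted)
    if best is None:
        raise ValueError("no hint sets were supplied")
    return best
```

The metric is whatever the model was trained on, so swapping running time for CPU time only changes `predict_metric`, not the selection loop.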
  • 14. Steer Scope Optimizer
  • 15. Scope Overview. Scope Optimizer. • Belongs to the Cascades family. • 256 rules in total • Required rules, e.g., EnforceExchange • Implementation rules, e.g., HashJoinImpl1 • On-by-default rules, e.g., various rewrite rules • Off-by-default rules, e.g., CorrelatedJoinOnUnion
  • 16. Scope Overview. Scope Optimizer as on slide 15. [Figure: Default Rules]
  • 17. Scope Overview. Scope Optimizer as on slide 15. [Figure: Default Rules] Workload in Scope.
  • 18. Scope Overview. Scope Optimizer as on slide 15. [Figure: Default Rules] Workload in Scope. • Recurrent jobs: the same template with different variables. • Short and long running jobs. • 10% of jobs last over 5 min yet consume 90% of containers. • Metrics • Runtime • CPU time • Total I/O time
  • 19. Motivations & Goal. Motivations. • Because cardinality estimation is inaccurate, the wrong rules may be selected. • Hints specify which rules to use.
  • 20. Motivations & Goal. Motivations as on slide 19. Goal. • Output an alternative rule configuration that is better for optimizing this particular job, for a given metric.
  • 21. Motivations & Goal. Motivations as on slide 19. Goal as on slide 20. Relationship with Bao. • Directly apply Bao to Scope? • Hint -> Rule; Hint Set -> Rule configuration • However … • Many more rules (200+ vs. 6) -> too many possible rule configurations • Large workload -> long running times & hundreds of operator nodes.
  • 22. Rule Signature & Job Span. Rule Signature. • The rule signature is a bit vector specifying which rules directly contribute to the final query plan produced by the optimizer. • The default rule signature is the rule signature of a query optimized using the default rule configuration. Job Span. • Given a job, its span contains all non-required rules which, if enabled or disabled, can affect the final query plan. • Heuristics are used to generate the span.
  • 23. Which rules to try? Randomized Configuration Search. • Enable all the rules that are not in the span of the given job. • For each rule category, independently sample a subset of rules from the job span. Disable these rules, and enable all others. This gives us a new rule configuration. • If the rule configuration has not been seen before, add it to the candidate list. Repeat until M configurations are generated (M = 1000).
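
The randomized configuration search on slide 23 might look like the sketch below. The slide does not say how the subset is drawn within a category, so drawing a uniform subset size is an assumption here, as are the `max_attempts` guard (the span may admit fewer than M distinct configurations) and all rule names used with it.

```python
import random
from typing import Dict, FrozenSet, List, Set


def random_configurations(
    all_rules: Set[str],
    span_by_category: Dict[str, Set[str]],
    m: int,
    rng: random.Random,
    max_attempts: int = 100_000,
) -> List[FrozenSet[str]]:
    """Generate up to m distinct rule configurations (sets of enabled rules).

    Rules outside the job span are always enabled. Within each category a
    random subset of the span's rules is disabled (uniform subset size is an
    assumption of this sketch, not something the slides specify).
    """
    seen: Set[FrozenSet[str]] = set()
    candidates: List[FrozenSet[str]] = []
    attempts = 0
    while len(candidates) < m and attempts < max_attempts:
        attempts += 1
        disabled: Set[str] = set()
        for rules in span_by_category.values():
            ordered = sorted(rules)              # stable order for sampling
            k = rng.randint(0, len(ordered))     # how many rules to disable
            disabled.update(rng.sample(ordered, k))
        key = frozenset(disabled)
        if key in seen:                          # skip configurations already seen
            continue
        seen.add(key)
        candidates.append(frozenset(all_rules) - key)
    return candidates
```

With the slide's M = 1000, the call would be `random_configurations(all_rules, span, 1000, random.Random(seed))`; each returned set lists the rules left enabled.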
  • 24. Which jobs to try? Choose Jobs & Configurations to Execute. • Select Jobs. • Jobs with clearly lower costs with recompiled plans under the default cost model. • Jobs with low cost but high runtimes under the default configuration (the cost model is wrong). • Select Configurations. • Select the 10 cheapest (by the cost model) alternative rule configurations and execute them. [Figure: Workload B, compared to the default configuration]
  • 25. Which jobs to try? Same content as slide 24.
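
A rough sketch of the two selection steps on slides 24 and 25. The slides only say "clearly lower" cost, so the `min_gain` threshold is a made-up parameter for illustration; the dictionaries of estimated costs are likewise assumed inputs, keyed by hypothetical job and configuration ids.

```python
import heapq
from typing import Dict, List


def promising_jobs(
    default_cost: Dict[str, float],
    best_alt_cost: Dict[str, float],
    min_gain: float = 0.2,
) -> List[str]:
    """Jobs whose best recompiled plan is clearly cheaper than the default.

    min_gain (20% here) is an assumption; the slides give no threshold.
    """
    return [
        job
        for job, cost in default_cost.items()
        if best_alt_cost[job] <= (1.0 - min_gain) * cost
    ]


def cheapest_configurations(estimated_cost: Dict[str, float], k: int = 10) -> List[str]:
    """The k configurations with the lowest optimizer-estimated cost."""
    return heapq.nsmallest(k, estimated_cost, key=estimated_cost.get)
```

Only the configurations returned by `cheapest_configurations` would then be executed, which keeps the number of real runs per job at 10 as on the slide.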
  • 26. Different metrics. Other metrics sometimes see regressions.
  • 27. Different metrics
  • 28. Different metrics
  • 29. Different metrics. The metrics cannot all be improved together. Potentially, a different model could be adopted for each one.
  • 30. Extrapolating to other jobs. Idea. • Use the rule signature as the level of granularity across which the same set of rule configurations could be useful. • Rule signature job group • The set of jobs whose default rule signatures map to the same bit vector. Methods. • Case 1: simply apply a previously seen rule configuration. • Case 2: find a set of interesting configurations for each job group and adopt a model to choose one at compile time.
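
Grouping jobs by default rule signature, as described on slides 22 and 30, could be sketched like this. Representing the bit vector as a tuple of 0/1 values over a fixed rule order is a choice made for this illustration, and the job ids and rule names passed in are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set, Tuple

Signature = Tuple[int, ...]


def rule_signature(rules_used: Set[str], rule_order: Sequence[str]) -> Signature:
    """Bit vector: 1 if the rule contributed to the final plan, else 0."""
    return tuple(1 if rule in rules_used else 0 for rule in rule_order)


def group_by_signature(
    default_rules_used: Dict[str, Set[str]],
    rule_order: Sequence[str],
) -> Dict[Signature, List[str]]:
    """Map each default rule signature to the jobs that share it."""
    groups: Dict[Signature, List[str]] = defaultdict(list)
    for job_id, used in default_rules_used.items():
        groups[rule_signature(used, rule_order)].append(job_id)
    return dict(groups)
```

Each key of the returned dictionary is one rule signature job group, so a configuration found for one job in a group can be tried on the others (Case 1 on the slide).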
  • 31. Learning Rule Configurations. Training Set. • Select S rule signatures from the workload. • Collect the jobs whose default rule signature maps to these rule signatures. • Obtain K candidate configurations for each job group. • Sample M jobs from all the jobs mapping to these job groups. • Execute each of the K configurations for every job. • Each sample is a triple (Job, RuleConf, Running Time). Learning Problem. • Treat the dataset of samples in each job group as an independent learning problem. • The goal is to select one of the K candidate configurations for a given query. • Supervised learning estimates the running time of a query under a configuration. Featurization. • Job-level features, e.g., input cardinality size, hash of template. • Rule configuration features, e.g., cost of plan, bit vector of RuleDiff. • Query graph features, e.g., operators’ id, cost. Learned Models. • For each job group, a fully connected neural network with one hidden layer of size 1024.
  • 32. Learning Rule Configurations
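
To make the per-group model on slide 31 concrete, here is a forward-pass-only sketch of a network with one hidden layer, used to pick one of the K candidate configurations. The ReLU activation, the weight initialization, and the simple concatenation of job and configuration features are assumptions of this sketch; training is omitted, and the slides do not describe these details.

```python
import random
from typing import List, Sequence, Tuple

Params = Tuple[List[List[float]], List[float], List[float], float]


def init_mlp(n_in: int, n_hidden: int, rng: random.Random) -> Params:
    """Random (untrained) weights for an n_in -> n_hidden -> 1 network."""
    scale = 1.0 / (n_in ** 0.5)
    w1 = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-scale, scale) for _ in range(n_hidden)]
    return w1, b1, w2, 0.0


def predict_runtime(params: Params, features: Sequence[float]) -> float:
    """One hidden layer with ReLU (an assumed activation), scalar output."""
    w1, b1, w2, b2 = params
    hidden = [
        max(0.0, sum(w * x for w, x in zip(row, features)) + b)
        for row, b in zip(w1, b1)
    ]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def choose_configuration(
    params: Params,
    job_features: List[float],
    config_features: List[List[float]],
) -> int:
    """Index of the candidate configuration with the lowest predicted runtime."""
    scored = [
        (predict_runtime(params, job_features + cf), i)
        for i, cf in enumerate(config_features)
    ]
    return min(scored)[1]
```

One such parameter set would be kept per job group, with `n_hidden=1024` matching the slide; in practice a numerical library would replace the pure-Python loops.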
  • 33. Discussion. Summary. • How to choose the right rule configuration for an incoming query. • Proposes the rule signature, the job span, and several heuristic algorithms to obtain the candidate rule configurations. • Adopts a learned model to choose the rule configuration for each job group. Future work. • Methods to generate the job span & interesting rule configurations. • Use feedback from the execution results to guide future iterations of the configuration search. • Other configurable options in Scope. Discussion. • Papers. • Methods. • Model for each group. • Parameters.
  • 34. Q & A