QUERY OPTIMIZATION
SESSION 8
IR. NIZIRWAN ANWAR, MT
INFORMATICS ENGINEERING STUDY PROGRAM
FACULTY OF COMPUTER SCIENCE
Query Processing in a DDBMS
high-level user query → query processor → low-level data manipulation commands for D-DBMS
Query Processing Components
• Query language that is used
• SQL: “intergalactic dataspeak”
• Query execution methodology
• The steps that one goes through in executing high-level (declarative)
user queries.
• Query optimization
• How do we determine the “best” execution plan?
• We assume a homogeneous D-DBMS
Selecting Alternatives
SELECT ENAME
FROM EMP,ASG
WHERE EMP.ENO = ASG.ENO
AND RESP = "Manager"
Strategy 1
Π_ENAME(σ_{RESP=“Manager” ∧ EMP.ENO=ASG.ENO}(EMP × ASG))
Strategy 2
Π_ENAME(EMP ⋈_ENO (σ_{RESP=“Manager”}(ASG)))
Strategy 2 avoids Cartesian product, so may be “better”
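To make the difference concrete, the sketch below evaluates both strategies on a tiny in-memory instance and counts the intermediate tuples each one materializes. The sample rows and the function names `strategy_1` and `strategy_2` are illustrative assumptions, not part of the slides.

```python
from itertools import product

# Hypothetical sample data: EMP(ENO, ENAME) and ASG(ENO, RESP).
EMP = [("E1", "Ann"), ("E2", "Bob"), ("E3", "Cid"), ("E4", "Dee")]
ASG = [("E1", "Manager"), ("E2", "Analyst"), ("E3", "Engineer"), ("E4", "Manager")]

def strategy_1(emp, asg):
    """Cartesian product first, then selection and projection."""
    pairs = list(product(emp, asg))  # |EMP| * |ASG| intermediate tuples
    names = sorted({e[1] for e, a in pairs
                    if e[0] == a[0] and a[1] == "Manager"})
    return names, len(pairs)

def strategy_2(emp, asg):
    """Selection on ASG first, then join on ENO, then projection."""
    managers = [a for a in asg if a[1] == "Manager"]  # reduced ASG
    manager_enos = {a[0] for a in managers}
    names = sorted({e[1] for e in emp if e[0] in manager_enos})
    return names, len(managers)

names_1, size_1 = strategy_1(EMP, ASG)
names_2, size_2 = strategy_2(EMP, ASG)
print(names_1, size_1)
print(names_2, size_2)
```

Both functions return the same names; only the size of the intermediate result differs, which is the quantity an optimizer tries to keep small.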
What is the Problem?
Fragment placement
• Site 1: ASG1 = σ_{ENO≤“E3”}(ASG)
• Site 2: ASG2 = σ_{ENO>“E3”}(ASG)
• Site 3: EMP1 = σ_{ENO≤“E3”}(EMP)
• Site 4: EMP2 = σ_{ENO>“E3”}(EMP)
• Site 5: Result
Alternative: ship every fragment to Site 5 and evaluate there
• Sites 1 to 4 send ASG1, ASG2, EMP1 and EMP2 to Site 5
• Site 5: result = (EMP1 ∪ EMP2) ⋈_ENO σ_{RESP=“Manager”}(ASG1 ∪ ASG2)
Alternative: select and join at the fragment sites
• Site 1: ASG1' = σ_{RESP=“Manager”}(ASG1), sent to Site 3
• Site 2: ASG2' = σ_{RESP=“Manager”}(ASG2), sent to Site 4
• Site 3: EMP1' = EMP1 ⋈_ENO ASG1', sent to Site 5
• Site 4: EMP2' = EMP2 ⋈_ENO ASG2', sent to Site 5
• Site 5: result = EMP1' ∪ EMP2'
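The gap between shipping all fragments to Site 5 and processing them where they are stored can be estimated with a rough cost sketch. Every number below (cardinalities, the cost per tuple access, the cost per tuple transfer) is an assumption chosen for illustration, and the model is deliberately simple: a full scan for the selection, one pass over both inputs for the join, and one result tuple per manager assignment.

```python
# Assumed cost units and cardinalities; none of these come from the slides.
TUPLE_ACCESS = 1      # cost to read or process one tuple locally
TUPLE_TRANSFER = 10   # cost to ship one tuple between sites
EMP_CARD = 400        # total tuples over EMP1 and EMP2
ASG_CARD = 1000       # total tuples over ASG1 and ASG2
MANAGER_ASG = 20      # ASG tuples with RESP = "Manager"

def cost_ship_everything():
    """Send all four fragments to Site 5, then select and join there."""
    transfer = (EMP_CARD + ASG_CARD) * TUPLE_TRANSFER
    select = ASG_CARD * TUPLE_ACCESS
    join = (EMP_CARD + MANAGER_ASG) * TUPLE_ACCESS
    return transfer + select + join

def cost_process_at_fragment_sites():
    """Select at the ASG sites, join at the EMP sites, ship only results."""
    select = ASG_CARD * TUPLE_ACCESS
    ship_selected = MANAGER_ASG * TUPLE_TRANSFER
    join = (EMP_CARD + MANAGER_ASG) * TUPLE_ACCESS
    ship_result = MANAGER_ASG * TUPLE_TRANSFER
    return select + ship_selected + join + ship_result

cost_central = cost_ship_everything()
cost_local = cost_process_at_fragment_sites()
print(cost_central, cost_local)
```

Under these assumptions the local work is identical in both cases, so the whole difference comes from the number of tuples that cross the network.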
Query Optimization Objectives
• Minimize a cost function
I/O cost + CPU cost + communication cost
These might have different weights in different distributed environments
• Wide area networks
• communication cost may dominate or vary widely
• bandwidth
• speed
• high protocol overhead
• Local area networks
• communication cost not that dominant
• total cost function should be considered
• Can also maximize throughput
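A weighted cost function of this shape can be sketched as follows. The plan names, the component costs and the WAN and LAN weights are all hypothetical values picked to show how the preferred plan can change with the environment.

```python
from dataclasses import dataclass

@dataclass
class PlanCost:
    io: float
    cpu: float
    comm: float

def total_cost(cost, w_io, w_cpu, w_comm):
    """Weighted sum of I/O, CPU and communication cost."""
    return w_io * cost.io + w_cpu * cost.cpu + w_comm * cost.comm

def best_plan(plans, weights):
    """Name of the plan with the lowest weighted cost."""
    return min(plans, key=lambda name: total_cost(plans[name], *weights))

# Hypothetical candidate plans for the same query.
plans = {
    "ship_whole_relation": PlanCost(io=100, cpu=50, comm=1000),
    "filter_then_ship": PlanCost(io=300, cpu=250, comm=100),
}

# Hypothetical weights (w_io, w_cpu, w_comm).
WAN_WEIGHTS = (1, 1, 10)    # communication is expensive
LAN_WEIGHTS = (1, 1, 0.1)   # communication is cheap

print(best_plan(plans, WAN_WEIGHTS))
print(best_plan(plans, LAN_WEIGHTS))
```

With the WAN weights the plan that ships less data wins; with the LAN weights the plan that does less local work wins.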
Complexity of Relational Operations
• Assume
• relations of cardinality n
• sequential scan
Operation                                         Complexity
Select; Project (without duplicate elimination)   O(n)
Project (with duplicate elimination); Group       O(n log n)
Join; Semi-join; Division; Set operators          O(n log n)
Cartesian product                                 O(n²)
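The O(n log n) and O(n²) entries can be illustrated with two small functions. This is a minimal sketch assuming sort-based duplicate elimination; the function names and sample rows are illustrative only.

```python
def project_distinct(rows, column):
    """Projection with duplicate elimination: sort, then one linear pass."""
    values = sorted(row[column] for row in rows)  # O(n log n)
    result = []
    for value in values:                          # O(n)
        if not result or result[-1] != value:
            result.append(value)
    return result

def cartesian_product(r, s):
    """Every pair of tuples: |r| * |s| results, O(n^2) when both have n tuples."""
    return [(a, b) for a in r for b in s]

rows = [{"RESP": "Manager"}, {"RESP": "Analyst"}, {"RESP": "Manager"}]
distinct_resp = project_distinct(rows, "RESP")
pairs = cartesian_product(range(3), range(3))
print(distinct_resp, len(pairs))
```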
Query Optimization Issues – Types Of Optimizers
• Exhaustive search
• Cost-based
• Optimal
• Combinatorial complexity in the number of relations
• Heuristics
• Not optimal
• Regroup common sub-expressions
• Perform selection, projection first
• Replace a join by a series of semijoins
• Reorder operations to reduce intermediate relation size
• Optimize individual operations
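The trade-off between exhaustive search and a heuristic can be sketched on join ordering. The cardinalities, join selectivities and the cost measure (sum of intermediate result sizes for a left-deep order) are assumptions made for this example; the relation PROJ is hypothetical as well.

```python
from itertools import permutations
from math import factorial

# Hypothetical statistics.
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 50}
# Join selectivity per pair; a missing pair means a Cartesian product (1.0).
SEL = {frozenset(("EMP", "ASG")): 1 / 400,
       frozenset(("ASG", "PROJ")): 1 / 50}

def result_size(relations):
    """Estimated size of joining all the given relations."""
    rels = list(relations)
    size = 1.0
    for r in rels:
        size *= CARD[r]
    for i, a in enumerate(rels):
        for b in rels[i + 1:]:
            size *= SEL.get(frozenset((a, b)), 1.0)
    return size

def plan_cost(order):
    """Sum of intermediate result sizes for a left-deep join order."""
    return sum(result_size(order[:k]) for k in range(2, len(order) + 1))

def exhaustive(relations):
    """Try every order: optimal for this cost model, factorial search space."""
    return min(permutations(relations), key=plan_cost)

def greedy(relations):
    """Heuristic: start small, always add the relation giving the smallest result."""
    remaining = set(relations)
    order = [min(remaining, key=CARD.get)]
    remaining.remove(order[0])
    while remaining:
        nxt = min(remaining, key=lambda r: result_size(order + [r]))
        order.append(nxt)
        remaining.remove(nxt)
    return tuple(order)

relations = list(CARD)
print(factorial(len(relations)), "orders examined by exhaustive search")
print(exhaustive(relations), plan_cost(exhaustive(relations)))
print(greedy(relations), plan_cost(greedy(relations)))
```

On three relations the two approaches reach the same cost, but the exhaustive search space grows factorially with the number of relations, while the greedy heuristic gives no optimality guarantee.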
Query Optimization Issues – Optimization Granularity
• Single query at a time
• Cannot use common intermediate results
• Multiple queries at a time
• Efficient if many similar queries
• Decision space is much larger
Query Optimization Issues – Optimization Timing
• Static
• Compilation ⇒ optimize prior to execution
• Difficult to estimate the size of intermediate results ⇒ error propagation
• Can amortize over many executions
• R*
• Dynamic
• Run time optimization
• Exact information on the intermediate relation sizes
• Have to reoptimize for multiple executions
• Distributed INGRES
• Hybrid
• Compile using a static algorithm
• If the error in the estimated sizes exceeds a threshold, reoptimize at run time
• Mermaid
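The hybrid rule can be sketched as a check that runs after each step of a statically compiled plan. The step names, sizes, threshold and the use of relative error are assumptions for illustration; this is not a description of how any of the systems named above implement it.

```python
def relative_error(estimated, actual):
    """Relative deviation of the observed size from the compile-time estimate."""
    return abs(actual - estimated) / max(estimated, 1)

def first_reoptimization_point(steps, threshold):
    """Return the name of the first step whose size estimate is off by more
    than `threshold`, or None if the static plan can run to completion."""
    for name, estimated, actual in steps:
        if relative_error(estimated, actual) > threshold:
            return name
    return None

# Hypothetical plan steps: (name, estimated result size, observed result size).
steps = [("select ASG", 20, 22), ("join EMP", 20, 180)]
trigger = first_reoptimization_point(steps, threshold=0.5)
print(trigger)
```

A low threshold behaves more like dynamic optimization, while a very high one never triggers and reduces to the static case.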
Query Optimization Issues – Statistics
• Relation
• Cardinality
• Size of a tuple
• Fraction of tuples participating in a join with another relation
• Attribute
• Cardinality of domain
• Actual number of distinct values
• Common assumptions
• Independence between different attribute values
• Uniform distribution of attribute values within their domain
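These statistics and assumptions translate into simple size estimates. The sketch below uses common textbook-style formulas under uniformity and independence; the function names, the attribute DUR and all numbers are assumptions for illustration.

```python
def estimate_selection(cardinality, distinct_counts):
    """Size of a conjunction of equality predicates.

    Uniformity: each predicate keeps 1 / (number of distinct values).
    Independence: the selectivities of different attributes multiply.
    """
    size = float(cardinality)
    for distinct in distinct_counts:
        size /= distinct
    return size

def estimate_equijoin(card_r, card_s, distinct_r, distinct_s):
    """Size of R join S on one attribute, assuming uniformly distributed values."""
    return card_r * card_s / max(distinct_r, distinct_s)

# Hypothetical statistics: ASG has 1000 tuples, RESP has 5 distinct values,
# DUR has 10 distinct values, and ENO has 400 distinct values in EMP and ASG.
one_predicate = estimate_selection(1000, [5])
two_predicates = estimate_selection(1000, [5, 10])
join_size = estimate_equijoin(400, 1000, 400, 400)
print(one_predicate, two_predicates, join_size)
```

When attribute values are skewed or correlated, these estimates can be far off, which is the source of the error propagation mentioned for static optimization.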
Query Optimization Issues – Decision Sites
• Centralized
• Single site determines the “best” schedule
• Simple
• Need knowledge about the entire distributed database
• Distributed
• Cooperation among sites to determine the schedule
• Need only local information
• Cost of cooperation
• Hybrid
• One site determines the global schedule
• Each site optimizes the local subqueries
Query Optimization Issues – Network Topology
• Wide area networks (WAN) – point-to-point
• Characteristics
• Low bandwidth
• Low speed
• High protocol overhead
• Communication cost will dominate; ignore all other cost factors
• Global schedule to minimize communication cost
• Local schedules according to centralized query optimization
• Local area networks (LAN)
• Communication cost not that dominant
• Total cost function should be considered
• Broadcasting can be exploited (joins)
• Special algorithms exist for star networks
Distributed Query Processing Methodology
Input: calculus query on distributed relations
CONTROL SITE
• Query decomposition (uses the GLOBAL SCHEMA) → algebraic query on distributed relations
• Data localization (uses the FRAGMENT SCHEMA) → fragment query
• Global optimization (uses STATS ON FRAGMENTS) → optimized fragment query with communication operations
LOCAL SITES
• Local optimization (uses the LOCAL SCHEMAS) → optimized local queries
Query Optimization Process
Input query
→ Search space generation (uses transformation rules)
→ Equivalent QEPs
→ Search strategy (uses the cost model)
→ Best QEP
