Join Ordering in Fragment Queries
By Shehab Uddin and Ifzal Hussain
Join Ordering in Fragment Queries
• Join ordering is important in a centralized DB, and even more important in a distributed DB, where joins between fragments stored at different sites add communication cost.
Join Ordering in Fragment Queries (cont.)
• R → site j: “relation R is transferred to site j”
• 1. EMP → site 2; site 2 computes EMP’; EMP’ → site 3; site 3 computes the result.
• 2. ASG → site 1; site 1 computes EMP’; EMP’ → site 3; site 3 computes the result.
• 3. ASG → site 3; site 3 computes ASG’; ASG’ → site 1; site 1 computes the result.
• 4. PROJ → site 2; site 2 computes PROJ’; PROJ’ → site 1; site 1 computes the result.
• 5. EMP → site 2; PROJ → site 2; site 2 computes the join.
Join Ordering in Fragment Queries (cont.)
• Join ordering
• Distributed INGRES
• System R*
• Semijoin ordering
• SDD-1
Join Ordering
• Consider two relations only
• R ⋈ S
• Transfer the smaller relation to the site of the larger one
• Multiple relations are more difficult because there are too many alternatives
• One option: compute the cost of all alternatives and select the best one
• This requires the sizes of the intermediate relations, which are difficult to estimate
• The other option: use heuristics
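The two-relation rule and the growth in alternatives can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the sample sizes are assumptions, and counting permutations is only one rough way to measure how many join orders exist.

```python
from math import factorial


def relation_to_ship(sizes):
    """Given {name: size} for the two join operands, return the smaller one.

    Shipping the smaller relation moves the least data between sites.
    """
    if len(sizes) != 2:
        raise ValueError("expected exactly two relations")
    return min(sizes, key=sizes.get)


def count_linear_orders(n):
    """Number of ways to order n relations in a linear join sequence (n!)."""
    return factorial(n)


# Hypothetical sizes in tuples.
print(relation_to_ship({"R": 1000, "S": 200}))
print([count_linear_orders(n) for n in (2, 3, 5, 10)])
```

Even this crude count shows why exhaustive costing stops being practical as the number of relations grows.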
Join Ordering - Example
• Consider: PROJ ⋈_PNO ASG ⋈_ENO EMP, with EMP stored at site 1, ASG at site 2, and PROJ at site 3
Join Ordering – Example (cont.)
• Execution alternatives:
• 1. EMP → Site 2
• Site 2 computes EMP’ = EMP ⋈ ASG
• EMP’ → Site 3
• Site 3 computes EMP’ ⋈ PROJ
• 2. ASG → Site 1
• Site 1 computes EMP’ = EMP ⋈ ASG
• EMP’ → Site 3
• Site 3 computes EMP’ ⋈ PROJ
Join Ordering – Example (cont.)
• 3. ASG → Site 3
• Site 3 computes ASG’ = ASG ⋈ PROJ
• ASG’ → Site 1
• Site 1 computes ASG’ ⋈ EMP
• 4. PROJ → Site 2
• Site 2 computes PROJ’ = PROJ ⋈ ASG
• PROJ’ → Site 1
• Site 1 computes PROJ’ ⋈ EMP
Join Ordering – Example (cont.)
• 5. EMP → Site 2
• PROJ → Site 2
• Site 2 computes EMP ⋈ PROJ ⋈ ASG
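To compare the five alternatives, one can add up the tuples each strategy ships between sites. The sketch below does this under loud assumptions: every size is a made-up number, cost is counted in transferred tuples only (local processing is ignored), and the parallel transfers in alternative 5 are not modelled.

```python
def transfer_costs(sizes):
    """Total tuples shipped by each of the five execution alternatives."""
    emp, asg, proj = sizes["EMP"], sizes["ASG"], sizes["PROJ"]
    emp_asg = sizes["EMP_ASG"]    # size of EMP' = EMP join ASG
    asg_proj = sizes["ASG_PROJ"]  # size of ASG' = PROJ' = ASG join PROJ
    return {
        1: emp + emp_asg,    # EMP -> site 2, then EMP' -> site 3
        2: asg + emp_asg,    # ASG -> site 1, then EMP' -> site 3
        3: asg + asg_proj,   # ASG -> site 3, then ASG' -> site 1
        4: proj + asg_proj,  # PROJ -> site 2, then PROJ' -> site 1
        5: emp + proj,       # EMP -> site 2 and PROJ -> site 2
    }


# Hypothetical cardinalities, chosen only for illustration.
sizes = {"EMP": 400, "ASG": 1000, "PROJ": 200,
         "EMP_ASG": 1000, "ASG_PROJ": 1000}
costs = transfer_costs(sizes)
best = min(costs, key=costs.get)
print(costs, "cheapest alternative:", best)
```

The ranking depends entirely on the intermediate sizes, which is why estimating them matters so much.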
Semijoin Algorithms
• Shortcoming of the join method
• The entire relation is transferred, even though it may contain tuples that do not participate in the join
• A semijoin reduces the size of the operand relation to be transferred
• A semijoin is beneficial if the cost of producing it and sending it to the other site is less than the cost of sending the whole relation.
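A small Python sketch can show how a semijoin drops the tuples that would never match. Relations are modelled as lists of dicts, the join attribute is called A, and the sample data is hypothetical; none of this comes from the original slides.

```python
def semijoin(r, s, attr="A"):
    """Keep only the tuples of r whose attr value also appears in s."""
    keys = {row[attr] for row in s}
    return [row for row in r if row[attr] in keys]


def join(r, s, attr="A"):
    """Equijoin of r and s on attr (nested loops, for clarity)."""
    return [{**x, **y} for x in r for y in s if x[attr] == y[attr]]


# Hypothetical data: R lives at site 1, S at site 2.
R = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"},
     {"A": 3, "B": "z"}, {"A": 4, "B": "w"}]
S = [{"A": 2, "C": "p"}, {"A": 7, "C": "q"}]

reduced = semijoin(R, S)
print(len(R), len(reduced))            # tuples before and after reduction
print(join(reduced, S) == join(R, S))  # the join result is unchanged
```

Only the reduced relation has to cross the network, yet joining it with S gives the same answer as joining the full R.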
Semijoin Algorithms (cont.)
• Consider the join of two relations
• R[A] (located at site 1)
• S[A] (located at site 2)
• Alternatives
• 1. Do the join R ⋈_A S
• 2. Perform one of the semijoin equivalents
R ⋈_A S ⇔ (R ⋉_A S) ⋈_A S
        ⇔ R ⋈_A (S ⋉_A R)
        ⇔ (R ⋉_A S) ⋈_A (S ⋉_A R)
Semijoin Algorithms (cont.)
• Perform the join
• Send R to site 2
• Site 2 computes R ⋈_A S
• Consider the semijoin (R ⋉_A S) ⋈_A S
• S’ = Π_A(S)
• S’ → Site 1
• Site 1 computes R’ = R ⋉_A S’
• R’ → Site 2
• Site 2 computes R’ ⋈_A S
• Semijoin is better if size(Π_A(S)) + size(R ⋉_A S) < size(R)
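The comparison between shipping R whole and running the semijoin program can be sketched as follows. This is an illustration under stated assumptions: sizes are counted in tuples (so a projected A value counts the same as a full tuple, which overstates the semijoin cost), and the function names and sample data are hypothetical.

```python
def semijoin_plan_transfer(r, s, attr="A"):
    """Tuples shipped by the semijoin program for R (site 1) and S (site 2)."""
    s_proj = {row[attr] for row in s}                       # S' = projection of S on A, at site 2
    r_reduced = [row for row in r if row[attr] in s_proj]   # R' = R semijoin S', at site 1
    return len(s_proj) + len(r_reduced)                     # S' -> site 1, then R' -> site 2


def join_plan_transfer(r):
    """Tuples shipped when R is sent whole to site 2."""
    return len(r)


# Hypothetical data: six tuples in R, only one of which matches S.
R = [{"A": i, "B": f"b{i}"} for i in range(1, 7)]
S = [{"A": 2, "C": "p"}, {"A": 2, "C": "q"}, {"A": 9, "C": "r"}]

semi_cost = semijoin_plan_transfer(R, S)
join_cost = join_plan_transfer(R)
semijoin_is_better = semi_cost < join_cost
print(semi_cost, join_cost, semijoin_is_better)
```

When most tuples of R match S, the extra round trip for S’ is wasted and the plain join wins, so the inequality has to be checked per query.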
