SlideShare a Scribd company logo
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/1
Outline
• Introduction
• Background
• Distributed Database Design
• Database Integration
• Semantic Data Control
• Distributed Query Processing
➡ Overview
➡ Query decomposition and localization
➡ Distributed query optimization
• Multidatabase Query Processing
• Distributed Transaction Management
• Data Replication
• Parallel Database Systems
• Distributed Object DBMS
• Peer-to-Peer Data Management
• Web Data Management
• Current Issues
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/2
Query Processing in a DDBMS
high level user query
query
processor
Low-level data manipulation
commands for D-DBMS
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/3
Query Processing Components
•Query language that is used
➡ SQL: “intergalactic dataspeak”
•Query execution methodology
➡ The steps that one goes through in executing high-level (declarative) user
queries.
•Query optimization
➡ How do we determine the “best” execution plan?
•We assume a homogeneous D-DBMS
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/4
SELECT ENAME
FROM EMP,ASG
WHERE EMP.ENO = ASG.ENO
AND RESP = "Manager"
Strategy 1
ENAME(RESP=“Manager”EMP.ENO=ASG.ENO(EMP×ASG))
Strategy 2
 ENAME(EMP ⋈ENO (RESP=“Manager” (ASG))
Strategy 2 avoids Cartesian product, so may be “better”
Selecting Alternatives
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/5
What is the Problem?
Site 1 Site 2 Site 3 Site 4 Site 5
EMP1= ENO≤“E3”(EMP) EMP2= ENO>“E3”(EMP)
ASG2= ENO>“E3”(ASG)
ASG1=ENO≤“E3”(ASG) Result
Site 5
Site 1 Site 2 Site 3 Site 4
ASG1 EMP1 EMP2
ASG2
Site 4
Site 3
Site 1 Site 2
Site 5
EMP’
1=EMP1 ⋈ENO ASG’
1
'
2
EMP
EMP
result 
= '
1
1
Manager"
"
RESP
1 ASG
σ
ASG =
=
'
2
Manager"
"
RESP
2 ASG
σ
ASG =
=
'
'
1
ASG '
2
ASG
'
1
EMP '
2
EMP
result= (EMP1 × EMP2)⋈ENOσRESP=“Manager”(ASG1× ASG2)
EMP’
2=EMP2 ⋈ENO ASG’
2
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/6
Cost of Alternatives
•Assume
➡ size(EMP) = 400, size(ASG) = 1000
➡ tuple access cost = 1 unit; tuple transfer cost = 10 units
•Strategy 1
➡ produce ASG': (10+10)  tuple access cost 20
➡ transfer ASG' to the sites of EMP: (10+10)  tuple transfer cost 200
➡ produce EMP': (10+10)  tuple access cost  2 40
➡ transfer EMP' to result site: (10+10)  tuple transfer cost 200
Total Cost 460
•Strategy 2
➡ transfer EMP to site 5: 400  tuple transfer cost 4,000
➡ transfer ASG to site 5: 1000  tuple transfer cost 10,000
➡ produce ASG': 1000  tuple access cost 1,000
➡ join EMP and ASG': 400  20  tuple access cost 8,000
Total Cost 23,000
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/7
Query Optimization Objectives
•Minimize a cost function
I/O cost + CPU cost + communication cost
These might have different weights in different distributed environments
•Wide area networks
➡ communication cost may dominate or vary much
✦ bandwidth
✦ speed
✦ high protocol overhead
•Local area networks
➡ communication cost not that dominant
➡ total cost function should be considered
•Can also maximize throughput
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/8
Complexity of Relational Operations
•Assume
➡ relations of cardinality n
➡ sequential scan
Operation Complexity
Select
Project
(without duplicate elimination)
O(n)
Project
(with duplicate elimination)
Group
O(n  log n)
Join
Semi-join
Division
Set Operators
O(n  log n)
Cartesian Product O(n2)
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/9
Query Optimization Issues – Types
Of Optimizers
• Exhaustive search
➡ Cost-based
➡ Optimal
➡ Combinatorial complexity in the number of relations
• Heuristics
➡ Not optimal
➡ Regroup common sub-expressions
➡ Perform selection, projection first
➡ Replace a join by a series of semijoins
➡ Reorder operations to reduce intermediate relation size
➡ Optimize individual operations
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/10
Query Optimization Issues –
Optimization Granularity
• Single query at a time
➡ Cannot use common intermediate results
• Multiple queries at a time
➡ Efficient if many similar queries
➡ Decision space is much larger
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/11
Query Optimization Issues –
Optimization Timing
•Static
➡ Compilation ➔ optimize prior to the execution
➡ Difficult to estimate the size of the intermediate resultserror
propagation
➡ Can amortize over many executions
➡ R*
•Dynamic
➡ Run time optimization
➡ Exact information on the intermediate relation sizes
➡ Have to reoptimize for multiple executions
➡ Distributed INGRES
•Hybrid
➡ Compile using a static algorithm
➡ If the error in estimate sizes > threshold, reoptimize at run time
➡ Mermaid
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/12
Query Optimization Issues –
Statistics
•Relation
➡ Cardinality
➡ Size of a tuple
➡ Fraction of tuples participating in a join with another relation
•Attribute
➡ Cardinality of domain
➡ Actual number of distinct values
•Common assumptions
➡ Independence between different attribute values
➡ Uniform distribution of attribute values within their domain
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/13
Query Optimization Issues –
Decision Sites
•Centralized
➡ Single site determines the “best” schedule
➡ Simple
➡ Need knowledge about the entire distributed database
•Distributed
➡ Cooperation among sites to determine the schedule
➡ Need only local information
➡ Cost of cooperation
•Hybrid
➡ One site determines the global schedule
➡ Each site optimizes the local subqueries
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/14
Query Optimization Issues –
Network Topology
• Wide area networks (WAN) – point-to-point
➡ Characteristics
✦ Low bandwidth
✦ Low speed
✦ High protocol overhead
➡ Communication cost will dominate; ignore all other cost factors
➡ Global schedule to minimize communication cost
➡ Local schedules according to centralized query optimization
• Local area networks (LAN)
➡ Communication cost not that dominant
➡ Total cost function should be considered
➡ Broadcasting can be exploited (joins)
➡ Special algorithms exist for star networks
Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/15
Distributed Query Processing
Methodology
Calculus Query on Distributed Relations
CONTROL
SITE
LOCAL
SITES
Query
Decomposition
Data
Localization
AlgebraicQuery on Distributed
Relations
Global
Optimization
Fragment Query
Local
Optimization
Optimized Fragment Query
with CommunicationOperations
Optimized Local Queries
GLOBAL
SCHEMA
FRAGMENT
SCHEMA
STATS ON
FRAGMENTS
LOCAL
SCHEMAS

More Related Content

Similar to 6-Query_Intro (5).pdf

Designing analytics for big data
Designing analytics for big dataDesigning analytics for big data
Designing analytics for big data
J Singh
 
HFM vs Essbase BSO: A Comparative Anatomy
HFM vs Essbase BSO: A Comparative AnatomyHFM vs Essbase BSO: A Comparative Anatomy
HFM vs Essbase BSO: A Comparative Anatomy
aa026593
 
Database , 4 Data Integration
Database , 4 Data IntegrationDatabase , 4 Data Integration
Database , 4 Data IntegrationAli Usman
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarKognitio
 
1 introduction DDBS
1 introduction DDBS1 introduction DDBS
1 introduction DDBS
naimanighat
 
Database , 1 Introduction
 Database , 1 Introduction Database , 1 Introduction
Database , 1 IntroductionAli Usman
 
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Sumeet Singh
 
Architecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for successArchitecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for success
DataWorks Summit
 
Ledingkart Meetup #4: Data pipeline @ lk
Ledingkart Meetup #4: Data pipeline @ lkLedingkart Meetup #4: Data pipeline @ lk
Ledingkart Meetup #4: Data pipeline @ lk
Mukesh Singh
 
Database , 17 Web
Database , 17 WebDatabase , 17 Web
Database , 17 WebAli Usman
 
Database , 13 Replication
Database , 13 ReplicationDatabase , 13 Replication
Database , 13 ReplicationAli Usman
 
Polyglot Persistence - Two Great Tastes That Taste Great Together
Polyglot Persistence - Two Great Tastes That Taste Great TogetherPolyglot Persistence - Two Great Tastes That Taste Great Together
Polyglot Persistence - Two Great Tastes That Taste Great Together
John Wood
 
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra MigrationInfosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
DataStax Academy
 
try
trytry
BigData Hadoop
BigData Hadoop BigData Hadoop
BigData Hadoop
Kumari Surabhi
 
Tools and best practices for sustainable software
Tools and best practices for sustainable softwareTools and best practices for sustainable software
Tools and best practices for sustainable software
Green Software Development
 
Tools and best practices for sustainable software.pdf
Tools and best practices for sustainable software.pdfTools and best practices for sustainable software.pdf
Tools and best practices for sustainable software.pdf
GeorgMolz
 
Tools and best practices for sustainable software.pdf
Tools and best practices for sustainable software.pdfTools and best practices for sustainable software.pdf
Tools and best practices for sustainable software.pdf
GeorgMolz
 
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Caserta
 
Manjeet Singh.pptx
Manjeet Singh.pptxManjeet Singh.pptx
Manjeet Singh.pptx
RAMCHANDRASHARMA7
 

Similar to 6-Query_Intro (5).pdf (20)

Designing analytics for big data
Designing analytics for big dataDesigning analytics for big data
Designing analytics for big data
 
HFM vs Essbase BSO: A Comparative Anatomy
HFM vs Essbase BSO: A Comparative AnatomyHFM vs Essbase BSO: A Comparative Anatomy
HFM vs Essbase BSO: A Comparative Anatomy
 
Database , 4 Data Integration
Database , 4 Data IntegrationDatabase , 4 Data Integration
Database , 4 Data Integration
 
Meta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinarMeta scale kognitio hadoop webinar
Meta scale kognitio hadoop webinar
 
1 introduction DDBS
1 introduction DDBS1 introduction DDBS
1 introduction DDBS
 
Database , 1 Introduction
 Database , 1 Introduction Database , 1 Introduction
Database , 1 Introduction
 
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
Hadoop Summit Brussels 2015: Architecting a Scalable Hadoop Platform - Top 10...
 
Architecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for successArchitecting a Scalable Hadoop Platform: Top 10 considerations for success
Architecting a Scalable Hadoop Platform: Top 10 considerations for success
 
Ledingkart Meetup #4: Data pipeline @ lk
Ledingkart Meetup #4: Data pipeline @ lkLedingkart Meetup #4: Data pipeline @ lk
Ledingkart Meetup #4: Data pipeline @ lk
 
Database , 17 Web
Database , 17 WebDatabase , 17 Web
Database , 17 Web
 
Database , 13 Replication
Database , 13 ReplicationDatabase , 13 Replication
Database , 13 Replication
 
Polyglot Persistence - Two Great Tastes That Taste Great Together
Polyglot Persistence - Two Great Tastes That Taste Great TogetherPolyglot Persistence - Two Great Tastes That Taste Great Together
Polyglot Persistence - Two Great Tastes That Taste Great Together
 
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra MigrationInfosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
Infosys Ltd: Performance Tuning - A Key to Successful Cassandra Migration
 
try
trytry
try
 
BigData Hadoop
BigData Hadoop BigData Hadoop
BigData Hadoop
 
Tools and best practices for sustainable software
Tools and best practices for sustainable softwareTools and best practices for sustainable software
Tools and best practices for sustainable software
 
Tools and best practices for sustainable software.pdf
Tools and best practices for sustainable software.pdfTools and best practices for sustainable software.pdf
Tools and best practices for sustainable software.pdf
 
Tools and best practices for sustainable software.pdf
Tools and best practices for sustainable software.pdfTools and best practices for sustainable software.pdf
Tools and best practices for sustainable software.pdf
 
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
Stinger Initiative: Leveraging Hive & Yarn for High-Performance/Interactive Q...
 
Manjeet Singh.pptx
Manjeet Singh.pptxManjeet Singh.pptx
Manjeet Singh.pptx
 

Recently uploaded

Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
balafet
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Enterprise Wired
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
jerlynmaetalle
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
axoqas
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
v3tuleee
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
TravisMalana
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
u86oixdj
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
jerlynmaetalle
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
haila53
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
manishkhaire30
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
mzpolocfi
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
John Andrews
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
axoqas
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
oz8q3jxlp
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
Timothy Spann
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
Subhajit Sahu
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
vikram sood
 

Recently uploaded (20)

Machine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptxMachine learning and optimization techniques for electrical drives.pptx
Machine learning and optimization techniques for electrical drives.pptx
 
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfUnleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdf
 
Influence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business PlanInfluence of Marketing Strategy and Market Competition on Business Plan
Influence of Marketing Strategy and Market Competition on Business Plan
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
做(mqu毕业证书)麦考瑞大学毕业证硕士文凭证书学费发票原版一模一样
 
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理一比一原版(UofS毕业证书)萨省大学毕业证如何办理
一比一原版(UofS毕业证书)萨省大学毕业证如何办理
 
Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)Malana- Gimlet Market Analysis (Portfolio 2)
Malana- Gimlet Market Analysis (Portfolio 2)
 
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
原版制作(swinburne毕业证书)斯威本科技大学毕业证毕业完成信一模一样
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...The affect of service quality and online reviews on customer loyalty in the E...
The affect of service quality and online reviews on customer loyalty in the E...
 
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdfCh03-Managing the Object-Oriented Information Systems Project a.pdf
Ch03-Managing the Object-Oriented Information Systems Project a.pdf
 
Learn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queriesLearn SQL from basic queries to Advance queries
Learn SQL from basic queries to Advance queries
 
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
一比一原版(Dalhousie毕业证书)达尔豪斯大学毕业证如何办理
 
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23...
 
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
哪里卖(usq毕业证书)南昆士兰大学毕业证研究生文凭证书托福证书原版一模一样
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
一比一原版(Deakin毕业证书)迪肯大学毕业证如何办理
 
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Dat...
 
Adjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTESAdjusting primitives for graph : SHORT REPORT / NOTES
Adjusting primitives for graph : SHORT REPORT / NOTES
 
Global Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headedGlobal Situational Awareness of A.I. and where its headed
Global Situational Awareness of A.I. and where its headed
 

6-Query_Intro (5).pdf

  • 1. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/1 Outline • Introduction • Background • Distributed Database Design • Database Integration • Semantic Data Control • Distributed Query Processing ➡ Overview ➡ Query decomposition and localization ➡ Distributed query optimization • Multidatabase Query Processing • Distributed Transaction Management • Data Replication • Parallel Database Systems • Distributed Object DBMS • Peer-to-Peer Data Management • Web Data Management • Current Issues
  • 2. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/2 Query Processing in a DDBMS high level user query query processor Low-level data manipulation commands for D-DBMS
  • 3. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/3 Query Processing Components •Query language that is used ➡ SQL: “intergalactic dataspeak” •Query execution methodology ➡ The steps that one goes through in executing high-level (declarative) user queries. •Query optimization ➡ How do we determine the “best” execution plan? •We assume a homogeneous D-DBMS
  • 4. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/4 SELECT ENAME FROM EMP,ASG WHERE EMP.ENO = ASG.ENO AND RESP = "Manager" Strategy 1 ENAME(RESP=“Manager”EMP.ENO=ASG.ENO(EMP×ASG)) Strategy 2  ENAME(EMP ⋈ENO (RESP=“Manager” (ASG)) Strategy 2 avoids Cartesian product, so may be “better” Selecting Alternatives
  • 5. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/5 What is the Problem? Site 1 Site 2 Site 3 Site 4 Site 5 EMP1= ENO≤“E3”(EMP) EMP2= ENO>“E3”(EMP) ASG2= ENO>“E3”(ASG) ASG1=ENO≤“E3”(ASG) Result Site 5 Site 1 Site 2 Site 3 Site 4 ASG1 EMP1 EMP2 ASG2 Site 4 Site 3 Site 1 Site 2 Site 5 EMP’ 1=EMP1 ⋈ENO ASG’ 1 ' 2 EMP EMP result  = ' 1 1 Manager" " RESP 1 ASG σ ASG = = ' 2 Manager" " RESP 2 ASG σ ASG = = ' ' 1 ASG ' 2 ASG ' 1 EMP ' 2 EMP result= (EMP1 × EMP2)⋈ENOσRESP=“Manager”(ASG1× ASG2) EMP’ 2=EMP2 ⋈ENO ASG’ 2
  • 6. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/6 Cost of Alternatives •Assume ➡ size(EMP) = 400, size(ASG) = 1000 ➡ tuple access cost = 1 unit; tuple transfer cost = 10 units •Strategy 1 ➡ produce ASG': (10+10)  tuple access cost 20 ➡ transfer ASG' to the sites of EMP: (10+10)  tuple transfer cost 200 ➡ produce EMP': (10+10)  tuple access cost  2 40 ➡ transfer EMP' to result site: (10+10)  tuple transfer cost 200 Total Cost 460 •Strategy 2 ➡ transfer EMP to site 5: 400  tuple transfer cost 4,000 ➡ transfer ASG to site 5: 1000  tuple transfer cost 10,000 ➡ produce ASG': 1000  tuple access cost 1,000 ➡ join EMP and ASG': 400  20  tuple access cost 8,000 Total Cost 23,000
  • 7. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/7 Query Optimization Objectives •Minimize a cost function I/O cost + CPU cost + communication cost These might have different weights in different distributed environments •Wide area networks ➡ communication cost may dominate or vary much ✦ bandwidth ✦ speed ✦ high protocol overhead •Local area networks ➡ communication cost not that dominant ➡ total cost function should be considered •Can also maximize throughput
  • 8. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/8 Complexity of Relational Operations •Assume ➡ relations of cardinality n ➡ sequential scan Operation Complexity Select Project (without duplicate elimination) O(n) Project (with duplicate elimination) Group O(n  log n) Join Semi-join Division Set Operators O(n  log n) Cartesian Product O(n2)
  • 9. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/9 Query Optimization Issues – Types Of Optimizers • Exhaustive search ➡ Cost-based ➡ Optimal ➡ Combinatorial complexity in the number of relations • Heuristics ➡ Not optimal ➡ Regroup common sub-expressions ➡ Perform selection, projection first ➡ Replace a join by a series of semijoins ➡ Reorder operations to reduce intermediate relation size ➡ Optimize individual operations
  • 10. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/10 Query Optimization Issues – Optimization Granularity • Single query at a time ➡ Cannot use common intermediate results • Multiple queries at a time ➡ Efficient if many similar queries ➡ Decision space is much larger
  • 11. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/11 Query Optimization Issues – Optimization Timing •Static ➡ Compilation ➔ optimize prior to the execution ➡ Difficult to estimate the size of the intermediate resultserror propagation ➡ Can amortize over many executions ➡ R* •Dynamic ➡ Run time optimization ➡ Exact information on the intermediate relation sizes ➡ Have to reoptimize for multiple executions ➡ Distributed INGRES •Hybrid ➡ Compile using a static algorithm ➡ If the error in estimate sizes > threshold, reoptimize at run time ➡ Mermaid
  • 12. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/12 Query Optimization Issues – Statistics •Relation ➡ Cardinality ➡ Size of a tuple ➡ Fraction of tuples participating in a join with another relation •Attribute ➡ Cardinality of domain ➡ Actual number of distinct values •Common assumptions ➡ Independence between different attribute values ➡ Uniform distribution of attribute values within their domain
  • 13. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/13 Query Optimization Issues – Decision Sites •Centralized ➡ Single site determines the “best” schedule ➡ Simple ➡ Need knowledge about the entire distributed database •Distributed ➡ Cooperation among sites to determine the schedule ➡ Need only local information ➡ Cost of cooperation •Hybrid ➡ One site determines the global schedule ➡ Each site optimizes the local subqueries
  • 14. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/14 Query Optimization Issues – Network Topology • Wide area networks (WAN) – point-to-point ➡ Characteristics ✦ Low bandwidth ✦ Low speed ✦ High protocol overhead ➡ Communication cost will dominate; ignore all other cost factors ➡ Global schedule to minimize communication cost ➡ Local schedules according to centralized query optimization • Local area networks (LAN) ➡ Communication cost not that dominant ➡ Total cost function should be considered ➡ Broadcasting can be exploited (joins) ➡ Special algorithms exist for star networks
  • 15. Distributed DBMS © M. T. Özsu & P. Valduriez Ch.6/15 Distributed Query Processing Methodology Calculus Query on Distributed Relations CONTROL SITE LOCAL SITES Query Decomposition Data Localization AlgebraicQuery on Distributed Relations Global Optimization Fragment Query Local Optimization Optimized Fragment Query with CommunicationOperations Optimized Local Queries GLOBAL SCHEMA FRAGMENT SCHEMA STATS ON FRAGMENTS LOCAL SCHEMAS