Load Balancing in
Distributed Database
Md. Shamsur Rahim 14-98181-3 Student, MScCS, AIUB
AZM Ehtesham Chowdhury 15-98451-1 Student, MScCS, AIUB
Saiful Akhter 15-98502-1 Student, MScCS, AIUB
Load Balancing:
 Load balancing means distributing transactions and queries among the different nodes.
 The goal is to maximize throughput.
 Parallel Execution Problems
 1. Initialization
 2. Interference
 3. Skew
Parallel Execution Problems: Initialization
 Initialization is necessary before parallel execution can start.
 This sequential step includes:
 Process/thread creation and initialization
 Communication initialization, etc.
 Its duration is proportional to the degree of parallelism.
 The degree of parallelism should be fixed according to query complexity.
 Formula for the response time of an operator:
$ResponseTime = a \cdot n + \frac{c \cdot N}{n}$
where N = number of tuples, c = average processing time per tuple, n = number of processors, and a = initialization time per processor.
 From this equation, the optimal number of processors to allocate ($n_0$) and the maximal achievable speedup ($S_0$) can be derived:
$n_0 = \sqrt{\frac{c \cdot N}{a}}$
$S_0 = \frac{n_0}{2}$
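The response-time formula can be checked numerically. The sketch below is only an illustration: the function names and the sample values (in milliseconds) are assumptions and do not come from the slides. It evaluates the response time a*n + c*N/n, the optimal number of processors sqrt(c*N/a), and the maximal speedup, which is half of that optimal number.

```python
import math


def response_time(a: float, c: float, N: int, n: int) -> float:
    """Response time of one operator on n processors: a*n + c*N/n."""
    return a * n + (c * N) / n


def optimal_processors(a: float, c: float, N: int) -> float:
    """Value of n that minimizes response_time: sqrt(c*N/a)."""
    return math.sqrt(c * N / a)


def max_speedup(a: float, c: float, N: int) -> float:
    """Speedup at the optimum relative to the sequential time c*N."""
    return optimal_processors(a, c, N) / 2


# Hypothetical values in milliseconds, chosen only for illustration.
a, c, N = 4.0, 0.25, 160_000
n_opt = optimal_processors(a, c, N)
best_time = response_time(a, c, N, round(n_opt))
print(n_opt, best_time, (c * N) / best_time, max_speedup(a, c, N))
```

With these values the optimum is 100 processors. Using either 50 or 200 processors gives a longer response time, which shows why adding processors beyond the optimum hurts: the initialization term a*n starts to dominate.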
Parallel Execution Problems: Interference
 Parallel execution can be slowed down by interference.
 Interference occurs when several processors simultaneously access the same resource, which may be:
 Hardware
 Solution: duplicate the shared resource
 Software
 Solution: partition the shared resource into several independent resources
Parallel Execution Problems: Skew
 With intra-operator parallelism, variation in partition size leads to load imbalance; this problem is known as data skew.
 Classification of Skew:
 Attribute Value Skew : inherent in the dataset
 e.g., there are more citizens in Paris than in Waterloo
 Tuple Placement Skew: introduced when the data are initially partitioned
 e.g., with range partitioning
 Selectivity Skew
 introduced when there is variation in the selectivity of select predicates on each node
 Redistribution Skew
 occurs in the redistribution step between two operators.
 Join Product Skew
 occurs because the join selectivity may vary between nodes
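Skew can be made concrete by counting tuples per partition. The sketch below assumes a tiny hypothetical relation keyed by city name and two hand-picked range boundaries; `range_partition_sizes` and `skew_ratio` are illustrative helpers, not part of any system named in the slides. Because one city dominates the data (attribute value skew), range partitioning puts most tuples on one node (tuple placement skew).

```python
import bisect


def range_partition_sizes(keys, boundaries):
    """Count how many keys fall into each range delimited by sorted boundaries."""
    sizes = [0] * (len(boundaries) + 1)
    for key in keys:
        sizes[bisect.bisect_left(boundaries, key)] += 1
    return sizes


def skew_ratio(sizes):
    """Largest partition divided by the average partition size (1.0 = balanced)."""
    return max(sizes) / (sum(sizes) / len(sizes))


# Hypothetical relation: one row per citizen, keyed by city name.
cities = ["Paris"] * 6 + ["Berlin"] * 3 + ["Lyon"] * 2 + ["Waterloo"]
sizes = range_partition_sizes(cities, ["H", "Q"])  # three alphabetical ranges
print(sizes, skew_ratio(sizes))
```

A ratio of 1.0 would mean perfectly balanced partitions; here the middle range holds twice the average load, so the processor that owns it determines the response time of the operator.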
Inter-Query Parallelism
 A form of parallelism in which many different queries or transactions are executed in parallel with one another on many processors.
 Advantages:
 Increases transaction throughput.
 Scales up the transaction processing system.
 Easy to implement in a shared-memory parallel system.
 Example: Oracle 8 and Oracle Rdb.
Intra-Query Parallelism
 A form of parallelism in which a single query is executed in parallel on many processors.
 Two types:
 Intra-operation parallelism
 Inter-operation parallelism
 Advantages:
 Speeds up a single complex, long-running query.
 Best suited for complex scientific calculations (queries).
 Example: Informix, Teradata.
Intra-operation parallelism
 The process of speeding up a query through parallelizing the execution of
individual operations.
 The operations which can be parallelized are Sort, Join, Projection, Selection
and so on.
Inter-operation parallelism
 The process of speeding up a query by executing the different operations that make up the query in parallel.
 Example:
 A query that joins four tables is executed on two processors.
 Each processor joins two of the relations locally, and the two intermediate results (result1 and result2) are then joined to produce the final result.
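The four-table example can be sketched as follows. This is a minimal illustration under stated assumptions: the relations `r1` to `r4`, the `hash_join` helper, and the shared `id` column are all hypothetical, and two worker threads merely stand in for the two processors (Python threads do not give true CPU parallelism for this kind of work).

```python
from concurrent.futures import ThreadPoolExecutor


def hash_join(left, right, key):
    """Join two lists of dict rows on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Four hypothetical relations sharing the join column "id".
r1 = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
r2 = [{"id": 1, "b": 10}, {"id": 2, "b": 20}]
r3 = [{"id": 1, "c": True}, {"id": 3, "c": False}]
r4 = [{"id": 1, "d": 0.5}, {"id": 3, "d": 1.5}]

# The two independent joins run at the same time, one per worker.
with ThreadPoolExecutor(max_workers=2) as pool:
    future1 = pool.submit(hash_join, r1, r2, "id")
    future2 = pool.submit(hash_join, r3, r4, "id")
    result1, result2 = future1.result(), future2.result()

# The intermediate results are joined to produce the final result.
final = hash_join(result1, result2, "id")
print(final)
```

Only the row with `id` 1 appears in all four relations, so it is the only row in the final result.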
Intra-Operator Load Balancing
 Depends on:
 The degree of parallelism.
 The allocation of processors to the operator.
 The home of the operator (the set of processors where it is executed) must be carefully decided.
 The skew problem makes it hard for a parallel query optimizer to make this decision statically.
 A static decision would require a very accurate and detailed cost model.
 Two solutions can be incorporated in a hybrid query optimizer:
 Adaptive techniques
 Specialized techniques
Adaptive Technique
 The main idea is to decide statically on an initial allocation of processors to the operator (using a cost model).
 During execution, the allocation adapts to skew through load reallocation.
 Load reallocation detects the oversized partitions.
 It then partitions them again onto several processors.
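Load reallocation can be sketched in a few lines. The version below is an assumption-laden illustration, not the algorithm of any particular system: a partition counts as oversized when it exceeds a hypothetical threshold of `factor` times the average size, and it is then cut into chunks of roughly average size. In a real join, the matching tuples of the other relation would also have to reach every new fragment, which this sketch leaves out.

```python
import math


def reallocate(partitions, factor=1.5):
    """Split every partition larger than factor * average size into smaller ones."""
    average = sum(len(p) for p in partitions) / len(partitions)
    balanced = []
    for part in partitions:
        if len(part) > factor * average:
            # Oversized: cut into pieces of roughly average size.
            pieces = math.ceil(len(part) / average)
            chunk = math.ceil(len(part) / pieces)
            balanced.extend(part[i:i + chunk] for i in range(0, len(part), chunk))
        else:
            balanced.append(part)
    return balanced


# Hypothetical initial allocation: one partition is twice the average size.
partitions = [list(range(12)), list(range(2)), list(range(4))]
balanced = reallocate(partitions)
print([len(p) for p in balanced])
```

The 12-tuple partition is split into two 6-tuple partitions that can go to different processors, while the small partitions are left alone.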
Adaptive Technique (Continued)
 Advantages:
 Allows more dynamic adjustment of the degree of parallelism.
 Useful for improving intra-operator load balancing in all kinds of parallel architectures.
 By reducing processor interference, the technique yields excellent load balancing for intra-operator parallelism.
Adaptive Technique (Continued)
 Uses specific control operators.
 They detect whether the static estimates for intermediate result sizes differ from the run-time values.
 If the difference between the estimate and the real value is sufficiently high, the control operator redistributes the relation in order to prevent join product skew and redistribution skew.
Specialized techniques
 Rely on two main techniques:
 Range partitioning
 Sampling
 Range partitioning avoids redistribution skew of the building relation.
 Processors can then get partitions with equal numbers of tuples, corresponding to different ranges of join attribute values.
Specialized techniques (Continued)
 A parallel hash join can deal with skew as follows:
 Sample the building relation to determine the partitioning
ranges.
 Redistribute the building relation to the processors using the
ranges. Each processor builds a hash table containing the
incoming tuples.
 Redistribute the probing relation using the same ranges to
the processors. For each tuple received, each processor
probes the hash table to perform the join.
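The three steps above can be sketched as a single-process simulation. Everything named here is an assumption made for illustration: rows are dicts, the join column is `k`, the boundaries are taken at equally spaced positions of the sorted sample, and each loop iteration stands in for one processor.

```python
import bisect
import random


def choose_boundaries(build_keys, num_procs, sample_size, rng):
    """Step 1: sample the building relation and pick num_procs - 1 range boundaries."""
    sample = sorted(rng.sample(build_keys, sample_size))
    return [sample[(i * len(sample)) // num_procs] for i in range(1, num_procs)]


def range_partition(rows, key, boundaries):
    """Send each row to the partition whose key range contains it."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect.bisect_right(boundaries, row[key])].append(row)
    return parts


def range_hash_join(build, probe, key, num_procs, sample_size, seed=0):
    keys = [row[key] for row in build]
    boundaries = choose_boundaries(keys, num_procs, sample_size, random.Random(seed))
    build_parts = range_partition(build, key, boundaries)   # step 2: redistribute
    probe_parts = range_partition(probe, key, boundaries)   # step 3: same ranges
    joined = []
    for build_part, probe_part in zip(build_parts, probe_parts):
        table = {}                                          # per-processor hash table
        for row in build_part:
            table.setdefault(row[key], []).append(row)
        for row in probe_part:                              # probe the hash table
            joined.extend({**match, **row} for match in table.get(row[key], []))
    return joined, [len(part) for part in build_parts]


# Hypothetical skewed building relation: key 1 occurs four times.
build = [{"k": k, "b": i} for i, k in enumerate([1, 1, 1, 1, 2, 3, 10, 20, 30, 40, 50, 60])]
probe = [{"k": 1, "p": "a"}, {"k": 30, "p": "b"}, {"k": 99, "p": "c"}]
joined, build_sizes = range_hash_join(build, probe, "k", num_procs=3, sample_size=len(build))
print(build_sizes, len(joined))
```

Here the sample is the whole building relation, so the chosen boundaries give each of the three partitions four tuples. A smaller sample would only approximate this balance, and a single key that repeats very often still lands in one partition.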
Inter-Operator Load Balancing
 For each operator, it is important to choose:
 How many and which processors to assign for its execution.
 This choice must take into account pipeline parallelism, which requires inter-operator communication.
 This is harder to achieve in shared-nothing systems, for the following reasons:
 The choice of the degree of parallelism is subject to errors.
 Reason: both processors and operators are discrete entities.
Inter-Operator Load Balancing (Continued)
 Processors associated with the latest operators in a pipeline chain may remain idle for a significant time.
 Shared memory allows the parallel execution of independent pipeline chains.
 These chains are known as tasks.
 The degree of intra-operator parallelism of the tasks can be adjusted dynamically in order to reach maximum resource utilization.
Activations
 An activation represents a sequential unit of work.
 It can be executed by any thread.
 It is self-contained.
 It can only be executed within the same SM (shared-memory) node.
Activation Queues
 Move data activations along pipeline chains.
 Also called table queues.
 Threads have unrestricted access to all queues on the same SM-node.
 Managing only a small number of queues results in interference.
 To reduce it, one queue is associated with each thread.
Thread
 A simple strategy for good load balancing is to allocate more threads than there are processors.
 Allocating one thread per processor per query instead reduces the overhead of interference.
 Each thread then consumes as many activations as possible to limit thread interference.
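The interplay of activations, per-thread queues and threads can be sketched with the standard `queue` and `threading` modules. This is a simplified illustration under several assumptions: activations are plain callables, all of them are enqueued before the threads start (a real pipeline would keep producing new ones), and `run_sm_node` is a hypothetical name. Each thread drains its own queue first and then the other queues of the same node.

```python
import queue
import threading


def run_sm_node(activations, num_threads):
    """Run self-contained activations on one SM-node with one queue per thread."""
    queues = [queue.Queue() for _ in range(num_threads)]
    for i, activation in enumerate(activations):
        queues[i % num_threads].put(activation)

    results = []
    results_lock = threading.Lock()

    def worker(thread_id):
        # Own queue first, then the other queues of the same node.
        order = [queues[(thread_id + k) % num_threads] for k in range(num_threads)]
        for q in order:
            while True:
                try:
                    activation = q.get_nowait()
                except queue.Empty:
                    break
                value = activation()  # an activation is a sequential unit of work
                with results_lock:
                    results.append(value)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


# Hypothetical activations: each one squares a number.
activations = [lambda x=x: x * x for x in range(10)]
results = run_sm_node(activations, num_threads=4)
print(sorted(results))
```

Because each thread mostly works on its own queue, threads rarely contend for the same queue, yet an idle thread can still pick up work left in another queue. The order of `results` depends on thread scheduling, so the example sorts it before printing.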
THANK YOU
Reference:
 M. Tamer Özsu and Patrick Valduriez, Principles of Distributed Database Systems, Third Edition.

More Related Content

What's hot

Query processing and optimization (updated)
Query processing and optimization (updated)Query processing and optimization (updated)
Query processing and optimization (updated)Ravinder Kamboj
 
Distributed DBMS - Unit 8 - Distributed Transaction Management & Concurrency ...
Distributed DBMS - Unit 8 - Distributed Transaction Management & Concurrency ...Distributed DBMS - Unit 8 - Distributed Transaction Management & Concurrency ...
Distributed DBMS - Unit 8 - Distributed Transaction Management & Concurrency ...Gyanmanjari Institute Of Technology
 
Transactions and Concurrency Control
Transactions and Concurrency ControlTransactions and Concurrency Control
Transactions and Concurrency ControlDilum Bandara
 
Concurrency Control in Distributed Database.
Concurrency Control in Distributed Database.Concurrency Control in Distributed Database.
Concurrency Control in Distributed Database.Meghaj Mallick
 
Database , 8 Query Optimization
Database , 8 Query OptimizationDatabase , 8 Query Optimization
Database , 8 Query OptimizationAli Usman
 
Database , 12 Reliability
Database , 12 ReliabilityDatabase , 12 Reliability
Database , 12 ReliabilityAli Usman
 
Clock synchronization in distributed system
Clock synchronization in distributed systemClock synchronization in distributed system
Clock synchronization in distributed systemSunita Sahu
 
Concurrency Control in Database Management System
Concurrency Control in Database Management SystemConcurrency Control in Database Management System
Concurrency Control in Database Management SystemJanki Shah
 
Distributed concurrency control
Distributed concurrency controlDistributed concurrency control
Distributed concurrency controlBinte fatima
 
16. Concurrency Control in DBMS
16. Concurrency Control in DBMS16. Concurrency Control in DBMS
16. Concurrency Control in DBMSkoolkampus
 
Distributed Systems Introduction and Importance
Distributed Systems Introduction and Importance Distributed Systems Introduction and Importance
Distributed Systems Introduction and Importance SHIKHA GAUTAM
 
Optimistic concurrency control in Distributed Systems
Optimistic concurrency control in Distributed SystemsOptimistic concurrency control in Distributed Systems
Optimistic concurrency control in Distributed Systemsmridul mishra
 
Concurrency control
Concurrency  controlConcurrency  control
Concurrency controlJaved Khan
 
Deadlock in distribute system by saeed siddik
Deadlock in distribute system by saeed siddikDeadlock in distribute system by saeed siddik
Deadlock in distribute system by saeed siddikSaeed Siddik
 
8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating SystemsDr Sandeep Kumar Poonia
 
Introduction to Distributed System
Introduction to Distributed SystemIntroduction to Distributed System
Introduction to Distributed SystemSunita Sahu
 

What's hot (20)

Query processing and optimization (updated)
Query processing and optimization (updated)Query processing and optimization (updated)
Query processing and optimization (updated)
 
Distributed DBMS - Unit 8 - Distributed Transaction Management & Concurrency ...
Distributed DBMS - Unit 8 - Distributed Transaction Management & Concurrency ...Distributed DBMS - Unit 8 - Distributed Transaction Management & Concurrency ...
Distributed DBMS - Unit 8 - Distributed Transaction Management & Concurrency ...
 
Transactions and Concurrency Control
Transactions and Concurrency ControlTransactions and Concurrency Control
Transactions and Concurrency Control
 
Concurrency Control in Distributed Database.
Concurrency Control in Distributed Database.Concurrency Control in Distributed Database.
Concurrency Control in Distributed Database.
 
Database , 8 Query Optimization
Database , 8 Query OptimizationDatabase , 8 Query Optimization
Database , 8 Query Optimization
 
Database , 12 Reliability
Database , 12 ReliabilityDatabase , 12 Reliability
Database , 12 Reliability
 
Clock synchronization in distributed system
Clock synchronization in distributed systemClock synchronization in distributed system
Clock synchronization in distributed system
 
Distributed Transaction
Distributed TransactionDistributed Transaction
Distributed Transaction
 
Concurrency Control in Database Management System
Concurrency Control in Database Management SystemConcurrency Control in Database Management System
Concurrency Control in Database Management System
 
Distributed database
Distributed databaseDistributed database
Distributed database
 
Distributed concurrency control
Distributed concurrency controlDistributed concurrency control
Distributed concurrency control
 
16. Concurrency Control in DBMS
16. Concurrency Control in DBMS16. Concurrency Control in DBMS
16. Concurrency Control in DBMS
 
Distributed Systems Introduction and Importance
Distributed Systems Introduction and Importance Distributed Systems Introduction and Importance
Distributed Systems Introduction and Importance
 
Optimistic concurrency control in Distributed Systems
Optimistic concurrency control in Distributed SystemsOptimistic concurrency control in Distributed Systems
Optimistic concurrency control in Distributed Systems
 
Concurrency control
Concurrency  controlConcurrency  control
Concurrency control
 
Deadlock in distribute system by saeed siddik
Deadlock in distribute system by saeed siddikDeadlock in distribute system by saeed siddik
Deadlock in distribute system by saeed siddik
 
DDBMS Paper with Solution
DDBMS Paper with SolutionDDBMS Paper with Solution
DDBMS Paper with Solution
 
Replication in Distributed Systems
Replication in Distributed SystemsReplication in Distributed Systems
Replication in Distributed Systems
 
8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems8. mutual exclusion in Distributed Operating Systems
8. mutual exclusion in Distributed Operating Systems
 
Introduction to Distributed System
Introduction to Distributed SystemIntroduction to Distributed System
Introduction to Distributed System
 

Viewers also liked

Database ,14 Parallel DBMS
Database ,14 Parallel DBMSDatabase ,14 Parallel DBMS
Database ,14 Parallel DBMSAli Usman
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceJ Singh
 
The DSP/BIOS Bridge - OMAP3
The DSP/BIOS Bridge - OMAP3The DSP/BIOS Bridge - OMAP3
The DSP/BIOS Bridge - OMAP3vjaquez
 
Introduction to Parallel Processing Algorithms in Shared Nothing Databases
Introduction to Parallel Processing Algorithms in Shared Nothing DatabasesIntroduction to Parallel Processing Algorithms in Shared Nothing Databases
Introduction to Parallel Processing Algorithms in Shared Nothing DatabasesOfir Manor
 
Log based and Recovery with concurrent transaction
Log based and Recovery with concurrent transactionLog based and Recovery with concurrent transaction
Log based and Recovery with concurrent transactionnikunjandy
 
Best practices for DB2 for z/OS log based recovery
Best practices for DB2 for z/OS log based recoveryBest practices for DB2 for z/OS log based recovery
Best practices for DB2 for z/OS log based recoveryFlorence Dubois
 
Object Oriented Dbms
Object Oriented DbmsObject Oriented Dbms
Object Oriented Dbmsmaryeem
 
database recovery techniques
database recovery techniques database recovery techniques
database recovery techniques Kalhan Liyanage
 
20. Parallel Databases in DBMS
20. Parallel Databases in DBMS20. Parallel Databases in DBMS
20. Parallel Databases in DBMSkoolkampus
 
14. Query Optimization in DBMS
14. Query Optimization in DBMS14. Query Optimization in DBMS
14. Query Optimization in DBMSkoolkampus
 
13. Query Processing in DBMS
13. Query Processing in DBMS13. Query Processing in DBMS
13. Query Processing in DBMSkoolkampus
 
Disaster recovery and the cloud
Disaster recovery and the cloudDisaster recovery and the cloud
Disaster recovery and the cloudJason Dea
 
Database management system
Database management systemDatabase management system
Database management systemRizwanHafeez
 

Viewers also liked (20)

Database ,14 Parallel DBMS
Database ,14 Parallel DBMSDatabase ,14 Parallel DBMS
Database ,14 Parallel DBMS
 
CS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduceCS 542 Parallel DBs, NoSQL, MapReduce
CS 542 Parallel DBs, NoSQL, MapReduce
 
The DSP/BIOS Bridge - OMAP3
The DSP/BIOS Bridge - OMAP3The DSP/BIOS Bridge - OMAP3
The DSP/BIOS Bridge - OMAP3
 
Introduction to Parallel Processing Algorithms in Shared Nothing Databases
Introduction to Parallel Processing Algorithms in Shared Nothing DatabasesIntroduction to Parallel Processing Algorithms in Shared Nothing Databases
Introduction to Parallel Processing Algorithms in Shared Nothing Databases
 
Log based and Recovery with concurrent transaction
Log based and Recovery with concurrent transactionLog based and Recovery with concurrent transaction
Log based and Recovery with concurrent transaction
 
Chapter24
Chapter24Chapter24
Chapter24
 
Best practices for DB2 for z/OS log based recovery
Best practices for DB2 for z/OS log based recoveryBest practices for DB2 for z/OS log based recovery
Best practices for DB2 for z/OS log based recovery
 
Database and different types of databases available in market
Database and different types of databases available in marketDatabase and different types of databases available in market
Database and different types of databases available in market
 
Disaster Recovery in the Cloud
Disaster Recovery in the CloudDisaster Recovery in the Cloud
Disaster Recovery in the Cloud
 
Field study 1
Field study 1Field study 1
Field study 1
 
Database recovery
Database recoveryDatabase recovery
Database recovery
 
Object Oriented Dbms
Object Oriented DbmsObject Oriented Dbms
Object Oriented Dbms
 
Data recovery
Data recoveryData recovery
Data recovery
 
database recovery techniques
database recovery techniques database recovery techniques
database recovery techniques
 
20. Parallel Databases in DBMS
20. Parallel Databases in DBMS20. Parallel Databases in DBMS
20. Parallel Databases in DBMS
 
14. Query Optimization in DBMS
14. Query Optimization in DBMS14. Query Optimization in DBMS
14. Query Optimization in DBMS
 
13. Query Processing in DBMS
13. Query Processing in DBMS13. Query Processing in DBMS
13. Query Processing in DBMS
 
Disaster recovery and the cloud
Disaster recovery and the cloudDisaster recovery and the cloud
Disaster recovery and the cloud
 
Database management system
Database management systemDatabase management system
Database management system
 
Data Base Management System
Data Base Management SystemData Base Management System
Data Base Management System
 

Similar to Load Balancing in Parallel and Distributed Database

Load balancing in Distributed Systems
Load balancing in Distributed SystemsLoad balancing in Distributed Systems
Load balancing in Distributed SystemsRicha Singh
 
Enhanced equally distributed load balancing algorithm for cloud computing
Enhanced equally distributed load balancing algorithm for cloud computingEnhanced equally distributed load balancing algorithm for cloud computing
Enhanced equally distributed load balancing algorithm for cloud computingeSAT Journals
 
Enhanced equally distributed load balancing algorithm for cloud computing
Enhanced equally distributed load balancing algorithm for cloud computingEnhanced equally distributed load balancing algorithm for cloud computing
Enhanced equally distributed load balancing algorithm for cloud computingeSAT Publishing House
 
IRJET - Efficient Load Balancing in a Distributed Environment
IRJET -  	  Efficient Load Balancing in a Distributed EnvironmentIRJET -  	  Efficient Load Balancing in a Distributed Environment
IRJET - Efficient Load Balancing in a Distributed EnvironmentIRJET Journal
 
Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre...
Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre...Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre...
Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre...IRJET Journal
 
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptxICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptxjohnsmith96441
 
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...Soumya Banerjee
 
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docxCS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docxfaithxdunce63732
 
The Concept of Load Balancing Server in Secured and Intelligent Network
The Concept of Load Balancing Server in Secured and Intelligent NetworkThe Concept of Load Balancing Server in Secured and Intelligent Network
The Concept of Load Balancing Server in Secured and Intelligent NetworkIJAEMSJORNAL
 
Modified Active Monitoring Load Balancing with Cloud Computing
Modified Active Monitoring Load Balancing with Cloud ComputingModified Active Monitoring Load Balancing with Cloud Computing
Modified Active Monitoring Load Balancing with Cloud Computingijsrd.com
 
IRJET-A Review on Trends in Multicore Processor Based on Cache and Power Diss...
IRJET-A Review on Trends in Multicore Processor Based on Cache and Power Diss...IRJET-A Review on Trends in Multicore Processor Based on Cache and Power Diss...
IRJET-A Review on Trends in Multicore Processor Based on Cache and Power Diss...IRJET Journal
 
Benchmark methods to analyze embedded processors and systems
Benchmark methods to analyze embedded processors and systemsBenchmark methods to analyze embedded processors and systems
Benchmark methods to analyze embedded processors and systemsXMOS
 

Similar to Load Balancing in Parallel and Distributed Database (20)

Load balancing in Distributed Systems
Load balancing in Distributed SystemsLoad balancing in Distributed Systems
Load balancing in Distributed Systems
 
Enhanced equally distributed load balancing algorithm for cloud computing
Enhanced equally distributed load balancing algorithm for cloud computingEnhanced equally distributed load balancing algorithm for cloud computing
Enhanced equally distributed load balancing algorithm for cloud computing
 
Enhanced equally distributed load balancing algorithm for cloud computing
Enhanced equally distributed load balancing algorithm for cloud computingEnhanced equally distributed load balancing algorithm for cloud computing
Enhanced equally distributed load balancing algorithm for cloud computing
 
IRJET - Efficient Load Balancing in a Distributed Environment
IRJET -  	  Efficient Load Balancing in a Distributed EnvironmentIRJET -  	  Efficient Load Balancing in a Distributed Environment
IRJET - Efficient Load Balancing in a Distributed Environment
 
Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre...
Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre...Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre...
Static Load Balancing of Parallel Mining Efficient Algorithm with PBEC in Fre...
 
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptxICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
ICS 2410.Parallel.Sytsems.Lecture.Week 3.week5.pptx
 
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
A Multi-Agent System Approach to Load-Balancing and Resource Allocation for D...
 
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docxCS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
CS 301 Computer ArchitectureStudent # 1 EID 09Kingdom of .docx
 
1.prallelism
1.prallelism1.prallelism
1.prallelism
 
1.prallelism
1.prallelism1.prallelism
1.prallelism
 
The Concept of Load Balancing Server in Secured and Intelligent Network
The Concept of Load Balancing Server in Secured and Intelligent NetworkThe Concept of Load Balancing Server in Secured and Intelligent Network
The Concept of Load Balancing Server in Secured and Intelligent Network
 
Modified Active Monitoring Load Balancing with Cloud Computing
Modified Active Monitoring Load Balancing with Cloud ComputingModified Active Monitoring Load Balancing with Cloud Computing
Modified Active Monitoring Load Balancing with Cloud Computing
 
IRJET-A Review on Trends in Multicore Processor Based on Cache and Power Diss...
IRJET-A Review on Trends in Multicore Processor Based on Cache and Power Diss...IRJET-A Review on Trends in Multicore Processor Based on Cache and Power Diss...
IRJET-A Review on Trends in Multicore Processor Based on Cache and Power Diss...
 
G216063
G216063G216063
G216063
 
Chap2 slides
Chap2 slidesChap2 slides
Chap2 slides
 
Resource management
Resource managementResource management
Resource management
 
Csc concepts
Csc conceptsCsc concepts
Csc concepts
 
Benchmark methods to analyze embedded processors and systems
Benchmark methods to analyze embedded processors and systemsBenchmark methods to analyze embedded processors and systems
Benchmark methods to analyze embedded processors and systems
 
Module3 part1
Module3 part1Module3 part1
Module3 part1
 
FrackingPaper
FrackingPaperFrackingPaper
FrackingPaper
 

More from Md. Shamsur Rahim

Software Quality Assurance & Testing
Software Quality Assurance & TestingSoftware Quality Assurance & Testing
Software Quality Assurance & TestingMd. Shamsur Rahim
 
National Operating System for Bangladesh
National Operating System for BangladeshNational Operating System for Bangladesh
National Operating System for BangladeshMd. Shamsur Rahim
 
Slide #2: Setup Apache Storm
Slide #2: Setup Apache StormSlide #2: Setup Apache Storm
Slide #2: Setup Apache StormMd. Shamsur Rahim
 
Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormMd. Shamsur Rahim
 
Slide #2: How to Setup Apache STROM
Slide #2: How to Setup Apache STROMSlide #2: How to Setup Apache STROM
Slide #2: How to Setup Apache STROMMd. Shamsur Rahim
 
NASA Space App Challenge-Team: Hello World
NASA Space App Challenge-Team: Hello WorldNASA Space App Challenge-Team: Hello World
NASA Space App Challenge-Team: Hello WorldMd. Shamsur Rahim
 

More from Md. Shamsur Rahim (7)

Software Quality Assurance & Testing
Software Quality Assurance & TestingSoftware Quality Assurance & Testing
Software Quality Assurance & Testing
 
National Operating System for Bangladesh
National Operating System for BangladeshNational Operating System for Bangladesh
National Operating System for Bangladesh
 
Slide #2: Setup Apache Storm
Slide #2: Setup Apache StormSlide #2: Setup Apache Storm
Slide #2: Setup Apache Storm
 
Slide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache StormSlide #1:Introduction to Apache Storm
Slide #1:Introduction to Apache Storm
 
Slide #2: How to Setup Apache STROM
Slide #2: How to Setup Apache STROMSlide #2: How to Setup Apache STROM
Slide #2: How to Setup Apache STROM
 
1 storm-intro
1 storm-intro1 storm-intro
1 storm-intro
 
NASA Space App Challenge-Team: Hello World
NASA Space App Challenge-Team: Hello WorldNASA Space App Challenge-Team: Hello World
NASA Space App Challenge-Team: Hello World
 

Recently uploaded

Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governanceWSO2
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Zilliz
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...
WSO2 Micro Integrator for Enterprise Integration in a Decentralized, Microser...WSO2
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
AI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAI in Action: Real World Use Cases by Anitaraj
AI in Action: Real World Use Cases by AnitarajAnitaRaj43
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Orbitshub
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....
TEST BANK For Principles of Anatomy and Physiology, 16th Edition by Gerard J....rightmanforbloodline
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)
AI+A11Y 11MAY2024 HYDERBAD GAAD 2024 - HelloA11Y (11 May 2024)Samir Dash
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 

Load Balancing in Parallel and Distributed Database

  • 1. Load Balancing in Distributed Database
      ◦ Md. Shamsur Rahim, 14-98181-3, Student, MScCS, AIUB
      ◦ AZM Ehtesham Chowdhury, 15-98451-1, Student, MScCS, AIUB
      ◦ Saiful Akhter, 15-98502-1, Student, MScCS, AIUB
  • 2. Load Balancing
      ◦ Load balancing means distributing transactions and queries among different nodes.
      ◦ The goal is to maximize throughput.
      ◦ Parallel execution problems:
          ▪ 1. Initialization
          ▪ 2. Interference
          ▪ 3. Skew
  • 3. Parallel Execution Problems: Initialization
      ◦ Initialization is necessary before execution.
      ◦ This sequential step includes:
          ▪ Process/thread creation and initialization
          ▪ Communication initialization, etc.
      ◦ Its duration is proportional to the degree of parallelism.
      ◦ The degree of parallelism should be fixed according to query complexity.
      ◦ Formula for the response time of an operator: $\mathit{ResponseTime} = a \cdot n + \frac{c \cdot N}{n}$
          ▪ $N$ = number of tuples, $c$ = average processing time per tuple, $n$ = number of processors, $a$ = initialization time per processor
      ◦ From this equation one can derive:
          ▪ Optimal number of processors to allocate: $n = \sqrt{\frac{c \cdot N}{a}}$
          ▪ Maximal achievable speedup: $S = \frac{n}{2}$
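The response-time formula on slide 3 can be checked numerically. The sketch below is only an illustration: the values chosen for `a`, `c` and `N` are hypothetical, and the sequential baseline is assumed to be `c * N` (processing with no parallel start-up), which is the baseline that makes the speedup at the optimum come out as `n / 2`.

```python
import math


def response_time(n: float, a: float, c: float, N: int) -> float:
    """a*n is the start-up cost (grows with n); c*N/n is the processing cost."""
    return a * n + (c * N) / n


def optimal_processors(a: float, c: float, N: int) -> float:
    """Setting d/dn (a*n + c*N/n) = a - c*N/n**2 to zero gives n = sqrt(c*N/a)."""
    return math.sqrt(c * N / a)


def max_speedup(a: float, c: float, N: int) -> float:
    """Assumed sequential time c*N divided by the response time at the optimum."""
    n0 = optimal_processors(a, c, N)
    return (c * N) / response_time(n0, a, c, N)


# Hypothetical parameters, chosen only to give round numbers.
a, c, N = 1.0, 1.0, 10_000
n0 = optimal_processors(a, c, N)
print(n0, response_time(n0, a, c, N), max_speedup(a, c, N))
```

With these values the optimum is 100 processors; using either 50 or 200 processors gives a longer response time, which shows why the degree of parallelism should not simply be maximized.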
  • 4. Parallel Execution Problems: Interference
      ◦ Parallel execution can be slowed down by interference.
      ◦ Interference occurs when several processors simultaneously access the same resource:
          ▪ Hardware. Solution: duplicate the shared resource.
          ▪ Software. Solution: partition the shared resource into several independent resources.
  • 5. Parallel Execution Problems: Skew
      ◦ A problem that appears with intra-operator parallelism (variation in partition size) is known as data skew.
      ◦ Classification of skew:
          ▪ Attribute value skew: inherent in the dataset, e.g., there are more citizens in Paris than in Waterloo.
          ▪ Tuple placement skew: introduced when the data are initially partitioned, e.g., with range partitioning.
          ▪ Selectivity skew: introduced when there is variation in the selectivity of select predicates on each node.
          ▪ Redistribution skew: occurs in the redistribution step between two operators.
          ▪ Join product skew: occurs because the join selectivity may vary between nodes.
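To make the Paris/Waterloo example concrete, here is a small sketch of how attribute value skew turns into uneven partitions under range partitioning. The city list, the boundaries `"M"` and `"T"`, and the `skew_factor` metric (largest partition divided by the mean partition size) are all assumptions made for illustration; they do not come from the slides.

```python
from collections import Counter


def range_partition(values, boundaries):
    """Return partition sizes; partition i holds values in (boundaries[i-1], boundaries[i]]."""
    sizes = Counter()
    for v in values:
        # Index = number of boundaries that are strictly smaller than the value.
        idx = sum(1 for b in boundaries if v > b)
        sizes[idx] += 1
    return [sizes[i] for i in range(len(boundaries) + 1)]


def skew_factor(sizes):
    """Hypothetical metric: 1.0 means perfectly balanced, larger means more skew."""
    return max(sizes) / (sum(sizes) / len(sizes))


# Attribute value skew: far more tuples for Paris than for any other city.
cities = ["Paris"] * 8 + ["Waterloo"] * 2 + ["Amsterdam", "Zurich"]
sizes = range_partition(cities, boundaries=["M", "T"])
print(sizes, skew_factor(sizes))
```

The middle range receives 8 of the 12 tuples, so the node holding it does twice the average amount of work.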
  • 6. Inter-Query Parallelism
      ◦ A form of parallelism where many different queries or transactions are executed in parallel with one another on many processors.
      ◦ Advantages:
          ▪ Increases transaction throughput.
          ▪ Scales up the transaction processing system.
          ▪ Easy to implement in a shared-memory parallel system.
      ◦ Example: Oracle 8 and Oracle Rdb.
  • 7. Intra-Query Parallelism
      ◦ A form of parallelism where a single query is executed in parallel on many processors.
      ◦ Two types:
          ▪ Intra-operation parallelism
          ▪ Inter-operation parallelism
      ◦ Advantages:
          ▪ Speeds up a single complex, long-running query.
          ▪ Best suited for complex scientific calculations (queries).
      ◦ Example: Informix, Teradata.
  • 8. Intra-Operation Parallelism
      ◦ The process of speeding up a query by parallelizing the execution of individual operations.
      ◦ Operations that can be parallelized include sort, join, projection, selection, and so on.
  • 9. Inter-Operation Parallelism
      ◦ The process of speeding up a query by executing in parallel the different operations that are part of the query.
      ◦ Example:
          ▪ A query that joins 4 tables is executed on two processors.
          ▪ Each processor joins two relations locally; result1 and result2 are then joined to produce the final result.
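The four-table example on slide 9 can be sketched as follows. This is an illustration under stated assumptions: the relations `r1` to `r4`, the row layout `(key, payload_tuple)` and the `hash_join` helper are hypothetical, and the two worker threads merely stand in for the two processors (CPython threads do not give true CPU parallelism).

```python
from concurrent.futures import ThreadPoolExecutor


def hash_join(left, right):
    """Equi-join two lists of (key, payload_tuple) rows on key."""
    table = {}
    for key, payload in left:
        table.setdefault(key, []).append(payload)
    # Concatenate payloads of every matching left/right pair.
    return [(key, lp + rp) for key, rp in right for lp in table.get(key, [])]


# Four hypothetical relations sharing a join key.
r1 = [(1, ("a1",)), (2, ("a2",))]
r2 = [(1, ("b1",)), (2, ("b2",))]
r3 = [(1, ("c1",)), (3, ("c3",))]
r4 = [(1, ("d1",)), (2, ("d2",))]

# Two independent joins run side by side, one per worker.
with ThreadPoolExecutor(max_workers=2) as pool:
    future1 = pool.submit(hash_join, r1, r2)
    future2 = pool.submit(hash_join, r3, r4)
    result1, result2 = future1.result(), future2.result()

# The two intermediate results are joined to produce the final result.
final = hash_join(result1, result2)
print(final)
```

Only key 1 appears in all four relations, so the final result contains a single row.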
  • 10. Intra-Operator Load Balancing
      ◦ Depends on:
          ▪ The degree of parallelism.
          ▪ The allocation of processors for the operator.
      ◦ The home of the operator (the set of processors where it is executed) must be carefully decided.
      ◦ The skew problem makes it hard for a parallel query optimizer to make this decision statically.
      ◦ A static decision would require a very accurate and detailed cost model.
  • 11. Two solutions, incorporated in a hybrid query optimizer:
      ◦ Adaptive
      ◦ Specialized
  • 12. Adaptive Technique
      ◦ The main idea is to statically decide on an initial allocation of the processors to the operator (using a cost model).
      ◦ At run time, adapt to skew using load reallocation.
      ◦ Load reallocation detects the oversized partitions and partitions them again onto several processors.
  • 13. Adaptive Technique (Continued)
      ◦ Advantages:
          ▪ Allows more dynamic adjustment of the degree of parallelism.
          ▪ Useful for improving intra-operator load balancing in all kinds of parallel architectures.
          ▪ By reducing processor interference, gives excellent load balancing for intra-operator parallelism.
  • 14. Adaptive Technique (Continued)
      ◦ Uses specific control operators.
      ◦ These detect whether the static estimates for intermediate result sizes differ from the run-time values.
      ◦ If the difference between the estimate and the real value is sufficiently high, the relation is redistributed in order to prevent join product skew and redistribution skew.
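The load reallocation step of the adaptive technique (detect oversized partitions, then partition them again) might look like the sketch below. The `threshold` of 1.5 times the mean partition size and the equal-size splitting rule are assumptions chosen for illustration; in a real system each new piece would be shipped to a different processor.

```python
import math


def reallocate(partitions, threshold=1.5):
    """Split every partition larger than threshold * mean into near-equal pieces."""
    mean = sum(len(p) for p in partitions) / len(partitions)
    balanced = []
    for part in partitions:
        if len(part) > threshold * mean:
            # Number of pieces needed so that each piece is about the mean size.
            pieces = math.ceil(len(part) / mean)
            step = math.ceil(len(part) / pieces)
            balanced.extend(part[i:i + step] for i in range(0, len(part), step))
        else:
            balanced.append(part)
    return balanced


# Initial (static) allocation with one oversized partition.
partitions = [list(range(12)), list(range(2)), list(range(4))]
balanced = reallocate(partitions)
print([len(p) for p in balanced])
```

Here the mean size is 6, so only the 12-tuple partition exceeds the limit of 9 and is split in two; the smaller partitions are left alone.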
  • 15. Specialized Techniques
      ◦ Two main techniques:
          ▪ Range partitioning
          ▪ Sampling
      ◦ The aim is to avoid redistribution skew of the building relation.
      ◦ Processors can get partitions with equal numbers of tuples, corresponding to different ranges of join attribute values.
  • 16. Specialized Techniques (Continued)
      ◦ Skew is dealt with as follows:
          ▪ 1. Sample the building relation to determine the partitioning ranges.
          ▪ 2. Redistribute the building relation to the processors using the ranges. Each processor builds a hash table containing the incoming tuples.
          ▪ 3. Redistribute the probing relation to the processors using the same ranges. For each tuple received, each processor probes the hash table to perform the join.
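The three steps on slide 16 can be sketched in a single process, with a list of hash tables standing in for the processors. Everything named here (`ranges_from_sample`, `range_hash_join`, the sample size, the fixed seed) is hypothetical and chosen for illustration; a real system would redistribute tuples over the network.

```python
import bisect
import random
from collections import defaultdict


def ranges_from_sample(keys, n_procs, sample_size, rng):
    """Step 1: sample the build keys and take equal quantiles as range boundaries."""
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    return [sample[(i * len(sample)) // n_procs] for i in range(1, n_procs)]


def range_hash_join(build, probe, n_procs=3, sample_size=50, seed=0):
    rng = random.Random(seed)
    bounds = ranges_from_sample([k for k, _ in build], n_procs, sample_size, rng)

    # Step 2: redistribute the building relation; each "processor" builds a hash table.
    tables = [defaultdict(list) for _ in range(n_procs)]
    for key, value in build:
        tables[bisect.bisect_right(bounds, key)][key].append(value)

    # Step 3: redistribute the probing relation with the same ranges and probe.
    joined = []
    for key, other in probe:
        for value in tables[bisect.bisect_right(bounds, key)].get(key, []):
            joined.append((key, value, other))

    loads = [sum(len(vs) for vs in table.values()) for table in tables]
    return joined, loads


build = [(k, f"b{k}") for k in range(100)]
probe = [(k, f"p{k}") for k in range(0, 100, 10)]
joined, loads = range_hash_join(build, probe)
print(loads, len(joined))
```

Because both relations are routed with the same boundaries, matching tuples always meet on the same processor, so the join result does not depend on which sample happened to be drawn; only the per-processor loads do.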
  • 17. Inter-Operator Load Balancing
      ◦ For each operator, it is important to choose how many and which processors to assign for its execution.
      ◦ This choice must take into account pipeline parallelism, which requires inter-operator communication.
      ◦ This is harder to achieve in shared-nothing architectures for these reasons:
          ▪ The choice of the degree of parallelism is subject to errors, because both processors and operators are discrete entities.
  • 18. Inter-Operator Load Balancing (Continued)
      ◦ Processors associated with the latest operators in a pipeline chain may remain idle for a significant time.
      ◦ Shared memory allows the parallel execution of independent pipeline chains, known as tasks.
      ◦ The degree of intra-operator parallelism of the tasks is adjusted dynamically in order to reach maximum resource utilization.
  • 19. Activations
      ◦ An activation represents a sequential unit of work.
      ◦ It can be executed by any thread.
      ◦ It is self-contained.
      ◦ It can only be executed on the same SM (shared-memory) node.
  • 20. Activation Queues
      ◦ Move data activations along pipeline chains.
      ◦ Also called table queues.
      ◦ Threads have unrestricted access to the queues of the same SM-node.
      ◦ A small number of queues results in interference.
      ◦ One queue is associated with each thread.
  • 21. Threads
      ◦ A simple strategy for good load balancing is to allocate more threads than processors.
      ◦ Allocating one thread per processor per query reduces the overhead of interference.
      ◦ Each thread consumes as many activations as possible, to limit thread interference.
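The activation, activation queue and thread slides can be tied together with a small sketch. It assumes one queue per thread, with each thread draining its own queue first and then helping with the others; that ordering is an assumption made for illustration, not something the slides spell out. Activations are modeled as plain callables, and `run_activations` is a hypothetical name.

```python
import queue
import threading


def run_activations(n_threads, activations):
    """Run self-contained activations using one queue per thread."""
    queues = [queue.Queue() for _ in range(n_threads)]
    for i, activation in enumerate(activations):
        queues[i % n_threads].put(activation)

    results, lock = [], threading.Lock()

    def worker(me):
        # Own queue first (less contention), then the other threads' queues.
        order = [queues[me]] + [q for j, q in enumerate(queues) if j != me]
        for q in order:
            while True:
                try:
                    activation = q.get_nowait()
                except queue.Empty:
                    break
                value = activation()  # any thread may execute any activation
                with lock:
                    results.append(value)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Twenty hypothetical activations, each a small self-contained unit of work.
results = run_activations(4, [lambda i=i: i * i for i in range(20)])
print(sorted(results))
```

Because every thread can reach every queue, a thread that finishes early picks up leftover activations instead of sitting idle, while the per-thread queues keep contention on any single queue low.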
  • 23. Reference
      ◦ M. Tamer Özsu and Patrick Valduriez, Principles of Distributed Database Systems, Third Edition.