SlideShare a Scribd company logo
Presented By:- Shobhit Saxena
Why Parallel Access To
Data?
1 Terabyte
10 MB/s
At 10 MB/s
1.2 days to scan
1 Terabyte
1,000 x parallel
1.5 minute to scan.
Parallelism:
divide a big problem
into many smaller ones
to be solved in parallel.
Bandwidth
Parallel DBMS: Introduction
 Parallelism natural to DBMS processing
Pipeline parallelism: many machines each
doing one step in a multi-step process.
Partition parallelism: many machines doing
the same thing to different pieces of data.
Both are natural in DBMS!
Pipeline
Partition
Any
Sequential
Program
Any
Sequential
Program
SequentialSequential SequentialSequential
Any
Sequential
Program
Any
Sequential
Program
outputs split N ways, inputs merge M ways
DBMS: The || Success Story
 DBMSs are among most (only?) successful
application of parallelism.
Teradata, Tandem vs. Thinking Machines,
Every major DBMS vendor has some || server
Workstation manufacturers depend on || DB server
sales.
 Reasons for success:
Bulk-processing (= partitioned ||-ism).
Natural pipelining.
Inexpensive hardware can do the trick!
Users/app-prog. don’t need to think in ||
Some || Terminology
 Speed-Up
More resources means
proportionally less time
for given amount of
data.
 Scale-Up
If resources increased
in proportion to increase
in data size, time is
constant.
degree of ||-ism
Xact/sec.
(throughput)
Ideal
degree of ||-ism
sec./Xact
(responsetime)
Ideal
Architecture Issue:
Shared What?
Shared Memory
(SMP)
Shared Disk Shared Nothing
(network)
CLIENTS CLIENTSCLIENTS
Memory
Processors
Easy to program
Expensive to build
Difficult to scaleup
Hard to program
Cheap to build
Easy to scaleup
Sequent, SGI, Sun VMScluster Tandem, Teradata
Different Types of DBMS ||-ism
 Intra-operator parallelism
get all machines working to compute a given
operation (scan, sort, join)
 Inter-operator parallelism
each operator may run concurrently on a different site
(exploits pipelining)
 Inter-query parallelism
different queries run on different site
We’ll focus on intra-operator ||-ism
Automatic Data
Partitioning
Partitioning a table:
Range Hash Round Robin
Shared-disk and -memory less sensitive to partitioning,
Shared nothing benefits from "good" partitioning !
A...E F...J K...N O...S T...Z A...E F...J K...N O...S T...Z A...E F...J K...N O...S T...Z
Good for group-by,
range queries, and
also equip-join
Good for equijoins Good to spread load;
Most flexible - but
Parallel Scans
 Scan in parallel, and merge.
 Selection may not require access of all sites
for range or hash partitioning.
 Indexes can be built at each partition.
 Question: How do indexes differ in the
different schemes?
Think about both lookups and inserts!
Parallel Sorting
 New records again and again:
8.5 Gb/minute, shared-nothing; Datamation benchmark
in 2.41 secs (UCB students)
 Idea:
Scan in parallel, and range-partition as you go.
As tuples come in, begin “local” sorting on each
Resulting data is sorted, and range-partitioned.
Problem: skew!
Solution: “sample” data at start to determine partition
points.
Parallel Joins
 Nested loop:
Each outer tuple must be compared with each inner
tuple that might join.
Easy for range partitioning on join cols, hard
otherwise!
 Sort-Merge (or plain Merge-Join):
Sorting gives range-partitioning.
Merging partitioned tables is local.
Parallel Hash Join
 In first phase, partitions get distributed to different
sites:
A good hash function distributes work evenly
 Do second phase at each site.
 Almost always the winner for equi-join.
Original Relations
(R then S)
OUTPUT
2
B main memory buffers DiskDisk
INPUT
1
hash
function
h
B-1
Partitions
1
2
B-1
. . .
Phase1
Dataflow Network for || Join
 Good use of split/merge makes it easier to build
parallel versions of sequential join code.
Complex Parallel Query Plans
 Complex Queries: Inter-Operator parallelism
Pipelining between operators:
○ sort and phase 1 of hash-join block the pipeline!!
Bushy Trees
A B R S
Sites 1-4 Sites 5-8
Sites 1-8
N×M-way
Parallelism
A...E F...J K...N O...S T...Z
Merge
Join
Sort
Join
Sort
Join
Sort
Join
Sort
Join
Sort
Merge
Merge
N inputs, M outputs, no bottlenecks.
Partitioned Data
Partitioned and Pipelined Data Flows
Observations
 It is relatively easy to build a fast parallel
query executor
 It is hard to write a robust high-quality
parallel query optimizer:
There are many tricks.
One quickly hits the complexity barrier.
Parallel Query Optimization
 Common approach: 2 phases
Pick best sequential plan (System R algo)
Pick degree of parallelism based on current
system parameters.
 “Bind” operators to processors
Take query tree, “decorate” with assigned
processor
What’s Wrong With That?
st serial plan != Best || plan !
Why?
Trivial counter-example:
Table partitioned with local secondary index at two nodes
Range query: all data of node 1 and 1% of node 2.
Node 1 should do a scan of its partition.
Node 2 should use secondary index.
SELECT *
FROM telephone_book
WHERE name < “NoGood”;
N..Z
Table
Scan
A..M
Index
Scan
Parallel DBMS Summary
 ||-ism natural to query processing:
Both pipeline and partition ||-ism!
 Shared-Nothing vs. Shared-Mem
Shared-disk too, but less standard
Shared-memory easy, costly.
○ But doesn’t scaleup.
Shared-nothing cheap, scales well.
○ But harder to implement.
 Intra-op, Inter-op & Inter-query ||-ism possible.
Ch22 parallel d_bs_cs561

More Related Content

What's hot

Distributed machine learning
Distributed machine learningDistributed machine learning
Distributed machine learning
Stanley Wang
 
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
MLconf
 
Parallel processing coa
Parallel processing coaParallel processing coa
Parallel processing coa
Bala Vignesh
 
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
MLconf
 
Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016
Ram Sriharsha
 
Functional?
 Reactive? 
Why?
Functional?
 Reactive? 
Why?Functional?
 Reactive? 
Why?
Functional?
 Reactive? 
Why?
Timetrix
 
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
DataStax Academy
 
Parallel computing
Parallel computingParallel computing
Parallel computing
Engr Zardari Saddam
 
On Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and ExperimentsOn Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and Experiments
Yu Liu
 
Parallel computing
Parallel computingParallel computing
Parallel computing
virend111
 
Parallel Computing
Parallel ComputingParallel Computing
Parallel Computing
Ameya Waghmare
 
EC2, MapReduce, and Distributed Processing
EC2, MapReduce, and Distributed ProcessingEC2, MapReduce, and Distributed Processing
EC2, MapReduce, and Distributed Processing
Jonathan Dahl
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and Disadvantages
Murtadha Alsabbagh
 
Tensor flow intro and summit info feb 2017
Tensor flow intro and summit info feb 2017Tensor flow intro and summit info feb 2017
Tensor flow intro and summit info feb 2017
Sam Witteveen
 
Research Scope in Parallel Computing And Parallel Programming
Research Scope in Parallel Computing And Parallel ProgrammingResearch Scope in Parallel Computing And Parallel Programming
Research Scope in Parallel Computing And Parallel Programming
Shitalkumar Sukhdeve
 
CASSANDRA - Next to RDBMS
CASSANDRA - Next to RDBMSCASSANDRA - Next to RDBMS
CASSANDRA - Next to RDBMS
Vipul Thakur
 
The Art&Science of tuning SAP HANA models
The Art&Science of tuning SAP HANA modelsThe Art&Science of tuning SAP HANA models
The Art&Science of tuning SAP HANA models
Michal Korzen
 
Patterns For Parallel Computing
Patterns For Parallel ComputingPatterns For Parallel Computing
Patterns For Parallel Computing
David Chou
 
Brief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep LearningBrief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep Learning
Adam Gibson
 
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
asimkadav
 

What's hot (20)

Distributed machine learning
Distributed machine learningDistributed machine learning
Distributed machine learning
 
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
Braxton McKee, Founder & CEO, Ufora at MLconf SF - 11/13/15
 
Parallel processing coa
Parallel processing coaParallel processing coa
Parallel processing coa
 
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
Mathias Brandewinder, Software Engineer & Data Scientist, Clear Lines Consult...
 
Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016Online learning with structured streaming, spark summit brussels 2016
Online learning with structured streaming, spark summit brussels 2016
 
Functional?
 Reactive? 
Why?
Functional?
 Reactive? 
Why?Functional?
 Reactive? 
Why?
Functional?
 Reactive? 
Why?
 
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
IBM Spark Technology Center: Real-time Advanced Analytics and Machine Learnin...
 
Parallel computing
Parallel computingParallel computing
Parallel computing
 
On Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and ExperimentsOn Extending MapReduce - Survey and Experiments
On Extending MapReduce - Survey and Experiments
 
Parallel computing
Parallel computingParallel computing
Parallel computing
 
Parallel Computing
Parallel ComputingParallel Computing
Parallel Computing
 
EC2, MapReduce, and Distributed Processing
EC2, MapReduce, and Distributed ProcessingEC2, MapReduce, and Distributed Processing
EC2, MapReduce, and Distributed Processing
 
Parallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and DisadvantagesParallel Algorithms Advantages and Disadvantages
Parallel Algorithms Advantages and Disadvantages
 
Tensor flow intro and summit info feb 2017
Tensor flow intro and summit info feb 2017Tensor flow intro and summit info feb 2017
Tensor flow intro and summit info feb 2017
 
Research Scope in Parallel Computing And Parallel Programming
Research Scope in Parallel Computing And Parallel ProgrammingResearch Scope in Parallel Computing And Parallel Programming
Research Scope in Parallel Computing And Parallel Programming
 
CASSANDRA - Next to RDBMS
CASSANDRA - Next to RDBMSCASSANDRA - Next to RDBMS
CASSANDRA - Next to RDBMS
 
The Art&Science of tuning SAP HANA models
The Art&Science of tuning SAP HANA modelsThe Art&Science of tuning SAP HANA models
The Art&Science of tuning SAP HANA models
 
Patterns For Parallel Computing
Patterns For Parallel ComputingPatterns For Parallel Computing
Patterns For Parallel Computing
 
Brief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep LearningBrief introduction to Distributed Deep Learning
Brief introduction to Distributed Deep Learning
 
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
MALT: Distributed Data-Parallelism for Existing ML Applications (Distributed ...
 

Viewers also liked

The study pro-and anti- oncogenic activity of the cellular origin biologica...
 The study pro-and anti- oncogenic activity  of the cellular origin biologica... The study pro-and anti- oncogenic activity  of the cellular origin biologica...
The study pro-and anti- oncogenic activity of the cellular origin biologica...
Татьяна Гергелюк
 
Scalar PP
Scalar PPScalar PP
Scalar PP
Jordan Williams
 
Elumalai_2.5_Year
Elumalai_2.5_YearElumalai_2.5_Year
Elumalai_2.5_Year
Elumalai Tholan
 
mindfulness
mindfulnessmindfulness
Architecture & Functionality for Learning Spaces Website
Architecture & Functionality for Learning Spaces WebsiteArchitecture & Functionality for Learning Spaces Website
Architecture & Functionality for Learning Spaces Website
Pragati Kunwer
 
Marcus Green
Marcus GreenMarcus Green
Marcus Green
Greg Simmons
 
Parking page-handout-graphic
Parking page-handout-graphicParking page-handout-graphic
Parking page-handout-graphic
Safe Rise
 
Accountor - Sanctions & Business in Russia: Today & Tomorrow
Accountor - Sanctions & Business in Russia: Today & TomorrowAccountor - Sanctions & Business in Russia: Today & Tomorrow
Accountor - Sanctions & Business in Russia: Today & Tomorrow
Accountor Russia and Ukraine
 
Disruption in Wealth Management
Disruption in Wealth ManagementDisruption in Wealth Management
Disruption in Wealth Management
Greg Simmons
 
Савонькина Ольга.Ecology
Савонькина Ольга.EcologyСавонькина Ольга.Ecology
Савонькина Ольга.Ecology
Irina Korkina
 
احمد الغامدي
احمد الغامدياحمد الغامدي
احمد الغامدي
ahmed alghamdi
 
Ionic app
Ionic appIonic app
Ionic app
Shobhit Saxena
 
Олег Антонов "Возможности нового кабинета Prom.ua"
Олег Антонов "Возможности нового кабинета Prom.ua"Олег Антонов "Возможности нового кабинета Prom.ua"
Олег Антонов "Возможности нового кабинета Prom.ua"
Prom
 
Community outreach portfolio shobhit
Community outreach portfolio shobhitCommunity outreach portfolio shobhit
Community outreach portfolio shobhit
Shobhit Saxena
 
Certificate of Participation
Certificate of ParticipationCertificate of Participation
Certificate of Participation
Nataliia Andrieieva
 
CCNA 1 Chapter - 4
CCNA 1  Chapter - 4CCNA 1  Chapter - 4
CCNA 1 Chapter - 4
Muteb Al-Malki
 

Viewers also liked (17)

The study pro-and anti- oncogenic activity of the cellular origin biologica...
 The study pro-and anti- oncogenic activity  of the cellular origin biologica... The study pro-and anti- oncogenic activity  of the cellular origin biologica...
The study pro-and anti- oncogenic activity of the cellular origin biologica...
 
Scalar PP
Scalar PPScalar PP
Scalar PP
 
Elumalai_2.5_Year
Elumalai_2.5_YearElumalai_2.5_Year
Elumalai_2.5_Year
 
mindfulness
mindfulnessmindfulness
mindfulness
 
NICET II PM
NICET II PMNICET II PM
NICET II PM
 
Architecture & Functionality for Learning Spaces Website
Architecture & Functionality for Learning Spaces WebsiteArchitecture & Functionality for Learning Spaces Website
Architecture & Functionality for Learning Spaces Website
 
Marcus Green
Marcus GreenMarcus Green
Marcus Green
 
Parking page-handout-graphic
Parking page-handout-graphicParking page-handout-graphic
Parking page-handout-graphic
 
Accountor - Sanctions & Business in Russia: Today & Tomorrow
Accountor - Sanctions & Business in Russia: Today & TomorrowAccountor - Sanctions & Business in Russia: Today & Tomorrow
Accountor - Sanctions & Business in Russia: Today & Tomorrow
 
Disruption in Wealth Management
Disruption in Wealth ManagementDisruption in Wealth Management
Disruption in Wealth Management
 
Савонькина Ольга.Ecology
Савонькина Ольга.EcologyСавонькина Ольга.Ecology
Савонькина Ольга.Ecology
 
احمد الغامدي
احمد الغامدياحمد الغامدي
احمد الغامدي
 
Ionic app
Ionic appIonic app
Ionic app
 
Олег Антонов "Возможности нового кабинета Prom.ua"
Олег Антонов "Возможности нового кабинета Prom.ua"Олег Антонов "Возможности нового кабинета Prom.ua"
Олег Антонов "Возможности нового кабинета Prom.ua"
 
Community outreach portfolio shobhit
Community outreach portfolio shobhitCommunity outreach portfolio shobhit
Community outreach portfolio shobhit
 
Certificate of Participation
Certificate of ParticipationCertificate of Participation
Certificate of Participation
 
CCNA 1 Chapter - 4
CCNA 1  Chapter - 4CCNA 1  Chapter - 4
CCNA 1 Chapter - 4
 

Similar to Ch22 parallel d_bs_cs561

Graph processing
Graph processingGraph processing
Graph processing
yeahjs
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, Implementations
Firat Atagun
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and Distributed
Turi, Inc.
 
GraphChi big graph processing
GraphChi big graph processingGraphChi big graph processing
GraphChi big graph processing
huguk
 
20. Parallel Databases in DBMS
20. Parallel Databases in DBMS20. Parallel Databases in DBMS
20. Parallel Databases in DBMS
koolkampus
 
Lecture 24
Lecture 24Lecture 24
Lecture 24
Shani729
 
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
VAISHNAVI MADHAN
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
Data Con LA
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
KRamasamy2
 
Data Parallel and Object Oriented Model
Data Parallel and Object Oriented ModelData Parallel and Object Oriented Model
Data Parallel and Object Oriented Model
Nikhil Sharma
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
David Martínez Rego
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processing
jins0618
 
ADBS_parallel Databases in Advanced DBMS
ADBS_parallel Databases in Advanced DBMSADBS_parallel Databases in Advanced DBMS
ADBS_parallel Databases in Advanced DBMS
chandugoswami
 
Manjeet Singh.pptx
Manjeet Singh.pptxManjeet Singh.pptx
Manjeet Singh.pptx
RAMCHANDRASHARMA7
 
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational DatabasesDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databases
guestdfd1ec
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
Jon Meredith
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming Models
Zvi Avraham
 
Distributed Systems: scalability and high availability
Distributed Systems: scalability and high availabilityDistributed Systems: scalability and high availability
Distributed Systems: scalability and high availability
Renato Lucindo
 
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
Alex Robinson
 

Similar to Ch22 parallel d_bs_cs561 (20)

Graph processing
Graph processingGraph processing
Graph processing
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, Implementations
 
Making Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and DistributedMaking Machine Learning Scale: Single Machine and Distributed
Making Machine Learning Scale: Single Machine and Distributed
 
GraphChi big graph processing
GraphChi big graph processingGraphChi big graph processing
GraphChi big graph processing
 
20. Parallel Databases in DBMS
20. Parallel Databases in DBMS20. Parallel Databases in DBMS
20. Parallel Databases in DBMS
 
Lecture 24
Lecture 24Lecture 24
Lecture 24
 
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
 
Impala presentation ahad rana
Impala presentation ahad ranaImpala presentation ahad rana
Impala presentation ahad rana
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
 
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
Floating Point Operations , Memory Chip Organization , Serial Bus Architectur...
 
Data Parallel and Object Oriented Model
Data Parallel and Object Oriented ModelData Parallel and Object Oriented Model
Data Parallel and Object Oriented Model
 
Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
Ling liu part 02:big graph processing
Ling liu part 02:big graph processingLing liu part 02:big graph processing
Ling liu part 02:big graph processing
 
ADBS_parallel Databases in Advanced DBMS
ADBS_parallel Databases in Advanced DBMSADBS_parallel Databases in Advanced DBMS
ADBS_parallel Databases in Advanced DBMS
 
Manjeet Singh.pptx
Manjeet Singh.pptxManjeet Singh.pptx
Manjeet Singh.pptx
 
Design Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational DatabasesDesign Patterns for Distributed Non-Relational Databases
Design Patterns for Distributed Non-Relational Databases
 
Front Range PHP NoSQL Databases
Front Range PHP NoSQL DatabasesFront Range PHP NoSQL Databases
Front Range PHP NoSQL Databases
 
Migration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming ModelsMigration To Multi Core - Parallel Programming Models
Migration To Multi Core - Parallel Programming Models
 
Distributed Systems: scalability and high availability
Distributed Systems: scalability and high availabilityDistributed Systems: scalability and high availability
Distributed Systems: scalability and high availability
 
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
The Hows and Whys of a Distributed SQL Database - Strange Loop 2017
 

More from Shobhit Saxena

***भारत की जय जय कार ***
***भारत की जय जय कार *** ***भारत की जय जय कार ***
***भारत की जय जय कार ***
Shobhit Saxena
 
DFD For E-learning Project
DFD For E-learning ProjectDFD For E-learning Project
DFD For E-learning Project
Shobhit Saxena
 
Index page of lab file
Index page of lab fileIndex page of lab file
Index page of lab file
Shobhit Saxena
 
Write a program to perform Perspective projection
Write a program to perform Perspective projectionWrite a program to perform Perspective projection
Write a program to perform Perspective projection
Shobhit Saxena
 
Write a program to perform Oblique projection
Write a program to perform Oblique projectionWrite a program to perform Oblique projection
Write a program to perform Oblique projection
Shobhit Saxena
 
Write a program to perform Orthographic projection.
Write a program to perform Orthographic  projection.Write a program to perform Orthographic  projection.
Write a program to perform Orthographic projection.
Shobhit Saxena
 
Write a program to draw a cubic Bezier curve.
Write a program to draw a cubic Bezier curve. Write a program to draw a cubic Bezier curve.
Write a program to draw a cubic Bezier curve.
Shobhit Saxena
 
Write a program to perform translation.
 Write a program to perform translation. Write a program to perform translation.
Write a program to perform translation.
Shobhit Saxena
 
Graphics and Graphics Hardware System
Graphics and Graphics Hardware SystemGraphics and Graphics Hardware System
Graphics and Graphics Hardware System
Shobhit Saxena
 
Cover letter
Cover letterCover letter
Cover letter
Shobhit Saxena
 
WEEKLY PROGRESS REPORT (WPR) for DISSERTATION
WEEKLY PROGRESS REPORT (WPR) for DISSERTATIONWEEKLY PROGRESS REPORT (WPR) for DISSERTATION
WEEKLY PROGRESS REPORT (WPR) for DISSERTATION
Shobhit Saxena
 
Progress report for research paper
Progress report for research paperProgress report for research paper
Progress report for research paper
Shobhit Saxena
 
Opticalcharacter recognition
Opticalcharacter recognition Opticalcharacter recognition
Opticalcharacter recognition
Shobhit Saxena
 
Window to viewprt
Window to viewprtWindow to viewprt
Window to viewprt
Shobhit Saxena
 
Write a program to perform translation
Write a program to perform translationWrite a program to perform translation
Write a program to perform translation
Shobhit Saxena
 
Surface rendering
Surface renderingSurface rendering
Surface rendering
Shobhit Saxena
 
Shobhit portfolio
Shobhit portfolioShobhit portfolio
Shobhit portfolio
Shobhit Saxena
 

More from Shobhit Saxena (17)

***भारत की जय जय कार ***
***भारत की जय जय कार *** ***भारत की जय जय कार ***
***भारत की जय जय कार ***
 
DFD For E-learning Project
DFD For E-learning ProjectDFD For E-learning Project
DFD For E-learning Project
 
Index page of lab file
Index page of lab fileIndex page of lab file
Index page of lab file
 
Write a program to perform Perspective projection
Write a program to perform Perspective projectionWrite a program to perform Perspective projection
Write a program to perform Perspective projection
 
Write a program to perform Oblique projection
Write a program to perform Oblique projectionWrite a program to perform Oblique projection
Write a program to perform Oblique projection
 
Write a program to perform Orthographic projection.
Write a program to perform Orthographic  projection.Write a program to perform Orthographic  projection.
Write a program to perform Orthographic projection.
 
Write a program to draw a cubic Bezier curve.
Write a program to draw a cubic Bezier curve. Write a program to draw a cubic Bezier curve.
Write a program to draw a cubic Bezier curve.
 
Write a program to perform translation.
 Write a program to perform translation. Write a program to perform translation.
Write a program to perform translation.
 
Graphics and Graphics Hardware System
Graphics and Graphics Hardware SystemGraphics and Graphics Hardware System
Graphics and Graphics Hardware System
 
Cover letter
Cover letterCover letter
Cover letter
 
WEEKLY PROGRESS REPORT (WPR) for DISSERTATION
WEEKLY PROGRESS REPORT (WPR) for DISSERTATIONWEEKLY PROGRESS REPORT (WPR) for DISSERTATION
WEEKLY PROGRESS REPORT (WPR) for DISSERTATION
 
Progress report for research paper
Progress report for research paperProgress report for research paper
Progress report for research paper
 
Opticalcharacter recognition
Opticalcharacter recognition Opticalcharacter recognition
Opticalcharacter recognition
 
Window to viewprt
Window to viewprtWindow to viewprt
Window to viewprt
 
Write a program to perform translation
Write a program to perform translationWrite a program to perform translation
Write a program to perform translation
 
Surface rendering
Surface renderingSurface rendering
Surface rendering
 
Shobhit portfolio
Shobhit portfolioShobhit portfolio
Shobhit portfolio
 

Recently uploaded

ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
Rahul
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
co23btech11018
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
IJNSA Journal
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
Dr Ramhari Poudyal
 
Question paper of renewable energy sources
Question paper of renewable energy sourcesQuestion paper of renewable energy sources
Question paper of renewable energy sources
mahammadsalmanmech
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
IJECEIAES
 
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
ihlasbinance2003
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Sinan KOZAK
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
mamunhossenbd75
 
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
171ticu
 
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball playEric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
enizeyimana36
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
KrishnaveniKrishnara1
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
Hitesh Mohapatra
 
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTCHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
jpsjournal1
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
KrishnaveniKrishnara1
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
Aditya Rajan Patra
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdfBPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
MIGUELANGEL966976
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
NidhalKahouli2
 

Recently uploaded (20)

ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024ACEP Magazine edition 4th launched on 05.06.2024
ACEP Magazine edition 4th launched on 05.06.2024
 
Computational Engineering IITH Presentation
Computational Engineering IITH PresentationComputational Engineering IITH Presentation
Computational Engineering IITH Presentation
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
Literature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptxLiterature Review Basics and Understanding Reference Management.pptx
Literature Review Basics and Understanding Reference Management.pptx
 
Question paper of renewable energy sources
Question paper of renewable energy sourcesQuestion paper of renewable energy sources
Question paper of renewable energy sources
 
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...
 
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
 
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
 
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样官方认证美国密歇根州立大学毕业证学位证书原版一模一样
官方认证美国密歇根州立大学毕业证学位证书原版一模一样
 
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball playEric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
Eric Nizeyimana's document 2006 from gicumbi to ttc nyamata handball play
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 
Generative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of contentGenerative AI leverages algorithms to create various forms of content
Generative AI leverages algorithms to create various forms of content
 
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECTCHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
CHINA’S GEO-ECONOMIC OUTREACH IN CENTRAL ASIAN COUNTRIES AND FUTURE PROSPECT
 
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.pptUnit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
Unit-III-ELECTROCHEMICAL STORAGE DEVICES.ppt
 
Recycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part IIIRecycled Concrete Aggregate in Construction Part III
Recycled Concrete Aggregate in Construction Part III
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdfBPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
BPV-GUI-01-Guide-for-ASME-Review-Teams-(General)-10-10-2023.pdf
 
basic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdfbasic-wireline-operations-course-mahmoud-f-radwan.pdf
basic-wireline-operations-course-mahmoud-f-radwan.pdf
 

Ch22 parallel d_bs_cs561

  • 2. Why Parallel Access To Data? 1 Terabyte 10 MB/s At 10 MB/s 1.2 days to scan 1 Terabyte 1,000 x parallel 1.5 minute to scan. Parallelism: divide a big problem into many smaller ones to be solved in parallel. Bandwidth
  • 3. Parallel DBMS: Introduction  Parallelism natural to DBMS processing Pipeline parallelism: many machines each doing one step in a multi-step process. Partition parallelism: many machines doing the same thing to different pieces of data. Both are natural in DBMS! Pipeline Partition Any Sequential Program Any Sequential Program SequentialSequential SequentialSequential Any Sequential Program Any Sequential Program outputs split N ways, inputs merge M ways
  • 4. DBMS: The || Success Story  DBMSs are among most (only?) successful application of parallelism. Teradata, Tandem vs. Thinking Machines, Every major DBMS vendor has some || server Workstation manufacturers depend on || DB server sales.  Reasons for success: Bulk-processing (= partitioned ||-ism). Natural pipelining. Inexpensive hardware can do the trick! Users/app-prog. don’t need to think in ||
  • 5. Some || Terminology  Speed-Up More resources means proportionally less time for given amount of data.  Scale-Up If resources increased in proportion to increase in data size, time is constant. degree of ||-ism Xact/sec. (throughput) Ideal degree of ||-ism sec./Xact (responsetime) Ideal
  • 6. Architecture Issue: Shared What? Shared Memory (SMP) Shared Disk Shared Nothing (network) CLIENTS CLIENTSCLIENTS Memory Processors Easy to program Expensive to build Difficult to scaleup Hard to program Cheap to build Easy to scaleup Sequent, SGI, Sun VMScluster Tandem, Teradata
  • 7. Different Types of DBMS ||-ism  Intra-operator parallelism get all machines working to compute a given operation (scan, sort, join)  Inter-operator parallelism each operator may run concurrently on a different site (exploits pipelining)  Inter-query parallelism different queries run on different site We’ll focus on intra-operator ||-ism
  • 8. Automatic Data Partitioning Partitioning a table: Range Hash Round Robin Shared-disk and -memory less sensitive to partitioning, Shared nothing benefits from "good" partitioning ! A...E F...J K...N O...S T...Z A...E F...J K...N O...S T...Z A...E F...J K...N O...S T...Z Good for group-by, range queries, and also equip-join Good for equijoins Good to spread load; Most flexible - but
  • 9. Parallel Scans  Scan in parallel, and merge.  Selection may not require access of all sites for range or hash partitioning.  Indexes can be built at each partition.  Question: How do indexes differ in the different schemes? Think about both lookups and inserts!
  • 10. Parallel Sorting  New records again and again: 8.5 Gb/minute, shared-nothing; Datamation benchmark in 2.41 secs (UCB students)  Idea: Scan in parallel, and range-partition as you go. As tuples come in, begin “local” sorting on each Resulting data is sorted, and range-partitioned. Problem: skew! Solution: “sample” data at start to determine partition points.
  • 11. Parallel Joins  Nested loop: Each outer tuple must be compared with each inner tuple that might join. Easy for range partitioning on join cols, hard otherwise!  Sort-Merge (or plain Merge-Join): Sorting gives range-partitioning. Merging partitioned tables is local.
  • 12. Parallel Hash Join  In first phase, partitions get distributed to different sites: A good hash function distributes work evenly  Do second phase at each site.  Almost always the winner for equi-join. Original Relations (R then S) OUTPUT 2 B main memory buffers DiskDisk INPUT 1 hash function h B-1 Partitions 1 2 B-1 . . . Phase1
  • 13. Dataflow Network for || Join  Good use of split/merge makes it easier to build parallel versions of sequential join code.
  • 14. Complex Parallel Query Plans  Complex Queries: Inter-Operator parallelism Pipelining between operators: ○ sort and phase 1 of hash-join block the pipeline!! Bushy Trees A B R S Sites 1-4 Sites 5-8 Sites 1-8
  • 15. N×M-way Parallelism A...E F...J K...N O...S T...Z Merge Join Sort Join Sort Join Sort Join Sort Join Sort Merge Merge N inputs, M outputs, no bottlenecks. Partitioned Data Partitioned and Pipelined Data Flows
  • 16. Observations  It is relatively easy to build a fast parallel query executor  It is hard to write a robust high-quality parallel query optimizer: There are many tricks. One quickly hits the complexity barrier.
  • 17. Parallel Query Optimization  Common approach: 2 phases Pick best sequential plan (System R algo) Pick degree of parallelism based on current system parameters.  “Bind” operators to processors Take query tree, “decorate” with assigned processor
  • 18. What’s Wrong With That? st serial plan != Best || plan ! Why? Trivial counter-example: Table partitioned with local secondary index at two nodes Range query: all data of node 1 and 1% of node 2. Node 1 should do a scan of its partition. Node 2 should use secondary index. SELECT * FROM telephone_book WHERE name < “NoGood”; N..Z Table Scan A..M Index Scan
  • 19. Parallel DBMS Summary  ||-ism natural to query processing: Both pipeline and partition ||-ism!  Shared-Nothing vs. Shared-Mem Shared-disk too, but less standard Shared-memory easy, costly. ○ But doesn’t scaleup. Shared-nothing cheap, scales well. ○ But harder to implement.  Intra-op, Inter-op & Inter-query ||-ism possible.

Editor's Notes

  1. The slides for this text are organized into chapters. This lecture covers the material on parallel databases from Chapter 22. The material on distributed databases can be found in Chapter 22, Part B.