Data Warehousing
Lecture-25
Need for Speed: Parallelism Methodologies
Virtual University of Pakistan
Ahsan Abdullah
Assoc. Prof. & Head
Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: ahsan1010@yahoo.com
Motivation
• No need for parallelism if we had a perfect computer
  − with a single, infinitely fast processor
  − with an infinite memory with infinite bandwidth
  − and it is infinitely cheap too (free!)
• Technology is not delivering (going to the Moon analogy)
• The challenge is to build
  − an infinitely fast processor out of infinitely many processors of finite speed
  − an infinitely large memory with infinite memory bandwidth from infinitely many finite storage units of finite speed
Data Parallelism: Concept
• Parallel execution of a single data manipulation task across multiple partitions of data.
• Partitions can be static or dynamic.
• Tasks are executed almost independently across partitions.
• A "query coordinator" must coordinate between the independently executing processes.
Data Parallelism: Example
[Figure: the Emp table is split into Partition-1, Partition-2, ..., Partition-k. Query Server-1, Query Server-2, ..., Query Server-k each scan one partition and return partial counts (62, 440, ..., 1,123) to the Query Coordinator.]
Query:
  SELECT COUNT(*)
  FROM Emp
  WHERE age > 50
  AND sal > 10000;
Ans = 62 + 440 + ... + 1,123 = 99,000
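The coordinator/server split in this example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows, the helper names count_matches and parallel_count, and the use of a thread pool to stand in for the query servers are all hypothetical, not part of the lecture.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matches(partition):
    """One 'query server': scan its own partition and count qualifying rows."""
    return sum(1 for row in partition if row["age"] > 50 and row["sal"] > 10000)


def parallel_count(partitions):
    """'Query coordinator': fan the scan out, then add up the partial counts."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_counts = list(pool.map(count_matches, partitions))
    return partial_counts, sum(partial_counts)


# Hypothetical Emp rows, already split into three partitions.
emp_partitions = [
    [{"age": 55, "sal": 12000}, {"age": 40, "sal": 20000}],
    [{"age": 61, "sal": 15000}, {"age": 52, "sal": 9000}, {"age": 58, "sal": 11000}],
    [{"age": 30, "sal": 5000}],
]

partials, total = parallel_count(emp_partitions)
print(partials, total)
```

The threads here only mirror the structure of the figure; a real system would run each scan on its own processor or node.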
Data Parallelism: Ensuring Speed-Up
To get a speed-up of N with N partitions, it must be ensured that:
• There are enough computing resources.
• The query coordinator is very fast compared to the query servers.
• The work done in each partition is almost the same, to avoid performance bottlenecks.
• The same number of records in each partition would not suffice.
• There needs to be a uniform distribution of records w.r.t. the filter criterion across partitions.
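A rough way to see why balanced work matters: if each partition runs on its own server and the coordinator cost is ignored (both are simplifying assumptions), the parallel run takes as long as the busiest partition. The work figures below are made-up units, and data_parallel_speedup is a hypothetical helper.

```python
def data_parallel_speedup(work_per_partition):
    """Speed-up when the parallel time is set by the most loaded partition."""
    sequential_time = sum(work_per_partition)  # one server does everything
    parallel_time = max(work_per_partition)    # servers run side by side
    return sequential_time / parallel_time


balanced = [250, 250, 250, 250]  # same work everywhere
skewed = [700, 100, 100, 100]    # same total, but one partition does most of it

print(data_parallel_speedup(balanced))
print(round(data_parallel_speedup(skewed), 2))
```

With four partitions the balanced case reaches the full speed-up of 4, while the skewed case stays well below 2 even though the total work is identical.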
Temporal Parallelism (pipelining)
Involves taking a complex task and breaking it down into independent subtasks for parallel execution on a stream of data inputs.
[Figure: a task with execution time T is split into three pipeline stages, each taking time T/3.]
Pipelining: Time Chart
[Figure: time chart showing the three pipeline stages, each taking time T/3, at T = 0, T = 1, T = 2 and T = 3.]
Pipelining: Speed-Up Calculation
Time for sequential execution of 1 task = T
Time for sequential execution of N tasks = N × T
(Ideal) time for pipelined execution of one task using an M stage pipeline = T
(Ideal) time for pipelined execution of N tasks using an M stage pipeline = T + ((N-1) × (T/M))
Speed-up (S) = (N × T) / (T + ((N-1) × (T/M)))
Pipeline parallelism focuses on increasing the throughput of task execution, NOT on decreasing sub-task execution time.
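The ideal timings above translate directly into a small function. The name pipeline_speedup and its parameters are chosen here for illustration; the formula is simply sequential time divided by pipelined time.

```python
def pipeline_speedup(n_tasks, n_stages, task_time=1.0):
    """Ideal speed-up of an n_stages pipeline over sequential execution."""
    sequential = n_tasks * task_time
    # The first task needs the full task_time; every later task
    # completes one stage time (task_time / n_stages) after the previous one.
    pipelined = task_time + (n_tasks - 1) * (task_time / n_stages)
    return sequential / pipelined


for n in (1, 10, 100, 1000):
    print(n, round(pipeline_speedup(n, 3), 3))
```

A single task gains nothing from the pipeline, and the value climbs towards the number of stages as the number of tasks grows.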
Pipelining: Speed-Up Example
Example: Bottling soft drinks in a factory (three-stage pipeline: fill bottle, seal bottle, label bottle)
10 crate-loads of bottles
  Sequential execution = 10 × T
  Fill, seal, label pipeline = T + T × (10-1)/3 = 4 × T
  Speed-up = 2.50
20 crate-loads of bottles
  Sequential execution = 20 × T
  Fill, seal, label pipeline = T + T × (20-1)/3 = 7.33 × T
  Speed-up = 2.73
40 crate-loads of bottles
  Sequential execution = 40 × T
  Fill, seal, label pipeline = T + T × (40-1)/3 = 14.0 × T
  Speed-up = 2.86
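The pipelined times in this example can be cross-checked with a tiny simulation instead of the closed formula. The sketch assumes all crates are available at time 0, every stage takes T/M, and a stage handles one crate at a time; pipelined_finish_time is a hypothetical helper name.

```python
def pipelined_finish_time(n_items, n_stages, task_time):
    """Time at which the last item leaves an n_stages pipeline."""
    stage_time = task_time / n_stages
    free_at = [0.0] * n_stages  # when each stage can accept its next item
    finish = 0.0
    for _ in range(n_items):
        ready = 0.0  # when this item is ready for its next stage
        for stage in range(n_stages):
            start = max(ready, free_at[stage])
            ready = start + stage_time
            free_at[stage] = ready
        finish = ready
    return finish


T = 3.0  # task time chosen so that each of the 3 stages takes exactly 1.0
for crates in (10, 20, 40):
    total = pipelined_finish_time(crates, 3, T)
    print(crates, total / T, round(crates * T / total, 2))
```

For 10 crates the simulation finishes at 4 × T, matching T + T × (10-1)/3.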
Pipelining: Input vs Speed-Up
[Figure: plot of speed-up (S), on an axis from 1 to 3, against input (N) from 1 to 19.]
Asymptotic limit on speed-up for an M stage pipeline is M.
The speed-up will NEVER be M, as initially filling the pipeline took T time units.
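The asymptotic limit follows from the ideal timings (N × T sequentially, T + (N-1) × (T/M) pipelined). A short derivation, written as a sketch with the same symbols N, M and T:

```latex
\begin{align*}
S(N) &= \frac{N\,T}{T + (N-1)\,\frac{T}{M}}
      = \frac{N\,M}{M + N - 1}
      = \frac{M}{1 + \frac{M-1}{N}}, \\
\lim_{N \to \infty} S(N) &= M,
\qquad S(N) < M \ \text{for every finite } N \text{ when } M > 1 .
\end{align*}
```

The term (M-1)/N is the cost of filling the pipeline spread over N inputs; it shrinks as N grows but never reaches zero.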
Pipelining: Limitations
• Relational pipelines are rarely very long
  − Even a chain of length ten is unusual.
• Some relational operators do not produce their first output until they have consumed all their inputs.
  − Aggregate and sort operators have this property. One cannot pipeline these operators.
• Often, the execution cost of one operator is much greater than that of the others, hence skew.
  − e.g. Sum() or Count() vs Group-by() or Join.
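The difference between a pipelinable operator and a blocking one can be illustrated with Python generators. The operator names scan, select_gt and sort_op are hypothetical stand-ins for relational operators; the counter records how many input rows were read before the first output row appeared.

```python
def scan(rows, reads):
    """Source operator: yields rows one at a time and records each read."""
    for row in rows:
        reads.append(row)
        yield row


def select_gt(rows, threshold):
    """Streaming operator: can emit a row as soon as it has seen it."""
    for row in rows:
        if row > threshold:
            yield row


def sort_op(rows):
    """Blocking operator: must consume every input row before emitting any."""
    yield from sorted(rows)


data = [5, 1, 9, 3]

reads_streaming = []
first_selected = next(select_gt(scan(data, reads_streaming), 2))

reads_blocking = []
first_sorted = next(sort_op(scan(data, reads_blocking)))

print(first_selected, len(reads_streaming))
print(first_sorted, len(reads_blocking))
```

The selection produces its first row after reading a single input row, while the sort has to read all four rows first, which is why it stalls any pipeline stage placed after it.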
Partitioning & Queries
• Let's evaluate how well different partitioning techniques support the following types of data access:
  − Full Table Scan: Scanning the entire relation
  − Point Queries: Locating a tuple, e.g. where r.A = 313
  − Range Queries: Locating all tuples such that the value of a given attribute lies within a specified range, e.g. where 313 ≤ r.A < 786.
Partitioning & Queries: Round Robin
• Advantages
  − Best suited for sequential scan of the entire relation on each query.
  − All disks have almost an equal number of tuples; retrieval work is thus well balanced between disks.
• Range queries are difficult to process
  − No clustering: tuples are scattered across all disks
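A minimal sketch of round-robin placement, assuming tuples are represented by their attribute value and arrive in order; the helper names are hypothetical. It shows both properties listed above: equal-sized disks, and a small range whose tuples end up on every disk.

```python
def round_robin_partition(tuples, n_disks):
    """Send the i-th tuple to disk i mod n_disks."""
    disks = [[] for _ in range(n_disks)]
    for i, t in enumerate(tuples):
        disks[i % n_disks].append(t)
    return disks


def disks_holding_range(disks, low, high):
    """Disks that hold at least one tuple with low <= value < high."""
    return [d for d, rows in enumerate(disks)
            if any(low <= r < high for r in rows)]


disks = round_robin_partition(list(range(100, 120)), 4)
sizes = [len(rows) for rows in disks]
touched = disks_holding_range(disks, 104, 108)
print(sizes, touched)
```

Since placement ignores attribute values, the system cannot know in advance which disks to skip, so in practice every disk has to be scanned for a range query.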
Partitioning & Queries: Hash Partitioning
• Good for sequential access
  − With uniform hashing and using the partitioning attributes as a key, tuples will be equally distributed between disks.
• Good for point queries on the partitioning attribute
  − Can look up a single disk, leaving the others available for answering other queries.
  − An index on the partitioning attribute can be local to a disk, making lookups and updates, and even joins, very efficient.
• Range queries are difficult to process
  − No clustering: tuples are scattered across all disks
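The point-query and range-query behaviour of hash partitioning can be sketched as follows. The toy hash (value mod number of disks) is an assumption made only to keep the example predictable; real systems use a proper hash function, and the helper names are hypothetical.

```python
def disk_for(value, n_disks):
    """Toy hash placement: an assumption for illustration, not a real hash."""
    return value % n_disks


def hash_partition(values, n_disks):
    disks = [[] for _ in range(n_disks)]
    for v in values:
        disks[disk_for(v, n_disks)].append(v)
    return disks


n_disks = 4
disks = hash_partition(range(310, 318), n_disks)

# Point query r.A = 313: the hash tells us the single disk to visit.
point_disk = disk_for(313, n_disks)

# Range query 313 <= r.A < 317: consecutive values hash to different disks.
range_disks = sorted({disk_for(v, n_disks) for v in range(313, 317)})

print(point_disk, 313 in disks[point_disk], range_disks)
```

The point query touches one disk, while even a four-value range is spread over all four disks.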
Partitioning & Queries: Range Partitioning
• Provides data clustering by partitioning attribute value.
• Good for sequential access
• Good for point queries on the partitioning attribute: only one disk needs to be accessed.
• For range queries on the partitioning attribute, one or a few disks may need to be accessed
  − Remaining disks are available for other queries.
  − Good if result tuples are from one to a few blocks.
  − If many blocks are to be fetched, they are still fetched from one to a few disks, so potential parallelism in disk access is wasted.
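With range partitioning, a partition vector decides which disk holds which values, so both query types can be routed without scanning everything. The vector [200, 400, 600, 800] and the helper names below are illustrative assumptions; the query constants 313 and 786 are the ones used earlier in the lecture.

```python
import bisect


def disk_for(vector, value):
    """Disk index = number of partition-vector entries <= value."""
    return bisect.bisect_right(vector, value)


def disks_for_range(vector, low, high):
    """Disks that may hold values with low <= value < high."""
    first = bisect.bisect_right(vector, low)
    last = bisect.bisect_left(vector, high)
    return list(range(first, last + 1))


vector = [200, 400, 600, 800]  # 4 split points -> 5 disks

point_disk = disk_for(vector, 313)
range_disks = disks_for_range(vector, 313, 786)
print(point_disk, range_disks)
```

The point query goes to one disk and the range query to three of the five, leaving the remaining disks free for other queries.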
Parallel Sorting
• Scan in parallel, and range partition on the go.
• As partitioned data becomes available, perform "local" sorting.
• Resulting data is sorted and again range partitioned.
• Problem: skew or "hot spot".
• Solution: Sample the data at start to determine partition points.
[Figure: data distributed over processors 1 to 5 (P1 to P5), annotated with the values 1, 4, 1, 2, 1; one processor is marked as the hot spot.]
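The steps above (sample, pick partition points, range partition, sort locally) can be sketched sequentially in Python. This is an illustration under assumptions: the input is a non-empty list, the sample size and seed are arbitrary, the function names are hypothetical, and the local sorts run one after another here rather than on separate processors.

```python
import bisect
import random


def choose_splitters(data, n_parts, sample_size, rng):
    """Pick n_parts - 1 partition points from a sorted random sample."""
    sample = sorted(rng.sample(data, min(sample_size, len(data))))
    return [sample[(i * len(sample)) // n_parts] for i in range(1, n_parts)]


def sample_sort(data, n_parts, sample_size=100, seed=0):
    rng = random.Random(seed)
    splitters = choose_splitters(data, n_parts, sample_size, rng)
    parts = [[] for _ in range(n_parts)]
    for x in data:  # range partition "on the go"
        parts[bisect.bisect_right(splitters, x)].append(x)
    for part in parts:  # local sort of each partition
        part.sort()
    # Partitions are already ordered by range, so concatenation is sorted.
    return [x for part in parts for x in part], [len(p) for p in parts]


rng = random.Random(1)
data = [rng.randint(0, 1000) for _ in range(500)]
result, sizes = sample_sort(data, 5)
print(sizes)
```

Printing the partition sizes gives a quick check on how evenly the sampled partition points spread the load.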
Skew in Partitioning
• The distribution of tuples to disks may be skewed
  − i.e. some disks have many tuples, while others may have fewer tuples.
• Types of skew:
  − Attribute-value skew.
    − Some values appear in the partitioning attributes of many tuples; all the tuples with the same value for the partitioning attribute end up in the same partition.
    − Can occur with range-partitioning and hash-partitioning.
  − Partition skew.
    − With range-partitioning, a badly chosen partition vector may assign too many tuples to some partitions and too few to others.
    − Less likely with hash-partitioning if a good hash-function is chosen.
Handling Skew in Range-Partitioning
• To create a balanced partitioning vector
  − Sort the relation on the partitioning attribute.
  − Construct the partition vector by scanning the relation in sorted order as follows.
    − After every 1/n-th of the relation has been read, the value of the partitioning attribute of the next tuple is added to the partition vector.
    − n denotes the number of partitions to be constructed.
  − Duplicate entries or imbalances can result if duplicates are present in partitioning attributes.
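The construction above fits in a few lines. The sketch assumes the relation is represented by the list of its partitioning-attribute values and holds at least n tuples; partition_vector is a hypothetical name.

```python
def partition_vector(values, n):
    """After every 1/n-th of the sorted values, record the next value."""
    ordered = sorted(values)
    step = len(ordered) / n
    return [ordered[int(i * step)] for i in range(1, n)]


# Distinct values: the vector splits 12 tuples into 3 equal ranges.
balanced = partition_vector(range(1, 13), 3)

# Heavily duplicated values: the vector itself contains duplicates.
duplicated = partition_vector([1, 1, 1, 1, 1, 1, 2, 3], 4)

print(balanced, duplicated)
```

The second call shows the problem named in the last bullet: when one value dominates, the same split point is produced more than once and the resulting partitions cannot be balanced.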
Barriers to Linear Speedup & Scale-up
• Amdahl's Law
• Startup
  − Time needed to start a large number of processors.
  − Increases with the number of individual processors.
  − May also include time spent in opening files etc.
• Interference
  − The slowdown that each processor imposes on all others when sharing a common pool of resources (e.g. memory).
• Skew
  − Variance dominating the mean.
  − The service time of the job is the service time of its slowest component.
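The slide names Amdahl's Law without stating it. In its usual form, if a fraction p of the work can be parallelised over n processors, the speed-up is 1 / ((1 - p) + p / n). A small sketch, with amdahl_speedup as an illustrative name:

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Speed-up predicted by Amdahl's Law for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


for n in (2, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

With 90% of the work parallelisable, the speed-up can never reach 10 no matter how many processors are added, because the serial 10% remains.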
Comparison of Partitioning Techniques
Shared disk/memory is less sensitive to partitioning.
Shared nothing can benefit from good partitioning.
[Figure: for each technique, user records are spread over five partitions labelled A…E, F…J, K…N, O…S, T…Z.]
Range: Good for equijoins, range queries and group-by clauses; can result in "hot spots".
Round Robin: Good for load balancing, but impervious to the nature of queries.
Hash: Good for equijoins; can result in uneven data distribution.
Parallel Aggregates
For each aggregate function, need a decomposition:
  Count(S) = count(s1) + count(s2) + ….
  Average(S) = (sum(s1) + sum(s2) + ….) / (count(s1) + count(s2) + ….)
For groups:
  Distribute data using hashing.
  Sub-aggregate groups close to the source.
  Pass each sub-aggregate to its group's site.
[Figure: data spread over five partitions labelled A…E, F…J, K…N, O…S, T…Z.]
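One common way to decompose an average is to have every partition return a (sum, count) pair and let the coordinator combine the pairs. The sketch below uses made-up values and hypothetical helper names; it also computes the average of the per-partition averages to show that this shortcut gives a different answer when partitions differ in size.

```python
def local_aggregate(partition):
    """Sub-aggregate computed close to the data: (sum, count)."""
    return sum(partition), len(partition)


def combine(partials):
    """Coordinator step: merge the sub-aggregates into count and average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return count, total / count


partitions = [[10, 20], [30], [40, 50, 60]]
partials = [local_aggregate(p) for p in partitions]
count, average = combine(partials)

# Averaging the per-partition averages ignores the partition sizes.
average_of_averages = sum(sum(p) / len(p) for p in partitions) / len(partitions)

print(count, average, round(average_of_averages, 2))
```

The combined result equals the average over all six values, whereas the average of averages does not.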
When to use which partitioning technique?
• When to use Range Partitioning?
• When to use Hash Partitioning?
• When to use List Partitioning?
• When to use Round-Robin Partitioning?
Parallelism Goals and Metrics
• Speedup: The Good, The Bad & The Ugly
  Speedup = OldTime / NewTime
  [Figure: three speedup curves, each plotted against Processors & Discs: the ideal speedup curve (labelled "Linearity"); a bad speedup curve (labelled "Non-linear" and "Min Parallelism Benefit"); and a bad speedup curve attributed to 3 factors: startup, interference and skew.]
• Scale-up:
  − Transactional Scale-up: Fit for OLTP systems
  − Batch Scale-up: Fit for Data Warehouse and OLAP

Data warehousing and mining furc
 
Lecture 40
Lecture 40Lecture 40
Lecture 40
 
Lecture 39
Lecture 39Lecture 39
Lecture 39
 
Lecture 38
Lecture 38Lecture 38
Lecture 38
 
Lecture 37
Lecture 37Lecture 37
Lecture 37
 

Recently uploaded

Levelised Cost of Hydrogen (LCOH) Calculator Manual
Levelised Cost of Hydrogen  (LCOH) Calculator ManualLevelised Cost of Hydrogen  (LCOH) Calculator Manual
Levelised Cost of Hydrogen (LCOH) Calculator Manual
Massimo Talia
 
AN INTRODUCTION OF AI & SEARCHING TECHIQUES
AN INTRODUCTION OF AI & SEARCHING TECHIQUESAN INTRODUCTION OF AI & SEARCHING TECHIQUES
AN INTRODUCTION OF AI & SEARCHING TECHIQUES
drshikhapandey2022
 
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
Paris Salesforce Developer Group
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
Divyanshu
 
OOPS_Lab_Manual - programs using C++ programming language
OOPS_Lab_Manual - programs using C++ programming languageOOPS_Lab_Manual - programs using C++ programming language
OOPS_Lab_Manual - programs using C++ programming language
PreethaV16
 
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
sydezfe
 
SELENIUM CONF -PALLAVI SHARMA - 2024.pdf
SELENIUM CONF -PALLAVI SHARMA - 2024.pdfSELENIUM CONF -PALLAVI SHARMA - 2024.pdf
SELENIUM CONF -PALLAVI SHARMA - 2024.pdf
Pallavi Sharma
 
Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...
pvpriya2
 
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
MadhavJungKarki
 
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Transcat
 
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
OKORIE1
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
21UME003TUSHARDEB
 
Call For Paper -3rd International Conference on Artificial Intelligence Advan...
Call For Paper -3rd International Conference on Artificial Intelligence Advan...Call For Paper -3rd International Conference on Artificial Intelligence Advan...
Call For Paper -3rd International Conference on Artificial Intelligence Advan...
ijseajournal
 
Height and depth gauge linear metrology.pdf
Height and depth gauge linear metrology.pdfHeight and depth gauge linear metrology.pdf
Height and depth gauge linear metrology.pdf
q30122000
 
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICSUNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
vmspraneeth
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
VANDANAMOHANGOUDA
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
uqyfuc
 
openshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoinopenshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoin
snaprevwdev
 
Butterfly Valves Manufacturer (LBF Series).pdf
Butterfly Valves Manufacturer (LBF Series).pdfButterfly Valves Manufacturer (LBF Series).pdf
Butterfly Valves Manufacturer (LBF Series).pdf
Lubi Valves
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
Atif Razi
 

Recently uploaded (20)

Levelised Cost of Hydrogen (LCOH) Calculator Manual
Levelised Cost of Hydrogen  (LCOH) Calculator ManualLevelised Cost of Hydrogen  (LCOH) Calculator Manual
Levelised Cost of Hydrogen (LCOH) Calculator Manual
 
AN INTRODUCTION OF AI & SEARCHING TECHIQUES
AN INTRODUCTION OF AI & SEARCHING TECHIQUESAN INTRODUCTION OF AI & SEARCHING TECHIQUES
AN INTRODUCTION OF AI & SEARCHING TECHIQUES
 
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
AI + Data Community Tour - Build the Next Generation of Apps with the Einstei...
 
Null Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAMNull Bangalore | Pentesters Approach to AWS IAM
Null Bangalore | Pentesters Approach to AWS IAM
 
OOPS_Lab_Manual - programs using C++ programming language
OOPS_Lab_Manual - programs using C++ programming languageOOPS_Lab_Manual - programs using C++ programming language
OOPS_Lab_Manual - programs using C++ programming language
 
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
一比一原版(uoft毕业证书)加拿大多伦多大学毕业证如何办理
 
SELENIUM CONF -PALLAVI SHARMA - 2024.pdf
SELENIUM CONF -PALLAVI SHARMA - 2024.pdfSELENIUM CONF -PALLAVI SHARMA - 2024.pdf
SELENIUM CONF -PALLAVI SHARMA - 2024.pdf
 
Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...Determination of Equivalent Circuit parameters and performance characteristic...
Determination of Equivalent Circuit parameters and performance characteristic...
 
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
1FIDIC-CONSTRUCTION-CONTRACT-2ND-ED-2017-RED-BOOK.pdf
 
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
Tools & Techniques for Commissioning and Maintaining PV Systems W-Animations ...
 
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
DESIGN AND MANUFACTURE OF CEILING BOARD USING SAWDUST AND WASTE CARTON MATERI...
 
Mechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdfMechanical Engineering on AAI Summer Training Report-003.pdf
Mechanical Engineering on AAI Summer Training Report-003.pdf
 
Call For Paper -3rd International Conference on Artificial Intelligence Advan...
Call For Paper -3rd International Conference on Artificial Intelligence Advan...Call For Paper -3rd International Conference on Artificial Intelligence Advan...
Call For Paper -3rd International Conference on Artificial Intelligence Advan...
 
Height and depth gauge linear metrology.pdf
Height and depth gauge linear metrology.pdfHeight and depth gauge linear metrology.pdf
Height and depth gauge linear metrology.pdf
 
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICSUNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
UNIT 4 LINEAR INTEGRATED CIRCUITS-DIGITAL ICS
 
ITSM Integration with MuleSoft.pptx
ITSM  Integration with MuleSoft.pptxITSM  Integration with MuleSoft.pptx
ITSM Integration with MuleSoft.pptx
 
一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理一比一原版(USF毕业证)旧金山大学毕业证如何办理
一比一原版(USF毕业证)旧金山大学毕业证如何办理
 
openshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoinopenshift technical overview - Flow of openshift containerisatoin
openshift technical overview - Flow of openshift containerisatoin
 
Butterfly Valves Manufacturer (LBF Series).pdf
Butterfly Valves Manufacturer (LBF Series).pdfButterfly Valves Manufacturer (LBF Series).pdf
Butterfly Valves Manufacturer (LBF Series).pdf
 
Applications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdfApplications of artificial Intelligence in Mechanical Engineering.pdf
Applications of artificial Intelligence in Mechanical Engineering.pdf
 

Lecture 25

  • 1. Data Warehousing, Lecture-25. Need for Speed: Parallelism Methodologies
       Virtual University of Pakistan
       Ahsan Abdullah, Assoc. Prof. & Head, Center for Agro-Informatics Research (www.nu.edu.pk/cairindex.asp)
       National University of Computers & Emerging Sciences, Islamabad
       Email: ahsan1010@yahoo.com
  • 2. Motivation
       • No need for parallelism if we had a perfect computer:
           • with a single, infinitely fast processor
           • with infinite memory and infinite bandwidth
           • and it's infinitely cheap too (free!)
       • Technology is not delivering (going-to-the-Moon analogy).
       • The challenge is to build:
           • an infinitely fast processor out of infinitely many processors of finite speed
           • an infinitely large memory with infinite memory bandwidth out of infinitely many finite storage units of finite speed
  • 3. Data Parallelism: Concept
       • Parallel execution of a single data manipulation task across multiple partitions of data.
       • Partitions can be static or dynamic.
       • Tasks are executed almost independently across partitions.
       • A "query coordinator" must coordinate between the independently executing processes.
  • 4. Data Parallelism: Example
       • The Emp table is split into Partition-1, Partition-2, ..., Partition-k, each handled by its own query server (Query Server-1, Query Server-2, ..., Query Server-k).
       • Query: Select count(*) from Emp where age > 50 AND sal > 10,000;
       • The query servers return partial counts (62, 440, ..., 1,123) to the query coordinator.
       • Ans = 62 + 440 + ... + 1,123 = 99,000
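The count query on this slide can be sketched in Python as follows. This is only an illustration of the coordinator and query-server roles: the Emp rows, the number of partitions, and the names `count_matching` and `parallel_count` are assumptions made for the example, not part of the lecture. Threads are used to show the structure; they do not give true CPU parallelism in standard Python.

```python
from concurrent.futures import ThreadPoolExecutor


def count_matching(partition):
    """Query-server step: count rows of one partition that pass the filter."""
    return sum(1 for row in partition if row["age"] > 50 and row["sal"] > 10_000)


def parallel_count(partitions):
    """Query-coordinator step: run the same count on every partition,
    then add up the partial counts."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        partial_counts = list(pool.map(count_matching, partitions))
    return partial_counts, sum(partial_counts)


# Hypothetical Emp table with 1,200 synthetic rows.
emp = [{"age": 30 + (i % 40), "sal": 5_000 + 500 * (i % 30)} for i in range(1_200)]

# Four static partitions (every 4th row goes to the same partition).
partitions = [emp[i::4] for i in range(4)]

partial_counts, total = parallel_count(partitions)
print(partial_counts, total)
```

The coordinator does very little work here (one addition per partition), which is the property the next slide asks for when it says the coordinator must be fast relative to the query servers.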
  • 5. Data Parallelism: Ensuring Speed-Up
       To get a speed-up of N with N partitions, it must be ensured that:
       • There are enough computing resources.
       • The query coordinator is very fast compared to the query servers.
       • The work done in each partition is almost the same, to avoid performance bottlenecks.
       • Having the same number of records in each partition would not suffice.
       • Records must be uniformly distributed across partitions w.r.t. the filter criterion.
  • 6. Temporal Parallelism (pipelining)
       • Involves taking a complex task and breaking it down into independent subtasks for parallel execution on a stream of data inputs.
       • [Figure: a task with execution time T split into three stages, each taking T/3]
  • 7. Pipelining: Time Chart
       • [Figure: time chart of the three-stage pipeline, each stage taking T/3, shown at T = 0, T = 1, T = 2 and T = 3]
  • 8. Pipelining: Speed-Up Calculation
       • Time for sequential execution of 1 task = T
       • Time for sequential execution of N tasks = N × T
       • (Ideal) time for pipelined execution of one task using an M-stage pipeline = T
       • (Ideal) time for pipelined execution of N tasks using an M-stage pipeline = T + ((N-1) × (T/M))
       • Speed-up (S) = (N × T) / (T + ((N-1) × (T/M))) = (N × M) / (M + N - 1)
       • Pipeline parallelism focuses on increasing the throughput of task execution, NOT on decreasing sub-task execution time.
  • 9. Pipelining: Speed-Up Example
       Example: bottling soft drinks in a factory, using a three-stage pipeline (fill bottle, seal bottle, label bottle).
       • 10 crate-loads of bottles
           • Sequential execution = 10 × T
           • Pipelined execution = T + T × (10-1)/3 = 4 × T
           • Speed-up = 2.50
       • 20 crate-loads of bottles
           • Sequential execution = 20 × T
           • Pipelined execution = T + T × (20-1)/3 = 7.3 × T
           • Speed-up = 2.72
       • 40 crate-loads of bottles
           • Sequential execution = 40 × T
           • Pipelined execution = T + T × (40-1)/3 = 14.0 × T
           • Speed-up = 2.85
  • 10. Pipelining: Input vs Speed-Up
       • [Figure: speed-up (S), plotted from 1 to 3, against input (N) from 1 to 19]
       • The asymptotic limit on speed-up for an M-stage pipeline is M.
       • The speed-up will NEVER be M, as initially filling the pipeline took T time units.
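The timing model from the speed-up slides, T + ((N-1) × (T/M)) for N tasks on an M-stage pipeline, can be checked numerically with a short Python sketch. The function names are chosen for this illustration only, and the model is the ideal one from the slides (equal stage times, no overheads).

```python
def pipeline_time(n_tasks, stages, task_time=1.0):
    """Ideal pipelined time: the first task takes T, each later task adds T/M."""
    return task_time + (n_tasks - 1) * (task_time / stages)


def pipeline_speedup(n_tasks, stages, task_time=1.0):
    """Sequential time (N * T) divided by ideal pipelined time."""
    return (n_tasks * task_time) / pipeline_time(n_tasks, stages, task_time)


# The bottling example uses a 3-stage pipeline (fill, seal, label).
for n in (10, 20, 40, 1000):
    print(n, round(pipeline_speedup(n, stages=3), 3))
```

For 10, 20 and 40 tasks this gives roughly 2.5, 2.73 and 2.86, in line with the bottling example, and for large N the value approaches 3 without reaching it.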
  • 11. Pipelining: Limitations
       • Relational pipelines are rarely very long.
           • Even a chain of length ten is unusual.
       • Some relational operators do not produce their first output until they have consumed all their inputs.
           • Aggregate and sort operators have this property. One cannot pipeline these operators.
       • Often, the execution cost of one operator is much greater than that of the others, hence skew.
           • e.g. Sum() or Count() vs Group-by() or Join.
  • 12. Partitioning & Queries
       Let's evaluate how well different partitioning techniques support the following types of data access:
       • Full Table Scan: scanning the entire relation.
       • Point Queries: locating a tuple, e.g. where r.A = 313.
       • Range Queries: locating all tuples such that the value of a given attribute lies within a specified range, e.g. where 313 ≤ r.A < 786.
  • 13. Partitioning & Queries: Round Robin
       • Advantages
           • Best suited for a sequential scan of the entire relation on each query.
           • All disks have almost an equal number of tuples; retrieval work is thus well balanced between disks.
       • Range queries are difficult to process.
           • No clustering: tuples are scattered across all disks.
  • 14. Partitioning & Queries: Hash Partitioning
       • Good for sequential access.
           • With uniform hashing and the partitioning attributes used as a key, tuples will be equally distributed between disks.
       • Good for point queries on the partitioning attribute.
           • Can look up a single disk, leaving the others available for answering other queries.
           • An index on the partitioning attribute can be local to a disk, making lookups, updates and even joins very efficient.
       • Range queries are difficult to process.
           • No clustering: tuples are scattered across all disks.
  • 15. Partitioning & Queries: Range Partitioning
       • Provides data clustering by partitioning attribute value.
       • Good for sequential access.
       • Good for point queries on the partitioning attribute: only one disk needs to be accessed.
       • For range queries on the partitioning attribute, one or a few disks may need to be accessed.
           • The remaining disks are available for other queries.
           • Good if the result tuples come from one to a few blocks.
           • If many blocks are to be fetched, they are still fetched from one to a few disks, so the potential parallelism in disk access is wasted.
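The three schemes discussed on the partitioning slides can be compared with a small Python sketch. Everything here is an assumption made for illustration: four disks, integer keys, a hand-picked partition vector `[250, 500, 750]`, and the function names. It shows which disk a tuple lands on and which disks a query has to touch.

```python
from bisect import bisect_right


def round_robin_partition(tuple_index, n_disks):
    """The i-th inserted tuple goes to disk i mod n, whatever its key is."""
    return tuple_index % n_disks


def hash_partition(key, n_disks):
    """Disk is chosen from a hash of the partitioning attribute (integer keys assumed)."""
    return hash(key) % n_disks


def range_partition(key, vector):
    """vector holds sorted split points; keys below vector[0] go to disk 0, and so on."""
    return bisect_right(vector, key)


def disks_for_range_query(lo, hi, vector):
    """Disks that may hold integer keys in [lo, hi) under range partitioning."""
    first = range_partition(lo, vector)
    last = range_partition(hi - 1, vector)
    return list(range(first, last + 1))


n_disks = 4
vector = [250, 500, 750]  # hypothetical partition vector for 4 disks

# Point query r.A = 313: hash and range partitioning each touch one disk,
# while round robin gives no hint and must search every disk.
print(hash_partition(313, n_disks), range_partition(313, vector))

# Range query 313 <= r.A < 786: range partitioning narrows the search,
# hash and round robin must scan all disks.
print(disks_for_range_query(313, 786, vector))
```

With this vector the range query touches three of the four disks; a narrower range such as 313 ≤ r.A < 400 would touch only one, leaving the others free for other queries.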
  • 16. Parallel Sorting
       • Scan in parallel, and range partition on the go.
       • As partitioned data becomes available, perform "local" sorting.
       • The resulting data is sorted and again range partitioned.
       • Problem: skew or "hot spot".
       • Solution: sample the data at the start to determine the partition points.
       • [Figure: data spread over five processors P1 to P5 with loads 1, 4, 1, 2, 1, showing a hot spot]
  • 17. Skew in Partitioning
       • The distribution of tuples to disks may be skewed, i.e. some disks have many tuples, while others have fewer.
       • Types of skew:
           • Attribute-value skew.
               • Some values appear in the partitioning attributes of many tuples; all the tuples with the same value for the partitioning attribute end up in the same partition.
               • Can occur with range partitioning and hash partitioning.
           • Partition skew.
               • With range partitioning, a badly chosen partition vector may assign too many tuples to some partitions and too few to others.
               • Less likely with hash partitioning if a good hash function is chosen.
  • 18. Handling Skew in Range-Partitioning
       To create a balanced partitioning vector:
       • Sort the relation on the partitioning attribute.
       • Construct the partition vector by scanning the relation in sorted order as follows:
           • After every 1/n-th of the relation has been read, the value of the partitioning attribute of the next tuple is added to the partition vector.
           • n denotes the number of partitions to be constructed.
       • Duplicate entries or imbalances can result if duplicates are present in the partitioning attributes.
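The construction of a balanced partition vector described on this slide can be sketched in Python. The function name and sample data are hypothetical, the relation is reduced to a non-empty list of partitioning-attribute values, and an in-memory sort stands in for the scan of a sorted relation.

```python
def balanced_partition_vector(values, n_partitions):
    """Sort the values, then take the value of the next tuple after
    every 1/n-th of the relation as a split point."""
    ordered = sorted(values)
    size = len(ordered)
    vector = []
    for k in range(1, n_partitions):
        cut = (k * size) // n_partitions  # index of the tuple after k/n of the data
        vector.append(ordered[cut])
    return vector


# Evenly spread keys give evenly spaced split points.
print(balanced_partition_vector(range(1, 13), 4))

# Heavy duplication of one value produces duplicate entries in the vector,
# which is the imbalance the slide warns about.
print(balanced_partition_vector([5] * 9 + [1, 2, 3], 4))
```

In the second call, nine of the twelve tuples share the value 5, so every split point is 5 and no choice of vector can spread those tuples over several partitions.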
  • 19. Barriers to Linear Speedup & Scale-up
       • Amdahl's Law
       • Startup
           • Time needed to start a large number of processors.
           • Increases with the number of individual processors.
           • May also include time spent opening files, etc.
       • Interference
           • The slowdown that each processor imposes on all others when sharing a common pool of resources (e.g. memory).
       • Skew
           • Variance dominating the mean.
           • The service time of a job is the service time of its slowest component.
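The slide names Amdahl's Law without stating it. In its usual form, if a fraction p of the work can be parallelized over n processors, the speed-up is 1 / ((1 - p) + p / n). The Python sketch below uses that standard formula; the function name and the sample fractions are chosen for illustration and do not come from the lecture.

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Speed-up when only parallel_fraction of the work benefits from n processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


# With 90% of the work parallelizable, adding processors quickly stops helping:
# the speed-up can never exceed 1 / 0.1 = 10.
for n in (2, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

The serial fraction therefore acts as a fixed barrier to linear speed-up, on top of the startup, interference and skew costs listed on the slide.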
  • 20. Comparison of Partitioning Techniques
       • Shared disk/memory is less sensitive to partitioning. Shared nothing can benefit from good partitioning.
       • Range: good for equijoins, range queries and group-by clauses; can result in "hot spots".
       • Round Robin: good for load balancing, but takes no account of the nature of the queries.
       • Hash: good for equijoins; can result in uneven data distribution.
       • [Figure: for each technique, the Users table spread over five partitions labelled A…E, F…J, K…N, O…S, T…Z]
  • 21. Parallel Aggregates
       • For each aggregate function, a decomposition is needed:
           • Count(S) = count(s1) + count(s2) + …
           • Average(S) = (sum(s1) + sum(s2) + …) / (count(s1) + count(s2) + …)
       • For groups:
           • Distribute data using hashing.
           • Sub-aggregate groups close to the source.
           • Pass each sub-aggregate to its group's site.
       • [Figure: data spread over five partitions labelled A…E, F…J, K…N, O…S, T…Z]
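A decomposition for count and average can be sketched in Python as below. Each partition returns a small partial result (its count and its sum), and the coordinator combines them. Averages of partitions cannot simply be added or averaged, because partitions may differ in size, so the sketch carries sums and counts instead. The function names and the three sample partitions are assumptions for illustration.

```python
def local_aggregate(partition):
    """Per-partition step: return (count, sum) for the local values."""
    return len(partition), sum(partition)


def combine(partials):
    """Coordinator step: total count, and average as total sum / total count."""
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_count, total_sum / total_count


partitions = [[10, 20], [30], [40, 50, 60]]  # hypothetical unevenly sized partitions
partials = [local_aggregate(p) for p in partitions]
count, average = combine(partials)
print(count, average)

# For comparison: averaging the per-partition averages gives a different, wrong value.
naive = sum(s / c for c, s in partials) / len(partials)
print(naive)
```

Only two numbers per partition travel to the coordinator, which keeps the sub-aggregation close to the source as the slide recommends.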
  • 22. When to use which partitioning technique?
       • When to use Range Partitioning?
       • When to use Hash Partitioning?
       • When to use List Partitioning?
       • When to use Round-Robin Partitioning?
  • 23. Parallelism Goals and Metrics
       • Speedup: The Good, The Bad & The Ugly
           • Speedup = OldTime / NewTime
           • [Figure: the ideal speedup curve, linear in processors & discs]
           • [Figure: a bad speedup curve, non-linear, labelled "Min Parallelism Benefit"]
           • [Figure: a bad speedup curve shaped by three factors: startup, interference, skew]
       • Scale-up:
           • Transactional scale-up: fit for OLTP systems.
           • Batch scale-up: fit for data warehouse and OLAP.
