DECOMPOSITION
TECHNIQUES
By: Mohamed Ramadan
Agenda
1. Decomposition Techniques
   1. Data Decomposition Introduction
      1. Partition Types
      2. Examples on Partitioning Types
   2. Exploratory Decomposition
      1. The 15-puzzle problem
      2. Parallel vs serial
   3. Speculative Decomposition
   4. Hybrid Decompositions
2. Characteristics of Tasks and Interactions
1. Data Decomposition Introduction
Idea: partitioning the data leads to tasks.
1. A powerful and commonly used method for deriving concurrency in algorithms that
operate on large data structures.
2. Decomposition of computations is done in two steps:
   1. The data on which the computations are performed is partitioned.
   2. This data partitioning is used to induce a partitioning of the computations into
   tasks.
3. The operations that these tasks perform on different data partitions are usually
similar.
1.1 Partition Types
The partitioning of data can be performed in many possible ways, and the choice
critically impacts the performance of a parallel algorithm.
1. Output Data Partitioning
2. Input Data Partitioning
3. Partitioning Input and Output Data
4. Intermediate Data Partitioning
1.2.1 Output Data Partitioning
1. Often, each element of the output can be computed independently of the others,
simply as a function of the input.
2. A partition of the output across tasks decomposes the problem naturally.
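Consider the problem of multiplying two n x n matrices A and B to yield matrix C. The
output matrix C can be partitioned into four submatrices, each computed by one task:
Task 1: C_{1,1} = A_{1,1} B_{1,1} + A_{1,2} B_{2,1}
Task 2: C_{1,2} = A_{1,1} B_{1,2} + A_{1,2} B_{2,2}
Task 3: C_{2,1} = A_{2,1} B_{1,1} + A_{2,2} B_{2,1}
Task 4: C_{2,2} = A_{2,1} B_{1,2} + A_{2,2} B_{2,2}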
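The four-task output partitioning of the matrix product can be sketched in Python as follows. This is a minimal illustration, not the deck's own code: the helper names (`matmul`, `matadd`, `block`, `output_block_task`) are hypothetical, n is assumed to be even, and block indices are 0-based. A thread pool is used only to show the task structure; it is not meant as a performance claim.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul(X, Y):
    """Plain triple-loop product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def block(M, bi, bj, h):
    """Return block (bi, bj) of M, where every block is h x h."""
    return [row[bj * h:(bj + 1) * h] for row in M[bi * h:(bi + 1) * h]]

def output_block_task(A, B, i, j, h):
    """One task: C[i][j] = A[i][0] * B[0][j] + A[i][1] * B[1][j]."""
    return matadd(matmul(block(A, i, 0, h), block(B, 0, j, h)),
                  matmul(block(A, i, 1, h), block(B, 1, j, h)))

def parallel_matmul_output_partition(A, B):
    """Multiply two n x n matrices (n even) with one task per output block."""
    h = len(A) // 2
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = {(i, j): pool.submit(output_block_task, A, B, i, j, h)
                   for i in range(2) for j in range(2)}
        blocks = {ij: f.result() for ij, f in futures.items()}
    # Stitch the four output blocks back into one n x n matrix.
    top = [r0 + r1 for r0, r1 in zip(blocks[0, 0], blocks[0, 1])]
    bottom = [r0 + r1 for r0, r1 in zip(blocks[1, 0], blocks[1, 1])]
    return top + bottom
```

Each task writes only the block of C it owns and reads A and B without modifying them, so the four tasks need no synchronization with one another.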
A partitioning of output data does not result in a unique decomposition into tasks. For
instance, for the same matrix multiplication problem, with identical output data
distribution, we can derive two other decompositions into tasks.
Consider the problem of counting the instances of given itemsets in a database of
transactions. In this case, the output (itemset frequencies) can be partitioned across
tasks.
From the previous example, the following observations can be made:
• If the database of transactions is replicated across the processes, each task can be
independently accomplished with no communication.
• If the database is partitioned across processes as well (for reasons of memory
utilization), each task first computes partial counts. These counts are then aggregated
at the appropriate task.
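The replicated-database case of the itemset-counting example can be sketched as below. This is an illustrative sketch under assumptions: transactions and itemsets are modelled as Python `frozenset` objects, and the function names are hypothetical. Each task owns a disjoint slice of the output (the itemsets) and reads the whole database.

```python
from concurrent.futures import ThreadPoolExecutor

def count_itemsets(transactions, itemsets):
    """For each itemset, count the transactions that contain all of its items."""
    return {s: sum(1 for t in transactions if s <= t) for s in itemsets}

def count_by_output_partition(transactions, itemsets, num_tasks=2):
    """Each task owns a slice of the itemsets and scans the full database."""
    slices = [itemsets[i::num_tasks] for i in range(num_tasks)]
    result = {}
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        partials = pool.map(lambda s: count_itemsets(transactions, s), slices)
        for partial in partials:
            # The slices are disjoint, so results are simply collected;
            # no task needs another task's counts.
            result.update(partial)
    return result
```

Because every frequency is computed entirely by the task that owns it, no communication between tasks is needed, which matches the first observation above.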
1.2.2 Input Data Partitioning
• Divide input into groups
• One task per group
• Get intermediate results
• Create one task to combine intermediate results
• Can partition the input only, or partition both the input and the output
Figure (not reproduced): top, partitioning the input; bottom, partitioning both the input and the output
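The input-partitioning steps listed above (divide the input, one task per group, intermediate results, one combining task) can be sketched for the same itemset-counting problem. The data model (`frozenset` transactions and itemsets) and the function names are assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partial_counts(chunk, itemsets):
    """One task: count every itemset, but only within its own chunk of input."""
    return Counter({s: sum(1 for t in chunk if s <= t) for s in itemsets})

def count_by_input_partition(transactions, itemsets, num_tasks=2):
    """Split the transactions into groups, count per group, then combine."""
    size = max(1, -(-len(transactions) // num_tasks))  # ceiling division
    chunks = [transactions[i:i + size]
              for i in range(0, len(transactions), size)]
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        partials = list(pool.map(lambda c: partial_counts(c, itemsets), chunks))
    # Combining task: add the intermediate results together.
    total = Counter()
    for p in partials:
        total.update(p)
    return dict(total)
```

Unlike the output-partitioned version, no single task can produce a final frequency on its own, which is why the extra combining step is required.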
1.2.3 Partitioning of Intermediate Data
• Computation can often be viewed as a sequence of transformations from the input to
the output data.
• Good for multi-stage algorithms, in which the output of one stage is the input to the
subsequent stage.
• In the previous matrix multiplication example, output partitioning gives a maximum
degree of concurrency of four.
• We can increase the degree of concurrency by introducing an intermediate stage in
which eight tasks compute their respective product submatrices and store the results in a temporary
three-dimensional matrix D. The submatrix D_{k,i,j} is the product of A_{i,k} and B_{k,j}:
• D_{k,i,j} = A_{i,k} * B_{k,j}
• A second stage of four tasks then sums the products: C_{i,j} = D_{1,i,j} + D_{2,i,j}
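The two-stage scheme with the intermediate matrix D can be sketched as follows. This is a minimal sketch under assumptions: the helper names are hypothetical, n is even, and indices are 0-based in the code while the slides use 1-based subscripts.

```python
from concurrent.futures import ThreadPoolExecutor

def matmul(X, Y):
    """Plain triple-loop product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def matadd(X, Y):
    """Element-wise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def block(M, bi, bj, h):
    """Return block (bi, bj) of M, where every block is h x h."""
    return [row[bj * h:(bj + 1) * h] for row in M[bi * h:(bi + 1) * h]]

def matmul_intermediate_partition(A, B):
    """Multiply two n x n matrices (n even) via the intermediate data D."""
    h = len(A) // 2
    with ThreadPoolExecutor(max_workers=8) as pool:
        # Stage 1: eight independent tasks, D[k, i, j] = A[i][k] * B[k][j].
        stage1 = {(k, i, j): pool.submit(matmul, block(A, i, k, h),
                                         block(B, k, j, h))
                  for k in range(2) for i in range(2) for j in range(2)}
        D = {key: f.result() for key, f in stage1.items()}
        # Stage 2: four independent tasks, C[i, j] = D[0, i, j] + D[1, i, j].
        stage2 = {(i, j): pool.submit(matadd, D[0, i, j], D[1, i, j])
                  for i in range(2) for j in range(2)}
        C = {ij: f.result() for ij, f in stage2.items()}
    # Stitch the four output blocks back into one n x n matrix.
    return ([r0 + r1 for r0, r1 in zip(C[0, 0], C[0, 1])] +
            [r0 + r1 for r0, r1 in zip(C[1, 0], C[1, 1])])
```

The dictionary `D` holds all eight product blocks at once, which is the extra storage this decomposition trades for its higher concurrency.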
Concurrency Picture
• Max concurrency of 8 with the intermediate stage, versus 4 for the output partition
• 12 tasks in total: 8 multiplication tasks followed by 4 addition tasks
• Price is storage for D
The Owner-Computes Rule
• A decomposition based on partitioning output or input data is also widely
referred to as the owner-computes rule.
• The idea behind this rule is that each partition performs all the computations involving data that it
owns. Depending on the nature of the data or the type of data partitioning, the owner-computes rule
may mean different things.
• For instance, when we assign partitions of the input data to tasks, the owner-computes rule means
that a task performs all the computations that can be done using these data.
• On the other hand, if we partition the output data, the owner-computes rule means that a task
computes all the data in the partition assigned to it.
2. Exploratory Decomposition
• For search-space problems
• Partition the search space into smaller parts
• Look for a solution in each part concurrently
• Includes a variety of discrete optimization problems
2.1 Exploratory Decomposition
• Search-space problem: the 15-puzzle
• The state space can be explored by generating the successor states of the current state and viewing
them as independent tasks.
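The idea of turning successor states into independent tasks can be sketched for the 15-puzzle as below. This is a hedged illustration: the state encoding (a 16-tuple with 0 for the blank), the depth bound, and all function names are assumptions. For simplicity every task runs to completion, whereas a real exploratory search would usually stop the remaining tasks once one of them finds a solution.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

GOAL = tuple(range(1, 16)) + (0,)   # 0 marks the blank tile

def successors(state):
    """Return all states reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 4)
    result = []
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 4 and 0 <= c < 4:
            target = r * 4 + c
            nxt = list(state)
            nxt[blank], nxt[target] = nxt[target], nxt[blank]
            result.append(tuple(nxt))
    return result

def bounded_search(start, max_depth):
    """Breadth-first search from start; number of moves to GOAL, or None."""
    frontier = deque([(start, 0)])
    seen = {start}
    while frontier:
        state, depth = frontier.popleft()
        if state == GOAL:
            return depth
        if depth < max_depth:
            for nxt in successors(state):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
    return None

def exploratory_search(start, max_depth=10):
    """Treat each successor of start as the root of an independent task."""
    if start == GOAL:
        return 0
    roots = successors(start)
    with ThreadPoolExecutor(max_workers=len(roots)) as pool:
        depths = list(pool.map(lambda s: bounded_search(s, max_depth - 1), roots))
    found = [d + 1 for d in depths if d is not None]
    return min(found) if found else None
```

Each task keeps its own `seen` set, so different tasks may revisit the same states; that redundancy is one source of the extra work a parallel formulation can incur.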
2.2 Parallel vs serial
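Whether the parallel formulation does more or less work depends on where the solution lies.
• The work performed by the parallel formulation can be either smaller or greater than that
performed by the serial algorithm. For example, consider a search space that has been
partitioned into four concurrent tasks. If the solution lies at the start of a later partition, the
parallel search finds it after far less total work than the serial search; if it lies at the end of the
first partition, the parallel search does more total work.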
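The comparison can be made concrete with a small work model. This is a sketch under stated assumptions, not a measurement: the search space is split into equal partitions of m nodes, the serial search scans the partitions in order, and the parallel tasks advance in lockstep and all stop as soon as one of them finds the solution. The function names and the value m = 100 are hypothetical.

```python
def serial_work(m, partition, offset):
    """Nodes visited by a serial search that scans partitions in order.

    partition and offset are 0-based: the solution is node `offset`
    of partition `partition`, and every partition holds m nodes.
    """
    return partition * m + offset + 1

def parallel_work(num_tasks, offset):
    """Total nodes visited by num_tasks lockstep tasks, one per partition."""
    return num_tasks * (offset + 1)

m, tasks = 100, 4

# Case (a): the solution is the first node of the third partition.
case_a = (serial_work(m, partition=2, offset=0), parallel_work(tasks, offset=0))

# Case (b): the solution is the last node of the first partition.
case_b = (serial_work(m, partition=0, offset=m - 1),
          parallel_work(tasks, offset=m - 1))

print(case_a)  # (201, 4): the parallel formulation does far less work
print(case_b)  # (100, 400): the parallel formulation does four times the work
```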
3. Speculative Decomposition
• Takes a path before the result that selects it is known
• Either wins big or wastes the work
• While one task is performing the computation whose output is used in deciding the next computation,
other tasks can concurrently start the computations of the next stage. This scenario is similar to
evaluating one or more of the branches of a switch statement in C in parallel before the input for the
switch is available.
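The switch analogy can be sketched in Python by starting every branch while the selecting computation is still running, then keeping only the branch that was actually chosen. The function `speculative_switch` and its parameters are hypothetical names introduced for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def speculative_switch(selector, branches, x):
    """Run `selector` and every branch concurrently, then keep one result.

    selector(x) returns a key of `branches`; each branch is a function of x.
    """
    with ThreadPoolExecutor(max_workers=len(branches) + 1) as pool:
        chosen = pool.submit(selector, x)
        # Speculation: start all branches before the selector has finished.
        speculative = {key: pool.submit(fn, x) for key, fn in branches.items()}
        key = chosen.result()
        result = speculative[key].result()
        for other, future in speculative.items():
            if other != key:
                # cancel() only helps if the branch has not started yet;
                # a branch that already ran is simply wasted work.
                future.cancel()
    return result
```

A usage example: `speculative_switch(lambda v: "even" if v % 2 == 0 else "odd", {"even": lambda v: v // 2, "odd": lambda v: 3 * v + 1}, 10)` evaluates both branches but returns only the result of the "even" branch. The work spent on the discarded branch is the cost of speculating.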
Example:
Parallel discrete event simulation
Idea: compute results at c, d, e before the output from a is known
4. Hybrid Decompositions
• Sometimes it is better to combine two decomposition techniques
• Quicksort: recursion alone results in O(n) tasks, but little concurrency.
• First decompose the data, then recurse
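One possible reading of "first decompose, then recurse" is sketched below: the input is first split into chunks (data decomposition), each chunk is sorted by a recursive quicksort in its own task, and the sorted chunks are merged. The final merge step and the names `quicksort` and `hybrid_sort` are assumptions added for illustration and are not taken from the slides.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def quicksort(items):
    """Recursive quicksort that returns a new sorted list."""
    if len(items) <= 1:
        return list(items)
    pivot = items[len(items) // 2]
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

def hybrid_sort(items, num_tasks=4):
    """Data decomposition into chunks, then recursive sorting per chunk."""
    size = max(1, -(-len(items) // num_tasks))  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        sorted_chunks = list(pool.map(quicksort, chunks))
    # Combine the independently sorted chunks into one sorted list.
    return list(heapq.merge(*sorted_chunks))
```

The data decomposition gives `num_tasks` independent tasks from the very first step, whereas pure recursive quicksort starts with a single partitioning step that nothing else can overlap with.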
2. Characteristics of Tasks and Interactions
• Once a problem has been decomposed into independent tasks, the characteristics of these tasks
critically impact the choice and performance of parallel algorithms. Relevant task characteristics include:
• Task Generation
• Task Sizes
• Knowledge of Task Sizes
• Size of Data Associated with Tasks
