Search Algorithms for Discrete Optimization
Problems
Ananth Grama, Anshul Gupta, George Karypis, and Vipin Kumar
To accompany the text ``Introduction to Parallel Computing'',
Addison Wesley, 2003.
Topic Overview
• Discrete Optimization - Basics
• Sequential Search Algorithms
• Parallel Depth-First Search
• Parallel Best-First Search
• Speedup Anomalies in Parallel Search Algorithms
Discrete Optimization - Basics
• Discrete optimization forms a class of computationally
expensive problems of significant theoretical and
practical interest.
• Search algorithms systematically search the space of
possible solutions subject to constraints.
Definitions
• A discrete optimization problem can be expressed as a
tuple (S, f). The set S is a finite or countably infinite set of
all solutions that satisfy specified constraints.
• The function f is the cost function that maps each
element in set S onto the set of real numbers R.
• The objective of a DOP is to find a feasible solution xopt,
such that f(xopt) ≤ f(x) for all x ∈ S.
• A number of diverse problems such as VLSI layouts,
robot motion planning, test pattern generation, and
facility location can be formulated as DOPs.
Discrete Optimization: Example
• In the 0/1 integer-linear-programming problem, we are given
an m×n matrix A, an m×1 vector b, and an n×1 vector c.
• The objective is to determine an n×1 vector x whose
elements can take on only the value 0 or 1.
• The vector must satisfy the constraint Ax ≥ b, and the
function f(x) = c^T x must be minimized.
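As a concrete illustration of this formulation, the following brute-force sketch enumerates all 2^n candidate 0/1 vectors for a tiny made-up instance and keeps the feasible one of least cost. The matrix and vectors are toy data invented for the example; real instances are far too large for enumeration, which is why the search techniques in the rest of these slides matter.

```python
from itertools import product

# Toy 0/1 integer-linear-programming instance (made-up data for illustration).
# Constraint: A x >= b (componentwise); objective: minimize f(x) = c^T x.
A = [[1, 1, 0],
     [0, 1, 1]]          # m x n constraint matrix (m = 2, n = 3)
b = [1, 1]               # m x 1 right-hand side
c = [2, 3, 1]            # n x 1 cost vector

def feasible(x):
    """Check A x >= b for a candidate 0/1 vector x."""
    return all(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) >= b_i
               for row, b_i in zip(A, b))

best_x, best_cost = None, float("inf")
for x in product((0, 1), repeat=len(c)):   # enumerate all 2^n candidates
    if feasible(x):
        cost = sum(c_j * x_j for c_j, x_j in zip(c, x))
        if cost < best_cost:
            best_x, best_cost = x, cost

print(best_x, best_cost)   # prints (0, 1, 0) 3 for this toy instance
```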
Discrete Optimization: Example
• The 8-puzzle problem consists of a 3×3 grid containing
eight tiles, numbered one through eight.
• One of the grid segments (called the ``blank'') is empty. A
tile can be moved into the blank position from a position
adjacent to it, thus creating a blank in the tile's original
position.
• The goal is to move from a given initial position to the final
position in a minimum number of moves.
Discrete Optimization: Example
An 8-puzzle problem instance: (a) initial configuration; (b) final
configuration; and (c) a sequence of moves leading from the
initial to the final configuration.
Discrete Optimization Basics
• The feasible space S is typically very large.
• For this reason, a DOP can be reformulated as the problem
of finding a minimum-cost path in a graph from a designated
initial node to one of several possible goal nodes.
• Each element x in S can be viewed as a path from the initial
node to one of the goal nodes.
• This graph is called a state space.
Discrete Optimization Basics
• Often, it is possible to estimate the cost to reach the goal
state from an intermediate state.
• This estimate, called a heuristic estimate, can be effective in
guiding search to the solution.
• If the estimate is guaranteed to be an underestimate, the
heuristic is called an admissible heuristic.
• Admissible heuristics have desirable properties in terms of
optimality of solution (as we shall see later).
Discrete Optimization: Example
An admissible heuristic for 8-puzzle is as follows:
• Assume that each position in the 8-puzzle grid is
represented as a pair (i, j) of row and column indices.
• The distance between positions (i,j) and (k,l) is defined as
|i - k| + |j - l|. This distance is called the Manhattan distance.
• The sum of the Manhattan distances between the initial and
final positions of all tiles is an admissible heuristic.
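A minimal sketch of this heuristic, assuming a state is stored as a tuple of nine entries in row-major order with 0 standing for the blank (a representation chosen for the example, not prescribed by the text):

```python
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)   # assumed goal layout, blank (0) last

def manhattan(state, goal=GOAL):
    """Sum of Manhattan distances of tiles 1..8 from their goal positions.

    The blank is not counted, so the estimate never exceeds the true number
    of moves, which is what makes it admissible.
    """
    dist = 0
    for tile in range(1, 9):                      # ignore the blank (0)
        i, j = divmod(state.index(tile), 3)       # current (row, col)
        k, l = divmod(goal.index(tile), 3)        # goal (row, col)
        dist += abs(i - k) + abs(j - l)
    return dist

# Example: tile 8 is one move away from its goal position.
print(manhattan((1, 2, 3, 4, 5, 6, 7, 0, 8)))     # -> 1
```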
Parallel Discrete Optimization: Motivation
• DOPs are generally NP-hard problems. Does parallelism
really help much?
• For many problems, the average-case runtime is polynomial.
• Often, we can find suboptimal solutions in polynomial time.
• Many problems have smaller state spaces but require real-
time solutions.
• For some other problems, an improvement in objective
function is highly desirable, irrespective of time.
Sequential Search Algorithms
• Is the search space a tree or a graph?
• The space of a 0/1 integer program is a tree, while that of
an 8-puzzle is a graph.
• This has important implications for search since unfolding a
graph into a tree can have significant overheads.
Sequential Search Algorithms
Two examples of unfolding a graph into a tree.
Depth-First Search Algorithms
• Applies to search spaces that are trees.
• DFS begins by expanding the initial node and generating its
successors. In each subsequent step, DFS expands one of
the most recently generated nodes.
• If a node has no successors left to expand, DFS backtracks
to its parent and explores an alternate child.
• Often, successors of a node are ordered based on their
likelihood of reaching a solution. This is called directed DFS.
• The main advantage of DFS is that its storage requirement
is linear in the depth of the state space being searched.
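The sketch below shows the stack-based control structure described above for a generic tree whose successors and goal test are supplied by the caller; those callback names are placeholders for the example, not part of the text.

```python
def dfs(root, successors, is_goal):
    """Depth-first search on a tree.

    `successors(node)` returns the children of `node`; `is_goal(node)` tests
    for a solution.  The stack holds untried alternatives, so memory grows
    only with the depth of the tree (times the branching factor).
    """
    stack = [root]
    while stack:
        node = stack.pop()            # expand the most recently generated node
        if is_goal(node):
            return node
        # Push children in reverse so the first child is explored first.
        stack.extend(reversed(successors(node)))
    return None                       # exhausted the tree without a solution

# Tiny illustrative tree: nodes are integers, children are 2n+1 and 2n+2,
# capped at a small depth; the "goal" is the value 5.
children = lambda n: [2 * n + 1, 2 * n + 2] if n < 7 else []
print(dfs(0, children, lambda n: n == 5))    # -> 5
```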
Depth-First Search Algorithms
States resulting from the first three steps of depth-first
search applied to an instance of the 8-puzzle.
DFS Algorithms: Simple Backtracking
• Simple backtracking performs DFS until it finds the first
feasible solution and terminates.
• Not guaranteed to find a minimum-cost solution.
• Uses no heuristic information to order the successors of
an expanded node.
• Ordered backtracking uses heuristics to order the
successors of an expanded node.
Depth-First Branch-and-Bound (DFBB)
• DFS technique in which upon finding a solution, the
algorithm updates current best solution.
• DFBB does not explore paths that are guaranteed to
lead to solutions worse than current best solution.
• On termination, the current best solution is a globally
optimal solution.
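A compact sketch of depth-first branch-and-bound over the same kind of tree, assuming the caller supplies a solution cost and a lower bound on the best completion reachable below a node; these function names are illustrative only.

```python
def dfbb(root, successors, is_goal, cost, lower_bound):
    """Depth-first branch-and-bound.

    `cost(node)` is the cost of the solution at a goal node; `lower_bound(node)`
    underestimates the best solution cost reachable below `node`.  Subtrees
    whose bound is no better than the current best solution are pruned.
    """
    best, best_cost = None, float("inf")
    stack = [root]
    while stack:
        node = stack.pop()
        if lower_bound(node) >= best_cost:
            continue                             # prune: cannot improve on best
        if is_goal(node):
            best, best_cost = node, cost(node)   # update current best solution
            continue
        stack.extend(successors(node))
    return best, best_cost
```

Provided `lower_bound` never overestimates, the value returned on termination is globally optimal; with a trivial bound of zero the code degenerates to exhaustive DFS over all solutions.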
Iterative Deepening Search
• Often, the solution may exist close to the root, but on an
alternate branch.
• Simple backtracking might explore a large space before
finding this.
• Iterative deepening sets a depth bound on the space it
searches (using DFS).
• If no solution is found, the bound is increased and the
process repeated.
Iterative Deepening A* (IDA*)
• Uses a bound on the cost of the path as opposed to the
depth.
• IDA* defines a function for node x in the search space
as l(x) = g(x) + h(x). Here, g(x) is the cost of getting to the
node and h(x) is a heuristic estimate of the cost of getting
from the node to the solution.
• At each failed step, the cost bound is incremented to that
of the node that exceeded the prior cost bound by the
least amount.
• If the heuristic h is admissible, the solution found by IDA*
is optimal.
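A sketch of the IDA* control loop under the same conventions as the DFS sketch above (caller-supplied successor, goal, edge-cost, and heuristic functions; none of these names come from the text):

```python
def ida_star(root, successors, is_goal, edge_cost, h):
    """Iterative deepening A*: repeated cost-bounded depth-first searches.

    l(x) = g(x) + h(x).  A path is cut off as soon as l(x) exceeds the current
    bound; the next bound is the smallest l-value that exceeded the old one.
    """
    def bounded_dfs(node, g, bound):
        l = g + h(node)
        if l > bound:
            return None, l                      # report the l-value that overflowed
        if is_goal(node):
            return node, l
        next_bound = float("inf")
        for child in successors(node):
            found, t = bounded_dfs(child, g + edge_cost(node, child), bound)
            if found is not None:
                return found, t
            next_bound = min(next_bound, t)     # least l-value above the bound
        return None, next_bound

    bound = h(root)
    while True:
        found, bound = bounded_dfs(root, 0, bound)
        if found is not None:
            return found
        if bound == float("inf"):
            return None                         # the whole space was exhausted
```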
DFS Storage Requirements and Data Structures
• At each step of DFS, untried alternatives must be stored
for backtracking.
• If m is the amount of storage required to store a state, and
d is the maximum depth, then the total space requirement
of the DFS algorithm is O(md).
• The state-space tree searched by parallel DFS can be
efficiently represented as a stack.
• Memory requirement of the stack is linear in depth of tree.
DFS Storage Requirements and Data Structures
Representing a DFS tree: (a) the DFS tree; Successor nodes shown
with dashed lines have already been explored; (b) the stack storing
untried alternatives only; and (c) the stack storing untried
alternatives along with their parent. The shaded blocks represent the
parent state and the block to the right represents successor states
that have not been explored.
Best-First Search (BFS) Algorithms
• BFS algorithms use a heuristic to guide search.
• The core data structure is a list, called Open list, that stores
unexplored nodes sorted on their heuristic estimates.
• The best node is selected from the list, expanded, and its
offspring are inserted at the right position.
• If the heuristic is admissible, the BFS finds the optimal
solution.
Best-First Search (BFS) Algorithms
• BFS of graphs must be slightly modified to account for
multiple paths to the same node.
• A closed list stores all the nodes that have been previously
seen.
• If a newly expanded node exists in the open or closed lists
with better heuristic value, the node is not inserted into the
open list.
The A* Algorithm
• A BFS technique that uses admissible heuristics.
• Defines function l(x) for each node x as g(x) + h(x).
• Here, g(x) is the cost of getting to node x and h(x) is an
admissible heuristic estimate of getting from node x to
the solution.
• The open list is sorted on l(x).
The space requirement of BFS is exponential in depth!
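A minimal A* sketch using Python's heapq as the open list and a dictionary as the closed list; it assumes hashable states and caller-supplied successor, edge-cost, and heuristic functions (illustrative names only):

```python
import heapq
from itertools import count

def a_star(start, successors, is_goal, edge_cost, h):
    """A* search: the open list is a heap ordered on l(x) = g(x) + h(x)."""
    tie = count()                       # tie-breaker so the heap never compares states
    open_list = [(h(start), next(tie), 0, start, None)]
    closed = {}                         # best g-value at which each state was expanded
    parents = {}
    while open_list:
        _, _, g, state, parent = heapq.heappop(open_list)
        if state in closed and closed[state] <= g:
            continue                    # already expanded via a path at least as cheap
        closed[state] = g
        parents[state] = parent
        if is_goal(state):
            path = [state]              # walk the parent links back to the start
            while parents[path[-1]] is not None:
                path.append(parents[path[-1]])
            return list(reversed(path))
        for child in successors(state):
            g2 = g + edge_cost(state, child)
            if child not in closed or g2 < closed[child]:
                heapq.heappush(open_list, (g2 + h(child), next(tie), g2, child, state))
    return None
```

Plugged together with the Manhattan-distance heuristic sketched earlier and an 8-puzzle successor function (not shown), this would return a minimum-length move sequence.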
Best-First Search: Example
Applying best-first search to the 8-puzzle: (a) initial configuration; (b)
final configuration; and (c) states resulting from the first four steps of
best-first search. Each state is labeled with its h-value (that is, the
Manhattan distance from the state to the final state).
Search Overhead Factor
• The amount of work done by serial and parallel formulations
of search algorithms is often different.
• Let W be serial work and WP be parallel work. Search
overhead factor s is defined as WP/W.
• Upper bound on speedup is p×(W/WP).
Parallel Depth-First Search
• How is the search space partitioned across processors?
• Different subtrees can be searched concurrently.
• However, subtrees can be very different in size.
• It is difficult to estimate the size of a subtree rooted at a
node.
• Dynamic load balancing is required.
Parallel Depth-First Search
The unstructured nature of tree search and the imbalance
resulting from static partitioning.
Parallel Depth-First Search: Dynamic Load Balancing
• When a processor runs out of work, it gets more work
from another processor.
• This is done using work requests and responses in
message passing machines and locking and extracting
work in shared address space machines.
• When a processor reaches a final (goal) state, all processors
terminate.
• Unexplored states can be conveniently stored as local
stacks at processors.
• The entire space is assigned to one processor to begin
with.
Parallel Depth-First Search: Dynamic Load Balancing
A generic scheme for dynamic load balancing.
Parameters in Parallel DFS: Work Splitting
• Work is split by splitting the stack into two.
• Ideally, we do not want either of the split pieces to be small.
• Select nodes near the bottom of the stack (node splitting), or
• Select some nodes from each level (stack splitting).
• The second strategy generally yields a more even split of
the space.
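One plausible realization of stack splitting (an illustration, not the book's exact procedure): keep the local DFS stack as a list of levels, each holding the untried alternatives at that depth, and hand roughly half of every level to the requester.

```python
def split_stack(stack):
    """Split a DFS stack in response to a work request.

    `stack` is a list of levels, each level a list of untried sibling nodes.
    Roughly half of the untried nodes at every level are handed over, so both
    halves get nodes near the root (large subtrees) as well as near the leaves.
    """
    donated = []
    for level in stack:
        keep = (len(level) + 1) // 2       # donor keeps the first ceil(n/2) nodes
        donated.append(level[keep:])
        del level[keep:]
    return donated

# Example: three levels with 4, 3 and 1 untried alternatives.
local = [[1, 2, 3, 4], ['a', 'b', 'c'], ['x']]
remote = split_stack(local)
print(local)     # [[1, 2], ['a', 'b'], ['x']]
print(remote)    # [[3, 4], ['c'], []]
```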
Parameters in Parallel DFS: Work Splitting
Splitting the DFS tree: the two subtrees along with their
stack representations are shown in (a) and (b).
Load-Balancing Schemes
• Who do you request work from? Note that we would like
to distribute work requests evenly, in a global sense.
• Asynchronous round robin: Each processor maintains a
counter and makes requests in a round-robin fashion.
• Global round robin: The system maintains a global
counter and requests are made in a round-robin fashion,
globally.
• Random polling: Request a randomly selected processor
for work.
Analyzing DFS
• We cannot analytically compute the serial work W or the parallel
time. Instead, we quantify the total overhead To in terms of W to
analyze scalability.
• For dynamic load balancing, idling time is subsumed by
communication.
• We must quantify the total number of requests in the system.
Analyzing DFS: Assumptions
• Work at any processor can be partitioned into independent
pieces as long as its size exceeds a threshold ε.
• A reasonable work-splitting mechanism is available.
• If work w at a processor is split into two parts ψw and (1 -
ψ)w, there exists an arbitrarily small constant α (0 < α ≤ 0.5),
such that ψw > αw and (1 - ψ)w > αw.
• The constant α sets a lower bound on the size of either piece
produced by a work split, and hence bounds the load imbalance
due to splitting.
Analyzing DFS
• If processor Pi initially had work wi, then after a single request by
processor Pj and the resulting split, neither Pi nor Pj has more than
(1 - α)wi work.
• For each load-balancing strategy, we define V(p) as the total
number of work requests after which each processor has received at
least one work request (note that V(p) ≥ p).
• Assume that the largest piece of work at any point is W.
• After V(p) requests, the maximum work remaining at any
processor is less than (1 - α)W; after 2V(p) requests, it is less
than (1 - α)^2 W.
• After (log_{1/(1-α)}(W/ε)) V(p) requests, the maximum work remaining
at any processor is below the threshold ε.
• The total number of work requests is O(V(p) log W).
Analyzing DFS
• If tcomm is the time required to communicate a piece of
work, then the communication overhead is given by
To = tcomm V(p) log W.  (1)
The corresponding efficiency E is given by
E = W / (W + To) = 1 / (1 + tcomm V(p) log W / W).
Analyzing DFS: V(p) for Various Schemes
• Asynchronous Round Robin: V(p) = O(p^2) in the worst
case.
• Global Round Robin: V(p) = p.
• Random Polling: Worst case V(p) is unbounded. We do
average case analysis.
V(p) for Random Polling
• Let F(i,p) represent a state in which i of the processors
have been requested, and p - i have not.
• Let f(i,p) denote the average number of trials needed to
change from state F(i,p) to F(p,p) (V(p) = f(0,p)).
V(p) for Random Polling
• We have:
f(0,p) = Σ_{i=0}^{p-1} p/(p - i) = p Σ_{i=1}^{p} 1/i = p Hp,
where Hp is the p-th harmonic number.
• As p becomes large, Hp grows as ln p (the natural logarithm
of p). Thus, V(p) = O(p log p).
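As a quick sanity check on this result (a simulation written for these notes, not taken from the text), the sketch below estimates the number of uniform random polls needed until every processor has been polled at least once and compares it with p·Hp:

```python
import random

def polls_until_all_hit(p, trials=200):
    """Average number of uniform random polls until all p processors are hit."""
    total = 0
    for _ in range(trials):
        hit, polls = set(), 0
        while len(hit) < p:
            hit.add(random.randrange(p))   # each poll targets a uniform processor
            polls += 1
        total += polls
    return total / trials

p = 64
harmonic = sum(1.0 / i for i in range(1, p + 1))
print(polls_until_all_hit(p))   # empirical estimate of V(p)
print(p * harmonic)             # analytical p * Hp, roughly p ln p
```

The empirical count should come out close to p·Hp (about 304 for p = 64), consistent with V(p) = O(p log p).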
Analysis of Load-Balancing Schemes
If tcomm = O(1), we have
To = O(V(p) log W).  (2)
• Asynchronous Round Robin: Since V(p) = O(p^2), To =
O(p^2 log W). It follows that:
W = O(p^2 log(p^2 log W))
= O(p^2 log p + p^2 log log W)
= O(p^2 log p).
Analysis of Load-Balancing Schemes
• Global Round Robin: Since V(p) = O(p), To = O(p log W). It
follows that W = O(p log p).
However, there is contention here! The global counter
must be incremented O(p log W) times in O(W/p) time.
From this, we have
W/p = O(p log W),  (3)
and hence W = O(p^2 log p).
The worse of these two expressions, W = O(p^2 log p), is the
isoefficiency.
Analysis of Load-Balancing Schemes
• Random Polling: We have V(p) = O(p log p), so
To = O(p log p log W).
Therefore W = O(p log^2 p).
Analysis of Load-Balancing Schemes: Conclusions
• Asynchronous round robin has poor performance
because it makes a large number of work requests.
• Global round robin has poor performance because of
contention at counter, although it makes the least
number of requests.
• Random polling strikes a desirable compromise.
Experimental Validation: Satisfiability Problem
Speedups of parallel DFS using ARR, GRR and RP load-balancing
schemes.
Experimental Validation: Satisfiability Problem
Number of work requests generated for RP and GRR and their
expected values.
Experimental Validation: Satisfiability Problem
Experimental isoefficiency curves for RP for different efficiencies.
Termination Detection
• How do you know when everyone's done?
• A number of algorithms have been proposed.
Dijkstra's Token Termination Detection
• Assume that all processors are organized in a logical
ring.
• Assume, for now, that work transfers can only happen
from Pi to Pj if j > i.
• Processor P0 initiates a token on the ring when it goes
idle.
• Each intermediate processor receives this token and
forwards it when it becomes idle.
• When the token reaches processor P0, all processors are
done.
Dijkstra's Token Termination Detection
Now, let us do away with the restriction on work transfers.
• When processor P0 goes idle, it colors itself green and initiates a
green token.
• If processor Pj sends work to processor Pi and j > i then
processor Pj becomes red.
• If processor Pi has the token and Pi is idle, it passes the token to
Pi+1. If Pi is red, then the color of the token is set to red before it
is sent to Pi+1. If Pi is green, the token is passed unchanged.
• After Pi passes the token to Pi+1, Pi becomes green.
• The algorithm terminates when processor P0 receives a green
token and is itself idle.
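The event-driven sketch below mimics these coloring rules on a simulated ring; the Processor class and its methods are invented for illustration and are a deliberate simplification of the distributed algorithm (everything runs in one thread, and receiving work does not wake a processor up).

```python
GREEN, RED = "green", "red"

class Processor:
    """One processor on the logical ring used by the token algorithm."""
    def __init__(self, rank):
        self.rank = rank
        self.color = GREEN
        self.idle = False

    def send_work(self, dest):
        # Sending work "backwards" on the ring (to a lower rank) makes the
        # sender red, because that work could outrun a token already passed.
        if dest.rank < self.rank:
            self.color = RED

    def forward_token(self, token_color):
        # An idle processor forwards the token, dirtying it if the processor
        # itself is red, and then turns green again.
        out = RED if self.color == RED else token_color
        self.color = GREEN
        return out

def ring_terminated(procs):
    """P0 initiates a green token; termination iff it comes back green."""
    if not procs[0].idle:
        return False
    token = GREEN
    for proc in procs[1:]:
        if not proc.idle:
            return False                  # the token is held until the processor idles
        token = proc.forward_token(token)
    return token == GREEN

procs = [Processor(r) for r in range(4)]
for proc in procs:
    proc.idle = True
procs[2].send_work(procs[1])              # a backward transfer makes P2 red
print(ring_terminated(procs))             # False: the token arrives red at P0
print(ring_terminated(procs))             # True on the next round (P2 is green again)
```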
Tree-Based Termination Detection
• Associate weights with individual workpieces. Initially,
processor P0 has all the work and a weight of one.
• Whenever work is partitioned, the weight is split in half
and sent along with the work.
• When a processor gets done with its work, it sends its
parent the weight back.
• Termination is signaled when the weight at processor P0
becomes 1 again.
• Note that underflow and finite precision are important
factors associated with this scheme.
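A small sketch of the weight bookkeeping, using exact fractions instead of floating point to sidestep the underflow issue noted above; the class and method names are invented for the example.

```python
from fractions import Fraction

class Worker:
    """Tracks the termination weight held by one processor."""
    def __init__(self, weight=Fraction(0)):
        self.weight = weight

    def split_to(self, other):
        """Send half of the local weight along with half of the work."""
        half = self.weight / 2
        self.weight -= half
        other.weight += half

    def finish(self, parent):
        """Return all local weight to the processor that supplied the work."""
        parent.weight += self.weight
        self.weight = Fraction(0)

p0, p1, p2 = Worker(Fraction(1)), Worker(), Worker()
p0.split_to(p1)        # weights: 1/2, 1/2, 0
p1.split_to(p2)        # weights: 1/2, 1/4, 1/4
p2.finish(p1)          # P2 done: 1/2, 1/2, 0
p1.finish(p0)          # P1 done: 1,   0,   0
print(p0.weight == 1)  # True -> termination detected at P0
```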
Tree-Based Termination Detection
Tree-based termination detection. Steps 1-6 illustrate the weights
at various processors after each work transfer.
Parallel Formulations of Depth-First Branch-and-Bound
• Parallel formulations of depth-first branch-and-bound
search (DFBB) are similar to those of DFS.
• Each processor has a copy of the current best solution.
This is used as a local bound.
• If a processor finds a new solution, it compares its cost with the
current best solution. If the cost is better, it broadcasts this
cost to all processors.
• If a processor's copy of the current best solution is worse than
the global best (for example, because a broadcast has not yet
arrived), only the efficiency of the search is affected, not its
correctness.
Parallel Formulations of IDA*
Two formulations are intuitive.
• Common Cost Bound: Each processor is given the same
cost bound. Processors use parallel DFS on the tree
within the cost bound. The drawback of this scheme is
that there might not be enough concurrency.
• Variable Cost Bound: Each processor works on a
different cost bound. The major drawback here is that a
solution is not guaranteed to be optimal until all lower
cost bounds have been exhausted.
In each case, parallel DFS is the search kernel.
Parallel Best-First Search
• The core data structure is the Open list (typically implemented as
a priority queue).
• Each processor locks this queue, extracts the best node, unlocks
it.
• Successors of the node are generated, their heuristic functions
estimated, and the nodes inserted into the open list as necessary
after appropriate locking.
• Termination is signaled when we find a solution whose cost is
better than the best heuristic value in the open list.
• Since we expand more than one node at a time, we may expand
nodes that would not be expanded by a sequential algorithm.
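A thread-based sketch of this centralized strategy: one shared priority queue protected by a lock, with several workers expanding nodes. Python threads give no real speedup here because of the interpreter lock, so this only illustrates the locking pattern and the serialization bottleneck; the function names and the simplified termination test are this sketch's own.

```python
import heapq
import itertools
import threading

def parallel_best_first(start, successors, is_goal, h, n_threads=4):
    """Centralized parallel best-first search: one locked open list.

    Termination is simplified: workers stop once some worker has expanded a
    goal node, or once the open list is empty and nobody is mid-expansion.
    """
    lock = threading.Lock()
    tie = itertools.count()                     # keeps the heap from comparing states
    open_list = [(h(start), next(tie), start)]
    closed = set()
    solution = []
    busy = [0]                                  # number of in-flight expansions

    def worker():
        while True:
            with lock:                          # serialize all open-list access
                if solution or (not open_list and busy[0] == 0):
                    return
                if not open_list:
                    continue                    # others may still push children
                _, _, state = heapq.heappop(open_list)
                if state in closed:
                    continue
                closed.add(state)
                busy[0] += 1
            if is_goal(state):                  # expansion happens outside the lock
                with lock:
                    solution.append(state)
                    busy[0] -= 1
                return
            children = [(h(c), c) for c in successors(state)]
            with lock:
                for hc, c in children:
                    if c not in closed:
                        heapq.heappush(open_list, (hc, next(tie), c))
                busy[0] -= 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return solution[0] if solution else None
```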
Parallel Best-First Search
A general schematic for parallel best-first search using a
centralized strategy. The locking operation is used here to
serialize queue access by various processors.
Parallel Best-First Search
• The open list is a point of contention.
• Let texp be the average time to expand a single node, and
taccess be the average time to access the open list for a
single-node expansion.
• If there are n nodes to be expanded by both the sequential
and parallel formulations (assuming that they do an equal
amount of work), then the sequential run time is given by
n(taccess + texp).
• The parallel run time will be at least n · taccess.
• The upper bound on the speedup is (taccess + texp)/taccess.
Parallel Best-First Search
• Avoid contention by having multiple open lists.
• Initially, the search space is statically divided across these
open lists.
• Processors concurrently operate on these open lists.
• Since the heuristic values of nodes in these lists may
diverge significantly, we must periodically balance the
quality of nodes in each list.
• A number of balancing strategies based on ring,
blackboard, or random communications are possible.
Parallel Best-First Search
A message-passing implementation of parallel best-first search
using the ring communication strategy.
Parallel Best-First Search
An implementation of parallel best-first search using the
blackboard communication strategy.
Parallel Best-First Graph Search
• Graph search involves a closed list, where the major operation is
a lookup (on a key corresponding to the state).
• The classic data structure for this is a hash table.
• Hashing can be parallelized by using two functions - the first one
hashes each node to a processor, and the second one hashes
within the processor.
• This strategy can be combined with the idea of multiple open
lists.
• If a node does not exist in a closed list, it is inserted into the
open list at the target of the first hash function.
• In addition to facilitating lookup, randomization also equalizes
quality of nodes in various open lists.
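A sketch of the two-level hashing idea for the distributed closed list: Python's built-in hash picks the owning "processor", and the ordinary hashing inside each local set plays the role of the second-level hash. The surrounding class is invented for illustration.

```python
class DistributedClosedList:
    """Closed list spread across p 'processors' by a first-level hash."""
    def __init__(self, p):
        self.p = p
        self.local = [set() for _ in range(p)]     # one closed list per processor

    def home(self, state):
        """First-level hash: which processor owns this state."""
        return hash(state) % self.p

    def check_and_insert(self, state):
        """Return True if the state is new; record it at its home processor.

        The second-level hash is the ordinary hashing done inside the set.
        """
        owner = self.home(state)
        if state in self.local[owner]:
            return False                            # duplicate: do not reopen
        self.local[owner].add(state)
        return True

closed = DistributedClosedList(p=4)
print(closed.check_and_insert((1, 2, 3)))   # True: new state, stored at its home
print(closed.check_and_insert((1, 2, 3)))   # False: duplicate detected
```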
Speedup Anomalies in Parallel Search
• Since the search space explored by processors is
determined dynamically at runtime, the actual work might
vary significantly.
• Executions yielding speedups greater than p by using p
processors are referred to as acceleration anomalies.
Speedups of less than p using p processors are called
deceleration anomalies.
• Speedup anomalies also manifest themselves in best-first
search algorithms.
• If the heuristic function is good, the work done in parallel
best-first search is typically more than that in its serial
counterpart.
Speedup Anomalies in Parallel Search
The difference in number of nodes searched by sequential and
parallel formulations of DFS. For this example, parallel DFS
reaches a goal node after searching fewer nodes than
sequential DFS.
Speedup Anomalies in Parallel Search
A parallel DFS formulation that searches more nodes than its
sequential counterpart.