1. Search Algorithms for Discrete
Optimization
Chp13- book: Introduction to Parallel
Computing, Second Edition
eng.sallysalem@gmail.com, 2013
2. Agenda
• Definitions
• Sequential Search Algorithms
– Depth-First Search
– Best-First Search
– Example to DFS
• Parallel Search Algorithms
– Depth-First Search
– Best-First Search
– Example
3. Definitions
• A discrete optimization problem can be expressed as a
tuple (S, f). The set S is a finite or countably infinite set of
all solutions that satisfy specified constraints.
• The function f : S → R is the cost function that maps each element
of set S onto the set of real numbers R.
• The objective of a DOP is to find a feasible solution xopt,
such that f(xopt) ≤ f(x) for all x ∈ S.
4. Discrete Optimization: Example
• The 8-puzzle problem consists of a 3×3 grid containing eight
tiles, numbered one through eight.
• One of the grid segments (called the "blank") is empty. A tile can
be moved into the blank position from a position adjacent to it,
thus creating a blank in the tile's original position.
• The goal is to move from a given initial position to the final
position in a minimum number of moves.
6. Discrete Optimization: Example
The set S for this
problem is the set of all
sequences of moves that
lead from the initial to
the final configurations.
10. Depth-First Search Algorithms
• Applies to search spaces that are trees.
• DFS begins by expanding the initial node and generating its
successors. In each subsequent step, DFS expands one of
the most recently generated nodes.
• If the current node has no unexplored successors, DFS backtracks
to the parent and explores an alternate child.
• Often, successors of a node are ordered based on their
likelihood of reaching a solution. This is called directed DFS.
• The main advantage of DFS is that its storage requirement is
linear in the depth of the state space being searched.
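The DFS scheme above can be sketched in Python. The adjacency-dict tree, `is_goal` test, and path bookkeeping are illustrative stand-ins for an implicit state space such as the 8-puzzle:

```python
def dfs(tree, root, is_goal):
    """DFS sketch over an explicit tree given as an adjacency dict
    (a stand-in for an implicit state space such as the 8-puzzle)."""
    stack = [(root, [root])]      # storage grows with the search depth
    while stack:
        node, path = stack.pop()  # expand one of the most recently generated nodes
        if is_goal(node):
            return path
        # reversed() so the first-listed child is expanded first (LIFO stack)
        for child in reversed(tree.get(node, [])):
            stack.append((child, path + [child]))
    return None                   # backtracked out of the whole tree: no goal
```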
11. Depth-First Search Algorithms
States resulting from the first three steps of depth-first search
applied to an instance of the 8-puzzle.
13. Depth-First Search Algorithms
1-Simple Backtracking
• Simple backtracking performs DFS until it finds the first
feasible solution and terminates.
• Not guaranteed to find a minimum-cost solution.
14. Depth-First Search Algorithms
2-Depth-First Branch-and-Bound (DFBB)
• Exhaustively searches the state space, that is,
• It continues to search even after finding a solution path.
• Whenever it finds a new solution path, it updates the
current best solution path
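A minimal DFBB sketch of this idea, assuming a hypothetical `cost` function that returns a node's path cost:

```python
import math

def dfbb(tree, root, cost, is_goal):
    """DFBB sketch: keep searching after the first solution and prune any
    path whose cost cannot beat the current best. `cost` is a hypothetical
    function mapping a node to its path cost."""
    best_cost, best_path = math.inf, None
    stack = [(root, [root])]
    while stack:
        node, path = stack.pop()
        if cost(node) >= best_cost:   # bound: this branch cannot improve
            continue
        if is_goal(node):             # new, cheaper solution: update the best
            best_cost, best_path = cost(node), path
            continue
        for child in reversed(tree.get(node, [])):
            stack.append((child, path + [child]))
    return best_path, best_cost
```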
15. Depth-First Search Algorithms
3- Iterative deepening depth-first search (ID-DFS):
Depth bound = 0: only the root F is examined.
F is not the goal, so back up and
increase the depth bound.
16. Depth-First Search Algorithms
3- Iterative deepening depth-first search (ID-DFS):
Depth bound = 1: B is not the goal,
so move to G; G is not the goal either,
so back up and increase the depth bound
... and so on until the goal is found.
17. Depth-First Search Algorithms
3- Iterative deepening depth-first search (ID-DFS):
• Used for trees of deep search space, we impose a bound on
the depth to which the DFS algorithm searches.
• If a solution is not found, then the entire state space is
searched again using a larger depth bound.
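The ID-DFS loop above can be sketched directly; the tree representation and goal test are illustrative:

```python
def depth_limited(tree, node, is_goal, limit, path):
    # DFS that refuses to descend below the current depth bound
    if is_goal(node):
        return path
    if limit == 0:
        return None
    for child in tree.get(node, []):
        found = depth_limited(tree, child, is_goal, limit - 1, path + [child])
        if found is not None:
            return found
    return None

def iddfs(tree, root, is_goal, max_depth=20):
    # re-search the whole space with a growing depth bound
    for bound in range(max_depth + 1):
        found = depth_limited(tree, root, is_goal, bound, [root])
        if found is not None:
            return found
    return None
```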
18. Depth-First Search Algorithms
Representing a DFS tree: (a) the DFS tree, where successor nodes shown with
dashed lines have already been explored; (b) the stack storing untried
alternatives only; and (c) the stack storing untried alternatives along with
their parents. The shaded blocks represent the parent state, and the blocks
to the right represent successor states that have not been explored.
19. Best-First Search Algorithms
• Each step has a weight and there is a total weight bound; the
weights determine which node is selected next.
Figure walkthrough (edge weights W0–W5):
– At first only F is on the list, with no accumulated weight, so F is expanded.
– If the nodes reached via W0 and W1 are not the goal and both weights are
below the total weight, B and G are expanded.
– Calculate W0 + W2; it is below the total weight.
– If W0 + W3 is below the total weight, move to I.
– If W0 + W3 exceeds the total weight, D is removed.
20. Best-First Search Algorithms
• Can search both graphs and trees
• BFS algorithms use a heuristic to guide search.
• BFS maintains two lists: open and closed.
• At the beginning, the initial node is placed on the open list.
• This list is sorted according to a heuristic evaluation function
• In each step of the search, the most promising node from the
open list is removed.
• If this node is a goal node, then the algorithm terminates.
• Otherwise, the node is expanded. The expanded node is placed
on the closed list.
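The open/closed-list loop above can be sketched with a binary heap as the sorted open list; `successors` and the heuristic `h` are hypothetical interfaces:

```python
import heapq
import itertools

def best_first(start, successors, h, is_goal):
    counter = itertools.count()          # tie-breaker for equal heuristic values
    open_list = [(h(start), next(counter), start, [start])]
    closed = set()
    while open_list:
        _, _, node, path = heapq.heappop(open_list)  # most promising node
        if node in closed:
            continue
        if is_goal(node):                # goal test on the selected node
            return path
        closed.add(node)                 # expanded nodes go on the closed list
        for child in successors(node):
            if child not in closed:
                heapq.heappush(open_list,
                               (h(child), next(counter), child, path + [child]))
    return None
```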
21. Search Overhead Factor
• The search overhead factor of the parallel system is defined as the
ratio of the work done by the parallel formulation to that done by the
sequential formulation
Wp / W
• where W is the amount of work done by a single processor and Wp is
the total amount of work done by p processors.
• The upper bound on speedup is p × (W / Wp).
– Speedup may be less due to other parallel processing overheads.
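These two definitions translate directly into code; a trivial sketch, with any concrete numbers purely illustrative:

```python
def search_overhead_factor(W, Wp):
    # ratio of parallel work to sequential work; > 1 means extra work was done
    return Wp / W

def speedup_upper_bound(p, W, Wp):
    # speedup cannot exceed p divided by the search overhead factor
    return p * (W / Wp)
```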
24. Parallel Depth-First Search
• The critical issue in parallel depth-first search algorithms is the distribution of
the search space among the processors.
With 2 processors (A, B):
one processor is idle for a significant
amount of time, reducing efficiency.
With a larger number of processors
(adding C, D, E, F):
static partitioning of unstructured trees
yields poor performance.
25. Parallel Depth-First Search
• Dynamic load balancing: when a processor runs out of
work, it gets more work from another processor that has
work.
27. Parallel Depth-First Search:
Dynamic Load Balancing
• Steps
– each processor can be in one of two states: active or idle.
– In message passing architectures, an idle processor selects a donor
processor and sends it a work request.
– If the idle processor receives work (part of the state space to be searched)
from the donor processor, it becomes active.
– If it receives a reject message (because the donor has no work), it selects
another donor and sends a work request to that donor.
– This process repeats until the processor gets work or all the processors
become idle.
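The idle-processor protocol in the steps above can be sketched as a loop; `donors`, `request_work`, and `all_idle` are hypothetical stand-ins for the donor-selection scheme and the message-passing layer:

```python
def request_cycle(donors, request_work, all_idle):
    """Idle-processor loop sketch: keep asking donors until work arrives
    or every processor has gone idle."""
    for donor in donors:
        work = request_work(donor)   # None models a reject message
        if work is not None:
            return work              # got part of the state space: become active
        if all_idle():
            return None              # every processor is idle: terminate
    return None
```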
28. Important Parameters of Parallel DFS
• Two characteristics of parallel DFS are critical to
determining its performance
1. The method for splitting work at a processor
2. The scheme to determine the donor processor when a processor
becomes idle.
29. Parameters in Parallel DFS:
Work Splitting
Splitting the DFS tree: the two subtrees along with their stack
representations are shown in (a) and (b).
30. Parameters in Parallel DFS:
Work Splitting
• Work is split by splitting the stack into two
• If too little work is sent, the recipient quickly becomes idle
• if too much, the donor becomes idle
• Ideally, we do not want either of the split pieces to be small
• Using a half-split the stack is split into two equal pieces
• Cutoff depth: To avoid sending very small amounts of work, nodes beyond a
specified stack depth are not given away
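A half-split with a cutoff depth can be sketched as below; the `(depth, node)` stack representation and the cutoff value are hypothetical:

```python
def half_split(stack, cutoff_depth=10):
    """Half-split sketch: the stack holds (depth, node) pairs, shallowest
    first; entries at or beyond the cutoff depth are never given away."""
    eligible = [e for e in stack if e[0] < cutoff_depth]
    give = eligible[: len(eligible) // 2]   # donate the shallower half:
                                            # shallow nodes tend to root larger subtrees
    keep = [e for e in stack if e not in give]
    return keep, give
```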
31. Parameters in Parallel DFS:
Load-Balancing Schemes
1. Asynchronous round robin
2. Global round robin
3. Random polling
• Each of these schemes can be coded for message passing
as well as shared address space machines.
32. Parameters in Parallel DFS:
Load-Balancing Schemes
• Asynchronous round robin (ARR): Each processor maintains a
counter and makes requests in a round-robin fashion.
• Global round robin (GRR): The system maintains a global
counter and requests are made in a round-robin fashion,
globally.
• Random polling: Request a randomly selected processor for
work.
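The three donor-selection rules can be sketched side by side; counter representations here are illustrative:

```python
import random

def arr_donor(my_rank, local_counters, p):
    # Asynchronous round robin: each processor advances its own counter
    donor = local_counters[my_rank]
    local_counters[my_rank] = (donor + 1) % p
    return donor

def grr_donor(global_counter, p):
    # Global round robin: one shared counter (a potential contention point)
    donor = global_counter[0]
    global_counter[0] = (donor + 1) % p
    return donor

def random_donor(my_rank, p):
    # Random polling: ask a randomly selected processor (never yourself)
    donor = random.randrange(p)
    while donor == my_rank:
        donor = random.randrange(p)
    return donor
```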
33. A General Framework for Analysis of
Parallel DFS
• For each load-balancing strategy, we define V(p) as the total number of work requests
after which each processor has received at least one work request (note that V(p) ≥ p).
• Assume that the largest piece of work at any point is W, and that every split leaves
each piece with at least a fraction α of the work being split.
• After V(p) requests, the maximum work remaining at any processor is less than (1 − α)W;
after 2V(p) requests, it is less than (1 − α)²W.
• After (log_{1/(1−α)}(W/ε)) · V(p) requests, the maximum work remaining at any processor is
below a threshold value ε.
• The total number of work requests is O(V(p) log W).
34. A General Framework for Analysis of
Parallel DFS
• If tcomm is the time required to communicate a piece of work, then the
communication overhead To is given by
To = tcomm V(p) log W
• The corresponding efficiency E is given by
E = 1 / (1 + To / W)
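These quantities can be computed numerically. This sketch assumes the standard relation E = 1/(1 + To/W) between efficiency and overhead, and picks log base 2 for illustration:

```python
import math

def efficiency(t_comm, V_p, W):
    # To = t_comm * V(p) * log W (log base 2 here, an illustrative choice)
    To = t_comm * V_p * math.log2(W)
    # assumed standard relation between efficiency and overhead
    return 1.0 / (1.0 + To / W)
```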
35. A General Framework for Analysis of
Parallel DFS
• To depends on two values: tcomm and V(p). The value of tcomm is determined by
the underlying architecture, and the function V(p) is determined by the
load-balancing scheme.
• Let us now compute V(p) for the various load-balancing schemes.
36. A General Framework for Analysis of
Parallel DFS
• Asynchronous Round Robin: in the worst case, all processors issue work
requests at the same time to the same processor, so V(p) = O(p²).
• Global Round Robin: all processors receive requests in sequence, so V(p) = p.
• Random Polling: the worst-case V(p) is unbounded; on average, V(p) = O(p log p).
37. Analysis of Load-Balancing Schemes:
Conclusions
• Asynchronous round robin has poor performance because it makes a
large number of work requests.
• Global round robin has poor performance because of contention at the
counter, although it makes the fewest requests.
• Random polling strikes a desirable compromise.
38. Termination Detection
• How do you know when everyone's done?
• A number of algorithms have been proposed.
• Techniques
1. Dijkstra's Token Termination Detection
2. Tree-Based Termination Detection
39. Dijkstra's Token Termination Detection
When P0 goes idle, it colors itself
green and initiates a green token.
If Pi is idle, the token moves on to
the next processor.
The algorithm terminates when
processor P0 receives a green token
and is itself idle.
40. Dijkstra's Token Termination Detection
• Now, let us do away with the restriction on work transfers.
• When processor P0 goes idle, it colors itself green and initiates a green token.
• If processor Pj sends work to processor Pi and j > i then processor Pj becomes red.
• If processor Pi has the token and Pi is idle, it passes the token to Pi+1. If Pi is red, then the color
of the token is set to red before it is sent to Pi+1. If Pi is green, the token is passed unchanged.
• After Pi passes the token to Pi+1, Pi becomes green.
• The algorithm terminates when processor P0 receives a green token and is itself idle.
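One circulation of the token can be sketched as follows. The per-processor colors are given as input, and only the token's color is modeled, not the work transfers that produce red processors:

```python
def token_terminates(colors):
    """One circulation of the token, starting green at P0. `colors` lists each
    processor's color when the token reaches it ('green' or 'red')."""
    token = 'green'
    for color in colors:
        if color == 'red':
            token = 'red'    # a red processor taints the token before passing it
    return token == 'green'  # P0 sees a green token back: it may terminate
```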
41. Tree-Based Termination Detection
• Associate weights with individual workpieces. Initially, processor P0 has all the
work and a weight of one.
• Whenever work is partitioned, the weight is split in half and sent
along with the work.
• When a processor gets done with its work, it sends its parent the weight back.
• Termination is signaled when the weight at processor P0 becomes 1 again.
• Note that underflow and finite precision are important factors associated with
this scheme.
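The weight bookkeeping can be sketched with exact rational arithmetic, which sidesteps the underflow and finite-precision issues the slide notes:

```python
from fractions import Fraction

def split_weight(w):
    # exact halving with Fractions avoids underflow and rounding
    half = w / 2
    return half, half

# P0 starts with all the work and a weight of one, then donates twice
w0 = Fraction(1)
w0, w1 = split_weight(w0)   # half goes to a hypothetical P1
w1, w2 = split_weight(w1)   # P1 donates half of its half to P2
# when P1 and P2 finish, their weights flow back to P0
w0 = w0 + w1 + w2
```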
43. Parallel Best-First Search
• The core data structure is the Open list (typically implemented as a priority
queue).
• Each processor locks this queue, extracts the best node, unlocks it.
• Successors of the node are generated, their heuristic functions estimated, and
the nodes inserted into the open list as necessary after appropriate locking.
• Termination is signaled when we find a solution whose cost is better than the
best heuristic value in the open list.
• Since we expand more than one node at a time, we may expand nodes that
would not be expanded by a sequential algorithm.
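The lock-serialized open list can be sketched with a heap guarded by a single lock; the class and its interface are hypothetical:

```python
import heapq
import threading

class SharedOpenList:
    """Centralized open list sketch: every access is serialized through a
    single lock, as in the schematic."""
    def __init__(self):
        self._heap = []
        self._lock = threading.Lock()

    def push(self, priority, node):
        with self._lock:                  # lock, insert, unlock
            heapq.heappush(self._heap, (priority, node))

    def pop_best(self):
        with self._lock:                  # lock, extract the best node, unlock
            if not self._heap:
                return None
            return heapq.heappop(self._heap)
```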
44. Parallel Best-First Search
A general schematic for parallel best-first search using a centralized
strategy. The locking operation is used here to serialize queue access
by various processors.
45. Parallel Best-First tree Search
Three strategies for tree search:
1. Random communication strategy
2. Ring communication
3. Blackboard communication.
46. Parallel Best-First tree Search
1. Random communication strategy
• each processor periodically sends some of its best nodes to the
open list of a randomly selected processor.
• This strategy ensures that, if a processor stores a good part of
the search space, the others get part of it.
• The cost of communication determines the node transfer
frequency.
47. Parallel Best-First tree Search
2. Ring communication strategy
• In the ring communication strategy, the processors are mapped in a virtual
ring.
• Each processor periodically exchanges some of its best nodes with the open
lists of its neighbors in the ring.
• This strategy can be implemented on message passing as well as shared
address space machines with the processors organized into a logical ring.
• The cost of communication determines the node transfer frequency.
48. Parallel Best-First tree Search
2. Ring communication strategy
A message-passing implementation of parallel best-first search using the
ring communication strategy.
49. Parallel Best-First tree Search
3. Blackboard communication
• there is a shared blackboard through which nodes are switched among processors as follows.
• After selecting the best node from its local open list, a processor expands the node only if its value is
within a tolerable limit of the best node on the blackboard.
• If the selected node is much better than the best node on the blackboard, the processor sends some of
its best nodes to the blackboard before expanding the current node.
• If the selected node is much worse than the best node on the blackboard, the processor retrieves
some good nodes from the blackboard and reselects a node for expansion.
• The blackboard strategy is suited only to shared-address-space computers, because the value of the
best node in the blackboard has to be checked after each node expansion.
50. Parallel Best-First tree Search
3. Blackboard communication
An implementation of parallel best-first search using the blackboard
communication strategy.
51. Speedup Anomalies in Parallel Search
• Since the search space explored by processors is determined
dynamically at runtime, the actual work might vary significantly.
• Executions yielding speedups greater than p by using p processors are
referred to as acceleration anomalies. Speedups of less than p using p
processors are called deceleration anomalies.
• Speedup anomalies also manifest themselves in best-first search
algorithms.
• If the heuristic function is good, the work done in parallel best-first
search is typically more than that in its serial counterpart.
52. Speedup Anomalies in Parallel Search
The difference in number of nodes searched by sequential and parallel formulations of DFS.
For this example, parallel DFS reaches a goal node after searching fewer nodes than
sequential DFS.
53. Speedup Anomalies in Parallel Search
A parallel DFS formulation that searches more nodes than its sequential counterpart