Search algorithms master

•Common Problems That Need Computers
•The Search Problem
•Basic Search Algorithms
–Algorithms used for searching the contents of an array
•Linear or Sequential Search
•Binary Search
•Comparison Between Linear and Binary Search
•Algorithms for solving shortest path problems
–Sequential Search Algorithms
•Depth-First Search
•Breadth First Search
–Parallel or distributed Search Algorithms
•Parallel Depth-First Search
•Parallel Breadth First Search

Published in: Education
  • Several search algorithms can determine an optimal solution by searching only a portion of the graph. For some problems, it is possible to estimate the cost to reach the goal state from an intermediate state. This cost is called a heuristic estimate.
  • If you want to go from Point A to Point B, you are employing some kind of search. For a direction finder, going from Point A to Point B literally means finding a path between where you are now and your intended destination. For a chess game, Point A and Point B might be the current position and a position five moves from now. For a genome sequencer, going from Point A to Point B could mean finding the link between two DNA sequences. As you can tell, going from Point A to Point B is different for every situation. If there is a vast amount of interconnected data and you are trying to find a relation between a few pieces of that data, you would use search. In this tutorial, you will learn about two forms of searching: depth-first and breadth-first. Search is a core technique in Artificial Intelligence (AI). A major goal of AI is to give computers the ability to think, or in other words, to mimic human behavior. Unfortunately, computers don't function the way our minds do: they require a series of well-reasoned steps before finding a solution. Your goal, then, is to take a complicated task and convert it into simpler steps that your computer can handle. That conversion from something complex to something simple is what this tutorial is primarily about; learning how to use two search algorithms is a welcome side effect.
  • UNINFORMED GRAPH SEARCH ALGORITHMS
    In implicit graph search, no graph representation is available at the beginning. Only as the search progresses does a partial picture of the graph evolve, built from the nodes that are actually explored. In each iteration, a node is expanded by generating all adjacent nodes that are reachable via edges of the implicit graph (the possible edges can be described, for example, by a set of transition rules); this means applying all allowed actions to the state. Nodes that have been generated earlier in the search can be kept track of; however, we have no access to nodes that have not been generated so far. A node can be reached only through successor generation along a path from the initial node. Consequently, we can divide the set of reached nodes into the set of expanded nodes and the set of generated nodes that are not yet expanded.
  • Maps. A person who is planning a trip may need to answer questions such as “What is the shortest route from Providence to Princeton?” A seasoned traveler who has experienced traffic delays on the shortest route may ask the question “What is the fastest way to get from Providence to Princeton?” To answer such questions, we process information about connections (roads) between items (intersections).
    Web content. When we browse the web, we encounter pages that contain references (links) to other pages, and we move from page to page by clicking on the links. The entire web is a graph, where the items are pages and the connections are links. Graph-processing algorithms are essential components of the search engines that help us locate information on the web.
    Circuits. An electric circuit comprises devices such as transistors, resistors, and capacitors that are intricately wired together. We use computers to control machines that make circuits and to check that the circuits perform desired functions. We need to answer simple questions such as “Is a short circuit present?” as well as complicated questions such as “Can we lay out this circuit on a chip without making any wires cross?” The answer to the first question depends only on the properties of the connections (wires), whereas the answer to the second question requires detailed information about the wires, the devices that those wires connect, and the physical constraints of the chip.
    Schedules. A manufacturing process requires a variety of jobs to be performed, under a set of constraints that specify that certain tasks cannot be started until certain other tasks have been completed. How do we schedule the tasks such that we both respect the given constraints and complete the whole process in the least amount of time?
    Commerce. Retailers and financial institutions track buy/sell orders in a market. A connection in this situation represents the transfer of cash and goods between an institution and a customer. Knowledge of the nature of the connection structure in this instance may enhance our understanding of the nature of the market.
    Matching. Students apply for positions in selective institutions such as social clubs, universities, or medical schools. Items correspond to the students and the institutions; connections correspond to the applications. We want to discover methods for matching interested students with available positions.
    Computer networks. A computer network consists of interconnected sites that send, forward, and receive messages of various types. We are interested in knowing about the nature of the interconnection structure because we want to lay wires and build switches that can handle the traffic efficiently.
    Software. A compiler builds graphs to represent relationships among modules in a large software system. The items are the various classes or modules that comprise the system; connections are associated either with the possibility that a method in one class might call another (static analysis) or with actual calls while the system is in operation (dynamic analysis). We need to analyze the graph to determine how to allocate resources to the program most efficiently.
    Social networks. When you use a social network, you build explicit connections with your friends. Items correspond to people; connections are to friends or followers. Understanding the properties of these networks is a modern graph-processing application of intense interest, not just to companies that support such networks but also in politics, diplomacy, entertainment, education, marketing, and many other domains.
  • Adjacency matrix: an n-by-n matrix with A[u][v] = 1 if (u, v) is an edge.
    Each undirected edge is represented twice (the matrix is symmetric for undirected graphs but not, in general, for directed graphs).
    Space: proportional to n².
    Not efficient for sparse graphs (graphs with a small number of edges compared with the maximum possible number of edges); algorithms may have longer running times when this representation is used.
    Checking whether (u, v) is an edge takes Θ(1) time. Identifying all edges takes Θ(n²) time.
  • DFS applications:
    – Determines whether G is connected
    – Computes the connected components of G (strongly connected components of a digraph, i.e., a directed graph)
    – Path/cycle finding
    – Topological sort (an ordering of the vertices of a directed acyclic graph G(V, E) such that for every edge (u, v) in E, u appears before v in the ordering)
    – Linear running time
    BFS applications:
    – Computes the distance from s to each reachable vertex in an unweighted G
    – Finds shortest paths from s to all other nodes in an unweighted G
    – Finds a simple cycle, if there is one
    – Computes the connected components of G
  • The most suitable sequential search algorithm to apply to a state space depends on whether the space forms a graph or a tree. In a tree, each new successor leads to an unexplored part of the search space (example: the 0/1 (binary) integer-programming problem). In a graph, a state can be reached along multiple paths (example: the 8-puzzle).
  • In a breadth-first search, you start at the root node and then scan each node in the first level, starting from the leftmost node and moving towards the right. Then you continue scanning the second level (starting from the left), the third level, and so on, until you’ve scanned all the nodes or until you find the node you were searching for. In a BFS, when traversing one level, we need some way of knowing which nodes to traverse once we get to the next level. This is done by storing the pointers to a level’s child nodes while searching that level. The pointers are stored in a FIFO (First-In-First-Out) queue. This, in turn, means that BFS can use a large amount of memory, because the pointers for a whole level must be stored.
  • Best-first search algorithms (confusingly, also abbreviated BFS) use a heuristic to guide search. The core data structure is a list, called the Open list, that stores unexplored nodes sorted by their heuristic estimates. The best node is selected from the list and expanded, and its offspring are inserted at the right position. If nodes are ordered by their l-value, l(x) = g(x) + h(x) (the cost so far plus the heuristic estimate), and the heuristic h is admissible, best-first search finds an optimal solution.
  • In a depth-first search, start at the root and follow one of the branches of the tree as far as possible, until either the node being sought is found or a leaf node (a node with no children) is reached. On reaching a leaf, continue the search at the nearest ancestor with unexplored children.
    Depth-first search (DFS) is an algorithm for traversing or searching a tree, tree structure, or graph. One starts at the root (selecting some node as the root in the graph case) and explores as far as possible along each branch before backtracking.
    DFS begins by expanding the initial node and generating its successors. In each subsequent step, DFS expands one of the most recently generated nodes. If this node has no successors (or cannot lead to any solutions), DFS backtracks and expands a different node. In some DFS algorithms, successors of a node are expanded in an order determined by their heuristic values. A major advantage of DFS is that its storage requirement is linear in the depth of the state space being searched. The following sections discuss three algorithms based on depth-first search.
    A version of depth-first search was investigated in the 19th century by French mathematician Charles Pierre Trémaux[1] as a strategy for solving mazes.[2][3]
    Formal definition
    Formally, DFS is an uninformed search that progresses by expanding the first child node of the search tree that appears, going deeper and deeper until a goal node is found or until it hits a node that has no children. Then the search backtracks, returning to the most recent node it hasn't finished exploring. In a non-recursive implementation, all freshly expanded nodes are added to a stack for exploration.
    Properties
    The time and space analysis of DFS differs according to its application area. In theoretical computer science, DFS is typically used to traverse an entire graph and takes time O(|V| + |E|), linear in the size of the graph. In these applications it also uses space O(|V|) in the worst case to store the stack of vertices on the current search path as well as the set of already-visited vertices. Thus, in this setting, the time and space bounds are the same as for breadth-first search, and the choice between the two algorithms depends less on their complexity and more on the different properties of the vertex orderings they produce.
    For applications of DFS to search problems in artificial intelligence, however, the graph to be searched is often either too large to visit in its entirety or even infinite, and DFS may fail to terminate when a path in the search tree is infinitely long. Therefore, the search is performed only to a limited depth, and because of limited memory one typically does not use data structures that keep track of the set of all previously visited vertices. In this case, the time is still linear in the number of expanded vertices and edges (although this number is not the same as the size of the entire graph, because some vertices may be searched more than once and others not at all), but the space complexity of this variant of DFS is only proportional to the depth limit, much smaller than the space needed for searching to the same depth using breadth-first search. For such applications, DFS also lends itself much better to heuristic methods of choosing a likely-looking branch.
    When an appropriate depth limit is not known a priori, iterative deepening depth-first search applies DFS repeatedly with a sequence of increasing limits. In the artificial-intelligence mode of analysis, with a branching factor greater than one, iterative deepening increases the running time by only a constant factor over the case in which the correct depth limit is known, because the number of nodes grows geometrically from level to level.
    DFS may also be used to collect a sample of graph nodes. However, incomplete DFS, like incomplete BFS, is biased towards nodes of high degree.
  • For example, given a family tree, if one were looking for someone on the tree who is still alive, it would be reasonable to assume that person is near the bottom of the tree. A BFS would then take a long time to reach that last level, whereas a DFS would find the goal faster. But if one were looking for a family member who died a very long time ago, that person would be closer to the top of the tree, and a BFS would usually be faster than a DFS. So the advantages of either depend on the data and on what you are looking for.
    Complexity of Breadth First Search
    – Assume an adjacency list representation, where V is the set of vertices and E the set of edges.
    – Each vertex is enqueued and dequeued at most once.
    – Scanning for all adjacent vertices takes O(|E|) time in total, since the sum of the lengths of the adjacency lists is |E| for a directed graph (2|E| for an undirected one).
    – This gives O(|V| + |E|) time complexity.
    Complexity of Depth First Search
    – Each vertex is pushed on the stack and popped at most once.
    – For every vertex we check what the next unvisited neighbor is.
    – In our implementation, we traverse each adjacency list only once. This again gives O(|V| + |E|).
  • Due to economic pressure, parallel computing on an increasing number of cores, both in central processing units (CPUs) and in graphics processing units (GPUs), will be essential to solve challenging problems in the future. Modern computers exploit parallelism at the hardware level. Parallel or distributed algorithms are designed to solve algorithmic problems by using many processing devices (processes, processors, processor cores, nodes, units) simultaneously. Parallel algorithms are required because it is technically easier to build a system of several communicating slower processors than a single one that is many times faster. Multicore and many-core processing units are widely available and often allow fast access to a shared memory area, avoiding slow transfer of data across data links in a cluster. Moreover, the limit on memory addresses has almost disappeared on 64-bit systems. The design of parallel algorithms therefore mainly follows the principle that time is the primary bottleneck. Today’s microprocessors use several parallel processing techniques, such as instruction-level parallelism and pipelined instruction fetching. Efficient parallel solutions often require the invention of original and novel algorithms, radically different from those used to solve the same problems sequentially. The speedup compared with a one-processor solution depends on the specific properties of the problem at hand. The aspects of general algorithmic problems most frequently encountered in designing parallel algorithms are compatibility with the machine architecture, the choice of suitable shared data structures, and the compromise between processing and communication overhead. An efficient solution can be obtained only if the organization between the different tasks can be optimized and distributed in a way that uses the working power effectively.
    Parallel algorithms commonly refer to a synchronous scenario, where communication is performed either at regular clock intervals or even in a fixed architecture of computing elements performing the same processing or communication tasks (single-instruction multiple-data architectures), in contrast to the more common case of multiple-instruction multiple-data computers. On the other hand, the term distributed algorithm is preferably used for an asynchronous setting with looser coupling of the processing elements. The use of terminology, however, is not consistent. In AI literature, the term parallel search is preferred even for a distributed scenario. The exploration (generating the successors, computing the heuristic estimates, etc.) is distributed among different processes, be it on workstation clusters or in multicore processor environments. In this book we talk about distributed search when more than one search process is invoked, which can be due to partitioning the workload among processes, as in parallel search, or due to starting from different ends of the search space, as addressed in bidirectional search. The most important problem in distributed search is to minimize the communication (overhead) between the search processes.
    After introducing parallel processing, we turn to parallel state space search algorithms, starting with parallel depth-first search and heading toward parallel heuristic search. Early parallel formulations of A* assume that the graph is a tree, so that there is no need to keep a Closed list to avoid duplicates. If the graph is not a tree, hash functions are most frequently used to distribute the workload. We identify differences between shared-memory algorithm designs (e.g., multicore processors) and algorithms for distributed-memory architectures (e.g., workstation clusters). One distributed data structure that is introduced combines the capabilities of both heaps and binary search trees.
    Newer parallel implementations of A* make use of frontier search and large amounts of disk space. Effective data structures for concurrent access, especially for the search frontier, are essential. In external algorithms, parallelism is often necessary for maximizing input/output (I/O) bandwidth. In parallel external search we consider how to integrate external and distributed search.
  • Consider the tree shown in Figure 11.7. Note that the left subtree (rooted at node A) can be searched in parallel with the right subtree (rooted at node B). By statically assigning a node in the tree to a processor, it is possible to expand the whole subtree rooted at that node without communicating with another processor. Thus, it seems that such a static allocation yields a good parallel search algorithm.Let us see what happens if we try to apply this approach to the tree in Figure 11.7. Assume that we have two processors. The root node is expanded to generate two nodes (A and B), and each of these nodes is assigned to one of the processors. Each processor now searches the subtrees rooted at its assigned node independently. At this point, the problem with static node assignment becomes apparent. The processor exploring the subtree rooted at node A expands considerably fewer nodes than does the other processor. Due to this imbalance in the workload, one processor is idle for a significant amount of time, reducing efficiency. Using a larger number of processors worsens the imbalance. Consider the partitioning of the tree for four processors. Nodes A and B are expanded to generate nodes C, D, E, and F. Assume that each of these nodes is assigned to one of the four processors. Now the processor searching the subtree rooted at node E does most of the work, and those searching the subtrees rooted at nodes C and D spend most of their time idle. The static partitioning of unstructured trees yields poor performance because of substantial variation in the size of partitions of the search space rooted at different nodes. Furthermore, since the search space is usually generated dynamically, it is difficult to get a good estimate of the size of the search space beforehand. Therefore, it is necessary to balance the search space among processors dynamically.In dynamic load balancing, when a processor runs out of work, it gets more work from another processor that has work. 
Consider the two-processor partitioning of the tree in Figure 11.7(a). Assume that nodes A and B are assigned to the two processors as we just described. In this case when the processor searching the subtree rooted at node A runs out of work, it requests work from the other processor. Although the dynamic distribution of work results in communication overhead for work requests and work transfers, it reduces load imbalance among processors. This section explores several schemes for dynamically balancing the load between processors.
  • A parallel formulation of DFS based on dynamic load balancing is as follows. Each processor performs DFS on a disjoint part of the search space. When a processor finishes searching its part of the search space, it requests an unsearched part from other processors. This takes the form of work request and response messages in message passing architectures, and locking and extracting work in shared address space machines. Whenever a processor finds a goal node, all the processors terminate. If the search space is finite and the problem has no solutions, then all the processors eventually run out of work, and the algorithm terminates.Since each processor searches the state space depth-first, unexplored states can be conveniently stored as a stack. Each processor maintains its own local stack on which it executes DFS. When a processor's local stack is empty, it requests (either via explicit messages or by locking) untried alternatives from another processor's stack. In the beginning, the entire search space is assigned to one processor, and other processors are assigned null search spaces (that is, empty stacks). The search space is distributed among the processors as they request work. We refer to the processor that sends work as the donor processor and to the processor that requests and receives work as the recipient processor.
  • As illustrated in Figure 11.8, each processor can be in one of two states: active (that is, it has work) or idle (that is, it is trying to get work). In message passing architectures, an idle processor selects a donor processor and sends it a work request. If the idle processor receives work (part of the state space to be searched) from the donor processor, it becomes active. If it receives a reject message (because the donor has no work), it selects another donor and sends a work request to that donor. This process repeats until the processor gets work or all the processors become idle. When a processor is idle and it receives a work request, that processor returns a reject message. The same process can be implemented on shared address space machines by locking another processor's stack, examining it to see if it has work, extracting work, and unlocking the stack. On message passing architectures, in the active state, a processor does a fixed amount of work (expands a fixed number of nodes) and then checks for pending work requests. When a work request is received, the processor partitions its work into two parts and sends one part to the requesting processor. When a processor has exhausted its own search space, it becomes idle. This process continues until a solution is found or until the entire space has been searched. If a solution is found, a message is broadcast to all processors to stop searching. A termination detection algorithm is used to detect whether all processors have become idle without finding a solution (Section 11.4.4).
  • Recall from Section 11.2.2 that an important component of best-first search (BFS) algorithms is the open list. It maintains the unexpanded nodes in the search graph, ordered according to their l-value. In the sequential algorithm, the most promising node from the open list is removed and expanded, and newly generated nodes are added to the open list.In most parallel formulations of BFS, different processors concurrently expand different nodes from the open list. These formulations differ according to the data structures they use to implement the open list. Given p processors, the simplest strategy assigns each processor to work on one of the current best nodes on the open list. This is called the centralized strategy because each processor gets work from a single global open list. Since this formulation of parallel BFS expands more than one node at a time, it may expand nodes that would not be expanded by a sequential algorithm. Consider the case in which the first node on the open list is a solution. The parallel formulation still expands the first p nodes on the open list. However, since it always picks the best p nodes, the amount of extra work is limited. Figure 11.14 illustrates this strategy.
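The uninformed graph search note above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the source's algorithm. The graph is never stored; only a successor function is called. The hypothetical transition rules ("add one" or "double", capped at 20) stand in for real actions. The `generated` and `expanded` sets mirror the note's split of reached nodes, and expanding in FIFO order is an arbitrary choice.

```python
from collections import deque

def implicit_search(start, successors, is_goal):
    """Uninformed search over a graph that exists only as a successor function."""
    open_list = deque([start])  # generated but not yet expanded
    generated = {start}         # every node reached so far
    expanded = set()            # nodes whose successors have been generated
    while open_list:
        node = open_list.popleft()
        if is_goal(node):
            return node, expanded, generated
        expanded.add(node)
        for child in successors(node):  # apply all allowed actions to the state
            if child not in generated:  # duplicates are detected only among known nodes
                generated.add(child)
                open_list.append(child)
    return None, expanded, generated

def successors(n):
    # Hypothetical transition rules: "add one" or "double", capped at 20.
    return [m for m in (n + 1, 2 * n) if m <= 20]

goal, expanded, generated = implicit_search(1, successors, lambda n: n == 10)
```

Any node in `generated - expanded` has been reached but not yet expanded. That set is exactly the frontier the note describes.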
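For the adjacency-matrix note, here is a minimal sketch. It assumes vertices are numbered 0 to n-1 and edges are given as pairs; the function names are hypothetical. It shows why an edge test is a single lookup, while listing all edges must scan all n² cells.

```python
def adjacency_matrix(n, edges, directed=False):
    """n-by-n 0/1 matrix with A[u][v] = 1 if (u, v) is an edge (vertices 0..n-1)."""
    A = [[0] * n for _ in range(n)]  # space proportional to n^2, whatever |E| is
    for u, v in edges:
        A[u][v] = 1
        if not directed:
            A[v][u] = 1              # undirected edge stored twice: symmetric matrix
    return A

def has_edge(A, u, v):
    return A[u][v] == 1              # Θ(1): a single lookup

def all_edges(A):
    # Every one of the n^2 cells is inspected, even for a sparse graph.
    return [(u, v) for u, row in enumerate(A) for v, bit in enumerate(row) if bit]

A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])
```

The path graph used here has only 3 edges, but `all_edges` still visits all 16 cells. That is the sparse-graph inefficiency the note points out.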
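Topological sort, one of the DFS applications listed in the notes, takes only a few lines on top of a recursive DFS. A vertex is appended when it finishes, and reversing the finishing order puts u before v for every edge (u, v). The sketch below assumes the input is a directed acyclic graph stored as a dict of successor lists. The "dressing" graph is a hypothetical example, and cycle detection is omitted.

```python
def topological_sort(graph):
    """DFS-based topological sort of a DAG given as {vertex: [successors]}.

    Assumes the graph is acyclic; no cycle detection is done here.
    """
    visited, finished = set(), []

    def visit(u):
        visited.add(u)
        for v in graph.get(u, []):
            if v not in visited:
                visit(v)
        finished.append(u)  # u finishes only after everything reachable from it

    for u in graph:
        if u not in visited:
            visit(u)
    return finished[::-1]   # reverse finishing order is a topological order

# Hypothetical example: which garments must be put on before which.
dressing = {
    "socks": ["shoes"], "pants": ["shoes", "belt"], "shirt": ["belt", "tie"],
    "tie": ["jacket"], "belt": ["jacket"], "shoes": [], "jacket": [],
}
order = topological_sort(dressing)
```

Python's default recursion limit (1000 frames) makes this recursive form unsuitable for very deep graphs. An explicit stack avoids that limit.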
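The level-by-level scan described in the breadth-first search note, with its FIFO queue, can be sketched as follows. The tree is a hypothetical example stored as a dict of child lists, scanned left to right. The function also returns each node's hop distance, which is the "distance from s" BFS application listed earlier. Memory grows with the number of queued nodes, since a whole level can sit in the queue at once.

```python
from collections import deque

def bfs_levels(graph, root):
    """Breadth-first search from root over {node: [children]}; returns hop distances.

    The dict's insertion order records the order in which nodes were reached.
    """
    dist = {root: 0}
    queue = deque([root])            # FIFO queue of discovered, unscanned nodes
    while queue:
        u = queue.popleft()          # oldest entry first: level k before level k+1
        for v in graph.get(u, []):   # children scanned left to right
            if v not in dist:        # first discovery gives the fewest hops
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
dist = bfs_levels(tree, "A")
```

Because already-reached nodes are skipped, the same function also works on graphs with cycles, not only trees.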
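The best-first search note's Open list maps naturally onto a binary heap. Below is a minimal A*-style sketch under these assumptions:

- Nodes are ordered by l(x) = g(x) + h(x).
- `neighbors(x)` yields (successor, cost) pairs.
- The graph and heuristic values are hypothetical, chosen so that h never overestimates.

Instead of inserting offspring "at the right position" in a sorted list, `heapq.heappush` gives O(log n) insertion.

```python
import heapq
import itertools

def best_first_search(start, goal, neighbors, h):
    """Best-first search with an Open list ordered by l(x) = g(x) + h(x) (A*-style).

    Returns (cost, path), or None if the goal is unreachable. The path is
    optimal when h is admissible (never overestimates the remaining cost).
    """
    tie = itertools.count()                    # breaks ties so nodes are never compared
    open_list = [(h(start), next(tie), 0, start)]
    best_g = {start: 0}
    parent = {start: None}
    while open_list:
        _, _, g, x = heapq.heappop(open_list)  # best node on Open
        if x == goal:
            path = []
            while x is not None:
                path.append(x)
                x = parent[x]
            return g, path[::-1]
        if g > best_g[x]:
            continue                           # stale entry: x was re-reached more cheaply
        for y, cost in neighbors(x):
            g_y = g + cost
            if g_y < best_g.get(y, float("inf")):
                best_g[y], parent[y] = g_y, x
                heapq.heappush(open_list, (g_y + h(y), next(tie), g_y, y))
    return None

# Hypothetical weighted graph and admissible heuristic values.
edges = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 5)], "B": [("G", 1)], "G": []}
H = {"S": 3, "A": 2, "B": 1, "G": 0}
result = best_first_search("S", "G", lambda x: edges[x], lambda x: H[x])
```

The stale-entry check stands in for a decrease-key operation, which `heapq` does not provide.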
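Two of the variants described in the depth-first search note can be sketched briefly:

- a non-recursive DFS that pushes freshly generated nodes onto a stack;
- iterative deepening, built from a depth-limited DFS that keeps no visited set.

Graphs are dicts of child lists. The example tree and the `max_depth` cap are assumptions for illustration.

```python
def dfs_order(graph, start):
    """Non-recursive DFS: freshly generated nodes are pushed onto a stack (LIFO)."""
    visited, order = set(), []
    stack = [start]
    while stack:
        u = stack.pop()                       # most recently generated node first
        if u in visited:
            continue
        visited.add(u)
        order.append(u)
        for v in reversed(graph.get(u, [])):  # reversed so the leftmost child is popped first
            if v not in visited:
                stack.append(v)
    return order

def depth_limited(graph, node, goal, limit):
    """Tree-style DFS to a fixed depth; remembers only the current path (no visited set)."""
    if node == goal:
        return [node]
    if limit == 0:
        return None
    for child in graph.get(node, []):
        path = depth_limited(graph, child, goal, limit - 1)
        if path is not None:
            return [node] + path
    return None

def iterative_deepening(graph, start, goal, max_depth=50):
    """Repeat depth-limited DFS with limits 0, 1, 2, ... until the goal is found."""
    for limit in range(max_depth + 1):
        path = depth_limited(graph, start, goal, limit)
        if path is not None:
            return path
    return None

tree = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F"]}
```

Iterative deepening re-explores the shallow levels on every pass. With branching factor greater than one, the deepest level dominates the total cost, which is why the note calls the overhead a constant factor.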
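The dynamic load-balancing scheme in the parallel DFS notes can be illustrated with a round-based, single-process simulation. It models local stacks, donors and recipients, reject replies, and a fixed number of expansions per turn. This is a sketch of the idea, not a real parallel program, and the names and splitting rule are assumptions:

- Processors take turns in a loop.
- A donor with fewer than two untried nodes rejects a request.
- Otherwise the donor gives away the bottom half of its stack (the nodes nearest the root).
- Because the simulation sees every stack, termination detection reduces to checking that all stacks are empty.

```python
def simulate_parallel_dfs(tree, root, goal, p, nodes_per_turn=2):
    """Round-based simulation of parallel DFS with dynamic load balancing.

    tree maps each node to its list of children (a finite tree is assumed).
    Returns (goal_found, expansions_per_processor).
    """
    stacks = [[] for _ in range(p)]  # one local DFS stack per simulated processor
    stacks[0].append(root)           # processor 0 starts with the whole search space
    expanded = [0] * p
    while any(stacks):               # all stacks empty: every processor is idle
        for i in range(p):
            if stacks[i]:
                # Active: expand a fixed number of nodes depth-first.
                for _ in range(nodes_per_turn):
                    if not stacks[i]:
                        break
                    node = stacks[i].pop()
                    expanded[i] += 1
                    if node == goal:
                        return True, expanded  # stands in for the "stop" broadcast
                    stacks[i].extend(reversed(tree.get(node, [])))
            else:
                # Idle: send work requests to the other processors in turn.
                for k in range(1, p):
                    donor = stacks[(i + k) % p]
                    if len(donor) < 2:
                        continue               # reject: donor has too little work
                    half = len(donor) // 2
                    stacks[i] = donor[:half]   # bottom half: nodes nearest the root
                    del donor[:half]
                    break
    return False, expanded

# Hypothetical complete binary tree with 31 nodes, numbered heap-style.
binary_tree = {n: [c for c in (2 * n + 1, 2 * n + 2) if c < 31] for n in range(31)}
```

With no goal in the tree, every node is expanded exactly once. Counting expansions per processor shows that work reaches all of them, even though only processor 0 starts with any.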
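The centralized strategy from the parallel best-first search note can be simulated sequentially. Each step removes the p best nodes from one global Open list and counts all of them as expanded. The example graph and l-values are hypothetical, chosen so that the best node after the root is already the solution. With p = 4 the simulation then performs p - 1 extra expansions, the bounded extra work the note mentions.

```python
import heapq

def centralized_best_first(start, goal, successors, lvalue, p):
    """Simulate p processors sharing one global Open list.

    Each step removes the p best nodes (lowest l-value) and counts them all as
    expanded, even if one of them is the goal. Returns (found, expansions).
    """
    open_list = [(lvalue(start), start)]
    seen = {start}
    expansions = 0
    while open_list:
        batch = [heapq.heappop(open_list) for _ in range(min(p, len(open_list)))]
        expansions += len(batch)           # every busy processor expands one node
        if any(x == goal for _, x in batch):
            return True, expansions
        for _, x in batch:
            for y in successors(x):
                if y not in seen:
                    seen.add(y)
                    heapq.heappush(open_list, (lvalue(y), y))
    return False, expansions

# Hypothetical search graph in which the best child of S is the solution.
children = {"S": ["a", "b", "c", "d"], "a": [], "b": [], "c": [], "d": []}
L = {"S": 0, "a": 1, "b": 2, "c": 3, "d": 4}
sequential = centralized_best_first("S", "a", children.get, L.get, 1)
parallel = centralized_best_first("S", "a", children.get, L.get, 4)
```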

    1. CSE503: Parallel Computing
       Prepared by: Eng. Hossam Fadeel
       Supervisor: Dr. Ahmed El-Mahdy
       12/22/2012
    3. Outlines
       • Motivation
       • Common Problems That Need Computers
       • Basic Search Algorithms
       • Algorithms used for searching the contents of an array
         – Linear or Sequential Search
         – Binary Search
         – Comparison Between Linear and Binary Search
       • Algorithms for solving shortest path problems
         – Introduction to Graphs
         – Sequential Search Algorithms
           • Depth-First Search
           • Breadth First Search
         – Parallel or distributed Search Algorithms
           • Parallel Searching Algorithms
           • Parallel Depth-First Search
           • Parallel Breadth First Search
    4. Motivation
       • You want to go from Point A to Point B.
         – You need to employ some kind of search.
         – Example: for a direction finder, going from Point A to Point B literally means finding a path between where you are now and your intended destination.
         – Going from Point A to Point B is different for every situation.
    5. Common Problems That Need Computers
       • There are some very common problems that we use computers to solve:
         – Searching through a lot of records for a specific record or set of records, or for a solution to an unsolved problem
         – Sorting, or placing records in a desired order
       • There are numerous algorithms to perform searches and sorts.
    6. Basic Search Algorithms
       • Algorithms used for searching the contents of an array, like:
         – Linear or Sequential Search
         – Binary Search
       • Algorithms used for solving shortest path problems, like:
         – Breadth-First Search
         – Depth-First Search
    7. ALGORITHMS USED FOR SEARCHING THE CONTENTS OF AN ARRAY
    8. Linear or Sequential Search and Binary Search
       Linear or Sequential Search
       • This is a very simple algorithm.
       • It uses a loop to sequentially step through an array, starting with the first element.
       • It compares each element with the value being searched for (key) and stops when that value is found or the end of the array is reached.
       Binary Search
       • Requires array elements to be in order.
         1. Divide the array into three sections:
            – the middle element
            – the elements on one side of the middle element
            – the elements on the other side of the middle element
         2. If the middle element is the correct value, done. Otherwise, go to step 1, using only the half of the array that may contain the correct value.
         3. Continue steps 1 and 2 until either the value is found or there are no more elements to examine.
    9. Linear or Sequential Search
       • Start at the beginning and examine each element in turn.
       • It takes linear time in both the worst and average cases.
       Let's search for "2":
         value:  6 13 14 25 33 43 51 53 64 72 84 93 95 96 97
         index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
       Start here (index 0). Result: "Nothing Found"
    10. Let's search for "72" in the same array: start here (index 0). Search success: "Found 72" (index 9).
    11. Binary Search
        • Binary search. Given value and sorted array a[], find index i such that a[i] = value, or report that no such index exists.
        • Invariant. Algorithm maintains a[lo] ≤ value ≤ a[hi].
        • Ex. Binary search for 33.
            value:  6 13 14 25 33 43 51 53 64 72 84 93 95 96 97
            index:  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14
          lo = 0, hi = 14
    12. lo = 0, mid = 7, hi = 14 (a[7] = 53 > 33, so discard the right half)
    13. Binary Search Example: lo = 0, hi = 6
    14. lo = 0, mid = 3, hi = 6 (a[3] = 25 < 33, so discard the left half)
    15. lo = 4, hi = 6
    16. lo = 4, mid = 5, hi = 6 (a[5] = 43 > 33, so discard the right half)
    17. lo = 4, hi = 4
    18. lo = hi = mid = 4
    19. lo = hi = mid = 4: a[4] = 33, found.
    20. Binary Search
    • Note that the search pool must be sorted.
    • For large n, binary search is much faster than linear search, because each comparison halves the remaining search interval.
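The procedure traced on slides 13 to 19 can be sketched in Python as follows. This is a minimal illustrative version, not the deck's own code; the function name `binary_search` and the convention of returning -1 to "report that no such index exists" are assumptions.

```python
def binary_search(a, value):
    """Return an index i with a[i] == value, or -1 if value is absent.

    Assumes a is sorted in ascending order. If value is present, it always
    lies somewhere in a[lo..hi] (the invariant from slide 13).
    """
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if value < a[mid]:
            hi = mid - 1      # value can only be in the left half
        elif value > a[mid]:
            lo = mid + 1      # value can only be in the right half
        else:
            return mid        # a[mid] == value: found
    return -1                 # interval is empty: no such index exists


a = [6, 13, 14, 25, 33, 43, 51, 53, 64, 72, 84, 93, 95, 96, 97]
print(binary_search(a, 33))   # prints 4
```

On the slide's array it probes indices 7, 3, 5, and 4, exactly as in the trace above.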
    21. Comparison Between Linear and Binary Search
    • Linear search, O(n):
      – Easy to implement: basically just a for loop.
      – Has O(n) complexity for n stored items.
      – More efficient than binary search for small n.
    • Binary search, O(log n):
      – Harder to implement, but more efficient.
      – Requires the data to be sorted.
      – Has O(log n) complexity for n stored items.
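To make the O(n) versus O(log n) gap concrete, the hypothetical helpers below count how many array elements each method examines when the value is absent and larger than every element, which is a worst case for both. This is a sketch that counts probes, not a wall-clock benchmark.

```python
def linear_search_probes(a, value):
    """Count the elements a left-to-right linear search examines."""
    for i, x in enumerate(a):
        if x == value:
            return i + 1          # found after examining i + 1 elements
    return len(a)                 # absent: every element was examined


def binary_search_probes(a, value):
    """Count the middle elements binary search examines in sorted list a."""
    lo, hi, probes = 0, len(a) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        probes += 1
        if value < a[mid]:
            hi = mid - 1
        elif value > a[mid]:
            lo = mid + 1
        else:
            break                 # found
    return probes


for n in (15, 1_000, 1_000_000):
    a = list(range(n))            # sorted data 0, 1, ..., n - 1
    absent = n                    # larger than every element: worst case
    print(n, linear_search_probes(a, absent), binary_search_probes(a, absent))
```

It prints 15 15 4, then 1000 1000 10, then 1000000 1000000 20: linear search grows with n, while binary search grows with roughly log2(n).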
    22. ALGORITHMS FOR SOLVING SHORTEST PATH PROBLEMS
    23. INTRODUCTION TO GRAPHS
    24. Graph traversal
    • Graph traversal is the problem of visiting all the nodes in a graph in a particular order, checking and/or updating their values along the way.
      – Tree traversal is a special case of graph traversal.
    • Representing a problem as a graph can make it much simpler to solve.
      – More accurately, it can provide the appropriate tools for solving the problem.
    • Examples of applications that involve graph processing:
      – Maps: the shortest route between two points.
      – Web content: links between pages.
      – Computer networks: the nature of the interconnection structure.
    25. 25. Basics of Graphs Graph – mathematical object consisting of a set of:  V = nodes (vertices, points).  E = edges (links, arcs) between pairs of nodes.  Denoted by G = (V, E).  Captures pairwise relationship between objects.  Graph size parameters: n = |V|, m = |E|. V = { 1, 2, 3, 4, 5, 6, 7, 8 } E = { {1,2}, {1,3}, {2,3}, {2,4}, {2,5}, {3,5}, {3,7}, {3,8}, {4,5}, {5,6} } n=8 m = 11 cycle C = 1-2-4-5-3-1 25
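    26. Basics of Graphs
    [Figure: an undirected graph and a directed graph G = (V, E), with a loop, multiple edges, an isolated vertex, and a pair of adjacent vertices labeled.]
    • Simple graph: an undirected graph without loops or multiple edges.
    • Degree of a vertex: the number of edges connected to it. In a directed graph, each vertex has an indegree (edges coming in) and an outdegree (edges going out).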
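Degrees follow directly from an edge list, as in this sketch. The function names are hypothetical, and counting a loop (v, v) twice toward deg(v) is the usual convention rather than something the slide states; with it, the degrees of an undirected graph always sum to 2m.

```python
from collections import Counter


def undirected_degrees(edges):
    """Degree of each vertex of an undirected graph given as (u, v) pairs.
    A loop (v, v) adds 2 to deg(v) under the usual convention."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def directed_degrees(arcs):
    """Return (indegree, outdegree) counters for (tail, head) arcs."""
    indeg, outdeg = Counter(), Counter()
    for u, v in arcs:
        outdeg[u] += 1    # arc leaves u
        indeg[v] += 1     # arc enters v
    return indeg, outdeg


# Edge list of the graph on slide 25
E = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (3, 7), (3, 8),
     (4, 5), (5, 6), (7, 8)]
deg = undirected_degrees(E)
print(deg[3], sum(deg.values()))   # prints 5 22
```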
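    27. Basics of Graphs
    [Figure: an example graph on the vertices a, b, c, d, e.]
    • Walk from x to y: a sequence of vertices joined by edges, with no restriction (e.g., a-b-d-a-b-c).
    • Trail: a walk in which no edge is repeated (e.g., a-b-c-d-e-b-d).
    • Path: a walk in which no vertex is repeated (e.g., a-b-c-d-e).
    • Length: the number of edges in the walk.
    • A walk, trail, or path is closed if x = y.
      – Closed trail: a circuit (e.g., a-b-c-d-b-e-d-a, which can be drawn without lifting the pen).
      – Closed path: a cycle (e.g., a-b-c-d-a).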
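The definitions can be checked mechanically. In this sketch the edge set is an assumption: the slide's figure is lost, so the edges a-b, b-c, c-d, d-e, e-b, b-d, d-a are inferred from its examples. The function `classify` is hypothetical and returns the most specific label that applies.

```python
# Edge set inferred from the slide's examples (assumption: figure not available)
EDGES = {frozenset(e) for e in ["ab", "bc", "cd", "de", "eb", "bd", "da"]}


def classify(seq, edges):
    """Label a vertex sequence as 'path', 'trail', 'walk', 'cycle',
    'circuit', or 'closed walk'; return None if it is not a walk."""
    steps = [frozenset(p) for p in zip(seq, seq[1:])]
    if not steps or any(s not in edges for s in steps):
        return None                          # some consecutive pair is not an edge
    closed = seq[0] == seq[-1]
    distinct_edges = len(set(steps)) == len(steps)
    body = seq[:-1] if closed else seq       # a closed walk may repeat its start
    distinct_vertices = len(set(body)) == len(body)
    if closed:
        if distinct_vertices and distinct_edges and len(steps) >= 3:
            return "cycle"                   # closed path
        return "circuit" if distinct_edges else "closed walk"
    if distinct_vertices:
        return "path"                        # implies no repeated edge either
    return "trail" if distinct_edges else "walk"


for s in ["abcde", "abcdebd", "abdabc", "abcdbeda", "abcda"]:
    print(s, classify(s, EDGES))
```

Each of the slide's five examples receives the label the slide gives it.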
    28. Basics of Graphs
    • An undirected graph is a tree if it is connected and does not contain a cycle.
    • Rooted tree: given a tree T, choose a root node r and orient each edge away from r.
    • Importance: rooted trees model hierarchical structure.
    [Figure: a tree, and the same tree rooted at node 1, labeling the root r, an inner node v, the parent of v, a child of v, and the leaves.]
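Orienting every edge away from a chosen root amounts to recording each node's parent during a traversal that starts at the root. A minimal sketch, assuming the input edges really form a tree; the example tree `T` and the name `root_tree` are hypothetical, since the slide's figure is not available.

```python
from collections import defaultdict


def root_tree(edges, r):
    """Orient the edges of an undirected tree away from root r.
    Returns parent[v] for every vertex (parent[r] is None)."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    parent = {r: None}
    stack = [r]
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in parent:       # edge u-w now points from parent u to child w
                parent[w] = u
                stack.append(w)
    return parent


T = [(1, 2), (1, 3), (2, 4), (2, 5), (3, 6)]   # hypothetical example tree
parent = root_tree(T, 1)
leaves = set(parent) - {p for p in parent.values() if p is not None}
print(parent)
print(sorted(leaves))                          # prints [4, 5, 6]
```

Choosing a different root gives a different orientation of the same tree, for example `root_tree(T, 4)` makes 2 the parent of 1.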
    29. Graph Representation: Adjacency Matrix
    • Adjacency matrix: an n-by-n matrix with A[u][v] = 1 if (u, v) is an edge.
    • Two representations of each edge (the matrix is symmetric for undirected graphs, but not in general for directed graphs).
    • Space: proportional to n².
    • Not efficient for sparse graphs (graphs with few edges compared to the maximum possible number of edges).
    • Checking whether (u, v) is an edge takes Θ(1) time.
    • Identifying all edges takes Θ(n²) time.
    Adjacency matrix of the graph on slide 25:
         1 2 3 4 5 6 7 8
      1  0 1 1 0 0 0 0 0
      2  1 0 1 1 1 0 0 0
      3  1 1 0 0 1 0 1 1
      4  0 1 0 0 1 0 0 0
      5  0 1 1 1 0 1 0 0
      6  0 0 0 0 1 0 0 0
      7  0 0 1 0 0 0 0 1
      8  0 0 1 0 0 0 1 0
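Building the matrix from an edge list shows where the "two representations of each edge" come from. This is a sketch using plain nested lists; the helper name is hypothetical, and vertex i is stored at index i - 1 because Python lists are 0-based.

```python
def adjacency_matrix(n, edges):
    """Return an n-by-n 0/1 matrix for an undirected graph on vertices 1..n.
    Row and column i - 1 stand for vertex i."""
    A = [[0] * n for _ in range(n)]       # Θ(n²) space regardless of m
    for u, v in edges:
        A[u - 1][v - 1] = 1
        A[v - 1][u - 1] = 1               # second representation of the same edge
    return A


E = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (3, 7), (3, 8),
     (4, 5), (5, 6), (7, 8)]
A = adjacency_matrix(8, E)
print(A[3])          # row of vertex 4: [0, 1, 0, 0, 1, 0, 0, 0]
print(A[1][3] == 1)  # is {2, 4} an edge? a single lookup: True
```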
    30. Graph Representation: Adjacency List
    • Adjacency list: a node-indexed array of lists, with n = |V| and m = |E|.
    • Two representations of each edge; the list for u has deg(u) entries, where deg(u) is the number of neighbors of u.
    • Space: proportional to m + n, which makes it a good choice for sparse graphs.
    • Checking whether (u, v) is an edge takes O(deg(u)) time.
    • Identifying all edges takes Θ(m + n) time, i.e., linear time in the size of G = (V, E).
    Adjacency list of the graph on slide 25:
      1: 2 3
      2: 1 3 4 5
      3: 1 2 5 7 8
      4: 2 5
      5: 2 3 4 6
      6: 5
      7: 3 8
      8: 3 7
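The same graph as an adjacency list, sketched with a Python dict of lists standing in for the "node-indexed array of lists". The helper names are hypothetical; the lists are sorted only so the output matches the listing above.

```python
def adjacency_list(n, edges):
    """Map each vertex 1..n to the list of its neighbors (undirected graph)."""
    adj = {v: [] for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)      # each edge appears in two lists
    for v in adj:
        adj[v].sort()         # only to match the slide's listing
    return adj


def has_edge(adj, u, v):
    """Scan u's neighbor list: O(deg(u)) time."""
    return v in adj[u]


E = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (3, 7), (3, 8),
     (4, 5), (5, 6), (7, 8)]
adj = adjacency_list(8, E)
print(adj[3])                 # prints [1, 2, 5, 7, 8]
print(has_edge(adj, 7, 8))    # prints True
```

Total storage is n lists holding 2m entries, which is where the m + n space bound comes from.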
    31. 31. Graph Traversing  Given a graph G(V,E), explore every vertex and every edge  Using adjacency list is more efficient  Example algorithms: Depth-first search (DFS) Breadth-first search (BFS) 31
    32. SEQUENTIAL SEARCH ALGORITHMS
    33. DFS (Depth First Search) and BFS (Breadth First Search)
    • DFS and BFS are search algorithms used for graphs and trees.
    • When given an ordered tree or graph, such as a binary search tree (BST), it is quite easy to search the data structure for the node you want.
    • When given an unordered tree or graph, BFS and DFS can be used to find the node you are looking for.
      – The choice between them should be based on the type of data you are dealing with.
    34. BFS (Breadth First Search)
    • In a breadth first search:
      – Start at the root node, then scan each node in the first level, starting from the leftmost node and moving to the right.
      – Then continue with the second level (again from the left), the third level, and so on, until all the nodes have been scanned or the node being searched for is found.
      – While traversing one level, we need some way of knowing which nodes to traverse once we reach the next level.
      – This is done by storing pointers to a level's child nodes while searching that level.
      – The pointers are stored in a FIFO (first-in-first-out) queue. As a result, BFS can use a large amount of memory, because the pointers for a whole level must be stored at once.
    35. BFS (Breadth First Search)
    • An example of BFS: the numbers represent the order in which the nodes are accessed.
    [Figures: Example 1 and Example 2, two trees whose nodes are numbered in BFS order.]
    36. BFS (Breadth First Search)
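The queue-based procedure from slide 34 can be sketched on the adjacency list of slide 30. This is a minimal illustration, not the deck's code: `bfs` is a hypothetical name, neighbors are taken in list order as the "left to right" order, and a `visited` set is added because a general graph, unlike a tree, can reach a node more than once.

```python
from collections import deque

ADJ = {1: [2, 3], 2: [1, 3, 4, 5], 3: [1, 2, 5, 7, 8], 4: [2, 5],
       5: [2, 3, 4, 6], 6: [5], 7: [3, 8], 8: [3, 7]}   # graph from slide 30


def bfs(adj, start, goal=None):
    """Visit vertices level by level from start; return the visit order,
    stopping early if goal is reached."""
    order = []
    visited = {start}
    queue = deque([start])       # FIFO queue of discovered, unexpanded vertices
    while queue:
        u = queue.popleft()      # oldest discovered vertex first
        order.append(u)
        if u == goal:
            break
        for w in adj[u]:
            if w not in visited:
                visited.add(w)
                queue.append(w)  # remembered for the next level
    return order


print(bfs(ADJ, 1))   # prints [1, 2, 3, 4, 5, 7, 8, 6]
```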
    37. DFS (Depth First Search)
    • In a depth first search:
      – Start at the root and follow one of the branches of the tree as far as possible, until either the node being searched for is found or a leaf node (a node with no children) is reached.
      – If a leaf node is reached, continue the search at the nearest ancestor that still has unexplored children.
    • A major advantage of DFS is that its storage requirement is linear in the depth of the state space being searched.
    38. DFS (Depth First Search)
    • An example of DFS: the numbers represent the order in which the nodes are accessed.
    [Figures: Example 1 and Example 2, two trees whose nodes are numbered in DFS order.]
    39. DFS (Depth First Search)
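A recursive sketch of the same idea on the slide-30 graph: go as deep as possible, and when a vertex has no unvisited neighbors, return to the nearest ancestor that still has some. The name `dfs` is hypothetical. The recursion stack only holds the current branch, which is the "linear in the depth" storage mentioned above; for very deep graphs an explicit stack avoids Python's recursion limit.

```python
ADJ = {1: [2, 3], 2: [1, 3, 4, 5], 3: [1, 2, 5, 7, 8], 4: [2, 5],
       5: [2, 3, 4, 6], 6: [5], 7: [3, 8], 8: [3, 7]}   # graph from slide 30


def dfs(adj, start, goal=None):
    """Return the order in which DFS visits vertices, stopping at goal."""
    order, visited = [], set()

    def visit(u):
        visited.add(u)
        order.append(u)
        if u == goal:
            return True
        for w in adj[u]:
            if w not in visited and visit(w):   # follow the branch as far as possible
                return True                     # goal found deeper: stop everything
        return False                            # dead end: back up to the ancestor

    visit(start)
    return order


print(dfs(ADJ, 1))   # prints [1, 2, 3, 5, 4, 6, 7, 8]
```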
    40. Differences between DFS and BFS
    • Comparing BFS and DFS, the big advantage of DFS is that it has much lower memory requirements than BFS, because it does not need to store all of the child pointers at each level.
    • Depending on the data and the type of problem being solved, either DFS or BFS can be advantageous.
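The memory difference can be measured by tracking the largest queue (BFS) or stack (DFS) during a full traversal of a complete binary tree. This sketch assumes an implicitly stored tree in which node i has children 2i + 1 and 2i + 2; since it is a tree, no visited set is needed. The function names are hypothetical.

```python
from collections import deque


def max_frontier_bfs(num_nodes):
    """Largest queue length while BFS traverses the implicit binary tree."""
    queue, peak = deque([0]), 1
    while queue:
        u = queue.popleft()
        for c in (2 * u + 1, 2 * u + 2):
            if c < num_nodes:
                queue.append(c)
        peak = max(peak, len(queue))
    return peak


def max_frontier_dfs(num_nodes):
    """Largest stack length while DFS traverses the same tree."""
    stack, peak = [0], 1
    while stack:
        u = stack.pop()
        for c in (2 * u + 2, 2 * u + 1):   # push right child first: left explored first
            if c < num_nodes:
                stack.append(c)
        peak = max(peak, len(stack))
    return peak


for depth in (5, 10, 15):
    n = 2 ** (depth + 1) - 1               # complete tree with depth levels below the root
    print(depth, max_frontier_bfs(n), max_frontier_dfs(n))
```

BFS peaks at 2^depth stored nodes (an entire leaf level), while DFS peaks at depth + 1, so it prints 5 32 6, then 10 1024 11, then 15 32768 16.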
    41. PARALLEL OR DISTRIBUTED SEARCH ALGORITHMS
    42. Parallel or Distributed Search Algorithms
    • Modern computers exploit parallelism at the hardware level.
    • Parallel algorithms solve algorithmic problems using many processing devices (processes, processors, processor cores, nodes, units) simultaneously.
      – It is technically easier to build a system of several communicating slower processors than a single processor that is many times faster.
      – Parallel algorithms often require original, novel approaches that are radically different from those used to solve the same problems sequentially.
      – Design concerns include compatibility with the machine architecture, the choice of suitable shared data structures, and the trade-off between processing and communication overhead.
    • In parallel search, more than one search process is invoked, and the workload is partitioned among the processes.
    • The most important problem is to minimize the communication overhead between the search processes.
    43. Parallel Searching Algorithms (from "Parallel State Space Searching Algorithms" by Hart Lambur and Blake Shaw, Professor Keyes, APAM 4990)
    47. Parallel Depth-First Search
    • The critical issue in parallel depth-first search algorithms is the distribution of the search space among the processors. Because tree search is unstructured, a static partitioning of the search space leads to load imbalance, so work must be redistributed dynamically.
    48. Parallel Depth-First Search
    • A parallel formulation of DFS based on dynamic load balancing is as follows:
      – Each processor performs DFS on a disjoint part of the search space.
      – When a processor finishes searching its part of the search space, it requests an unsearched part from other processors.
      – Whenever a processor finds a goal node, all the processors terminate.
      – If the search space is finite and the problem has no solutions, then all the processors eventually run out of work, and the algorithm terminates.
    49. Parallel Depth-First Search
    [Figure: a generic scheme for dynamic load balancing.]
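The dynamic load-balancing scheme of slide 48 can be illustrated with a sequential, round-based simulation rather than real processes. Several details here are assumptions, not taken from the slides: idle processors ask donors in round-robin order, a donor gives away the bottom half of its stack (the nodes nearest the root, a common choice because they tend to root larger subtrees), and processors act one after another within a round. All names are hypothetical.

```python
def parallel_dfs_sim(children, root, is_goal, p=4):
    """Simulate p processors doing DFS with dynamic load balancing.
    Returns (goal or None, rounds, nodes expanded per processor)."""
    stacks = [[] for _ in range(p)]          # one local DFS stack per processor
    stacks[0].append(root)                   # initially all work is on processor 0
    expanded = [0] * p
    donor = [(i + 1) % p for i in range(p)]  # whom each idle processor asks next
    rounds = 0
    while any(stacks):                       # all stacks empty: everyone is out of work
        rounds += 1
        for i in range(p):
            if stacks[i]:
                node = stacks[i].pop()       # ordinary DFS step on local work
                expanded[i] += 1
                if is_goal(node):
                    return node, rounds, expanded   # a goal stops all processors
                stacks[i].extend(reversed(children(node)))
            else:
                d = donor[i]                 # idle: request work from processor d
                nxt = (d + 1) % p
                donor[i] = nxt if nxt != i else (d + 2) % p   # never ask yourself
                if d != i and len(stacks[d]) >= 2:
                    k = len(stacks[d]) // 2
                    stacks[i] = stacks[d][:k]   # take the bottom half of d's stack
                    del stacks[d][:k]           # d keeps the rest
    return None, rounds, expanded


N = 2 ** 12 - 1                              # complete binary tree, node i -> 2i+1, 2i+2
children = lambda u: [c for c in (2 * u + 1, 2 * u + 2) if c < N]
goal, rounds, expanded = parallel_dfs_sim(children, 0, lambda u: False)
print(goal, rounds, expanded)
```

With no goal present, every node is expanded exactly once across the processors, and the work spreads beyond processor 0 within a few rounds.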
    50. Parallel Best-First Search
    • In most parallel formulations of best-first search, different processors concurrently expand different nodes from the open list.
      – The formulations differ in the data structures they use to implement the open list.
      – Given p processors, the simplest strategy assigns each processor to work on one of the current best nodes on the open list.
        • This is called the centralized strategy, because each processor gets work from a single global open list.
    51. Parallel Best-First Search
    [Figure: a general schematic for parallel best-first search using a centralized strategy.]
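The centralized strategy can be sketched as a synchronous simulation: in each round, p simulated processors each take one of the current best nodes from a single global open list (a heap) and expand it. Assumptions not stated in the slides: the cost function is f = g (uniform-cost, no heuristic), edge costs are nonnegative, and the search stops only once no open node could lead to a cheaper goal, because nodes expanded in the same round may improve on each other's paths. The graph `G` and the function name are hypothetical.

```python
import heapq


def centralized_best_first(graph, start, goal, p=4):
    """Return (cheapest cost to goal, rounds). graph maps a node to a list
    of (neighbor, nonnegative cost) pairs."""
    open_list = [(0, start)]             # the single global open list (min-heap)
    best_g = {start: 0}
    best_goal = float("inf")
    rounds = 0
    while open_list and open_list[0][0] < best_goal:
        rounds += 1
        batch = []
        while open_list and len(batch) < p:    # one best node per processor
            g, u = heapq.heappop(open_list)
            if g == best_g.get(u):             # skip outdated heap entries
                batch.append((g, u))
        for g, u in batch:                     # conceptually expanded in parallel
            if u == goal:
                best_goal = min(best_goal, g)
                continue
            for v, w in graph.get(u, []):
                if g + w < best_g.get(v, float("inf")):
                    best_g[v] = g + w
                    heapq.heappush(open_list, (g + w, v))
    return best_goal, rounds


G = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("C", 5)],
     "B": [("C", 1)], "C": [("T", 3)]}
for p in (1, 2, 4):
    print(p, centralized_best_first(G, "S", "T", p))
```

Every p finds cost 7 (S-A-B-C-T). With p = 4 some nodes are expanded before their cheapest path is known and must be expanded again, a small example of the extra work that parallel best-first search can incur.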
    52. A Parallel Formulation of Best-First Search
    • We parallelize best-first search by sharing the work to be done among a number of processors.
    • Each processor searches a disjoint part of the search space in a best-first fashion.
    • When a processor has finished searching its part of the search space, it tries to get an unsearched part of the search space from the other processors.
    • When a goal node is found, all of the processors quit.
    • If the search space is finite and has no solutions, then eventually all the processors run out of work, and the search terminates.
    53. Parallel Breadth First Search
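Since this slide's content is not available, here is one common way to parallelize breadth-first search, named plainly as a swapped-in technique: level-synchronous BFS. Each level's frontier is split among p workers that list the neighbors of their share concurrently, and the results are merged before the next level starts. This is a structural sketch only. The name `parallel_bfs_levels` is hypothetical, and Python threads do not speed up CPU-bound work because of the global interpreter lock.

```python
from concurrent.futures import ThreadPoolExecutor

ADJ = {1: [2, 3], 2: [1, 3, 4, 5], 3: [1, 2, 5, 7, 8], 4: [2, 5],
       5: [2, 3, 4, 6], 6: [5], 7: [3, 8], 8: [3, 7]}   # graph from slide 30


def parallel_bfs_levels(adj, start, p=4):
    """Return a dict mapping each reachable vertex to its BFS level."""
    level = {start: 0}
    frontier = [start]
    depth = 0
    with ThreadPoolExecutor(max_workers=p) as pool:
        while frontier:
            chunks = [frontier[i::p] for i in range(p)]   # split the frontier p ways
            # each worker lists the neighbors of its chunk (no shared writes)
            results = pool.map(lambda chunk: [w for u in chunk for w in adj[u]],
                               chunks)
            depth += 1
            next_frontier = []
            for neighbors in results:                     # sequential merge step
                for w in neighbors:
                    if w not in level:
                        level[w] = depth
                        next_frontier.append(w)
            frontier = next_frontier
    return level


print(parallel_bfs_levels(ADJ, 1))
```

The merge step is what keeps levels consistent: a vertex discovered by two workers in the same round is still assigned once.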
    54. References
    • http://www.programmerinterview.com/index.php/data-structures/dfs-vs-bfs/
    • http://www.kirupa.com/developer/actionscript/depth_breadth_search.htm
    • http://www.cs.duke.edu/~josh/bfsdfs.html
    • http://www.math.brown.edu/~banchoff/STG/ma8/papers/emorgan/algorithms.html
    • Many other references were consulted while preparing this presentation, but most of them were not recorded.