
# Algorithm Design and Complexity - Course 7

Published in: Education, Technology

### Algorithm Design and Complexity - Course 7

1. 1. Algorithm Design and Complexity Course 7
2. 2. Overview
    - Graphs – Introduction
    - Representing Graphs
    - Practical Examples of Using Graphs
    - Search Algorithms
        - BFS
        - DFS
    - Topological Sorting
3. 3. Graphs – Introduction
    - Graphs are very important data structures, as they model a lot of real-life objects
    - There are a lot of problems that must be solved for graphs
        - Some of them are very difficult (NP-complete)
        - Others are in P
    - In this chapter, we shall consider problems that are in P and therefore admit polynomial-time algorithms for finding the exact solution
4. 4. Very Useful in Practice
    - There exist a lot of open-source libraries for graphs:
        - Graphviz - http://www.graphviz.org/
        - Prefuse - http://prefuse.org/
        - Flare - http://flare.prefuse.org/
        - Gephi
    - Useful for:
        - Generating graphs
        - Visualizing graphs
        - Modeling graphs
5. 5. Representing Graphs
    - As input data:
        - Pairs of vertices representing the edges: (Src, Dest)
        - Specialized data formats for representing graphs
            - RDF
            - GraphML
            - dot
    - In memory, for designing algorithms
        - Adjacency lists
        - Adjacency matrix
        - Incidence matrix
6. 6. Dot

        graph G {
            node;
            Dristor2--Muncii--Iancului--Obor;
            Piata_Victoriei1--Gara_de_nord--Crangasi--Grozavesti--Eroilor;
            Pacii--Lujerului--Politehnica--Eroilor;
            Republica--Titan--Dristor1--Timpuri_Noi--Unirii1--Izvor--Eroilor;
            Dristor1--Dristor2;
            Unirii1--Unirii2;
            Piata_Victoriei1--Piata_Victoriei2;
            Piata_Sudului--Eroii_Revolutiei--Tineretului--Unirii2--Romana--Piata_Victoriei2--Aviatorilor--Pipera;
        }
7. 7. GraphML

        <graphml xmlns="http://graphml.graphdrawing.org/xmlns">
          <graph edgedefault="undirected">
            <!-- data schema -->
            <key id="name" for="node" attr.name="name" attr.type="string"/>
            <key id="gender" for="node" attr.name="gender" attr.type="string"/>
            <!-- nodes -->
            <node id="1">
              <data key="name">Jeff</data>
              <data key="gender">M</data>
            </node>
            <node id="2">
              <data key="name">Ed</data>
              <data key="gender">M</data>
            </node>
            <edge source="1" target="2"></edge>
          </graph>
        </graphml>
8. 8. Adjacency Lists
    - An array with |V| elements
        - One entry for each vertex
    - Each component in the array contains the list of neighbors for the index vertex
        - Adj[u], u ∈ V
    - [Figure: example graph on vertices A to L with its adjacency lists]
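As a minimal sketch of the adjacency-list representation, the following Python builds Adj[u] from (Src, Dest) pairs. The function name `build_adjacency_lists`, the `directed` flag and the three sample edges are illustrative assumptions, not part of the course material.

```python
def build_adjacency_lists(edges, directed=False):
    """Return a dict mapping every vertex u to its neighbor list Adj[u]."""
    adj = {}
    for src, dest in edges:
        adj.setdefault(src, []).append(dest)
        adj.setdefault(dest, [])      # make sure dest has an entry even if it is a sink
        if not directed:
            adj[dest].append(src)     # undirected: store the edge in both lists
    return adj


# Hypothetical sample input: three undirected edges.
sample_edges = [("A", "B"), ("A", "G"), ("B", "C")]
sample_adj = build_adjacency_lists(sample_edges)
```

A dictionary of lists is used here instead of a fixed array so that vertices can be arbitrary labels; with vertices numbered 0..n-1, a plain list of lists would match the slide more literally.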
9. 9. Adjacency Matrix
    - A[i, j] = 1 if (i, j) ∈ E
    - A[i, j] = 0 if (i, j) ∉ E
    - [Figure: the example graph on vertices A to L with its adjacency matrix]
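The definition of A[i, j] can be sketched in Python as follows. The helper name `build_adjacency_matrix`, the vertex-to-index mapping and the sample graph are assumptions made for illustration only.

```python
def build_adjacency_matrix(vertices, edges, directed=False):
    """Return (A, index) where A[i][j] == 1 iff there is an edge between vertex i and vertex j."""
    index = {v: i for i, v in enumerate(vertices)}   # vertex label -> row/column number
    n = len(vertices)
    A = [[0] * n for _ in range(n)]
    for src, dest in edges:
        A[index[src]][index[dest]] = 1
        if not directed:
            A[index[dest]][index[src]] = 1           # undirected graphs give a symmetric matrix
    return A, index


# Hypothetical sample: the undirected path A - B - C.
matrix, index = build_adjacency_matrix(["A", "B", "C"], [("A", "B"), ("B", "C")])
has_edge_a_b = matrix[index["A"]][index["B"]] == 1   # constant-time edge lookup
```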
10. 10. Incidence Matrix
    - B[u, e], u ∈ V, e ∈ E
        - = 1 if edge e leaves vertex u
        - = -1 if edge e enters vertex u
        - = 0 otherwise
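A possible Python sketch of the incidence matrix for a directed graph is shown below. It assumes there are no self-loops (a loop would need both 1 and -1 in the same cell); the function name and the sample graph are hypothetical.

```python
def build_incidence_matrix(vertices, edges):
    """Return B with one row per vertex and one column per directed edge."""
    index = {v: i for i, v in enumerate(vertices)}
    B = [[0] * len(edges) for _ in vertices]
    for e, (src, dest) in enumerate(edges):
        B[index[src]][e] = 1      # edge e leaves src
        B[index[dest]][e] = -1    # edge e enters dest
    return B


# Hypothetical sample: edges A->B, B->C, A->C.
incidence = build_incidence_matrix(["A", "B", "C"], [("A", "B"), ("B", "C"), ("A", "C")])
```

Every column contains exactly one 1 and one -1, so each column sums to zero.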
11. 11. Adjacency Matrix vs Adjacency Lists
    - Which one is better? Answer: it depends
    - Notation: |V| = n, |E| = m
    - Graphs may be:
        - Sparse: m = O(n)
        - Dense: m = Θ(n²)
    - Adjacency matrix:
        - Space required: Θ(n²)
        - Time for going through all edges: Θ(n²)
        - Time for finding if an edge exists: Θ(1)
    - Adjacency lists:
        - Space required: Θ(n+m)
        - Time for going through all edges: Θ(n+m)
        - Time for finding if an edge exists: Θ(max(|Adj(u)|))
    - The matrix is better for dense graphs, lists are better for sparse graphs
12. 12. Practical Examples of Using Graphs
    - Maps (roads, etc.), networks (computer, electric, etc.), Web, flow networks (traffic – cars or computer networks, pipes, etc.), relations between processes/activities…
    - Simple examples:
        - The shortest path on Google Maps
        - The people that are most central (or most important) in a social network
        - Google PageRank
13. 13. The Web + PageRank http://en.wikipedia.org/wiki/PageRank
14. 14. Computer Networks http://ist.marshall.edu/ist362/pics/OSPF.gif
15. 15. Social Web
16. 16. Semantic Web  Source: http://www.semanticfocus.com/media/insets/rdf-graph.png
17. 17. Linked Open Data on the Web (http://richard.cyganiak.de/2007/10/lod/)
18. 18. Graph Search Algorithms
    - Graph search (traversal)
        - A methodology to go through all the nodes in a graph
        - The result is a list of nodes in the order that they are visited
        - This list should be useful to us!
            - Should contain important information about the graph
    - Graph searches are very simple algorithms, but have quite a few applications
        - Lee’s algorithm for BFS
        - Topological sorting, SCC, articulation points for DFS
19. 19. Data & Notations
    - G = (V, E) – the graph (either directed or undirected)
        - V – set of vertices (|V| = n)
        - E – set of edges (|E| = m)
    - The edges are not weighted
        - May also consider them to have equal weights: 1
    - (u,v) – an edge from node u to node v
    - u..v – path from u to v; we can also denote intermediate nodes: u..x..v, u..y..v
    - R(u) – reachable(u) = the set of nodes that can be reached by a path starting from node u
    - Adj(u) – nodes that are adjacent to node u
20. 20. Data & Notations (2)
    - color(u) – color of node u – encodes the state of node u during the search:
        - White – undiscovered by the search;
        - Grey – discovered, but not finished;
        - Black – finished (any finished node is also discovered)
            - We have discovered all nodes in Adj(u) – for BFS
            - We have discovered and finished all nodes in R(u) – for DFS
    - p(u) – parent of u – encodes the previous node on the path used to discover node u in the search
    - Other notations exist, but are specific to BFS or DFS
21. 21. Breadth First Search (BFS)
    - Uses a start vertex for the traversal: s
    - Objective: determine the minimum number of edges (shortest path considering that all the edges have the same weight = 1) between the source s and all the other vertices of the graph
    - We also traverse the vertices in the order of their distance from the source vertex
    - δ(s,u) – the cost of an optimal path s..u; δ(s,u) = ∞ <=> u ∉ R(s)
    - d[u] = d(s,u) – the cost of the currently discovered path s..u
22. 22. BFS
    - For each node u, we store:
        - d[u] = d(s,u) – current distance from node s to node u
        - p[u]
        - color[u]
    - We also use a queue for storing the discovered nodes that are not yet finished
    - The predecessors form a tree called a BFS tree
        - The edges are (p[u], u), where p[u] != NULL
        - The source s is the root
        - The level of each node in the BFS tree is d[u]
23. 23. BFS – Algorithm

        BFS(G, s)
            FOREACH (u ∈ V)
                p[u] = NULL; d[u] = INF; color[u] = WHITE;   // initialization
            Q = ∅                                  // the queue
            d[s] = 0; color[s] = GREY
            Q.push(s)
            WHILE (Q.length() > 0)                 // while we still have discovered nodes
                u = Q.pop()
                FOREACH (v ∈ Adj(u))               // for all neighbors of u
                    IF (color[v] == WHITE)         // if the node is undiscovered
                        d[v] = d[u] + 1
                        p[v] = u
                        color[v] = GREY
                        Q.push(v)
                color[u] = BLACK                   // the current node is finished
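The BFS pseudocode could be translated to Python roughly as below. This is a sketch under stated assumptions: the graph is given as a dict of adjacency lists, `collections.deque` plays the role of the queue Q, and the sample graph `bfs_graph` is hypothetical (it is not the graph from the slides).

```python
from collections import deque
import math


def bfs(adj, s):
    """Return (d, p): distances in edges from s and BFS-tree parents."""
    d = {u: math.inf for u in adj}          # INF for undiscovered nodes
    p = {u: None for u in adj}
    color = {u: "WHITE" for u in adj}
    d[s] = 0
    color[s] = "GREY"
    queue = deque([s])
    while queue:                            # while there are discovered, unfinished nodes
        u = queue.popleft()                 # FIFO order gives the breadth-first behavior
        for v in adj[u]:
            if color[v] == "WHITE":         # v is undiscovered
                d[v] = d[u] + 1
                p[v] = u
                color[v] = "GREY"
                queue.append(v)
        color[u] = "BLACK"                  # u is finished
    return d, p


# Hypothetical undirected sample graph; K is isolated, so it is unreachable from A.
bfs_graph = {
    "A": ["B", "G"], "B": ["A", "C"], "G": ["A", "H"],
    "C": ["B", "D"], "H": ["G"], "D": ["C"], "K": [],
}
bfs_d, bfs_p = bfs(bfs_graph, "A")
```

Using `popleft` and `append` keeps each queue operation constant-time, which is what the Θ(n+m) bound for adjacency lists relies on.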
24. 24. Complexity
    - Depends on how the graph is stored
        - It influences how we iterate through Adj(u)
        - Θ(n+m) for adjacency lists
        - Θ(n²) for adjacency matrix

            WHILE (Q.length() > 0)         // at most n times – once for each node
                u = Q.pop()
                FOREACH (v ∈ Adj(u))       // n+m times in total for adjacency lists
                    // do something

    - It makes sense to use adjacency lists for BFS
25. 25. Example I
    - Source = A
    - [Figure: step-by-step BFS run on the example graph. Annotations recoverable from the figure:]
        - d(A) = 0, p(A) = null
        - d(B) = d(G) = 1, p(B) = A, p(G) = A
        - d(C) = 2, p(C) = B
        - d(H) = 2, p(H) = G
        - d(D) = d(E) = 3, p(D) = p(E) = C
        - d(F) = 4, p(F) = E
        - Queue snapshots shown in the figure include: Q = A; Q = G, B; Q = B, C; Q = C, H; Q = H, D, E; Q = D, E; Q = E; Q = F; Q = Ø
26. 26. BFS – Properties & Correctness
    - While running BFS on graph G starting from source s: v enters Q at some point ⇔ v ∈ R(s)
        - Not all the nodes in G are traversed
        - Only the nodes that are reachable from s
        - The others remain unexplored, therefore d[u] = δ(s, u) = INF, ∀ u ∉ R(s)
    - ∀ (u,v) ∈ E, δ(s,v) ≤ δ(s,u) + 1
        - Equality, δ(s,v) = δ(s,u) + 1, holds when p(v) = u
27. 27. BFS – Properties & Correctness (2)
    - Loop invariants – for the outer loop
    - Let S = {nodes that have been popped out of the queue}
    - color[u]
        - = BLACK if u ∈ S
        - = GREY if u ∈ Q
        - = WHITE if u ∈ V \ (Q ∪ S)
    - d[u]
        - != INF if u ∈ Q ∪ S
        - = INF if u ∈ V \ (Q ∪ S)
    - p[u]
        - != NULL if u ∈ Q ∪ S (except for the source s, which has no parent)
        - = NULL if u ∈ V \ (Q ∪ S)
    - Let Q = {v1, …, vp}; p >= 1
        - d[v1] ≤ d[v2] ≤ … ≤ d[vp]
28. 28. BFS – Properties & Correctness (3)
    - d[v1] ≤ d[vp] ≤ d[v1] + 1
    - Why?
        - Because the white neighbors of v1 are pushed in the queue and they have d[u] = d[v1] + 1
        - But d[v1] ≤ … ≤ d[vp] ≤ d[u] = d[v1] + 1
    - Therefore the nodes are added to the queue in order of their d[u]
        - And these values never change again
    - Using all the above properties, we can prove that d[v] = δ(s,v), ∀ v ∈ V
    - Thus BFS is correct!
29. 29. Depth First Search (DFS)
    - There is no source vertex
    - All the vertices of the graph are traversed
    - This traversal does not compute the shortest distance between vertices, but has a lot of useful applications
    - It computes two elements for each vertex:
        - d[u] = discovery time for vertex u
        - f[u] = finish time for vertex u
    - It uses a discrete time between 1 and 2*n that is incremented each time a value is assigned to the discovery or finish time of a node
30. 30. Discovery and finish time
    - Discovery of a vertex u:
        - When the vertex is seen for the first time in the traversal
        - Changes color from WHITE to GREY
    - Finishing a vertex u:
        - When the search leaves the vertex
        - All the nodes that could have been discovered from that vertex are either GREY or BLACK
        - There is no WHITE vertex in R(u)
        - Changes color from GREY to BLACK
31. 31. Data Structures
    - We need a stack (LIFO) to implement the traversal, so that the nodes are discovered in the order in which they are reached
        - The predecessors are lower in the stack
    - For each node u, we use:
        - p[u]
        - color[u]
        - d[u]
        - f[u]
    - color[u] and p[u] have the same meaning as for BFS
32. 32. DFS Trees
    - Similar to BFS, the predecessors form a forest of DFS trees:
        - The edges are (p[u], u), where p[u] != NULL
        - The roots of the DFS trees are those vertices that have p[u] = NULL
        - It is a forest because there may be more than a single tree
        - All the vertices in R(u) that are WHITE when u is discovered become descendants of u in the DFS tree
        - All the ancestors of u in the DFS tree are colored in GREY (including the source)
33. 33. DFS - Algorithm

        DFS(G)
            FOREACH (u ∈ V)
                color[u] = WHITE; p[u] = NULL;   // initialization
            time = 0;
            FOREACH (u ∈ V)                      // choose the roots of the DFS trees
                IF (color[u] == WHITE)
                    DFS_Visit(u);                // start exploring the node

        DFS_Visit(u)                             // exploring a node
            d[u] = ++time                        // the node is discovered
            color[u] = GREY
            FOREACH (v ∈ Adj(u))                 // look for undiscovered neighbors
                IF (color[v] == WHITE)
                    p[v] = u
                    DFS_Visit(v)                 // continue exploring this vertex
            color[u] = BLACK
            f[u] = ++time                        // the node is finished
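A recursive Python sketch of the DFS pseudocode follows. It assumes a dict of adjacency lists and relies on Python's call stack as the LIFO structure, so very deep graphs may hit the interpreter's recursion limit; the sample directed graph `dfs_graph` is hypothetical.

```python
def dfs(adj):
    """Return (d, f, p): discovery times, finish times and DFS-forest parents."""
    color = {u: "WHITE" for u in adj}
    p = {u: None for u in adj}
    d, f = {}, {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time                 # the node is discovered
        color[u] = "GREY"
        for v in adj[u]:            # look for undiscovered neighbors
            if color[v] == "WHITE":
                p[v] = u
                visit(v)
        color[u] = "BLACK"
        time += 1
        f[u] = time                 # the node is finished

    for u in adj:                   # every WHITE vertex found here is the root of a new DFS tree
        if color[u] == "WHITE":
            visit(u)
    return d, f, p


# Hypothetical directed sample graph.
dfs_graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": [], "E": ["A"]}
dfs_d, dfs_f, dfs_p = dfs(dfs_graph)
```

The roots of the forest are the vertices whose parent stays `None`; in this sample these are A and E.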
34. 34. Complexity
    - DFS_Visit is called exactly once for each vertex of the graph
        - When the vertex is WHITE
        - When it is discovered
    - Therefore the complexity would be:
        - The sum, over the n vertices, of complexity(DFS_Visit without the recursive call)
    - Therefore, similar to BFS:
        - Θ(n+m) if using adjacency lists
        - Θ(n²) if using adjacency matrix
35. 35. DFS - Example
    - The notation next to each vertex means d[u]/f[u]
    - [Figure: DFS in progress on the example graph. Timestamps shown: A 1/16, B 2/5, H 3/4, G 6/15, C 7/14, D 8/9, E 10/13, F 11/12, I 17/, J 18/; K and L have no timestamps yet]
36. 36. DFS – Example (2)
    - Some of the steps have been omitted
    - [Figure: successive snapshots of the DFS traversal on the example graph]
37. 37. DFS Tree – Example
    - In fact, the edges have the opposite direction to the one represented in the figure
    - [Figure: the DFS forest of the example graph on vertices A to L]
38. 38. DFS – Properties
    - The DFS forest can be formally defined as:
        - Arb(G) = {Arb(u); p(u) = NULL}
        - Arb(u) = (V(u), E(u))
            - V(u) = {v | d(u) < d(v) < f(u)} + {u};
            - E(u) = {(v, z) | v in V(u), z in V(u) && p(z) = v}
    - If G is undirected
        - G is a connected graph <=> Arb(G) has a single tree
    - For a given graph, running DFS may build different DFS forests (and thus different DF traversals)
        - Depending on the order of choosing the roots of the trees
        - Depending on how the elements in Adj(u) are chosen
39. 39. Parenthesis Theorem
    - ∀ u, v ∈ V, there are three correct alternatives to arrange the discovery and finish times of the two nodes:
        - d[u] < d[v] < f[v] < f[u], written (u … (v … v) … u)
            - v is a descendant of u in the DFS tree
        - d[u] < f[u] < d[v] < f[v], written (u … u) (v … v)
            - There is no direct descendant relation between u and v
        - d[v] < f[v] < d[u] < f[u], written (v … v) (u … u)
            - There is no direct descendant relation between u and v
            - u and v may be in different trees or on different paths in the same tree
    - Legend: "(u" = discovery of u; "u)" = finishing of u
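The theorem can be illustrated with a small Python check on given timestamps. The function name `interval_relation` and its return strings are assumptions; the sample timestamps are a subset of the d[u]/f[u] values from the DFS example slide (A 1/16, B 2/5, H 3/4, G 6/15, C 7/14). The sketch also reports the symmetric case where u is a descendant of v.

```python
def interval_relation(d, f, u, v):
    """Compare the intervals [d[u], f[u]] and [d[v], f[v]] of two distinct vertices."""
    if d[u] < d[v] and f[v] < f[u]:
        return "v is a descendant of u"        # (u ... (v ... v) ... u)
    if d[v] < d[u] and f[u] < f[v]:
        return "u is a descendant of v"        # (v ... (u ... u) ... v)
    if f[u] < d[v] or f[v] < d[u]:
        return "disjoint"                      # (u ... u) (v ... v) or (v ... v) (u ... u)
    # Partially overlapping intervals cannot come from a DFS.
    raise ValueError("not valid DFS timestamps")


# Timestamps taken from the DFS example slide.
d_times = {"A": 1, "B": 2, "H": 3, "G": 6, "C": 7}
f_times = {"A": 16, "B": 5, "H": 4, "G": 15, "C": 14}
```

For instance, H lies inside the interval of A, while B finishes (time 5) before G is discovered (time 6), so neither of B and G is a descendant of the other.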
40. 40. White Path Theorem
    - At time d[u], any node v that is:
        - WHITE and
        - Reachable from u (in R(u))
        - Connected to u by a path u..v that consists of only WHITE vertices (except u, which is GREY)
    - Shall become a descendant of u in the same DFS tree that u belongs to
    - Alternative:
        - v is a descendant of u in a DFS tree <=> at time d[u] there exists a path u..v that consists of only WHITE vertices (except u, which is GREY)
41. 41. Edge Classification
    - (u, v) ∈ E is in one of the following classes:
        - Tree edge
            - Any edge that is part of a DFS tree
        - Back edge
            - Any edge from a node to one of its ancestors in the DFS tree
        - Forward edge
            - Any edge from a node to one of its descendants that are not its children
        - Cross edge
            - Any other edge that cannot be classified in one of the above classes
    - Which are the colors of the two vertices?
42. 42. Edge Classification (2)
    - Tree edge
        - Any edge that is part of a DFS tree
        - (u, v) => u – GREY; v – WHITE
    - Back edge
        - Any edge from a node to one of its ancestors in the DFS tree
        - (u, v) => u – GREY; v – GREY
    - Forward edge
        - Any edge from a node to one of its descendants that are not its children
        - (u, v) => u – GREY; v – BLACK
    - Cross edge
        - Any other edge that cannot be classified in one of the above classes
        - (u, v) => u – GREY; v – BLACK
    - How can we distinguish a cross edge from a forward edge?
        - Use the relationship between d[u] and d[v]: d[u] < d[v] for a forward edge, d[u] > d[v] for a cross edge
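The color rules above can be turned into a short Python sketch for directed graphs. The function name `classify_edges`, the string labels and the sample graph are illustrative assumptions; only discovery times are recorded, because they are enough to separate forward edges from cross edges.

```python
def classify_edges(adj):
    """Return a dict mapping each directed edge (u, v) to tree/back/forward/cross."""
    color = {u: "WHITE" for u in adj}
    d = {}
    kinds = {}
    time = 0

    def visit(u):
        nonlocal time
        time += 1
        d[u] = time
        color[u] = "GREY"
        for v in adj[u]:
            if color[v] == "WHITE":
                kinds[(u, v)] = "tree"
                visit(v)
            elif color[v] == "GREY":
                kinds[(u, v)] = "back"       # v is an ancestor that is still being explored
            elif d[u] < d[v]:
                kinds[(u, v)] = "forward"    # v is BLACK and was discovered after u
            else:
                kinds[(u, v)] = "cross"      # v is BLACK and was discovered before u
        color[u] = "BLACK"

    for u in adj:
        if color[u] == "WHITE":
            visit(u)
    return kinds


# Hypothetical directed sample graph containing one edge of each class.
edge_graph = {"A": ["B", "C"], "B": ["C", "A"], "C": [], "D": ["C"]}
edge_kinds = classify_edges(edge_graph)
```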
43. 43. Edge Classification (3)
    - An undirected graph only has two types of edges:
        - Tree edges
        - Back edges
    - There cannot be any forward edges (because the edges have no orientation) or cross edges
    - Theorem: A directed graph G is acyclic <=> G has no back edges in a DFS search
        - Demo: on whiteboard
        - The same is true for undirected graphs as well
44. 44. Application of Graph Searches
    - BFS:
        - Finding the minimum path in a maze with obstacles, a source and a destination
            - Called Lee’s algorithm
    - DFS:
        - Topological Sorting
        - Strongly Connected Components
        - Articulation Points
        - Bridges
        - Biconnected Components
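As an illustration of the maze application, here is a minimal sketch of BFS on a grid in the spirit of Lee's algorithm. The encoding is an assumption: the maze is a list of strings, `#` marks an obstacle, moves are allowed in the four axis directions, and the function returns only the length of a shortest path (or `None` if the goal is unreachable).

```python
from collections import deque


def lee_shortest_path_length(grid, start, goal):
    """Return the minimum number of moves from start to goal, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}                       # also serves as the set of discovered cells
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        if (r, c) == goal:
            return dist[(r, c)]
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and (nr, nc) not in dist:
                dist[(nr, nc)] = dist[(r, c)] + 1
                queue.append((nr, nc))
    return None


# Hypothetical maze: a wall in column 2 forces a detour through the bottom row.
maze = [
    "..#.",
    "..#.",
    "....",
]
steps = lee_shortest_path_length(maze, (0, 0), (0, 3))
```

The grid is never converted to an explicit graph; the neighbors of a cell are generated on the fly, which is the usual way to run BFS on mazes.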
45. 45. Topological Sorting
    - Given a DAG (Directed Acyclic Graph)
    - Used in real applications:
        - Activity diagrams:
            - Nodes: activities
            - Edges: dependencies between activities
        - Bayesian networks
        - Combinatorial logic
        - Compilers
    - We want to order the vertices: find A[1..n] such that ∀ (u, v) ∈ E with A[i] = u and A[j] = v => i < j
46. 46. Topological Sorting – General
    - It is used to sort partially ordered sets
        - Sets for which we can define a partial order relationship
        - There are elements that cannot be ordered
    - Let
        - S – the set
        - ∝ – the partial order relationship, ∝ : S x S
    - We call a topological sort of S a list A = {s1, …, sn} that consists of all the elements from S, such that for any si ∝ sj => i < j
    - In the case of DAGs, the relationship is given by the orientation of the edges: si ∝ sj <=> (si, sj) ∈ E
47. 47. Topological Sorting – Example  Source: http://serverbob.3x.ro/IA/images/fig572_01_0.jpg  The example is from CLRS
48. 48. Topological Sorting – Idea
    - Run a DFS on the graph, regardless of how the root vertices of the DFS trees are chosen
    - Also use a list for storing the topological sorting
    - After finishing a vertex, insert it at the beginning of the list
    - At the end, all the elements in the list shall be ordered by decreasing finish time:
        - A = (v1, v2, …, vn) => f[vi] > f[vj], ∀ 1 ≤ i < j ≤ n
    - Remark: A DAG may have more than a single topological sorting
49. 49. Algorithm

        DFS(G)
            FOREACH (u ∈ V)
                color[u] = WHITE; p[u] = NULL;   // initialization
            A = ∅                                // the list for storing the topological sorting
            FOREACH (u ∈ V)
                IF (color[u] == WHITE)
                    DFS_Visit(u, A)
            PRINT A                              // print the topological sorting

        DFS_Visit(u, A)                          // A is transmitted by reference
            color[u] = GREY
            FOREACH (v ∈ Adj(u))                 // look for undiscovered neighbors
                IF (color[v] == WHITE)
                    p[v] = u
                    DFS_Visit(v, A)
                ELSE IF (color[v] == GREY)
                    print ‘ERROR: This is not a DAG!’
                    EXIT
            color[u] = BLACK
            A = cons(u, A)                       // insert in front of list A
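A Python sketch of this algorithm is given below. It deviates from the pseudocode in two labeled ways: finished vertices are appended to a list that is reversed at the end (equivalent to inserting each one at the front), and a back edge raises `ValueError` instead of printing and exiting. The dependency graph `tasks` is a hypothetical example.

```python
def topological_sort(adj):
    """Return the vertices of a DAG ordered so that every edge (u, v) has u before v."""
    color = {u: "WHITE" for u in adj}
    finished = []                               # vertices in increasing finish time

    def visit(u):
        color[u] = "GREY"
        for v in adj[u]:
            if color[v] == "WHITE":
                visit(v)
            elif color[v] == "GREY":            # back edge: the graph has a cycle
                raise ValueError(f"not a DAG: back edge {u} -> {v}")
        color[u] = "BLACK"
        finished.append(u)

    for u in adj:
        if color[u] == "WHITE":
            visit(u)
    finished.reverse()                          # decreasing finish time
    return finished


# Hypothetical dependency graph: an edge u -> v means "u must come before v".
tasks = {
    "shirt": ["tie"], "tie": ["jacket"], "trousers": ["shoes", "belt"],
    "belt": ["jacket"], "shoes": [], "jacket": [],
}
order = topological_sort(tasks)
position = {u: i for i, u in enumerate(order)}
```

Appending and reversing once keeps the list operations at O(n) overall, whereas inserting at the front of a Python list would cost O(n) per insertion.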
50. 50. Correctness
    - We need to show that ∀ (u, v) ∈ E: f[v] < f[u]
    - Let’s consider the moment when we explore (u, v)
        - It means that u is GREY
    - Is v:
        - GREY? No, because (u, v) would be a back edge in a DAG. Impossible!
        - WHITE? Then d[u] < d[v] < f[v] < f[u] (Parenthesis theorem, tree edge)
        - BLACK? Then d[v] < f[v] < d[u] < f[u] (cross edge) or d[u] < d[v] < f[v] < f[u] (forward edge)
51. 51. Conclusions
    - We have seen that graphs are very important in modeling structures from the real world
    - Graph traversals are very simple algorithms
    - They are also very useful
        - Lots of applications
        - We have seen one of them: topological sorting
52. 52. References
    - CLRS – Chapter 23
    - MIT OCW – Introduction to Algorithms – video lecture 17