Upcoming SlideShare
×

# Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

1,902 views
1,692 views

Published on

AACIMP 2011 Summer School. Operational Research stream. Lecture by Sergiy Butenko.

Published in: Education, Technology
2 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

Views
Total views
1,902
On SlideShare
0
From Embeds
0
Number of Embeds
76
Actions
Shares
0
69
0
Likes
2
Embeds 0
No embeds

No notes for slide

### Clique Relaxation Models in Networks: Theory, Algorithms, and Applications

1. 1. Clique Relaxation Models in Networks: Theory, Algorithms, and ApplicationsSergiy Butenko (butenko@tamu.edu)Department of Industrial and Systems EngineeringTexas A&M UniversityCollege Station, TX 77843-3131 1/93
2. 2. OutlineIntroduction Graph Theory BasicsTaxonomy of Clique RelaxationsTheory Complexity Issues Analytical bounds Mathematical programming formulationsAlgorithms Node Deletion Problems with Heredity Scale Reduction ApproachesConclusion and References 2/93
3. 3. OutlineIntroduction Graph Theory BasicsTaxonomy of Clique RelaxationsTheory Complexity Issues Analytical bounds Mathematical programming formulationsAlgorithms Node Deletion Problems with Heredity Scale Reduction ApproachesConclusion and References 3/93
4. 4. GraphsGraph is the mathematical term for a “network” - often visualized asvertices (points, nodes) connected by edges (lines, arcs) 4/93
5. 5. Graph theory basicsThe origins of graph theory are attributed to the Seven Bridges ofK¨nigsberg problem solved by Leonhard Euler in 1735. o 5/93
6. 6. Graph theory basicsA simple, undirected graph is a pair G = (V , E ), where V is a ﬁnite set ofvertices and E ⊆ V × V is a set of edges, with each edge deﬁned on a pairof vertices. 6 5 1 4 2 3 V = {1, 2, 3, 4, 5, 6} E = {(1, 2), (1, 6), (2, 3), (2, 4), (3, 4), (4, 5), (5, 6)} 6/93
7. 7. Graph theory basicsA graph can be represented in several ways. We can represent a graph G = (V , E ) by its |V | × |V | adjacency matrix AG = [aij ], such that 1, if (vi , vj ) ∈ E , aij = 0, otherwise. Another way of representing a graph is by its adjacency lists, where for each vertex v ∈ V we record the set of vertices adjacent to it. 7/93
8. 8. Graph theory basicsA multigraph is a graph with repeated edges.A directed graph, or digraph, is a graph with directions assignedto its edges. 2  7  b  I  d q    5  d        d I  d d        q ~  1 E 3  d  d  8 E 10 d d    d     I b     q   d 6     d   dd   I     ~    q  4 9 8/93
9. 9. Graph theory basicsIf G = (V , E ) is a graph and e = (u, v ) ∈ E , we say that u isadjacent to v and vice-versa. We also say that u and v areneighbors.Neighborhood N(v ) of a vertex v is the set of all its neighborsin G : N(v ) = {j : (i, j) ∈ E }.If G = (V , E ) is a graph and e = (u, v ) ∈ E , we say that e isincident to u and v .The degree of a vertex v is the number of its incident edges. 9/93
10. 10. Graph theory basicsG = (V , E ) is a simple undirected graph, V = {1, 2, . . . , n}.G = (V , E ), is the complement graph of G = (V , E ), whereE = {(i, j) | i, j ∈ V , i = j and (i, j) ∈ E }. /For S ⊆ V , G (S) = (S, E ∩ S × S) the subgraph induced by S. 10/93
11. 11. Cliques and independent setsA subset of vertices C ⊆ V is called a clique if G (C ) is acomplete graph.A subset I ⊆ V is called an independent set (stable set, vertexpacking) if G (I ) has no edges. ¯C is a clique in G if and only if C is an independent set in G . 11/93
12. 12. Cliques and independent setsA clique (independent set) is said to be – maximal, if it is not a subset of any larger clique (independent set); – maximum, if there is no larger clique (independent set) in the graph.ω(G ) – the clique number of G .α(G ) – the independence (stability) number of G . 12/93
13. 13. Cliques and independent setsG 1 1 G Complement 5 2 5 2 4 3 4 3{1,2,5} : maximal clique {1,2,5} : maximal independent set {1,4} : maximal {1,4} : maximal clique independent set{2,3,4,5} : maximum clique {2,3,4,5} : maximum independent set 13/93
14. 14. Social networksA social network is described by G = (V , E ) where V is the set of“actors” and E is the set of “ties”. actors are people and a tie exists if two people know each other. actors are wire transfer database records and a tie exists if two records have the same matching ﬁeld. actors are telephone numbers and a tie exists if calls were made between them. 14/93
15. 15. Social network analysis“Popular” social networks: Kevin Bacon Number Erd¨s Number o Six degrees of separation and small world phenomenon in acquaintance networks 15/93
16. 16. Cohesive subgroupsCohesive subgroups are “tightly knit groups” in a socialnetwork.Social cohesion is often used to explain and develop sociologicaltheories.Members of a cohesive subgroup are believed to shareinformation, have homogeneity of thought, identity, beliefs,behavior, even food habits and illnesses. 16/93
17. 17. ApplicationsAcquaintance Networks - criminal network analysisWire Transfer Database Networks - detecting moneylaunderingCall Networks - organized crime detectionProtein Interaction Networks - predicting protein functionsGene Co-expression Networks - detecting network motifsMetabolic Networks - identifying metabolic pathwaysStock Market Networks - stock portfoliosInternet Graphs - information search and retrievalWireless and telecommunication networks - clustering androuting 17/93
18. 18. Random graph modelsUniform random graphs (Erdos Renyi): G (n, p): n vertices, each edge exists with a probability p; G (n, M): all graphs with n vertices and M edges are assigned the same probabilityPower-law random graphs: the probability that a vertex has a degree k is proportional to k −βRandom graphs with a given degree sequence 18/93
19. 19. Scale-free graphsScale-free nature : A property observed in many natural andman-made networks - biological networks, social networks,internet graphsDegree distribution follows a power law - P(k) ∝ k −βFundamentally diﬀerent from classical random graph model ofErd¨s and Renyi oPrinciple of preferential attachment underlies the growth andevolution of such networks (i.e. the rich get richer!) 19/93
20. 20. Science - co-authorship networkImage source:http://www.jeﬀkennedyassociates.com:16080/connections/concept/image.html 20/93
21. 21. Internet colored by IP addressesImage source:http://www.jeﬀkennedyassociates.com:16080/connections/concept/image.html 21/93
22. 22. H. Pylori - protein interactionsImage generated using Graphviz 22/93
23. 23. Properties of cohesive subgroupsSome desirable properties of a cohesive subgroup are: Familiarity (degree); Reachability (distance, diameter); Robustness (connectivity); Density (edge density).Earliest models of cohesive subgroups were cliques (Luce and Perry,1949), representing an “ideal” model of a cohesive subgroup. 23/93
24. 24. Clique relaxations: k-cliques and k-clubsA k-clique is a subset of vertices C such that for every i, j ∈ C ,d(i, j) ≤ k.A k-club is a subset of vertices D such that diam(G [D]) ≤ k. 6 5 1 4 2 3 24/93
25. 25. Clique relaxations: k-cliques and k-clubsA k-clique is a subset of vertices C such that for every i, j ∈ C ,d(i, j) ≤ k.A k-club is a subset of vertices D such that diam(G [D]) ≤ k. 6 {2,3,4} is a 1-club ... the “regular” clique 5 1 4 2 3 24/93
26. 26. Clique relaxations: k-cliques and k-clubsA k-clique is a subset of vertices C such that for every i, j ∈ C ,d(i, j) ≤ k.A k-club is a subset of vertices D such that diam(G [D]) ≤ k. 6 {2,3,4} is a 1-club ... the “regular” clique 5 1 {1,2,4,5,6} is a 2-club 4 2 3 24/93
27. 27. Clique relaxations: k-cliques and k-clubsA k-clique is a subset of vertices C such that for every i, j ∈ C ,d(i, j) ≤ k.A k-club is a subset of vertices D such that diam(G [D]) ≤ k. 6 {2,3,4} is a 1-club ... the “regular” clique 5 1 {1,2,4,5,6} is a 2-club {1,2,3,4,5} is a 2-clique but 4 2 NOT a 2-club 3 24/93
28. 28. Clique relaxations: k-cliques and k-clubsA k-clique is a subset of vertices C such that for every i, j ∈ C ,d(i, j) ≤ k.A k-club is a subset of vertices D such that diam(G [D]) ≤ k. 6 {2,3,4} is a 1-club ... the “regular” clique 5 1 {1,2,4,5,6} is a 2-club {1,2,3,4,5} is a 2-clique but 4 2 NOT a 2-club maximality of a 2-club is harder 3 to test 24/93
29. 29. k-plexDeﬁnitionA subset of vertices S is said to be a k-plex if the minimum degreein the induced subgraph δ(G [S]) ≥ |S| − ki.e. every vertex in G [S] has degree at least |S| − k. 2 1 6 3 5 4 25/93
30. 30. k-plexDeﬁnitionA subset of vertices S is said to be a k-plex if the minimum degreein the induced subgraph δ(G [S]) ≥ |S| − ki.e. every vertex in G [S] has degree at least |S| − k. 2 1 {3,4,5,6} is a 1-plex ... the “regular” clique 6 3 5 4 25/93
31. 31. k-plexDeﬁnitionA subset of vertices S is said to be a k-plex if the minimum degreein the induced subgraph δ(G [S]) ≥ |S| − ki.e. every vertex in G [S] has degree at least |S| − k. 2 1 {3,4,5,6} is a 1-plex ... the “regular” clique 6 3 {1,3,4,5,6} is a 2-plex (and NOT a 1-plex) 5 4 25/93
32. 32. k-plexDeﬁnitionA subset of vertices S is said to be a k-plex if the minimum degreein the induced subgraph δ(G [S]) ≥ |S| − ki.e. every vertex in G [S] has degree at least |S| − k. 2 1 {3,4,5,6} is a 1-plex ... the “regular” clique 6 3 {1,3,4,5,6} is a 2-plex (and NOT a 1-plex) {1,2,3,4,5,6} is a 3-plex 5 4 (and NOT a 2-plex) 25/93
33. 33. co-k-plexDeﬁnitionA subset of vertices S is a co-k-plex if the maximum degree in theinduced subgraph ∆(G [S]) ≤ k − 1.i.e. degree of every vertex in G [S] is at most k − 1.S is a co-k-plex in G if and only if S is a k-plex in the complement ¯graph G . 2 1 2 1 6 3 6 3 5 4 5 4 3-plex Co-3-plex 26/93
34. 34. Structural Properties of a k-PlexIf G is a k-plex then 1. Every subgraph of G is a k-plex; n+2 2. If k 2 then diam(G ) ≤ 2; 3. κ(G ) ≥ n − 2k + 2. k-plexes for “small” k values, guarantee reachability and connectivity while relaxing familiarity. 27/93
35. 35. Known clique relaxation modelsk-clique (distance)k-club (diameter)k-plex (degree)k-core (degree)k-connected subgraph (connectivity)γ-quasi-clique (density)(γ, λ)-quasi-clique (density, degree)R-robust k-club (connectivity, diameter) 28/93
36. 36. OutlineIntroduction Graph Theory BasicsTaxonomy of Clique RelaxationsTheory Complexity Issues Analytical bounds Mathematical programming formulationsAlgorithms Node Deletion Problems with Heredity Scale Reduction ApproachesConclusion and References 29/93
37. 37. Alternative clique deﬁnitionsDistance-basedVertices are distance one away from each other.Diameter-basedVertices induce a subgraph of diameter one.Domination-basedEvery one vertex forms a dominating set. 30/93
38. 38. Alternative clique deﬁnitionsDegree-basedEach vertex is connected to all other vertices.Connectivity-basedAll vertices need to be removed to disconnect the induced subgraph.Density-basedVertices induce a subgraph that has all possible edges. 31/93
39. 39. Order of a clique relaxationWe introduce a hierarchy of clique relaxation models by deﬁningtheir order as follows. Clique is the only clique relaxation of order 0. Clique relaxations of order 1 relax one of the elementary properties (distance, diameter, ...) used to deﬁne clique. Clique relaxations of order 2 relax two of the elementary clique-deﬁning properties. ... 32/93
40. 40. Nature of a clique relaxationParametric clique relaxationsrelax a parameter (“one”, “all”) describing an elementaryclique-deﬁning property.Partial clique relaxationsrequire only a (high) fraction of elements to satisfy an elementaryclique-deﬁning property or a parametric clique-relaxation property. 33/93
41. 41. Types of parametric clique relaxationsParametric clique relaxation of type 1Replace “one” with “(at most) k”, k 1, in a deﬁnition of clique.Parametric clique relaxation of type 2Replace “one” with “(at most) all but k”, k 1, in a deﬁnition of clique.Parametric clique relaxation of type 1aReplace “all” with “(at least) k”, k 1, in a deﬁnition of clique.Parametric clique relaxation of type 2aReplace “all” with “(at least) all but k”, k 1, in a deﬁnition of clique. 34/93
42. 42. First-order parametric clique relaxations of type 1 Replace “one” with “(at most) k”, k 1.Distance-based: k-cliqueVertices are distance at most k away from each other.Diameter-based: k-clubVertices induce a subgraph of diameter at most k.Domination-based: k-plexEvery k vertices form a dominating set. 35/93
43. 43. First-order parametric clique relaxations of type 2 Replace “one” with “(at most) all but k”, k 1Distance-based: N/AVertices in S are distance at most |S| − k away from each other.Diameter-based: N/AVertices induce a subgraph G [S] of diameter at most |S| − k.Domination-based: N/AEvery |S| − k vertices form a dominating set. 36/93
44. 44. First-order parametric clique relaxations of type 1a Replace “all” with “(at least) k”, k 1Degree-based: k-coreEach vertex is connected to at least k other vertices.Connectivity-based: k-connected subgraphAt least k vertices need to be removed to disconnect the inducedsubgraph.Density-based: N/AVertices induce a subgraph that has at least k edges. 37/93
45. 45. First-order parametric clique relaxations of type 2a Replace “all” with “(at least) all but k”, k 1.Degree-based: k-plexEach vertex is connected to at least all but k other vertices.Connectivity-based: N/AAt least all but k vertices need to be removed to disconnect theinduced subgraph.Density-based: N/AVertices induce a subgraph that has at least all but k edges. 38/93
46. 46. OutlineIntroduction Graph Theory BasicsTaxonomy of Clique RelaxationsTheory Complexity Issues Analytical bounds Mathematical programming formulationsAlgorithms Node Deletion Problems with Heredity Scale Reduction ApproachesConclusion and References 39/93
47. 47. Basics of the complexity theoryGiven a combinatorial optimization problem, a natural questionis: is this problem “easy” or “hard”?Methods for easy and hard problems are fundamentallydiﬀerent.How do we distinguish between “easy” and “hard” problems? 40/93
48. 48. “Easy” problemsBy easy or tractable problems we mean the problems that canbe solved in time polynomial with respect to their size.We also call such problems polynomially solvable and denotethe class of polynomially solvable problems by P. Sorting Minimum weight spanning tree Linear programming 41/93
49. 49. Deﬁning “hard” problemsHow do we deﬁne “hard” problems?How about deﬁning hard problems as all problems that are noteasy, i.e., not in P?Then some of the problems in such a class could be TOO hard– we cannot even hope to be able to solve them.We want to deﬁne a class of hard problems that we may beable to solve, if we are lucky (say, we may be able to guess thesolution and check that it is indeed correct). 42/93
50. 50. Deﬁning “hard” problemsLet us take the maximum clique problem as an example and tryto guess a solution.We can pick some random clique, compute its size, but how dowe know if this solution is optimal?Indeed, this is as diﬃcult to verify as to solve the maximumclique problem – it is unlikely that we can count on luck in thiscase...How about transforming our optimization problem to such aform, where instead of ﬁnding an optimal solution the questionwould be “is there a solution with objective value ≥ s, where sis a given integer? 43/93
51. 51. Three versions of optimization problemsConsider a problem min f (x) subject to x ∈ X . Optimization version: ﬁnd x from X that maximizes f (x); Answer: x ∗ maximizes f (x) Evaluation version: ﬁnd the largest possible f (x); Answer: the largest possible value for f (x) is f ∗ Recognition version: Given f ∗ , does there exist an x such that f (x) ≥ f ∗ ? Answer: “yes” or “no” 44/93
52. 52. Recognition problemsRecognition problems can still be undecidable. Halting problem: Given a computer program with its input, will it ever halt?Recall the “luck factor”: if we pick a random feasible solutionand it happens to give “yes” answer, then we solved theproblem in polynomial time. Max clique: Randomly pick a solution (a clique). If its size is ≥ s (which we can verify in polynomial time), then obviously the answer is “yes”. This clique can be viewed as a certiﬁcate proving that this is indeed a yes instance of max clique. 45/93
53. 53. Class N PWe only consider problems for which any yes instance thereexists a concise (polynomial-size) certiﬁcate that can be veriﬁedin polynomial time.We call this class of problems nondeterministic polynomial anddenote it by N P. 46/93
54. 54. P vs N PNote that any problem from P is also in N P (i.e., P ⊆ N P),so there are easy problems in N P.So, are there “hard” problems in N P, and if there are, how dowe deﬁne them?We don’t know if P = NP, but “most” people believe thatP = NP.An “easy” way to make \$1,000,000!http://www.claymath.org/millennium/P vs NP/We can call a problem hard if the fact that we can solve thisproblem would mean that we can solve any other problem incomparable amount of time. 47/93
55. 55. Polynomial reducibilityReduce π1 to π2 : if we can solve π2 fast, then we can solve π1fast, given that the reduction is “easy”.Polynomial reduction from π1 to π2 requires existence ofpolynomial-time algorithms 1. A1 converts an input for π1 into an input for π2 ; 2. A2 converts an output for π2 into output for π1 .Transitivity: If π1 is polynomially reducible to π2 and π2 ispolynomially reducible to π3 then π1 is polynomially reducibleto π3 . 48/93
56. 56. NP-complete problemsA problem π is called NP-complete if 1. π ∈ NP; 2. Any problem from NP can be reduced to π in polynomial time.A problem π is called NP-hard if any problem from NP can bereduced to π in polynomial time. (no π ∈ NP requirement)Due to transitivity of polynomial reducibility, in order to showthat a problem π is N P-complete, it is suﬃcient to show that 1. π ∈ NP; 2. There is an NP-complete problem π that can be reduced to π in polynomial time.To use this observation, we need to know at least oneN P-complete problem... 49/93
57. 57. Satisﬁability (SAT) problemA Boolean variable x is a variable that can assume only the valuestrue and false.Boolean variables can be combined to form Boolean formulas usingthe following logical operations: 1. Logical AND (∧ or ·) -conjunction 2. Logical OR (∨ or +) - disjunction 3. Logical NOT (¯) x kjA clause is Cj = yjp , where a literal yjp is xr or xr for some r . ¯ p=1 mConjunctive normal form (CNF):F = Cj , where Cj is a clause. j=1 50/93
58. 58. Satisﬁability problem A CNF F is called satisﬁable if there is an assignment of variables such that F = 1 (TRUE ). Satisﬁability (SAT) problem: Given m clauses C1 , . . . , Cm involving the variables x1 , . . . , xn , is the CNF m F = Cj , j=1 satisﬁable?Theorem (Cook, 1971)SAT is NP-complete. 51/93
59. 59. “Best” approximation algorithms and heuristicsFor some problems there are hardness of approximation resultsstating that the problem is hard to approximate within a certainfactor.For example, the k-center problem is hard to approximatewithin a factor better than 2.Then any polynomial-time algorithm approximating thek-center problem within the factor of 2 can be considered the“best” approximation algorithm for this problem. 52/93
60. 60. “Best” approximation algorithms and heuristicsHowever, some problems are even harder to approximate. Forexample, the maximum clique is hard to approximate within afactor n1− for any positive .In this case, by the “best” heuristic we could mean a heuristicthat cannot be provably outperformed by any otherpolynomial-time algorithm (unless P = NP). 53/93
61. 61. Recognizing the gap between k-club and l -club numbersTheoremLet positive integer constants k and l, l k be given. The problemof checking whether ωl (G ) = ωk (G ) is NP-hard. ¯ ¯Note that ω(G ) ≤ ∆(G ) + 1 ≤ ωk (G ) ¯and observe that we can easily check whether ω(G ) = ∆(G ) + 1.Hence, it is NP-hard to check whether ωk (G ) = ∆(G ) + 1. ¯ 54/93
62. 62. “Best” heuristics for k-club/cliqueCorollaryLet k be a ﬁxed integer, k ≥ 2. Unless P = NP, there cannot be apolynomial time algorithm that ﬁnds a k-club of size greater than∆(G ) + 1 whenever such a k-club exists in the graph. 55/93
63. 63. Protein Interaction Networks 56/93
64. 64. A max 2-club/clique of S. Cerevisiae. 57/93
65. 65. A max 2-club/clique of H. Pylori. 58/93
66. 66. A max 3-clique/club of S. Cerevisiae 59/93
67. 67. OutlineIntroduction Graph Theory BasicsTaxonomy of Clique RelaxationsTheory Complexity Issues Analytical bounds Mathematical programming formulationsAlgorithms Node Deletion Problems with Heredity Scale Reduction ApproachesConclusion and References 60/93
68. 68. Size of clique relaxations in a graphQuestion: How large could a k-clique (k-club, k-plex) be for agraph with a small clique number? 61/93
69. 69. Size of clique relaxations in a graphQuestion: How large could a k-clique (k-club, k-plex) be for agraph with a small clique number?Answers: k-club: ω(K1,n ) = 2, but ωk (K1,n ) = n + 1, could be arbitrary ¯ large k-clique: same as k-club 61/93
70. 70. Size of clique relaxations in a graphQuestion: How large could a k-clique (k-club, k-plex) be for agraph with a small clique number?Answers: k-club: ω(K1,n ) = 2, but ωk (K1,n ) = n + 1, could be arbitrary ¯ large k-clique: same as k-club k-plex:LemmaFor any graph G and a positive integer k, ωk (G ) ≤ kω(G ) and αk (G ) ≤ kα(G ) 61/93
71. 71. Upper bound on the quasi-clique numberPropositionThe γ-clique number ωγ (G ) of a graph G with n vertices and medges satisﬁes the following inequality: γ+ γ 2 + 8γm ωγ (G ) ≤ . (1) 2γMoreover, if a graph G is connected then γ+2+ (γ + 2)2 + 8(m − n)γ ωγ (G ) ≤ . (2) 2γ 62/93
72. 72. Relation between ωγ (G ) and ω(G )PropositionThe γ-clique number ωγ (G ) and the clique number ω(G ) of graphG satisfy the following inequalities: ω(G ) − 1 ωγ (G ) − 1 1 ω(G ) − 1 ≤ ≤ . (3) ω(G ) ωγ (G ) γ ω(G )Corollary 1If γ 1 − ω(G ) then ω(G )γ ωγ (G ) ≤ . (4) 1 − ω(G ) + ω(G )γ 63/93
73. 73. Relation between omegaγ (G ) and ω(G )Table: The value of upper bound (4) on γ-clique number withγ = 0.95, 0.9, 0.85 for graphs with small clique number. 1 ω(G ) 1 − ω(G ) 0.95 0.9 0.85 3 0.667 3.35 3.86 4.64 4 0.75 4.75 6 8.5 5 0.8 6.33 9 17 6 0.83 8.14 13.5 51 7 0.86 10.23 21 – 8 0.88 12.67 36 – 9 0.89 15.55 81 – 10 0.9 19 – – 64/93
74. 74. OutlineIntroduction Graph Theory BasicsTaxonomy of Clique RelaxationsTheory Complexity Issues Analytical bounds Mathematical programming formulationsAlgorithms Node Deletion Problems with Heredity Scale Reduction ApproachesConclusion and References 65/93
75. 75. Math programming formulationsConsider a simple undirected graph G = (V , E ) with n verticesLet A = [aij ]n i,j=1 be the adjacency matrix of GLet x = (x1 , . . . , xn ) be a 0-1 vector with xi = 1 if node ibelongs to Gs , and xi = 0 otherwise. 66/93
76. 76. Math programming formulationsGS is a γ-clique if it has at least γ|GS |(|GS | − 1)/2 edges nWe have: |GS | = xi i=1This number of edges can be expressed in terms of vector x as: n n n n 1 2γ xi xi − 1 = 1γ 2 xi xj − xi i=1 i=1 i,j=1 i=1 n n n n 1 1 = 2γ xi xj + xi2 − xi = 2γ xi xj i,j=1;i=j i=1 i=1 i,j=1;i=j n 1The number of edges in GS is 2 aij xi xj i,j=1 67/93
77. 77. Math programming formulationsWe obtain the following 0-1 problem with one quadratic constraint: n max xi i=1subject to: n n aij xi xj ≥ γ xi xj i,j=1 i,j=1;i=j 68/93
78. 78. Math programming formulationsDeﬁne wij = xi xj . The constraint wij = xi xj is equivalent to wij ≤ xi , wij ≤ xj , wij ≥ xi + xj − 1 . nLinearized formulation: max xi i=1 n n subjectto : aij wij ≥ γ wij i,j=1 i,j=1;i=j wij ≤ xi , wij ≤ xj , wij ≥ xi + xj − 1 . wij , xi ∈ {0, 1}, ∀i j = 1, . . . , n O(n2 ) 0-1 variables, O(n2 ) constraints 69/93
79. 79. OutlineIntroduction Graph Theory BasicsTaxonomy of Clique RelaxationsTheory Complexity Issues Analytical bounds Mathematical programming formulationsAlgorithms Node Deletion Problems with Heredity Scale Reduction ApproachesConclusion and References 70/93
80. 80. OutlineIntroduction Graph Theory BasicsTaxonomy of Clique RelaxationsTheory Complexity Issues Analytical bounds Mathematical programming formulationsAlgorithms Node Deletion Problems with Heredity Scale Reduction ApproachesConclusion and References 71/93
81. 81. Clique relaxation and node deletion problems Node deletion problems Clique Node deletion relaxation problems problems with heredity 72/93
82. 82. Node deletion problemsA graph property Π is said to be hereditary on inducedsubgraphs, if G is a graph with property Π and the deletion ofany subset of vertices does not produce a graph violating ΠΠ is said to be nontrivial if it is true for a single vertex graphand is not satisﬁed by every graphΠ is said to be interesting if there are arbitrarily large graphssatisfying Π 73/93
83. 83. Yannakakis theoremThe maximum Π problem is to ﬁnd the largest order inducedsubgraph that does not violate property ΠTheorem (Yannakakis, 1978)The maximum Π problem for nontrivial, interesting graph propertiesthat are hereditary on induced subgraphs is NP-hard. 74/93
84. 84. Max weight node deletion problemGiven a simple, undirected graph G = (V , E ), positive weights w (v ) for each v ∈ V , and a nontrivial, interesting and hereditary (on induced subgraphs) property Π.Find ν(G ) = max{w (P) : P ⊆ V , G [P] satisﬁes Π},where w (P) = v ∈P w (v ) and G [P] is the subgraph induced by P. 75/93
85. 85. Generalizing max clique algorithmsSome of the most practically eﬀective algorithms for the maximumclique problem rely on the fact that clique is hereditary on inducedsubgraphs. Carraghan and Pardalos (1990) - used as DIMACS benchmark ¨ Osterg˚rd (2002) aWe use this observation to develop a generalized algorithm for theminimum weight node deletion problem (GAMWNDP). 76/93
86. 86. GAMWNDPOrder vertices V = {v1 , v2 , · · · , vn }, deﬁne Si = {vi , vi+1 , · · · , vn }We compute the function c(i) that is the weight of the maximuminduced subgraph with property Π in G [Si ].Obviously, c(n) = w (vn ) and c(1) = ν(G ).For the unweighted case with w (vi ) = 1, i = 1, . . . , n we have c(i + 1) + 1, if the solution must contain vi c(i) = c(i + 1), otherwiseFor the weighted case, c(i) c(i + 1) implies that vi belongs toevery optimal solution, and c(i) ≤ c(i + 1) + w (vi ). 77/93
87. 87. GAMWNDPWe compute the value of c(i) starting from c(n) and down toc(1), and in each major iteration work with a current feasiblesolution P, a candidate set C , and an incumbent solution S.Pruning occurs when w (C ) + w (P) w (S) or c(i) + w (P) w (S), where i = min{j : vj ∈ C }.Problem-speciﬁc features: candidate set generation; vertex ordering. 78/93
88. 88. OutlineIntroduction Graph Theory BasicsTaxonomy of Clique RelaxationsTheory Complexity Issues Analytical bounds Mathematical programming formulationsAlgorithms Node Deletion Problems with Heredity Scale Reduction ApproachesConclusion and References 79/93
89. 89. OutlineIntroduction Graph Theory BasicsTaxonomy of Clique RelaxationsTheory Complexity Issues Analytical bounds Mathematical programming formulationsAlgorithms Node Deletion Problems with Heredity Scale Reduction ApproachesConclusion and References 80/93
90. 90. k-Core and k-communityk-CoreA sub-graph G of G such that each node has degree greater than orequal to k.k-CommunityA subgraph G of G such that each (i, j) ∈ E if (i, j) ∈ E and i, jhave at least k common neighbors in G . 81/93
91. 91. Example - k-Core k-Core example (a) Original Graph (b) Max 3-core Each node in the max 3-core is connected to at least 3 other 82/93 vertices.
92. 92. Example - k-Community k-Community example (c) Original Graph (d) Max 2-community Any two vertices in the max 2-community are connected only if 83/93 they have at least 2 mutual neighbors.
93. 93. Scale reduction approachesReduce problem size by ignoring certain parts of the graphwhich we are certain would not be in the graph.Most commonly used technique: Peeling 1. Remove all the vertices with degree ≤ k − 1. 2. If no vertices removed, stop. Else, redo step 1 using the obtained graph.Peeling is nothing but ﬁnding the k-core! 84/93
94. 94. Scale reduction approachesProperty 1A clique of size k is both a (k − 1)-core and (k − 2)-community.Property 2If the k-Core of G is empty, then ω(G ) k + 1.Property 3If the k-Community of G is empty, then ω(G ) k + 2. 85/93
95. 95. k-Core Upper Bound Find smallest k1 such that k1 -Core is empty. UB = k1 = 5.k-Core upper bound (e) Original Graph (f) 4-core (g) Original Graph (h) 5-core Anurag Verma, Sergiy Butenko, Texas AM University 10/ 20Find smallest k1 such that k1 -Core is empty. UB = k1 = 5. 86/93
96. 96. k-Community Upper Bound k-Community upper bound Find smallest k2 such that k2 -Comm is empty. (i) Original Graph (j) Intermediate (k) 2-comm Find smallest k2 such that k2 -Comm 1 = 4. 3-comm is empty. Thus UB = k2 + is empty. UB = k2 + 1 = 4. 87/93
97. 97. Finding the least k2A simple binary search strategy: Step 0 Set ku = n − 2, kl = 0, k2 = n − 2 Step 1 If k2 -Comm(G) is empty, ku = k2 . Else, kl = k2 . Step 2 If ku − kl ≤ 1, set k2 = ku , STOP. Else, set k2 = (ku + kl )/2, go to Step 1. 88/93
98. 98. The call graphAbello, Pardalos, and Resende, 1999: 20-day period: 290 million vertices and 4 billion edges. one-day call graph: 53,767,087 vertices and over 170 millions of edges. 3,667,448 connected components; 302,468 of them with more than 3 vertices. A giant connected component with 44,989,297 vertices. 89/93
99. 99. Upper Bounds obtained bounds on SNAP Upper on SNAP Graphs database Graph n m k-Core UB k-Comm UB UB n’ UB n” WikiTalk 2394385 4659565 132 700 52 237 cit-Patents 3774768 16518947 65 106 35 83 Email-EuAll 265214 364481 38 292 19 62 Cit-HepPh 34546 420877 31 40 24 36 Cit-HepTh 27770 352285 38 52 29 48 Slashdot0811 77360 469180 55 129 34 87 Slashdot0902 82168 504230 56 134 35 96 soc-Epinions1 75879 405740 68 486 32 61 Wiki-Vote 7115 100762 54 336 22 50 p2p-Gnutella31 62586 147892 7 1004 4 57 p2p-Gnutella04 10876 39994 8 365 4 12 p2p-Gnutella24 26518 65369 6 7480 4 41 p2p-Gnutella25 22687 54705 6 6091 4 25 p2p-Gnutella30 36682 88328 8 14 4 42 web-Stanford 281903 1992636 72 387 61 126 web-NotreDame 325729 1090108 156 1367 155 155 web-Google 875713 4322051 45 48 44 48 web-BerkStan 685230 6649470 202 392 201 392 Amazon0601 403394 2443408 11 32886 11 5361 Amazon0505 410236 2439437 11 32632 11 4878 Amazon0302 262111 899792 7 286 7 105 Amazon0312 400727 2349869 11 27046 11 4534 Comparison of the upper bounds obtained by a k-core scheme vs a 90/93 k-comm scheme.
100. 100. Upper Bounds obtained on SNAPon SNAP Upper bounds Graphs database Graph n m k-Core UB k-Comm UB UB n’ UB n” WikiTalk 2394385 4659565 132 700 52 237 cit-Patents 3774768 16518947 65 106 35 83 Email-EuAll 265214 364481 38 292 19 62 Cit-HepPh 34546 420877 31 40 24 36 Cit-HepTh 27770 352285 38 52 29 48 Slashdot0811 77360 469180 55 129 34 87 Slashdot0902 82168 504230 56 134 35 96 soc-Epinions1 75879 405740 68 486 32 61 Wiki-Vote 7115 100762 54 336 22 50 p2p-Gnutella31 62586 147892 7 1004 4 57 p2p-Gnutella04 10876 39994 8 365 4 12 p2p-Gnutella24 26518 65369 6 7480 4 41 p2p-Gnutella25 22687 54705 6 6091 4 25 p2p-Gnutella30 36682 88328 8 14 4 42 web-Stanford 281903 1992636 72 387 61 126 web-NotreDame 325729 1090108 156 1367 155 155 web-Google 875713 4322051 45 48 44 48 web-BerkStan 685230 6649470 202 392 201 392 Amazon0601 403394 2443408 11 32886 11 5361 Amazon0505 410236 2439437 11 32632 11 4878 Amazon0302 262111 899792 7 286 7 105 Amazon0312 400727 2349869 11 27046 11 4534 Comparison of the number of nodes in the residual graph after the 91/93 scale reduction.
101. 101. Upper bounds on random graphsUniform random graphs with 1000 nodes.Bollobas-Erd¨ s estimate: 2 log(n)/ log(1/p). o 92/93
102. 102. OutlineIntroduction Graph Theory BasicsTaxonomy of Clique RelaxationsTheory Complexity Issues Analytical bounds Mathematical programming formulationsAlgorithms Node Deletion Problems with Heredity Scale Reduction ApproachesConclusion and References 93/93
103. 103. ConclusionWe propose a taxonomy of clique relaxation models opens new avenues for systematic study of clique relaxations covers known clique relaxation models unveils new models of potential interestAn eﬀective algorithm is provided for node deletion problemsthat are nontrivial, interesting, and hereditary on inducedsubgraphs state-of-the-art exact approach to max k-plex can be used to solve many other important problemsA scale-reduction approach based on the concept ofk-community is proposed outperforms the peeling approach based on k-core 94/93
104. 104. ReferencesB. Balasundaram, S. Butenko, I. V. Hicks.Clique relaxations in social network analysis: The maximum k-plexproblem.Operations Research, 59: 133–142, 2011.S. Trukhanov, B. Balasundaram, and S. Butenko.Exact algorithms for hard node deletion problems in networks.Submitted.S. Butenko, J. Pattillo, and N. Youssef.A taxonomy of clique relaxation models in networks.In preparation.A. Verma and S. Butenko.A new scale-reduction technique for the maximum clique and relatedproblems.In preparation. 95/93
105. 105. ReferencesJ. Pattillo, A. Veremyev, S. Butenko, and V. Boginski.On the maximum quasi-clique problem.Submitted.S. Butenko, S. Kahruman, and O. Prokopyev.On “provably best” heuristics for hard optimization problems.In preparation. 96/93
106. 106. hank you!T 97/93