SlideShare a Scribd company logo
1 of 129
CMU SCS
Large Graph Mining -
Patterns, Explanations and
Cascade Analysis
Christos Faloutsos
CMU
CMU SCS
Thank you!
• Wei FAN
• Albert BIFET
• Qiang YANG
• Philip YU
KDD'13 BIGMINE (c) 2013, C. Faloutsos 2
CMU SCS
(c) 2013, C. Faloutsos 3
Roadmap
• Introduction – Motivation
– Why study (big) graphs?
• Part#1: Patterns in graphs
• Part#2: Cascade analysis
• Conclusions
• [Extra: ebay fraud; tensors; spikes]
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 4
Graphs - why should we care?
Internet Map
[lumeta.com]
Food Web
[Martinez ‟91]
>$10B revenue
>0.5B users
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 5
Graphs - why should we care?
• web-log (‘blog’) news propagation
• computer network security: email/IP traffic and
anomaly detection
• Recommendation systems
• ....
• Many-to-many db relationship -> graph
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 6
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Static graphs
– Time-evolving graphs
– Why so many power-laws?
• Part#2: Cascade analysis
• Conclusions
KDD'13 BIGMINE
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 7
Part 1:
Patterns & Laws
CMU SCS
(c) 2013, C. Faloutsos 8
Laws and patterns
• Q1: Are real graphs random?
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 9
Laws and patterns
• Q1: Are real graphs random?
• A1: NO!!
– Diameter
– in- and out- degree distributions
– other (surprising) patterns
• Q2: why ‘no good cuts’?
• A2: <self-similarity – stay tuned>
• So, let’s look at the data
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 10
Solution# S.1
• Power law in the degree distribution
[SIGCOMM99]
log(rank)
log(degree)
internet domains
att.com
ibm.com
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 11
Solution# S.1
• Power law in the degree distribution
[SIGCOMM99]
log(rank)
log(degree)
-0.82
internet domains
att.com
ibm.com
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 12
Solution# S.1
• Q: So what?
log(rank)
log(degree)
-0.82
internet domains
att.com
ibm.com
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 13
Solution# S.1
• Q: So what?
• A1: # of two-step-away pairs:
log(rank)
log(degree)
-0.82
internet domains
att.com
ibm.com
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 14
Solution# S.1
• Q: So what?
• A1: # of two-step-away pairs: O(d_max ^2) ~ 10M^2
log(rank)
log(degree)
-0.82
internet domains
att.com
ibm.com
KDD'13 BIGMINE
~0.8PB ->
a data center(!)
DCO @ CMU
Gaussian trap
CMU SCS
(c) 2013, C. Faloutsos 15
Solution# S.1
• Q: So what?
• A1: # of two-step-away pairs: O(d_max ^2) ~ 10M^2
log(rank)
log(degree)
-0.82
internet domains
att.com
ibm.com
KDD'13 BIGMINE
~0.8PB ->
a data center(!)
Gaussian trap
CMU SCS
(c) 2013, C. Faloutsos 16
Solution# S.2: Eigen Exponent E
• A2: power law in the eigenvalues of the adjacency
matrix
E = -0.48
Exponent = slope
Eigenvalue
Rank of decreasing eigenvalue
May 2001
KDD'13 BIGMINE
A x = x
CMU SCS
(c) 2013, C. Faloutsos 17
Roadmap
• Introduction – Motivation
• Problem#1: Patterns in graphs
– Static graphs
• degree, diameter, eigen,
• Triangles
– Time evolving graphs
• Problem#2: Tools
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 18
Solution# S.3: Triangle „Laws‟
• Real social networks have a lot of triangles
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 19
Solution# S.3: Triangle „Laws‟
• Real social networks have a lot of triangles
– Friends of friends are friends
• Any patterns?
– 2x the friends, 2x the triangles ?
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 20
Triangle Law: #S.3
[Tsourakakis ICDM 2008]
SNReuters
Epinions
X-axis: degree
Y-axis: mean # triangles
n friends -> ~n1.6 triangles
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 21
Triangle Law: Computations
[Tsourakakis ICDM 2008]
But: triangles are expensive to compute
(3-way join; several approx. algos) – O(dmax
2)
Q: Can we do that quickly?
A:
details
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 22
Triangle Law: Computations
[Tsourakakis ICDM 2008]
But: triangles are expensive to compute
(3-way join; several approx. algos) – O(dmax
2)
Q: Can we do that quickly?
A: Yes!
#triangles = 1/6 Sum ( i
3 )
(and, because of skewness (S2) ,
we only need the top few eigenvalues! - O(E)
details
KDD'13 BIGMINE
A x = x
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges)
[U Kang, Brendan Meeder, +, PAKDD’11]
23KDD'13 BIGMINE 23(c) 2013, C. Faloutsos
? ?
?
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges)
[U Kang, Brendan Meeder, +, PAKDD’11]
24KDD'13 BIGMINE 24(c) 2013, C. Faloutsos
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges)
[U Kang, Brendan Meeder, +, PAKDD’11]
25KDD'13 BIGMINE 25(c) 2013, C. Faloutsos
CMU SCS
Triangle counting for large graphs?
Anomalous nodes in Twitter(~ 3 billion edges)
[U Kang, Brendan Meeder, +, PAKDD’11]
26KDD'13 BIGMINE 26(c) 2013, C. Faloutsos
CMU SCS
(c) 2013, C. Faloutsos 27
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Static graphs
• Power law degrees; eigenvalues; triangles
• Anti-pattern: NO good cuts!
– Time-evolving graphs
• ….
• Conclusions
KDD'13 BIGMINE
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 28
Background: Graph cut problem
• Given a graph, and k
• Break it into k (disjoint) communities
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 29
Graph cut problem
• Given a graph, and k
• Break it into k (disjoint) communities
• (assume: block diagonal = ‘cavemen’ graph)
k = 2
CMU SCS
Many algo‟s for graph
partitioning
• METIS [Karypis, Kumar +]
• 2nd eigenvector of Laplacian
• Modularity-based [Girwan+Newman]
• Max flow [Flake+]
• …
• …
• …
KDD'13 BIGMINE (c) 2013, C. Faloutsos 30
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 31
Strange behavior of min cuts
• Subtle details: next
– Preliminaries: min-cut plots of ‘usual’ graphs
NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti,
Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004
Workshop on Link Analysis, Counter-terrorism and Privacy
Statistical Properties of Community Structure in Large Social and
Information Networks, J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney.
WWW 2008.
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 32
“Min-cut” plot
• Do min-cuts recursively.
log (# edges)
log (mincut-size / #edges)
N nodes
Mincut size
= sqrt(N)
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 33
“Min-cut” plot
• Do min-cuts recursively.
log (# edges)
log (mincut-size / #edges)
N nodes
New min-cut
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 34
“Min-cut” plot
• Do min-cuts recursively.
log (# edges)
log (mincut-size / #edges)
N nodes
New min-cut
Slope = -0.5
Better
cut
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 35
“Min-cut” plot
log (# edges)
log (mincut-size / #edges)
Slope = -1/d
For a d-dimensional
grid, the slope is -1/d
log (# edges)
log (mincut-size / #edges)
For a random graph
(and clique),
the slope is 0
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 36
Experiments
• Datasets:
– Google Web Graph: 916,428 nodes and
5,105,039 edges
– Lucent Router Graph: Undirected graph of
network routers from
www.isi.edu/scan/mercator/maps.html; 112,969
nodes and 181,639 edges
– User  Website Clickstream Graph: 222,704
nodes and 952,580 edges
NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti,
Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004
Workshop on Link Analysis, Counter-terrorism and Privacy
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 37
“Min-cut” plot
• What does it look like for a real-world
graph?
log (# edges)
log (mincut-size / #edges)
?
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 38
Experiments
• Used the METIS algorithm [Karypis, Kumar,
1995]
log (# edges)
log(mincut-size/#edges)
•Google Web graph
• Values along the y-
axis are averaged
•“lip” for large # edges
• Slope of -0.4,
corresponds to a 2.5-
dimensional grid!
Slope~ -0.4
log (# edges)
log (mincut-size / #edges)
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 39
Experiments
• Used the METIS algorithm [Karypis, Kumar,
1995]
log (# edges)
log(mincut-size/#edges)
•Google Web graph
• Values along the y-
axis are averaged
•“lip” for large # edges
• Slope of -0.4,
corresponds to a 2.5-
dimensional grid!
Slope~ -0.4
log (# edges)
log (mincut-size / #edges)
Better
cut
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 40
Experiments
• Same results for other graphs too…
log (# edges) log (# edges)
log(mincut-size/#edges)
log(mincut-size/#edges)
Lucent Router graph Clickstream graph
Slope~ -0.57 Slope~ -0.45
CMU SCS
Why no good cuts?
• Answer: self-similarity (few foils later)
KDD'13 BIGMINE (c) 2013, C. Faloutsos 41
CMU SCS
(c) 2013, C. Faloutsos 42
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– Static graphs
– Time-evolving graphs
– Why so many power-laws?
• Part#2: Cascade analysis
• Conclusions
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 43
Problem: Time evolution
• with Jure Leskovec (CMU ->
Stanford)
• and Jon Kleinberg (Cornell –
sabb. @ CMU)
KDD'13 BIGMINE
Jure Leskovec, Jon Kleinberg and Christos Faloutsos: Graphs
over Time: Densification Laws, Shrinking Diameters and Possible
Explanations, KDD 2005
CMU SCS
(c) 2013, C. Faloutsos 44
T.1 Evolution of the Diameter
• Prior work on Power Law graphs hints
atslowly growing diameter:
– [diameter ~ O( N1/3)]
– diameter ~ O(log N)
– diameter ~ O(log log N)
• What is happening in real data?
KDD'13 BIGMINE
diameter
CMU SCS
(c) 2013, C. Faloutsos 45
T.1 Evolution of the Diameter
• Prior work on Power Law graphs hints
atslowly growing diameter:
– [diameter ~ O( N1/3)]
– diameter ~ O(log N)
– diameter ~ O(log log N)
• What is happening in real data?
• Diameter shrinks over time
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 46
T.1 Diameter – “Patents”
• Patent citation
network
• 25 years of data
• @1999
– 2.9 M nodes
– 16.5 M edges
time [years]
diameter
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 47
T.2 Temporal Evolution of the
Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that
N(t+1) = 2 * N(t)
• Q: what is your guess for
E(t+1) =? 2 * E(t)
KDD'13 BIGMINE
Say, k friends on average
k
CMU SCS
(c) 2013, C. Faloutsos 48
T.2 Temporal Evolution of the
Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that
N(t+1) = 2 * N(t)
• Q: what is your guess for
E(t+1) =? 2 * E(t)
• A: over-doubled! ~ 3x
– But obeying the ``Densification Power Law’’
KDD'13 BIGMINE
Say, k friends on average
Gaussian trap
CMU SCS
(c) 2013, C. Faloutsos 49
T.2 Temporal Evolution of the
Graphs
• N(t) … nodes at time t
• E(t) … edges at time t
• Suppose that
N(t+1) = 2 * N(t)
• Q: what is your guess for
E(t+1) =? 2 * E(t)
• A: over-doubled! ~ 3x
– But obeying the ``Densification Power Law’’
KDD'13 BIGMINE
Say, k friends on average
Gaussian trap
✗ ✔
log
log
lin
lin
CMU SCS
(c) 2013, C. Faloutsos 50
T.2 Densification – Patent
Citations
• Citations among
patents granted
• @1999
– 2.9 M nodes
– 16.5 M edges
• Each year is a
datapoint
N(t)
E(t)
1.66
KDD'13 BIGMINE
CMU SCS
MORE Graph Patterns
KDD'13 BIGMINE (c) 2013, C. Faloutsos 51
RTG: A Recursive Realistic Graph Generator using Random
Typing Leman Akoglu and Christos Faloutsos. PKDD‟09.
CMU SCS
MORE Graph Patterns
KDD'13 BIGMINE (c) 2013, C. Faloutsos 52
✔
✔
✔
✔
✔
RTG: A Recursive Realistic Graph Generator using Random
Typing Leman Akoglu and Christos Faloutsos. PKDD‟09.
CMU SCS
MORE Graph Patterns
KDD'13 BIGMINE (c) 2013, C. Faloutsos 53
• Mary McGlohon, Leman Akoglu, Christos
Faloutsos. Statistical Properties of Social
Networks. in "Social Network Data Analytics” (Ed.:
CharuAggarwal)
• Deepayan Chakrabarti and Christos Faloutsos,
Graph Mining: Laws, Tools, and Case StudiesOct.
2012, Morgan Claypool.
CMU SCS
(c) 2013, C. Faloutsos 54
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– …
– Why so many power-laws?
– Why no ‘good cuts’?
• Part#2: Cascade analysis
• Conclusions
KDD'13 BIGMINE
CMU SCS
2 Questions, one answer
• Q1: why so many power laws
• Q2: why no ‘good cuts’?
KDD'13 BIGMINE (c) 2013, C. Faloutsos 55
CMU SCS
2 Questions, one answer
• Q1: why so many power laws
• Q2: why no ‘good cuts’?
• A: Self-similarity = fractals = ‘RMAT’ ~
‘Kronecker graphs’
KDD'13 BIGMINE (c) 2013, C. Faloutsos 56
possible
CMU SCS
20‟‟ intro to fractals
• Remove the middle triangle; repeat
• ->Sierpinski triangle
• (Bonus question - dimensionality?
– >1 (inf. perimeter – (4/3)∞ )
– <2 (zero area – (3/4) ∞ )
KDD'13 BIGMINE (c) 2013, C. Faloutsos 57
…
CMU SCS
20‟‟ intro to fractals
KDD'13 BIGMINE (c) 2013, C. Faloutsos 58
Self-similarity -> no char. scale
-> power laws, eg:
2x the radius,
3x the #neighbors nn(r)
nn(r) = C rlog3/log2
CMU SCS
20‟‟ intro to fractals
KDD'13 BIGMINE (c) 2013, C. Faloutsos 59
Self-similarity ->no char. scale
-> power laws, eg:
2x the radius,
3x the #neighbors nn(r)
nn(r) = C rlog3/log2
CMU SCS
20‟‟ intro to fractals
KDD'13 BIGMINE (c) 2013, C. Faloutsos 60
Self-similarity -> no char. scale
-> power laws, eg:
2x the radius,
3x the #neighbors
nn = C rlog3/log2
N(t)
E(t)
1.66
Reminder:
Densification P.L.
(2x nodes, ~3x edges)
CMU SCS
20‟‟ intro to fractals
KDD'13 BIGMINE (c) 2013, C. Faloutsos 61
Self-similarity -> no char. scale
-> power laws, eg:
2x the radius,
3x the #neighbors
nn = C rlog3/log2
2x the radius,
4x neighbors
nn = C rlog4/log2 = C r2
CMU SCS
20‟‟ intro to fractals
KDD'13 BIGMINE (c) 2013, C. Faloutsos 62
Self-similarity -> no char. scale
-> power laws, eg:
2x the radius,
3x the #neighbors
nn = C rlog3/log2
2x the radius,
4x neighbors
nn = C rlog4/log2 = C r2
Fractal dim.
=1.58
CMU SCS
20‟‟ intro to fractals
KDD'13 BIGMINE (c) 2013, C. Faloutsos 63
Self-similarity -> no char. scale
->power laws, eg:
2x the radius,
3x the #neighbors
nn = C rlog3/log2
2x the radius,
4x neighbors
nn = C rlog4/log2 = C r2
Fractal dim.
CMU SCS
How does self-similarity help in
graphs?
• A: RMAT/Kronecker generators
– With self-similarity, we get all power-laws,
automatically,
– And small/shrinking diameter
– And `no good cuts’
KDD'13 BIGMINE (c) 2013, C. Faloutsos 64
R-MAT: A Recursive Model for Graph Mining,
by D. Chakrabarti, Y. Zhan and C. Faloutsos,
SDM 2004, Orlando, Florida, USA
Realistic, Mathematically Tractable Graph Generation
and Evolution, Using Kronecker Multiplication,
by J. Leskovec, D. Chakrabarti, J. Kleinberg,
and C. Faloutsos, in PKDD 2005, Porto, Portugal
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 65
Graph gen.: Problem dfn
• Given a growing graph with count of nodes N1,
N2, …
• Generate a realistic sequence of graphs that will
obey all the patterns
– Static Patterns
S1 Power Law Degree Distribution
S2 Power Law eigenvalue and eigenvector distribution
Small Diameter
– Dynamic Patterns
T2 Growth Power Law (2x nodes; 3x edges)
T1 Shrinking/Stabilizing Diameters
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 66Adjacency matrix
Kronecker Graphs
Intermediate stage
Adjacency matrix
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 67Adjacency matrix
Kronecker Graphs
Intermediate stage
Adjacency matrix
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 68Adjacency matrix
Kronecker Graphs
Intermediate stage
Adjacency matrix
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 69
Kronecker Graphs
• Continuing multiplying with G1 we obtain G4and
so on …
G4 adjacency matrix
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 70
Kronecker Graphs
• Continuing multiplying with G1 we obtain G4and
so on …
G4 adjacency matrix
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 71
Kronecker Graphs
• Continuing multiplying with G1 we obtain G4and
so on …
G4 adjacency matrix
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 72
Kronecker Graphs
• Continuing multiplying with G1 we obtain G4and
so on …
G4 adjacency matrix
Holes within holes;
Communities
within communities
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 73
Properties:
• We can PROVE that
– Degree distribution is multinomial ~ power law
– Diameter: constant
– Eigenvalue distribution: multinomial
– First eigenvector: multinomial
new
Self-similarity -> power
laws
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 74
Problem Definition
• Given a growing graph with nodes N1, N2, …
• Generate a realistic sequence of graphs that will obey all
the patterns
– Static Patterns
Power Law Degree Distribution
Power Law eigenvalue and eigenvector distribution
Small Diameter
– Dynamic Patterns
Growth Power Law
Shrinking/Stabilizing Diameters
• First generator for which we can prove all these
properties





CMU SCS
Impact: Graph500
• Based on RMAT (= 2x2 Kronecker)
• Standard for graph benchmarks
• http://www.graph500.org/
• Competitions 2x year, with all major
entities: LLNL, Argonne, ITC-U. Tokyo,
Riken, ORNL, Sandia, PSC, …
KDD'13 BIGMINE (c) 2013, C. Faloutsos 75
R-MAT: A Recursive Model for Graph Mining,
by D. Chakrabarti, Y. Zhan and C. Faloutsos,
SDM 2004, Orlando, Florida, USA
To iterate is human, to recurse is devine
CMU SCS
(c) 2013, C. Faloutsos 76
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
– …
– Q1: Why so many power-laws?
– Q2: Why no ‘good cuts’?
• Part#2: Cascade analysis
• Conclusions
KDD'13 BIGMINE
A: real graphs ->
self similar ->
power laws
CMU SCS
Q2: Why „no good cuts‟?
• A: self-similarity
– Communities within communities within
communities …
KDD'13 BIGMINE (c) 2013, C. Faloutsos 77
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 78
Kronecker Product – a Graph
• Continuing multiplying with G1 we obtain G4and
so on …
G4 adjacency matrix
REMINDER
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 79
Kronecker Product – a Graph
• Continuing multiplying with G1 we obtain G4and
so on …
G4 adjacency matrix
REMINDER
Communities within
communities within
communities …
„Linux users‟
„Mac users‟
„win users‟
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 80
Kronecker Product – a Graph
• Continuing multiplying with G1 we obtain G4and
so on …
G4 adjacency matrix
REMINDER
Communities within
communities within
communities …
How many
Communities?
3?
9?
27?
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 81
Kronecker Product – a Graph
• Continuing multiplying with G1 we obtain G4and
so on …
G4 adjacency matrix
REMINDER
Communities within
communities within
communities …
How many
Communities?
3?
9?
27?
A: one – but
not a typical,
block-like
community…
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 82
Communities? (Gaussian)
Clusters?
Piece-wise
flat parts?
age
# songs
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 83
Wrong questions to ask!
age
# songs
CMU SCS
Summary of Part#1
• *many* patterns in real graphs
– Small & shrinking diameters
– Power-laws everywhere
– Gaussian trap
– ‘no good cuts’
• Self-similarity (RMAT/Kronecker): good
model
KDD'13 BIGMINE (c) 2013, C. Faloutsos 84
CMU SCS
KDD'13 BIGMINE (c) 2013, C. Faloutsos 85
Part 2:
Cascades &
Immunization
CMU SCS
Why do we care?
• Information Diffusion
• Viral Marketing
• Epidemiology and Public Health
• Cyber Security
• Human mobility
• Games and Virtual Worlds
• Ecology
• ........
(c) 2013, C. Faloutsos 86KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 87
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
• Part#2: Cascade analysis
– (Fractional) Immunization
– Epidemic thresholds
• Conclusions
KDD'13 BIGMINE
CMU SCS
Fractional Immunization of Networks
B. Aditya Prakash, LadaAdamic, Theodore
Iwashyna (M.D.), Hanghang Tong,
Christos Faloutsos
SDM 2013, Austin, TX
(c) 2013, C. Faloutsos 88KDD'13 BIGMINE
CMU SCS
Whom to immunize?
• Dynamical Processes over networks
•Each circle is a hospital
• ~3,000 hospitals
• More than 30,000 patients
transferred
[US-MEDICARE
NETWORK 2005]
Problem: Given k units of
disinfectant, whom to immunize?
(c) 2013, C. Faloutsos 89KDD'13 BIGMINE
CMU SCS
Whom to immunize?
CURRENT PRACTICE OUR METHOD
[US-MEDICARE
NETWORK 2005]
~6x
fewer!
(c) 2013, C. Faloutsos 90KDD'13 BIGMINE
Hospital-acquired inf. : 99K+ lives, $5B+ per year
CMU SCS
Fractional Asymmetric Immunization
Hospital Another
Hospital
Drug-resistant Bacteria
(like XDR-TB)
(c) 2013, C. Faloutsos 91KDD'13 BIGMINE
CMU SCS
Fractional Asymmetric Immunization
Hospital Another
Hospital
(c) 2013, C. Faloutsos 92KDD'13 BIGMINE
CMU SCS
Fractional Asymmetric Immunization
Hospital Another
Hospital
(c) 2013, C. Faloutsos 93KDD'13 BIGMINE
CMU SCS
Fractional Asymmetric Immunization
Hospital Another
Hospital
Problem:
Given k units of disinfectant,
distribute them
to maximize hospitals saved
(c) 2013, C. Faloutsos 94KDD'13 BIGMINE
CMU SCS
Fractional Asymmetric Immunization
Hospital Another
Hospital
Problem:
Given k units of disinfectant,
distribute them
to maximize hospitals saved @ 365 days
(c) 2013, C. Faloutsos 95KDD'13 BIGMINE
CMU SCS
Straightforward solution:
Simulation:
1. Distribute resources
2. ‘infect’ a few nodes
3. Simulate evolution of spreading
– (10x, take avg)
4. Tweak, and repeat step 1
KDD'13 BIGMINE (c) 2013, C. Faloutsos 96
CMU SCS
Straightforward solution:
Simulation:
1. Distribute resources
2. ‘infect’ a few nodes
3. Simulate evolution of spreading
– (10x, take avg)
4. Tweak, and repeat step 1
KDD'13 BIGMINE (c) 2013, C. Faloutsos 97
CMU SCS
Straightforward solution:
Simulation:
1. Distribute resources
2. ‘infect’ a few nodes
3. Simulate evolution of spreading
– (10x, take avg)
4. Tweak, and repeat step 1
KDD'13 BIGMINE (c) 2013, C. Faloutsos 98
CMU SCS
Straightforward solution:
Simulation:
1. Distribute resources
2. ‘infect’ a few nodes
3. Simulate evolution of spreading
– (10x, take avg)
4. Tweak, and repeat step 1
KDD'13 BIGMINE (c) 2013, C. Faloutsos 99
CMU SCS
Running Time
Simulations SMART-ALLOC
> 1 week
Wall-Clock
Time
≈
14 secs
> 30,000x
speed-up!
better
(c) 2013, C. Faloutsos 100KDD'13 BIGMINE
CMU SCS
Experiments
K = 120
better
(c) 2013, C. Faloutsos 101KDD'13 BIGMINE
# epochs
# infected
uniform
SMART-ALLOC
CMU SCS
What is the „silver bullet‟?
A: Try to decrease connectivity of graph
Q: how to measure connectivity?
– Avg degree? Max degree?
– Std degree / avg degree ?
– Diameter?
– Modularity?
– ‘Conductance’ (~min cut size)?
– Some combination of above?
KDD'13 BIGMINE (c) 2013, C. Faloutsos 102
≈
14
secs
>
30,000x
speed-
up!
CMU SCS
What is the „silver bullet‟?
A: Try to decrease connectivity of graph
Q: how to measure connectivity?
A: first eigenvalue of adjacency matrix
Q1: why??
(Q2: dfn& intuition of eigenvalue ? )
KDD'13 BIGMINE (c) 2013, C. Faloutsos 103
Avg degree
Max degree
Diameter
Modularity
„Conductance‟
CMU SCS
Why eigenvalue?
A1: ‘G2’ theorem and ‘eigen-drop’:
• For (almost) any type of virus
• For any network
• -> no epidemic, if small-enough first
eigenvalue (λ1 ) of adjacency matrix
• Heuristic: for immunization, try to min λ1
• The smaller λ1, the closer to extinction.
KDD'13 BIGMINE (c) 2013, C. Faloutsos 104
Threshold Conditions for Arbitrary Cascade Models on
Arbitrary Networks, B. Aditya Prakash, Deepayan
Chakrabarti, Michalis Faloutsos, Nicholas Valler,
Christos Faloutsos, ICDM 2011, Vancouver, Canada
CMU SCS
Why eigenvalue?
A1: ‘G2’ theorem and ‘eigen-drop’:
• For (almost) any type of virus
• For any network
• -> no epidemic, if small-enough first
eigenvalue (λ1 ) of adjacency matrix
• Heuristic: for immunization, try to min λ1
• The smaller λ1, the closer to extinction.
KDD'13 BIGMINE (c) 2013, C. Faloutsos 105
CMU SCS
Threshold Conditions for Arbitrary Cascade
Models on Arbitrary Networks
B. Aditya Prakash, Deepayan Chakrabarti,
Michalis Faloutsos, Nicholas Valler,
Christos Faloutsos
IEEE ICDM 2011, Vancouver
extended version, in arxiv
http://arxiv.org/abs/1004.0060
G2 theorem
~10 pages proof
CMU SCS
Our thresholds for some models
• s = effective strength
• s< 1 : below threshold
(c) 2013, C. Faloutsos 107KDD'13 BIGMINE
Models Effective Strength
(s)
Threshold (tipping
point)
SIS, SIR, SIRS,
SEIR
s = λ .
s = 1
SIV, SEIV s = λ .
(H.I.V.) s = λ . 12
221
vv
v
2121 VVISI
CMU SCS
Our thresholds for some models
• s = effective strength
• s< 1 : below threshold
(c) 2013, C. Faloutsos 108KDD'13 BIGMINE
Models Effective Strength
(s)
Threshold (tipping
point)
SIS, SIR, SIRS,
SEIR
s = λ .
s = 1
SIV, SEIV s = λ .
(H.I.V.) s = λ . 12
221
vv
v
2121 VVISI
No
immunity
Temp.
immunity
w/
incubation
CMU SCS
(c) 2013, C. Faloutsos 109
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
• Part#2: Cascade analysis
– (Fractional) Immunization
– intuition behind λ1
• Conclusions
KDD'13 BIGMINE
CMU SCS
Intuition for λ
“Official” definitions:
• Let A be the adjacency
matrix. Then λ is the root
with the largest magnitude of
the characteristic polynomial
of A [det(A – xI)].
• Also: Ax = x
Neither gives much intuition!
“Un-official” Intuition
• For ‘homogeneous’
graphs, λ == degree
• λ ~ avg degree
– done right, for
skewed degree
distributions
(c) 2013, C. Faloutsos 110KDD'13 BIGMINE
CMU SCS
Largest Eigenvalue (λ)
λ ≈ 2 λ = N λ = N-1
N = 1000 nodes
λ ≈ 2 λ= 31.67 λ= 999
better connectivity higher λ
(c) 2013, C. Faloutsos 111KDD'13 BIGMINE
CMU SCS
Largest Eigenvalue (λ)
λ ≈ 2 λ = N λ = N-1
N = 1000 nodes
λ ≈ 2 λ= 31.67 λ= 999
better connectivity higher λ
(c) 2013, C. Faloutsos 112KDD'13 BIGMINE
CMU SCS
Examples: Simulations – SIR (mumps)
FractionofInfections
Footprint Effective StrengthTime ticks
(c) 2013, C. Faloutsos 113KDD'13 BIGMINE
(a) Infection profile (b) “Take-off” plot
PORTLAND graph: synthetic population,
31 million links, 6 million nodes
CMU SCS
Examples: Simulations – SIRS
(pertusis)
FractionofInfections
Footprint
Effective StrengthTime ticks
(c) 2013, C. Faloutsos 114KDD'13 BIGMINE
(a) Infection profile (b) “Take-off” plot
PORTLAND graph: synthetic population,
31 million links, 6 million nodes
CMU SCS
Immunization - conclusion
In (almost any) immunization setting,
• Allocate resources, such that to
• Minimize λ1
• (regardless of virus specifics)
• Conversely, in a market penetration
setting
– Allocate resources to
– Maximize λ1KDD'13 BIGMINE (c) 2013, C. Faloutsos 115
CMU SCS
(c) 2013, C. Faloutsos 116
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
• Part#2: Cascade analysis
– (Fractional) Immunization
– Epidemic thresholds
• What next?
• Acks& Conclusions
• [Tools: ebay fraud; tensors; spikes]
KDD'13 BIGMINE
CMU SCS
Challenge #1: „Connectome‟ –
brain wiring
KDD'13 BIGMINE (c) 2013, C. Faloutsos 117
• Which neurons get activated by ‘bee’
• How wiring evolves
• Modeling epilepsy
N. Sidiropoulos
George Karypis
V. Papalexakis
Tom Mitchell
CMU SCS
Challenge#2: Time evolving
networks / tensors
• Periodicities? Burstiness?
• What is ‘typical’ behavior of a node, over time
• Heterogeneous graphs (= nodes w/ attributes)
KDD'13 BIGMINE (c) 2013, C. Faloutsos 118
…
CMU SCS
(c) 2013, C. Faloutsos 119
Roadmap
• Introduction – Motivation
• Part#1: Patterns in graphs
• Part#2: Cascade analysis
– (Fractional) Immunization
– Epidemic thresholds
• Acks& Conclusions
• [Tools: ebay fraud; tensors; spikes]
KDD'13 BIGMINE
Off line
CMU SCS
(c) 2013, C. Faloutsos 120
Thanks
KDD'13 BIGMINE
Thanks to: NSF IIS-0705359, IIS-0534205,
CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT,
Google, INTEL, HP, iLab
CMU SCS
(c) 2013, C. Faloutsos 121
Thanks
KDD'13 BIGMINE
Thanks to: NSF IIS-0705359, IIS-0534205,
CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT,
Google, INTEL, HP, iLab
Disclaimer: All opinions are mine; not necessarily reflecting
the opinions of the funding agencies
CMU SCS
(c) 2013, C. Faloutsos 122
Project info: PEGASUS
KDD'13 BIGMINE
www.cs.cmu.edu/~pegasus
Results on large graphs: with Pegasus +
hadoop + M45
Apache license
Code, papers, manual, video
Prof. U Kang Prof. Polo Chau
CMU SCS
(c) 2013, C. Faloutsos 123
Cast
Akoglu,
Leman
Chau,
Polo
Kang, U
McGlohon,
Mary
Tong,
Hanghang
Prakash,
Aditya
KDD'13 BIGMINE
Koutra,
Danai
Beutel,
Alex
Papalexakis,
Vagelis
CMU SCS
(c) 2013, C. Faloutsos 124
CONCLUSION#1 – Big data
• Large datasets reveal patterns/outliers that
are invisible otherwise
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 125
CONCLUSION#2 – self-similarity
• powerful tool / viewpoint
– Power laws; shrinking diameters
– Gaussian trap (eg., F.O.F.)
– ‘no good cuts’
– RMAT – graph500 generator
KDD'13 BIGMINE
CMU SCS
(c) 2013, C. Faloutsos 126
CONCLUSION#3 – eigen-drop
• Cascades & immunization: G2 theorem
&eigenvalue
KDD'13 BIGMINE
CURRENT PRACTICE OUR METHOD
[US-MEDICARE
NETWORK 2005]
~6x
fewer!
≈
14
secs
>
30,000x
speed-
up!
CMU SCS
(c) 2013, C. Faloutsos 127
References
• D. Chakrabarti, C. Faloutsos: Graph Mining – Laws,
Tools and Case Studies, Morgan Claypool 2012
• http://www.morganclaypool.com/doi/abs/10.2200/S004
49ED1V01Y201209DMK006
KDD'13 BIGMINE
CMU SCS
TAKE HOME MESSAGE:
Cross-disciplinarity
KDD'13 BIGMINE (c) 2013, C. Faloutsos 128
CMU SCS
TAKE HOME MESSAGE:
Cross-disciplinarity
KDD'13 BIGMINE (c) 2013, C. Faloutsos 129
QUESTIONS?

More Related Content

Viewers also liked

Gut vernetzt: Skalierbares Graph Mining für Business Intelligence
Gut vernetzt: Skalierbares Graph Mining für Business IntelligenceGut vernetzt: Skalierbares Graph Mining für Business Intelligence
Gut vernetzt: Skalierbares Graph Mining für Business IntelligenceMartin Junghanns
 
Social Media Mining - Chapter 2 (Graph Essentials)
Social Media Mining - Chapter 2 (Graph Essentials)Social Media Mining - Chapter 2 (Graph Essentials)
Social Media Mining - Chapter 2 (Graph Essentials)SocialMediaMining
 
Graph mining 2: Statistical approaches for graph mining
Graph mining 2: Statistical approaches for graph miningGraph mining 2: Statistical approaches for graph mining
Graph mining 2: Statistical approaches for graph miningtuxette
 
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...Xiaohan Zeng
 
Social Network Analysis in Two Parts
Social Network Analysis in Two PartsSocial Network Analysis in Two Parts
Social Network Analysis in Two PartsPatti Anklam
 
Mining the social graph
Mining the social graphMining the social graph
Mining the social graphshunya kimura
 
Social Network Analysis (SNA) and its implications for knowledge discovery in...
Social Network Analysis (SNA) and its implications for knowledge discovery in...Social Network Analysis (SNA) and its implications for knowledge discovery in...
Social Network Analysis (SNA) and its implications for knowledge discovery in...ACMBangalore
 
Trends In Graph Data Management And Mining
Trends In Graph Data Management And MiningTrends In Graph Data Management And Mining
Trends In Graph Data Management And MiningSrinath Srinivasa
 
Graph mining seminar_2009
Graph mining seminar_2009Graph mining seminar_2009
Graph mining seminar_2009Houw Liong The
 
Mining Social Web APIs with IPython Notebook (PyCon 2014)
Mining Social Web APIs with IPython Notebook (PyCon 2014)Mining Social Web APIs with IPython Notebook (PyCon 2014)
Mining Social Web APIs with IPython Notebook (PyCon 2014)Matthew Russell
 
Complex and Social Network Analysis in Python
Complex and Social Network Analysis in PythonComplex and Social Network Analysis in Python
Complex and Social Network Analysis in Pythonrik0
 
Kick start graph visualization projects
Kick start graph visualization projectsKick start graph visualization projects
Kick start graph visualization projectsLinkurious
 
Prof. Hendrik Speck - Social Network Analysis
Prof. Hendrik Speck - Social Network AnalysisProf. Hendrik Speck - Social Network Analysis
Prof. Hendrik Speck - Social Network AnalysisHendrik Speck
 
Large Graph Mining
Large Graph MiningLarge Graph Mining
Large Graph MiningSabri Skhiri
 
Introduction to Social Network Analysis
Introduction to Social Network AnalysisIntroduction to Social Network Analysis
Introduction to Social Network AnalysisPatti Anklam
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...Martin Junghanns
 
Social Network Analysis: applications for education research
Social Network Analysis: applications for education researchSocial Network Analysis: applications for education research
Social Network Analysis: applications for education researchChristian Bokhove
 

Viewers also liked (19)

Gut vernetzt: Skalierbares Graph Mining für Business Intelligence
Gut vernetzt: Skalierbares Graph Mining für Business IntelligenceGut vernetzt: Skalierbares Graph Mining für Business Intelligence
Gut vernetzt: Skalierbares Graph Mining für Business Intelligence
 
Graph mining
Graph miningGraph mining
Graph mining
 
Social Media Mining - Chapter 2 (Graph Essentials)
Social Media Mining - Chapter 2 (Graph Essentials)Social Media Mining - Chapter 2 (Graph Essentials)
Social Media Mining - Chapter 2 (Graph Essentials)
 
Graph mining 2: Statistical approaches for graph mining
Graph mining 2: Statistical approaches for graph miningGraph mining 2: Statistical approaches for graph mining
Graph mining 2: Statistical approaches for graph mining
 
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
Social Network Analysis: What It Is, Why We Should Care, and What We Can Lear...
 
Social Network Analysis in Two Parts
Social Network Analysis in Two PartsSocial Network Analysis in Two Parts
Social Network Analysis in Two Parts
 
Mining the social graph
Mining the social graphMining the social graph
Mining the social graph
 
Social Network Analysis (SNA) and its implications for knowledge discovery in...
Social Network Analysis (SNA) and its implications for knowledge discovery in...Social Network Analysis (SNA) and its implications for knowledge discovery in...
Social Network Analysis (SNA) and its implications for knowledge discovery in...
 
Trends In Graph Data Management And Mining
Trends In Graph Data Management And MiningTrends In Graph Data Management And Mining
Trends In Graph Data Management And Mining
 
Graph mining seminar_2009
Graph mining seminar_2009Graph mining seminar_2009
Graph mining seminar_2009
 
Lect12 graph mining
Lect12 graph miningLect12 graph mining
Lect12 graph mining
 
Mining Social Web APIs with IPython Notebook (PyCon 2014)
Mining Social Web APIs with IPython Notebook (PyCon 2014)Mining Social Web APIs with IPython Notebook (PyCon 2014)
Mining Social Web APIs with IPython Notebook (PyCon 2014)
 
Complex and Social Network Analysis in Python
Complex and Social Network Analysis in PythonComplex and Social Network Analysis in Python
Complex and Social Network Analysis in Python
 
Kick start graph visualization projects
Kick start graph visualization projectsKick start graph visualization projects
Kick start graph visualization projects
 
Prof. Hendrik Speck - Social Network Analysis
Prof. Hendrik Speck - Social Network AnalysisProf. Hendrik Speck - Social Network Analysis
Prof. Hendrik Speck - Social Network Analysis
 
Large Graph Mining
Large Graph MiningLarge Graph Mining
Large Graph Mining
 
Introduction to Social Network Analysis
Introduction to Social Network AnalysisIntroduction to Social Network Analysis
Introduction to Social Network Analysis
 
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
Gradoop: Scalable Graph Analytics with Apache Flink @ Flink & Neo4j Meetup Be...
 
Social Network Analysis: applications for education research
Social Network Analysis: applications for education researchSocial Network Analysis: applications for education research
Social Network Analysis: applications for education research
 

Similar to Large Graph Mining – Patterns, tools and cascade analysis by Christos Faloutsos

Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLco...
Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLco...Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLco...
Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLco...MLconf
 
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfcookie1969
 
The shortest path is not always a straight line
The shortest path is not always a straight lineThe shortest path is not always a straight line
The shortest path is not always a straight lineVasia Kalavri
 
critico planilla de analisis para poder desarrolar proyectos.pdf
critico planilla de analisis para poder desarrolar proyectos.pdfcritico planilla de analisis para poder desarrolar proyectos.pdf
critico planilla de analisis para poder desarrolar proyectos.pdfAlan Steeven Suárez Quimí
 
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14Yuichiro Yasui
 
Solid modelling Slide share academic writing assignment 2
Solid modelling Slide share academic writing assignment 2Solid modelling Slide share academic writing assignment 2
Solid modelling Slide share academic writing assignment 2somu12bemech
 
Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks
Leveraging Multiple GPUs and CPUs for  Graphlet Counting in Large Networks Leveraging Multiple GPUs and CPUs for  Graphlet Counting in Large Networks
Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks Ryan Rossi
 
Comparative Analysis of Efficient Designs of D Latch using 32nm CMOS Technology
Comparative Analysis of Efficient Designs of D Latch using 32nm CMOS TechnologyComparative Analysis of Efficient Designs of D Latch using 32nm CMOS Technology
Comparative Analysis of Efficient Designs of D Latch using 32nm CMOS Technologyijtsrd
 
Day 5 application of graph ,biconnectivity fdp on ds
Day 5 application of graph ,biconnectivity fdp on dsDay 5 application of graph ,biconnectivity fdp on ds
Day 5 application of graph ,biconnectivity fdp on dsGUNASUNDARISAPIIICSE
 
Massively distributed environments and closed itemset mining
Massively distributed environments and closed itemset miningMassively distributed environments and closed itemset mining
Massively distributed environments and closed itemset miningMehdi Zitouni
 
Vlsi physical design
Vlsi physical designVlsi physical design
Vlsi physical designI World Tech
 
Planning & Scheduling - Training
Planning & Scheduling - TrainingPlanning & Scheduling - Training
Planning & Scheduling - TrainingMohammed Feroze
 
Chapter_3_Project_Planning_Scheduling_an 2.ppt
Chapter_3_Project_Planning_Scheduling_an 2.pptChapter_3_Project_Planning_Scheduling_an 2.ppt
Chapter_3_Project_Planning_Scheduling_an 2.pptJastienAndreanaYaran
 
Implementation of D Flip Flop using CMOS Technology
Implementation of D Flip Flop using CMOS TechnologyImplementation of D Flip Flop using CMOS Technology
Implementation of D Flip Flop using CMOS Technologyijtsrd
 
Basics Planning PERT.ppt
Basics Planning PERT.pptBasics Planning PERT.ppt
Basics Planning PERT.pptKAhmedRehman
 

Similar to Large Graph Mining – Patterns, tools and cascade analysis by Christos Faloutsos (20)

Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLco...
Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLco...Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLco...
Danai Koutra – CMU/Technicolor Researcher, Carnegie Mellon University at MLco...
 
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdfHailey_Database_Performance_Made_Easy_through_Graphics.pdf
Hailey_Database_Performance_Made_Easy_through_Graphics.pdf
 
The shortest path is not always a straight line
The shortest path is not always a straight lineThe shortest path is not always a straight line
The shortest path is not always a straight line
 
critico planilla de analisis para poder desarrolar proyectos.pdf
critico planilla de analisis para poder desarrolar proyectos.pdfcritico planilla de analisis para poder desarrolar proyectos.pdf
critico planilla de analisis para poder desarrolar proyectos.pdf
 
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
Graph500 and Green Graph500 benchmarks on SGI UV2000 @ SGI UG SC14
 
Solid modelling Slide share academic writing assignment 2
Solid modelling Slide share academic writing assignment 2Solid modelling Slide share academic writing assignment 2
Solid modelling Slide share academic writing assignment 2
 
Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks
Leveraging Multiple GPUs and CPUs for  Graphlet Counting in Large Networks Leveraging Multiple GPUs and CPUs for  Graphlet Counting in Large Networks
Leveraging Multiple GPUs and CPUs for Graphlet Counting in Large Networks
 
Comparative Analysis of Efficient Designs of D Latch using 32nm CMOS Technology
Comparative Analysis of Efficient Designs of D Latch using 32nm CMOS TechnologyComparative Analysis of Efficient Designs of D Latch using 32nm CMOS Technology
Comparative Analysis of Efficient Designs of D Latch using 32nm CMOS Technology
 
Lect21
Lect21Lect21
Lect21
 
Cad notes
Cad notesCad notes
Cad notes
 
Day 5 application of graph ,biconnectivity fdp on ds
Day 5 application of graph ,biconnectivity fdp on dsDay 5 application of graph ,biconnectivity fdp on ds
Day 5 application of graph ,biconnectivity fdp on ds
 
Massively distributed environments and closed itemset mining
Massively distributed environments and closed itemset miningMassively distributed environments and closed itemset mining
Massively distributed environments and closed itemset mining
 
Vlsi physical design
Vlsi physical designVlsi physical design
Vlsi physical design
 
Planning & Scheduling - Training
Planning & Scheduling - TrainingPlanning & Scheduling - Training
Planning & Scheduling - Training
 
RBootcam Day 2
RBootcam Day 2RBootcam Day 2
RBootcam Day 2
 
Guru-PD.ppt
Guru-PD.pptGuru-PD.ppt
Guru-PD.ppt
 
Chapter_3_Project_Planning_Scheduling_an 2.ppt
Chapter_3_Project_Planning_Scheduling_an 2.pptChapter_3_Project_Planning_Scheduling_an 2.ppt
Chapter_3_Project_Planning_Scheduling_an 2.ppt
 
Computer graphics
Computer graphicsComputer graphics
Computer graphics
 
Implementation of D Flip Flop using CMOS Technology
Implementation of D Flip Flop using CMOS TechnologyImplementation of D Flip Flop using CMOS Technology
Implementation of D Flip Flop using CMOS Technology
 
Basics Planning PERT.ppt
Basics Planning PERT.pptBasics Planning PERT.ppt
Basics Planning PERT.ppt
 

More from BigMine

Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...
Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...
Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...BigMine
 
From Practice to Theory in Learning from Massive Data by Charles Elkan at Big...
From Practice to Theory in Learning from Massive Data by Charles Elkan at Big...From Practice to Theory in Learning from Massive Data by Charles Elkan at Big...
From Practice to Theory in Learning from Massive Data by Charles Elkan at Big...BigMine
 
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16BigMine
 
Big Data and Small Devices by Katharina Morik
Big Data and Small Devices by Katharina MorikBig Data and Small Devices by Katharina Morik
Big Data and Small Devices by Katharina MorikBigMine
 
Exact Data Reduction for Big Data by Jieping Ye
Exact Data Reduction for Big Data by Jieping YeExact Data Reduction for Big Data by Jieping Ye
Exact Data Reduction for Big Data by Jieping YeBigMine
 
Unexpected Challenges in Large Scale Machine Learning by Charles Parker
 Unexpected Challenges in Large Scale Machine Learning by Charles Parker Unexpected Challenges in Large Scale Machine Learning by Charles Parker
Unexpected Challenges in Large Scale Machine Learning by Charles ParkerBigMine
 
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...BigMine
 

More from BigMine (7)

Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...
Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...
Inside the Atoms: Mining a Network of Networks and Beyond by HangHang Tong at...
 
From Practice to Theory in Learning from Massive Data by Charles Elkan at Big...
From Practice to Theory in Learning from Massive Data by Charles Elkan at Big...From Practice to Theory in Learning from Massive Data by Charles Elkan at Big...
From Practice to Theory in Learning from Massive Data by Charles Elkan at Big...
 
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
Foundations for Scaling ML in Apache Spark by Joseph Bradley at BigMine16
 
Big Data and Small Devices by Katharina Morik
Big Data and Small Devices by Katharina MorikBig Data and Small Devices by Katharina Morik
Big Data and Small Devices by Katharina Morik
 
Exact Data Reduction for Big Data by Jieping Ye
Exact Data Reduction for Big Data by Jieping YeExact Data Reduction for Big Data by Jieping Ye
Exact Data Reduction for Big Data by Jieping Ye
 
Unexpected Challenges in Large Scale Machine Learning by Charles Parker
 Unexpected Challenges in Large Scale Machine Learning by Charles Parker Unexpected Challenges in Large Scale Machine Learning by Charles Parker
Unexpected Challenges in Large Scale Machine Learning by Charles Parker
 
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
Big Data Analytics: Applications and Opportunities in On-line Predictive Mode...
 

Recently uploaded

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 

Large Graph Mining – Patterns, tools and cascade analysis by Christos Faloutsos

  • 1. CMU SCS Large Graph Mining - Patterns, Explanations and Cascade Analysis Christos Faloutsos CMU
  • 2. CMU SCS Thank you! • Wei FAN • Albert BIFET • Qiang YANG • Philip YU KDD'13 BIGMINE (c) 2013, C. Faloutsos 2
  • 3. CMU SCS (c) 2013, C. Faloutsos 3 Roadmap • Introduction – Motivation – Why study (big) graphs? • Part#1: Patterns in graphs • Part#2: Cascade analysis • Conclusions • [Extra: ebay fraud; tensors; spikes] KDD'13 BIGMINE
  • 4. CMU SCS (c) 2013, C. Faloutsos 4 Graphs - why should we care? Internet Map [lumeta.com] Food Web [Martinez ‟91] >$10B revenue >0.5B users KDD'13 BIGMINE
  • 5. CMU SCS (c) 2013, C. Faloutsos 5 Graphs - why should we care? • web-log (‘blog’) news propagation • computer network security: email/IP traffic and anomaly detection • Recommendation systems • .... • Many-to-many db relationship -> graph KDD'13 BIGMINE
  • 6. CMU SCS (c) 2013, C. Faloutsos 6 Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – Static graphs – Time-evolving graphs – Why so many power-laws? • Part#2: Cascade analysis • Conclusions KDD'13 BIGMINE
  • 7. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 7 Part 1: Patterns & Laws
  • 8. CMU SCS (c) 2013, C. Faloutsos 8 Laws and patterns • Q1: Are real graphs random? KDD'13 BIGMINE
  • 9. CMU SCS (c) 2013, C. Faloutsos 9 Laws and patterns • Q1: Are real graphs random? • A1: NO!! – Diameter – in- and out- degree distributions – other (surprising) patterns • Q2: why ‘no good cuts’? • A2: <self-similarity – stay tuned> • So, let’s look at the data KDD'13 BIGMINE
  • 10. CMU SCS (c) 2013, C. Faloutsos 10 Solution# S.1 • Power law in the degree distribution [SIGCOMM99] log(rank) log(degree) internet domains att.com ibm.com KDD'13 BIGMINE
  • 11. CMU SCS (c) 2013, C. Faloutsos 11 Solution# S.1 • Power law in the degree distribution [SIGCOMM99] log(rank) log(degree) -0.82 internet domains att.com ibm.com KDD'13 BIGMINE
  • 12. CMU SCS (c) 2013, C. Faloutsos 12 Solution# S.1 • Q: So what? log(rank) log(degree) -0.82 internet domains att.com ibm.com KDD'13 BIGMINE
  • 13. CMU SCS (c) 2013, C. Faloutsos 13 Solution# S.1 • Q: So what? • A1: # of two-step-away pairs: log(rank) log(degree) -0.82 internet domains att.com ibm.com KDD'13 BIGMINE
  • 14. CMU SCS (c) 2013, C. Faloutsos 14 Solution# S.1 • Q: So what? • A1: # of two-step-away pairs: O(d_max ^2) ~ 10M^2 log(rank) log(degree) -0.82 internet domains att.com ibm.com KDD'13 BIGMINE ~0.8PB -> a data center(!) DCO @ CMU Gaussian trap
  • 15. CMU SCS (c) 2013, C. Faloutsos 15 Solution# S.1 • Q: So what? • A1: # of two-step-away pairs: O(d_max ^2) ~ 10M^2 log(rank) log(degree) -0.82 internet domains att.com ibm.com KDD'13 BIGMINE ~0.8PB -> a data center(!) Gaussian trap
  • 16. CMU SCS (c) 2013, C. Faloutsos 16 Solution# S.2: Eigen Exponent E • A2: power law in the eigenvalues of the adjacency matrix E = -0.48 Exponent = slope Eigenvalue Rank of decreasing eigenvalue May 2001 KDD'13 BIGMINE A x = x
  • 17. CMU SCS (c) 2013, C. Faloutsos 17 Roadmap • Introduction – Motivation • Problem#1: Patterns in graphs – Static graphs • degree, diameter, eigen, • Triangles – Time evolving graphs • Problem#2: Tools KDD'13 BIGMINE
  • 18. CMU SCS (c) 2013, C. Faloutsos 18 Solution# S.3: Triangle „Laws‟ • Real social networks have a lot of triangles KDD'13 BIGMINE
  • 19. CMU SCS (c) 2013, C. Faloutsos 19 Solution# S.3: Triangle „Laws‟ • Real social networks have a lot of triangles – Friends of friends are friends • Any patterns? – 2x the friends, 2x the triangles ? KDD'13 BIGMINE
  • 20. CMU SCS (c) 2013, C. Faloutsos 20 Triangle Law: #S.3 [Tsourakakis ICDM 2008] SNReuters Epinions X-axis: degree Y-axis: mean # triangles n friends -> ~n1.6 triangles KDD'13 BIGMINE
  • 21. CMU SCS (c) 2013, C. Faloutsos 21 Triangle Law: Computations [Tsourakakis ICDM 2008] But: triangles are expensive to compute (3-way join; several approx. algos) – O(dmax 2) Q: Can we do that quickly? A: details KDD'13 BIGMINE
  • 22. CMU SCS (c) 2013, C. Faloutsos 22 Triangle Law: Computations [Tsourakakis ICDM 2008] But: triangles are expensive to compute (3-way join; several approx. algos) – O(dmax 2) Q: Can we do that quickly? A: Yes! #triangles = 1/6 Sum ( i 3 ) (and, because of skewness (S2) , we only need the top few eigenvalues! - O(E) details KDD'13 BIGMINE A x = x
  • 23. CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 23KDD'13 BIGMINE 23(c) 2013, C. Faloutsos ? ? ?
  • 24. CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 24KDD'13 BIGMINE 24(c) 2013, C. Faloutsos
  • 25. CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 25KDD'13 BIGMINE 25(c) 2013, C. Faloutsos
  • 26. CMU SCS Triangle counting for large graphs? Anomalous nodes in Twitter(~ 3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11] 26KDD'13 BIGMINE 26(c) 2013, C. Faloutsos
  • 27. CMU SCS (c) 2013, C. Faloutsos 27 Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – Static graphs • Power law degrees; eigenvalues; triangles • Anti-pattern: NO good cuts! – Time-evolving graphs • …. • Conclusions KDD'13 BIGMINE
  • 28. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 28 Background: Graph cut problem • Given a graph, and k • Break it into k (disjoint) communities
  • 29. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 29 Graph cut problem • Given a graph, and k • Break it into k (disjoint) communities • (assume: block diagonal = ‘cavemen’ graph) k = 2
  • 30. CMU SCS Many algo‟s for graph partitioning • METIS [Karypis, Kumar +] • 2nd eigenvector of Laplacian • Modularity-based [Girwan+Newman] • Max flow [Flake+] • … • … • … KDD'13 BIGMINE (c) 2013, C. Faloutsos 30
  • 31. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 31 Strange behavior of min cuts • Subtle details: next – Preliminaries: min-cut plots of ‘usual’ graphs NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti, Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004 Workshop on Link Analysis, Counter-terrorism and Privacy Statistical Properties of Community Structure in Large Social and Information Networks, J. Leskovec, K. Lang, A. Dasgupta, M. Mahoney. WWW 2008.
  • 32. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 32 “Min-cut” plot • Do min-cuts recursively. log (# edges) log (mincut-size / #edges) N nodes Mincut size = sqrt(N)
  • 33. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 33 “Min-cut” plot • Do min-cuts recursively. log (# edges) log (mincut-size / #edges) N nodes New min-cut
  • 34. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 34 “Min-cut” plot • Do min-cuts recursively. log (# edges) log (mincut-size / #edges) N nodes New min-cut Slope = -0.5 Better cut
  • 35. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 35 “Min-cut” plot log (# edges) log (mincut-size / #edges) Slope = -1/d For a d-dimensional grid, the slope is -1/d log (# edges) log (mincut-size / #edges) For a random graph (and clique), the slope is 0
  • 36. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 36 Experiments • Datasets: – Google Web Graph: 916,428 nodes and 5,105,039 edges – Lucent Router Graph: Undirected graph of network routers from www.isi.edu/scan/mercator/maps.html; 112,969 nodes and 181,639 edges – User  Website Clickstream Graph: 222,704 nodes and 952,580 edges NetMine: New Mining Tools for Large Graphs, by D. Chakrabarti, Y. Zhan, D. Blandford, C. Faloutsos and G. Blelloch, in the SDM 2004 Workshop on Link Analysis, Counter-terrorism and Privacy
  • 37. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 37 “Min-cut” plot • What does it look like for a real-world graph? log (# edges) log (mincut-size / #edges) ?
  • 38. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 38 Experiments • Used the METIS algorithm [Karypis, Kumar, 1995] log (# edges) log(mincut-size/#edges) •Google Web graph • Values along the y- axis are averaged •“lip” for large # edges • Slope of -0.4, corresponds to a 2.5- dimensional grid! Slope~ -0.4 log (# edges) log (mincut-size / #edges)
  • 39. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 39 Experiments • Used the METIS algorithm [Karypis, Kumar, 1995] log (# edges) log(mincut-size/#edges) •Google Web graph • Values along the y- axis are averaged •“lip” for large # edges • Slope of -0.4, corresponds to a 2.5- dimensional grid! Slope~ -0.4 log (# edges) log (mincut-size / #edges) Better cut
  • 40. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 40 Experiments • Same results for other graphs too… log (# edges) log (# edges) log(mincut-size/#edges) log(mincut-size/#edges) Lucent Router graph Clickstream graph Slope~ -0.57 Slope~ -0.45
  • 41. CMU SCS Why no good cuts? • Answer: self-similarity (few foils later) KDD'13 BIGMINE (c) 2013, C. Faloutsos 41
  • 42. CMU SCS (c) 2013, C. Faloutsos 42 Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – Static graphs – Time-evolving graphs – Why so many power-laws? • Part#2: Cascade analysis • Conclusions KDD'13 BIGMINE
  • 43. CMU SCS (c) 2013, C. Faloutsos 43 Problem: Time evolution • with Jure Leskovec (CMU -> Stanford) • and Jon Kleinberg (Cornell – sabb. @ CMU) KDD'13 BIGMINE Jure Leskovec, Jon Kleinberg and Christos Faloutsos: Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005
  • 44. CMU SCS (c) 2013, C. Faloutsos 44 T.1 Evolution of the Diameter • Prior work on Power Law graphs hints atslowly growing diameter: – [diameter ~ O( N1/3)] – diameter ~ O(log N) – diameter ~ O(log log N) • What is happening in real data? KDD'13 BIGMINE diameter
  • 45. CMU SCS (c) 2013, C. Faloutsos 45 T.1 Evolution of the Diameter • Prior work on Power Law graphs hints atslowly growing diameter: – [diameter ~ O( N1/3)] – diameter ~ O(log N) – diameter ~ O(log log N) • What is happening in real data? • Diameter shrinks over time KDD'13 BIGMINE
  • 46. CMU SCS (c) 2013, C. Faloutsos 46 T.1 Diameter – “Patents” • Patent citation network • 25 years of data • @1999 – 2.9 M nodes – 16.5 M edges time [years] diameter KDD'13 BIGMINE
  • 47. CMU SCS (c) 2013, C. Faloutsos 47 T.2 Temporal Evolution of the Graphs • N(t) … nodes at time t • E(t) … edges at time t • Suppose that N(t+1) = 2 * N(t) • Q: what is your guess for E(t+1) =? 2 * E(t) KDD'13 BIGMINE Say, k friends on average k
  • 48. CMU SCS (c) 2013, C. Faloutsos 48 T.2 Temporal Evolution of the Graphs • N(t) … nodes at time t • E(t) … edges at time t • Suppose that N(t+1) = 2 * N(t) • Q: what is your guess for E(t+1) =? 2 * E(t) • A: over-doubled! ~ 3x – But obeying the ``Densification Power Law’’ KDD'13 BIGMINE Say, k friends on average Gaussian trap
  • 49. CMU SCS (c) 2013, C. Faloutsos 49 T.2 Temporal Evolution of the Graphs • N(t) … nodes at time t • E(t) … edges at time t • Suppose that N(t+1) = 2 * N(t) • Q: what is your guess for E(t+1) =? 2 * E(t) • A: over-doubled! ~ 3x – But obeying the ``Densification Power Law’’ KDD'13 BIGMINE Say, k friends on average Gaussian trap ✗ ✔ log log lin lin
  • 50. CMU SCS (c) 2013, C. Faloutsos 50 T.2 Densification – Patent Citations • Citations among patents granted • @1999 – 2.9 M nodes – 16.5 M edges • Each year is a datapoint N(t) E(t) 1.66 KDD'13 BIGMINE
  • 51. CMU SCS MORE Graph Patterns KDD'13 BIGMINE (c) 2013, C. Faloutsos 51 RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos. PKDD‟09.
  • 52. CMU SCS MORE Graph Patterns KDD'13 BIGMINE (c) 2013, C. Faloutsos 52 ✔ ✔ ✔ ✔ ✔ RTG: A Recursive Realistic Graph Generator using Random Typing Leman Akoglu and Christos Faloutsos. PKDD‟09.
  • 53. CMU SCS MORE Graph Patterns KDD'13 BIGMINE (c) 2013, C. Faloutsos 53 • Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. in "Social Network Data Analytics” (Ed.: CharuAggarwal) • Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case StudiesOct. 2012, Morgan Claypool.
  • 54. CMU SCS (c) 2013, C. Faloutsos 54 Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – … – Why so many power-laws? – Why no ‘good cuts’? • Part#2: Cascade analysis • Conclusions KDD'13 BIGMINE
  • 55. CMU SCS 2 Questions, one answer • Q1: why so many power laws • Q2: why no ‘good cuts’? KDD'13 BIGMINE (c) 2013, C. Faloutsos 55
  • 56. CMU SCS 2 Questions, one answer • Q1: why so many power laws • Q2: why no ‘good cuts’? • A: Self-similarity = fractals = ‘RMAT’ ~ ‘Kronecker graphs’ KDD'13 BIGMINE (c) 2013, C. Faloutsos 56 possible
  • 57. CMU SCS 20‟‟ intro to fractals • Remove the middle triangle; repeat • ->Sierpinski triangle • (Bonus question - dimensionality? – >1 (inf. perimeter – (4/3)∞ ) – <2 (zero area – (3/4) ∞ ) KDD'13 BIGMINE (c) 2013, C. Faloutsos 57 …
  • 58. CMU SCS 20‟‟ intro to fractals KDD'13 BIGMINE (c) 2013, C. Faloutsos 58 Self-similarity -> no char. scale -> power laws, eg: 2x the radius, 3x the #neighbors nn(r) nn(r) = C rlog3/log2
  • 59. CMU SCS 20‟‟ intro to fractals KDD'13 BIGMINE (c) 2013, C. Faloutsos 59 Self-similarity ->no char. scale -> power laws, eg: 2x the radius, 3x the #neighbors nn(r) nn(r) = C rlog3/log2
  • 60. CMU SCS 20‟‟ intro to fractals KDD'13 BIGMINE (c) 2013, C. Faloutsos 60 Self-similarity -> no char. scale -> power laws, eg: 2x the radius, 3x the #neighbors nn = C rlog3/log2 N(t) E(t) 1.66 Reminder: Densification P.L. (2x nodes, ~3x edges)
  • 61. CMU SCS 20‟‟ intro to fractals KDD'13 BIGMINE (c) 2013, C. Faloutsos 61 Self-similarity -> no char. scale -> power laws, eg: 2x the radius, 3x the #neighbors nn = C rlog3/log2 2x the radius, 4x neighbors nn = C rlog4/log2 = C r2
  • 62. CMU SCS 20‟‟ intro to fractals KDD'13 BIGMINE (c) 2013, C. Faloutsos 62 Self-similarity -> no char. scale -> power laws, eg: 2x the radius, 3x the #neighbors nn = C rlog3/log2 2x the radius, 4x neighbors nn = C rlog4/log2 = C r2 Fractal dim. =1.58
  • 63. CMU SCS 20‟‟ intro to fractals KDD'13 BIGMINE (c) 2013, C. Faloutsos 63 Self-similarity -> no char. scale ->power laws, eg: 2x the radius, 3x the #neighbors nn = C rlog3/log2 2x the radius, 4x neighbors nn = C rlog4/log2 = C r2 Fractal dim.
  • 64. CMU SCS How does self-similarity help in graphs? • A: RMAT/Kronecker generators – With self-similarity, we get all power-laws, automatically, – And small/shrinking diameter – And `no good cuts’ KDD'13 BIGMINE (c) 2013, C. Faloutsos 64 R-MAT: A Recursive Model for Graph Mining, by D. Chakrabarti, Y. Zhan and C. Faloutsos, SDM 2004, Orlando, Florida, USA Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication, by J. Leskovec, D. Chakrabarti, J. Kleinberg, and C. Faloutsos, in PKDD 2005, Porto, Portugal
  • 65. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 65 Graph gen.: Problem dfn • Given a growing graph with count of nodes N1, N2, … • Generate a realistic sequence of graphs that will obey all the patterns – Static Patterns S1 Power Law Degree Distribution S2 Power Law eigenvalue and eigenvector distribution Small Diameter – Dynamic Patterns T2 Growth Power Law (2x nodes; 3x edges) T1 Shrinking/Stabilizing Diameters
  • 66. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 66Adjacency matrix Kronecker Graphs Intermediate stage Adjacency matrix
  • 67. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 67Adjacency matrix Kronecker Graphs Intermediate stage Adjacency matrix
  • 68. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 68Adjacency matrix Kronecker Graphs Intermediate stage Adjacency matrix
  • 69. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 69 Kronecker Graphs • Continuing multiplying with G1 we obtain G4and so on … G4 adjacency matrix
  • 70. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 70 Kronecker Graphs • Continuing multiplying with G1 we obtain G4and so on … G4 adjacency matrix
  • 71. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 71 Kronecker Graphs • Continuing multiplying with G1 we obtain G4and so on … G4 adjacency matrix
  • 72. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 72 Kronecker Graphs • Continuing multiplying with G1 we obtain G4and so on … G4 adjacency matrix Holes within holes; Communities within communities
  • 73. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 73 Properties: • We can PROVE that – Degree distribution is multinomial ~ power law – Diameter: constant – Eigenvalue distribution: multinomial – First eigenvector: multinomial new Self-similarity -> power laws
  • 74. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 74 Problem Definition • Given a growing graph with nodes N1, N2, … • Generate a realistic sequence of graphs that will obey all the patterns – Static Patterns Power Law Degree Distribution Power Law eigenvalue and eigenvector distribution Small Diameter – Dynamic Patterns Growth Power Law Shrinking/Stabilizing Diameters • First generator for which we can prove all these properties     
  • 75. CMU SCS Impact: Graph500 • Based on RMAT (= 2x2 Kronecker) • Standard for graph benchmarks • http://www.graph500.org/ • Competitions 2x year, with all major entities: LLNL, Argonne, ITC-U. Tokyo, Riken, ORNL, Sandia, PSC, … KDD'13 BIGMINE (c) 2013, C. Faloutsos 75 R-MAT: A Recursive Model for Graph Mining, by D. Chakrabarti, Y. Zhan and C. Faloutsos, SDM 2004, Orlando, Florida, USA To iterate is human, to recurse is devine
  • 76. CMU SCS (c) 2013, C. Faloutsos 76 Roadmap • Introduction – Motivation • Part#1: Patterns in graphs – … – Q1: Why so many power-laws? – Q2: Why no ‘good cuts’? • Part#2: Cascade analysis • Conclusions KDD'13 BIGMINE A: real graphs -> self similar -> power laws
  • 77. CMU SCS Q2: Why „no good cuts‟? • A: self-similarity – Communities within communities within communities … KDD'13 BIGMINE (c) 2013, C. Faloutsos 77
  • 78. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 78 Kronecker Product – a Graph • Continuing multiplying with G1 we obtain G4and so on … G4 adjacency matrix REMINDER
  • 79. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 79 Kronecker Product – a Graph • Continuing multiplying with G1 we obtain G4and so on … G4 adjacency matrix REMINDER Communities within communities within communities … „Linux users‟ „Mac users‟ „win users‟
  • 80. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 80 Kronecker Product – a Graph • Continuing multiplying with G1 we obtain G4and so on … G4 adjacency matrix REMINDER Communities within communities within communities … How many Communities? 3? 9? 27?
  • 81. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 81 Kronecker Product – a Graph • Continuing multiplying with G1 we obtain G4and so on … G4 adjacency matrix REMINDER Communities within communities within communities … How many Communities? 3? 9? 27? A: one – but not a typical, block-like community…
  • 82. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 82 Communities? (Gaussian) Clusters? Piece-wise flat parts? age # songs
  • 83. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 83 Wrong questions to ask! age # songs
  • 84. CMU SCS Summary of Part#1 • *many* patterns in real graphs – Small & shrinking diameters – Power-laws everywhere – Gaussian trap – ‘no good cuts’ • Self-similarity (RMAT/Kronecker): good model KDD'13 BIGMINE (c) 2013, C. Faloutsos 84
  • 85. CMU SCS KDD'13 BIGMINE (c) 2013, C. Faloutsos 85 Part 2: Cascades & Immunization
  • 86. CMU SCS Why do we care? • Information Diffusion • Viral Marketing • Epidemiology and Public Health • Cyber Security • Human mobility • Games and Virtual Worlds • Ecology • ........ (c) 2013, C. Faloutsos 86KDD'13 BIGMINE
  • 87. CMU SCS (c) 2013, C. Faloutsos 87 Roadmap • Introduction – Motivation • Part#1: Patterns in graphs • Part#2: Cascade analysis – (Fractional) Immunization – Epidemic thresholds • Conclusions KDD'13 BIGMINE
  • 88. CMU SCS Fractional Immunization of Networks B. Aditya Prakash, LadaAdamic, Theodore Iwashyna (M.D.), Hanghang Tong, Christos Faloutsos SDM 2013, Austin, TX (c) 2013, C. Faloutsos 88KDD'13 BIGMINE
  • 89. CMU SCS Whom to immunize? • Dynamical Processes over networks •Each circle is a hospital • ~3,000 hospitals • More than 30,000 patients transferred [US-MEDICARE NETWORK 2005] Problem: Given k units of disinfectant, whom to immunize? (c) 2013, C. Faloutsos 89KDD'13 BIGMINE
  • 90. CMU SCS Whom to immunize? CURRENT PRACTICE OUR METHOD [US-MEDICARE NETWORK 2005] ~6x fewer! (c) 2013, C. Faloutsos 90KDD'13 BIGMINE Hospital-acquired inf. : 99K+ lives, $5B+ per year
  • 91. CMU SCS Fractional Asymmetric Immunization Hospital Another Hospital Drug-resistant Bacteria (like XDR-TB) (c) 2013, C. Faloutsos 91KDD'13 BIGMINE
  • 92. CMU SCS Fractional Asymmetric Immunization Hospital Another Hospital (c) 2013, C. Faloutsos 92KDD'13 BIGMINE
  • 93. CMU SCS Fractional Asymmetric Immunization Hospital Another Hospital (c) 2013, C. Faloutsos 93KDD'13 BIGMINE
  • 94. CMU SCS Fractional Asymmetric Immunization Hospital Another Hospital Problem: Given k units of disinfectant, distribute them to maximize hospitals saved (c) 2013, C. Faloutsos 94KDD'13 BIGMINE
  • 95. CMU SCS Fractional Asymmetric Immunization Hospital Another Hospital Problem: Given k units of disinfectant, distribute them to maximize hospitals saved @ 365 days (c) 2013, C. Faloutsos 95KDD'13 BIGMINE
  • 96. CMU SCS Straightforward solution: Simulation: 1. Distribute resources 2. ‘infect’ a few nodes 3. Simulate evolution of spreading – (10x, take avg) 4. Tweak, and repeat step 1 KDD'13 BIGMINE (c) 2013, C. Faloutsos 96
  • 97. CMU SCS Straightforward solution: Simulation: 1. Distribute resources 2. ‘infect’ a few nodes 3. Simulate evolution of spreading – (10x, take avg) 4. Tweak, and repeat step 1 KDD'13 BIGMINE (c) 2013, C. Faloutsos 97
  • 98. CMU SCS Straightforward solution: Simulation: 1. Distribute resources 2. ‘infect’ a few nodes 3. Simulate evolution of spreading – (10x, take avg) 4. Tweak, and repeat step 1 KDD'13 BIGMINE (c) 2013, C. Faloutsos 98
  • 99. CMU SCS Straightforward solution: Simulation: 1. Distribute resources 2. ‘infect’ a few nodes 3. Simulate evolution of spreading – (10x, take avg) 4. Tweak, and repeat step 1 KDD'13 BIGMINE (c) 2013, C. Faloutsos 99
  • 100. CMU SCS Running Time Simulations SMART-ALLOC > 1 week Wall-Clock Time ≈ 14 secs > 30,000x speed-up! better (c) 2013, C. Faloutsos 100KDD'13 BIGMINE
  • 101. CMU SCS Experiments K = 120 better (c) 2013, C. Faloutsos 101KDD'13 BIGMINE # epochs # infected uniform SMART-ALLOC
  • 102. CMU SCS What is the „silver bullet‟? A: Try to decrease connectivity of graph Q: how to measure connectivity? – Avg degree? Max degree? – Std degree / avg degree ? – Diameter? – Modularity? – ‘Conductance’ (~min cut size)? – Some combination of above? KDD'13 BIGMINE (c) 2013, C. Faloutsos 102 ≈ 14 secs > 30,000x speed- up!
  • 103. CMU SCS What is the „silver bullet‟? A: Try to decrease connectivity of graph Q: how to measure connectivity? A: first eigenvalue of adjacency matrix Q1: why?? (Q2: dfn& intuition of eigenvalue ? ) KDD'13 BIGMINE (c) 2013, C. Faloutsos 103 Avg degree Max degree Diameter Modularity „Conductance‟
  • 104. CMU SCS Why eigenvalue? A1: ‘G2’ theorem and ‘eigen-drop’: • For (almost) any type of virus • For any network • -> no epidemic, if small-enough first eigenvalue (λ1 ) of adjacency matrix • Heuristic: for immunization, try to min λ1 • The smaller λ1, the closer to extinction. KDD'13 BIGMINE (c) 2013, C. Faloutsos 104 Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks, B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos, ICDM 2011, Vancouver, Canada
  • 105. CMU SCS Why eigenvalue? A1: ‘G2’ theorem and ‘eigen-drop’: • For (almost) any type of virus • For any network • -> no epidemic, if small-enough first eigenvalue (λ1 ) of adjacency matrix • Heuristic: for immunization, try to min λ1 • The smaller λ1, the closer to extinction. KDD'13 BIGMINE (c) 2013, C. Faloutsos 105
  • 106. CMU SCS Threshold Conditions for Arbitrary Cascade Models on Arbitrary Networks B. Aditya Prakash, Deepayan Chakrabarti, Michalis Faloutsos, Nicholas Valler, Christos Faloutsos IEEE ICDM 2011, Vancouver extended version, in arxiv http://arxiv.org/abs/1004.0060 G2 theorem ~10 pages proof
  • 107. CMU SCS Our thresholds for some models • s = effective strength • s< 1 : below threshold (c) 2013, C. Faloutsos 107KDD'13 BIGMINE Models Effective Strength (s) Threshold (tipping point) SIS, SIR, SIRS, SEIR s = λ . s = 1 SIV, SEIV s = λ . (H.I.V.) s = λ . 12 221 vv v 2121 VVISI
  • 108. CMU SCS Our thresholds for some models • s = effective strength • s< 1 : below threshold (c) 2013, C. Faloutsos 108KDD'13 BIGMINE Models Effective Strength (s) Threshold (tipping point) SIS, SIR, SIRS, SEIR s = λ . s = 1 SIV, SEIV s = λ . (H.I.V.) s = λ . 12 221 vv v 2121 VVISI No immunity Temp. immunity w/ incubation
  • 109. CMU SCS (c) 2013, C. Faloutsos 109 Roadmap • Introduction – Motivation • Part#1: Patterns in graphs • Part#2: Cascade analysis – (Fractional) Immunization – intuition behind λ1 • Conclusions KDD'13 BIGMINE
  • 110. CMU SCS Intuition for λ “Official” definitions: • Let A be the adjacency matrix. Then λ is the root with the largest magnitude of the characteristic polynomial of A [det(A – xI)]. • Also: Ax = x Neither gives much intuition! “Un-official” Intuition • For ‘homogeneous’ graphs, λ == degree • λ ~ avg degree – done right, for skewed degree distributions (c) 2013, C. Faloutsos 110KDD'13 BIGMINE
  • 111. CMU SCS Largest Eigenvalue (λ) λ ≈ 2 λ = N λ = N-1 N = 1000 nodes λ ≈ 2 λ= 31.67 λ= 999 better connectivity higher λ (c) 2013, C. Faloutsos 111KDD'13 BIGMINE
  • 112. CMU SCS Largest Eigenvalue (λ) λ ≈ 2 λ = N λ = N-1 N = 1000 nodes λ ≈ 2 λ= 31.67 λ= 999 better connectivity higher λ (c) 2013, C. Faloutsos 112KDD'13 BIGMINE
  • 113. CMU SCS Examples: Simulations – SIR (mumps) FractionofInfections Footprint Effective StrengthTime ticks (c) 2013, C. Faloutsos 113KDD'13 BIGMINE (a) Infection profile (b) “Take-off” plot PORTLAND graph: synthetic population, 31 million links, 6 million nodes
  • 114. CMU SCS Examples: Simulations – SIRS (pertusis) FractionofInfections Footprint Effective StrengthTime ticks (c) 2013, C. Faloutsos 114KDD'13 BIGMINE (a) Infection profile (b) “Take-off” plot PORTLAND graph: synthetic population, 31 million links, 6 million nodes
  • 115. CMU SCS Immunization - conclusion In (almost any) immunization setting, • Allocate resources, such that to • Minimize λ1 • (regardless of virus specifics) • Conversely, in a market penetration setting – Allocate resources to – Maximize λ1KDD'13 BIGMINE (c) 2013, C. Faloutsos 115
  • 116. CMU SCS (c) 2013, C. Faloutsos 116 Roadmap • Introduction – Motivation • Part#1: Patterns in graphs • Part#2: Cascade analysis – (Fractional) Immunization – Epidemic thresholds • What next? • Acks& Conclusions • [Tools: ebay fraud; tensors; spikes] KDD'13 BIGMINE
  • 117. CMU SCS Challenge #1: „Connectome‟ – brain wiring KDD'13 BIGMINE (c) 2013, C. Faloutsos 117 • Which neurons get activated by ‘bee’ • How wiring evolves • Modeling epilepsy N. Sidiropoulos George Karypis V. Papalexakis Tom Mitchell
  • 118. CMU SCS Challenge#2: Time evolving networks / tensors • Periodicities? Burstiness? • What is ‘typical’ behavior of a node, over time • Heterogeneous graphs (= nodes w/ attributes) KDD'13 BIGMINE (c) 2013, C. Faloutsos 118 …
  • 119. CMU SCS (c) 2013, C. Faloutsos 119 Roadmap • Introduction – Motivation • Part#1: Patterns in graphs • Part#2: Cascade analysis – (Fractional) Immunization – Epidemic thresholds • Acks& Conclusions • [Tools: ebay fraud; tensors; spikes] KDD'13 BIGMINE Off line
  • 120. CMU SCS (c) 2013, C. Faloutsos 120 Thanks KDD'13 BIGMINE Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab
  • 121. CMU SCS (c) 2013, C. Faloutsos 121 Thanks KDD'13 BIGMINE Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab Disclaimer: All opinions are mine; not necessarily reflecting the opinions of the funding agencies
  • 122. CMU SCS (c) 2013, C. Faloutsos 122 Project info: PEGASUS KDD'13 BIGMINE www.cs.cmu.edu/~pegasus Results on large graphs: with Pegasus + hadoop + M45 Apache license Code, papers, manual, video Prof. U Kang Prof. Polo Chau
  • 123. CMU SCS (c) 2013, C. Faloutsos 123 Cast Akoglu, Leman Chau, Polo Kang, U McGlohon, Mary Tong, Hanghang Prakash, Aditya KDD'13 BIGMINE Koutra, Danai Beutel, Alex Papalexakis, Vagelis
  • 124. CMU SCS (c) 2013, C. Faloutsos 124 CONCLUSION#1 – Big data • Large datasets reveal patterns/outliers that are invisible otherwise KDD'13 BIGMINE
  • 125. CMU SCS (c) 2013, C. Faloutsos 125 CONCLUSION#2 – self-similarity • powerful tool / viewpoint – Power laws; shrinking diameters – Gaussian trap (eg., F.O.F.) – ‘no good cuts’ – RMAT – graph500 generator KDD'13 BIGMINE
  • 126. CMU SCS (c) 2013, C. Faloutsos 126 CONCLUSION#3 – eigen-drop • Cascades & immunization: G2 theorem &eigenvalue KDD'13 BIGMINE CURRENT PRACTICE OUR METHOD [US-MEDICARE NETWORK 2005] ~6x fewer! ≈ 14 secs > 30,000x speed- up!
  • 127. CMU SCS (c) 2013, C. Faloutsos 127 References • D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan Claypool 2012 • http://www.morganclaypool.com/doi/abs/10.2200/S004 49ED1V01Y201209DMK006 KDD'13 BIGMINE
  • 128. CMU SCS TAKE HOME MESSAGE: Cross-disciplinarity KDD'13 BIGMINE (c) 2013, C. Faloutsos 128
  • 129. CMU SCS TAKE HOME MESSAGE: Cross-disciplinarity KDD'13 BIGMINE (c) 2013, C. Faloutsos 129 QUESTIONS?