SlideShare a Scribd company logo
1 of 117
Representing Higher-order Dependencies in Networks:
Methods, Applications, and Visualizations
Jian Xu
Department of Computer Science and Engineering
University of Notre Dame
i@jianxu.net
June 20th, 2017
2
3
Higher-order network (HON)
5
6
7
8
9
10
11
12
13
14
15
Figure from Rosvall et al., Nature Comm., 2014
16
Figure from Rosvall et al., Nature Comm., 2014
17
Figure from Rosvall et al., Nature Comm., 2014
18
Variable orders in HON
Fixed-order Variable-order
Assuming a fixed order
beyond the second order
becomes impractical
because “higher-order
Markov models are more
complex” due to
combinatorial explosion
--- Rosvall et al. (Nature
Comm. 2014)
19
Variable orders in HON
Scalable for big data
20
Scalability
Network representation
(global shipping data)
Number of edges Number of nodes Network density Clustering
time (mins)
Ranking
time (s)
Conventional first-order 31,028 2,675 4.3×10-3 4 1.3
Fixed second-order 116,611 19,182 3.2×10-4 73 7.7
HON, max order two 64,914 17,235 2.2×10-4 45 4.8
HON, max order three 78,415 26,577 1.1×10-4 63 6.2
HON, max order four 83,480 30,631 8.9×10-5 67 7.0
HON, max order five 85,025 31,854 8.4×10-5 68 7.6
21
Scalability
Network representation
(global shipping data)
Number of edges Number of nodes Network density Clustering
time (mins)
Ranking
time (s)
Conventional first-order 31,028 2,675 4.3×10-3 4 1.3
Fixed second-order 116,611 19,182 3.2×10-4 73 7.7
HON, max order two 64,914 17,235 2.2×10-4 45 4.8
HON, max order three 78,415 26,577 1.1×10-4 63 6.2
HON, max order four 83,480 30,631 8.9×10-5 67 7.0
HON, max order five 85,025 31,854 8.4×10-5 68 7.6
22
Scalability
Network representation
(global shipping data)
Number of edges Number of nodes Network density Clustering
time (mins)
Ranking
time (s)
Conventional first-order 31,028 2,675 4.3×10-3 4 1.3
Fixed second-order 116,611 19,182 3.2×10-4 73 7.7
HON, max order two 64,914 17,235 2.2×10-4 45 4.8
HON, max order three 78,415 26,577 1.1×10-4 63 6.2
HON, max order four 83,480 30,631 8.9×10-5 67 7.0
HON, max order five 85,025 31,854 8.4×10-5 68 7.6
HON algorithm
24
How to construct HON?
Raw data
• Sequential data
Rule
extraction
• Which nodes to
split into higher-
order nodes, and
how high the
orders are
Network
wiring
• Connect nodes
representing
different orders
HON
• Use HON like
the conventional
network for
analyses
25
Raw data
Rule
extraction
Network
wiring
HON
26
Raw data
Rule
extraction
Network
wiring
HON
27
Raw data
Rule
extraction
Network
wiring
HON
28
Raw data
Rule
extraction
Network
wiring
HON
29
Raw data
Rule
extraction
Network
wiring
HON
30
A
• Convert all first-order rules into edges
B
• Convert higher-order rules
• Add higher-order nodes when necessary
C
• Rewire edges
• The edge weights are preserved
D
• Rewire remaining edges
Raw data
Rule
extraction
Network
wiring
HON
31
A
• Convert all first-order rules into edges
B
• Convert higher-order rules
• Add higher-order nodes when necessary
C
• Rewire edges
• The edge weights are preserved
D
• Rewire remaining edges
Raw data
Rule
extraction
Network
wiring
HON
32
A
• Convert all first-order rules into edges
B
• Convert higher-order rules
• Add higher-order nodes when necessary
C
• Rewire edges
• The edge weights are preserved
D
• Rewire remaining edges
Raw data
Rule
extraction
Network
wiring
HON
33
A
• Convert all first-order rules into edges
B
• Convert higher-order rules
• Add higher-order nodes when necessary
C
• Rewire edges
• The edge weights are preserved
D
• Rewire remaining edges
Raw data
Rule
extraction
Network
wiring
HON
HON+
Parameter-free, scalable for arbitrarily high order
35
1st order
2nd order
3rd order
4th order
5th order
Actual dependencies
36
1st order
2nd order
3rd order
4th order
5th order
Actual dependencies
Storage cost
Θ(𝑂𝑟𝑑𝑒𝑟2
× 𝑅𝑎𝑤𝐷𝑎𝑡𝑎𝑆𝑖𝑧𝑒)
37
1st order
2nd order
3rd order
4th order
5th order
Actual dependencies
Θ(𝑂𝑟𝑑𝑒𝑟 × 𝑅𝑎𝑤𝐷𝑎𝑡𝑎𝑆𝑖𝑧𝑒)
Storage cost
38
Pre-determine
if higher order will make a difference
Effectiveness
40
Influence on dynamics
41
Influence on dynamics
42
Influence on dynamics
43
Influence on dynamics
44
Influence on dynamics
Higher accuracy
in simulating real movements
45
Higher-order dependencies revealed by HON
Data # Records
Dependencies
revealed
Similar observations
Ship movement 3,415,577 Up to 5th order N/A
Clickstream 3,047,697 Up to 3rd order
“… appear to saturate at k = 3 for
Yahoo… browsing behavior
across websites is definitely not
Markovian but can be captured
reasonably well by a not-too-high
order Markov chain.”
--- Chierichetti et al. (2012)
Retweet 23,755,810 N/A
Assuming the second order has
“marginal consequences for
disease spreading”
--- Rosvall et al. (2014)
46
Higher-order dependencies revealed by HON
Data # Records
Dependencies
revealed
Similar observations
Ship movement 3,415,577 Up to 5th order N/A
Clickstream 3,047,697 Up to 3rd order
“… appear to saturate at k = 3 for
Yahoo… browsing behavior
across websites is definitely not
Markovian but can be captured
reasonably well by a not-too-high
order Markov chain.”
--- Chierichetti et al. (2012)
Retweet 23,755,810 N/A
Assuming the second order has
“marginal consequences for
disease spreading”
--- Rosvall et al. (2014)
47
Higher-order dependencies revealed by HON
Data # Records
Dependencies
revealed
Similar observations
Ship movement 3,415,577 Up to 5th order N/A
Clickstream 3,047,697 Up to 3rd order
“… appear to saturate at k = 3 for
Yahoo… browsing behavior
across websites is definitely not
Markovian but can be captured
reasonably well by a not-too-high
order Markov chain.”
--- Chierichetti et al. (2012)
Retweet 23,755,810 N/A
Assuming the second order has
“marginal consequences for
disease spreading”
--- Rosvall et al. (2014)
48
Higher-order dependencies revealed by HON
Data # Records
Dependencies
revealed
Similar observations
Ship movement 3,415,577 Up to 5th order N/A
Clickstream 3,047,697 Up to 3rd order
“… appear to saturate at k = 3 for
Yahoo… browsing behavior
across websites is definitely not
Markovian but can be captured
reasonably well by a not-too-high
order Markov chain.”
--- Chierichetti et al. (2012)
Retweet 23,755,810 N/A
Assuming the second order has
“marginal consequences for
disease spreading”
--- Rosvall et al. (2014)
49
Existing approaches
Ignore
higher-orders
Modify
existing
algorithms
Inaccurate Cannot generalize
50
Existing approaches
Ignore
higher-orders
Modify
existing
algorithms
Inaccurate Cannot generalize
51
Existing approaches
Ignore
higher-orders
Modify
existing
algorithms
Higher-order network
 Accurate representation
 Generalizes to existing algorithms
52
Compatible with existing tools
Conventionally: every node
represents a single entity
(location, state, etc.)
Now: break down nodes into
higher-order nodes that carry
different dependency relationships
 
 
1
1 1( | ) t t
t t t t
tj
W i i
P X i X i
W i j

 

  

 1
( | )
| ( | )
( | )
t t
k
W i h j
P X j X i h
W i h k


  

Only change the node labeling
Applications: clustering
Controlling species invasion through the global shipping network
54
Invasive species
Zebra mussels @ Great Lakes
Clogging water pipes, attach to boats
Photos: Great Lakes Environmental Research Lab; TIME & LIFE Images, Getty Images
$120 billion / year
damage & control costs
55
Ship-borne species invasion
Picture from GloBallast Programme 2002
56
Ship-borne species invasion
Picture from GloBallast Programme 2002
57
Clustering on global shipping network
58
Clustering on global shipping network
Shanghai
Tokyo
Los Angeles
SeattleSingapore
59
Clustering on global shipping network
Shanghai Los Angeles
Singapore SeattleTokyo
Shanghai
Tokyo
Los Angeles
SeattleSingapore
60
Clustering: first-order network
61
Clustering: higher-order network
No changes
to the clustering algorithm
Applications: ranking
Web page access behaviors for server optimization and advertising
63
Ranking on clickstream network
User 1 WDBJ7 home  View photo  WDBJ7 home  …
User 2 WDBJ7 home  View photo  Upload photo  …
User 3 View photo  Upload photo  View photo  …
User 4 WDBJ7 home  Upload photo  WDBJ7 home  …
… … … …
64
Ranking on clickstream network
User 1 Company ranking  Job listing  Apply …
User 2 Weather  Homepage  News  …
User 3 News  Sports  Scores  …
User 4 Events  Homepage  Weather  …
… … … …
65
Ranking on clickstream network
66
Ranking on clickstream network
• 26% pages show more
than 10% changes in
ranking
• More than 90% pages lose
PageRank scores, while a
few pages gain significant
scores
No changes
to the ranking algorithm
Applications: anomaly detection
Unveiling higher-order anomalies with HON
68
Anomaly detection with dynamic network
69
Anomaly detection with dynamic network
70
Anomaly detection with dynamic network
71
Anomaly detection with dynamic network
72
Anomaly detection with dynamic network
73
Anomaly detection with dynamic network
Capturing complex
anomalous patterns
No changes in
Network topology
74
Synthetic data with 11 billion movements
100,000 ships, each moving 100 steps;
11 scenarios, each repeating 100 times;
Total: 11,000,000,000 movements
75
Higher-order anomalies captured by HON
First-order network HON
Injection and alternation of 1st order dependencies
Networkdifferences
Day Day
76
Higher-order anomalies captured by HON
First-order network HON
Injection and alternation of 2nd order dependencies
Networkdifferences
Day Day
77
Higher-order anomalies captured by HON
First-order network HON
Injection and alternation of 3rd order dependencies
Networkdifferences
Day Day
78
Higher-order anomalies captured by HON
First-order network HON
Networkdifferences
Day Day
Injection and alternation of higher-order dependencies
Fails to capture certain anomalies
Higher-order anomalies captured by HON
Porto Taxi GPS trajectory data, 1 year
Higher-order anomalies captured by HON
Porto Taxi GPS trajectory data, 1 year
Visualization with HoNVis
Facilitates interactive explorations
82
HoNVis framework
83
HoNVis framework
84
HoNVis framework
85
HoNVis interface
86
Visualization & interactive exploration
Opportunities
88
Varieties of data
Application fields Input Trajectories Nodes Edges
Transportation Ship trajectories Ports Ship traffic
Computer network Clickstreams Web pages Web traffic
Human interactions
Phone call or
message cascades
People Information flow
Human behavior Human movements POIs Traffic
Healthcare Patient records Diseases Disease evolutions
NLP Sentences Words # word pairs
89
Other potential applications
90
Flexible inputs
Raw diffusion data
91
Flexible inputs
92
Flexible inputs
93
94
Summary
95
For the community
HigherOrderNetwork.com Code available
Visualization tool
Available this summer
96
Collaborators
The HON paper received valuable comments from Yuxiao Dong, Reid Johnson and David Lodge.
The HON introduction video is designed by Mayra Duarte in iCeNSA.
HON
HoNVis
Anomaly
Bruno Ribeiro
97
Acknowledgements: Funding
Thank you!
Jian Xu
i@jianxu.net
June 20th, 2017
Appendix
100
References
• Rosvall, Martin, Alcides V. Esquivel, Andrea Lancichinetti, Jevin D. West, and Renaud Lambiotte. "Memory in network flows and its effects on
spreading dynamics and community detection." Nature communications 5 (2014).
• Chierichetti, Flavio, Ravi Kumar, Prabhakar Raghavan, and Tamas Sarlos. "Are web users really markovian?." In Proceedings of the 21st
international conference on World Wide Web, pp. 609-618. ACM, 2012.
• Pons, Pascal, and Matthieu Latapy. "Computing communities in large networks using random walks." J. Graph Algorithms Appl. 10, no. 2
(2006): 191-218.
• Page, Lawrence, Sergey Brin, Rajeev Motwani, and Terry Winograd. "The PageRank citation ranking: bringing order to the web." (1999).
• Gleich, David F., Lek-Heng Lim, and Yongyang Yu. "Multilinear PageRank." SIAM Journal on Matrix Analysis and Applications 36, no. 4
(2015): 1507-1541.
• Akoglu, Leman, Mary McGlohon, and Christos Faloutsos. "Anomaly detection in large graphs." In In CMU-CS-09-173 Technical Report. 2009.
• Seebens, H., M. T. Gastner, and B. Blasius. "The risk of marine bioinvasion caused by global shipping." Ecology Letters 16, no. 6 (2013): 782-
790.
• Rosvall, Martin, and Carl T. Bergstrom. "Maps of random walks on complex networks reveal community structure." Proceedings of the National
Academy of Sciences 105, no. 4 (2008): 1118-1123.
• Ducruet, César. "Network diversity and maritime flows." Journal of Transport Geography 30 (2013): 77-88.
• Ducruet, César, Sung-Woo Lee, and Adolf KY Ng. "Centrality and vulnerability in liner shipping networks: revisiting the Northeast Asian port
hierarchy." Maritime Policy & Management 37, no. 1 (2010): 17-36.
• Ducruet, César, Céline Rozenblat, and Faraz Zaidi. "Ports in multi-level maritime networks: evidence from the Atlantic (1996–2006)." Journal of
Transport Geography 18, no. 4 (2010): 508-518.
• Ducruet, César, and Theo Notteboom. "The worldwide maritime network of container shipping: spatial structure and regional dynamics." Global
Networks 12, no. 3 (2012): 395-423.
• Kaluza, Pablo, Andrea Kölzsch, Michael T. Gastner, and Bernd Blasius. "The complex network of global cargo ship movements." Journal of the
Royal Society Interface 7, no. 48 (2010): 1093-1103.
• Hu, Yihong, and Daoli Zhu. "Empirical analysis of the worldwide maritime transportation network." Physica A: Statistical Mechanics and its
Applications 388, no. 10 (2009): 2061-2071.
101
Takeaways
Higher-order network is:
More accurate in capturing dynamics in raw data.
More scalable than fixed-order networks.
3
Compatible with existing network algorithms.
102
Compatible with existing tools
Hypergraph
Higher-order anomalies captured by HON
104
105
106
107
108
109
110
Comparison with VOM
111
Comparison with VOM
HON VOM
In HON but not in
VOM
In VOM but not
in HON
0th order 0 3,029 0 3029
1st order 31,028 31,028 0 0
2nd order 32,960 35,288 427 2,755
3rd order 15,642 21,536 550 6,444
4th order 4,632 8,973 302 4,643
5th order 763 2,084 23 1,344
Total 85,025 101,938 1,302 18,215
112
Higher-order dependencies revealed by HON
Data # Records
Inject known variable-
order dependencies
Synthetic 10,000,000
10 second-order
10 third-order
10 fourth-order
• Effectiveness: correctly captures all 30 of the higher-order dependencies
• Accuracy: does not extract false dependencies beyond the fourth order
• Compactness: determines that all other dependencies are first-order
113
Clustering: higher-order network
• 45% of ports belong to more than one cluster
• 44 ports (1.7% of all) belong to five clusters
o New York, Shanghai, Hong Kong, Gibraltar, Hamburg, etc.
• Panama Canal belongs to six clusters
• Highlighting ports that may be invaded by species from multiple regions
114
Ranking on clickstream network
Pages that gain PageRank scores ΔPageRank Pages that lose PageRank scores ΔPageRank
South Bend Tribune - Home. +0.0119 KTUU - Home. −0.0057
Hagerstown News / obituaries - Front. +0.0115 KWCH - Home. −0.0031
South Bend Tribune - Obits - 3rd Party. +0.0112 Imperial Valley Press - Home. −0.0011
South Bend Tribune / sports / notredame - Front. +0.0102 Hagerstown News / sports - Front. −0.0005
Aberdeen News / news / obituaries - Front. +0.0077 Imperial Valley Press / classifieds / topjobs - Front. −0.0004
WDBJ7 - Home. +0.0075 Gaylord - Home. −0.0004
KY3 / weather - Front. +0.0075 WDBJ7 / weather / web-cams - Front. −0.0004
Hagerstown News - Home. +0.0072 KTUU / about / meetnewsteam - Front. −0.0003
Daily American / lifestyle / obituaries - Front. +0.0054 Smithsburg man faces more charges following salvag −0.0003
WDBJ7 / weather / closings - Front. +0.0048 KWCH / about / station / newsteam - Front. −0.0003
WSBT TV / weather - Front. +0.0041 South Bend Tribune / sports / highschoolsports - Front. −0.0003
Daily American - Home. +0.0036 Hagerstown News / opinion - Front. −0.0002
WDBJ7 / weather / radar - Front. +0.0036 WDBJ7 / news / anchors-reporters - Front. −0.0002
WDBJ7 / weather / 7-day-planner - Front. +0.0031 Petoskey News / news / obituaries - Front. −0.0002
WDBJ7 / weather - Front. +0.0019 KWCH / news - Front. −0.0002
115
Ranking on clickstream network
Pages that gain PageRank scores ΔPageRank Pages that lose PageRank scores ΔPageRank
South Bend Tribune - Home. +0.0119 KTUU - Home. −0.0057
Hagerstown News / obituaries - Front. +0.0115 KWCH - Home. −0.0031
South Bend Tribune - Obits - 3rd Party. +0.0112 Imperial Valley Press - Home. −0.0011
South Bend Tribune / sports / notredame - Front. +0.0102 Hagerstown News / sports - Front. −0.0005
Aberdeen News / news / obituaries - Front. +0.0077 Imperial Valley Press / classifieds / topjobs - Front. −0.0004
WDBJ7 - Home. +0.0075 Gaylord - Home. −0.0004
KY3 / weather - Front. +0.0075 WDBJ7 / weather / web-cams - Front. −0.0004
Hagerstown News - Home. +0.0072 KTUU / about / meetnewsteam - Front. −0.0003
Daily American / lifestyle / obituaries - Front. +0.0054 Smithsburg man faces more charges following salvag −0.0003
WDBJ7 / weather / closings - Front. +0.0048 KWCH / about / station / newsteam - Front. −0.0003
WSBT TV / weather - Front. +0.0041 South Bend Tribune / sports / highschoolsports - Front. −0.0003
Daily American - Home. +0.0036 Hagerstown News / opinion - Front. −0.0002
WDBJ7 / weather / radar - Front. +0.0036 WDBJ7 / news / anchors-reporters - Front. −0.0002
WDBJ7 / weather / 7-day-planner - Front. +0.0031 Petoskey News / news / obituaries - Front. −0.0002
WDBJ7 / weather - Front. +0.0019 KWCH / news - Front. −0.0002
No changes
to the ranking algorithm
116
Network representation Number of
edges
Number of
nodes
Network
density
Prob. of
returning
after two
steps
Prob. of
returning
after three
steps
Entropy
rate (bits)
Clustering
time (mins)
Ranking
time (s)
Conventional first-order 31,028 2,675 4.3×10-3 10.7% 1.5% 3.44 4 1.3
Fixed second-order 116,611 19,182 3.2×10-4 42.8% 8.0% 1.45 73 7.7
HON, max order two 64,914 17,235 2.2×10-4 41.7% 7.3% 1.46 45 4.8
HON, max order three 78,415 26,577 1.1×10-4 45.9% 16.4% 0.90 63 6.2
HON, max order four 83,480 30,631 8.9×10-5 48.9% 18.5% 0.68 67 7.0
HON, max order five 85,025 31,854 8.4×10-5 49.3% 19.2% 0.63 68 7.6
117
Parameter sensitivity

More Related Content

Similar to Representing higher-order dependencies in networks: methods, applications, and visualizations

HON_NetSci_2016
HON_NetSci_2016HON_NetSci_2016
HON_NetSci_2016
Jian Xu
 
Stochastic bandwidth estimation in networks with random service
Stochastic bandwidth estimation in networks with random serviceStochastic bandwidth estimation in networks with random service
Stochastic bandwidth estimation in networks with random service
Papitha Velumani
 
Topology ppt
Topology pptTopology ppt
Topology ppt
boocse11
 

Similar to Representing higher-order dependencies in networks: methods, applications, and visualizations (20)

From complex Systems to Networks: Discovering and Modeling the Correct Network"
From complex Systems to Networks: Discovering and Modeling the Correct Network"From complex Systems to Networks: Discovering and Modeling the Correct Network"
From complex Systems to Networks: Discovering and Modeling the Correct Network"
 
HON_NetSci_2016
HON_NetSci_2016HON_NetSci_2016
HON_NetSci_2016
 
Big Data Visualization
Big Data VisualizationBig Data Visualization
Big Data Visualization
 
Braintalk cuso nm
Braintalk cuso nmBraintalk cuso nm
Braintalk cuso nm
 
18 Diffusion Models and Peer Influence
18 Diffusion Models and Peer Influence18 Diffusion Models and Peer Influence
18 Diffusion Models and Peer Influence
 
09 Diffusion Models & Peer Influence
09 Diffusion Models & Peer Influence09 Diffusion Models & Peer Influence
09 Diffusion Models & Peer Influence
 
An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)An Overview of Bionimbus (March 2010)
An Overview of Bionimbus (March 2010)
 
ECCS 2010
ECCS 2010ECCS 2010
ECCS 2010
 
Hidden geometric correlations in real multiplex networks
Hidden geometric correlations in real multiplex networksHidden geometric correlations in real multiplex networks
Hidden geometric correlations in real multiplex networks
 
Stochastic bandwidth estimation in networks with random service
Stochastic bandwidth estimation in networks with random serviceStochastic bandwidth estimation in networks with random service
Stochastic bandwidth estimation in networks with random service
 
Multi-hop Communication for the Next Generation (xG) Wireless Network
Multi-hop Communication for the Next Generation (xG) Wireless NetworkMulti-hop Communication for the Next Generation (xG) Wireless Network
Multi-hop Communication for the Next Generation (xG) Wireless Network
 
Topology ppt
Topology pptTopology ppt
Topology ppt
 
Topology ppt
Topology pptTopology ppt
Topology ppt
 
Energy Efficient Data Gathering Protocol in WSN
Energy Efficient Data Gathering Protocol in WSNEnergy Efficient Data Gathering Protocol in WSN
Energy Efficient Data Gathering Protocol in WSN
 
Computation and Knowledge
Computation and KnowledgeComputation and Knowledge
Computation and Knowledge
 
04 Diffusion and Peer Influence
04 Diffusion and Peer Influence04 Diffusion and Peer Influence
04 Diffusion and Peer Influence
 
04 Diffusion and Peer Influence (2016)
04 Diffusion and Peer Influence (2016)04 Diffusion and Peer Influence (2016)
04 Diffusion and Peer Influence (2016)
 
Monitoring of Transmission and Distribution Grids using PMUs
Monitoring of Transmission and Distribution Grids using PMUsMonitoring of Transmission and Distribution Grids using PMUs
Monitoring of Transmission and Distribution Grids using PMUs
 
An Improved DEEHC to Extend Lifetime of WSN
An Improved DEEHC to Extend Lifetime of WSNAn Improved DEEHC to Extend Lifetime of WSN
An Improved DEEHC to Extend Lifetime of WSN
 
AI: Belief Networks
AI: Belief NetworksAI: Belief Networks
AI: Belief Networks
 

Recently uploaded

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 

Recently uploaded (20)

Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 

Representing higher-order dependencies in networks: methods, applications, and visualizations

  • 1. Representing Higher-order Dependencies in Networks: Methods, Applications, and Visualizations Jian Xu Department of Computer Science and Engineering University of Notre Dame i@jianxu.net June 20th, 2017
  • 2. 2
  • 3. 3
  • 5. 5
  • 6. 6
  • 7. 7
  • 8. 8
  • 9. 9
  • 10. 10
  • 11. 11
  • 12. 12
  • 13. 13
  • 14. 14
  • 15. 15 Figure from Rosvall et al., Nature Comm., 2014
  • 16. 16 Figure from Rosvall et al., Nature Comm., 2014
  • 17. 17 Figure from Rosvall et al., Nature Comm., 2014
  • 18. 18 Variable orders in HON Fixed-order Variable-order Assuming a fixed order beyond the second order becomes impractical because “higher-order Markov models are more complex” due to combinatorial explosion --- Rosvall et al. (Nature Comm. 2014)
  • 19. 19 Variable orders in HON Scalable for big data
  • 20. 20 Scalability Network representation (global shipping data) Number of edges Number of nodes Network density Clustering time (mins) Ranking time (s) Conventional first-order 31,028 2,675 4.3×10-3 4 1.3 Fixed second-order 116,611 19,182 3.2×10-4 73 7.7 HON, max order two 64,914 17,235 2.2×10-4 45 4.8 HON, max order three 78,415 26,577 1.1×10-4 63 6.2 HON, max order four 83,480 30,631 8.9×10-5 67 7.0 HON, max order five 85,025 31,854 8.4×10-5 68 7.6
  • 21. 21 Scalability Network representation (global shipping data) Number of edges Number of nodes Network density Clustering time (mins) Ranking time (s) Conventional first-order 31,028 2,675 4.3×10-3 4 1.3 Fixed second-order 116,611 19,182 3.2×10-4 73 7.7 HON, max order two 64,914 17,235 2.2×10-4 45 4.8 HON, max order three 78,415 26,577 1.1×10-4 63 6.2 HON, max order four 83,480 30,631 8.9×10-5 67 7.0 HON, max order five 85,025 31,854 8.4×10-5 68 7.6
  • 22. 22 Scalability Network representation (global shipping data) Number of edges Number of nodes Network density Clustering time (mins) Ranking time (s) Conventional first-order 31,028 2,675 4.3×10-3 4 1.3 Fixed second-order 116,611 19,182 3.2×10-4 73 7.7 HON, max order two 64,914 17,235 2.2×10-4 45 4.8 HON, max order three 78,415 26,577 1.1×10-4 63 6.2 HON, max order four 83,480 30,631 8.9×10-5 67 7.0 HON, max order five 85,025 31,854 8.4×10-5 68 7.6
  • 24. 24 How to construct HON? Raw data • Sequential data Rule extraction • Which nodes to split into higher- order nodes, and how high the orders are Network wiring • Connect nodes representing different orders HON • Use HON like the conventional network for analyses
  • 30. 30 A • Convert all first-order rules into edges B • Convert higher-order rules • Add higher-order nodes when necessary C • Rewire edges • The edge weights are preserved D • Rewire remaining edges Raw data Rule extraction Network wiring HON
  • 31. 31 A • Convert all first-order rules into edges B • Convert higher-order rules • Add higher-order nodes when necessary C • Rewire edges • The edge weights are preserved D • Rewire remaining edges Raw data Rule extraction Network wiring HON
  • 32. 32 A • Convert all first-order rules into edges B • Convert higher-order rules • Add higher-order nodes when necessary C • Rewire edges • The edge weights are preserved D • Rewire remaining edges Raw data Rule extraction Network wiring HON
  • 33. 33 A • Convert all first-order rules into edges B • Convert higher-order rules • Add higher-order nodes when necessary C • Rewire edges • The edge weights are preserved D • Rewire remaining edges Raw data Rule extraction Network wiring HON
  • 34. HON+ Parameter-free, scalable for arbitrarily high order
  • 35. 35 1st order 2nd order 3rd order 4th order 5th order Actual dependencies
  • 36. 36 1st order 2nd order 3rd order 4th order 5th order Actual dependencies Storage cost Θ(𝑂𝑟𝑑𝑒𝑟2 × 𝑅𝑎𝑤𝐷𝑎𝑡𝑎𝑆𝑖𝑧𝑒)
  • 37. 37 1st order 2nd order 3rd order 4th order 5th order Actual dependencies Θ(𝑂𝑟𝑑𝑒𝑟 × 𝑅𝑎𝑤𝐷𝑎𝑡𝑎𝑆𝑖𝑧𝑒) Storage cost
  • 38. 38 Pre-determine if higher order will make a difference
  • 44. 44 Influence on dynamics Higher accuracy in simulating real movements
  • 45. 45 Higher-order dependencies revealed by HON Data # Records Dependencies revealed Similar observations Ship movement 3,415,577 Up to 5th order N/A Clickstream 3,047,697 Up to 3rd order “… appear to saturate at k = 3 for Yahoo… browsing behavior across websites is definitely not Markovian but can be captured reasonably well by a not-too-high order Markov chain.” --- Chierichetti et al. (2012) Retweet 23,755,810 N/A Assuming the second order has “marginal consequences for disease spreading” --- Rosvall et al. (2014)
  • 46. 46 Higher-order dependencies revealed by HON Data # Records Dependencies revealed Similar observations Ship movement 3,415,577 Up to 5th order N/A Clickstream 3,047,697 Up to 3rd order “… appear to saturate at k = 3 for Yahoo… browsing behavior across websites is definitely not Markovian but can be captured reasonably well by a not-too-high order Markov chain.” --- Chierichetti et al. (2012) Retweet 23,755,810 N/A Assuming the second order has “marginal consequences for disease spreading” --- Rosvall et al. (2014)
  • 47. 47 Higher-order dependencies revealed by HON Data # Records Dependencies revealed Similar observations Ship movement 3,415,577 Up to 5th order N/A Clickstream 3,047,697 Up to 3rd order “… appear to saturate at k = 3 for Yahoo… browsing behavior across websites is definitely not Markovian but can be captured reasonably well by a not-too-high order Markov chain.” --- Chierichetti et al. (2012) Retweet 23,755,810 N/A Assuming the second order has “marginal consequences for disease spreading” --- Rosvall et al. (2014)
  • 48. 48 Higher-order dependencies revealed by HON Data # Records Dependencies revealed Similar observations Ship movement 3,415,577 Up to 5th order N/A Clickstream 3,047,697 Up to 3rd order “… appear to saturate at k = 3 for Yahoo… browsing behavior across websites is definitely not Markovian but can be captured reasonably well by a not-too-high order Markov chain.” --- Chierichetti et al. (2012) Retweet 23,755,810 N/A Assuming the second order has “marginal consequences for disease spreading” --- Rosvall et al. (2014)
  • 51. 51 Existing approaches Ignore higher-orders Modify existing algorithms Higher-order network  Accurate representation  Generalizes to existing algorithms
  • 52. 52 Compatible with existing tools Conventionally: every node represents a single entity (location, state, etc.) Now: break down nodes into higher-order nodes that carry different dependency relationships     1 1 1( | ) t t t t t t tj W i i P X i X i W i j          1 ( | ) | ( | ) ( | ) t t k W i h j P X j X i h W i h k       Only change the node labeling
  • 53. Applications: clustering Controlling species invasion through the global shipping network
  • 54. 54 Invasive species Zebra mussels @ Great Lakes Clogging water pipes, attach to boats Photos: Great Lakes Environmental Research Lab; TIME & LIFE Images, Getty Images $120 billion / year damage & control costs
  • 55. 55 Ship-borne species invasion Picture from GloBallast Programme 2002
  • 56. 56 Ship-borne species invasion Picture from GloBallast Programme 2002
  • 57. 57 Clustering on global shipping network
  • 58. 58 Clustering on global shipping network Shanghai Tokyo Los Angeles SeattleSingapore
  • 59. 59 Clustering on global shipping network Shanghai Los Angeles Singapore SeattleTokyo Shanghai Tokyo Los Angeles SeattleSingapore
  • 61. 61 Clustering: higher-order network No changes to the clustering algorithm
  • 62. Applications: ranking Web page access behaviors for server optimization and advertising
  • 63. 63 Ranking on clickstream network User 1 WDBJ7 home  View photo  WDBJ7 home  … User 2 WDBJ7 home  View photo  Upload photo  … User 3 View photo  Upload photo  View photo  … User 4 WDBJ7 home  Upload photo  WDBJ7 home  … … … … …
  • 64. 64 Ranking on clickstream network User 1 Company ranking  Job listing  Apply … User 2 Weather  Homepage  News  … User 3 News  Sports  Scores  … User 4 Events  Homepage  Weather  … … … … …
  • 66. 66 Ranking on clickstream network • 26% pages show more than 10% changes in ranking • More than 90% pages lose PageRank scores, while a few pages gain significant scores No changes to the ranking algorithm
  • 67. Applications: anomaly detection Unveiling higher-order anomalies with HON
  • 68. 68 Anomaly detection with dynamic network
  • 69. 69 Anomaly detection with dynamic network
  • 70. 70 Anomaly detection with dynamic network
  • 71. 71 Anomaly detection with dynamic network
  • 72. 72 Anomaly detection with dynamic network
  • 73. 73 Anomaly detection with dynamic network Capturing complex anomalous patterns No changes in Network topology
  • 74. 74 Synthetic data with 11 billion movements 100,000 ships, each moving 100 steps; 11 scenarios, each repeating 100 times; Total: 11,000,000,000 movements
  • 75. 75 Higher-order anomalies captured by HON First-order network HON Injection and alternation of 1st order dependencies Networkdifferences Day Day
  • 76. 76 Higher-order anomalies captured by HON First-order network HON Injection and alternation of 2nd order dependencies Networkdifferences Day Day
  • 77. 77 Higher-order anomalies captured by HON First-order network HON Injection and alternation of 3rd order dependencies Networkdifferences Day Day
  • 78. 78 Higher-order anomalies captured by HON First-order network HON Networkdifferences Day Day Injection and alternation of higher-order dependencies Fails to capture certain anomalies
  • 79. Higher-order anomalies captured by HON Porto Taxi GPS trajectory data, 1 year
  • 80. Higher-order anomalies captured by HON Porto Taxi GPS trajectory data, 1 year
  • 81. Visualization with HoNVis Facilitates interactive explorations
  • 88. 88 Varieties of data Application fields Input Trajectories Nodes Edges Transportation Ship trajectories Ports Ship traffic Computer network Clickstreams Web pages Web traffic Human interactions Phone call or message cascades People Information flow Human behavior Human movements POIs Traffic Healthcare Patient records Diseases Disease evolutions NLP Sentences Words # word pairs
  • 93. 93
  • 95. 95 For the community HigherOrderNetwork.com Code available Visualization tool Available this summer
  • 96. 96 Collaborators The HON paper received valuable comments from Yuxiao Dong, Reid Johnson and David Lodge. The HON introduction video is designed by Mayra Duarte in iCeNSA. HON HoNVis Anomaly Bruno Ribeiro
  • 100. 100 References • Rosvall, Martin, Alcides V. Esquivel, Andrea Lancichinetti, Jevin D. West, and Renaud Lambiotte. "Memory in network flows and its effects on spreading dynamics and community detection." Nature communications 5 (2014). • Chierichetti, Flavio, Ravi Kumar, Prabhakar Raghavan, and Tamas Sarlos. "Are web users really markovian?." In Proceedings of the 21st international conference on World Wide Web, pp. 609-618. ACM, 2012. • Pons, Pascal, and Matthieu Latapy. "Computing communities in large networks using random walks." J. Graph Algorithms Appl. 10, no. 2 (2006): 191-218. • Page, Lawrence, Sergey Brin, Rajeev Motwani, and Terry Winograd. "The PageRank citation ranking: bringing order to the web." (1999). • Gleich, David F., Lek-Heng Lim, and Yongyang Yu. "Multilinear PageRank." SIAM Journal on Matrix Analysis and Applications 36, no. 4 (2015): 1507-1541. • Akoglu, Leman, Mary McGlohon, and Christos Faloutsos. "Anomaly detection in large graphs." In In CMU-CS-09-173 Technical Report. 2009. • Seebens, H., M. T. Gastner, and B. Blasius. "The risk of marine bioinvasion caused by global shipping." Ecology Letters 16, no. 6 (2013): 782- 790. • Rosvall, Martin, and Carl T. Bergstrom. "Maps of random walks on complex networks reveal community structure." Proceedings of the National Academy of Sciences 105, no. 4 (2008): 1118-1123. • Ducruet, César. "Network diversity and maritime flows." Journal of Transport Geography 30 (2013): 77-88. • Ducruet, César, Sung-Woo Lee, and Adolf KY Ng. "Centrality and vulnerability in liner shipping networks: revisiting the Northeast Asian port hierarchy." Maritime Policy & Management 37, no. 1 (2010): 17-36. • Ducruet, César, Céline Rozenblat, and Faraz Zaidi. "Ports in multi-level maritime networks: evidence from the Atlantic (1996–2006)." Journal of Transport Geography 18, no. 4 (2010): 508-518. • Ducruet, César, and Theo Notteboom. "The worldwide maritime network of container shipping: spatial structure and regional dynamics." Global Networks 12, no. 3 (2012): 395-423. • Kaluza, Pablo, Andrea Kölzsch, Michael T. Gastner, and Bernd Blasius. "The complex network of global cargo ship movements." Journal of the Royal Society Interface 7, no. 48 (2010): 1093-1103. • Hu, Yihong, and Daoli Zhu. "Empirical analysis of the worldwide maritime transportation network." Physica A: Statistical Mechanics and its Applications 388, no. 10 (2009): 2061-2071.
  • 101. 101 Takeaways Higher-order network is: More accurate in capturing dynamics in raw data. More scalable than fixed-order networks. 3 Compatible with existing network algorithms.
  • 102. 102 Compatible with existing tools Hypergraph
  • 104. 104
  • 105. 105
  • 106. 106
  • 107. 107
  • 108. 108
  • 109. 109
  • 111. 111 Comparison with VOM HON VOM In HON but not in VOM In VOM but not in HON 0th order 0 3,029 0 3029 1st order 31,028 31,028 0 0 2nd order 32,960 35,288 427 2,755 3rd order 15,642 21,536 550 6,444 4th order 4,632 8,973 302 4,643 5th order 763 2,084 23 1,344 Total 85,025 101,938 1,302 18,215
  • 112. 112 Higher-order dependencies revealed by HON Data # Records Inject known variable- order dependencies Synthetic 10,000,000 10 second-order 10 third-order 10 fourth-order • Effectiveness: correctly captures all 30 of the higher-order dependencies • Accuracy: does not extract false dependencies beyond the fourth order • Compactness: determines that all other dependencies are first-order
  • 113. 113 Clustering: higher-order network • 45% of ports belong to more than one cluster • 44 ports (1.7% of all) belong to five clusters o New York, Shanghai, Hong Kong, Gibraltar, Hamburg, etc. • Panama Canal belongs to six clusters • Highlighting ports that may be invaded by species from multiple regions
  • 114. 114 Ranking on clickstream network Pages that gain PageRank scores ΔPageRank Pages that lose PageRank scores ΔPageRank South Bend Tribune - Home. +0.0119 KTUU - Home. −0.0057 Hagerstown News / obituaries - Front. +0.0115 KWCH - Home. −0.0031 South Bend Tribune - Obits - 3rd Party. +0.0112 Imperial Valley Press - Home. −0.0011 South Bend Tribune / sports / notredame - Front. +0.0102 Hagerstown News / sports - Front. −0.0005 Aberdeen News / news / obituaries - Front. +0.0077 Imperial Valley Press / classifieds / topjobs - Front. −0.0004 WDBJ7 - Home. +0.0075 Gaylord - Home. −0.0004 KY3 / weather - Front. +0.0075 WDBJ7 / weather / web-cams - Front. −0.0004 Hagerstown News - Home. +0.0072 KTUU / about / meetnewsteam - Front. −0.0003 Daily American / lifestyle / obituaries - Front. +0.0054 Smithsburg man faces more charges following salvag −0.0003 WDBJ7 / weather / closings - Front. +0.0048 KWCH / about / station / newsteam - Front. −0.0003 WSBT TV / weather - Front. +0.0041 South Bend Tribune / sports / highschoolsports - Front. −0.0003 Daily American - Home. +0.0036 Hagerstown News / opinion - Front. −0.0002 WDBJ7 / weather / radar - Front. +0.0036 WDBJ7 / news / anchors-reporters - Front. −0.0002 WDBJ7 / weather / 7-day-planner - Front. +0.0031 Petoskey News / news / obituaries - Front. −0.0002 WDBJ7 / weather - Front. +0.0019 KWCH / news - Front. −0.0002
  • 115. 115 Ranking on clickstream network Pages that gain PageRank scores ΔPageRank Pages that lose PageRank scores ΔPageRank South Bend Tribune - Home. +0.0119 KTUU - Home. −0.0057 Hagerstown News / obituaries - Front. +0.0115 KWCH - Home. −0.0031 South Bend Tribune - Obits - 3rd Party. +0.0112 Imperial Valley Press - Home. −0.0011 South Bend Tribune / sports / notredame - Front. +0.0102 Hagerstown News / sports - Front. −0.0005 Aberdeen News / news / obituaries - Front. +0.0077 Imperial Valley Press / classifieds / topjobs - Front. −0.0004 WDBJ7 - Home. +0.0075 Gaylord - Home. −0.0004 KY3 / weather - Front. +0.0075 WDBJ7 / weather / web-cams - Front. −0.0004 Hagerstown News - Home. +0.0072 KTUU / about / meetnewsteam - Front. −0.0003 Daily American / lifestyle / obituaries - Front. +0.0054 Smithsburg man faces more charges following salvag −0.0003 WDBJ7 / weather / closings - Front. +0.0048 KWCH / about / station / newsteam - Front. −0.0003 WSBT TV / weather - Front. +0.0041 South Bend Tribune / sports / highschoolsports - Front. −0.0003 Daily American - Home. +0.0036 Hagerstown News / opinion - Front. −0.0002 WDBJ7 / weather / radar - Front. +0.0036 WDBJ7 / news / anchors-reporters - Front. −0.0002 WDBJ7 / weather / 7-day-planner - Front. +0.0031 Petoskey News / news / obituaries - Front. −0.0002 WDBJ7 / weather - Front. +0.0019 KWCH / news - Front. −0.0002 No changes to the ranking algorithm
  • 116. 116 Network representation Number of edges Number of nodes Network density Prob. of returning after two steps Prob. of returning after three steps Entropy rate (bits) Clustering time (mins) Ranking time (s) Conventional first-order 31,028 2,675 4.3×10-3 10.7% 1.5% 3.44 4 1.3 Fixed second-order 116,611 19,182 3.2×10-4 42.8% 8.0% 1.45 73 7.7 HON, max order two 64,914 17,235 2.2×10-4 41.7% 7.3% 1.46 45 4.8 HON, max order three 78,415 26,577 1.1×10-4 45.9% 16.4% 0.90 63 6.2 HON, max order four 83,480 30,631 8.9×10-5 48.9% 18.5% 0.68 67 7.0 HON, max order five 85,025 31,854 8.4×10-5 49.3% 19.2% 0.63 68 7.6

Editor's Notes

  1. Good afternoon, my name is Jian, I’m from University of Notre Dame. Today I will discuss an approach to represent higher-order dependencies in networks.
  2. Overall, we have placed an important piece for representing raw data with networks
  3. Overall, we have placed an important piece for representing raw data with networks
  4. Now let’s begin with the higher-order network.
  5. Our world is complex. A single ship may sail among hundreds of ports in the world Consider hundreds of thousands of such ships. They naturally weave a global shipping network, that transports cargos intentionally, and transports invasive species unintentionally.
  6. Suppose we have raw data about global shipping movements, which in every row of the data we know which vessel goes from which port to which other port at what time. We see sequential data like this very frequently.
  7. In order to perform network analyses, we first need to construct a network from this data. The conventional way is to look at the data, count how many ships sailed between port pairs, and use these as edge weights to build a network. This conventional network gives an overview of port-to-port traffic. In this network, the edges appear to be the same, since port-to-port traffic are equal.
  8. When we simulate ship movements on this network, according to the network structure, where a ship will go from Singapore will not depend on where it came to Singapore.
  9. In reality, according to the raw data, the ship is more likely to continue onto LA if it came from Shanghai to Singapore, and more likely to go to Seattle if it came from Tokyo to Singapore. Such complex movement patterns are not captured by the conventional first order network, which only retains pairwise interactions between ports, and loses important information about higher-order dependencies.
  10. We know the network representation derived from the raw data is the foundation for further analyses, if the network captures only pairwise connections in the raw data, and loses information such as higher-order dependencies,
  11. then for all following network analyses, the results may not be accurate.
  12. The question is…
  13. We propose the higher-order network. The basic idea is to split nodes into higher-order nodes. In this example, based on the movement patterns in the raw data, Singapore is split into Singapore given Shanghai and Singapore given Tokyo.
  14. When ships follow different paths to Singapore, their choices for the next step will be different, therefore being able to reproduce ship movements more accurately.
  15. The idea of higher-order Markov is not new, but it has not been discussed in the network context until very recently.
  16. In 2014, Rosvall’s group proposed the fixed second order network, which counts the number of trigrams in the raw trajectories,
  17. And uses these memory nodes to remember the one more previous step. This is a powerful approach that captures all second-order dependencies.
  18. Although the fixed-order network is relatively easy to build, going beyond second order is impractical due to the possible combinations of multiple previous steps. Instead, we propose to use a variable-order representation, that uses the first order when it is sufficient, and uses higher orders only when necessary.
  19. As a result, we can represent variable orders of dependencies in the same network. Here we see the first order node of Singapore, second order, third order, fourth order, and so on.
  20. While assuming a fixed second order significantly increases the size of networks
  21. By using variable orders in HON, the information in the network is comparable, but the network size is greatly reduced, since higher orders are used only when necessary.
  22. Even the 5th order higher-order network is smaller than the fixed second order network. Therefore, HON has good scalability.
  23. Now how to construct HON?
  24. The workflow is as follows: Given the raw sequential data, we first decide which nodes need to be split into higher-order nodes, and how high the orders are; Then we connect nodes representing different orders of dependency. Finally the output is the higher-order network, which can be used like conventional network for analyses.
  25. The rule extraction process works as follows. In the shipping example, from the raw ship movement trajectories,
  26. first count how many ships are going from Singapore to the next port
  27. normalize the distributions for the next step given the current port being Singapore;
  28. Then given one more previous step, say Shanghai, see how that changes the distribution of the next step from Singapore;
  29. If the change is significant, then there is second order dependency here; this process is then repeated recursively for higher-orders.
  30. After deciding the orders of dependencies, we first construct a first-order network in the conventional way,
  31. then for every second order rule, add the corresponding node,
  32. and rewire the previous first-order link to connect to the new node; Then repeat the process for third order rules and so on.
  33. Finally, for the highest order nodes, rewire their out-edges to connect to nodes with the highest orders possible.
  34. Our latest work has made the algorithm parameter-free and more scalable.
  35. In practice, we do not have a lot of higher-order dependencies.
  36. To identify higher-order dependencies, we need to iterate every order and test the significance. However, the number of possible dependencies grows fast with orders.
  37. Is there a way to optimize the cost in space and time?
  38. Given the probability distribution for the next step, we can actually pre-determine if a higher-order will make a significant difference. Therefore, the algorithm does not have to iterate all possibilities for higher orders.
  39. Let us see how HON affects dynamic processes on the network.
  40. Suppose we have these ship trajectories, the first two ships are going back and forth between port A and port B. The third ship goes between port A and port C.
  41. We can construct both the first-order network and the higher-order network from these trajectories.
  42. Then based on the network structure, we predict where the ship is going. On the first-order network, if a ship used to go between a and b, it is not guaranteed to go back to b again. But on the higher-order network, if a ship goes from b to a, according to the network structure, it will go back to b for sure.
  43. We can compare the predicted trajectories with the testing set, and know how accurately movements are simulated on these two networks.
  44. We see that compared with the first-order network, fixed second-order network improves the accuracy, but HON with maximum order of 5 can further improve the accuracy, with a smaller size than the fixed 2nd order network.
  45. We take diverse real-world data sets as the input, including the ship movement data, user clickstream data on websites, and Weibo retweet data reflecting diffusion process.
  46. Interestingly, HON shows that a ship’s movement can depend on up to five previous ports, while all existing papers on shipping network simply ignore these higher-order dependencies.
  47. The clickstream data shows up to 3rd order, which matches observations in other literatures.
  48. The retweet data as a diffusion process, very interestingly, shows no higher-order dependency.
  49. Given diverse raw data with higher-order dependencies, existing analysis methods are divided into two directions. The first one is simply ignore higher-order patterns in the data and use the first-order network representation, which is inaccurate, and may potentially undermine the subsequent network analyses.
  50. The other approach, is to modify existing algorithms to incorporate higher-order movement patterns in the analyses, Such as the recent papers in Multilinear PageRank, or clustering using supervised random walks. However, such approaches do not generalize, that we cannot modify all existing algorithms to incorporate higher-order movement patterns.
  51. The higher-order network resolves this dilemma, by fundamentally improving the network itself, such that network algorithms, without any modification, will be aware of higher-order dependencies.
  52. The reason behind direct compatibility is that HON keeps the data structure consistent with FON. Here shows an example of random walks, we see that the only change is node labeling.
  53. Now we see how HON influences a wide range of applications.
  54. Thirty years ago zebra mussels did not even exist in the U.S., but now they are all over the Great Lakes. They do significant economical harms.
  55. One of the major vectors of species introduction is through global shipping; (clicks) here shows an example of how species may be taken up with ballast water at one location, and discharged at another. Can we discover the species flow patterns by studying the global shipping network, in order to inform the policy makers to devise effective invasive species control policies?
  56. One of the major vectors of species introduction is through global shipping; (click) here shows an example of how species may be taken up with ballast water at one location, and discharged at another. Can we discover the species flow patterns by studying the global shipping network, in order to inform the policy makers to devise effective invasive species control policies?
  57. Clustering methods based on flow dynamics, such as Infomap and walktrap, can help us identify clusters of ports that are loosely connected to each other by shipping traffic. Controlling the weak links between clusters can prevent species propagation from one cluster of ports to another.
  58. If we do clustering on the first-order network, these nodes appear to be equivalent.
  59. In reality, given ship movements, species are potentially more likely to be introduced to Los Angeles indirectly from Shanghai. Clustering on HON will automatically take this into account and cluster Shanghai and Los Angeles together.
  60. If we cluster the ports on the first-order network, it gives reasonable results, that ports geographically close to each other are likely to bring invasive species to each other. We see that for Malta, a small country in the Mediterranean, these two ports belong to the same Mediterranean cluster.
  61. On the higher-order network, however, clustering shows that Malta Freeport belongs to as many as five clusters, very different from Valletta, even they are geographically close to each other. The reason is that Valletta is a small local port, while Malta Freeport is an international port with lots of traffic and is potentially susceptible to multiple sources of invasions. Clustering on HON is able to reflect such differences, with no changes to the clustering algorithm itself. **Our biology collaborators commented that the ability to capture indirect species invasions is critical for devising effective species control policies. **Currently we are focusing on the Arctic region, which is overseeing increased traffic due to the reduced sea ice coverage caused by climate change.
  62. Another application, ranking web page clickstreams
  63. Given the clickstreams of web pages (click),
  64. one could build a first-order network like this. According to the network structure, after a user views or uploads a photo, the user is very likely to go back to the home page. The higher-order network representation of the same data, however, shows additional information. It keeps the nodes and connections of the first-order network (click), but in addition (click),
  65. it adds additional higher-order nodes, representing the natural scenario that once a user uploads and views a photo, the user is very likely to keep uploading more photos rather than going back to the home page. Given that PageRank is essentially random walks with resets on a network, the random walker is more likely to get trapped in the strong loop in HON, leading to differences in PR scores.
  66. More than 90% pages lose PageRank scores, while a few pages gain significant scores. In the raw data, we see weather forecast and obituaries gain the most scores on HON, which is natural considering the data set is from a news company. By using the HON, PR is able to simulate complex user browsing behaviors, with no changes to the PR algorithm.
  67. Let’s see how HON can help unveil anomalies that are otherwise hidden.
  68. The network approach of anomaly detection is to build a dynamic network from a raw data, which is a series of networks for each time stamp. Then monitor when the network shows significant changes. Suppose we have trajectories like this. Ships at c will randomly go to d or e, no matter where the ship comes to c.
  69. We can build both the FON and HON. Since there are no higher-order dependencies, HON looks the same as FON.
  70. At the second time stamp, there is still no higher-order dependency, the two networks remain the same. In time frame 1 and 2, the network distance is 0.
  71. Now that a second-order pattern has emerged, that all ships coming from a to c will go to d, and all ships from b to c will go to e. Since the aggregated pairwise traffic remains the same, the first-order network shows no changes. On the HON, the network topology will change to reflect the emergence of the second-order patterns, therefore the rising network difference signals an anomaly here.
  72. When the second-order patterns change, now that all ships from a through c will go to e and all ships from b through c will go to d, the first-order network still shows no change
  73. But again, the HON can capture complex anomalous patterns.
  74. We use a more comprehensive data to test the effectiveness of the HON based anomaly detection scheme. We produce a big synthetic data with known first-order dependencies, 2nd order, 3rd order, and mixed order dependencies. The 11 cases correspond to 11 billion movements.
  75. We represent this data with dynamic FON and dynamic HON. The alternation of first-order movements is reflected by both networks.
  76. But for 2nd, 3rd and more complex movement behavior changes, FON may completely fail to capture such anomalies.
  77. But for 2nd, 3rd and more complex movement behavior changes, FON may completely fail to capture such anomalies.
  78. Even if it reflects some changes, the signal can be very weak, compared to HON.
  79. We also test the anomaly detection framework on the Porto taxi gps trajectory data.
  80. Because the graph distances of HON reflect changes in higher-order movement patterns and first-order patterns, HON is able to amplify anomalous patterns.
  81. Despite the rich information in HON, there is not an existing tool for visualizing it. We have developed a toolkit for that.
  82. Here is the framework. We map the raw trajectories with useful metadata such as time and location.
  83. Then we build both the FON and HON from the input, and create a mapping between the two.
  84. Given the networks, we provide three levels of exploration, including the global level to identify nodes of interest, individual level, and local level
  85. Design the interface based on five different views
  86. This is how the global shipping network is represented as HON in our visualization software. Users can import heterogeneous data, quickly identify interactive patterns of interest, explore at different granularities, trace propagation pathways and see how they expand the subgraph of ports invaded by certain species, test targeted species control strategies for given ports / shipping routes, and drill down to the raw data to understand the formation and evolution of higher-order dependencies. This software package expedites the discovery of complex patterns and facilitates real-world applications in decision making.
  87. I strongly believe HON has great potentials to be further explored.
  88. Serves as a feature for fraud detection in signatures, where each signature is a trajectory. Eye tracking data. Patient records. Human trafficking.
  89. For example, HON can not only take trajectories as the input, but also diffusion data. We can follow the pathways in the diffusion trees to build observations. Therefore, it is possible to analyze problems like information diffusion with HON.
  90. We can also discretize the time series as discrete sequential data. Making it possible to analyze stock market or health data with HON.
  91. It is also possible to chain pairwise interactions together. For example, if we assume phone calls made within 10 minutes are related, we can build phone call cascades, and use that as the input of HON.
  92. Overall, we have placed an important piece for representing raw data with networks
  93. Thanks to our funding agencies,
  94. Thanks to our funding agencies,
  95. Unlike the hypergraph, which requires network algorithms to be designed specifically for the network,
  96. Use this knowledge, we propose the optimized algorithm.
  97. It is a lazy algorithm, that it builds first-order observations only
  98. And try the expansion
  99. Higher-order observations are only generated when necessary.
  100. Now that if the maximum possible divergence has already fallen below the threshold, the algorithm stops trying to look for even higher orders.
  101. We construct a synthetic data set with known higher-order dependencies as the ground truth. We generate 10M ship movements, and inject variable orders of dependencies. The result shows that our method correctly captures these variable orders of dependencies, does not extract false dependencies beyond the fourth order, and keeps all other dependencies the first-order for compactness.
  102. Additionally, …
  103. Appendix