SlideShare a Scribd company logo
1 of 1
Download to read offline
12
Data
• Ship movements
• Web clickstreams
• Phone call cascades
• … …
Network
representation
• Global shipping
network
• Web traffic network
• Social network
• … …
Network
analysis
• Clustering
• Ranking
• Link prediction
• Anomaly detection
• … …
Representing higher-order dependencies in networks
and how it improves the result of a variety of tasks such as random walking, clustering, and ranking
Prerequisite: To ensure the correctness of network analysis meth-
ods, the network as the input has to be a precise representation of
the underlying data.
Question: How to effectively represent data as networks, without
losing crucial information about higher-order dependencies?
Our answer: We propose the Higher-order Network (HON) that
can embed higher-orders of dependencies to represent data more
accurately, can use variable orders for concise representation, and
is directly compatible with the existing suite of network analysis
methods.
Overview
What people usually do: Direct conversion, the sum of pairwise con-
nections in the raw data --> the edge weights in the network.
What is assumed: The Markov property (first-order dependency).
What does it mean: When movements are simulated on the net-
work, where the flow moves next dependes only on its current node.
Example: In this first-order network weighted by shipping frequen-
cy, a ship at Singapore has similar probabilities of going to Los Ange-
les and Seattle, no matter where the ship came to Singapore from;
but in fact ships from Tokyo are more likely to go to Seattle, and
ships from Shanghai are more likely to go to Los Angeles.
Problem of first-order representation
What people want: A network representing the complex system.
In reality: The network is usually not directly available.
What people have: Recorded sequences of events, such as trajec-
tories of vehicles, retweets of messages, streams of web clicks, etc.
Example: Global shipping data, containing ship movements.
Representing data as network
What we do: Break down nodes into higher-order
nodes that carry different dependency relationships.
Example: In this higher-order network, by breaking
down the node Singapore, the ship’s next step can
depend on previous steps when following different
paths to Singapore, thus more accurately simulate
movement patterns in data.
Compatibility: The data structure is consistent with
the conventional network representation, allowing
for a variety of network analysis methods and algo-
rithms to run on HON without modification.
Proposed higher-order network (HON)
Improving the result of a variety of network analysis methods
Random walking
Experiment: See how well random walkers on different network representations
of the same global shipping data can simulate the true ship movements in the
raw data.
Result: The accuracy of simulating one step on HON doubles that of the conven-
tional first-order network, and is higher by one magnitude when simulating
three steps (algorithms based on random walking such as PageRank usually
need to simulate multiple steps of movement).
Clustering
Experiment: Given the global ship movements which drives the diffusion of
invasion species, compare the clustering of ports (ports tightly coupled by spe-
cies flows) on different network representations.
Result on first order network: (Fig. A) non-overlapping clusters are generated.
Although Valetta and Malta Freeport are local and international ports respec-
tively, the clustering result does not distinguish the two.
Result on HON: (Fig. B) clusters may overlap, and international ports (such as
Malta Freeport) are effectively identified by belonging to multiple clusters and
potentially suffering from multiple sources of invasions. Fourty-four ports
belong to five clusters, including NewYork, Shanghai, Gibraltar, and so on.
What does it mean: Species may be introduced from multiple previous ports a
ship has stopped at; these indirect species introduction pathways are already
captured by HON and factored in the clustering.
Ranking
Experiment: With a clickstream data set recording how users navigate
through Web pages, compare Web PageRank scores by using HON in-
stead of first-order network representation.
Result: More than 90% of the pages show a decrease of PageRank
scores, such as pages of news personnels; while a few pages gain consid-
erable PageRank scores, such as weather forecasts and obituaries.
Why such changes: A case study: (Fig. A) the first-order network repre-
sentation indicates that a user is likely to go back to the homepage after
viewing or uploading snow photos. (Fig. B) the HON representation uses
additional higher-order nodes and edges to represent a natural scenario
that once a user views and uploads a photo, the user is likely to repeat
this process to upload more photos and is less likely to go back to the
home page.
What does it mean: HON can better“guide”random walkers to simulate
more complex movements such as user’s non-Markovian web browsing
behaviors, thus HON may be used to improve the ranking results of web
ranking, citation ranking, keyphrase extraction, and more, without modi-
fication of the PageRank algorithm.
Problem: Previous work using fixed order for networks will result in exponential growth of
network size when higher orders are incorporated due to combinatorial explosion.
Question: How to build a scalable representation for complex data with high orders of de-
pendencies, and how much more space is needed to represent higher-order dependencies?
Our solution: Use variable orders of dependency, and add higher-orders nodes and edges to
a first-order network only where necessary.
Compact representation: HON has less nodes and half the number of edges compared with
the fixed second-order network. Even when increasing the maximum order to five, HON still
has less edges than the fixed second-order network, while all the useful dependencies up to
the fifth order are incorporated in the network.
Speeding up analyses: Another important advantage of HON over a fixed-order network is
that network analysis algorithms can run faster on HON, due to HON’s compact representa-
tion. In addition, HON is sparser than the fixed-order representation, and many network tool-
kits are optimized for sparse networks. Compared with the fixed second-order network, these
tasks run almost two times faster on HON with a maximum order of two, and about the same
speed on HON with a maximum order of five (which embeds more higher-order dependen-
cies and is more accurate).
Scalability of HON
Hypergraph: Although its edges can connect to multiple nodes simulta-
neously, the nodes are unordered, thus cannot represent dependencies.
Fixed second order network: (Rosvall et al.) is effective but does not scale
well. It is an overkill when first order is enough, and insufficient when
higher orders are needed. HON scales better by using variable orders.
Variable Order Markov: For a network representation, VOM generates
unnecessary probabilities while failing to capture some necessary proba-
bilities, thus not guaranteeing a network representation.
Related works and why we need HONContributions at a glance
Full paper: DOI: 10.1126/sciadv.1600028 or scan QR code
Science Advances 2, e1600028 (2016)
Source code is available at https://github.com/xyjprc/hon
Collaborations and discussions are welcomed.
More information
Acknowledgements: This work is based on research supported by the University of Notre Dame
Office of Research via Environmental Change Initiative (ECI) and the National Science Foundation (NSF)
Awards EF-1427157, IIS-1447795 and BCS-1229450; the research was supported by the Army Research Labo-
ratory (ARL), and was accomplished under Cooperative Agreement Number W911NF-09-2-0053 [the ARL
Network Science Collaborative Technology Alliance (NSCTA). We thank David Lodge,Yuxiao Dong and Reid
Johnson for their valuable comments.
Accurate: HON is a more accurate network representation that is reflec-
tive of the underlying real-world phenomena by embedding higher
order dependencies.
Compact: Use higher orders only where necessary, scales well.
Compatible: HON is consistent with existing network representation by
using simple nodes and edges only, so that existing network analysis
tools can be applied on HON directly with no modifications.
Algorithms: Guarantee that HON can extract and represent arbitrary
orders mixed in the same data set.
Jian Xu (jxu5@nd.edu) Thanuka Wickramarathne (twickram@nd.edu) Nitesh Chawla (nchawla@nd.edu, corresponding author)
Los Angeles
SeaƩle
Singapore|Tokyo
Singapore|Shanghai
Los Angeles
SeaƩle
Tokyo
Shanghai
Tokyo
Shanghai
Singapore
Shanghai|·
Tokyo|·
Busan, Tianjin
Busan, Dalian
...
Tokyo
Singapore
...
Singapore|Shanghai
Singapore|·
Singapore|Tokyo
Singapore|Tokyo, Busan, Tianjin
Singapore|Tokyo, Busan, Dalian
Shanghai
data
er network
rder network (HON)
orders of dependencies in HON
Vessel Depart Sailing_date Arrive Arrival_date
V-001 Shanghai 2013-01-01 Singapore 2013-01-15
V-001 Singapore 2013-01-16 Los Angeles 2013-02-05
V-002 Singapore 2013-02-01 Los Angeles 2013-03-08
… … … … … … … … … …
Los Angeles
SeaƩle
Singapore|Tokyo
Singapore|Shanghai
Los Angeles
SeaƩle
Tokyo
Shanghai
Tokyo
Shanghai
Singapore
B
C
Shanghai|·
Tokyo|·
Tokyo|Busan, Tianjin
Tokyo|Busan, Dalian
...
Tokyo
Singapore
...
Singapore|Shanghai
Singapore|·
Singapore|Tokyo
Singapore|Tokyo, Busan, Tianjin
Singapore|Tokyo, Busan, Dalian
Shanghai
D
Original data
First-order network
Higher-order network (HON)
Variable orders of dependencies in HON
Vessel Depart Sailing_date Arrive Arrival_date
V-001 Shanghai 2013-01-01 Singapore 2013-01-15
V-001 Singapore 2013-01-16 Los Angeles 2013-02-05
V-002 Singapore 2013-02-01 Los Angeles 2013-03-08
… … … … … … … … … …
A
Los Angeles
SeaƩle
Singapore|Tokyo
Singapore|Shanghai
Los Angeles
SeaƩle
Tokyo
Shanghai
Tokyo
Shanghai
Singapore
B
C
Shanghai|·
Tokyo|·
Tokyo|Busan, Tianjin
Tokyo|Busan, Dalian
...
Tokyo
Singapore
...
Singapore|Shanghai
Singapore|·
Singapore|Tokyo
Singapore|Tokyo, Busan, Tianjin
Singapore|Tokyo, Busan, Dalian
Shanghai
D
Original data
First-order network
Higher-order network (HON)
Variable orders of dependencies in HON
Vessel Depart Sailing_date Arrive Arrival_date
V-001 Shanghai 2013-01-01 Singapore 2013-01-15
V-001 Singapore 2013-01-16 Los Angeles 2013-02-05
V-002 Singapore 2013-02-01 Los Angeles 2013-03-08
… … … … … … … … … …
A
By assuming an order of two for the whole network, the accuracies
on the fixed second-order network increase considerably as in Fig. 2D,
because the network structure can help the random walker remember
its last two steps. Meanwhile, the accuracies on HON with a maxi-
mum order of two are comparable and slightly better than the fixed
second-order network, because HON can capture second-order de-
pendencies while avoiding the overfitting caused by splitting all
first-order nodes into second-order nodes. Increasing the maximum
order of HON can further improve the accuracy and lower the entro-
py rate; particularly, ship movements in bigger loops need more steps
of memory and can only be captured with higher orders, as reflected
in Table 1, where the probability of returning in three steps increases
from 7.3 to 16.4% when increasing the maximum order from two to
three in HON. By increasing the maximum order to five, HON can
capture all dependencies below or equal to the fifth order, and the
accuracy of simulating one step on HON doubles that of the
conventional first-order network.
Furthermore, when simulating multiple steps, the advantage of
using HON is even bigger. The reason is that in a first-order network,
a random walker “forgets” where it came from after each step and has
a higher chance of disobeying higher-order movement patterns. This
error is amplified quickly in a few steps—the accuracy of simulating
three steps on the first-order network is almost zero. On the contrary,
in HON, the higher-order nodes and edges can help the random walk-
er remember where it came from and provide the corresponding
probability distributions for the next step. As a consequence, the
on random walking, following the intuition that random walkers are
more likely to move within the same cluster rather than between different
clusters. Because using HON instead of a first-order network alters the
movement patterns of random walkers on the network, a compelling
question becomes: How does HON affect the clustering results?
Consider an important real-world application of clustering: identi-
fying regions wherein aquatic species invasions are likely to happen.
Because the global shipping network is the dominant global vector for
the unintentional translocation of non-native aquatic species (45)
[species get translocated either during ballast water uptake/discharge
or by accumulating on the surfaces of ships (11)], identifying clusters
of ports that are tightly coupled by frequent shipping can reveal ports
that are likely to introduce non-native species to each other. The lim-
itation of the existing approach (10) is that the clustering is based on a
first-order network that only accounts for direct species flows, whereas
in reality the species introduced to a port by a ship may also come
from multiple previous ports at which the ship has stopped because
of partial ballast water exchanges and hull fouling. These indirect spe-
cies introduction pathways driven by ship movements are already cap-
tured by HON and can influence the clustering result. As represented
by the HON example in Fig. 1C, following the most likely shipping
route, species are more likely to be introduced to Los Angeles from
Shanghai (via Singapore) rather than from Tokyo, so the clustering
(driven by random walking) on HON prefers grouping Los Angeles
with Shanghai rather than with Tokyo. In comparison, indirect species
introduction pathways are ignored when performing clustering on a
Table. 1. Comparing different network representations of the same global shipping data.
Network
representation
No. of
edges
No. of
nodes
Network
density
Probability of
returning after two
steps
Probability of
returning after three
steps
Entropy
rate (bits)
Clustering
time (min)
Ranking
time (s)
Conventional first-
order
31,028 2,675 4.3 × 10−3
10.7% 1.5% 3.44 4 1.3
Fixed second-
order
116,611 19,182 3.2 × 10−4
42.8% 8.0% 1.45 73 7.7
HON, maximum
order of two
64,914 17,235 2.2 × 10−4
41.7% 7.3% 1.46 45 4.8
HON, maximum
order of three
78,415 26,577 1.1 × 10−4
45.9% 16.4% 0.90 63 6.2
HON, maximum
order of four
83,480 30,631 8.9 × 10−5
48.9% 18.5% 0.68 67 7.0
HON, maximum
order of five
85,025 31,854 8.4 × 10−5
49.3% 19.2% 0.63 68 7.6
R E S E A R C H A R T I C L E
Los Angeles
SeaƩle
Singapore|Tokyo
Singapore|Shanghai
Los Angeles
SeaƩle
Tokyo
Shanghai
Tokyo
Shanghai
Singapore
B
C
Shanghai|·
Tokyo|·
Tokyo|Busan, Tianjin
Tokyo|Busan, Dalian
...
Tokyo
Singapore
...
Singapore|Shanghai
Singapore|·
Singapore|Tokyo
Singapore|Tokyo, Busan, Tianjin
Singapore|Tokyo, Busan, Dalian
Shanghai
D
Original data
First-order network
Higher-order network (HON)
Variable orders of dependencies in HON
Vessel Depart Sailing_date Arrive Arrival_date
V-001 Shanghai 2013-01-01 Singapore 2013-01-15
V-001 Singapore 2013-01-16 Los Angeles 2013-02-05
V-002 Singapore 2013-02-01 Los Angeles 2013-03-08
… … … … … … … … … …
A
Simulate 3 stepsSimulate 2 steps
Ship-1
Ship-2
Ship-3
Training Testing
a
a
a
a
a
a
a
a
a
a
a
b b b b
b b
c
b
b
b
c c c
First-order network
HON
ab c
a
b a|b
c a|c
Prediction
a
a
a
b b
b c
c b
Prediction
a
a
a
b b
b b
c c
3/3 correct
1/3 correct
A
B
C
D
0%
10%
20%
30%
Accuracy
Simulate 1 step
1st
order
network
2nd
order
network
HON, max
order 2
HON, max
order 3
HON, max
order 4
HON, max
order 5
A
B
Valletta
Malta Freeport
Valletta
Malta Freeport
A First-order network
View photo Upload photo
WDBJ7 home
Other pages
B HON
View photo
WDBJ7 home
Other pages
View photo|Upload photo Upload photo|View photo
Upload photo
Invasive species modeling & prediction: Species may be carried unintentionally by ships
from port to port and cause invasions. Thus, ship movements connect ports in the world
in an implicit species flow network. By identifying higher-order dependencies in ship
movements, namely where a ship is more likely to go next given its previous steps, ship
movements and species flow dynamics can be modeled more accurately.
Social interactions & information diffusion: More accurately representing flow of infor-
mation on networks can give a more accurate representation of the complex social inter-
actions and the flow of information, which can be of interest for telecom companies,
social media and so on.
Healthcare: Representing and highlighting complex patterns in sequences of symptoms,
and revealing their relationships with genes and patient features.
Traffic data: Representing complex taxi movements and human trajectories and high-
lighting emerging movement patterns, which the government can leverage for urban
planning, and merchants can use for customer behavior analysis and prediction.
Anomaly detection: The ability to extract and represent higher-order navigation patterns
can also be used to analyze web clickstreams and network access patterns, with potential
applications from website optimization to intruder detection (based on anomalous
access patterns) for security and defense.
Interdisciplinary applications

More Related Content

What's hot

A3: application-aware acceleration for wireless data networks
A3: application-aware acceleration for wireless data networksA3: application-aware acceleration for wireless data networks
A3: application-aware acceleration for wireless data networks
Zhenyun Zhuang
 
Client-side web acceleration for low-bandwidth hosts
Client-side web acceleration for low-bandwidth hostsClient-side web acceleration for low-bandwidth hosts
Client-side web acceleration for low-bandwidth hosts
Zhenyun Zhuang
 

What's hot (20)

Efficient Utilization of Bandwidth in Location Aided Routing
Efficient Utilization of Bandwidth in Location Aided RoutingEfficient Utilization of Bandwidth in Location Aided Routing
Efficient Utilization of Bandwidth in Location Aided Routing
 
Architecture and Evaluation on Cooperative Caching In Wireless P2P
Architecture and Evaluation on Cooperative Caching In Wireless  P2PArchitecture and Evaluation on Cooperative Caching In Wireless  P2P
Architecture and Evaluation on Cooperative Caching In Wireless P2P
 
Network software
Network softwareNetwork software
Network software
 
S06 S10 P05
S06 S10 P05S06 S10 P05
S06 S10 P05
 
Ag26204209
Ag26204209Ag26204209
Ag26204209
 
Destination Aware APU Strategy for Geographic Routing in MANET
Destination Aware APU Strategy for Geographic Routing in MANETDestination Aware APU Strategy for Geographic Routing in MANET
Destination Aware APU Strategy for Geographic Routing in MANET
 
Cost Effective Routing Protocols Based on Two Hop Neighborhood Information (2...
Cost Effective Routing Protocols Based on Two Hop Neighborhood Information (2...Cost Effective Routing Protocols Based on Two Hop Neighborhood Information (2...
Cost Effective Routing Protocols Based on Two Hop Neighborhood Information (2...
 
Computer network
Computer networkComputer network
Computer network
 
IRJET- Simulation Analysis of a New Startup Algorithm for TCP New Reno
IRJET- Simulation Analysis of a New Startup Algorithm for TCP New RenoIRJET- Simulation Analysis of a New Startup Algorithm for TCP New Reno
IRJET- Simulation Analysis of a New Startup Algorithm for TCP New Reno
 
A3: application-aware acceleration for wireless data networks
A3: application-aware acceleration for wireless data networksA3: application-aware acceleration for wireless data networks
A3: application-aware acceleration for wireless data networks
 
Concept of node usage probability from complex networks and its applications ...
Concept of node usage probability from complex networks and its applications ...Concept of node usage probability from complex networks and its applications ...
Concept of node usage probability from complex networks and its applications ...
 
Client-side web acceleration for low-bandwidth hosts
Client-side web acceleration for low-bandwidth hostsClient-side web acceleration for low-bandwidth hosts
Client-side web acceleration for low-bandwidth hosts
 
WebAccel: Accelerating Web access for low-bandwidth hosts
WebAccel: Accelerating Web access for low-bandwidth hostsWebAccel: Accelerating Web access for low-bandwidth hosts
WebAccel: Accelerating Web access for low-bandwidth hosts
 
Spatial ReusabilityAware Routing in Multi-Hop Wireless Networks Using Single ...
Spatial ReusabilityAware Routing in Multi-Hop Wireless Networks Using Single ...Spatial ReusabilityAware Routing in Multi-Hop Wireless Networks Using Single ...
Spatial ReusabilityAware Routing in Multi-Hop Wireless Networks Using Single ...
 
Comprehensive Path Quality Measurement in Wireless Sensor Networks
Comprehensive Path Quality Measurement in Wireless Sensor NetworksComprehensive Path Quality Measurement in Wireless Sensor Networks
Comprehensive Path Quality Measurement in Wireless Sensor Networks
 
Opportunistic Data Forwarding in Manet
Opportunistic Data Forwarding in ManetOpportunistic Data Forwarding in Manet
Opportunistic Data Forwarding in Manet
 
Evaluation of a topological distance
Evaluation of a topological distanceEvaluation of a topological distance
Evaluation of a topological distance
 
IEEE 2014 JAVA NETWORKING PROJECTS Hop by-hop routing in wireless mesh networ...
IEEE 2014 JAVA NETWORKING PROJECTS Hop by-hop routing in wireless mesh networ...IEEE 2014 JAVA NETWORKING PROJECTS Hop by-hop routing in wireless mesh networ...
IEEE 2014 JAVA NETWORKING PROJECTS Hop by-hop routing in wireless mesh networ...
 
Token Based Packet Loss Control Mechanism for Networks
Token Based Packet Loss Control Mechanism for NetworksToken Based Packet Loss Control Mechanism for Networks
Token Based Packet Loss Control Mechanism for Networks
 
An Insight on Routing
An Insight on RoutingAn Insight on Routing
An Insight on Routing
 

Viewers also liked

Learning the dance of change
Learning the dance of changeLearning the dance of change
Learning the dance of change
LDERMINE
 
[Palestra] Temple Grandin: Princípios de comportamento animal - como manejar ...
[Palestra] Temple Grandin: Princípios de comportamento animal - como manejar ...[Palestra] Temple Grandin: Princípios de comportamento animal - como manejar ...
[Palestra] Temple Grandin: Princípios de comportamento animal - como manejar ...
AgroTalento
 

Viewers also liked (18)

La inteligencia artificial con un enfoque humano
La inteligencia artificial con un enfoque humanoLa inteligencia artificial con un enfoque humano
La inteligencia artificial con un enfoque humano
 
Presentacion_Marcia_otalora_fase_3_403003_457
Presentacion_Marcia_otalora_fase_3_403003_457Presentacion_Marcia_otalora_fase_3_403003_457
Presentacion_Marcia_otalora_fase_3_403003_457
 
Actividad de analisis 2
Actividad de analisis 2Actividad de analisis 2
Actividad de analisis 2
 
Encuesta de comportamientos y actitudes sobre sexualidad en niñas, niños y ad...
Encuesta de comportamientos y actitudes sobre sexualidad en niñas, niños y ad...Encuesta de comportamientos y actitudes sobre sexualidad en niñas, niños y ad...
Encuesta de comportamientos y actitudes sobre sexualidad en niñas, niños y ad...
 
Fred Miles Junior Developer Resume
Fred Miles Junior Developer ResumeFred Miles Junior Developer Resume
Fred Miles Junior Developer Resume
 
Our favourites(1)
Our favourites(1)Our favourites(1)
Our favourites(1)
 
05 village aid programme
05 village aid programme05 village aid programme
05 village aid programme
 
Grilla de evaluación moodle
Grilla de evaluación moodleGrilla de evaluación moodle
Grilla de evaluación moodle
 
PhotoShop
PhotoShopPhotoShop
PhotoShop
 
Learning the dance of change
Learning the dance of changeLearning the dance of change
Learning the dance of change
 
Γρανάζια
ΓρανάζιαΓρανάζια
Γρανάζια
 
EVOLUCIÓN DE LAS TÉCNICAS DE VIDEO
EVOLUCIÓN DE LAS TÉCNICAS DE VIDEOEVOLUCIÓN DE LAS TÉCNICAS DE VIDEO
EVOLUCIÓN DE LAS TÉCNICAS DE VIDEO
 
[Palestra] Temple Grandin: Princípios de comportamento animal - como manejar ...
[Palestra] Temple Grandin: Princípios de comportamento animal - como manejar ...[Palestra] Temple Grandin: Princípios de comportamento animal - como manejar ...
[Palestra] Temple Grandin: Princípios de comportamento animal - como manejar ...
 
TI en Las Organizaciones - Cadena de Valor - Actividades de Soporte
TI en Las Organizaciones - Cadena de Valor - Actividades de SoporteTI en Las Organizaciones - Cadena de Valor - Actividades de Soporte
TI en Las Organizaciones - Cadena de Valor - Actividades de Soporte
 
5 Tips For A Successful Ending To Your Presentation - Slideshow
5 Tips For A Successful Ending To Your Presentation - Slideshow5 Tips For A Successful Ending To Your Presentation - Slideshow
5 Tips For A Successful Ending To Your Presentation - Slideshow
 
Proses Kembalinya Indonesia menjadi Negara Kesatuan
Proses Kembalinya Indonesia menjadi Negara KesatuanProses Kembalinya Indonesia menjadi Negara Kesatuan
Proses Kembalinya Indonesia menjadi Negara Kesatuan
 
ANIMALIAK hiztegia idatzi
ANIMALIAK hiztegia idatziANIMALIAK hiztegia idatzi
ANIMALIAK hiztegia idatzi
 
[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程[系列活動] Machine Learning 機器學習課程
[系列活動] Machine Learning 機器學習課程
 

Similar to HON_NetSci_2016

Peer-to-Peer Architecture Case Study Gnutella NetworkMate.docx
Peer-to-Peer Architecture Case Study Gnutella NetworkMate.docxPeer-to-Peer Architecture Case Study Gnutella NetworkMate.docx
Peer-to-Peer Architecture Case Study Gnutella NetworkMate.docx
herbertwilson5999
 
CoryCookFinalProject535
CoryCookFinalProject535CoryCookFinalProject535
CoryCookFinalProject535
Cory Cook
 
Software Defined Networking
Software Defined NetworkingSoftware Defined Networking
Software Defined Networking
Anshuman Singh
 
Osimodelwithneworkingoverview 150618094119-lva1-app6892
Osimodelwithneworkingoverview 150618094119-lva1-app6892Osimodelwithneworkingoverview 150618094119-lva1-app6892
Osimodelwithneworkingoverview 150618094119-lva1-app6892
Aswini Badatya
 
Osimodelwithneworkingoverview 150618094119-lva1-app6892
Osimodelwithneworkingoverview 150618094119-lva1-app6892Osimodelwithneworkingoverview 150618094119-lva1-app6892
Osimodelwithneworkingoverview 150618094119-lva1-app6892
Saumendra Pradhan
 
Ncct Ieee Software Abstract Collection Volume 2 50+ Abst
Ncct   Ieee Software Abstract Collection Volume 2   50+ AbstNcct   Ieee Software Abstract Collection Volume 2   50+ Abst
Ncct Ieee Software Abstract Collection Volume 2 50+ Abst
ncct
 

Similar to HON_NetSci_2016 (20)

presentation_SB_v01
presentation_SB_v01presentation_SB_v01
presentation_SB_v01
 
Scale-Free Networks to Search in Unstructured Peer-To-Peer Networks
Scale-Free Networks to Search in Unstructured Peer-To-Peer NetworksScale-Free Networks to Search in Unstructured Peer-To-Peer Networks
Scale-Free Networks to Search in Unstructured Peer-To-Peer Networks
 
Peer-to-Peer Architecture Case Study Gnutella NetworkMate.docx
Peer-to-Peer Architecture Case Study Gnutella NetworkMate.docxPeer-to-Peer Architecture Case Study Gnutella NetworkMate.docx
Peer-to-Peer Architecture Case Study Gnutella NetworkMate.docx
 
Representing higher-order dependencies in networks: methods, applications, an...
Representing higher-order dependencies in networks: methods, applications, an...Representing higher-order dependencies in networks: methods, applications, an...
Representing higher-order dependencies in networks: methods, applications, an...
 
ECCS 2010
ECCS 2010ECCS 2010
ECCS 2010
 
Networking
NetworkingNetworking
Networking
 
Chapter_4_v8.0.pptx
Chapter_4_v8.0.pptxChapter_4_v8.0.pptx
Chapter_4_v8.0.pptx
 
International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)International Refereed Journal of Engineering and Science (IRJES)
International Refereed Journal of Engineering and Science (IRJES)
 
F233842
F233842F233842
F233842
 
CoryCookFinalProject535
CoryCookFinalProject535CoryCookFinalProject535
CoryCookFinalProject535
 
Non Path-Based Mutual Anonymity Protocol for Decentralized P2P System
Non Path-Based Mutual Anonymity Protocol for Decentralized P2P SystemNon Path-Based Mutual Anonymity Protocol for Decentralized P2P System
Non Path-Based Mutual Anonymity Protocol for Decentralized P2P System
 
Software Defined Networking
Software Defined NetworkingSoftware Defined Networking
Software Defined Networking
 
Outdoor lighting system
Outdoor lighting systemOutdoor lighting system
Outdoor lighting system
 
Computer network solution
Computer network solutionComputer network solution
Computer network solution
 
Osi model with neworking overview
Osi model with neworking overviewOsi model with neworking overview
Osi model with neworking overview
 
Osimodelwithneworkingoverview 150618094119-lva1-app6892
Osimodelwithneworkingoverview 150618094119-lva1-app6892Osimodelwithneworkingoverview 150618094119-lva1-app6892
Osimodelwithneworkingoverview 150618094119-lva1-app6892
 
Osimodelwithneworkingoverview 150618094119-lva1-app6892
Osimodelwithneworkingoverview 150618094119-lva1-app6892Osimodelwithneworkingoverview 150618094119-lva1-app6892
Osimodelwithneworkingoverview 150618094119-lva1-app6892
 
Ncct Ieee Software Abstract Collection Volume 2 50+ Abst
Ncct   Ieee Software Abstract Collection Volume 2   50+ AbstNcct   Ieee Software Abstract Collection Volume 2   50+ Abst
Ncct Ieee Software Abstract Collection Volume 2 50+ Abst
 
Analysis of Random Based Mobility Model using TCP Traffic for AODV and DSDV M...
Analysis of Random Based Mobility Model using TCP Traffic for AODV and DSDV M...Analysis of Random Based Mobility Model using TCP Traffic for AODV and DSDV M...
Analysis of Random Based Mobility Model using TCP Traffic for AODV and DSDV M...
 
Iisrt komathi krishna (networks)
Iisrt komathi krishna (networks)Iisrt komathi krishna (networks)
Iisrt komathi krishna (networks)
 

HON_NetSci_2016

  • 1. 12 Data • Ship movements • Web clickstreams • Phone call cascades • … … Network representation • Global shipping network • Web traffic network • Social network • … … Network analysis • Clustering • Ranking • Link prediction • Anomaly detection • … … Representing higher-order dependencies in networks and how it improves the result of a variety of tasks such as random walking, clustering, and ranking Prerequisite: To ensure the correctness of network analysis meth- ods, the network as the input has to be a precise representation of the underlying data. Question: How to effectively represent data as networks, without losing crucial information about higher-order dependencies? Our answer: We propose the Higher-order Network (HON) that can embed higher-orders of dependencies to represent data more accurately, can use variable orders for concise representation, and is directly compatible with the existing suite of network analysis methods. Overview What people usually do: Direct conversion, the sum of pairwise con- nections in the raw data --> the edge weights in the network. What is assumed: The Markov property (first-order dependency). What does it mean: When movements are simulated on the net- work, where the flow moves next dependes only on its current node. Example: In this first-order network weighted by shipping frequen- cy, a ship at Singapore has similar probabilities of going to Los Ange- les and Seattle, no matter where the ship came to Singapore from; but in fact ships from Tokyo are more likely to go to Seattle, and ships from Shanghai are more likely to go to Los Angeles. Problem of first-order representation What people want: A network representing the complex system. In reality: The network is usually not directly available. What people have: Recorded sequences of events, such as trajec- tories of vehicles, retweets of messages, streams of web clicks, etc. Example: Global shipping data, containing ship movements. Representing data as network What we do: Break down nodes into higher-order nodes that carry different dependency relationships. Example: In this higher-order network, by breaking down the node Singapore, the ship’s next step can depend on previous steps when following different paths to Singapore, thus more accurately simulate movement patterns in data. Compatibility: The data structure is consistent with the conventional network representation, allowing for a variety of network analysis methods and algo- rithms to run on HON without modification. Proposed higher-order network (HON) Improving the result of a variety of network analysis methods Random walking Experiment: See how well random walkers on different network representations of the same global shipping data can simulate the true ship movements in the raw data. Result: The accuracy of simulating one step on HON doubles that of the conven- tional first-order network, and is higher by one magnitude when simulating three steps (algorithms based on random walking such as PageRank usually need to simulate multiple steps of movement). Clustering Experiment: Given the global ship movements which drives the diffusion of invasion species, compare the clustering of ports (ports tightly coupled by spe- cies flows) on different network representations. Result on first order network: (Fig. A) non-overlapping clusters are generated. Although Valetta and Malta Freeport are local and international ports respec- tively, the clustering result does not distinguish the two. Result on HON: (Fig. B) clusters may overlap, and international ports (such as Malta Freeport) are effectively identified by belonging to multiple clusters and potentially suffering from multiple sources of invasions. Fourty-four ports belong to five clusters, including NewYork, Shanghai, Gibraltar, and so on. What does it mean: Species may be introduced from multiple previous ports a ship has stopped at; these indirect species introduction pathways are already captured by HON and factored in the clustering. Ranking Experiment: With a clickstream data set recording how users navigate through Web pages, compare Web PageRank scores by using HON in- stead of first-order network representation. Result: More than 90% of the pages show a decrease of PageRank scores, such as pages of news personnels; while a few pages gain consid- erable PageRank scores, such as weather forecasts and obituaries. Why such changes: A case study: (Fig. A) the first-order network repre- sentation indicates that a user is likely to go back to the homepage after viewing or uploading snow photos. (Fig. B) the HON representation uses additional higher-order nodes and edges to represent a natural scenario that once a user views and uploads a photo, the user is likely to repeat this process to upload more photos and is less likely to go back to the home page. What does it mean: HON can better“guide”random walkers to simulate more complex movements such as user’s non-Markovian web browsing behaviors, thus HON may be used to improve the ranking results of web ranking, citation ranking, keyphrase extraction, and more, without modi- fication of the PageRank algorithm. Problem: Previous work using fixed order for networks will result in exponential growth of network size when higher orders are incorporated due to combinatorial explosion. Question: How to build a scalable representation for complex data with high orders of de- pendencies, and how much more space is needed to represent higher-order dependencies? Our solution: Use variable orders of dependency, and add higher-orders nodes and edges to a first-order network only where necessary. Compact representation: HON has less nodes and half the number of edges compared with the fixed second-order network. Even when increasing the maximum order to five, HON still has less edges than the fixed second-order network, while all the useful dependencies up to the fifth order are incorporated in the network. Speeding up analyses: Another important advantage of HON over a fixed-order network is that network analysis algorithms can run faster on HON, due to HON’s compact representa- tion. In addition, HON is sparser than the fixed-order representation, and many network tool- kits are optimized for sparse networks. Compared with the fixed second-order network, these tasks run almost two times faster on HON with a maximum order of two, and about the same speed on HON with a maximum order of five (which embeds more higher-order dependen- cies and is more accurate). Scalability of HON Hypergraph: Although its edges can connect to multiple nodes simulta- neously, the nodes are unordered, thus cannot represent dependencies. Fixed second order network: (Rosvall et al.) is effective but does not scale well. It is an overkill when first order is enough, and insufficient when higher orders are needed. HON scales better by using variable orders. Variable Order Markov: For a network representation, VOM generates unnecessary probabilities while failing to capture some necessary proba- bilities, thus not guaranteeing a network representation. Related works and why we need HONContributions at a glance Full paper: DOI: 10.1126/sciadv.1600028 or scan QR code Science Advances 2, e1600028 (2016) Source code is available at https://github.com/xyjprc/hon Collaborations and discussions are welcomed. More information Acknowledgements: This work is based on research supported by the University of Notre Dame Office of Research via Environmental Change Initiative (ECI) and the National Science Foundation (NSF) Awards EF-1427157, IIS-1447795 and BCS-1229450; the research was supported by the Army Research Labo- ratory (ARL), and was accomplished under Cooperative Agreement Number W911NF-09-2-0053 [the ARL Network Science Collaborative Technology Alliance (NSCTA). We thank David Lodge,Yuxiao Dong and Reid Johnson for their valuable comments. Accurate: HON is a more accurate network representation that is reflec- tive of the underlying real-world phenomena by embedding higher order dependencies. Compact: Use higher orders only where necessary, scales well. Compatible: HON is consistent with existing network representation by using simple nodes and edges only, so that existing network analysis tools can be applied on HON directly with no modifications. Algorithms: Guarantee that HON can extract and represent arbitrary orders mixed in the same data set. Jian Xu (jxu5@nd.edu) Thanuka Wickramarathne (twickram@nd.edu) Nitesh Chawla (nchawla@nd.edu, corresponding author) Los Angeles SeaƩle Singapore|Tokyo Singapore|Shanghai Los Angeles SeaƩle Tokyo Shanghai Tokyo Shanghai Singapore Shanghai|· Tokyo|· Busan, Tianjin Busan, Dalian ... Tokyo Singapore ... Singapore|Shanghai Singapore|· Singapore|Tokyo Singapore|Tokyo, Busan, Tianjin Singapore|Tokyo, Busan, Dalian Shanghai data er network rder network (HON) orders of dependencies in HON Vessel Depart Sailing_date Arrive Arrival_date V-001 Shanghai 2013-01-01 Singapore 2013-01-15 V-001 Singapore 2013-01-16 Los Angeles 2013-02-05 V-002 Singapore 2013-02-01 Los Angeles 2013-03-08 … … … … … … … … … … Los Angeles SeaƩle Singapore|Tokyo Singapore|Shanghai Los Angeles SeaƩle Tokyo Shanghai Tokyo Shanghai Singapore B C Shanghai|· Tokyo|· Tokyo|Busan, Tianjin Tokyo|Busan, Dalian ... Tokyo Singapore ... Singapore|Shanghai Singapore|· Singapore|Tokyo Singapore|Tokyo, Busan, Tianjin Singapore|Tokyo, Busan, Dalian Shanghai D Original data First-order network Higher-order network (HON) Variable orders of dependencies in HON Vessel Depart Sailing_date Arrive Arrival_date V-001 Shanghai 2013-01-01 Singapore 2013-01-15 V-001 Singapore 2013-01-16 Los Angeles 2013-02-05 V-002 Singapore 2013-02-01 Los Angeles 2013-03-08 … … … … … … … … … … A Los Angeles SeaƩle Singapore|Tokyo Singapore|Shanghai Los Angeles SeaƩle Tokyo Shanghai Tokyo Shanghai Singapore B C Shanghai|· Tokyo|· Tokyo|Busan, Tianjin Tokyo|Busan, Dalian ... Tokyo Singapore ... Singapore|Shanghai Singapore|· Singapore|Tokyo Singapore|Tokyo, Busan, Tianjin Singapore|Tokyo, Busan, Dalian Shanghai D Original data First-order network Higher-order network (HON) Variable orders of dependencies in HON Vessel Depart Sailing_date Arrive Arrival_date V-001 Shanghai 2013-01-01 Singapore 2013-01-15 V-001 Singapore 2013-01-16 Los Angeles 2013-02-05 V-002 Singapore 2013-02-01 Los Angeles 2013-03-08 … … … … … … … … … … A By assuming an order of two for the whole network, the accuracies on the fixed second-order network increase considerably as in Fig. 2D, because the network structure can help the random walker remember its last two steps. Meanwhile, the accuracies on HON with a maxi- mum order of two are comparable and slightly better than the fixed second-order network, because HON can capture second-order de- pendencies while avoiding the overfitting caused by splitting all first-order nodes into second-order nodes. Increasing the maximum order of HON can further improve the accuracy and lower the entro- py rate; particularly, ship movements in bigger loops need more steps of memory and can only be captured with higher orders, as reflected in Table 1, where the probability of returning in three steps increases from 7.3 to 16.4% when increasing the maximum order from two to three in HON. By increasing the maximum order to five, HON can capture all dependencies below or equal to the fifth order, and the accuracy of simulating one step on HON doubles that of the conventional first-order network. Furthermore, when simulating multiple steps, the advantage of using HON is even bigger. The reason is that in a first-order network, a random walker “forgets” where it came from after each step and has a higher chance of disobeying higher-order movement patterns. This error is amplified quickly in a few steps—the accuracy of simulating three steps on the first-order network is almost zero. On the contrary, in HON, the higher-order nodes and edges can help the random walk- er remember where it came from and provide the corresponding probability distributions for the next step. As a consequence, the on random walking, following the intuition that random walkers are more likely to move within the same cluster rather than between different clusters. Because using HON instead of a first-order network alters the movement patterns of random walkers on the network, a compelling question becomes: How does HON affect the clustering results? Consider an important real-world application of clustering: identi- fying regions wherein aquatic species invasions are likely to happen. Because the global shipping network is the dominant global vector for the unintentional translocation of non-native aquatic species (45) [species get translocated either during ballast water uptake/discharge or by accumulating on the surfaces of ships (11)], identifying clusters of ports that are tightly coupled by frequent shipping can reveal ports that are likely to introduce non-native species to each other. The lim- itation of the existing approach (10) is that the clustering is based on a first-order network that only accounts for direct species flows, whereas in reality the species introduced to a port by a ship may also come from multiple previous ports at which the ship has stopped because of partial ballast water exchanges and hull fouling. These indirect spe- cies introduction pathways driven by ship movements are already cap- tured by HON and can influence the clustering result. As represented by the HON example in Fig. 1C, following the most likely shipping route, species are more likely to be introduced to Los Angeles from Shanghai (via Singapore) rather than from Tokyo, so the clustering (driven by random walking) on HON prefers grouping Los Angeles with Shanghai rather than with Tokyo. In comparison, indirect species introduction pathways are ignored when performing clustering on a Table. 1. Comparing different network representations of the same global shipping data. Network representation No. of edges No. of nodes Network density Probability of returning after two steps Probability of returning after three steps Entropy rate (bits) Clustering time (min) Ranking time (s) Conventional first- order 31,028 2,675 4.3 × 10−3 10.7% 1.5% 3.44 4 1.3 Fixed second- order 116,611 19,182 3.2 × 10−4 42.8% 8.0% 1.45 73 7.7 HON, maximum order of two 64,914 17,235 2.2 × 10−4 41.7% 7.3% 1.46 45 4.8 HON, maximum order of three 78,415 26,577 1.1 × 10−4 45.9% 16.4% 0.90 63 6.2 HON, maximum order of four 83,480 30,631 8.9 × 10−5 48.9% 18.5% 0.68 67 7.0 HON, maximum order of five 85,025 31,854 8.4 × 10−5 49.3% 19.2% 0.63 68 7.6 R E S E A R C H A R T I C L E Los Angeles SeaƩle Singapore|Tokyo Singapore|Shanghai Los Angeles SeaƩle Tokyo Shanghai Tokyo Shanghai Singapore B C Shanghai|· Tokyo|· Tokyo|Busan, Tianjin Tokyo|Busan, Dalian ... Tokyo Singapore ... Singapore|Shanghai Singapore|· Singapore|Tokyo Singapore|Tokyo, Busan, Tianjin Singapore|Tokyo, Busan, Dalian Shanghai D Original data First-order network Higher-order network (HON) Variable orders of dependencies in HON Vessel Depart Sailing_date Arrive Arrival_date V-001 Shanghai 2013-01-01 Singapore 2013-01-15 V-001 Singapore 2013-01-16 Los Angeles 2013-02-05 V-002 Singapore 2013-02-01 Los Angeles 2013-03-08 … … … … … … … … … … A Simulate 3 stepsSimulate 2 steps Ship-1 Ship-2 Ship-3 Training Testing a a a a a a a a a a a b b b b b b c b b b c c c First-order network HON ab c a b a|b c a|c Prediction a a a b b b c c b Prediction a a a b b b b c c 3/3 correct 1/3 correct A B C D 0% 10% 20% 30% Accuracy Simulate 1 step 1st order network 2nd order network HON, max order 2 HON, max order 3 HON, max order 4 HON, max order 5 A B Valletta Malta Freeport Valletta Malta Freeport A First-order network View photo Upload photo WDBJ7 home Other pages B HON View photo WDBJ7 home Other pages View photo|Upload photo Upload photo|View photo Upload photo Invasive species modeling & prediction: Species may be carried unintentionally by ships from port to port and cause invasions. Thus, ship movements connect ports in the world in an implicit species flow network. By identifying higher-order dependencies in ship movements, namely where a ship is more likely to go next given its previous steps, ship movements and species flow dynamics can be modeled more accurately. Social interactions & information diffusion: More accurately representing flow of infor- mation on networks can give a more accurate representation of the complex social inter- actions and the flow of information, which can be of interest for telecom companies, social media and so on. Healthcare: Representing and highlighting complex patterns in sequences of symptoms, and revealing their relationships with genes and patient features. Traffic data: Representing complex taxi movements and human trajectories and high- lighting emerging movement patterns, which the government can leverage for urban planning, and merchants can use for customer behavior analysis and prediction. Anomaly detection: The ability to extract and represent higher-order navigation patterns can also be used to analyze web clickstreams and network access patterns, with potential applications from website optimization to intruder detection (based on anomalous access patterns) for security and defense. Interdisciplinary applications