Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our User Agreement and Privacy Policy.

Slideshare uses cookies to improve functionality and performance, and to provide you with relevant advertising. If you continue browsing the site, you agree to the use of cookies on this website. See our Privacy Policy and User Agreement for details.

Successfully reported this slideshow.

Like this presentation? Why not share!

3,178 views

Published on

Short lecture on network analysis methods using Python.

Published in:
Technology

No Downloads

Total views

3,178

On SlideShare

0

From Embeds

0

Number of Embeds

30

Shares

0

Downloads

88

Comments

0

Likes

5

No embeds

No notes for slide

“relationships”: This can be as simple as “a relationship exists”, or as complex as “this probability matrix describes the complex relationship between the states in these nodes”

As you look at these examples, I want you to think about the types of questions you could start asking with this network data. For example, in infrastructure, you often only have access to the junctions and the fact that there *is* a connection between two points.

With transport, the dataset gets richer. You not only have the nodes and links between them, you also have timetables that list the average time between stations (and the current state of the network) and the switching costs of changing lines at a station.

Aside: The London Underground map is one of my favorite network visualizations: a wonderful simplification of a complex system.

The bottom line is that, when you look at something as a network, you can start to see which things have the most important relationships in your network, and where to concentrate effort if you want to affect it all (e.g. who do want to retweet your ideas?).

We’re going to look at network analysis at 3 levels: node, group and network.

I’m using computer science language for this. Other groups that study networks and their words for nodes and edges are:

Here: network, node, edge

maths: graph, vertex, arc/edge

Physics: network, site, bond

Sociology: network, actor, relation

Biology: network, node, edge

Different representations are useful for different things (if you’re coding up your own algorithms):

Diagram: good for explaining a network (especially if interactive)

Adjacency matrix: good for dense graphs (can also use scipy.sparse to use this for sparse graphs)

Adjacency list: good for sparse graphs (e.g. social networks tend to be sparse); used by NetworkX

Edge list: good general representation

Maths: good for describing algorithms. V = vertices (nodes); E=edges; e=map from edges to nodes. n is the number of nodes; m is the number of edges

The one we’re using here is NetworkX. It produces ugly graphs, but has a good set of network analysis tools.

NB Use nx.DiGraph() if you want a directed graph

“degree” = how many direct links connect to this node

Note that degree centrality is normalized (divided) by the largest possible number of connections per node: in this case, 9.

Degree centrality is not a great measure of power: what’s important is the number of nodes that the node can easily reach, and the highest-ranked node might be part of a clique (e.g. not well connected to the outside world).

Nodes with high betweenness have influence over the flow of information or goods through a network: they bridge separate communities (good) but also often are a single point of failure in communications between those communities (bad).

PageRank is based on eigenvector centrality.

NB You’ll need to look at the eigenvectors of the adjacency matrix to build this one, and like neural networks, eigenvector centrality algorithms won’t always converge to a solution.

Social networks = short path lengths, high clustering, skewed degree distributions.

Small worlds = lots of highly-connected small groups with fewer connections to other groups: Saw this effect in the Ebola response contact-tracking.

“Small world theory” = there are roughly 6 steps on the shortest path between each pair of nodes in the world (see also “6 degrees of Kevin Bacon” http://en.wikipedia.org/wiki/Six_degrees_of_separation). The maths works out at roughly s = ln(n)/ln(k) where n is the population size and k is the average number of connections per node. For k=30, s is usually roughly 6.

NetworkX community functions: http://networkx.lanl.gov/reference/algorithms.community.html

K-core: Every node in the clique is connected to K or more other nodes in the clique.

Clique-level analysis and node-level analysis interact with each other, e.g. if you find a set of cliques in a network, you can then look for and use the central nodes in those cliques.

N-cliques: “friend of friend” cliques; use Bron and Kerbosch algorithm. Issues include nodes that contribute to the clique aren’t included in it. P-clique addresses some of this.

Other approaches: n-clans, k-plexes etc.: see http://faculty.ucr.edu/~hanneman/nettext/C11_Cliques.html#nclique

Diffusion model used when it’s important that you find *everybody* in contact, e.g. for Ebola, you have to assume that everyone an infectious person is in contact with is a potential carrier. Here, we assume that node 9 changes state first; in the next step of the algorithm, the nodes directly connected to it (0,1,7) change state; in the next step, the nodes connected to (0,1,7) change state, etc. etc.

Thought experiment: infections are time-sensitive, e.g. you get infected, then either get better or die. How would you represent this in a network? What would you expect to happen in a small-world network?

Diffusion models for more complex choices, e.g. whether to go see a movie, based on your friends’ opinions plus reading movie reviews.

In complex contagion, a node changes state based on the state of *all* its neighbors, and often also on outside information; just because 9 is in one state, 1 doesn’t have to change to that state too (but it might change state with probability p).

Edge bundling is useful for small world networks

Metanodes are useful for large networks of communities

An adjacency matrix can help if it’s nicely grouped, but sometimes it’s just more confusing.

Explaining graphs to the C-suite? Use visual cues they’re used to. Carefully. Some examples are in the Visualisation Periodic Table at http://www.visual-literacy.org/periodic_table/periodic_table.html

- 1. Network Analysis Sara Terp, 2015
- 2. Network Analysis • What is a network? • What features does a network have? • What analysis is possible with those features? • How do we explain that analysis?
- 3. “Network” “A group of interconnected people or things” (Oxford English Dictionary) Use networks to understand, use and explain relationships
- 4. Infrastructure Networks NPR: Visualising the US Power Grid Transport for London: London Underground Map
- 5. Social Networks (Sara’s Facebook friends, in Gephi)
- 6. Songs Spotify API reference
- 7. Words (Wise blogpost on word co-occurance matrices)
- 8. Network Analysis Use networks to understand, use and explain relationships
- 9. Network Features C D A B E F G Node Edge Directed edge Undirected edge Clique
- 10. Network Representations • Diagram • Adjacency matrix [[ 0, 1, 1, 1, 0, 0, 0, 0, 0, 1], [ 1, 0, 0, 1, 1, 0, 1, 0, 0, 1], [ 1, 0, 0, 1, 0, 1, 0, 0, 0, 0], [ 1, 1, 1, 0, 1, 1, 1, 0, 0, 0], [ 0, 1, 0, 1, 0, 0, 1, 0, 0, 0], [ 0, 0, 1, 1, 0, 0, 1, 0, 0, 0], [ 0, 1, 0, 1, 1, 1, 0, 0, 0, 0], [ 0, 0, 0, 0, 0, 0, 0, 0, 1, 0], [ 0, 0, 0, 0, 0, 0, 0, 1, 0, 1], [ 1, 1, 0, 0, 0, 0, 0, 0, 1, 0]] • Adjacency list {0: [1, 2, 3, 9], 1: [0, 9, 3, 4, 6], 2: [0, 3, 5], 3: [0, 1, 2, 4, 5, 6], 4: [1, 3, 6], 5: [2, 3, 6], 6: [1, 3, 4, 5], 7: [8], 8: [9, 7], 9: [8, 1, 0]} • Edge list {(0,1),(0,2),(0,3),(0,9),(1,3),(1,4),(1,6),(1,9),(2,3),(2,5),(3,4), (3,5),(3,6),(4,6),(5,6),(7,8),(8,9)} • Maths G = (V,E,e) 3 6 0 1 5 7 2 98 4
- 11. The NetworkX Library • Python network analysis library import networkx as nx edgelist = {(0,1),(0,2),(0,3),(0,9),(1,3),(1,4),(1,6),(1,9),(2,3),(2,5),(3,4),(3,5),(3,6),(4,6),(5,6),(7 ,8),(8,9)} G = nx.Graph() for edge in edgelist: G.add_edge(edge[0], edge[1])
- 12. Node Centrality • Finding the most “important”/“influential” nodes • i.e. how “central” is a node to the network
- 13. Degree centrality: “who has lots of friends?” 3 6 0 1 5 7 2 98 4 3 0.666 0 0.555 1 0.555 5 0.444 6 0.444 2 0.333 4 0.333 9 0.333 8 0.222 7 0.111 nx.degree_centrality(G) = number of edges directly connected to n
- 14. Betweenness centrality: “who are the bridges”? 3 6 0 1 5 7 2 98 4 9 0.38 0 0.23 1 0.23 8 0.22 3 0.10 5 0.02 6 0.02 2 0.00 4 0.00 7 0.00 nx.betweenness_centrality(G) = (number of shortest paths including n / total number of shortest paths) / number of pairs of nodes
- 15. Closeness centrality: “who are the hubs”? 3 6 0 1 5 7 2 98 4 0 0.64 1 0.64 3 0.60 9 0.60 5 0.52 6 0.52 2 0.50 4 0.50 8 0.42 7 0.31 nx.closeness_centrality(G) = sum(distance to each other node) / (number of nodes-1)
- 16. Eigenvalue centrality “who has most network influence”? 3 6 0 1 5 7 2 98 4 3 0.48 0 0.39 1 0.39 5 0.35 6 0.35 2 0.28 4 0.28 9 0.19 8 0.04 7 0.01 nx.eigenvector_centrality(G)
- 17. Network properties • Characteristic path length: average shortest distance between all pairs of nodes • Clustering coefficient: how likely a network is to contain highly-connected groups • Degree distribution: histogram of node degrees
- 18. Community Detection “Are there groups in this network?” “What can I do with that information?”
- 19. Disconnected Networks • Not all nodes are connected to each other • Connected component = every node in the component can be reached from every other node • Giant component = connected component that covers most of the network
- 20. Cliques and K-Cores nx.find_cliques(G) nx.k_clique_communities(G, 3) 3 6 0 1 5 7 2 98 4 3-cores: [[0,2,3,5], [1,3,4,6]] 2-core: [0,1,2,3,4,5,6,9] 4-cliques: [[0,2,3,5],[1,3,4,6]] 3-cliques: [[0,1,3],[0,1,9]] 2-cliques: [[7,8],[8,9]]
- 21. Other Clique methods • N-clique: every node in the clique is connected to all other nodes by a path of length n or less • P-clique: each node is connected to at least p% of the other nodes in the group.
- 22. Network Effects Predict how information or states (e.g. political opinion or rumours) are most likely to move across a network
- 23. Diffusion (Simple contagion) 3 6 0 1 5 7 2 98 4
- 24. Complex contagion 3 6 0 1 5 7 2 98 4
- 25. Describing Networks bl.ocks.org/mbostock/4062045 http://bost.ocks.org/mike/uberdata/ http://bl.ocks.org/mbostock/7607999 Network diagram Edge bundling
- 26. Network Analysis Tools • Python libraries: • NetworkX • iGraph • graph-tool • Matplotlib (visualisation) • Pygraphviz (visualisation) • Mayavi (3d visualisation) Longer list: http://en.wikipedia.org/wiki/Social_network_analysis_software • Standalone tools: • SNAP • GUESS • NetMiner (free for students) • Gephi (visualisation) • GraphViz (visualisation) • NodeXL (excel add-on)

No public clipboards found for this slide

Be the first to comment