(Social) Network Analysis
Scott A. Hale
Oxford Internet Institute
http://www.scotthale.net/
17 July 2014
What are networks?
Networks (graphs) are sets of nodes (vertices) connected by edges (links,
ties, arcs)
Additional details
Whole vs. ego: whole networks have all
nodes within a natural boundary
(platform, organization, etc.). An ego
network has one node and all of its
immediate neighbors.
Edges can be directed or undirected and
weighted or unweighted
Additionally, networks may be multilayer
and/or multimodal.
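The distinctions above can be sketched in NetworkX (the package introduced later in these slides). This is a minimal illustration only; the names and the edge weights are made-up toy data, not from any real data set.

```python
import networkx as nx

# Undirected, unweighted: a tie either exists or it does not.
friends = nx.Graph()
friends.add_edges_from([("Alan", "Bob"), ("Bob", "Carol"), ("Carol", "Denise")])

# Directed and weighted: who mentions whom, and how often (toy numbers).
mentions = nx.DiGraph()
mentions.add_edge("Alan", "Bob", weight=3)
mentions.add_edge("Bob", "Alan", weight=1)

# Ego network: one node plus all of its immediate neighbors (radius 1).
ego = nx.ego_graph(friends, "Bob", radius=1)
print(sorted(ego.nodes()))  # ['Alan', 'Bob', 'Carol']
```

Denise is two steps from Bob, so she falls outside Bob's ego network even though she is part of the whole network.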
Why?
Characterize network structure
How far apart / well-connected are nodes?
Are some nodes at more important positions?
Is the network composed of communities?
How does network structure affect processes?
Information diffusion
Coordination/cooperation
Resilience to failure/attack
A network
First questions when approaching a network
What are edges? What are nodes?
What kind of network?
Inclusion/exclusion criteria
Network data repositories
http://www.diggingintodata.org/Repositories/tabid/167/Default.aspx
http://datamob.org
http://snap.stanford.edu/data
http://www-personal.umich.edu/~mejn/netdata
Python resources
tweepy: Package for the Twitter stream and search APIs (Python 2.7 only at
the moment)
search and stream API example code along with code to create
mentions/retweet network at
https://github.com/computermacgyver/twitter-python
Two versions of Python:
2.7.x – many packages, issues with non-English scripts
3.x – fewer packages, but excellent handling of international scripts
(Unicode)
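As a rough idea of what building a mentions network involves, here is a minimal sketch. It does not reproduce the code in the repository linked above and makes no API calls: the `tweets` list, its `user` and `text` fields, and the regular expression are all hypothetical stand-ins for whatever the collected data actually looks like.

```python
import re
import networkx as nx

# Hypothetical, hand-written records standing in for collected tweets.
tweets = [
    {"user": "alan", "text": "Great talk by @bob and @carol"},
    {"user": "bob", "text": "RT @alan: Great talk by @bob and @carol"},
    {"user": "carol", "text": "thanks @alan!"},
    {"user": "alan", "text": "@bob see you tomorrow"},
]

MENTION = re.compile(r"@(\w+)")  # simplified pattern for @username

g = nx.DiGraph()
for tweet in tweets:
    source = tweet["user"].lower()
    for target in MENTION.findall(tweet["text"]):
        target = target.lower()
        if target == source:
            continue  # skip self-mentions
        if g.has_edge(source, target):
            g[source][target]["weight"] += 1  # repeated mention
        else:
            g.add_edge(source, target, weight=1)

for u, v, w in g.edges(data="weight"):
    print("{0} -> {1} (weight {2})".format(u, v, w))
```

Edges point from the author to the account mentioned, and the weight counts how often that happened. Whether retweets should count as mentions is an inclusion decision to make for each study.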
NetworkX
http://networkx.github.io/
Package to represent networks as python objects
Convenient functions to add, delete, iterate nodes/edges
Functions to calculate network statistics (degree, clustering, etc.)
Easily generate comparison graphs based on statistical models
Visualization
Alternatives include igraph (available for Python and R)
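A minimal sketch of the comparison-graph idea: measure something on an observed network, then on random graphs with the same number of nodes and edges. The choice of NetworkX's built-in karate club graph as the "observed" network, the G(n, m) random model, and the 20 repetitions are all assumptions made for illustration.

```python
import networkx as nx

# Stand-in for an observed network: a small built-in social network.
observed = nx.karate_club_graph()
n = observed.number_of_nodes()
m = observed.number_of_edges()
observed_clustering = nx.transitivity(observed)

# Random graphs with the same number of nodes and edges (G(n, m) model).
random_scores = []
for seed in range(20):
    r = nx.gnm_random_graph(n, m, seed=seed)
    random_scores.append(nx.transitivity(r))
random_mean = sum(random_scores) / len(random_scores)

print("observed clustering: {0:.3f}".format(observed_clustering))
print("random mean (20 graphs): {0:.3f}".format(random_mean))
```

If the observed value sits well above the random values, the clustering is unlikely to be a by-product of size and density alone.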
Gephi
Open-source, cross-platform GUI application
Primary strength is to visualize networks
Basic statistical properties are also available
Alternatives include NodeXL, Pajek, GUESS, NetDraw, Tulip, and more
Network measures
With many nodes, visualizations are often difficult or impossible to interpret.
Statistical measures can be very revealing, however.
Node-level
Degree (in, out): How many incoming/outgoing edges does a node have?
Centrality (next slide)
Constraint
Network-level
Components: Number of disconnected subsets of nodes
Density: (observed edges) / (maximum number of edges possible)
Clustering coefficient: (closed triplets) / (connected triples)
Path length distribution
Distributions of node-level measures
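The two ratios above can be checked by hand on a tiny graph. This sketch uses the same four-person toy network as the NetworkX slides; `nx.transitivity` is the NetworkX function that corresponds to the closed-triplets ratio.

```python
import networkx as nx

g = nx.Graph()
g.add_edges_from([("Alan", "Bob"), ("Carol", "Denise"),
                  ("Carol", "Bob"), ("Carol", "Alan")])

# Density: observed edges / maximum number of edges possible.
n = g.number_of_nodes()
m = g.number_of_edges()
max_edges = n * (n - 1) / 2  # undirected graph without self-loops
density_by_hand = m / max_edges
print(density_by_hand, nx.density(g))

# Clustering coefficient: closed triplets / connected triples.
# One triangle (Alan, Bob, Carol) gives 3 closed triplets out of
# 5 connected triples.
print(nx.transitivity(g))

# Components: every node is reachable here, so there is one component.
print(nx.number_connected_components(g))
```

Note that `nx.average_clustering` uses a different definition (the mean of per-node clustering) and will generally give a different number.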
Centrality measures
Degree
Closeness: Based on the average geodesic (shortest-path) distance to ALL
other nodes; the smaller the average distance, the higher the closeness.
Informally, an indication of the ability of a node to diffuse a property
efficiently.
Betweenness: Number of shortest paths the node lies on. Informally,
the betweenness is high if a node bridges clusters.
Eigenvector: A weighted degree centrality (inbound links from highly
central nodes count more).
PageRank: Not strictly a centrality measure. Similar to eigenvector
centrality, but modeled as a random walk with a teleportation parameter.
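A small sketch of how these measures behave, using a made-up graph of two triangles joined by a single bridge edge (C-D). PageRank is left out of the code because, in recent NetworkX releases, `nx.pagerank` relies on SciPy; with SciPy installed it can be called in the same way.

```python
import networkx as nx

# Toy data: triangle A-B-C and triangle D-E-F, bridged by the edge C-D.
g = nx.Graph()
g.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"),
                  ("C", "D"),
                  ("D", "E"), ("D", "F"), ("E", "F")])

measures = {
    "degree": nx.degree_centrality(g),
    "closeness": nx.closeness_centrality(g),
    "betweenness": nx.betweenness_centrality(g),
    "eigenvector": nx.eigenvector_centrality(g, max_iter=1000),
}

for name, scores in measures.items():
    row = ", ".join("{0}={1:.2f}".format(node, scores[node])
                    for node in sorted(scores))
    print("{0:12s} {1}".format(name, row))
```

The bridging nodes C and D score highest on betweenness, while A, B, E, and F score zero because no shortest path between other nodes passes through them.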
NetworkX: Nodes
import networkx as nx
g=nx.Graph() #A new (empty) undirected graph
g.add_node("Alan") #Add one new node
g.add_nodes_from(["Bob","Carol","Denise"]) #Add three new nodes
#Nodes can have attributes
#(NetworkX 2.0+ uses g.nodes[...]; the older g.node[...] has been removed)
g.nodes["Alan"]["gender"]="M"
g.nodes["Bob"]["gender"]="M"
g.nodes["Carol"]["gender"]="F"
g.nodes["Denise"]["gender"]="F"
for n in g: #Iterating over a graph yields its node ids
    print("{0} has gender {1}".format(n,g.nodes[n]["gender"]))
NetworkX: Edges
#Interesting graphs have edges
g.add_edge("Alan","Bob") #Add one new edge
#Add two new edges
g.add_edges_from([["Carol","Denise"],["Carol","Bob"]])
#Edge attributes (g[u][v] is the attribute dictionary of the edge u-v;
#the older g.edge[u][v] form no longer exists in current NetworkX)
g["Alan"]["Bob"]["relationship"]="Friends"
g["Carol"]["Denise"]["relationship"]="Friends"
g["Carol"]["Bob"]["relationship"]="Married"
#New edge with an attribute
g.add_edges_from([["Carol","Alan",{"relationship":"Friends"}]])
NetworkX: Edges
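#edges(data=True) yields (node, node, attribute dictionary) for each edge
#(edges_iter() was removed in NetworkX 2.0)
for n1,n2,attrs in g.edges(data=True):
    print("{0} and {1} are {2}".format(n1,n2,attrs["relationship"]))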
NetworkX: Measures
g.number_of_nodes()
g.nodes(data=True)
g.number_of_edges()
g.edges(data=True)
print(g) #Short summary; nx.info(g) has been removed from recent NetworkX releases
nx.density(g)
nx.number_connected_components(g)
nx.degree_histogram(g)
nx.betweenness_centrality(g)
nx.clustering(g)
nx.clustering(g, nodes=["Bob"])
NetworkX: Visualize or save
#Save g to the file my_graph.graphml in graphml format
#prettyprint will make it nice for a human to read
nx.write_graphml(g,"my_graph.graphml",prettyprint=True)
#Layout g with the Fruchterman-Reingold force-directed
#algorithm and save the result to my_graph.png
#with_labels will label each node with its id
import matplotlib.pyplot as plt
nx.draw_spring(g,with_labels=True)
plt.savefig("my_graph.png")
plt.clf() #Clear plot
NetworkX: Odds and ends
#Read a graph from the file my_graph.graphml in graphml format
g=nx.read_graphml("my_graph.graphml")
#Create an (empty) directed graph
g=nx.DiGraph()
See http://networkx.github.io/documentation/latest/reference/index.html
for many more commands. Note that some commands are available only for
directed graphs and others only for undirected graphs.
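A short sketch of what that restriction looks like in practice, on a made-up three-edge directed graph. `nx.number_connected_components` is defined for undirected graphs only, so NetworkX raises `NetworkXNotImplemented` when it is given a `DiGraph`; the weakly and strongly connected variants are the directed counterparts.

```python
import networkx as nx

d = nx.DiGraph()
d.add_edges_from([("Alan", "Bob"), ("Carol", "Bob"), ("Bob", "Denise")])

# In-degree and out-degree exist only on directed graphs.
print(d.in_degree("Bob"), d.out_degree("Bob"))  # 2 1

# Connected components are defined for undirected graphs only.
try:
    nx.number_connected_components(d)
    raised = False
except nx.NetworkXNotImplemented as err:
    raised = True
    print("Not available for directed graphs:", err)

# Directed alternatives, or convert to an undirected graph first.
weak = nx.number_weakly_connected_components(d)
strong = nx.number_strongly_connected_components(d)
undirected_count = nx.number_connected_components(d.to_undirected())
print(weak, strong, undirected_count)
```

Converting with `to_undirected()` discards edge direction, so it is only appropriate when direction does not matter for the question being asked.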
Resources
Newman, M.E.J., Networks: An Introduction
Kadushin, C., Understanding Social Networks: Theories, Concepts, and
Findings
De Nooy, W., et al., Exploratory Social Network Analysis with Pajek
Shneiderman, B., and Smith, M., Analyzing Social Media Networks with
NodeXL
Oxford Digital Humanities Summer School