Social media network analysis and visualization with NodeXL - the network overview discovery and exploration add-in for Excel. Map Twitter, Facebook, email, blogs, and the web with a point and click interface within the familiar spreadsheet.
Charting Collections of Connections in Social Media: Creating Maps & Measures with NodeXL
A project from the Social Media Research Foundation: http://www.smrfoundation.org
About Me
Introductions
Marc A. Smith
Chief Social Scientist
Connected Action Consulting Group
Marc@connectedaction.net
http://www.connectedaction.net
http://www.codeplex.com/nodexl
http://www.twitter.com/marc_smith
http://delicious.com/marc_smith/Paper
http://www.flickr.com/photos/marc_smith
http://www.facebook.com/marc.smith.sociologist
http://www.linkedin.com/in/marcasmith
http://www.slideshare.net/Marc_A_Smith
http://www.smrfoundation.org
There are many kinds of ties…
Like, Link, Reply, Rate, Review, Favorite, Friend, Follow, Forward, Edit, Tag, Comment, Check-in…
http://www.flickr.com/photos/stevendepolo/3254238329
Strength of Weak Ties
http://www.flickr.com/photos/fullaperture/81266869/
Social Networks
• History: from the dawn of time!
• Theory and method: 1934 ->
• Jacob L. Moreno
• http://en.wikipedia.org/wiki/Jacob_L._Moreno
Jacob Moreno’s early social network diagram of positive and negative relationships among members of a football team.
Originally published in Moreno, J. L. (1934). Who shall survive? Washington, DC: Nervous and Mental Disease Publishing Company.
An early social network diagram of relationships among workers in a factory illustrates the positions different workers occupy within the workgroup.
Originally published in Roethlisberger, F., and Dickson, W. (1939). Management and the worker. Cambridge, UK: Cambridge University Press.
NodeXL
Network Overview Discovery and Exploration add-in for Excel 2007/2010
A minimal network can illustrate the ways different locations have different values for centrality and degree
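As a rough illustration of the point above, the following sketch computes degree and betweenness centrality for a five-node network. The edge list is hypothetical (it is not the network drawn on the slide), and the betweenness routine is a plain-Python version of Brandes' algorithm for unweighted, undirected graphs, not NodeXL's own code.

```python
from collections import deque

# Hypothetical minimal undirected network; not taken from the slides.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def build_adjacency(edges):
    """Map each node to the set of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj):
    """Degree: number of direct connections per node."""
    return {node: len(neighbors) for node, neighbors in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = {node: 0.0 for node in adj}
    for source in adj:
        # Breadth-first search from source, counting shortest paths.
        order = []
        preds = {node: [] for node in adj}
        paths = {node: 0 for node in adj}
        dist = {node: -1 for node in adj}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting from the farthest nodes.
        delta = {node: 0.0 for node in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {node: value / 2 for node, value in score.items()}


if __name__ == "__main__":
    adjacency = build_adjacency(EDGES)
    print("degree:", degree(adjacency))
    print("betweenness:", betweenness(adjacency))
```

In this toy network, node C has the highest degree and sits on the most shortest paths, while D has a modest degree but still bridges E to everyone else. That is the kind of positional difference the metrics are meant to expose.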
#teaparty
15 November 2011
#occupywallstreet
15 November 2011
http://www.newscientist.com/blogs/onepercent/2011/11/occupy-vs-tea-party-what-their.html
Social Network Theory
http://en.wikipedia.org/wiki/Social_network
• Central tenet
– Social structure emerges from the aggregate of relationships (ties) among members of a population
• Phenomena of interest
– Emergence of cliques and clusters from patterns of relationships
– Centrality (core), periphery (isolates), betweenness
• Methods
– Surveys, interviews, observations, log file analysis, computational analysis of matrices
(Hampton & Wellman, 1999; Paolillo, 2001; Wellman, 2001)
Source: Richards, W. (1986). The NEGOPY network analysis program. Burnaby, BC: Department of Communication, Simon Fraser University. pp. 7-16
SNA 101
• Node
– “actor” on which relationships act; 1-mode versus 2-mode networks
• Edge
– Relationship connecting nodes; can be directional
• Cohesive Sub-Group
– Well-connected group; clique; cluster
• Key Metrics
– Centrality (group or individual measure)
• Number of direct connections that individuals have with others in the group (usually look at incoming connections only)
• Measure at the individual node or group level
– Cohesion (group measure)
• Ease with which a network can connect
• Aggregate measure of shortest path between each node pair at network level reflects average distance
– Density (group measure)
• Robustness of the network
• Number of connections that exist in the group out of 100% possible
– Betweenness (individual measure)
• # shortest paths between each node pair that a node is on
• Measure at the individual node level
• Node roles
– Peripheral – below average centrality
– Central connector – above average centrality
– Broker – above average betweenness
[Figure: example network diagram with nodes labeled A through I]
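The two group-level measures in the list, density and cohesion, can be sketched in a few lines. This is a minimal example under stated assumptions: the edge list is hypothetical, the network is treated as undirected, density is taken as existing edges over possible edges, and cohesion is summarized as the average shortest-path (geodesic) distance over connected node pairs.

```python
from collections import deque
from itertools import combinations

# Hypothetical undirected network used only for illustration.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]


def adjacency(edges):
    """Map each node to the set of its neighbors."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path lengths (in hops) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def density(adj):
    """Share of possible undirected edges that actually exist."""
    n = len(adj)
    m = sum(len(neighbors) for neighbors in adj.values()) // 2
    return 2 * m / (n * (n - 1))


def geodesic_summary(adj):
    """Return (average distance, diameter) over all connected node pairs."""
    lengths = []
    for u, v in combinations(sorted(adj), 2):
        d = bfs_distances(adj, u).get(v)
        if d is not None:  # skip pairs in different components
            lengths.append(d)
    return sum(lengths) / len(lengths), max(lengths)


if __name__ == "__main__":
    adj = adjacency(EDGES)
    average, diameter = geodesic_summary(adj)
    print("density:", density(adj))
    print("average geodesic distance:", average)
    print("diameter:", diameter)
```

A denser network with a shorter average distance connects more easily. The node roles in the list then follow from comparing each node's centrality and betweenness with the network average.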
NodeXL
Free/Open Social Network Analysis add-in for Excel 2007/2010 makes graph theory as easy as a pie chart, with integrated analysis of social media sources.
http://nodexl.codeplex.com
Goal: Make SNA easier
• Existing Social Network Tools are challenging for many novice users
• Tools like Excel are widely used
• Leveraging a spreadsheet as a host for SNA lowers barriers to network data analysis and display
This graph represents a directed network of 1,360 Twitter users whose recent tweets contained "contraceptive OR contraception". The network was obtained on Friday, 08 June 2012 at 13:22 UTC. There is an edge for each follows relationship. There is an edge for each "replies-to" relationship in a tweet. There is an edge for each "mentions" relationship in a tweet. There is a self-loop edge for each tweet that is not a "replies-to" or "mentions". The tweets were made over the 2-day period from Thursday, 07 June 2012 at 18:46 UTC to Friday, 08 June 2012 at 13:06 UTC. The graph's vertices were grouped by cluster using the Clauset-Newman-Moore cluster algorithm. The edge colors are based on relationship values. The vertex sizes are based on each user’s number of followers. Table 1 reports the summary network metrics that describe the graph.
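The edge rules in this caption can be sketched as a small function. This is an illustrative sketch only: the tweet records, their field names (`author`, `reply_to`, `mentions`) and the relationship labels are hypothetical, not NodeXL's importer or Twitter's API format. "Follows" edges are left out because they need follower data that is not part of a tweet.

```python
# Hypothetical tweet records; field names are assumptions for illustration.
TWEETS = [
    {"author": "alice", "reply_to": "bob", "mentions": []},
    {"author": "bob", "mentions": ["carol", "dave"]},
    {"author": "carol"},  # neither a reply nor a mention
]


def tweet_edges(tweets):
    """Turn tweets into directed (source, target, relationship) edges.

    One edge per "replies-to", one edge per "mentions", and a self-loop
    for each tweet that is neither.
    """
    edges = []
    for tweet in tweets:
        author = tweet["author"]
        reply_to = tweet.get("reply_to")
        mentions = tweet.get("mentions", [])
        if reply_to:
            edges.append((author, reply_to, "Replies to"))
        for name in mentions:
            edges.append((author, name, "Mentions"))
        if not reply_to and not mentions:
            edges.append((author, author, "Tweet"))
    return edges


if __name__ == "__main__":
    for edge in tweet_edges(TWEETS):
        print(edge)
```

Keeping the relationship label on each edge is what allows edge colors to be based on relationship values, as the caption describes.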
Summary network metrics
Table 1. Summary network metrics for the graph in Figure 1
Network Metric                                 Value
Graph Type                                     Directed
Vertices                                       1360
Unique Edges                                   5641
Edges With Duplicates                          771
Total Edges                                    6412
Self-Loops                                     1096
Connected Components                           427
Single-Vertex Connected Components             395
Maximum Vertices in a Connected Component      880
Max Edges in a Connected Component             5818
Maximum Geodesic Distance (Diameter)           12
Average Geodesic Distance                      3.557807
Graph Density                                  0.002705817
Modularity                                     0.446145
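A few of the values in Table 1 can be cross-checked with simple arithmetic. The sketch below uses only the numbers from the table. It assumes the standard density formula for a directed graph, edges divided by n(n-1). Exactly which edges NodeXL counts for density (for example, whether duplicates are merged and self-loops excluded) is an assumption here, so the implied edge count is indicative only.

```python
# Values copied from Table 1.
vertices = 1360
unique_edges = 5641
edges_with_duplicates = 771
total_edges = 6412
components = 427
single_vertex_components = 395
largest_component_vertices = 880
density = 0.002705817

# Unique edges plus edges with duplicates should add up to the total.
edge_counts_agree = unique_edges + edges_with_duplicates == total_edges

# Directed density = edges / (n * (n - 1)); solve for the implied edge count.
possible_edges = vertices * (vertices - 1)
implied_edges = round(density * possible_edges)

# Vertices and components left over after the isolates and the largest component.
other_components = components - single_vertex_components - 1
other_vertices = vertices - single_vertex_components - largest_component_vertices

if __name__ == "__main__":
    print("edge counts agree:", edge_counts_agree)
    print("possible directed edges:", possible_edges)
    print("edges implied by density:", implied_edges)
    print("other components:", other_components, "with", other_vertices, "vertices")
```

Under these assumptions the reported density corresponds to roughly 5,001 of the 1,848,240 possible directed edges. Apart from the 395 isolates and the 880-vertex main component, the remaining 85 vertices are spread over 31 small components.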
The Vertices spreadsheet lists users who contributed a tweet containing the terms “contraception OR contraceptives” over two days in early June 2012. Users are ranked by their computed betweenness centrality within the network of follows, replies, and mentions edges. The top 10 vertices, ranked by betweenness centrality, are the accounts at the center of the network. These include: @thinkprogress, @gatesfoundation, @SandraFluke, @maleeek, @Change, @foxandfriends, @melindagates, @AshleyJudd, @cnalive, and @SOHLTC.
Welser, Howard T., Eric Gleave, Danyel Fisher, and Marc Smith. 2007. Visualizing the Signatures of Social Roles in Online Discussion Groups. The Journal of Social Structure. 8(2).
Experts and “Answer People” Discussion people, Topic setters
Discussion starters, Topic setters
The Content summary spreadsheet displays the most frequently used URLs, hashtags, and user names within the network as a whole and within each calculated sub-group.
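A content summary of this kind amounts to frequency counts taken twice, once over the whole network and once per group. The sketch below shows the idea for hashtags only. The tweet texts and group labels are invented for illustration, and the regular expression is a simplification of how hashtags are actually delimited.

```python
import re
from collections import Counter, defaultdict

# Simplified hashtag pattern; an assumption, not Twitter's exact rule.
HASHTAG = re.compile(r"#\w+")

# Hypothetical (group label, tweet text) pairs.
TWEETS = [
    ("G1", "Reading about #SNA and #NodeXL"),
    ("G1", "More #nodexl maps"),
    ("G2", "#sna workshop today"),
]


def top_hashtags(tweets, limit=3):
    """Return the top hashtags overall and within each group."""
    overall = Counter()
    by_group = defaultdict(Counter)
    for group, text in tweets:
        tags = [tag.lower() for tag in HASHTAG.findall(text)]
        overall.update(tags)
        by_group[group].update(tags)
    per_group = {
        group: counts.most_common(limit) for group, counts in by_group.items()
    }
    return overall.most_common(limit), per_group


if __name__ == "__main__":
    overall, per_group = top_hashtags(TWEETS)
    print("whole network:", overall)
    for group, tags in per_group.items():
        print(group, tags)
```

The same pattern extends to URLs and user names by swapping the regular expression.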
Social Media Research Foundation
People
• University Faculty
• Students
• Industry
• Independent Researchers
• Developers
Disciplines
• Computer Science
• HCI, CSCW
• Machine Learning
• Information Visualization
• UI/UX
• Social Science/Sociology
• Network Analysis
• Collective Action
Institutions
• University of Maryland
• Oxford Internet Institute
• Stanford University
• Microsoft Research
• Illinois Institute of Technology
• Connected Action
• Cornell
• Morningside Analytics
What we are trying to do:
Open Tools, Open Data, Open Scholarship
• Build the “Firefox of GraphML” – open tools for collecting and visualizing social media data
• Connect users to network analysis – make network charts as easy as making a pie chart
• Connect researchers to social media data sources
• Archive: Be the “Allen Very Large Telescope Array” for Social Media data – coordinate and aggregate the results of many users’ data collection and analysis
• Create open access research papers & findings
• Make “collections of connections” easy for users to manage
What we have done: Open Tools
• NodeXL
• Data providers (“spigots”)
– ThreadMill Message Board
– Exchange Enterprise Email
– Voson Hyperlink
– SharePoint
– Facebook
– Twitter
– YouTube
– Flickr
What we have done: Open Data
• NodeXLGraphGallery.org
– User generated collection of network graphs, datasets and annotations
– Collective repository for the research community
– Published collections of data from a range of social media data sources to help students and researchers connect with data of interest and relevance
What we want to do:
(Build the tools to) map the social web
• Move NodeXL to the web: (Node[NOT]XL)
– Node for Google Doc Spreadsheets?
– WebGL Canvas? D3.JS? Sigma.JS?
• Connect to more data sources of interest:
– RDF, MediaWikis, Gmail, NYT, Citation Networks
• Solve hard network manipulation UI problems:
– Modal transform, Time series, Automated layouts
• Grow and maintain archives of social media network data sets for research use.
• Improve network science education:
– Workshops on social media network analysis
– Live lectures and presentations
– Videos and training materials
How you can help
• Sponsor a feature
• Sponsor workshops
• Sponsor a student
• Schedule training
• Sponsor the foundation
• Donate your money, code, computation, storage, bandwidth, data or employees’ time
• Help promote the work of the Social Media Research Foundation
Who is the mayor of your hashtag?
Find out at: http://netbadges.com
A tutorial on analyzing social media networks is available from: casci.umd.edu/NodeXL_Teaching
Different positions within a network can be measured using network metrics.
The network of connections among people who tweeted “#My2K” over the 1-day, 21-hour, 39-minute period from Sunday, 06 January 2013 at 03:30 UTC to Tuesday, 08 January 2013 at 01:09 UTC.
The graph represents a network of 268 Twitter users whose recent tweets contained "#cmgrchat OR #smchat". The network was obtained on Friday, 18 January 2013 at 15:44 UTC. There is an edge for each follows relationship. There is an edge for each "replies-to" relationship in a tweet. There is an edge for each "mentions" relationship in a tweet. There is a self-loop edge for each tweet that is not a "replies-to" or "mentions". The tweets were made over the 3-day, 21-hour, 15-minute period from Monday, 14 January 2013 at 18:23 UTC to Friday, 18 January 2013 at 15:38 UTC.
The graph represents a network of 1,227 Twitter users whose recent tweets contained "lumia". The network was obtained on Saturday, 12 January 2013 at 19:52 UTC. There is an edge for each follows relationship. There is an edge for each "replies-to" relationship in a tweet. There is an edge for each "mentions" relationship in a tweet. There is a self-loop edge for each tweet that is not a "replies-to" or "mentions". The tweets were made over the 5-hour, 1-minute period from Saturday, 12 January 2013 at 14:36 UTC to Saturday, 12 January 2013 at 19:37 UTC.
The graph represents a network of 1,260 Twitter users whose recent tweets contained "flotus". The network was obtained on Friday, 18 January 2013 at 18:26 UTC. There is an edge for each follows relationship. There is an edge for each "replies-to" relationship in a tweet. There is an edge for each "mentions" relationship in a tweet. There is a self-loop edge for each tweet that is not a "replies-to" or "mentions". The tweets were made over the 3-hour, 3-minute period from Friday, 18 January 2013 at 15:16 UTC to Friday, 18 January 2013 at 18:20 UTC.
The graph represents a network of 399 Twitter users whose recent tweets contained "http://www.nytimes.com/2013/01/11/opinion/krugman-coins-against-crazies.html". The network was obtained on Friday, 11 January 2013 at 14:27 UTC. There is an edge for each follows relationship. There is an edge for each "replies-to" relationship in a tweet. There is an edge for each "mentions" relationship in a tweet. There is a self-loop edge for each tweet that is not a "replies-to" or "mentions". The tweets were made over the 12-hour, 32-minute period from Friday, 11 January 2013 at 01:52 UTC to Friday, 11 January 2013 at 14:24 UTC.
The graph represents a network of 388 Twitter users whose recent tweets contained "delllistens OR dellcares". The network was obtained on Tuesday, 19 February 2013 at 17:44 UTC. There is an edge for each follows relationship. There is an edge for each "replies-to" relationship in a tweet. There is an edge for each "mentions" relationship in a tweet. There is a self-loop edge for each tweet that is not a "replies-to" or "mentions". The tweets were made over the 6-day, 21-hour, 58-minute period from Tuesday, 12 February 2013 at 19:34 UTC to Tuesday, 19 February 2013 at 17:33 UTC.