Networks are everywhere, but end users have lacked tools to access, analyze, visualize and share insights into connected structures. NodeXL, the Network Overview Discovery and Exploration add-in for Excel, makes network analysis as easy as making a pie chart.
Think Link: Network Insights with No Programming Skills
1. A project from the Social Media Research Foundation: http://www.smrfoundation.org
2. About Me
Introductions
Marc A. Smith
Chief Social Scientist
Connected Action Consulting Group
Marc@connectedaction.net
http://www.connectedaction.net
http://www.codeplex.com/nodexl
http://www.twitter.com/marc_smith
http://www.flickr.com/photos/marc_smith
http://www.facebook.com/marc.smith.sociologist
http://www.linkedin.com/in/marcasmith
http://www.slideshare.net/Marc_A_Smith
http://www.smrfoundation.org
4. Social Media Research Foundation
People: University faculty, students, industry, independent researchers, developers
Disciplines: Computer Science, HCI, CSCW, Machine Learning, Information Visualization, UI/UX, Social Science/Sociology, Network Analysis, Collective Action
Institutions: University of Maryland, Oxford Internet Institute, Stanford University, Microsoft Research, Illinois Institute of Technology, Connected Action, Cornell, Morningside Analytics
5. What we are trying to do:
Open Tools, Open Data, Open Scholarship
• Build the “Firefox of GraphML” – open tools for
collecting and visualizing social media data
• Connect users to network analysis – make
network charts as easy as making a pie chart
• Connect researchers to social media data sources
• Archive: Be the “Allen Very Large Telescope Array”
for Social Media data – coordinate and aggregate
the results of many users’ data collection and
analysis
• Create open access research papers & findings
• Make “collections of connections” easy for users
to manage
8. What we have done: Open Tools
• NodeXL
• Data providers (“spigots”)
– ThreadMill Message Board
– Exchange Enterprise Email
– Voson Hyperlink
– SharePoint
– Facebook
– Twitter
– YouTube
– Flickr
9. What we have done: Open Data
• NodeXLGraphGallery.org
– User generated collection
of network graphs,
datasets and annotations
– Collective repository for
the research community
– Published collections of
data from a range of social
media data sources to help
students and researchers
connect with data of
interest and relevance
21. A network is born whenever two GUIDs are joined.
Edge list:
Vertex1 | Vertex2 | “Edge” Attribute | “Vertex1” Attribute | “Vertex2” Attribute
@UserName1 | @UserName2 | value | value | value
Vertex attributes:
Username | Attributes
@UserName1 | value, value
@UserName2 | value, value
[Diagram: two vertices, A and B]
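The edge list layout above can be sketched in a few lines of Python. This is only an illustration of the idea, not NodeXL's internal format; the attribute names (`relationship`, `followers`) and all values are hypothetical.

```python
# Each edge row pairs two identifiers and carries its own attributes.
# All names and values below are hypothetical placeholders.
edges = [
    ("@UserName1", "@UserName2", {"relationship": "mentions"}),
]

# Vertex table: one row of attributes per unique identifier.
vertices = {
    "@UserName1": {"followers": 120},
    "@UserName2": {"followers": 45},
}


def vertex_names(edge_rows):
    """Return the sorted identifiers that appear in at least one edge."""
    names = set()
    for source, target, _attributes in edge_rows:
        names.update((source, target))
    return sorted(names)


network_vertices = vertex_names(edges)
print(network_vertices)
```

A single row in the edge table is enough to define a network; the vertex table only adds descriptive columns for identifiers that already appear in an edge.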
24. Social Networks
• History: from the dawn of time!
• Theory and method: 1934 ->
• Jacob L. Moreno
• http://en.wikipedia.org/wiki/Jacob_L._Moreno
Jacob Moreno’s early social network diagram of positive and negative relationships among members of a football
team.
Originally published in Moreno, J. L. (1934). Who shall survive? Washington, DC: Nervous and Mental Disease
Publishing Company.
25. An early social network diagram of relationships among workers in a factory
illustrates the positions different workers occupy within the workgroup.
Originally published in Roethlisberger, F., and Dickson, W. (1939). Management and
the worker. Cambridge, UK: Cambridge University Press.
52. Welser, Howard T., Eric Gleave, Danyel Fisher, and Marc Smith. 2007. Visualizing the Signatures of Social Roles in Online Discussion Groups. The Journal of Social Structure. 8(2).
Experts and “Answer People”
Discussion starters, Topic setters
Discussion people, Topic setters
53. NodeXL
Network Overview Discovery and Exploration add-in for Excel 2007/2010
A minimal network can
illustrate the ways different
locations have different values
for centrality and degree
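As a rough illustration of how positions in a small network differ, the sketch below counts degree (the number of edges touching each vertex). The five-vertex network is hypothetical and is not the one pictured on the slide.

```python
from collections import defaultdict

# Hypothetical undirected network: A is a small hub, E hangs off D.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("D", "E")]


def degree(edge_list):
    """Count how many edges touch each vertex."""
    counts = defaultdict(int)
    for u, v in edge_list:
        counts[u] += 1
        counts[v] += 1
    return dict(counts)


degrees = degree(edges)
# List vertices from most to least connected.
for vertex in sorted(degrees, key=degrees.get, reverse=True):
    print(vertex, degrees[vertex])
```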
64. SNA questions for social media:
1. What does my topic network look like?
2. What does the topic I aspire to be look like?
3. What is the difference between #1 and #2?
4. How does my map change as I intervene?
What does #YourHashtag look like?
66. strataconf Twitter NodeXL SNA Map and Report for 2014-02-11 12-53-27
Top 10 Vertices, Ranked by
Betweenness Centrality:
@strataconf
@peteskomoroch
@acroll
@oreillymedia
@orthonormalruss
@ayirpelle
@bigdata
@furrier
@marketpowerplus
@sassoftware
67. datavis Twitter NodeXL SNA Map and Report for Tuesday, 11 February 2014 at 18:55 UTC
Top 10 Vertices, Ranked by
Betweenness Centrality:
@bigpupazzoverde
@randal_olson
@twitterdata
@7of13
@yochum
@edwardtufte
@twittersports
@grandjeanmartin
@smfrogers
@albertocairo
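71. Network types: [Divided] Polarized Crowds, [Unified] Tight Crowd, [Fragmented] Brand Clusters, [Clustered] Communities, [In-Hub & Spoke] Broadcast Network, [Out-Hub & Spoke] Support Network
• From [Divided] Polarized Crowds
– To [Unified] Tight Crowd: [Low probability] Find bridge users. Encourage shared material.
– To [Fragmented] Brand Clusters: [Low probability] Get message out to disconnected communities.
– To [Clustered] Communities: [Possible transition] Draw in new participants.
– To [In-Hub & Spoke] Broadcast Network: [Possible transition] Regularly create content.
– To [Out-Hub & Spoke] Support Network: [Possible transition] Reply to multiple users.
• From [Unified] Tight Crowd
– To [Divided] Polarized Crowds: [Undesirable transition] Remove bridges, highlight divisions.
– To [Fragmented] Brand Clusters: [Low probability] Get message out to disconnected communities.
– To [Clustered] Communities: [High probability] Draw in new participants.
– To [In-Hub & Spoke] Broadcast Network: [Possible transition] Regularly create content.
– To [Out-Hub & Spoke] Support Network: [Possible transition] Reply to multiple users.
• From [Fragmented] Brand Clusters
– To [Divided] Polarized Crowds: [Undesirable transition] Increase density of connections in two groups.
– To [Unified] Tight Crowd: [Low probability] Dramatically increase density of connections.
– To [Clustered] Communities: [High probability] Increase retention, build connections.
– To [In-Hub & Spoke] Broadcast Network: [Possible transition] Regularly create content.
– To [Out-Hub & Spoke] Support Network: [Possible transition] Reply to multiple users.
• From [Clustered] Communities
– To [Divided] Polarized Crowds: [Undesirable transition] Increase density of connections in two groups.
– To [Unified] Tight Crowd: [Low probability] Dramatically increase density of connections.
– To [Fragmented] Brand Clusters: [Undesirable transition] Increase population, reduce connections.
– To [In-Hub & Spoke] Broadcast Network: [Possible transition] Regularly create content.
– To [Out-Hub & Spoke] Support Network: [Possible transition] Reply to multiple users.
• From [In-Hub & Spoke] Broadcast Network
– To [Divided] Polarized Crowds: [Undesirable transition] Increase density of connections in two groups.
– To [Unified] Tight Crowd: [Low probability] Dramatically increase density of connections.
– To [Fragmented] Brand Clusters: [Low probability] Get message out to disconnected communities.
– To [Clustered] Communities: [Possible transition] Increase retention, build connections.
– To [Out-Hub & Spoke] Support Network: [High probability] Increase reply rate, reply to multiple users.
• From [Out-Hub & Spoke] Support Network
– To [Divided] Polarized Crowds: [Undesirable transition] Increase density of connections in two groups.
– To [Unified] Tight Crowd: [Low probability] Dramatically increase density of connections.
– To [Fragmented] Brand Clusters: [Possible transition] Get message out to disconnected communities.
– To [Clustered] Communities: [High probability] Increase retention, build connections.
– To [In-Hub & Spoke] Broadcast Network: [High probability] Increase publication of new content and regularly create content.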
72. Request your own network map and report
http://connectedaction.net
73. Social Network Theory
http://en.wikipedia.org/wiki/Social_network
• Central tenet
– Social structure emerges from the aggregate of relationships (ties) among members of a population
• Phenomena of interest
– Emergence of cliques and clusters from patterns of relationships
– Centrality (core), periphery (isolates), betweenness
• Methods
– Surveys, interviews, observations, log file analysis, computational analysis of matrices
(Hampton & Wellman, 1999; Paolillo, 2001; Wellman, 2001)
Source: Richards, W. (1986). The NEGOPY network analysis program. Burnaby, BC: Department of Communication, Simon Fraser University. pp. 7-16
74. SNA 101
• Node
– “actor” on which relationships act; 1-mode versus 2-mode networks
• Edge
– Relationship connecting nodes; can be directional
• Cohesive Sub-Group
– Well-connected group; clique; cluster
• Key Metrics
– Centrality (group or individual measure)
• Number of direct connections that individuals have with others in the group (usually look at
incoming connections only)
• Measure at the individual node or group level
– Cohesion (group measure)
• Ease with which a network can connect
• Aggregate measure of shortest path between each node pair at network level reflects
average distance
– Density (group measure)
• Robustness of the network
• Number of connections that exist in the group out of 100% possible
– Betweenness (individual measure)
• Number of shortest paths between node pairs that pass through a given node
• Measure at the individual node level
• Node roles
– Peripheral – below average centrality
– Central connector – above average centrality
– Broker – above average betweenness
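The density and betweenness definitions above can be sketched in plain Python. This is a minimal illustration on a hypothetical six-vertex network (two triangles joined by the edge C-D), not the network shown on the slide and not NodeXL's implementation. Betweenness is computed with a breadth-first accumulation in the style of Brandes' algorithm, which is one standard approach chosen here for brevity.

```python
from collections import deque

# Hypothetical undirected network: triangles A-B-C and D-E-F joined by C-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"),
         ("D", "E"), ("D", "F"), ("E", "F")]


def build_adjacency(edge_list):
    """Map each vertex to the set of its neighbours."""
    adj = {}
    for u, v in edge_list:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def density(adj):
    """Existing edges divided by the number of possible vertex pairs."""
    n = len(adj)
    m = sum(len(neighbours) for neighbours in adj.values()) / 2
    return 2 * m / (n * (n - 1))


def betweenness(adj):
    """Unnormalized betweenness for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                      # vertices in BFS visiting order
        preds = {v: [] for v in adj}    # predecessors on shortest paths
        sigma = {v: 0 for v in adj}     # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):       # accumulate from the farthest vertex back
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: b / 2 for v, b in score.items()}


adj = build_adjacency(edges)
scores = betweenness(adj)
mean_score = sum(scores.values()) / len(scores)
brokers = sorted(v for v, b in scores.items() if b > mean_score)
print(round(density(adj), 3), scores, brokers)
```

In this example C and D sit on every shortest path between the two triangles, so they are the only vertices with above-average betweenness and would be labelled brokers under the role definitions above.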
75. NodeXL
Free/Open Social Network Analysis add-in for Excel 2007/2010 makes graph
theory as easy as a pie chart, with integrated analysis of social media sources.
http://nodexl.codeplex.com
77. Goal: Make SNA easier
• Existing Social Network Tools are challenging
for many novice users
• Tools like Excel are widely used
• Leveraging a spreadsheet as a host for SNA
lowers barriers to network data analysis and
display
86. What is Social Network Analysis?
How is it useful for the humanities?
1. New framework for analysis
2. Data visualization allows new perspectives – less linear, more comprehensive
Social Network Analysis and Ancient History
Diane H. Cline, Ph.D.
University of Cincinnati
89. The Content summary
spreadsheet displays the most
frequently used URLs, hashtags,
and user names within the
network as a whole and within
each calculated sub-group.
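A content summary of this kind can be approximated with simple counting. The sketch below is an assumption about the general approach, not NodeXL's own code; the tweet texts, group labels, field names, and regular expressions are all hypothetical.

```python
import re
from collections import Counter, defaultdict

# Hypothetical tweets, each already assigned to a sub-group label.
tweets = [
    {"group": "G1", "text": "Loving #dataviz with @alice http://example.com/a"},
    {"group": "G1", "text": "#DataViz and #bigdata notes from @alice"},
    {"group": "G2", "text": "@bob shares #bigdata http://example.com/b"},
]

# Deliberately simple patterns; real tweet parsing has more edge cases.
PATTERNS = {
    "hashtags": re.compile(r"#\w+"),
    "users": re.compile(r"@\w+"),
    "urls": re.compile(r"https?://\S+"),
}


def content_summary(rows):
    """Count hashtags, user names and URLs overall and per sub-group."""
    overall = {kind: Counter() for kind in PATTERNS}
    per_group = defaultdict(lambda: {kind: Counter() for kind in PATTERNS})
    for row in rows:
        for kind, pattern in PATTERNS.items():
            found = pattern.findall(row["text"])
            if kind != "urls":
                # Hashtags and user names are compared case-insensitively.
                found = [item.lower() for item in found]
            overall[kind].update(found)
            per_group[row["group"]][kind].update(found)
    return overall, per_group


overall, per_group = content_summary(tweets)
print(overall["hashtags"].most_common(2))
print(per_group["G2"]["users"].most_common(1))
```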
98. What we want to do:
(Build the tools to) map the social web
• Move NodeXL to the web: (Node[NOT]XL)
– Node for Google Doc Spreadsheets?
– WebGL Canvas? D3.JS? Sigma.JS
• Connect to more data sources of interest:
– RDF, MediaWikis, Gmail, NYT, Citation Networks
• Solve hard network manipulation UI problems:
– Modal transform, Time series, Automated layouts
• Grow and maintain archives of social media network data sets for
research use.
• Improve network science education:
– Workshops on social media network analysis
– Live lectures and presentations
– Videos and training materials
99. How you can help
• Sponsor a feature
• Sponsor workshops
• Sponsor a student
• Schedule training
• Sponsor the foundation
• Donate your money, code, computation, storage,
bandwidth, data or employees’ time
• Help promote the work of the Social Media
Research Foundation
100. A project from the Social Media Research Foundation: http://www.smrfoundation.org
A tutorial on analyzing social media networks is available from: casci.umd.edu/NodeXL_Teaching
Different positions within a network can be measured using network metrics.
The network of connections among people who tweeted “#My2K” over the 1-day, 21-hour, 39-minute period from Sunday, 06 January 2013 at 03:30 UTC to Tuesday, 08 January 2013 at 01:09 UTC.
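The collection period quoted above follows directly from the two timestamps. A small sketch, using only the dates given in the note (the function name `describe_span` is made up for illustration):

```python
from datetime import datetime, timezone

# Start and end of the #My2K collection window, as stated in the note.
start = datetime(2013, 1, 6, 3, 30, tzinfo=timezone.utc)
end = datetime(2013, 1, 8, 1, 9, tzinfo=timezone.utc)


def describe_span(begin, finish):
    """Format the gap between two timestamps as days, hours and minutes."""
    total_minutes = int((finish - begin).total_seconds() // 60)
    days, remainder = divmod(total_minutes, 24 * 60)
    hours, minutes = divmod(remainder, 60)
    return f"{days}-day, {hours}-hour, {minutes}-minute"


print(describe_span(start, end))
```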
The graph represents a network of 268 Twitter users whose recent tweets contained "#cmgrchat OR #smchat". The network was obtained on Friday, 18 January 2013 at 15:44 UTC. There is an edge for each follows relationship. There is an edge for each "replies-to" relationship in a tweet. There is an edge for each "mentions" relationship in a tweet. There is a self-loop edge for each tweet that is not a "replies-to" or "mentions". The tweets were made over the 3-day, 21-hour, 15-minute period from Monday, 14 January 2013 at 18:23 UTC to Friday, 18 January 2013 at 15:38 UTC.
The graph represents a network of 1,227 Twitter users whose recent tweets contained "lumia". The network was obtained on Saturday, 12 January 2013 at 19:52 UTC. There is an edge for each follows relationship. There is an edge for each "replies-to" relationship in a tweet. There is an edge for each "mentions" relationship in a tweet. There is a self-loop edge for each tweet that is not a "replies-to" or "mentions". The tweets were made over the 5-hour, 1-minute period from Saturday, 12 January 2013 at 14:36 UTC to Saturday, 12 January 2013 at 19:37 UTC.
The graph represents a network of 1,260 Twitter users whose recent tweets contained "flotus". The network was obtained on Friday, 18 January 2013 at 18:26 UTC. There is an edge for each follows relationship. There is an edge for each "replies-to" relationship in a tweet. There is an edge for each "mentions" relationship in a tweet. There is a self-loop edge for each tweet that is not a "replies-to" or "mentions". The tweets were made over the 3-hour, 3-minute period from Friday, 18 January 2013 at 15:16 UTC to Friday, 18 January 2013 at 18:20 UTC.
The graph represents a network of 399 Twitter users whose recent tweets contained "http://www.nytimes.com/2013/01/11/opinion/krugman-coins-against-crazies.html". The network was obtained on Friday, 11 January 2013 at 14:27 UTC. There is an edge for each follows relationship. There is an edge for each "replies-to" relationship in a tweet. There is an edge for each "mentions" relationship in a tweet. There is a self-loop edge for each tweet that is not a "replies-to" or "mentions". The tweets were made over the 12-hour, 32-minute period from Friday, 11 January 2013 at 01:52 UTC to Friday, 11 January 2013 at 14:24 UTC.
The graph represents a network of 388 Twitter users whose recent tweets contained "delllistens OR dellcares". The network was obtained on Tuesday, 19 February 2013 at 17:44 UTC. There is an edge for each follows relationship. There is an edge for each "replies-to" relationship in a tweet. There is an edge for each "mentions" relationship in a tweet. There is a self-loop edge for each tweet that is not a "replies-to" or "mentions". The tweets were made over the 6-day, 21-hour, 58-minute period from Tuesday, 12 February 2013 at 19:34 UTC to Tuesday, 19 February 2013 at 17:33 UTC.
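The edge rules repeated in the notes above (one edge per "replies-to", one per "mentions", and a self-loop for any tweet that is neither) can be sketched as follows. The record fields (`author`, `reply_to`, `mentions`) and the self-loop label `tweet` are hypothetical and do not reflect the Twitter API schema or NodeXL's importer; "follows" edges are left out because they need follower lists rather than tweet text.

```python
# Hypothetical tweet records; in this sketch the mentions list is assumed
# not to repeat the user being replied to.
tweets = [
    {"author": "ann", "reply_to": "bob", "mentions": []},
    {"author": "bob", "reply_to": None, "mentions": ["cat", "dan"]},
    {"author": "cat", "reply_to": None, "mentions": []},
]


def tweet_edges(rows):
    """Turn tweets into (source, target, relationship) edge rows."""
    edges = []
    for row in rows:
        author = row["author"]
        count_before = len(edges)
        if row["reply_to"]:
            edges.append((author, row["reply_to"], "replies-to"))
        for name in row["mentions"]:
            edges.append((author, name, "mentions"))
        if len(edges) == count_before:
            # Neither a reply nor a mention: record the tweet as a self-loop.
            edges.append((author, author, "tweet"))
    return edges


for edge in tweet_edges(tweets):
    print(edge)
```

The self-loop keeps users who only broadcast in the edge list, so they still appear as vertices in the resulting map.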
https://www.nodexlgraphgallery.org/Pages/Graph.aspx?graphID=16540
strataconf Twitter NodeXL SNA Map and Report for 2014-02-11 12-53-27
The graph represents a network of 1,685 Twitter users whose recent tweets contained "strataconf", tweeted over the 8-day, 0-hour, 44-minute period from Monday, 03 February 2014 at 19:55 UTC to Tuesday, 11 February 2014 at 20:39 UTC.
Top Hashtags in Tweet in Entire Graph: #Strataconf, #bigdata, #hds, #BigDataSV, #hadoop, #ddbd
https://www.nodexlgraphgallery.org/Pages/Graph.aspx?graphID=16541
datavis Twitter NodeXL SNA Map and Report for Tuesday, 11 February 2014 at 18:55 UTC
The graph represents a network of Twitter users whose tweets in the requested date range contained "dataviz OR datavis" over the 41-day, 4-hour, 5-minute period from Wednesday, 01 January 2014 at 00:01 UTC to Tuesday, 11 February 2014 at 04:06 UTC.
Top Hashtags in Tweet in Entire Graph: #dataviz, #bigdata, #analytics, #map, #Europe, #Datavis, #Audit, #Logs
http://portal.sliderocket.com/ATWBE/Using-SNA-to-find-and-manage-RICs
C. Scott Dempwolf, PhD
Research Assistant Professor & Director
UMD - Morgan State Center for Economic Development
http://www.terpconnect.umd.edu/~dempy/
Insights: many clusters are based around a county and local enterprises. E.g., the middle-left cluster is Pittsburgh metro area, with large orange Westinghouse Electric. The Philadelphia cluster in the top-right is highly connected to the bottom left, which are adjacent counties. An exception to location grouping is the top-left pharma and medical cluster, composed of several companies, universities, HHS, and an interesting arrangement of inventors in several connected fans.
https://plus.google.com/photos/116499393494903612852/albums/5659635437858992593/5659734868308985794?banner=pwa&pid=5659734868308985794&oid=116499393494903612852
Prof. Diane Cline
http://www.academia.edu/2153390/The_Social_network_of_Alexander_the_Great_Social_Network_Analysis_in_Ancient_History
It’s about who you know, and who those people know, and how everyone knows each other.
Data visualization tool – to see data differently.