Analyzing Complex Networks
Using Open Source Software
Open Data Science Conference (@ODSC)
Ken Cherven
@kc2519
visual-baseball.com
visualidity.com
Boston | May 20-22nd 2016
A Brief Outline
• Network Graph Analysis overview
• Tools
• Case Studies
• Conclusions
Network Graph Analysis, also known as
Social Network Analysis (SNA), is the
study of connections (links) between
actors (nodes) within a network
[Figure: example network diagram with five labeled nodes]
Network Graph Analysis has many use
cases, ranging from the familiar SNA
(Facebook, Twitter networks) to the
more specialized visual and statistical
investigation of political, criminal, or
terrorist networks
The use cases for Network Graph
Analysis are almost endless – any
dataset where relationships can be
mapped can be analyzed both
statistically and visually; all we need are
nodes and links
We have two primary approaches to
assess patterns in a network:
• Statistical measures are used to
understand the underlying structure and
relationships between nodes
• Visual assessment allows us to leverage
size, color, spacing, and structure to
understand patterns at a network level
Statistical measures are employed to
understand structural patterns within the
network:
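• Degree (number of connections per node)
• Centrality (influence of a node within the network)
• Density (level of network connectedness: the share
of possible links that are actually present)
• Homophily (tendency of similar nodes to connect,
forming common groupings)
• Diameter (maximum shortest-path distance between
any two nodes)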
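As a rough illustration of three of these measures, the sketch below computes degree, density, and diameter for a made-up five-node network. The node names, the edge list, and the helper `distances_from` are assumptions for illustration only and are not taken from the talk's data.

```python
from collections import deque

# Hypothetical toy network: five nodes joined by five undirected links.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]

# Build an adjacency map (node -> set of neighbors).
adjacency = {}
for source, target in edges:
    adjacency.setdefault(source, set()).add(target)
    adjacency.setdefault(target, set()).add(source)


def distances_from(start):
    """Breadth-first search: hop counts from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


# Degree: number of connections per node.
degree = {node: len(neighbors) for node, neighbors in adjacency.items()}

# Density: links present divided by links possible in an undirected graph.
n = len(adjacency)
density = len(edges) / (n * (n - 1) / 2)

# Diameter: the longest shortest path between any two nodes
# (this toy graph is connected, so every node is reachable).
diameter = max(max(distances_from(node).values()) for node in adjacency)

print(degree)
print(density, diameter)
```

On a real network these figures would normally come from the graph tool itself; the point here is only to show what each number counts.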
Visual assessment allows us to use our
visual sense to interpret network
patterns:
• Node location to represent related nodes
• Node sizes to represent degrees
• Node coloring to represent common
groupings (clusters, categories)
• Edge weights that show the strength of
connections between nodes
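One possible way to turn these visual encodings into numbers is sketched below: degrees are rescaled to a node size range and each grouping is given a color from a palette. The function names, the size range, the palette, and the sample values are all assumptions for illustration; graph tools such as Gephi offer their own controls for this.

```python
# Hypothetical inputs: degree and group per node (not taken from the talk's data).
degrees = {"A": 2, "B": 2, "C": 3, "D": 2, "E": 1}
groups = {"A": "cluster 1", "B": "cluster 1", "C": "cluster 1",
          "D": "cluster 2", "E": "cluster 2"}

# Assumed palette; any list of distinguishable colors would do.
PALETTE = ["#1f77b4", "#ff7f0e", "#2ca02c"]


def scale_sizes(degree_by_node, min_size=5.0, max_size=30.0):
    """Linearly rescale degrees so the smallest maps to min_size, the largest to max_size."""
    lowest = min(degree_by_node.values())
    highest = max(degree_by_node.values())
    span = (highest - lowest) or 1  # avoid dividing by zero when all degrees are equal
    return {
        node: min_size + (value - lowest) / span * (max_size - min_size)
        for node, value in degree_by_node.items()
    }


def color_by_group(group_by_node, palette=PALETTE):
    """Give every distinct group one palette color, cycling if groups outnumber colors."""
    ordered_groups = sorted(set(group_by_node.values()))
    group_color = {
        group: palette[index % len(palette)]
        for index, group in enumerate(ordered_groups)
    }
    return {node: group_color[group] for node, group in group_by_node.items()}


sizes = scale_sizes(degrees)
colors = color_by_group(groups)
print(sizes)
print(colors)
```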
Some open source network graph tools:
• Gephi (http://gephi.org)
• Cytoscape (http://cytoscape.org)
• GraphViz (http://graphviz.org)
• Sigma.js (http://sigmajs.org)
• NodeXL (http://nodexl.codeplex.com/)
• Pajek (http://mrvar.fdv.uni-lj.si/pajek/)
• Tulip (http://tulip.labri.fr/TulipDrupal/)
We’ll use Gephi and Sigma.js for the
following examples:
• Miles Davis album network (tripartite
network)
• Boston Red Sox player network
• GDELT event networks
Miles Davis Album Network
The goal of the Miles Davis network
is to understand the multiple phases
of his long and varied career, and to
see the shifting patterns in his
musical partnerships and styles
http://visual-baseball.com/gephi/jazz/miles_davis/#
Miles Davis Network Topology
Miles Davis
Albums
(pink)
Musicians
(colored by instrument)
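The three-layer topology (Miles Davis, albums, musicians) could be assembled along the lines of the sketch below. This is an assumption about the general shape of the data, not the author's actual workflow (the backup slides say the nodes and edges were created in Excel); the few credit rows are hand-typed samples and the variable names are hypothetical.

```python
# Hand-typed sample rows: (album, musician, instrument).
# A real build would use the full album credits.
credits = [
    ("Kind of Blue", "John Coltrane", "tenor saxophone"),
    ("Kind of Blue", "Bill Evans", "piano"),
    ("Milestones", "John Coltrane", "tenor saxophone"),
    ("Milestones", "Red Garland", "piano"),
]

ARTIST = "Miles Davis"

# Layer 1: the artist. Layer 2: albums. Layer 3: musicians.
nodes = {ARTIST: {"type": "artist"}}
edges = set()

for album, musician, instrument in credits:
    nodes.setdefault(album, {"type": "album"})
    # The instrument is kept so musician nodes can be colored by it.
    nodes.setdefault(musician, {"type": "musician", "instrument": instrument})
    edges.add((ARTIST, album))    # artist -> album
    edges.add((album, musician))  # album -> musician

print(len(nodes), "nodes,", len(edges), "edges")
```

Links only ever join adjacent layers, which is what makes the network tripartite.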
Five Album Clusters to Investigate
[Figure: Miles Davis network with five album clusters, numbered 1 to 5]
What do these clusters represent?
Five Album Clusters Revealed
• Early 60s big bands
• Mid-60s small group
• 1950s small groups
• 1970s fusion, electric sounds
• Late career (1980s): experimentation, eclectic
instrumentation
A quick exploration of the network
reveals patterns in time period,
instrumentation, and number of
musicians. With just a few minutes of
traversing the network, we gain a
greater understanding of Miles Davis's
musical career
Red Sox Historical Player Network
The goal for the Red Sox player
network is to understand connections
between players across eras, and to
understand influence and groupings
within the network, as defined by
degrees and other centrality
measures
http://visual-baseball.com/gephi/teams/redsox_network/
Red Sox Network Topology
Player nodes are sized by number
of years with the team and
colored by cluster assignment
Players are positioned based
on common years with team
Links are built using the number
of seasons two players were
on the team roster together
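The link-building rule on this slide can be sketched as follows. The talk's backup slides say the real edges were produced with SQL; this Python version is a swapped-in equivalent for illustration, and the roster rows and player names are placeholders.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical roster rows: (season, player). Names are placeholders.
rosters = [
    (2001, "Player A"), (2001, "Player B"),
    (2002, "Player A"), (2002, "Player B"), (2002, "Player C"),
    (2003, "Player B"), (2003, "Player C"),
]

# Group players by season.
players_by_season = defaultdict(set)
for season, player in rosters:
    players_by_season[season].add(player)

# Node attribute: number of seasons each player spent with the team.
seasons_per_player = defaultdict(int)
for season, player in set(rosters):
    seasons_per_player[player] += 1

# Edge weight: number of seasons two players shared a roster.
shared_seasons = defaultdict(int)
for season, players in players_by_season.items():
    for first, second in combinations(sorted(players), 2):
        shared_seasons[(first, second)] += 1

for (first, second), weight in sorted(shared_seasons.items()):
    print(first, second, weight)
```

Sorting each roster before pairing keeps every pair in one consistent order, so a link between two players is never counted under two different keys.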
Individual Network Footprints

Player            Seasons  Degrees  Eccentricity  Betweenness  Closeness
Ted Williams           19      269             6      126,355       3.30
Carl Yastrzemski       23      283             5      596,003       2.64
Jason Varitek          15      379             7      120,696       3.36
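A simple look at 3 prominent players
shows some quickly observable
differences using centrality measures:
• Despite playing several fewer seasons
than either Williams or Yastrzemski,
Varitek has the most connections
(379 degrees)
• Yastrzemski could get you to more
players faster: he has by far the highest
betweenness (596,003) and the lowest
eccentricity (5) and closeness (2.64),
making him very central to the network
structure. Closeness here appears to be
an average distance to other players, so
lower values are more central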
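To make the footprint measures concrete, here is a small sketch that computes degree, eccentricity, closeness, and betweenness on a toy network. It assumes closeness is the average shortest-path distance to all other nodes (lower is more central), which is how the figures above appear to behave, and it uses Brandes' algorithm for betweenness. The toy edges and function names are hypothetical.

```python
from collections import deque

# Hypothetical toy network; node names are placeholders, not players.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("C", "E")]

adjacency = {}
for source, target in edges:
    adjacency.setdefault(source, []).append(target)
    adjacency.setdefault(target, []).append(source)


def distances_from(start):
    """Breadth-first search: hop counts from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def brandes_betweenness():
    """Count, for each node, the shortest paths between other pairs that pass through it."""
    score = dict.fromkeys(adjacency, 0.0)
    for start in adjacency:
        order = []                                  # nodes in order of discovery
        predecessors = {node: [] for node in adjacency}
        path_count = dict.fromkeys(adjacency, 0)    # shortest paths from start
        path_count[start] = 1
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            order.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
                if dist[neighbor] == dist[node] + 1:
                    path_count[neighbor] += path_count[node]
                    predecessors[neighbor].append(node)
        dependency = dict.fromkeys(adjacency, 0.0)
        for node in reversed(order):                # farthest nodes first
            for previous in predecessors[node]:
                share = path_count[previous] / path_count[node]
                dependency[previous] += share * (1 + dependency[node])
            if node != start:
                score[node] += dependency[node]
    # Undirected graph: every pair was counted from both ends.
    return {node: value / 2 for node, value in score.items()}


degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
eccentricity = {}
closeness = {}
for node in adjacency:
    others = [d for other, d in distances_from(node).items() if other != node]
    eccentricity[node] = max(others)              # distance to the farthest node
    closeness[node] = sum(others) / len(others)   # average distance (assumed definition)
betweenness = brandes_betweenness()

for node in sorted(adjacency):
    print(node, degree[node], eccentricity[node], closeness[node], betweenness[node])
```

In this toy graph node C plays the Yastrzemski role: it has the lowest average distance and sits on the most shortest paths.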
GDELT Network Analysis
GDELT data offers a large number of
opportunities for viewing network data
based on published accounts of news
events around the world. Our
exploration focuses on US Government
threats reported between March 1 and
April 30, 2016
GDELT Network Topology (Geo Layout)
Using Geo Layout
Connections are between Actor1
and Actor2 within a specific event
instance; Actor1 is often the
Protagonist, Actor2 the Target
Nodes are positioned by lat/lon coordinates;
most are concentrated in the Northeast US
Node and edge colors are based on the
GDELT GoldsteinScale variable; darker colors
are indicative of higher destabilization potential
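A minimal sketch of how event rows might become colored, directed links is shown below. It assumes each row carries the GDELT event fields `Actor1Name`, `Actor2Name`, and `GoldsteinScale`; the sample rows, the averaging step, and the grayscale ramp are assumptions for illustration (the slide only states that darker colors indicate higher destabilization potential).

```python
from collections import defaultdict

# Hypothetical event rows; actor names and scores are placeholders.
events = [
    {"Actor1Name": "ACTOR A", "Actor2Name": "ACTOR B", "GoldsteinScale": -4.0},
    {"Actor1Name": "ACTOR A", "Actor2Name": "ACTOR B", "GoldsteinScale": -6.0},
    {"Actor1Name": "ACTOR C", "Actor2Name": "ACTOR A", "GoldsteinScale": 3.0},
]

# Collect the scores of every event between the same Actor1 -> Actor2 pair.
scores_by_edge = defaultdict(list)
for event in events:
    edge = (event["Actor1Name"], event["Actor2Name"])
    scores_by_edge[edge].append(event["GoldsteinScale"])

# Per edge: (number of events, mean GoldsteinScale).
edge_stats = {
    edge: (len(scores), sum(scores) / len(scores))
    for edge, scores in scores_by_edge.items()
}


def shade(goldstein):
    """Map a score in [-10, 10] to a gray hex color; lower scores come out darker."""
    clamped = max(-10.0, min(10.0, goldstein))
    level = round((clamped + 10.0) / 20.0 * 255)
    return f"#{level:02x}{level:02x}{level:02x}"


for (actor1, actor2), (count, mean_score) in edge_stats.items():
    print(actor1, "->", actor2, count, mean_score, shade(mean_score))
```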
Exploring the Graph Geographically
Using Geo Layout
GDELT Network Topology (Dual Circle)
Using Dual Circle Layout
Prominent nodes are positioned in the inner
circle, based on the number of articles on
cumulative events (speeches, press
conferences, negotiations, etc.)
Secondary nodes are positioned around the
outer circle; these may be either primary or
secondary actors in an event
Node colors are again based on the GDELT
GoldsteinScale variable
Exploring Nodes Using Sigma.js
Using Dual Circle Layout
A few minutes of network exploration
reveals topic patterns based on news
reporting, and allows us to understand
which actors are directing actions
against others and the tone of those
actions. Tracking these measures over
time will enable us to spot both positive
and negative trends.
Conclusions
• Network graph analysis is a powerful tool for
visually and statistically assessing complex
networks
• Network graphs are proliferating, due to the
availability of multiple open source tools and
increasing amounts of open data
• Network graph analysis can be used to tell
powerful stories wherever connected data is
present
Thanks –
and happy networking!
Backup
Miles Davis network specs:
• Data sourced from Wikipedia
• Nodes and edges created in Excel
• Graph created in Gephi using the Yifan Hu
Proportional algorithm
• Exported to Sigma.js (json format)
• 348 nodes, 596 edges
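For readers who have not seen a graph JSON file, the sketch below writes a tiny one. The key names (`id`, `label`, `x`, `y`, `size`, `color`, `source`, `target`) follow the nodes-and-edges layout that Sigma.js 1.x reads, as far as we know; treat that layout, the coordinates, and the sample nodes as assumptions and check them against the Sigma.js version in use. In the talk's workflow this file comes out of Gephi rather than hand-written code.

```python
import json

# Hypothetical nodes with layout positions; in practice Gephi supplies x and y.
nodes = [
    {"id": "n0", "label": "Miles Davis", "x": 0.0, "y": 0.0, "size": 3, "color": "#000000"},
    {"id": "n1", "label": "Kind of Blue", "x": 1.0, "y": 0.5, "size": 2, "color": "#ff69b4"},
]

# Each edge needs its own id plus the ids of the two nodes it joins.
edges = [
    {"id": "e0", "source": "n0", "target": "n1"},
]

graph_json = json.dumps({"nodes": nodes, "edges": edges}, indent=2)
print(graph_json)
```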
Red Sox Player Network specs:
• Data sourced from Lahman Database at
seanlahman.com
• Nodes and edges created using SQL code in
Toad for MySQL
• Graphs created in Gephi using the ARF layout
algorithm
• JSON file exported to Sigma.js
• 1668 nodes, 51,223 edges
GDELT classifications:
• Type refers to groupings such as
Government, Media, Education, and many
more
• Event codes reference the type of event –
riots, protests, sanctions, and so on
• The GoldsteinScale runs from -10 to +10 in
describing the relative destabilizing potential
of the event; more negative scores indicate
greater destabilizing potential
GDELT Network specs:
• Data sourced from the GDELT event database
at gdeltproject.org (3/1 to 4/30/16)
• Nodes and edges refined using SQL code in
Toad for MySQL
• Graphs created in Gephi using the Geo Layout
and Dual Circle algorithms
• GEXF files exported for use with Sigma.js
• 414 nodes, 11,975 edges
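GEXF is an XML format, so a minimal file can be sketched with the Python standard library. The example below assumes the GEXF 1.2 layout (a `gexf` root holding a `graph` with `nodes` and `edges`); the node ids, labels, and weight are placeholders, and Gephi's own export would carry more attributes, such as positions and colors.

```python
import xml.etree.ElementTree as ET

# Hypothetical actors and one weighted, directed link between them.
nodes = [("a", "ACTOR A"), ("b", "ACTOR B")]
edges = [("a", "b", 2.0)]

GEXF_NAMESPACE = "http://www.gexf.net/1.2draft"

gexf = ET.Element("gexf", {"xmlns": GEXF_NAMESPACE, "version": "1.2"})
graph = ET.SubElement(gexf, "graph", {"mode": "static", "defaultedgetype": "directed"})

nodes_element = ET.SubElement(graph, "nodes")
for node_id, label in nodes:
    ET.SubElement(nodes_element, "node", {"id": node_id, "label": label})

edges_element = ET.SubElement(graph, "edges")
for index, (source, target, weight) in enumerate(edges):
    ET.SubElement(
        edges_element,
        "edge",
        {"id": str(index), "source": source, "target": target, "weight": str(weight)},
    )

gexf_text = ET.tostring(gexf, encoding="unicode")
print(gexf_text)
```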

ODSC_Cherven_20160518

  • 1. Analyzing Complex Networks Using Open Source Software @ODSC OPEN DATA SCIENCE CONFERENCE Ken Cherven @kc2519 visual-baseball.com visualidity.com Boston | May 20-22nd 2016
  • 2. A Brief Outline • Network Graph Analysis overview • Tools • Case Studies • Conclusions
  • 3. Network Graph Analysis – aka Social Network Analysis (SNA), is the study of connections (links) between actors (nodes) within a network node node node node node
  • 4. Network Graph Analysis has many use cases, ranging from the familiar SNA (Facebook, Twitter networks) to the more specialized visual and statistical investigation of political, criminal, or terrorist networks
  • 5. The use cases for Network Graph Analysis are almost endless – any dataset where relationships can be mapped can be analyzed both statistically and visually; all we need are nodes and links
  • 6. We have two primary approaches to assess patterns in a network: • Statistical measures are used to understand the underlying structure and relationships between nodes • Visual assessment allows us to leverage size, color, spacing, and structure to understand patterns at a network level
  • 7. Statistical measures are employed to understand structural patterns within the network: • Degrees (# of connections) • Centrality (influence) • Density (level of network connectedness) • Homophily (common groupings) • Diameter (max distance between nodes)
  • 8. Visual assessment allows us to use our visual sense to interpret network patterns: • Node location to represent related nodes • Node sizes to represent degrees • Node coloring to represent common groupings (clusters, categories) • Edge weights that show the strength of connections between nodes
  • 9. Some open source network graph tools: • Gephi (http://gephi.org) • Cytoscape (http://cytoscape.org) • GraphViz (http://graphviz.org) • Sigma.js (http://sigmajs.org) • NodeXL (http://nodexl.codeplex.com/) • Pajek (http://mrvar.fdv.uni-lj.si/pajek/) • Tulip (http://tulip.labri.fr/TulipDrupal/)
  • 10. We’ll use Gephi and Sigma.js for the following examples: • Miles Davis album network (tripartite network) • Boston Red Sox player network • GDELT event networks
  • 11. Miles Davis Album Network
  • 12. The desire behind the Miles Davis network is to understand the multiple phases within his long and varied career, and to see the shifting patterns in his musical partnerships and styles http://visual-baseball.com/gephi/jazz/miles_davis/#
  • 13. Miles Davis Network Topology: Miles Davis Albums (pink); Musicians (colored by instrument)
  • 14. Five Album Clusters to Investigate (numbered 1 to 5 in the figure). What do these clusters represent?
  • 15. Five Album Clusters Revealed:
      • Early 60s big bands
      • Mid-60s small group
      • 1950s small groups
      • 1970s fusion, electric sounds
      • Late career (1980s): experimentation, eclectic instrumentation
  • 16. A quick exploration of the network reveals information about the elements of time, instrumentation, number of musicians, and types of instruments. With just a few minutes of traversing the network, we gain a greater understanding of Miles Davis’ musical career
  • 17. Red Sox Historical Player Network
  • 18. The goal for the Red Sox player network is to understand connections between players across eras, and to understand influence and groupings within the network, as defined by degrees and other centrality measures: http://visual-baseball.com/gephi/teams/redsox_network/
  • 19. Red Sox Network Topology:
      • Player nodes are sized and colored based on number of years with team and cluster assignment
      • Players are positioned based on common years with team
      • Links are built using the number of seasons two players were on the team roster together
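
The link-building rule on slide 19 (one link per pair of players, weighted by the number of seasons they shared a roster) can be sketched in a few lines. The presentation built its edges with SQL against the Lahman database; the Python version below is a stand-in for illustration only, and the roster rows, player names, and function name are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical roster rows of (player, season); not real Lahman data.
roster_rows = [
    ("Player A", 1960), ("Player B", 1960),
    ("Player A", 1961), ("Player B", 1961), ("Player C", 1961),
    ("Player C", 1962),
]


def build_teammate_edges(rows):
    """Count, for every pair of players, the seasons they shared a roster."""
    players_by_season = {}
    for player, season in rows:
        players_by_season.setdefault(season, set()).add(player)

    shared_seasons = Counter()
    for players in players_by_season.values():
        # Sorting gives each pair one canonical (undirected) key.
        for pair in combinations(sorted(players), 2):
            shared_seasons[pair] += 1
    return shared_seasons


# Edge weight = number of shared seasons.
edge_weights = build_teammate_edges(roster_rows)

# Node size input = number of distinct seasons with the team.
seasons_with_team = Counter(player for player, _ in set(roster_rows))
```

Here Player A and Player B share two seasons, so their link carries a weight of 2, while each pair involving Player C overlaps in 1961 only.
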
  • 20. Individual Network Footprints: Ted Williams (19 seasons, 269 degrees, eccentricity 6, betweenness 126,355, closeness 3.30)
  • 21. Individual Network Footprints: Carl Yastrzemski (23 seasons, 283 degrees, eccentricity 5, betweenness 596,003, closeness 2.64)
  • 22. Individual Network Footprints: Jason Varitek (15 seasons, 379 degrees, eccentricity 7, betweenness 120,696, closeness 3.36)
  • 23. A quick look at three prominent players reveals clear differences in their centrality measures: despite playing fewer seasons (15) than either Williams (19) or Yastrzemski (23), Varitek has the most connections (379 degrees); Yastrzemski, however, is the most central to the network structure, with by far the highest betweenness (596,003) and the lowest eccentricity (5) and closeness (2.64) values, so he could get you to more players faster
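
Eccentricity and closeness, two of the footprint measures above, both derive from shortest-path distances. The sketch below assumes that the closeness values on the slides are average shortest-path distances, so that lower means more central; the slides do not define the measure, so treat this as one plausible reading rather than the presentation's actual calculation. The small graph and the function names are hypothetical.

```python
from collections import deque

# Hypothetical network: a hub with three neighbors, one of which leads to "d".
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("c", "d")]

adjacency = {}
for source, target in edges:
    adjacency.setdefault(source, set()).add(target)
    adjacency.setdefault(target, set()).add(source)


def hop_distances(start):
    """Breadth-first search: hop counts from start to every reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return distances


def eccentricity(node):
    """Distance from node to the node farthest away from it."""
    return max(hop_distances(node).values())


def average_distance(node):
    """Mean hop count from node to every other reachable node.

    Assumption: this is the quantity the slides report as closeness.
    """
    others = [d for other, d in hop_distances(node).items() if other != node]
    return sum(others) / len(others)


hub_footprint = (eccentricity("hub"), average_distance("hub"))
edge_footprint = (eccentricity("d"), average_distance("d"))
```

The hub reaches every node in at most two hops and has the smaller average distance, which mirrors the pattern in the Yastrzemski footprint: lower eccentricity and lower closeness go with a more central position.
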
  • 25. GDELT data opens up a large number of opportunities for viewing network data based on published accounts of news events around the world. Our exploration focuses on US Government threats reported between March 1 and April 30, 2016
  • 26. GDELT Network Topology (Geo Layout):
      • Connections are between Actor1 and Actor2 within a specific event instance; Actor1 is often the Protagonist, Actor2 the Target
      • Nodes are positioned by lat/lon coordinates; most are concentrated in the Northeast US
      • Node and edge colors are based on the GDELT GoldsteinScale variable; darker colors are indicative of higher destabilization potential
  • 27. Exploring the Graph Geographically Using Geo Layout
  • 28. GDELT Network Topology (Dual Circle Layout):
      • Prominent nodes are positioned in the inner circle, based on the number of articles on cumulative events (speeches, press conferences, negotiations, etc.)
      • Secondary nodes are positioned around the outer circle; these may be either primary or secondary actors in an event
      • Node colors are again based on the GDELT GoldsteinScale variable
  • 29. Exploring Nodes Using Sigma.js Using Dual Circle Layout
  • 30. Exploring Nodes Using Sigma.js Using Dual Circle Layout
  • 31. A few minutes of network exploration reveals topic patterns based on news reporting, and shows us which actors are directing actions against others and the tone of those actions. Tracking these measures over time will enable us to spot trends, both positive and negative.
  • 32. Conclusions:
      • Network graph analysis is a powerful tool for visually and statistically assessing complex networks
      • Network graphs are proliferating, due to the availability of multiple open source tools and increasing amounts of open data
      • Network graph analysis can be used to tell powerful stories wherever connected data is present
  • 33. Thanks – and happy networking!
  • 35. Miles Davis network specs:
      • Data sourced from Wikipedia
      • Nodes and edges created in Excel
      • Graph created in Gephi using the Yifan Hu Proportional algorithm
      • Exported to Sigma.js (JSON format)
      • 348 nodes, 596 edges
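
To give a sense of what a JSON export for Sigma.js might contain, the sketch below writes a tiny artist, album, musician chain as a document with top-level `nodes` and `edges` arrays. It assumes the layout commonly used by Sigma.js JSON loaders (`id`, `label`, `x`, `y`, `size` per node; `id`, `source`, `target` per edge); the presentation's actual file came from Gephi's exporter, and the coordinates, sizes, identifiers, and function name here are made up for illustration.

```python
import json

# Illustrative nodes only; positions and sizes are invented, not Gephi output.
nodes = {
    "miles_davis": {"label": "Miles Davis", "x": 0.0, "y": 0.0, "size": 3},
    "kind_of_blue": {"label": "Kind of Blue", "x": 1.0, "y": 0.5, "size": 2},
    "john_coltrane": {"label": "John Coltrane", "x": 2.0, "y": 1.0, "size": 1},
}

# Tripartite chain: artist -> album -> musician.
edges = [("miles_davis", "kind_of_blue"), ("kind_of_blue", "john_coltrane")]


def to_sigma_json(node_map, edge_list):
    """Serialize nodes and edges into an assumed Sigma.js-style JSON document."""
    payload = {
        "nodes": [
            {
                "id": node_id,
                "label": attrs["label"],
                "x": attrs["x"],
                "y": attrs["y"],
                "size": attrs["size"],
            }
            for node_id, attrs in node_map.items()
        ],
        "edges": [
            {"id": f"e{index}", "source": source, "target": target}
            for index, (source, target) in enumerate(edge_list)
        ],
    }
    return json.dumps(payload, indent=2)


sigma_document = to_sigma_json(nodes, edges)
```

Every edge refers to nodes by `id`, so node identifiers must be unique and must match the `source` and `target` values exactly.
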
  • 36. Red Sox Player Network specs:
      • Data sourced from Lahman Database at seanlahman.com
      • Nodes and edges created using SQL code in Toad for MySQL
      • Graphs created in Gephi using the ARF layout algorithm
      • JSON file exported to Sigma.js
      • 1668 nodes, 51,223 edges
  • 37. GDELT classifications:
      • Type refers to groupings such as Government, Media, Education, and many more
      • Event codes reference the type of event (riots, protests, sanctions, and so on)
      • The GoldsteinScale runs from -10 to 10 in describing the relative destabilizing potential of the event
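
The GDELT slides color nodes and edges by GoldsteinScale, with darker colors marking higher destabilization potential. A minimal sketch of such a mapping follows. It assumes that the more negative scores are the more destabilizing ones and uses a plain grayscale ramp; the actual palette and thresholds used in the presentation are not stated, so the function name and the brightness range are hypothetical.

```python
def goldstein_to_hex(score):
    """Map a GoldsteinScale score (-10 to 10) to a grayscale hex color.

    Assumption: -10 is the most destabilizing and is drawn darkest.
    """
    clamped = max(-10.0, min(10.0, score))
    fraction = (clamped + 10.0) / 20.0      # 0.0 at -10, 1.0 at +10
    level = round(40 + fraction * 180)      # brightness from 40 (dark) to 220 (light)
    return f"#{level:02x}{level:02x}{level:02x}"


darkest = goldstein_to_hex(-10)
neutral = goldstein_to_hex(0)
lightest = goldstein_to_hex(10)
```

Scores outside the documented range are clamped rather than rejected, so a stray value cannot produce an invalid color.
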
  • 38. GDELT Network specs:
      • Data sourced from the GDELT event database at gdeltproject.org (3/1 to 4/30/16)
      • Nodes and edges refined using SQL code in Toad for MySQL
      • Graphs created in Gephi using the Geo Layout and Dual Circle algorithms
      • GEXF files exported for use with Sigma.js
      • 414 nodes, 11,975 edges

Editor's Notes

  1. A tripartite network