Dynamics in Graph Analysis
Adding Time as Structure for Visual and Statistical Insight
Benjamin Bengfort
@bbengfort
District Data Labs
Are graphs effective for analytics?
Or why use graphs at all?
Algorithm Performance
More understandable implementations and native parallelism provide benefits, particularly to machine learning.
Visual Analytics
Humans can understand and interpret interconnection structures, leading to immediate insights.
“Graph technologies ease the modeling of your domain and improve the simplicity and speed of your queries.”
— Marko A. Rodriguez
http://bit.ly/2cthd2L
Construction
Given a set of [paths, vertices], is a [constraint] graph construction possible?
Existence
Does there exist a [path, vertex, set] within [constraints]?
Optimization
Given several [paths, subgraphs, vertices, sets], is one the best?
Enumeration
How many [vertices, edges] exist with [constraints], and is it possible to list them?
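To make the existence and enumeration questions concrete, here is a minimal pure-Python sketch on a toy adjacency-list graph. The graph, the function names, and the constraints are hypothetical illustrations, not part of the talk: a breadth-first search answers whether a path exists using only edges that satisfy a constraint, and a comprehension enumerates the vertices that meet a degree constraint.

```python
from collections import deque


def path_exists(adj, source, target, allowed=lambda u, v: True):
    """Existence: is there a path from source to target using only allowed edges?"""
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return True
        for v in adj.get(u, ()):
            # Only follow edges that satisfy the constraint.
            if v not in seen and allowed(u, v):
                seen.add(v)
                queue.append(v)
    return False


def vertices_with_degree(adj, min_degree):
    """Enumeration: list every vertex whose out-degree is at least min_degree."""
    return sorted(u for u, nbrs in adj.items() if len(nbrs) >= min_degree)


# Toy directed graph (hypothetical example data).
adj = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}

print(path_exists(adj, "a", "d"))                                        # True
print(path_exists(adj, "a", "d", allowed=lambda u, v: v not in ("b", "c")))  # False
print(vertices_with_degree(adj, 1))                                      # ['a', 'b', 'c']
```

The constraint is passed as a predicate on edges, so the same search answers different existence questions without changing the traversal itself.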
Traversals
Property Graphs
How do you model time?
Relational Database
Time Properties
Time Modifies Traversal
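One common way time modifies a traversal is to require a time-respecting path: each edge must occur after the edge used to reach its source vertex. The sketch below is one possible illustration, assuming edges arrive as hypothetical (sender, recipient, time) tuples and that each step must be strictly later than the previous one; it computes the earliest time each vertex can be reached from a source.

```python
import math


def earliest_arrival(edges, source, start=-math.inf):
    """Earliest time each vertex is reachable from source along a
    time-respecting path, where every edge is strictly later than the last.

    edges: iterable of (u, v, t) directed, instantaneous contacts.
    """
    arrival = {source: start}
    # A single pass in time order suffices when timestamps must strictly
    # increase: an edge can only extend paths built from earlier edges.
    for u, v, t in sorted(edges, key=lambda e: e[2]):
        if u in arrival and arrival[u] < t and t < arrival.get(v, math.inf):
            arrival[v] = t
    return arrival


# Hypothetical email log: (sender, recipient, hour sent).
emails = [
    ("alice", "bob", 1),
    ("bob", "carol", 3),
    ("carol", "dave", 2),   # sent before carol heard from bob
    ("carol", "erin", 4),
]
print(earliest_arrival(emails, "alice"))
# {'alice': -inf, 'bob': 1, 'carol': 3, 'erin': 4}
```

In the static graph alice reaches dave through bob and carol, but once time is respected that path disappears, because carol emailed dave before she heard from bob.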
Example of Time-Filtered Traversal: Data Model
Name: Emails Sent Network
Number of nodes: 6,174
Number of edges: 343,702
Average degree: 111.339
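The average degree figures reported on these slides are consistent with the undirected convention, average degree = 2|E| / |V|, since every edge adds to the degree of both of its endpoints. A quick check in plain Python (the function name is illustrative):

```python
def average_degree(num_nodes, num_edges):
    # Each edge adds one to the degree of each of its two endpoints.
    return 2 * num_edges / num_nodes


print(round(average_degree(6174, 343702), 3))        # 111.339
print(round(average_degree(2682624, 46958599), 4))   # 35.0095
print(round(average_degree(139227, 257316), 4))      # 3.6964
print(round(average_degree(8520, 112320), 3))        # 26.366
```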
```python
def sent_range(g, before=None, after=None):
    # Create an edge filter that keeps edges sent strictly between
    # `after` and `before`; either bound may be omitted.
    def inner(edge):
        sent = g.ep.sent[edge]
        if before is not None and sent >= before:
            return False
        if after is not None and sent <= after:
            return False
        return True
    return inner


def degree_filter(degree=0):
    # Create a vertex filter that keeps vertices with out-degree > degree.
    def inner(vertex):
        return vertex.out_degree() > degree
    return inner
```
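Example of Time-Filtered Traversal
```python
import graph_tool.all as gt
# assumed: dateparse is dateutil's parser, imported under this alias
from dateutil.parser import parse as dateparse

print("{} vertices and {} edges".format(
    g.num_vertices(), g.num_edges()
))
# 6174 vertices and 343702 edges

# Keep only the emails (edges) sent after 9am on Aug 1, 2016 ...
aug = sent_range(g,
    after=dateparse("Aug 1, 2016 09:00:00 EST")
)
view = gt.GraphView(g, efilt=aug)

# ... then keep only vertices that still have at least one outgoing edge.
view = gt.GraphView(view, vfilt=degree_filter())
print("{} vertices and {} edges".format(
    view.num_vertices(), view.num_edges()
))
# 853 vertices and 24813 edges
```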
What makes a graph dynamic?
Time Structures
Perform static analysis on dynamic components, with time as a structure.
Dynamic Graphs
Multiple subgraphs, each representing the graph state at a discrete timestep.
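A dynamic graph in this sense can be built by bucketing timestamped edges into discrete timesteps. A minimal sketch, assuming edges arrive as hypothetical (u, v, timestamp) tuples and using the ISO (year, week) of each timestamp as the timestep; both choices are illustrative assumptions.

```python
from collections import defaultdict
from datetime import datetime


def snapshots(edges, timestep=lambda ts: tuple(ts.isocalendar()[:2])):
    """Group (u, v, timestamp) edges into one edge set per discrete timestep.

    By default the timestep is the (ISO year, ISO week) of the timestamp.
    Returns a list of (timestep, edge_set) pairs in chronological order.
    """
    buckets = defaultdict(set)
    for u, v, ts in edges:
        buckets[timestep(ts)].add((u, v))
    return sorted(buckets.items())


# Hypothetical timestamped edges.
edges = [
    ("a", "b", datetime(2016, 8, 1)),   # Monday of ISO week 31
    ("b", "c", datetime(2016, 8, 3)),   # same week
    ("a", "c", datetime(2016, 8, 9)),   # ISO week 32
]
for step, es in snapshots(edges):
    print(step, sorted(es))
# (2016, 31) [('a', 'b'), ('b', 'c')]
# (2016, 32) [('a', 'c')]
```

Each edge set is one snapshot subgraph; static measures (degree, centrality) can then be computed per snapshot and compared across timesteps.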
Keyphrases over Time
Natural Language Graph Analysis: Data Ingestion
Natural Language Graph Analysis: Data Modeling
Name: Baleen Keyphrase Graph
Number of nodes: 2,682,624
Number of edges: 46,958,599
Average degree: 35.0095
Name: Sampled Keyphrase Graph
Number of nodes: 139,227
Number of edges: 257,316
Average degree: 3.6964
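```python
import graph_tool.all as gt

def degree_filter(degree=0):
    # Keep vertices whose out-degree exceeds `degree`.
    def inner(vertex):
        return vertex.out_degree() > degree
    return inner

# Keep only vertices with more than three connections.
g = gt.GraphView(g, vfilt=degree_filter(3))
```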
Name: High Degree Phrase Graph
Number of nodes: 8,520
Number of edges: 112,320
Average degree: 26.366
Natural Language Graph Analysis: Data Wrangling
Basic Keyphrase Graph Information
Vertex Type Analysis
Primarily keyphrases and documents.
Degree Distribution
Power-law distribution of degree.
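A degree distribution can be tabulated by counting how many vertices have each degree; plotted on log-log axes, a power law shows up as a roughly straight line. A minimal pure-Python sketch over an undirected edge list (the function name and toy edges are hypothetical):

```python
from collections import Counter


def degree_distribution(edges):
    """Return {degree: number of vertices with that degree} for an undirected edge list."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(sorted(Counter(degree.values()).items()))


# A small hub-and-spoke graph: one high-degree hub, mostly low-degree leaves.
edges = [("hub", f"leaf{i}") for i in range(5)] + [("leaf0", "leaf1")]
print(degree_distribution(edges))
# {1: 3, 2: 2, 5: 1}
```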
Natural Language Graph Analysis: Data Wrangling
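```python
import random
import graph_tool.all as gt


def ego_filter(g, ego, hops=2):
    # Compute distances from the ego vertex once rather than once per vertex;
    # vertices farther than `hops` are not reached and keep an "infinite" distance.
    dist = gt.shortest_distance(g, source=ego, max_dist=hops)

    def inner(v):
        return dist[v] <= hops
    return inner


# Get a random document
v = random.choice([
    v for v in g.vertices()
    if g.vp.type[v] == 'document'
])

# Ego network within one hop of the chosen document.
ego = gt.GraphView(
    g, vfilt=ego_filter(g, v, 1)
)
```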
The Centrality of Time
Extract Week of the Year as Time Structure
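```python
import graph_tool.all as gt

# Construct a graph linking time structures (weeks) to keyphrases
h = gt.Graph(directed=False)
h.gp.name = h.new_graph_property('string')
h.gp.name = "Phrases by Week"

# Add vertex properties
h.vp.label = h.new_vertex_property('string')
h.vp.vtype = h.new_vertex_property('string')

# Map each week number and each keyphrase vertex of g to a single vertex in h,
# so repeated weeks and phrases are not added as duplicate vertices.
weeks = {}
vidmap = {}

# Create graph from the keyphrase graph
for vertex in g.vertices():
    if g.vp.type[vertex] == 'document':
        dt = g.vp.pubdate[vertex]
        weekno = dt.isocalendar()[1]  # ISO week; assumes a single year of data
        if weekno not in weeks:
            week = h.add_vertex()
            h.vp.label[week] = "Week %d" % weekno
            h.vp.vtype[week] = 'week'
            weeks[weekno] = week
        week = weeks[weekno]

        for neighbor in vertex.out_neighbours():
            if g.vp.type[neighbor] == 'phrase':
                if int(neighbor) not in vidmap:
                    phrase = h.add_vertex()
                    h.vp.vtype[phrase] = 'phrase'
                    vidmap[int(neighbor)] = phrase
                h.add_edge(week, vidmap[int(neighbor)])
```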
PageRank Centrality
A variant of eigenvector centrality that adds a damping (scaling) factor and prioritizes incoming links.
Eigenvector Centrality
A measure of relative influence: a vertex is important if it is connected to other important vertices.
Degree Centrality
A vertex is more important the more connections it has, e.g. a “celebrity”.
Betweenness Centrality
How many shortest paths pass through the given vertex, e.g. how often information flows through it.
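To make three of these definitions concrete, here is a minimal pure-Python sketch on a small undirected graph stored as an adjacency dict (a hypothetical toy; in practice a graph library's built-in routines would be used). Eigenvector centrality and PageRank are both computed by power iteration; PageRank adds the damping factor and splits each vertex's score among its links.

```python
import math


def degree_centrality(adj):
    # Fraction of the other vertices each vertex is connected to.
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def eigenvector_centrality(adj, iterations=100):
    # Power iteration on (A + I): same eigenvectors as A, but the shift
    # avoids oscillation on bipartite graphs. Assumes adj is undirected.
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(s * s for s in new.values()))
        x = {v: s / norm for v, s in new.items()}
    return x


def pagerank(adj, damping=0.85, iterations=100):
    # Each vertex splits its score evenly among its out-links; with
    # probability 1 - damping the walk jumps to a random vertex instead.
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in adj}
        for u, nbrs in adj.items():
            targets = nbrs if nbrs else list(adj)  # dangling vertices link to all
            share = damping * rank[u] / len(targets)
            for v in targets:
                new[v] += share
        rank = new
    return rank


# Toy undirected graph: a triangle a-b-c with a pendant vertex d on c.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
pr = pagerank(adj)
print(max(pr, key=pr.get))   # c
```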
What are the central weeks and phrases?
Panels: Betweenness Centrality; Katz Centrality
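The comparison above uses betweenness and Katz centrality. As a rough illustration of what each computes (a pure-Python sketch on a hypothetical toy graph, not the code behind the slide): Katz centrality iterates x = αAx + β, so walks into a vertex count with a weight that decays by α per step, and betweenness counts the shortest paths passing through each vertex, here with Brandes' algorithm for unweighted, undirected graphs.

```python
from collections import deque


def katz_centrality(adj, alpha=0.1, beta=1.0, iterations=100):
    # Iterate x <- alpha * A x + beta; converges when alpha < 1 / (largest
    # eigenvalue of A). Walks of length k ending at v are weighted by alpha**k.
    x = {v: 0.0 for v in adj}
    for _ in range(iterations):
        x = {v: alpha * sum(x[u] for u in adj[v]) + beta for v in adj}
    return x


def betweenness(adj):
    # Brandes' algorithm for unweighted, undirected graphs.
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        # and recording each vertex's predecessors on those paths.
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies from the farthest vertices inward.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every undirected path was counted once from each endpoint.
    return {v: b / 2 for v, b in bc.items()}


# Toy undirected graph: a triangle a-b-c with a pendant vertex d on c.
adj = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
print(betweenness(adj))   # {'a': 0.0, 'b': 0.0, 'c': 2.0, 'd': 0.0}
```

Only c sits on shortest paths between other vertices (a to d and b to d), so it alone has nonzero betweenness, while Katz still gives every vertex a baseline score of β.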
Keyphrase Dynamics
Create Sequences of Time-Ordered Subgraphs
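Given a sequence of time-ordered subgraphs, one simple measure of keyphrase dynamics is how many links a phrase has in each one. A minimal sketch, assuming the week-to-phrase links (as in a weeks-and-phrases graph) are available as (week, phrase) pairs; the function name and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def phrase_series(week_phrase_pairs):
    """Turn (week, phrase) links into one mention-count series per phrase.

    Returns (weeks, series) where weeks is the sorted list of weeks and
    series[phrase][i] counts the phrase's links in weeks[i].
    """
    weeks = sorted({week for week, _ in week_phrase_pairs})
    counts = defaultdict(Counter)
    for week, phrase in week_phrase_pairs:
        counts[phrase][week] += 1
    series = {phrase: [c[w] for w in weeks] for phrase, c in counts.items()}
    return weeks, series


# Hypothetical links from week numbers to keyphrases.
pairs = [
    (31, "python"), (31, "graphs"), (31, "graphs"),
    (32, "python"), (33, "python"), (33, "python"),
]
weeks, series = phrase_series(pairs)
print(weeks)               # [31, 32, 33]
print(series["python"])    # [1, 1, 2]
print(series["graphs"])    # [2, 0, 0]
```

Each series lines up with the snapshot order, so it can drive an animation frame by frame or be plotted as a time series.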
Animating Dynamics
Network Visualization
Layout: Edge and Vertex Positioning
Fruchterman-Reingold
SFDP (Yifan Hu) Force-Directed
Radial Tree Layout by MST
ARF Spring-Block
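Force-directed layouts such as Fruchterman-Reingold treat edges as springs that pull connected vertices together while every pair of vertices repels. A compact pure-Python sketch of that idea follows; the adjacency-dict input, frame size, and geometric cooling schedule are illustrative choices, and graph libraries such as graph-tool ship their own optimized layout routines that should be preferred in practice.

```python
import math
import random


def fruchterman_reingold(adj, width=1.0, height=1.0, iterations=50, seed=0):
    """Return {vertex: (x, y)} for an undirected graph given as an adjacency dict."""
    rng = random.Random(seed)
    nodes = list(adj)
    pos = {v: [rng.uniform(0, width), rng.uniform(0, height)] for v in nodes}
    k = math.sqrt(width * height / len(nodes))  # ideal distance between vertices
    temp = width / 10                           # maximum step size; cools each round
    order = {v: i for i, v in enumerate(nodes)}
    # List each undirected edge once (assumes adj is symmetric).
    edges = [(u, v) for u in nodes for v in adj[u] if order[u] < order[v]]

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion: every pair of vertices pushes apart with force k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction: each edge pulls its endpoints together with force d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each vertex at most `temp` along its displacement, inside the frame.
        for v in nodes:
            dx, dy = disp[v]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temp)
            pos[v][0] = min(width, max(0.0, pos[v][0] + dx / length * step))
            pos[v][1] = min(height, max(0.0, pos[v][1] + dy / length * step))
        temp *= 0.95  # cool down so the layout settles
    return {v: (x, y) for v, (x, y) in pos.items()}


adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
layout = fruchterman_reingold(adj)
for v, (x, y) in sorted(layout.items()):
    print(f"{v}: ({x:.2f}, {y:.2f})")
```

The pairwise repulsion loop is quadratic in the number of vertices, which is why multilevel methods such as SFDP are used for large graphs.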
Visual Properties of Vertices
Lane Harrison, The Links that Bind Us: Network Visualizations
http://blog.visual.ly/network-visualizations
Visual Properties of Edges
Lane Harrison, The Links that Bind Us: Network Visualizations
http://blog.visual.ly/network-visualizations
Visual Analysis
The Visual Analytics Mantra
Overview first, zoom and filter, then details on demand.
Questions?

Dynamics in graph analysis (PyData Carolinas 2016)

  • 1. Dynamics in Graph Analysis: Adding Time as Structure for Visual and Statistical Insight
    Benjamin Bengfort (@bbengfort), District Data Labs
  • 2. Are graphs effective for analytics? Or why use graphs at all?
  • 3. Algorithm Performance: More understandable implementations and native parallelism provide benefits, particularly to machine learning.
    Visual Analytics: Humans can understand and interpret interconnection structures, leading to immediate insights.
  • 4. “Graph technologies ease the modeling of your domain and improve the simplicity and speed of your queries.” — Marko A. Rodriguez http://bit.ly/2cthd2L
  • 5. Construction: Given a set of [paths, vertices], is a [constraint] graph construction possible?
    Existence: Does there exist a [path, vertex, set] within [constraints]?
    Optimization: Given several [paths, subgraphs, vertices, sets], is one the best?
    Enumeration: How many [vertices, edges] exist with [constraints], and is it possible to list them?
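To make the existence class concrete, here is a minimal sketch in plain Python (not graph-tool) that asks whether a path exists between two vertices under a hop-count constraint. The adjacency-dict input and the `path_exists` name are hypothetical, chosen only for illustration.

```python
from collections import deque


def path_exists(adj, source, target, max_hops=None):
    # Breadth-first search; adj is a hypothetical {vertex: [neighbors]} dict.
    # BFS reaches each vertex first along a shortest path, so a visited set is safe.
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if node == target:
            return True
        if max_hops is not None and hops == max_hops:
            continue  # constraint reached: do not expand this vertex further
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, hops + 1))
    return False


adj = {"a": ["b"], "b": ["c"], "c": ["d"]}
print(path_exists(adj, "a", "d"))              # True
print(path_exists(adj, "a", "d", max_hops=2))  # False
```

The same search skeleton answers enumeration questions too: instead of returning early, collect every vertex reached within the constraint.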
  • 8. How do you model time?
  • 13. Example of Time Filtered Traversal: Data Model
    Emails Sent Network: 6,174 nodes, 343,702 edges, average degree 111.339
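The reported average degree is consistent with the usual undirected formula, average degree = 2E / V, since each edge adds one to the degree of both of its endpoints. A quick arithmetic check, as a sketch:

```python
def average_degree(num_vertices, num_edges):
    # Each undirected edge contributes to the degree of two vertices.
    return 2 * num_edges / num_vertices


print(round(average_degree(6174, 343702), 3))  # 111.339
```

The same formula reproduces the averages reported for the keyphrase graphs later in the deck (35.0095, 3.6964 and 26.366), so the talk appears to count degree this way throughout.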
  • 14. Example of Time Filtered Traversal
```python
def sent_range(g, before=None, after=None):
    # Create an edge filter based on a date range. Both bounds are optional;
    # when both are given, an edge must satisfy both.
    def inner(edge):
        sent = g.ep.sent[edge]
        if before is not None and not sent < before:
            return False
        if after is not None and not sent > after:
            return False
        return True
    return inner


def degree_filter(degree=0):
    # Create a vertex filter that keeps vertices whose out-degree
    # exceeds the given minimum.
    def inner(vertex):
        return vertex.out_degree() > degree
    return inner
```
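  • 15. Example of Time Filtered Traversal
```python
import graph_tool.all as gt
from dateutil.parser import parse as dateparse  # assumed: dateparse is dateutil's parser

print("{} vertices and {} edges".format(
    g.num_vertices(), g.num_edges()
))
# 6174 vertices and 343702 edges

# Keep only emails sent after 9:00 AM on August 1, 2016 ...
aug = sent_range(g,
    after=dateparse("Aug 1, 2016 09:00:00 EST")
)
view = gt.GraphView(g, efilt=aug)
# ... then keep only vertices that still have outgoing edges.
view = gt.GraphView(view, vfilt=degree_filter())

print("{} vertices and {} edges".format(
    view.num_vertices(), view.num_edges()
))
# 853 vertices and 24813 edges
```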
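The filter chain on this slide relies on graph-tool's `GraphView`. As a dependency-free sketch of the same idea, assuming a hypothetical edge list of `(sender, recipient, sent)` tuples with `datetime` timestamps, the two steps look roughly like this; as with a vertex filter, dropping a vertex also hides the edges that touch it.

```python
from collections import Counter
from datetime import datetime


def filter_by_time(edges, before=None, after=None):
    # Keep (sender, recipient, sent) edges strictly inside the optional window.
    return [
        (src, dst, sent) for src, dst, sent in edges
        if (before is None or sent < before) and (after is None or sent > after)
    ]


def filter_by_out_degree(edges, degree=0):
    # Keep vertices whose out-degree exceeds `degree`; edges touching a
    # removed vertex disappear too, mirroring a vertex filter.
    out_degree = Counter(src for src, _, _ in edges)
    keep = {v for v, d in out_degree.items() if d > degree}
    return keep, [e for e in edges if e[0] in keep and e[1] in keep]


# Hypothetical sample data, not the talk's email network.
emails = [
    ("ann", "bob", datetime(2016, 7, 30, 12)),
    ("ann", "cat", datetime(2016, 8, 2, 10)),
    ("bob", "ann", datetime(2016, 8, 3, 9)),
    ("cat", "dan", datetime(2016, 7, 1, 8)),
]
window = filter_by_time(emails, after=datetime(2016, 8, 1, 9))
vertices, kept = filter_by_out_degree(window)
print(len(vertices), "vertices and", len(kept), "edges")  # 2 vertices and 1 edges
```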
  • 16. What makes a graph dynamic?
  • 17. Time Structures: perform static analysis on dynamic components by treating time as a structure.
    Dynamic Graphs: multiple subgraphs, each representing the graph state at a discrete timestep.
  • 21. Natural Language Graph Analysis: Data Ingestion
  • 22. Natural Language Graph Analysis: Data Modeling
    Baleen Keyphrase Graph: 2,682,624 nodes, 46,958,599 edges, average degree 35.0095
    Sampled Keyphrase Graph: 139,227 nodes, 257,316 edges, average degree 3.6964
  • 23. Natural Language Graph Analysis: Data Wrangling
```python
import graph_tool.all as gt


def degree_filter(degree=0):
    # Keep vertices whose out-degree exceeds the given minimum.
    def inner(vertex):
        return vertex.out_degree() > degree
    return inner


# Keep only vertices with out-degree greater than 3.
g = gt.GraphView(g, vfilt=degree_filter(3))
```
High Degree Phrase Graph: 8,520 nodes, 112,320 edges, average degree 26.366
  • 24. Basic Keyphrase Graph Information
    Vertex Type Analysis: vertices are primarily keyphrases and documents.
    Degree Distribution: degree follows a power-law distribution.
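One hedged way to inspect a degree distribution is to count how many vertices have each degree; a power law typically shows up as a roughly straight line when these counts are plotted on log-log axes. The undirected edge-list input and the `degree_distribution` name below are hypothetical.

```python
from collections import Counter


def degree_distribution(edges):
    # edges: iterable of undirected (u, v) pairs (hypothetical input shape).
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Map each degree value to the number of vertices with that degree.
    return dict(sorted(Counter(degree.values()).items()))


edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
print(degree_distribution(edges))  # {1: 1, 2: 2, 3: 1}
```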
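  • 25. Natural Language Graph Analysis: Data Wrangling
```python
import random

import graph_tool.all as gt


def ego_filter(g, ego, hops=2):
    # Compute BFS distances from the ego vertex once (not once per vertex);
    # vertices farther than `hops`, or unreachable, fail the filter.
    dist = gt.shortest_distance(g, source=ego, max_dist=hops)

    def inner(v):
        return dist[v] <= hops
    return inner


# Get a random document
v = random.choice([
    v for v in g.vertices() if g.vp.type[v] == 'document'
])
# Ego network: the document and every vertex within one hop of it.
ego = gt.GraphView(
    g, vfilt=ego_filter(g, v, 1)
)
```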
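The ego-network filter can also be expressed without graph-tool. This sketch assumes a hypothetical adjacency dict that lists neighbors in both directions; it runs one breadth-first search from the ego, stops expanding at `hops` edges, and keeps the induced subgraph.

```python
from collections import deque


def ego_network(adj, ego, hops=2):
    # Breadth-first search from the ego, not expanding past `hops` edges.
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    nodes = set(dist)
    # Induced subgraph: keep edges whose endpoints both lie in the ego network.
    edges = [(u, v) for u in nodes for v in adj.get(u, ()) if v in nodes]
    return nodes, edges


adj = {
    "doc1": ["p1", "p2"], "p1": ["doc1", "doc2"], "p2": ["doc1"],
    "doc2": ["p1", "p3"], "p3": ["doc2"],
}
nodes, edges = ego_network(adj, "doc1", hops=1)
print(sorted(nodes))  # ['doc1', 'p1', 'p2']
```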
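  • 28. Extract Week of the Year as Time Structure
```python
import graph_tool.all as gt

# Construct time structures linking weeks to keyphrases
h = gt.Graph(directed=False)
h.gp.name = h.new_graph_property('string')
h.gp.name = "Phrases by Week"

# Add vertex properties
h.vp.label = h.new_vertex_property('string')
h.vp.vtype = h.new_vertex_property('string')

# Reuse one vertex per week and one per phrase instead of
# adding a duplicate vertex for every document.
weeks = {}    # ISO week number -> week vertex in h
phrases = {}  # phrase vertex index in g -> phrase vertex in h

# Create graph from the keyphrase graph
for vertex in g.vertices():
    if g.vp.type[vertex] == 'document':
        dt = g.vp.pubdate[vertex]
        weekno = dt.isocalendar()[1]
        if weekno not in weeks:
            week = h.add_vertex()
            h.vp.label[week] = "Week %d" % weekno
            h.vp.vtype[week] = 'week'
            weeks[weekno] = week
        week = weeks[weekno]

        for neighbor in vertex.out_neighbors():
            if g.vp.type[neighbor] == 'phrase':
                key = int(neighbor)
                if key not in phrases:
                    phrase = h.add_vertex()
                    h.vp.vtype[phrase] = 'phrase'
                    phrases[key] = phrase
                # Repeated (week, phrase) pairs add parallel edges.
                h.add_edge(week, phrases[key])
```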
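A lighter-weight variant of this time structure can be sketched in plain Python under a few assumptions not in the original: each document is a hypothetical `(pubdate, phrases)` pair, weeks are keyed by ISO year as well as week number so the same week number in different years stays separate, and repeated (week, phrase) pairs become an edge weight rather than parallel edges.

```python
from collections import Counter
from datetime import date


def week_phrase_edges(documents):
    # documents: iterable of (pubdate, phrases) pairs (hypothetical input shape).
    # Returns a Counter mapping (week, phrase) edges to the number of
    # documents published that week that mention the phrase.
    edges = Counter()
    for pubdate, phrases in documents:
        iso_year, iso_week, _ = pubdate.isocalendar()
        week = "%d-W%02d" % (iso_year, iso_week)
        for phrase in set(phrases):  # count each phrase once per document
            edges[(week, phrase)] += 1
    return edges


docs = [
    (date(2016, 8, 1), ["graph", "time"]),
    (date(2016, 8, 3), ["graph"]),
    (date(2016, 8, 8), ["graph", "graph"]),
]
for (week, phrase), weight in sorted(week_phrase_edges(docs).items()):
    print(week, phrase, weight)
```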
  • 29. PageRank Centrality: a variant of eigenvector centrality that adds a damping (scaling) factor and prioritizes incoming links.
    Eigenvector Centrality: a measure of relative influence in which a vertex scores higher when it is connected to other high-scoring vertices.
    Degree Centrality: a vertex is more important the more connections it has, e.g., a "celebrity".
    Betweenness Centrality: how many shortest paths pass through the given vertex, e.g., how often does information flow through it?
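graph-tool provides its own `pagerank` in `graph_tool.centrality`; to show what the damping factor does, here is a from-scratch power-iteration sketch, not the library's implementation. The adjacency-dict input is hypothetical, and the default damping of 0.85 is a common convention rather than a value taken from the talk.

```python
def pagerank(adj, damping=0.85, iterations=100):
    # adj: {vertex: [out-neighbors]}; every vertex must appear as a key.
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by dangling vertices (no out-links) is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not adj[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        # Each vertex passes a damped share of its rank along every out-link.
        for v in nodes:
            if adj[v]:
                share = damping * rank[v] / len(adj[v])
                for w in adj[v]:
                    new[w] += share
        rank = new
    return rank


ranks = pagerank({"a": ["hub"], "b": ["hub"], "c": ["hub"], "hub": []})
print(max(ranks, key=ranks.get))  # hub
```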
  • 30. What are the central weeks and phrases? Betweenness Centrality and Katz Centrality
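Katz centrality gives every vertex a baseline score `beta` plus `alpha` times the scores of the vertices linking to it. A minimal iterative sketch, assuming a hypothetical adjacency dict and illustrative `alpha`/`beta` values; the iteration only converges when `alpha` is smaller than 1 divided by the largest eigenvalue of the adjacency matrix.

```python
def katz(adj, alpha=0.1, beta=1.0, iterations=100):
    # adj: {vertex: [out-neighbors]}; every vertex must appear as a key.
    # Fixed point: x[v] = beta + alpha * sum(x[u] for each edge u -> v).
    x = {v: beta for v in adj}
    for _ in range(iterations):
        new = {v: beta for v in adj}
        for u, targets in adj.items():
            for v in targets:
                new[v] += alpha * x[u]
        x = new
    return x


scores = katz({"a": ["hub"], "b": ["hub"], "hub": []})
print(max(scores, key=scores.get))  # hub
```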
  • 32. Create Sequences of Time Ordered Subgraphs
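A dynamic graph, as defined earlier in the talk, is a sequence of subgraphs that each capture one discrete timestep. A minimal sketch of building such a sequence from timestamped edges, assuming a hypothetical `(source, target, timestamp)` edge list and a caller-supplied `key` function that maps a timestamp to its timestep; `cumulative=True` is an illustrative option that grows each snapshot from the previous one.

```python
from collections import defaultdict
from datetime import datetime


def time_ordered_subgraphs(edges, key, cumulative=False):
    # Bucket each (source, target, timestamp) edge by its timestep.
    buckets = defaultdict(list)
    for src, dst, ts in edges:
        buckets[key(ts)].append((src, dst))
    # Emit (timestep, edge list) pairs in chronological order.
    sequence, so_far = [], []
    for step in sorted(buckets):
        if cumulative:
            so_far = so_far + buckets[step]  # new list; earlier snapshots stay intact
            sequence.append((step, so_far))
        else:
            sequence.append((step, buckets[step]))
    return sequence


emails = [
    ("ann", "bob", datetime(2016, 8, 2, 10)),
    ("bob", "cat", datetime(2016, 8, 1, 9)),
    ("ann", "cat", datetime(2016, 8, 2, 15)),
]
for day, snapshot in time_ordered_subgraphs(emails, key=lambda ts: ts.date()):
    print(day, snapshot)
```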
  • 35. Layout: Edge and Vertex Positioning
    Fruchterman-Reingold; SFDP (Yifan Hu) force-directed; radial tree layout by MST; ARF spring-block
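To show what a force-directed layout such as Fruchterman-Reingold computes, here is a compact from-scratch sketch; graph-tool ships `fruchterman_reingold_layout` and `sfdp_layout`, and this is not their code. It uses the textbook forces (repulsion k^2 / d between all pairs, attraction d^2 / k along edges) with a linearly cooling cap on each step; the frame size, iteration count and seed are illustrative choices.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, iterations=50, seed=0):
    # Minimal Fruchterman-Reingold layout inside a width x width frame.
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, width), rng.uniform(0, width)] for v in nodes}
    k = width / math.sqrt(len(nodes))  # ideal distance between vertices
    t0 = width / 10                    # initial cap on movement per step
    temp = t0
    for it in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion k^2 / d pushes every pair of vertices apart.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction d^2 / k pulls the endpoints of every edge together.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each vertex along its net force, at most `temp`, clamped to the frame.
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy) or 1e-9
            step = min(d, temp)
            pos[v][0] = min(width, max(0.0, pos[v][0] + dx / d * step))
            pos[v][1] = min(width, max(0.0, pos[v][1] + dy / d * step))
        # Cool linearly so the layout settles.
        temp = t0 * (1 - (it + 1) / iterations)
    return {v: (x, y) for v, (x, y) in pos.items()}


layout = fruchterman_reingold(["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "d")])
```

The cooling schedule is what distinguishes this from a plain spring simulation: large early moves untangle the graph, and shrinking later moves let it settle.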
  • 36. Visual Properties of Vertices (Lane Harrison, "The Links that Bind Us: Network Visualizations", http://blog.visual.ly/network-visualizations)
  • 37. Visual Properties of Edges (Lane Harrison, "The Links that Bind Us: Network Visualizations", http://blog.visual.ly/network-visualizations)
  • 40. The Visual Analytics Mantra: overview first, zoom and filter, details on demand.