Network analyses are powerful methods for both visual analytics and machine learning, but they can suffer as their complexity increases. By embedding time as a structural element rather than a property, we will explore how time series and interactive analysis can be improved on graph structures. Primarily, we will look at decomposition in NLP-extracted concept graphs using NetworkX and graph-tool.
3. Algorithm Performance
More understandable implementations and native parallelism provide benefits, particularly to machine learning.
Visual Analytics
Humans can understand and interpret interconnection structures, leading to immediate insights.
4. “Graph technologies ease the modeling of your domain and improve the simplicity and speed of your queries.”
— Marko A. Rodriguez
http://bit.ly/2cthd2L
5. Construction
Given a set of [paths, vertices], is a [constraint] graph construction possible?
Existence
Does there exist a [path, vertex, set] within [constraints]?
Optimization
Given several [paths, subgraphs, vertices, sets], is one the best?
Enumeration
How many [vertices, edges] exist with [constraints], and is it possible to list them?
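Three of these question types can be illustrated on a toy graph. The following is a minimal sketch in plain Python, assuming an undirected graph stored as an adjacency dict and using vertex degree as the constraint; the names `adj` and `degree` are illustrative only and do not come from NetworkX or graph-tool.

```python
# Hypothetical toy graph: vertex -> set of neighbours (undirected, so symmetric).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a", "d"},
    "d": {"a", "c"},
}

def degree(v):
    """Number of neighbours of v in the toy graph."""
    return len(adj[v])

# Existence: is there at least one vertex with degree >= 3?
exists = any(degree(v) >= 3 for v in adj)

# Enumeration: which vertices have degree >= 2, and how many are there?
matches = sorted(v for v in adj if degree(v) >= 2)

# Optimization: among all vertices, which one is "best" under the score (degree)?
best = max(adj, key=degree)

print(exists, matches, len(matches), best)
```

The same predicates could be handed to a graph library as filters; the point here is only that each question type maps to a different kind of query over the same structure.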
13. Example of Time Filtered Traversal: Data Model
Name: Emails Sent Network
Number of nodes: 6,174
Number of edges: 343,702
Average degree: 111.339
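The average degree in this summary is consistent with the usual undirected formula, twice the edge count divided by the node count. A small sketch of that arithmetic, assuming the summary treats the graph as undirected (the function name `average_degree` is illustrative):

```python
def average_degree(num_nodes, num_edges):
    """Mean degree when every edge contributes to the degree of two vertices."""
    return 2 * num_edges / num_nodes

# Node and edge counts taken from the Emails Sent Network summary.
emails = average_degree(6174, 343702)
print(round(emails, 3))
```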
14. Example of Time Filtered Traversal
def sent_range(g, before=None, after=None):
    # Create filtering function based on date range.
    # Each bound is applied only when given, so both can be combined.
    def inner(edge):
        sent = g.ep.sent[edge]
        if before is not None and not sent < before:
            return False
        if after is not None and not sent > after:
            return False
        return True
    return inner

def degree_filter(degree=0):
    # Create filtering function based on min degree.
    def inner(vertex):
        return vertex.out_degree() > degree
    return inner
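15. Example of Time Filtered Traversal
import graph_tool.all as gt
# Assumed source of dateparse: the dateutil parser.
from dateutil.parser import parse as dateparse

print("{} vertices and {} edges".format(
    g.num_vertices(), g.num_edges()
))
# 6174 vertices and 343702 edges

# Keep only emails sent after the cutoff.
aug = sent_range(g,
    after=dateparse("Aug 1, 2016 09:00:00 EST")
)
view = gt.GraphView(g, efilt=aug)
# Then keep only vertices that still have outgoing edges.
view = gt.GraphView(view, vfilt=degree_filter())
print("{} vertices and {} edges".format(
    view.num_vertices(), view.num_edges()
))
# 853 vertices and 24813 edges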
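The two-step filtering above (edges by send time, then vertices by out-degree) can be tried without a graph library. This is a sketch only, assuming emails are stored as `(sender, recipient, sent)` tuples; the toy data and the names `kept_edges`, `senders`, and `final_edges` are hypothetical.

```python
from datetime import datetime

# Hypothetical toy data: (sender, recipient, sent timestamp).
emails = [
    ("ann", "bob", datetime(2016, 7, 30, 8, 0)),
    ("ann", "cat", datetime(2016, 8, 2, 10, 0)),
    ("bob", "cat", datetime(2016, 8, 3, 9, 30)),
    ("cat", "dan", datetime(2016, 8, 5, 14, 0)),
]

def sent_range(before=None, after=None):
    """Build an edge predicate; each bound is applied only if it is given."""
    def inner(edge):
        sent = edge[2]
        if before is not None and not sent < before:
            return False
        if after is not None and not sent > after:
            return False
        return True
    return inner

# Step 1: keep only edges sent after the cutoff.
aug = sent_range(after=datetime(2016, 8, 1, 9, 0))
kept_edges = [e for e in emails if aug(e)]

# Step 2: keep only vertices with at least one outgoing edge in the filtered view.
senders = {src for src, _, _ in kept_edges}

# Edges that touch a removed vertex are dropped along with it.
final_edges = [e for e in kept_edges if e[0] in senders and e[1] in senders]

print(len(kept_edges), sorted(senders), len(final_edges))
```

Applying the edge filter first matters: the out-degree test in step 2 is evaluated against the time-restricted edges, not the full history.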
17. Time Structures
Perform static analysis on dynamic components with time as a structure.
Dynamic Graphs
Multiple subgraphs, each representing the graph state at a discrete timestep.
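The contrast between the two approaches can be sketched on a handful of timestamped edges. This is an illustrative example in plain Python, assuming ISO week as the discrete timestep; the data and the names `snapshots` and `time_graph` are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical timestamped edges: (source, target, day observed).
edges = [
    ("a", "b", date(2016, 8, 1)),
    ("b", "c", date(2016, 8, 3)),
    ("a", "c", date(2016, 8, 9)),
]

# Dynamic graph: one edge list (subgraph) per discrete timestep.
snapshots = defaultdict(list)
for src, dst, day in edges:
    week = day.isocalendar()[1]
    snapshots[week].append((src, dst))

# Time structure: a single static graph in which each week is itself a vertex,
# linked to every entity that was active during that week.
time_graph = set()
for src, dst, day in edges:
    week_vertex = "week-%d" % day.isocalendar()[1]
    time_graph.add((week_vertex, src))
    time_graph.add((week_vertex, dst))

print(dict(snapshots))
print(sorted(time_graph))
```

With snapshots, each timestep is analyzed separately; with the time structure, ordinary static algorithms can run once over a graph that already encodes time.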
22. Natural Language Graph Analysis: Data Modeling
Name: Baleen Keyphrase Graph
Number of nodes: 2,682,624
Number of edges: 46,958,599
Average degree: 35.0095
Name: Sampled Keyphrase Graph
Number of nodes: 139,227
Number of edges: 257,316
Average degree: 3.6964
23. Natural Language Graph Analysis: Data Wrangling
def degree_filter(degree=0):
    def inner(vertex):
        return vertex.out_degree() > degree
    return inner

g = gt.GraphView(g, vfilt=degree_filter(3))
Name: High Degree Phrase Graph
Number of nodes: 8,520
Number of edges: 112,320
Average degree: 26.366
24. Basic Keyphrase Graph Information
Vertex Type Analysis
Primarily keyphrases and documents.
Degree Distribution
Power-law distribution of degree.
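A degree distribution counts how many vertices have each degree value; under a power law, most vertices have low degree and a few hubs have very high degree. A minimal sketch of the counting step, assuming an undirected edge list (the toy edges are hypothetical):

```python
from collections import Counter

# Hypothetical undirected edge list: one hub ("graph") and several low-degree phrases.
edges = [
    ("graph", "node"), ("graph", "edge"), ("graph", "path"),
    ("graph", "time"), ("node", "edge"),
]

# Degree of each vertex: the number of edge endpoints it appears in.
degree = Counter()
for u, v in edges:
    degree[u] += 1
    degree[v] += 1

# Degree distribution: how many vertices have each degree value.
distribution = Counter(degree.values())

for k in sorted(distribution):
    print(k, distribution[k])
```

On a real graph, plotting these counts on log-log axes is a common first check for heavy-tailed behaviour.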
25. Natural Language Graph Analysis: Data Wrangling
import random

def ego_filter(g, ego, hops=2):
    # Compute distances from the ego once, rather than once per vertex.
    dist = gt.shortest_distance(g, source=ego)
    def inner(v):
        return dist[v] <= hops
    return inner

# Get a random document
v = random.choice([
    v for v in g.vertices()
    if g.vp.type[v] == 'document'
])
ego = gt.GraphView(
    g, vfilt=ego_filter(g, v, 1)
)
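The idea behind the ego filter, keeping every vertex within a fixed number of hops of a chosen vertex, can be sketched with a bounded breadth-first search. This version is library-free and assumes an adjacency dict; `ego_network` and the toy graph are illustrative names, not part of graph-tool.

```python
from collections import deque

def ego_network(adj, ego, hops=2):
    """Return the set of vertices within `hops` edges of `ego`, ego included."""
    dist = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if dist[node] == hops:
            continue  # hop limit reached: do not expand this vertex
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return set(dist)

# Hypothetical document/phrase graph.
adj = {
    "doc1": ["graph", "time"],
    "graph": ["doc1", "doc2"],
    "time": ["doc1"],
    "doc2": ["graph", "nlp"],
    "nlp": ["doc2"],
}

print(sorted(ego_network(adj, "doc1", hops=1)))
print(sorted(ego_network(adj, "doc1", hops=2)))
```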
28. Extract Week of the Year as Time Structure
# Construct Time Structures to Keyphrase
h = gt.Graph(directed=False)
h.gp.name = h.new_graph_property('string')
h.gp.name = "Phrases by Week"

# Add vertex properties
h.vp.label = h.new_vertex_property('string')
h.vp.vtype = h.new_vertex_property('string')

# Assumed lookups so each week and each phrase gets a single vertex in h
weeks = {}   # week number -> vertex in h
vidmap = {}  # phrase vertex index in g -> vertex in h

# Create graph from the keyphrase graph
for vertex in g.vertices():
    if g.vp.type[vertex] == 'document':
        dt = g.vp.pubdate[vertex]
        weekno = dt.isocalendar()[1]
        if weekno not in weeks:
            weeks[weekno] = h.add_vertex()
            h.vp.label[weeks[weekno]] = "Week %d" % weekno
            h.vp.vtype[weeks[weekno]] = 'week'
        week = weeks[weekno]

        for neighbor in vertex.out_neighbours():
            if g.vp.type[neighbor] == 'phrase':
                idx = int(neighbor)
                if idx not in vidmap:
                    vidmap[idx] = h.add_vertex()
                    h.vp.vtype[vidmap[idx]] = 'phrase'
                h.add_edge(week, vidmap[idx])
29. PageRank Centrality
A variant of Eigenvector centrality that has a scaling factor and prioritizes incoming links.
Eigenvector Centrality
A measure of relative influence where closeness to important nodes matters as much as other metrics.
Degree Centrality
A vertex is more important the more connections it has. E.g. “celebrity”.
Betweenness Centrality
How many shortest paths pass through the given vertex. E.g. how often does information flow through it?
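Two of these measures are short enough to sketch directly. The following is an illustrative, library-free version of degree centrality and of PageRank by power iteration; it assumes adjacency dicts, a damping (scaling) factor of 0.85, and even redistribution of rank from vertices with no outgoing links. The function names and toy graphs are hypothetical and are not the NetworkX or graph-tool implementations.

```python
def degree_centrality(adj):
    """Fraction of the other vertices each vertex is directly connected to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def pagerank(out_links, damping=0.85, iterations=50):
    """Power iteration: rank flows along links, so incoming links raise a score."""
    nodes = list(out_links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new = {v: (1.0 - damping) / n for v in nodes}
        for v in nodes:
            # A vertex without outgoing links spreads its rank over all vertices.
            targets = out_links[v] or nodes
            share = damping * rank[v] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank

# Hypothetical star graph (undirected) and a small directed graph pointing at "a".
undirected = {"a": ["b", "c", "d"], "b": ["a"], "c": ["a"], "d": ["a"]}
directed = {"a": [], "b": ["a"], "c": ["a"], "d": ["a", "b"]}

dc = degree_centrality(undirected)
pr = pagerank(directed)
print(max(dc, key=dc.get), max(pr, key=pr.get))
```

In both toy graphs the hub "a" comes out on top, but for different reasons: it has the most connections in the first, and the most incoming rank in the second.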
30. What are the central weeks and phrases?
[Figure panels: Betweenness Centrality, Katz Centrality]