Published in:
Data & Analytics


- 1. Dynamics in Graph Analysis: Adding Time as Structure for Visual and Statistical Insight. Benjamin Bengfort (@bbengfort), District Data Labs
- 2. Are graphs effective for analytics? Or why use graphs at all?
- 3. Algorithm Performance: More understandable implementations and native parallelism provide benefits, particularly to machine learning. Visual Analytics: Humans can understand and interpret interconnection structures, leading to immediate insights.
- 4. “Graph technologies ease the modeling of your domain and improve the simplicity and speed of your queries.” — Marko A. Rodriguez http://bit.ly/2cthd2L
- 5. Construction: Given a set of [paths, vertices], is a [constraint] graph construction possible? Existence: Does there exist a [path, vertex, set] within [constraints]? Optimization: Given several [paths, subgraphs, vertices, sets], is one the best? Enumeration: How many [vertices, edges] exist with [constraints], and is it possible to list them?
- 6. Traversals
- 7. Property Graphs
- 8. How do you model time?
- 9. Relational Database
- 10. Time Properties
- 11. Time Modifies Traversal
- 12. Example of Time Filtered Traversal: Data Model. Name: Emails Sent Network. Number of nodes: 6,174. Number of edges: 343,702. Average degree: 111.339.
- 13. Example of Time Filtered Traversal

      def sent_range(g, before=None, after=None):
          # Create an edge filter that keeps emails sent inside the date range.
          def inner(edge):
              sent = g.ep.sent[edge]
              if before is not None and sent >= before:
                  return False
              if after is not None and sent <= after:
                  return False
              return True
          return inner

      def degree_filter(degree=0):
          # Create a vertex filter that keeps vertices above a minimum out-degree.
          def inner(vertex):
              return vertex.out_degree() > degree
          return inner
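- 14. Example of Time Filtered Traversal

      import graph_tool.all as gt  # assumed alias for graph-tool
      from dateutil.parser import parse as dateparse  # assumed source of dateparse

      print("{} vertices and {} edges".format(
          g.num_vertices(), g.num_edges()
      ))
      # 6174 vertices and 343702 edges

      # Keep only emails sent after the cutoff, then drop isolated vertices.
      aug = sent_range(g, after=dateparse("Aug 1, 2016 09:00:00 EST"))
      view = gt.GraphView(g, efilt=aug)
      view = gt.GraphView(view, vfilt=degree_filter())

      print("{} vertices and {} edges".format(
          view.num_vertices(), view.num_edges()
      ))
      # 853 vertices and 24813 edges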
- 15. What makes a graph dynamic?
- 16. Time Structures: Perform static analysis on dynamic components with time as a structure. Dynamic Graphs: Multiple subgraphs representing the graph state at a discrete timestep.
- 17. Keyphrases over Time
- 18. Natural Language Graph Analysis: Data Ingestion
- 19. Natural Language Graph Analysis: Data Modeling. Name: Baleen Keyphrase Graph. Number of nodes: 2,682,624. Number of edges: 46,958,599. Average degree: 35.0095. Name: Sampled Keyphrase Graph. Number of nodes: 139,227. Number of edges: 257,316. Average degree: 3.6964.
- 20. Natural Language Graph Analysis: Data Wrangling

      def degree_filter(degree=0):
          # Keep vertices whose out-degree exceeds the given minimum.
          def inner(vertex):
              return vertex.out_degree() > degree
          return inner

      g = gt.GraphView(g, vfilt=degree_filter(3))

      Name: High Degree Phrase Graph. Number of nodes: 8,520. Number of edges: 112,320. Average degree: 26.366.
- 21. Basic Keyphrase Graph Information. Vertex Type Analysis: Primarily keyphrases and documents. Degree Distribution: Power-law distribution of degree.
- 22. Natural Language Graph Analysis: Data Wrangling

      import random

      def ego_filter(g, ego, hops=2):
          # Keep vertices within the given number of hops of the ego vertex.
          def inner(v):
              dist = gt.shortest_distance(g, ego, v)
              return dist <= hops
          return inner

      # Get a random document
      v = random.choice([
          v for v in g.vertices()
          if g.vp.type[v] == 'document'
      ])

      ego = gt.GraphView(g, vfilt=ego_filter(g, v, 1))
- 23. The Centrality of Time
- 24. Extract Week of the Year as Time Structure

      # Construct time structures linked to keyphrases.
      h = gt.Graph(directed=False)
      h.gp.name = h.new_graph_property('string')
      h.gp.name = "Phrases by Week"

      # Add vertex properties.
      h.vp.label = h.new_vertex_property('string')
      h.vp.vtype = h.new_vertex_property('string')

      # Reuse one vertex per week number and one per source phrase.
      weeks = {}
      vidmap = {}

      # Create the graph from the keyphrase graph.
      for vertex in g.vertices():
          if g.vp.type[vertex] != 'document':
              continue
          weekno = g.vp.pubdate[vertex].isocalendar()[1]
          if weekno not in weeks:
              weeks[weekno] = h.add_vertex()
              h.vp.label[weeks[weekno]] = "Week %d" % weekno
              h.vp.vtype[weeks[weekno]] = 'week'
          for neighbor in vertex.out_neighbours():
              if g.vp.type[neighbor] == 'phrase':
                  key = int(neighbor)
                  if key not in vidmap:
                      vidmap[key] = h.add_vertex()
                      h.vp.vtype[vidmap[key]] = 'phrase'
                  h.add_edge(weeks[weekno], vidmap[key])
- 25. PageRank Centrality: A variant of Eigenvector centrality that has a scaling factor and prioritizes incoming links. Eigenvector Centrality: A measure of relative influence where closeness to important nodes matters as much as other metrics. Degree Centrality: A vertex is more important the more connections it has, e.g. a “celebrity”. Betweenness Centrality: How many shortest paths pass through the given vertex, e.g. how often does information flow through it?
- 26. What are the central weeks and phrases? Betweenness Centrality; Katz Centrality
- 27. Keyphrase Dynamics
- 28. Create Sequences of Time Ordered Subgraphs
- 29. Animating Dynamics
- 30. Network Visualization
- 31. Layout: Edge and Vertex Positioning. Fruchterman-Reingold; SFDP (Yifan-Hu) Force Directed; Radial Tree Layout by MST; ARF Spring Block
- 32. Visual Properties of Vertices. Lane Harrison, The Links that Bind Us: Network Visualizations, http://blog.visual.ly/network-visualizations
- 33. Visual Properties of Edges. Lane Harrison, The Links that Bind Us: Network Visualizations, http://blog.visual.ly/network-visualizations
- 34. Visual Analysis
- 35. The Visual Analytics Mantra: Overview First; Zoom and Filter; Details on Demand
- 36. Questions?
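
The dynamic-graph idea from slides 16 and 28 (a sequence of subgraphs, one per discrete timestep) can be illustrated without graph-tool. What follows is only a minimal sketch, assuming edges arrive as (source, target, timestamp) tuples; the function name `weekly_snapshots` and the sample emails are hypothetical and not taken from the talk. It buckets edges by ISO week, the same week-of-year key used on slide 24.

```python
from collections import defaultdict
from datetime import datetime


def weekly_snapshots(edges):
    """Group timestamped edges into per-week edge lists, ordered by time.

    `edges` is an iterable of (source, target, timestamp) tuples. This
    tuple layout is an assumption for the sketch, not the talk's data model.
    """
    buckets = defaultdict(list)
    for source, target, sent in edges:
        iso = sent.isocalendar()
        # Key on (ISO year, ISO week) so weeks from different years stay apart.
        buckets[(iso[0], iso[1])].append((source, target))
    # Sorting the keys yields the time-ordered sequence of subgraphs.
    return sorted(buckets.items())


# Hypothetical sample data standing in for an email network.
emails = [
    ("alice", "bob", datetime(2016, 8, 1, 9, 30)),
    ("bob", "carol", datetime(2016, 8, 3, 14, 0)),
    ("carol", "alice", datetime(2016, 8, 9, 8, 15)),
]

snapshots = weekly_snapshots(emails)
for (year, week), week_edges in snapshots:
    print("{} week {}: {} edges".format(year, week, len(week_edges)))
# 2016 week 31: 2 edges
# 2016 week 32: 1 edges
```

Each snapshot is a plain edge list, so it could be loaded into any graph library and analyzed or drawn frame by frame.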

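Two of the centrality measures defined on slide 25 can be worked through on a toy graph. This is a rough sketch in plain Python rather than the graph-tool calls the talk presumably used; the function names, the damping value of 0.85, and the star-shaped example graph are all assumptions made for illustration.

```python
def degree_centrality(adjacency):
    """Fraction of the other vertices each vertex is connected to."""
    n = len(adjacency)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adjacency.items()}


def pagerank(adjacency, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict mapping vertex -> out-neighbors."""
    n = len(adjacency)
    rank = {v: 1.0 / n for v in adjacency}
    for _ in range(iterations):
        # Rank held by vertices with no out-links is spread evenly.
        dangling = sum(rank[v] for v, nbrs in adjacency.items() if not nbrs)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in adjacency}
        for v, nbrs in adjacency.items():
            for u in nbrs:
                # Each vertex passes an equal share of its rank to each out-neighbor.
                new_rank[u] += damping * rank[v] / len(nbrs)
        rank = new_rank
    return rank


# Hypothetical undirected star stored symmetrically: "hub" touches every leaf.
star = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
}

degrees = degree_centrality(star)
ranks = pagerank(star)
print(max(degrees, key=degrees.get), max(ranks, key=ranks.get))
```

On this graph both measures single out the hub, which matches the "celebrity" intuition for degree centrality; on graphs with directed, uneven link structure the two rankings can diverge.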