Graph Analytics - From the Whiteboard to Your Toolbox - Sam Lerma

This is a novice-track talk, so all concepts and examples are kept simple
1. Basic graph theory concepts and definitions
2. A few real-world scenarios framed as graph data
3. Working with graphs in Python
The overall goal of this talk is to spark your interest in graph analytics and
show you what’s out there, as a jumping-off point for you to go deeper

Graph: “A structure amounting to a set of objects in which some
pairs of the objects are in some sense ‘related’. The objects
correspond to mathematical abstractions called vertices (also called
nodes or points) and each of the related pairs of vertices is called an
edge (also called an arc or line)” – Richard Trudeau, Introduction to
Graph Theory (1st edition, 1993)
Graph Analytics: “Analysis of data structured as a graph
(sometimes also part of network analysis or link analysis depending
on scope and context)” – Me, talking to a stress ball as I made these
slides

• We see two vertices joined by
a single edge
• Vertex 1 is adjacent to vertex 2
• The neighborhood of vertex 1
is all adjacent vertices (vertex
2 in this case)
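These terms can be sketched in plain Python. This is only an illustration: it assumes the graph is stored as a dictionary mapping each vertex to the set of its neighbors, and the names `graph`, `is_adjacent`, and `neighborhood` are made up for this example.

```python
# Undirected graph with a single edge between vertices 1 and 2,
# stored as vertex -> set of adjacent vertices (illustrative layout).
graph = {1: {2}, 2: {1}}

def is_adjacent(g, u, v):
    """Return True if u and v are joined by an edge."""
    return v in g[u]

def neighborhood(g, v):
    """Return the set of all vertices adjacent to v."""
    return set(g[v])

print(is_adjacent(graph, 1, 2))  # True
print(neighborhood(graph, 1))    # {2}
```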

• We see that there is a loop on
vertex a
• Vertices a and b have multiple
edges between them
• Vertex c has a degree of 3
• There exists a path from vertex a
to vertex e
• Vertices f, g, and h form a 3-cycle
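As a rough sketch, the same observations can be computed from an edge list in plain Python. The edge list below is hypothetical: it was chosen to match the bullets (a loop on a, a double edge a-b, degree 3 at c, an f-g-h cycle), but the slide's exact graph cannot be recovered from the text.

```python
from collections import Counter, deque

# Hypothetical edge list consistent with the bullets (not the slide's exact graph)
edges = [("a", "a"), ("a", "b"), ("a", "b"), ("a", "d"), ("b", "c"),
         ("c", "d"), ("c", "e"), ("d", "e"), ("e", "f"), ("d", "h"),
         ("f", "g"), ("g", "h"), ("h", "f")]

def degrees(edge_list):
    """Count edge endpoints per vertex; a loop adds 2 to its vertex."""
    deg = Counter()
    for u, v in edge_list:
        deg[u] += 1
        deg[v] += 1
    return deg

def has_path(edge_list, start, goal):
    """Breadth-first search over an undirected edge list."""
    neighbors = {}
    for u, v in edge_list:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    seen, queue = {start}, deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            return True
        for nxt in neighbors.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Loops join a vertex to itself; multiple edges repeat the same vertex pair
loops = [e for e in edges if e[0] == e[1]]
pair_counts = Counter(frozenset(e) for e in edges)
multi = [pair for pair, n in pair_counts.items() if n > 1]

print(degrees(edges)["c"])         # 3
print(has_path(edges, "a", "e"))   # True
```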

• This graph has no cut vertex or cut
edge (a single vertex or edge whose
removal would split the graph into
more disconnected pieces)
• We can separate this graph into two
disconnected sets:
1) Vertex Set 1 = {a, b, c, d, e}
2) Vertex Set 2 = {f, g, h}
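A brute-force sketch of these checks in plain Python follows. It assumes the two vertex sets are joined by two hypothetical edges (e-f and d-h), so that no single removal disconnects the graph. The slide text does not say which edges actually join them, so treat the edge list as an illustration only.

```python
def components(vertices, edge_list):
    """Return the connected components as a list of vertex sets."""
    neighbors = {v: set() for v in vertices}
    for u, v in edge_list:
        if u in neighbors and v in neighbors:
            neighbors[u].add(v)
            neighbors[v].add(u)
    seen, result = set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            current = stack.pop()
            if current not in component:
                component.add(current)
                stack.extend(neighbors[current] - component)
        seen |= component
        result.append(component)
    return result

def cut_vertices(vertices, edge_list):
    """Vertices whose removal increases the number of components."""
    base = len(components(vertices, edge_list))
    return [v for v in sorted(vertices)
            if len(components(vertices - {v}, edge_list)) > base]

def cut_edges(vertices, edge_list):
    """Edges whose removal increases the number of components."""
    base = len(components(vertices, edge_list))
    return [edge_list[i] for i in range(len(edge_list))
            if len(components(vertices, edge_list[:i] + edge_list[i + 1:])) > base]

vertices = set("abcdefgh")
connectors = [("e", "f"), ("d", "h")]   # hypothetical joining edges
edges = [("a", "a"), ("a", "b"), ("a", "b"), ("a", "d"), ("b", "c"),
         ("c", "d"), ("c", "e"), ("d", "e"),
         ("f", "g"), ("g", "h"), ("h", "f")] + connectors

print(cut_vertices(vertices, edges))   # []
print(cut_edges(vertices, edges))      # []

# Removing both connecting edges separates the graph into the two sets
separated = [e for e in edges if e not in connectors]
parts = components(vertices, separated)
```

Removing one candidate at a time and recounting components is slow on large graphs, but it mirrors the definition directly.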

• Imagine the same vertex
labels running along the top
and down the left-hand side
of the matrix
• A 1 in a given cell tells us
that its row vertex and column
vertex are adjacent
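To make the idea concrete, here is a small sketch that builds an adjacency matrix from an edge list in plain Python. The three-vertex graph and the function name `adjacency_matrix` are illustrative and not taken from the slide.

```python
def adjacency_matrix(vertices, edge_list):
    """Build a 0/1 matrix whose rows and columns follow the order of `vertices`."""
    index = {v: i for i, v in enumerate(vertices)}
    matrix = [[0] * len(vertices) for _ in vertices]
    for u, v in edge_list:
        # Undirected edge: mark both (u, v) and (v, u), so the matrix is symmetric
        matrix[index[u]][index[v]] = 1
        matrix[index[v]][index[u]] = 1
    return matrix

vertices = ["a", "b", "c"]
edges = [("a", "b"), ("b", "c")]
matrix = adjacency_matrix(vertices, edges)

for label, row in zip(vertices, matrix):
    print(label, row)
# a [0, 1, 0]
# b [1, 0, 1]
# c [0, 1, 0]
```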

• In this graph two vertices are
joined by a single directed
edge
• There is a dipath from vertex 1
to vertex 2 but not from vertex
2 to vertex 1

• Every vertex has ‘played’ every
other vertex
• We can see that there is no clear
winner (every vertex has
indegree and outdegree of 2)
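A quick way to check a tournament like this is to count wins (outdegree) and losses (indegree) per vertex. The results below are hypothetical: they assume five players where player i beats players i+1 and i+2 (mod 5), which is one arrangement that gives every vertex indegree and outdegree 2.

```python
from collections import Counter

players = range(5)

# Hypothetical results as (winner, loser) pairs, i.e. directed edges
matches = [(i, (i + k) % 5) for i in players for k in (1, 2)]

outdegree = Counter(winner for winner, _ in matches)
indegree = Counter(loser for _, loser in matches)

# Every pair of players meets exactly once: 5 * 4 / 2 = 10 distinct pairs
distinct_pairs = {frozenset(m) for m in matches}

for p in players:
    print(p, "wins:", outdegree[p], "losses:", indegree[p])
```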

• Vertices from Set 1 = {a, b, c, d} are
only adjacent to vertices from Set 2
= {e, f, g, h}
• This can be extended to tripartite
graphs (3 sets) or as many sets as we
like (n-partite graphs)
• Can we pair vertices from each set
together?
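One way to test whether a graph is bipartite is to try to split its vertices into two sets with a breadth-first search. The sketch below uses a hypothetical edge list between {a, b, c, d} and {e, f, g, h}, since the slide's exact edges are not recoverable from the text.

```python
from collections import deque

# Hypothetical edges; each joins a vertex in {a, b, c, d} to one in {e, f, g, h}
edges = [("a", "e"), ("a", "f"), ("b", "e"), ("b", "g"),
         ("c", "f"), ("c", "h"), ("d", "g"), ("d", "h")]

def build_neighbors(edge_list):
    """Map each vertex to the set of its neighbors."""
    neighbors = {}
    for u, v in edge_list:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors

def two_sets(neighbors):
    """Return (set_1, set_2) if the graph is bipartite, else None."""
    side = {}
    for start in neighbors:
        if start in side:
            continue
        side[start] = 0
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in neighbors[u]:
                if v not in side:
                    side[v] = 1 - side[u]   # put v on the opposite side
                    queue.append(v)
                elif side[v] == side[u]:
                    return None             # an edge inside one set
    return ({v for v in side if side[v] == 0},
            {v for v in side if side[v] == 1})

print(two_sets(build_neighbors(edges)))
```

A triangle, for instance, makes `two_sets` return `None`, because an odd cycle can never be split this way.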

We can pair every vertex
from one set to a vertex
from the other using only
existing edges
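A pairing like this can be searched for with the augmenting-path method: tentatively assign each vertex a partner, and re-route earlier assignments when they block a later vertex. This is a sketch over a hypothetical bipartite graph. The `neighbors` table is made up, and `maximum_bipartite_matching` is an illustrative name rather than a library function.

```python
# Hypothetical bipartite graph: left vertex -> right vertices it is adjacent to
neighbors = {"a": ["e", "f"], "b": ["e", "g"],
             "c": ["f", "h"], "d": ["g", "h"]}

def maximum_bipartite_matching(left, neighbors):
    """Return a dict left vertex -> right vertex using augmenting paths."""
    match_of_right = {}   # right vertex -> left vertex currently paired with it

    def try_assign(u, visited):
        for v in neighbors[u]:
            if v in visited:
                continue
            visited.add(v)
            # Take v if it is free, or if its current partner can move elsewhere
            if v not in match_of_right or try_assign(match_of_right[v], visited):
                match_of_right[v] = u
                return True
        return False

    for u in left:
        try_assign(u, set())
    return {u: v for v, u in match_of_right.items()}

matching = maximum_bipartite_matching(["a", "b", "c", "d"], neighbors)
print(matching)
```

If the result pairs every vertex, the matching is perfect. Otherwise some vertices remain unmatched no matter how the pairs are chosen.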

• We can assign weights to edges
of a graph
• As we follow a path through the
graph, these weights accumulate
• For example, the path a -> b -> c
has an associated weight of
0.5 + 0.4 = 0.9
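The accumulation can be written out in a few lines of plain Python. The `weights` dictionary and the `path_weight` helper are illustrative names, and only the two weights mentioned above are used.

```python
# Edge weights from the example: a -> b is 0.5, b -> c is 0.4
weights = {("a", "b"): 0.5, ("b", "c"): 0.4}

def path_weight(path, weights):
    """Sum the weights of consecutive edges along the path."""
    return sum(weights[(u, v)] for u, v in zip(path, path[1:]))

total = path_weight(["a", "b", "c"], weights)
print(round(total, 2))  # 0.9
```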

• We can assign colors to vertices
• The graph we see here has a
proper coloring (no two vertices
of the same color are adjacent)
• We can also color edges!
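Checking whether a coloring is proper only requires looking at each edge once. The sketch below uses a hypothetical triangle and made-up color assignments, not the graph from the slide.

```python
def is_proper_coloring(edge_list, color):
    """True if no edge joins two vertices of the same color."""
    return all(color[u] != color[v] for u, v in edge_list)

# Hypothetical triangle: every vertex is adjacent to the other two
edges = [("a", "b"), ("b", "c"), ("c", "a")]

good = {"a": "red", "b": "green", "c": "blue"}
bad = {"a": "red", "b": "green", "c": "red"}   # a and c clash

print(is_proper_coloring(edges, good))  # True
print(is_proper_coloring(edges, bad))   # False
```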

• Are we focused more on objects or the relationships/interactions
between them?
• Are we looking at transition states?
• Is orientation important?
If you can imagine a graph to represent it, it’s probably worth giving it a
shot, if only for your own learning and exploration!

• If the lines represent
connections, what can we say
about the people highlighted
in red?
• What kinds of questions might
a graph be able to answer?

• e and d have the highest
degree
• What might the c-d-e cycle
tell us?
• What can we say about cut
vertices?

If we have page view
data with timestamps,
how might we
represent it as a
graph?

• What might loops or multiple edges
between vertices represent?
• What types of data might we want to
use as values on the edges?
• What might comparing indegrees and
outdegrees on different vertices
represent?
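One possible answer, sketched in plain Python: treat pages as vertices and count each user's consecutive page views as weighted directed edges. The log below is invented for illustration, and the field layout (user, timestamp, page) is an assumption about what such data might look like.

```python
from collections import Counter, defaultdict

# Hypothetical page-view log: (user, timestamp in seconds, page)
views = [
    ("u1", 0, "home"), ("u1", 5, "search"), ("u1", 9, "product"),
    ("u2", 2, "home"), ("u2", 4, "home"), ("u2", 8, "search"),
    ("u2", 20, "product"), ("u2", 31, "checkout"),
]

def transition_counts(views):
    """Count page -> next-page transitions per user, ordered by timestamp."""
    by_user = defaultdict(list)
    for user, timestamp, page in views:
        by_user[user].append((timestamp, page))
    counts = Counter()
    for visits in by_user.values():
        visits.sort()
        pages = [page for _, page in visits]
        counts.update(zip(pages, pages[1:]))
    return counts

edges = transition_counts(views)   # (source, target) -> number of transitions

indegree, outdegree = Counter(), Counter()
for (source, target), n in edges.items():
    outdegree[source] += n
    indegree[target] += n

print(edges[("home", "home")])                       # 1 (a loop, e.g. a page reload)
print(outdegree["checkout"], indegree["checkout"])   # 0 1
```

In this toy log the loop on "home" is a repeat view, the edge counts act as weights, and a page with arrivals but no departures (here "checkout") is where sessions end.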

If we have to regularly pick up a
load at the train station, make
deliveries to every factory, and
then return to the garage, how can
a graph help us find an optimal
route?

• We can assign a weight to each edge to
represent distance, travel time, gas cost,
etc.
• The route with the lowest total weight
is the shortest, cheapest, or fastest one,
depending on what the weights measure
• Note that edge weights are only
displayed for f-e and f-a
If the following people want to
attend the following talks (a-h),
what’s the minimum number of
sessions we need to satisfy
everyone?

• We can use the talks as
vertices and add an edge
between two talks whenever
the same person wants to
attend both
• The minimum number of
colors needed for a proper
coloring is the minimum
number of sessions we need
to satisfy everyone
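The construction can be sketched in plain Python: build the conflict graph from wish lists, then color it greedily. The attendee names and wish lists are invented for illustration. Greedy coloring gives an upper bound on the number of sessions, not a guaranteed minimum.

```python
from itertools import combinations

# Hypothetical wish lists: person -> talks they want to attend
wishes = {"Ann": ["a", "b", "c"], "Bo": ["c", "d"], "Cy": ["d", "e"],
          "Di": ["a", "e"], "Ed": ["f", "g"], "Flo": ["g", "h"]}

def conflict_graph(wishes):
    """Talks are vertices; two talks conflict if one person wants both."""
    conflicts = {talk: set() for talks in wishes.values() for talk in talks}
    for talks in wishes.values():
        for t1, t2 in combinations(talks, 2):
            conflicts[t1].add(t2)
            conflicts[t2].add(t1)
    return conflicts

def greedy_sessions(conflicts):
    """Give each talk the lowest session number unused by its conflicts."""
    session = {}
    # Handle the most-conflicted talks first (ties broken alphabetically)
    for talk in sorted(conflicts, key=lambda t: (-len(conflicts[t]), t)):
        taken = {session[n] for n in conflicts[talk] if n in session}
        session[talk] = next(s for s in range(len(conflicts)) if s not in taken)
    return session

conflicts = conflict_graph(wishes)
sessions = greedy_sessions(conflicts)
print(len(set(sessions.values())))  # 3
```

Here three sessions is also the true minimum, because talks a, b, and c all conflict with one another.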

https://github.com/igraph/python-igraph https://github.com/networkx

• GraphML (XML-based)
• GML (ASCII-based)
• NetworkX has built-in functions to work with a Pandas DataFrame or a
NumPy array/matrix
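As a small sketch of the two text formats, the snippet below serializes a graph to GML and GraphML strings with NetworkX and parses them back. The three-vertex graph is made up for the example. For files on disk, NetworkX also provides `nx.write_gml`/`nx.read_gml` and `nx.write_graphml`/`nx.read_graphml`.

```python
import networkx as nx

# Small weighted graph to round-trip (illustrative data)
G = nx.Graph()
G.add_edge("a", "b", weight=0.5)
G.add_edge("b", "c", weight=0.2)

# Serialize to text in each format
gml_text = "\n".join(nx.generate_gml(G))
graphml_text = "\n".join(nx.generate_graphml(G))

# Parse the text back into graphs
from_gml = nx.parse_gml(gml_text)
from_graphml = nx.parse_graphml(graphml_text)

print(gml_text)
print(from_graphml["a"]["b"]["weight"])  # 0.5
```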

import networkx as nx
import matplotlib.pyplot as plt

# Build an undirected graph on vertices 1 through 5
G = nx.Graph()
vertices = []
for x in range(1, 6):
    vertices.append(x)
G.add_nodes_from(vertices)
G.add_edges_from([(1, 2), (2, 3), (5, 4),
                  (4, 2), (1, 3), (5, 1), (5, 2), (3, 4)])

# Compute a layout once, then draw nodes, edges, and labels as separate layers
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos, node_size=20)
nx.draw_networkx_edges(G, pos, width=5)
nx.draw_networkx_labels(G, pos, font_size=14)
# The layers above already make up the full drawing, so just hide the axes
plt.axis('off')
plt.show()

# Build a small triangle with a weight on every edge
G = nx.Graph()
G.add_nodes_from(['a', 'b', 'c'])
G.add_edge('a', 'b', weight=0.5)
G.add_edge('b', 'c', weight=0.2)
G.add_edge('c', 'a', weight=0.7)

pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos, node_size=500)
nx.draw_networkx_edges(G, pos, width=6)
nx.draw_networkx_labels(G, pos, font_size=14)
# With no edge_labels argument, each edge is labeled with its attribute dict
nx.draw_networkx_edge_labels(G, pos, font_size=14)
plt.axis('off')
plt.show()

>>> list(G.nodes())
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11,
12, 13, 14, 15, 16, 17, 18, 19, 20]
>>> nx.shortest_path(G, 1, 18)
[1, 3, 18]
>>> dict(G.degree())
{1: 4, 2: 3, 3: 4, 4: 4, 5: 4, 6: 3,
7: 3, 8: 3, 9: 4, 10: 3, 11: 2,
12: 2, 13: 2, 14: 4, 15: 3, 16: 3,
17: 2, 18: 3, 19: 3, 20: 3}

>>> nx.greedy_color(G)
{'d': 0, 'a': 0, 'e': 1, 'b': 1,
'c': 1, 'f': 2, 'h': 1, 'g': 0}
>>> temp = nx.greedy_color(G)
>>> len(set(temp.values()))
3

# Directed graph: each pair (u, v) is an edge pointing from u to v
G = nx.DiGraph([(1, 2), (1, 3), (4, 1),
                (1, 5), (2, 3), (2, 4), (2, 5), (3, 4),
                (3, 5), (4, 5)])
pos = nx.circular_layout(G)
nx.draw_networkx_nodes(G, pos, node_size=200)
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_labels(G, pos, font_size=14)
>>> nx.has_path(G, 1, 5)
True
>>> nx.has_path(G, 5, 1)
False
>>> nx.shortest_path(G, 1, 4)
[1, 2, 4]

>>> nx.maximal_matching(G)
{(1, 4), (5, 2), (6, 3)}

• There’s a NetworkX tutorial tomorrow!
• In-browser Graphviz: webgraphviz.com
• Free graph theory textbook: An Introduction to Combinatorics and
Graph Theory, David Guichard
• Open problems in graph theory: openproblemgarden.org
• Graph databases
• Association for Computational Linguistics (ACL) 2010 Workshop on
Graph-based Methods for Natural Language Processing
• Free papers: researchgate.net
