Confidential 1
Can you connect the
dots, like the graph
below, without lifting up
your pencil?
Can you connect each
dot on the top to each
dot on the bottom, like
above, without crossing
lines?
There are pens at the sign-in table if you want to try these out
Silicon Valley | Silicon Harbor
Can You Trust The Internet?
An intro to graph invariants, computational
complexity, and graph algorithms
(with a little bit of cryptography)
Denise K. Gosnell, Ph.D.
Data Scientist, PokitDok
Confidential 3
Graph Theory?
G = (V,E)
Confidential 4
The Plan
1.  Graph Properties and Reddit
2.  Computational Complexity
P: Bipartite Graph Matching
NP: Graph Coloring
3. A little bit of RSA
Confidential 5
Why should you care?
1.  Everything is graph theoretic in nature.
2.  Much easier to code.
3.  or… you just want to be here to look cool.
Confidential 6
The Seven Bridges of Königsberg
"Konigsberg bridges" by Bogdan Giuşcă - Public domain (PD)
http://commons.wikimedia.org/wiki/File:Konigsberg_bridges.png#/media/File:Konigsberg_bridges.png
Confidential 7
The Seven Bridges of Königsberg:
Solved by Euler in 1735
Thank you Wikipedia: http://en.wikipedia.org/wiki/Seven_Bridges_of_Konigsberg
Confidential 8
Eulerian Paths
vs.
[Figure: two graphs labeled with vertex degrees: 3, 3, 5, 3 and 2, 4, 4, 3, 3]
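A minimal sketch of Euler's degree test, assuming a connected graph given as an edge list: an Eulerian path exists only when zero or two vertices have odd degree. The vertex names and the "house" edge list are illustrative assumptions, chosen because they produce the degree sequences 3, 3, 5, 3 and 2, 4, 4, 3, 3.

```python
from collections import defaultdict

def odd_degree_vertices(edges):
    """Return the vertices touched by an odd number of edges (parallel edges allowed)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)

def has_eulerian_path(edges):
    """Euler's criterion for a CONNECTED graph: 0 or 2 odd-degree vertices."""
    return len(odd_degree_vertices(edges)) in (0, 2)

# Königsberg: four land masses (hypothetical labels A-D) joined by seven bridges.
konigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
              ("A", "D"), ("B", "D"), ("C", "D")]

# Assumed "house" drawing: a square with both diagonals plus a roof vertex R.
house = [("R", "TL"), ("R", "TR"), ("TL", "TR"), ("TL", "BL"),
         ("TR", "BR"), ("BL", "BR"), ("TL", "BR"), ("TR", "BL")]

print(has_eulerian_path(konigsberg))  # four odd-degree vertices
print(has_eulerian_path(house))       # two odd-degree vertices: start at one, end at the other
```

The test does not check connectivity; a disconnected input would need a graph search first.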
Confidential 9
screen shot from vizit: http://redditstuff.github.io/sna/vizit/
Silicon Valley | Silicon Harbor
P
Confidential 11
P
The set of all decision problems
that can be solved by a deterministic
Turing machine using a polynomial
amount of computational time
Confidential 12
Classes of Algorithm Complexity
O(1)
•  Accessing an element of an array
•  Determining if a number is even or odd
O(log(n))
•  Binary search
[eg: intelligently searching through a dictionary]
O(n)
•  Linear search: finding a min or max in an unsorted list
•  Graph Search
O(n^2)
•  looking for a word in a word search
•  Bad sorting algorithms like bubble sort, insertion sort
…
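To make the O(n) versus O(log(n)) contrast concrete, here is a small sketch comparing a linear scan with a binary search over a sorted list. The function names and word list are illustrative assumptions; `bisect_left` comes from Python's standard library.

```python
from bisect import bisect_left

def linear_search(items, target):
    """O(n): look at every element until the target turns up."""
    for i, x in enumerate(items):
        if x == target:
            return i
    return -1

def binary_search(sorted_items, target):
    """O(log n): halve the search range each step; requires sorted input."""
    i = bisect_left(sorted_items, target)
    if i < len(sorted_items) and sorted_items[i] == target:
        return i
    return -1

words = ["apple", "banana", "cherry", "date", "fig"]  # already sorted
print(linear_search(words, "date"), binary_search(words, "date"))
```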
Confidential 13
Graph Search: O(|V| + |E|)
Depth vs. Breadth
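A sketch of both traversals, assuming the graph is stored as an adjacency-list dictionary (the sample graph is made up). Each vertex is processed once and each adjacency list is scanned once, which is where O(|V| + |E|) comes from.

```python
from collections import deque

def bfs_order(adj, start):
    """Breadth-first: visit vertices in order of distance from start (queue)."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return order

def dfs_order(adj, start):
    """Depth-first: follow one branch as far as possible before backtracking (stack)."""
    seen = set()
    order = []
    stack = [start]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        order.append(v)
        # Push neighbors in reverse so the first-listed neighbor is explored first.
        for w in reversed(adj[v]):
            if w not in seen:
                stack.append(w)
    return order

graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"],
         "d": ["b", "c", "e"], "e": ["d"]}
print(bfs_order(graph, "a"))
print(dfs_order(graph, "a"))
```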
Confidential 14
Graph Matching
Given a bipartite graph G = (V,E),
a matching M in G is a set
of edges in which no two
edges share a common vertex
Confidential 15
The User Preferences Problem
Candidates → Jobs
Confidential 16
The User Preferences Problem
Goal: assign candidates to jobs to fill as many jobs as possible
Confidential 17
The User Preferences Problem
Greedy Algorithm:
Keep adding edges until no more edges
can be added
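A minimal sketch of the greedy approach, assuming the preferences arrive as (candidate, job) pairs with distinct names on each side. The names are hypothetical. The result is maximal (no edge can be added) but not necessarily maximum.

```python
def greedy_matching(edges):
    """Scan edges in order; keep an edge if neither endpoint is already used."""
    used = set()
    matching = []
    for candidate, job in edges:
        if candidate not in used and job not in used:
            matching.append((candidate, job))
            used.update((candidate, job))
    return matching

# ann can do dev or ops; bob can only do dev.
prefs = [("ann", "dev"), ("ann", "ops"), ("bob", "dev")]
print(greedy_matching(prefs))
```

Here greedy fills one job, although assigning ann to ops and bob to dev would fill two.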
Confidential 18
The User Preferences Problem
Greedy Approach
Confidential 19
The User Preferences Problem
A better solution: via augmenting paths
Confidential 20
Augmenting Paths
•  A path P is M-alternating if the edges of the path alternate
between being in the matching M and not in the matching M.
•  A path P is M-augmenting if it is M-alternating
and its first and last vertices are not covered by the matching M
(so its first and last edges are not in M).
Confidential 21
Augmenting Paths
Given an M-augmenting path P, you can improve the matching M by:
1. Removing from the matching M the edges of the path P
that are in the matching M
2. Adding to the matching M the edges of the path P that are
not in the matching M
3. The resulting matching has one more edge than M
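Removing the path's matched edges and adding its unmatched ones is a symmetric difference of two edge sets. A small sketch, assuming undirected edges stored as frozensets (the names are hypothetical):

```python
def augment(matching, path_edges):
    """Return M symmetric-difference P for an M-augmenting path P."""
    m = {frozenset(e) for e in matching}
    p = {frozenset(e) for e in path_edges}
    return m ^ p  # edges in exactly one of the two sets

matching = [("ann", "dev")]
# Augmenting path: bob - dev - ann - ops (unmatched, matched, unmatched edge).
path = [("bob", "dev"), ("dev", "ann"), ("ann", "ops")]
improved = augment(matching, path)
print(len(matching), "->", len(improved))
```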
Confidential 22
The User Preferences Problem
Augmenting Path Example:
Confidential 23
The User Preferences Problem
Augmenting Path Example:
Confidential 24
Berge’s Theorem:
A matching M in a graph is maximum
if and only if there does not exist an
M-augmenting path in the graph
Confidential 25
Maximum Matching by
Constructing the Auxiliary Graph
Confidential 26
Maximum Matching by
Constructing the Auxiliary Graph:
Theorem:
G has an augmenting path if and only if it
has a directed path from the source to
the sink in the auxiliary graph
Confidential 27
Maximum Matching by
Constructing the Auxiliary Graph:
Confidential 28
Maximum Matching Final Solution:
EdmondsAlgorithm(G):
    M = empty matching
    while there is an augmenting path P for M:
        M = M Δ P    (symmetric difference: drop P's matched edges, add its unmatched edges)
    output M

AugmentingPath(G, M):
    G’ = auxiliary graph for G, M
    P = path from source to sink (via BFS)
    if P is null:
        return false
    else:
        delete s and t from P and return P
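One way the pseudocode above could look in Python, as a sketch for bipartite graphs only. It assumes `adj` maps each candidate to the jobs they can fill. The source and sink are implicit: the BFS starts from every unmatched candidate and stops at the first unmatched job. Function names and sample data are hypothetical.

```python
from collections import deque

def find_augmenting_path(adj, match_left, match_right):
    """BFS over the auxiliary graph; return the path's new (candidate, job) edges or None."""
    parent = {}  # job -> candidate from which the BFS first reached it
    queue = deque(c for c in adj if c not in match_left)  # edges leaving the source
    while queue:
        cand = queue.popleft()
        for job in adj[cand]:
            if job in parent:
                continue
            parent[job] = cand
            if job not in match_right:  # unmatched job: it has an edge to the sink
                new_edges = []
                j = job
                while True:  # walk back toward the source
                    c = parent[j]
                    new_edges.append((c, j))
                    if c not in match_left:
                        return new_edges
                    j = match_left[c]  # matched edge the BFS followed to reach c
            queue.append(match_right[job])  # follow the matched edge backwards
    return None

def maximum_matching(adj):
    match_left, match_right = {}, {}
    while True:
        path = find_augmenting_path(adj, match_left, match_right)
        if path is None:
            return match_left
        for cand, job in path:  # overwriting entries performs M = M Δ P
            match_left[cand] = job
            match_right[job] = cand

prefs = {"ann": ["dev", "ops"], "bob": ["dev"], "cy": ["ops", "qa"]}
print(maximum_matching(prefs))
```

By Berge's theorem the loop stops exactly when the matching is maximum. Each BFS is O(|V| + |E|) and there are at most |V| augmentations.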
Confidential 29
Graph Algorithms in P
•  Maximum (minimum) degree
•  Finding connected components
[BFS, DFS]
•  Pairwise shortest path algorithms
[Dijkstra’s, Bellman-Ford, Floyd-Warshall]
•  Diameter
•  Girth [shortest cycle]
•  Edge Covering Number
•  …
Silicon Valley | Silicon Harbor
NP
Confidential 31
NP
Nondeterministic Polynomial: the set
of all decision problems where a
“yes” instance can be verified by
a deterministic Turing machine
in polynomial time, given a certificate
(equivalently: solved by a non-deterministic
Turing machine in polynomial time)
Confidential 32
P → solvable in polynomial time
NP → verifiable in polynomial time
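To illustrate "verifiable in polynomial time", here is a sketch that checks a proposed k-coloring of a graph in one pass over the edges. Finding such a coloring is the hard part; checking one is cheap. The function name and sample data are assumptions for illustration.

```python
def is_valid_coloring(edges, coloring, k):
    """Check a certificate: at most k colors and no edge joins two same-colored vertices."""
    if len(set(coloring.values())) > k:
        return False
    return all(coloring[u] != coloring[v] for u, v in edges)

triangle = [("a", "b"), ("b", "c"), ("a", "c")]
print(is_valid_coloring(triangle, {"a": 0, "b": 1, "c": 2}, 3))
print(is_valid_coloring(triangle, {"a": 0, "b": 1, "c": 0}, 2))
```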
Confidential 33
Classic NP Problems:
Integer (prime?) Factorization
Graph Coloring (this isn’t what you think)
The Knapsack Problem
The Traveling Salesman
Confidential 34
Integer (prime?) Factorization
Confidential 35
Integer (prime?) Factorization
Factoring a 232 digit number
took over two years and utilized
hundreds of machines.
Paper: “Factorization of a 768-bit RSA modulus”.
Kleinjung, et al. 2010.
Confidential 36
Graph Coloring
Minimum number of colors required
to color the vertices of G such that
no two adjacent vertices are the
same color
Confidential 37
Graph Coloring: Greedy approach
Confidential 38
Graph Coloring: Greedy approach
Confidential 39
Greedy Graph Coloring via a BFS:
ColorGraph(G, v):
    colors = []
    let Q be a queue
    Q.enqueue(v)
    label v as discovered
    v.color = new color, update colors
    while Q is not empty:
        v ← Q.dequeue()
        for all edges from v to w:
            if w is not labeled as discovered:
                Q.enqueue(w)
                label w as discovered
                neighbor_colors = set of color assignments
                    of the neighbors of w
                if neighbor_colors == colors:
                    w.color = new color, update colors
                else:
                    w.color = a color from colors - neighbor_colors
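A runnable sketch of the BFS-based greedy coloring, assuming an adjacency-list dictionary and integer colors. Each newly discovered vertex takes the smallest color not used by its already-colored neighbors. The 5-cycle is an illustrative input, and only the component containing the start vertex gets colored.

```python
from collections import deque

def greedy_bfs_coloring(adj, start):
    """Color vertices in BFS order with the smallest color unused by colored neighbors."""
    color = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w in color:  # already discovered and colored
                continue
            used = {color[x] for x in adj[w] if x in color}
            c = 0
            while c in used:
                c += 1
            color[w] = c
            queue.append(w)
    return color

cycle5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
coloring = greedy_bfs_coloring(cycle5, 0)
print(coloring, "colors used:", len(set(coloring.values())))
```

Greedy always produces a proper coloring, but the number of colors depends on the visiting order and can exceed the chromatic number on other graphs.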
Confidential 40
The Four Color Theorem
Every planar graph is four colorable.
1976: Appel and Haken (first proof)
1997: Robertson, Sanders, Seymour, and Thomas (simpler proof)
Confidential 41
Non-Planar Graphs
Can you connect each
dot on the top to each
dot on the bottom (like
above), without
crossing lines?
Confidential 42
Finding non-planar graphs
K5 K3,3
Confidential 43
Graph Coloring
The chromatic number of a graph has a
constrained optimization version that is
impossible to approximate within any
constant factor unless P = NP.
-1996: Zuckerman
Silicon Valley | Silicon Harbor
About this P =? NP.
Confidential 45
The P versus NP Problem:
Essentially: can every problem whose
solution can be checked by a computer in
polynomial time also be solved by a
computer in polynomial time?
Formal Conjecture:
1971 by Stephen Cook
Confidential 46
Why should you care?
Integer (prime?) Factorization
Graph Coloring (this isn’t what you think)
The Knapsack Problem
The Traveling Salesman
Confidential 47
RSA:
1. Pick two extremely large prime numbers p and q
2. Public Key: (e,n) where:
n = p · q
e in [3, (p – 1)(q – 1)] with gcd(e, (p – 1)(q – 1)) = 1
3. Private key: (d,n) where:
n = p · q
e · d ≡ 1 mod ((p – 1)(q – 1))
Confidential 48
RSA:
The foundation of RSA’s security relies
upon the fact that given a composite
number, it is considered a hard problem
to determine its prime factors.
An NP problem, in fact.
Confidential 49
What just happened?
1.  Graph Properties and Reddit
2.  Computational Complexity
P: Bipartite Graph Matching
NP: Graph Coloring
3. A little bit of RSA
Confidential 50
Graph Theory Resources:
Introduction to Graph Theory 2nd Ed (West)
Introduction to Graph Theory (Chartrand)
Social Network Analysis (Wasserman)
Introduction to Algorithms 3rd Edition (Cormen, …, Stein)
… the Wikipedia pages aren’t too shabby.
Confidential 51
Links in the Presentation Notes:
Reddit Viz:
http://redditstuff.github.io/sna/vizit/#
YouTube Lecture on Graph Matching:
https://www.youtube.com/watch?v=NlQqmEXuiC8
Graph Matching Code:
http://www.geeksforgeeks.org/maximum-bipartite-matching/
RSA Detailed Example:
http://doctrina.org/How-RSA-Works-With-Examples.html
Factorization of a 768-bit RSA modulus:
http://eprint.iacr.org/2010/006.pdf
Confidential 52
Graph Tech Stack:
Databases:
Titan
Neo4J
OrientDB
Visualization:
Gephi
sigma.js
GraphViz
Graph Tech Stack:
Algorithm Libraries:
Spark
Gremlin (Gremthon)
Boost (c++)
JGraphT (java)
NetworkX
Python-Graph
… there are plenty more to
dive into. This is just a start.
Silicon Valley | Silicon Harbor
Can You Trust The Internet?
An intro to graph invariants, computational
complexity, and graph algorithms
(with a little bit of cryptography)
Denise K. Gosnell, Ph.D.
Data Scientist, PokitDok
T: @DeniseKGosnell
Confidential 54
Can you connect the
dots like the graph
below without lifting up
your pencil?
yes.
Can you connect each
dot on the top to each
dot on the bottom (like
above), without
crossing lines?
no.

Can you trust the internet? An introduction to graph theory, computational complexity, and a little bit of RSA.
