Graphs for AI and ML
(a personal journey)
Dr. Jim Webber
Chief Scientist, Neo4j
● Some no-BS definitions
● Social history
● Accidental Skynet
● Diapers and beer
● Graph theory
● Contemporary graph ML
● The future of graph AI
Overview
● ML - Machine Learning
○ Finding functions from historical data to guide future
interactions within a given domain
● AI - Artificial Intelligence
○ The property of a system that it appears intelligent to its users
○ Often, but not always, using ML techniques
○ Or ML implementations that can be cheaply retrained to address
neighboring domains
A Bluffer’s Guide to AI-cronyms
● Predictive analytics
○ Use past data to predict the future
● General purpose AI
○ ML with transfer learning such that learned experiences in one
domain can be applied elsewhere
● Human-like AI
Often conflated with
ML all the things
What we do today
Extract all the features!
• What do we do? Turn it to
vectors and pump it through a
classification or regression
model
• That’s actually not a bad
thing
• But we can do so much before
we even get to ML
• If we have graph data
Take a step back
We can be smarter about this
Realtime Predictive Analytics
(circa 2008)
Not AI, but extremely effective
Credit: https://medium.com/basecs/breaking-down-breadth-first-search-cebe696709d9
Credit: https://www.networkworld.com/article/3211410/lan-wan/the-10-most-powerful-companies-in-enterprise-networking.html
Toolkit matures into
proper database
• Cypher and Neo4j server make
real time graph analytical
patterns simple to apply
• Amazing and humane to
implement
Person: Firstname: Mickey, Surname: Smith, DoB: 19781006
Product: Badgers Nadgers Ale, SKU: 5e175641
Product: Peewee Pilsner, SKU: 2555f258
Product: Baby Dry Nights, SKU: 49d102bc
Product: XBox 360, SKU: 49d102bc
Category: beer
Category: alcoholic drinks
Category: nappies
Category: baby
Category: console
Category: consumer electronics
Relationships: BOUGHT, MEMBER_OF
Firstname: *
Surname: *
DoB: 1996 > x > 1972
Category: beer
Category: nappies
BOUGHT
Category: game console
Young fathers pattern
Firstname: *
Surname: *
DoB: 1996 > x > 1972
Category: beer
Category: nappies
!BOUGHT
Category: game console
Business opportunity
(beer) (nappies)
(console)
(daddy)
() ()
()
(d)-[:BOUGHT]->()-[:MEMBER_OF]->(n)
(d)-[:BOUGHT]->()-[:MEMBER_OF]->(b)
(d)-[:BOUGHT]->()-[:MEMBER_OF]->(c)
Flatten the graph
(d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(n:Category)
(d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(b:Category)
(d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(c:Category)
Include any labels
MATCH (d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(n:Category),
(d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(b:Category)
Add a MATCH clause
MATCH (d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(n:Category),
(d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(b:Category),
(c:Category)
WHERE NOT((d)-[:BOUGHT]->()-[:MEMBER_OF]->(c))
Constrain the Pattern
MATCH (d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(n:Category),
(d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(b:Category),
(c:Category)
WHERE n.category = "nappies" AND
b.category = "beer" AND
c.category = "console" AND
NOT((d)-[:BOUGHT]->()-[:MEMBER_OF]->(c))
Add property constraints
MATCH (d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(n:Category),
(d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(b:Category),
(c:Category)
WHERE n.category = "nappies" AND
b.category = "beer" AND
c.category = "console" AND
NOT((d)-[:BOUGHT]->()-[:MEMBER_OF]->(c))
RETURN DISTINCT d AS daddy
Profit!
==> +---------------------------------------------+
==> | daddy |
==> +---------------------------------------------+
==> | Node[15]{name:"Rory Williams",dob:19880121} |
==> +---------------------------------------------+
==> 1 row
==> 0 ms
==>
neo4j-sh (0)$
Results
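The same pattern can be sketched outside the database to make the logic explicit. This is a minimal Python illustration, not how Neo4j evaluates the query: the dictionaries `bought` and `member_of` and the function names are hypothetical stand-ins for the example graph, filled with the people and products from the speaker notes. Like the Cypher above, it ignores the date-of-birth constraint.

```python
# Hypothetical in-memory copy of the example data (assumption: one category per product).
bought = {
    "Mickey Smith": {"Peewee Pilsner", "Badgers Nadgers Ale", "Baby Dry Nights", "XBox 360"},
    "Rose Tyler": {"Shiraz", "Baby Dry Nights"},
    "Rory Williams": {"Peewee Pilsner", "Baby Dry Nights"},
}
member_of = {
    "Peewee Pilsner": "beer",
    "Badgers Nadgers Ale": "beer",
    "Baby Dry Nights": "nappies",
    "XBox 360": "console",
    "Shiraz": "alcoholic drinks",
}


def categories_bought(person):
    """Follow BOUGHT then MEMBER_OF for one person."""
    return {member_of[product] for product in bought[person]}


def young_father_candidates(required=("beer", "nappies"), missing="console"):
    """People who bought from every required category but never from the missing one."""
    return sorted(
        person
        for person in bought
        if set(required) <= categories_bought(person)
        and missing not in categories_bought(person)
    )


print(young_father_candidates())  # ['Rory Williams']
```

The result agrees with the query output on the slide: only Rory Williams has bought beer and nappies without also buying a console.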
Which sushi restaurants
in NYC do my friends
like?
Facebook Graph Search
See http://maxdemarzi.com/
Graph Structure
Simple Query, Intelligent Results
MATCH (:Person {name: 'Jim'})
-[:IS_FRIEND_OF]->(:Person)
-[:LIKES]->(restaurant:Restaurant)
-[:LOCATED_IN]->(:Place {location: 'New York'}),
(restaurant)-[:SERVES]->(:Cuisine {cuisine: 'Sushi'})
RETURN restaurant
Search structure
Graph Theory
• Rich knowledge of how graphs
operate in many domains
• Off the shelf algorithms to
process those graphs for
information, insight, predictions
• Low barrier to entry
• Amazingly powerful
Triadic Closure
name: Kyle
name: Stan name: Kenny
Triadic Closure
name: Kyle
name: Stan name: Kenny
name: Kyle
name: Stan name: Kenny
FRIEND
Structural Balance
name:
Cartman
name: Craig name: Tweek
Structural Balance
name:
Cartman
name: Craig name: Tweek
name:
Cartman
name: Craig name: Tweek
FRIEND
Structural Balance
name:
Cartman
name: Craig name: Tweek
name:
Cartman
name: Craig name: Tweek
ENEMY
Structural Balance
name: Kyle
name: Stan name: Kenny
name: Kyle
name: Stan name: Kenny
FRIEND
Structural Balance is a key
predictive technique
And it’s domain-agnostic
Allies and Enemies
UK
Germany France
Russia Italy
Austria
Predicting WWI
[Easley and Kleinberg]
• Relationships can have “strength” as well as intent
• Think: weighting on a relationship in a property graph
• Weak links play another important structural role in graph theory
• They bridge neighborhoods
Weak relationships
Triadic Closure
(weak relationship)
name: Kenny
name: Stan name: Cartman
Triadic Closure
(weak relationship)
name: Kenny
name: Stan name: Cartman
name: Kenny
name: Stan name: Cartman
FRIEND 50%
Local Bridges
FRIEND
name: Kenny
name: Stan name: Kyle
FRIEND
FRIEND
name: Sally
name: Bebe name: Wendy
FRIEND
FRIEND 50%
name:
Cartman
FRIEND
ENEMY
“If a node A in a network satisfies the Strong Triadic Closure Property
and is involved in at least two strong relationships, then any local
bridge it is involved in must be a weak relationship.”
[Easley and Kleinberg]
Local Bridge Property
University Karate Club
• (NP) Hard problem
• Repeatedly remove the spanning links between dense regions
• Or recursively merge nodes into ever larger “subgraph” nodes
• Choose your algorithm carefully – some are better than others for
a given domain
• Can be used to (almost exactly) predict the breakup of the karate club!
Graph Partitioning
University Karate Clubs
(predicted by Graph Theory)
9
University Karate Clubs
(what actually happened!)
Clustering
• Label Propagation
• Union Find / Weakly Connected Components
• Strongly Connected Components
• Triangle-Count / Clustering Coefficient
Centrality
• PageRank
• Betweenness
• Closeness
• Degree
Path Finding
• Breadth-first search
• Depth-first search
• Single-source shortest path
• All-pairs shortest path
• Minimum weight spanning tree
Graph Algorithms in Neo4j
Amazing Native Graph Performance
Credit: https://reezocar.blob.core.windows.net/blog/2015/09/k2000.jpg
Find and stop spammers
Extract graph structure over time
Not message content!
(Fakhraei et al, KDD 2015)
Learning to stop bad guys
Result: find and classify 70% of spammers with 90% accuracy
Much of modern graph ML is still about turning graphs to vectors
Graph2Vec and friends
Highly complementary techniques
Mixing structural data and features gives better results
Better data into the model, better results out
But we don’t have to always vectorize graphs...
Graph ML
Knowledge Graphs
• Semantic domain knowledge for
inference and understanding
• E.g. eBay shopbot
• What’s the next best question to ask
when a potential customer says they
want a bag?
• Price? Function? Color?
• Depends on context! Demographic,
history, user journey.
• Richly connected data makes the
system seem intelligent
• But it’s “just” data and algorithms in
reality
Graph Convolutional
Neural Networks
A general architecture for
predicting node and relationship
attributes in graphs.
(Kipf and Welling, ICLR 2017)
Credit: Andrew Docherty (CSIRO), YowData 2017
https://www.youtube.com/watch?v=Gmxz41L70Fg
Graph Networks for
Structured Causal Models
• Position paper from Google,
MIT, Edinburgh
• Structured representations and
computations (graphs) are key
• Goal: generalize beyond direct
experience
• Like human infants can
https://arxiv.org/pdf/1806.01261.pdf
credit: @markhneedham
Q&A
Dr. Jim Webber
Chief Scientist, Neo4j

Graphs for AI & ML, Jim Webber, Neo4j

Editor's Notes

  • #2 This is a talk about my history with graphs and AI. It is peppered with surprises, and inflexion points, and anecdotes. But what we derive from this, is that we’ve had graph data and algorithms
  • #4 ML - this is what nerds do. Sometimes ML is so compelling that it seems intelligent, but in reality it’s data and algorithms all the way down. AI - train a system to classify animals, might also work on shoes. See: hot dog; not hot dog! GP-AI - systems like AlphaGo might be an architecture to support this in future, but we’re not there today
  • #5 GP-AI - systems like AlphaGo might be an architecture to support this in future, but we’re not there today
  • #7 Here’s where we are mostly today. Row-oriented data. Maybe some documents, maybe some columns, but mostly rows of data from arcane data models.
  • #10 All the way back to Fall 2008. Perhaps some of you in finance remember that period, right?
  • #11 November 2007: met Emil at Øredev in Malmö, Sweden. Java and Maven build-your-own-DBMS toolkit called Neo4j. Java Core API only. Long afternoon of loading data and writing a recommendation query...
  • #13 Find the current customer. Find things they own. Find things that depend on the things they own. Sell. Repeat. All we did at first was understand the dependencies between products and bundles. We never tried to upsell something incompatible. Never tried to sell them something they already owned. Never undersold them. And it opened a world of possibilities to combine other graphs: demographic, social, geographical, municipal, network...
  • #14 Unexpectedly powerful. Solved in a long afternoon a problem that was meant to take years with OTS software. Applied the same pattern to PoS retail recommendations, fraud detection… in subsequent months. Still amazed! Effect: joined Neo4j as Chief Scientist in 2010.
  • #15 Realtime retail recommendations. Historical anecdote about beer and nappies.
  • #16 We had a data model. Some of it taxonomical. Some of it stock-centric. Some transactional.
  • #17
    START n=node(*) MATCH n-[r?]->() DELETE n,r
    CREATE (daddy1:Person { name: 'Mickey Smith', dob: 19781006 })
    CREATE (alcohol:Category { category : 'alcoholic drinks'})
    CREATE (beer:Category { category : 'beer'})
    CREATE beer-[:MEMBER_OF]->alcohol
    CREATE (peeweePilsner:Product { sku: '2555f258', product: 'Peewee Pilsner' })
    CREATE (badgersNadgers:Product { sku: '5e175641', product: 'Badgers Nadgers Ale' })
    CREATE peeweePilsner-[:MEMBER_OF]->beer
    CREATE badgersNadgers-[:MEMBER_OF]->beer
    CREATE daddy1-[:BOUGHT]->peeweePilsner
    CREATE daddy1-[:BOUGHT]->badgersNadgers
    CREATE (baby:Category { category: 'baby' })
    CREATE (nappies:Category { category: 'nappies' })
    CREATE nappies-[:MEMBER_OF]->baby
    CREATE (babyDryNights:Product { sku: '49d102bc', product: 'Baby Dry Nights'})
    CREATE babyDryNights-[:MEMBER_OF]->nappies
    CREATE daddy1-[:BOUGHT]->babyDryNights
    CREATE (consumerElectronics:Category { category: 'consumer electronics' })
    CREATE (console:Category { category: 'console' })
    CREATE (xbox:Product { sku: '49d102bc', product: 'XBox 360' })
    CREATE xbox-[:MEMBER_OF]->(console)-[:MEMBER_OF]->consumerElectronics
    CREATE daddy1-[:BOUGHT]->xbox
    CREATE (mummy1:Person { name: 'Rose Tyler', dob: 19800317 })
    CREATE (wine:Product { sku:'3a3f22bc', product: 'Shiraz' })
    CREATE wine-[:MEMBER_OF]->alcohol
    CREATE mummy1-[:BOUGHT]->wine
    CREATE mummy1-[:BOUGHT]->babyDryNights
    CREATE (daddy2:Person { name: 'Rory Williams', dob: 19880121 })
    CREATE daddy2-[:BOUGHT]->peeweePilsner
    CREATE daddy2-[:BOUGHT]->babyDryNights

    // Cypher 1.0 query
    START beer=node(2), nappies=node(7), xbox=node(11)
    MATCH (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(beer),
          (daddy)-[:BOUGHT]->()-[:MEMBER_OF]->(nappies),
          (daddy)-[b?:BOUGHT]->(xbox)
    WHERE b is null
    RETURN distinct daddy

    // Cypher 2.0 query
    MATCH (d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(n:Category),
          (d:Person)-[:BOUGHT]->()-[:MEMBER_OF]->(b:Category),
          (c:Category)
    WHERE n.category = "nappies" AND
          b.category = "beer" AND
          c.category = "console" AND
          NOT((d)-[:BOUGHT]->()-[:MEMBER_OF]->(c))
    RETURN DISTINCT d AS daddy
  • #18 The insight here is that we have a typical young father who buys beer, nappies and a game console. Simply by reducing the subgraph, we have a pattern to search for.
  • #19 Now we look for young fathers – implied by beer and nappies purchases – who haven’t bought a game console.
  • #20 Turn it to text. And…
  • #23 Neo4j 2.0: MATCH (u:User), (n:ProductType), (b:ProductType), (x:ProductType) WHERE n.name = "nappies" AND b.name = "Beer" AND x.name = "Xbox" AND (u)-[:BOUGHT]->()<-[:MEMBER_OF]-(n) AND (u)-[:BOUGHT]->()<-[:MEMBER_OF]-(b) AND NOT((u)-[:BOUGHT]->()<-[:MEMBER_OF]-(x)) RETURN u
  • #32 This is fast: query latency is proportional to the amount of graph searched
  • #33 Now called “network science”
  • #34 First we need to talk about some local properties
  • #35 A triadic closure is a local property of (social) graphs whereby if two nodes are connected via a path involving a third node, there is an increased likelihood that the two nodes will become directly connected in future. This is a familiar enough situation for us in a social setting whereby if we happen to be friends with two people, ultimately there's an increased chance that those people will become direct friends too, since by being our friend in the first place, it's an indication of social similarity and suitability. It’s called triadic closure, because we try to close the triangle.
  • #36 We see this all the time – it’s likely that if we have two friends, that they will also become at least acquaintances and potentially friends themselves! In general, if a node A has relationships to B & C then the relationship between B&C is likely to form – especially if the existing relationships are both strong. This is an incredibly strong assertion and will not be typically upheld by all subgraphs in a graph. Nonetheless it is sufficiently commonplace (particularly in social networks) to be trusted as a predictive aid.
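As a rough illustration of triadic closure as a predictive aid, the sketch below lists pairs of people who are not yet connected but share at least one friend, scored by how many friends they share. The function name `closure_candidates` and the dictionary layout are assumptions made for this example; the names come from the slide.

```python
from itertools import combinations


def closure_candidates(friends):
    """Map each unconnected pair to the number of friends they have in common."""
    scores = {}
    for a, b in combinations(sorted(friends), 2):
        if b in friends[a]:
            continue  # already directly connected, nothing to close
        common = friends[a] & friends[b]
        if common:
            scores[(a, b)] = len(common)
    return scores


# Kyle is friends with both Stan and Kenny, who are not yet friends with each other.
friends = {
    "Kyle": {"Stan", "Kenny"},
    "Stan": {"Kyle"},
    "Kenny": {"Kyle"},
}
print(closure_candidates(friends))  # {('Kenny', 'Stan'): 1}
```

A higher count means more open triangles that the same new relationship would close, which is one simple way to rank likely future links.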
  • #37 Sentiment plays a role in how closures form too – there is a notion of balance.
  • #38 From a triadic closure perspective this is OK, but intuitively it seems odd. Cartman’s friends shouldn’t be friends with his enemies. Nor should Cartman’s enemies be friends with his friends.
  • #39 This makes sense – Cartman’s friend Craig is also an enemy of Cartman’s enemy Tweek. Two negative sentiments and one positive sentiment is a balanced structure – and it makes sense too, since we gang up with our friends on our poor beleaguered enemy.
  • #40 Another balanced – and more pleasant – arrangement is for three positive sentiments, in this case mutual friends.
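The balance rule described in these notes can be written down in a few lines. This sketch assumes friend edges are encoded as +1 and enemy edges as -1, so a triangle is balanced exactly when the product of its three signs is positive (+++ or +--). The constant and function names are illustrative only.

```python
from math import prod

FRIEND, ENEMY = 1, -1


def is_balanced(triangle):
    """A triangle is balanced when the product of its three edge signs is positive."""
    return prod(triangle.values()) > 0


# Cartman's friend is friends with Cartman's enemy: ++- (unbalanced).
odd = {("Cartman", "Craig"): FRIEND, ("Cartman", "Tweek"): ENEMY, ("Craig", "Tweek"): FRIEND}
# Cartman and Craig gang up on Tweek: +-- (balanced).
gang_up = {("Cartman", "Craig"): FRIEND, ("Cartman", "Tweek"): ENEMY, ("Craig", "Tweek"): ENEMY}
# Mutual friends: +++ (balanced).
mutual = {("Kyle", "Stan"): FRIEND, ("Kyle", "Kenny"): FRIEND, ("Stan", "Kenny"): FRIEND}

for name, triangle in [("odd", odd), ("gang_up", gang_up), ("mutual", mutual)]:
    print(name, "balanced" if is_balanced(triangle) else "unbalanced")
```

Under this encoding three mutual enemies (---) also come out as unbalanced, which matches the strong form of structural balance.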
  • #42 A starting point for a network of friends and enemies. Red links indicate an enemy-of relationship. Black links indicate a friend-of relationship. The Three Emperors’ League.
  • #43 Italy forms the Triple Alliance with Austria and Germany – a balanced +++ triadic closure. If Italy had made only a single alliance (or enemy) it would have been unstable and another relationship would be likely to form anyway! Triple Alliance.
  • #44 Russia becomes hostile to Austria and Germany – a balanced --+ triadic closure – and becomes agnostic towards France. German-Russian Lapse.
  • #45 The French and Russians ally, forming a balanced --+ triadic closure with the UK. French-Russian Alliance.
  • #46 The UK and France enter into the famous Entente Cordiale. This produces an unbalanced ++- triadic closure with Russia, and the graph doesn’t like it.
  • #47 The British and Russians form an alliance, thereby changing their previously unbalanced triadic closure into a balanced one. Other local pressures on the graph make other closures form. Italy becomes hostile to Russia, forming a balanced --+ closure with France, and another balanced --+ closure with the UK. Germany and the UK become hostile, forming a balanced --+ closure with Austria and another balanced --+ closure with Italy. British-Russian Alliance.
  • #48 That WWI can be predicted without domain knowledge by iterating a graph and applying local structural constraints is nothing short of astonishing to me. Note how the network slides into a balanced labeling — and into World War I.
  • #50 In this case the strong triadic closure property still holds – though it is a weak link that characterises the relationship between Stan and Cartman. Given a starting graph, we can apply this simple local principle to see how it would evolve.
  • #51 In this case the strong triadic closure property still holds – though it is a weak link that characterises the relationship between Stan and Cartman. Given a starting graph, we can apply this simple local principle to see how it would evolve.
  • #52 A local bridge acts as a link – perhaps the only realistic link – between two otherwise distant (or separate) subgraphs. Local bridges are semantically rich – they provide conduits for information flow between otherwise independent groups. In this case DATING is a local bridge – it must also be a weak relationship according to our definition of a local bridge. Intuitively this makes sense – your girl/boyfriend is rather less important at age 8 than your regular friends, IIRC.
  • #53 How do we identify local bridges? Any weak link whose removal would cause a component of the graph to become disconnected. Being able to identify local bridges is important – in this case it’s the only known conduit to allow the girls and boys to communicate. In real life, local bridges are apparent in your organisation as experts (or managers), and appear as nexuses in fraud cases.
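The test described in this note (remove a link and see whether its endpoints can still reach each other) can be sketched with a breadth-first search. The edge list below is a hypothetical reading of the slide, with two friend triangles joined by a single Stan to Wendy link; the helper names are assumptions too. Note that this checks the stricter "removal disconnects the graph" condition from the note; Easley and Kleinberg's local bridge only requires that the endpoints have no friends in common.

```python
from collections import deque


def connected(adj, start, goal, skip):
    """Breadth-first search from start to goal, ignoring the undirected edge `skip`."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adj[node]:
            if {node, nxt} == set(skip) or nxt in seen:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return False


def bridges(edges):
    """Return the edges whose removal leaves their endpoints unable to reach each other."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return [edge for edge in edges if not connected(adj, edge[0], edge[1], edge)]


# Hypothetical edge list: boys' triangle, girls' triangle, one link between them.
edges = [
    ("Kenny", "Stan"), ("Stan", "Kyle"), ("Kyle", "Kenny"),
    ("Sally", "Bebe"), ("Bebe", "Wendy"), ("Wendy", "Sally"),
    ("Stan", "Wendy"),
]
print(bridges(edges))  # [('Stan', 'Wendy')]
```

This brute-force check runs one search per edge, which is fine for a toy graph; a production system would use a linear-time bridge-finding algorithm instead.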
  • #54 Zachary in the Journal of Anthropological Research, 1977. Intuitively we can see “clumps” in this graph. But how do we separate them out? It’s called minimum cut.
  • #55 What’s interesting is that it’s mechanical – no domain knowledge is necessary. There’s only one failure with the method Zachary chose to partition the graph: node 9 should have gone to the instructor’s club but instead went with the original president of the club (node 34). Why? Because the student was three weeks away from completing a four-year quest to obtain a black belt, which he could only do with the instructor (node 1). Other minimum cut approaches might deliver slightly different results, but on the whole it’s amazing you get such insight from an algorithm!
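One way to "repeatedly remove the spanning links between dense regions" is a Girvan-Newman style split: compute edge betweenness, delete the highest-scoring edge, and stop once the graph falls apart. This is not the method Zachary used, and it is only a sketch on a made-up six-node graph rather than the karate club data. All function names are assumptions, and the input is assumed to be a connected graph with at least one edge.

```python
from collections import deque


def edge_betweenness(adj):
    """Brandes-style edge betweenness for an unweighted, undirected graph."""
    score = {}
    for source in adj:
        dist = {source: 0}
        paths = {v: 0 for v in adj}  # number of shortest paths from source
        paths[source] = 1
        preds = {v: [] for v in adj}
        order, queue = [], deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, sharing credit along shortest paths.
        credit = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = paths[v] / paths[w] * (1 + credit[w])
                edge = frozenset((v, w))
                score[edge] = score.get(edge, 0.0) + share
                credit[v] += share
    return score


def components(adj):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for start in adj:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v not in part:
                part.add(v)
                stack.extend(adj[v] - part)
        seen |= part
        parts.append(part)
    return parts


def split_in_two(edges):
    """Remove the highest-betweenness edge until the graph has two components."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    while len(components(adj)) < 2:
        scores = edge_betweenness(adj)
        a, b = max(scores, key=scores.get)
        adj[a].discard(b)
        adj[b].discard(a)
    return components(adj)


# Toy graph: two triangles joined by the single edge 3-4.
toy = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
print(sorted(sorted(part) for part in split_in_two(toy)))  # [[1, 2, 3], [4, 5, 6]]
```

The edge between nodes 3 and 4 carries every shortest path between the two triangles, so it has the highest betweenness and is removed first.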
  • #57 Student 9 was about to take their 1st dan under instructor 1. Though social pressure said they should defect, they stayed for practical reasons.
  • #58 Actually Neo4j already has a bunch of these algorithms. Call them easily from Cypher. Emergent intelligence from the graph!
  • #59 Efficiency for graph operations is paramount. You don’t need huge macho clusters to do this.
  • #60 Large payment provider, transaction history. A 300M node, ~18B rel graph PageRanked with 20 iterations in less than 2 hours using the graph algos. On commodity hardware.
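For readers unfamiliar with what "20 iterations" of PageRank means, here is a minimal power-iteration sketch in plain Python. It is not the Neo4j graph algorithms implementation; the damping factor of 0.85 and the handling of nodes without outgoing links are common conventions assumed for the example, and the three-node graph is made up.

```python
def pagerank(links, damping=0.85, iterations=20):
    """Power-iteration PageRank over a dict of node -> list of outgoing targets."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for n in nodes:
            targets = links.get(n, [])
            if targets:
                share = damping * rank[n] / len(targets)
                for t in targets:
                    nxt[t] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                for t in nodes:
                    nxt[t] += damping * rank[n] / len(nodes)
        rank = nxt
    return rank


# Toy graph: a and b both point at c, and c points back at a.
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 3))
```

Node b has no incoming links, so it ends up with the lowest score, while a and c pass most of the rank back and forth between them.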
  • #61 Contemporary AI
  • #62 Graph structure itself is rich. In this example we don’t need to know the content of the messages to know they’re spam at high confidence, just their position in the graph. Mine a vector of graph features, feed it into the trained model. Graphs have a key advantage: structural context. Where is the node in the graph? Who are its neighbours? Etc. That richness feeds into the model and makes it better, more accurate, more dependable. But we’re still back in a vector! Can we do better?
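To make "mine a vector of graph features" concrete, the sketch below derives three structural features for a node: its degree, the number of triangles it sits in, and its local clustering coefficient. The feature choice and the toy graph are assumptions for illustration, not the feature set used by Fakhraei et al.

```python
from itertools import combinations


def structural_features(adj, node):
    """Return [degree, triangle count, local clustering coefficient] for one node."""
    neighbours = adj[node]
    degree = len(neighbours)
    triangles = sum(1 for a, b in combinations(neighbours, 2) if b in adj[a])
    possible = degree * (degree - 1) / 2
    clustering = triangles / possible if possible else 0.0
    return [degree, triangles, clustering]


# Toy graph: "hub" contacts four strangers; x, y and z are mutual friends.
adj = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"}, "b": {"hub"}, "c": {"hub"}, "d": {"hub"},
    "x": {"y", "z"}, "y": {"x", "z"}, "z": {"x", "y"},
}
print(structural_features(adj, "hub"))  # [4, 0, 0.0]
print(structural_features(adj, "x"))    # [2, 1, 1.0]
```

Vectors like these can be fed to an ordinary classifier alongside other features: a node that reaches many neighbours who do not know each other looks structurally different from one embedded in a tight group, without reading any message content.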
  • #65 ICLR = International Conference on Learning Representations. Graph of movies that a user liked. Feed into neural net. Graph of users who rated one of those movies. Feed into neural net. Recurse through the data until you get to all the movies and all the users, which are just embedding vectors (fancy hashes that place like near like in a vector space). [Can change these vectors for features to avoid cold-starts, without changing overall architecture.] Graph of back-propagated trained neural nets. Incremental: scalable for both training and prediction. Extensible: bring in other graph layers! Better than collaborative filtering because it can work on any graph, not just bipartite user-likes-movies graphs. User likes users who like movies. Have you ever sat through some dull scifi or excruciating period drama for your partner? Of course you have! A bipartite graph, also called a bigraph, is a set of graph vertices decomposed into two disjoint sets such that no two graph vertices within the same set are adjacent. I.e. users don’t connect to users, only to movies. This is already happening - it’s YouTube’s recommender algorithm.
  • #66 A growing realisation from leaders in the AI community: graph networks as the foundational building block for human-like AI. Argue: combinatorial generalization must be a top priority for AI to achieve human-like abilities. Must be able to compose a finite set of elements in infinite ways (e.g. like language). We draw analogies by aligning the relational structure between two domains and drawing inferences about one based on corresponding knowledge about the other (Gentner and Markman, 1997; Hummel and Holyoak, 2003). Inductive bias: how the algorithm prioritises solutions. Relational inductive biases to guide deep learning about entities, relations, and rules for composing them. I.e. the learning understands graphs.
  • #67 All this might seem hard at first. The ML community depends on data, but it really hasn’t been good at exploiting advances in data: extracting features from rows is still commonplace. Graphs change this for the better. Once you get graphs, all the other things seem hard.
  • #68 “a vast gap between human and machine intelligence remains, especially with respect to efficient, generalizable learning”. 70% of graph ML today is still turning graphs to vectors, e.g. DeepWalk - random walk through the graph, assign a vector to each node when encountered based on its neighborhood. 30% is truly graph AI - “differentiable neural computer” -> discern patterns that users can’t; write sophisticated algorithms (fraud, shortest path, etc) from incentive declarations. E.g. no longer need a human expert to discover the “young father” pattern in our data, the machine learns it’s a valuable query in some contexts. Finally ML is being applied to operations: Pavlo of CMU’s “Peloton” tunes databases better than professional DBAs - makes the DB self-driving. Neo4j will head in this direction too.