● What is the Graph Catalog?
● Named graphs versus Anonymous graphs
● Native projection versus Cypher projection
● Mutability
● Graph Catalog management
We are still on the gameofthrones database. You can either
run the following guide inside the Neo4j Browser with :play
(note that this requires a neo4j.conf setting to whitelist the host),
or open the same guide in a regular browser session
and cut and paste the commands from there.
The shape of the graph you use for analytics (and algorithms) is
significantly different from the one you need to run complex
business queries in real time and do the transactional work. To
reiterate the technical terms …
● A monopartite graph is a single set of nodes that are interconnected
● It is what you need for the majority of the graph algorithms
If you ever wondered why Facebook (or people leveraging Facebook
data) is so - notoriously - good at analytics … think about what the core
Facebook graph is like ...
● A bipartite graph has two sets of nodes that are connected, but the
nodes within each set are not interconnected
● It is great as input for algorithms (such as node similarity) that are
used to create a monopartite graph
If you've done basic Neo4j trainings … the Movie graph is also a
bipartite graph.
● lots of sets of nodes and lots of types of relationships between them
● ideal for describing a domain or business and for real time
complex queries
This is how we teach you to model in graph modeling classes … did I hit
the point home enough now?
Procedures (part of the GDS library) that let you reshape and
subset your transactional graph so you have the right data in the
right shape to run analytical algorithms.
This is what you already know ... Native Graph Storage and the Page Cache.
Mutable In-Memory Workspace
While the in-memory workspace disappears when the database is
stopped (it's ephemeral, to use a fancy word), it is not just a one
reshape, one algorithm run, do-it-all-over-again setup. You can
name your reshapes, mutate them and reuse them.
It's a catalog.
In order to fully grasp that, we'll briefly list all the modes in which
you can do Graph Data Science and then explore them in detail ...
Rather than give you some dry explanation … try it out. I (or rather
pageranking) give(s) you … Jon Snow!
CALL gds.pageRank.stream({
nodeProjection: "Person",
relationshipProjection: "INTERACTS"
}) YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC LIMIT 20;
Bummer, that didn't work out ...
● I can't even show you the real Graph Catalog stuff here
(although it is used under the hood) because this really is the
one-shot-fire-and-forget-doing-the-algorithm method.
● Which is relatively easy to learn.
● And as the Person, INTERACTS subgraph is a monopartite
graph, a native projection (aka Look ma, no hands) was possible
● ...
You're not remembering the series or the books wrong though, Jon
Snow should have come out on top … so something was wrong!
This time we're going for those that are most prominent in the
battles ...
CALL …({
nodeQuery: "MATCH (p:Person) RETURN id(p) AS id",
relationshipQuery: "MATCH (p1:Person)-[]->(:Battle)<-[]-(p2:Person) RETURN id(p1) AS source, id(p2) AS target"
}) YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC LIMIT 10;
● Again, not a lot of Graph Catalog stuff to show, the
monopartite graph is shaped on the fly …
● While somewhat more complex (you need to write the queries
to do the projection), the results should immediately be more
relevant (as you're in control) … a great approach for proof of concepts!
● ...
Exactly the same question as we had in Mode II, but this time we're
going to name the graph.
CALL gds.graph.create.cypher(
"gds-brutes",
"MATCH (p:Person) RETURN id(p) AS id",
"MATCH (p1:Person)-[]->(:Battle)<-[]-(p2:Person) RETURN id(p1) AS source, id(p2) AS target"
) YIELD graphName, nodeCount, relationshipCount
RETURN *;
Wait … we haven't actually done the algorithm yet ...
CALL …('gds-brutes') YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC LIMIT 10;

CALL …('gds-brutes') YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC LIMIT 10;
But now … we can just keep going ...
● Now we're getting somewhere … a named graph remains
available in between runs of (potentially) different algorithms.
● Rather than going for an ad hoc fire-and-forget, this moves the
ball more towards flexible workflows.
● While Cypher projection is a great tool, it comes with the
downside of being - relatively - slow for huge workloads …
● ...
Don't get impatient, we'll dig deeper into Catalog management in a
minute … allow me to finish the Fab Four first though … also, did you
notice the difference in who came out on top?
Exactly the same question as we had in Mode I, but this time we're
going to name the graph.
CALL gds.graph.create(
"gds-interaction",
"Person",
"INTERACTS"
) YIELD graphName, nodeCount, relationshipCount
RETURN *;
And run ...
CALL gds.pageRank.stream('gds-interaction') YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC LIMIT 10;
And keep going ...
CALL …('gds-interaction') YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC LIMIT 10;
● So this is the whole nine yards. And it runs at huge scale
(which you can't see here so you'll have to take my word for it)
● There's a chicken and egg problem though, the monopartite
graph must be in the database already.
● ...
So we finally did get Jon Snow, but pagerank should also have gotten
him. Can anybody venture a guess by now on what we're doing wrong there?
          Anonymous                  Named
Native    Easy to learn              Performant at scale
Cypher    Quick proof of concepts    Flexible workflows
● The in-memory workspace is the secret sauce of the Graph
Data Science library and is super-efficient. It can handle huge
graph projections.
● It does however require memory and you will quickly run out if
you don't manage it properly.
● Also … you will forget what you put in there if you look at it as a
bottomless pit, thus creating overhead for yourself.
● ...
There's a very interesting tool that gives you an overview of the
in-memory workspace. Try it
CALL gds.graph.list();
If you followed along so far, you should get two results …
gds-brutes and gds-interaction. You can also examine them
individually. Try it
CALL gds.graph.list('gds-brutes');
Btw, a CALL requires a YIELD … except when it is a statement by itself.
Hence the missing YIELD and RETURN (for brevity) here ...
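If you do want to pick your columns, add the YIELD back in. A minimal sketch, assuming the GDS 1.x result columns graphName, nodeCount, relationshipCount and memoryUsage (run the bare CALL first to see which columns your version returns):

```cypher
// List every named graph in the catalog together with its size.
CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount, memoryUsage
RETURN graphName, nodeCount, relationshipCount, memoryUsage
ORDER BY graphName ASC;
```

Keeping an eye on memoryUsage is a cheap way to notice that the workspace is not a bottomless pit.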
Done with a named graph? Drop it! As there is something not right
with our interactions one, let's get rid of it
CALL gds.graph.drop('gds-interaction');
And verify with the list command that it's indeed gone ...
CALL gds.graph.list();
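If you prefer a yes/no answer over scanning a list, the catalog can also be asked directly. A small sketch, assuming the gds.graph.exists procedure of GDS 1.x and its graphName and exists columns:

```cypher
// Should return exists = false once the graph has been dropped.
CALL gds.graph.exists('gds-interaction')
YIELD graphName, exists
RETURN graphName, exists;
```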
By popular request the engineering team has been working on a
way to actually persist the complete named projection. And as of
the very latest GDS that tool is there (unpolished for now though) ...
CALL gds.graph.export('gds-brutes',{dbName:"brutes"});
WARNING: You will not find this in the guides and I do not want you to try it
now as the steps will confuse a lot of people. Do try this (and
everything else) at home though!
I'm not really supposed to show you this one and there's no
guarantee it will stay in the future, but I find this one extremely
useful myself ...
CALL gds.debug.sysInfo();
Very useful for say … quickly figuring out how low you are on heap and
such ...
No, not really
● Unless you're improvising a one-shot thing. And even then … the
syntax of these things (unless you're doing a trivial demo) is not
easy, so you should follow a workflow and use a Named graph.
● Unless you're using an algorithm that hasn't been converted to
using the workspace yet … well … you don't really have a choice
then … (Pathfinding comes to mind)
I tried all of the syntax for all of my presentations during these two
days … as you would/should …
● The original decks still had 3.5.x syntax, Emil Eifrem (our CEO)
has sworn to shoot everybody that still shows 3.5.x stuff
● Obviously I also want to show you the latest GDS library
● There are subtle differences in how to write the projections
in the named syntax versus those in the anonymous syntax
● ...
So spare yourself the frustration and pain and learn the syntax you'll
be using for production. Named graphs. Thank me later!
Jon Snow didn't show up as the top dog based on the pagerank
algorithm. And I actually showed you earlier what the issue is ...
A person's interaction with another person is obviously undirected
(or bi-directional, whichever you prefer), but relationships in the
Property Graph are always directed, and in modeling trainings you'll hear
that you should not create a second relationship in the opposite
direction (as that would duplicate data).
However, how would an algorithm know that the domain implies
an undirected relationship as the Property Graph has no schema that
specifies / enforces such information?
The algorithm makes the reasonable (default) assumption that
INTERACTS is a directed relationship. Persons that are on the target
end of them are thus not considered in the pagerank. And it turns
out (and this is purely based on how the data was loaded) that Jon
Snow is frequently the target, rarely the source.
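You can check the direction story for yourself. A minimal sketch, assuming Jon Snow is stored as a Person node whose name property is exactly 'Jon Snow' (that value is my assumption, adjust it to your data):

```cypher
// Count how often Jon Snow is the source and how often the target of INTERACTS.
MATCH (p:Person {name: 'Jon Snow'})
OPTIONAL MATCH (p)-[o:INTERACTS]->()
WITH p, count(o) AS asSource
OPTIONAL MATCH (p)<-[i:INTERACTS]-()
RETURN asSource, count(i) AS asTarget;
```

A large gap between the two numbers tells you that the stored direction reflects how the data was loaded, not the domain.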
CALL gds.pageRank.stream({
nodeProjection: "Person",
relationshipProjection: {
INTERACTS: {
type: "INTERACTS",
orientation: "UNDIRECTED"
}
}
}) YIELD nodeId, score
RETURN gds.util.asNode(nodeId).name AS name, score
ORDER BY score DESC, name ASC LIMIT 20;
● It takes the Person nodes and puts them in the workspace
(again as Person and note that it didn't have to be).
● It takes the INTERACTS relationships and puts them in the
workspace (again as INTERACTS … idem). Because we specify
the orientation as undirected this will effectively result in
doubling the number of them in the workspace ...
I don't always find all this reshaping that obvious myself. Planning
upfront what you are aiming for is a good idea!
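To make the "it didn't have to be" remark concrete: the key of a projection is the name used in the workspace, while label and type point at the database. A sketch in GDS 1.x named-graph syntax, where the graph name gds-renamed and the workspace names Character and KNOWS are made up for illustration:

```cypher
// Person nodes land in the workspace as Character,
// INTERACTS relationships as undirected KNOWS.
CALL gds.graph.create(
  'gds-renamed',
  { Character: { label: 'Person' } },
  { KNOWS: { type: 'INTERACTS', orientation: 'UNDIRECTED' } }
) YIELD graphName, nodeCount, relationshipCount
RETURN *;
```

Nothing changes in the database itself, only the in-memory projection carries the new names.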
I just showed you how to fix the problem for an Anonymous graph,
but now we want it as a Named graph …
● Take the syntax from the Mode IV example and create the
named graph again, this time as gds-interaction-natural
● Try to modify the syntax and create a second named graph,
gds-interaction-undirected
● Using gds.graph.list on both named graphs, can you recognize
the difference? Note it down!
When you are ready (give everybody a bit of a chance though), paste
your solution (to the second and third bullet points) in the chat ...
CALL gds.graph.create("gds-interaction-undirected","Person",{
INTERACTS: {
type: "INTERACTS",
orientation: "UNDIRECTED"
}
});
The relationshipCount should have doubled; for me (yours may
be slightly different) they are 3907 and 7814.
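For completeness, a sketch of the other two bullet points, assuming the Mode IV syntax with only the graph name changed and the GDS 1.x columns of gds.graph.list. Run the two statements one after the other:

```cypher
// First bullet: the same native projection as Mode IV under a new name.
CALL gds.graph.create("gds-interaction-natural", "Person", "INTERACTS");

// Third bullet: put both projections side by side and compare the counts.
CALL gds.graph.list()
YIELD graphName, nodeCount, relationshipCount
WHERE graphName STARTS WITH 'gds-interaction'
RETURN graphName, nodeCount, relationshipCount
ORDER BY graphName ASC;
```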
Nodes
● label(s)
● properties
Relationships
● type(s)
● orientation
● aggregation
● properties
And all those can (but also must) be
controlled with either a Native or a Cypher
projection.
● Cypher gives you complete flexibility,
Native gives you complete performance.
● Cypher leaves your original graph
standing as is, Native may require
constructs
Let's consider the financial practices of Dewey, Cheatum and Howe ...
Less clutter, same information … right … right … RIGHT???
Instead of going to jail for 25 years, Dewey, Cheatum and Howe avoided
the law for another 10 years of money laundering. False names, true
story ...
Because … while aggregation is great for most analytics use cases,
it also destroyed the clear 1% mule kickback scheme that you could
almost literally see with the naked eye … Transactional fraud detection.
If only there was a way to shape data efficiently - depending on the
use case - without destroying the more expressive set that describes our
business ...
If you remember one thing (ok, one thing + the puppies) from this
session about the Graph Catalog, that is it. That is its purpose
and that's why Neo4j can rightfully claim a prominent place in this game.
And as an aside … the Native Projection can do aggregations for analytical
purposes very efficiently (much more efficiently than the Cypher Projection).
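Here is what aggregation in a Native Projection can look like. A minimal sketch in GDS 1.x syntax on our own graph, where the graph name gds-interaction-single is made up and aggregation: 'SINGLE' collapses parallel relationships between the same pair of nodes into one:

```cypher
// Aggregate while projecting; the relationships in the database stay untouched.
CALL gds.graph.create(
  'gds-interaction-single',
  'Person',
  { INTERACTS: { type: 'INTERACTS', orientation: 'UNDIRECTED', aggregation: 'SINGLE' } }
) YIELD graphName, nodeCount, relationshipCount
RETURN *;
```

The expressive transactional graph keeps every single relationship, the analytical projection gets the decluttered view.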
Yes, I know it's an empty slide … how could I possibly fit all of it on such
a thing … allow me to swap to my code editor for a second ...
CALL gds.graph.create.cypher('gds-ultimate-cypher',
'MATCH (p:Person) RETURN id(p) as id, p.birth_year as birthyear',
'MATCH (p1:Person)-[:APPEARED_IN]->(b:Book)<-[:APPEARED_IN]-(p2:Person)
RETURN id(p1) as source, id(p2) as target, count(DISTINCT b) as weight');
Who cares as long as we all agree that this and not Jon Snow is the top dog!
Each of the algorithms comes with eight procedures.
Try typing
CALL gds.wcc
in the browser without completing the line (or entering) and see
what you get ...
Algorithm                  Task
gds.wcc.stream             streams result
gds.wcc.stats              statistics about the run
gds.wcc.write              writes result back to database
gds.wcc.mutate             writes result back to in-memory graph
gds.wcc.stream.estimate    estimated memory usage stream
gds.wcc.stats.estimate     estimated memory usage statistics
gds.wcc.write.estimate     estimated memory usage write
gds.wcc.mutate.estimate    estimated memory usage mutate
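A typical way to combine these is to estimate first, peek at the statistics next, and write only when you are happy. A sketch on the gds-brutes named graph, assuming the GDS 1.x result columns shown and a made-up writeProperty name componentId:

```cypher
// How much memory would a write run need?
CALL gds.wcc.write.estimate('gds-brutes', { writeProperty: 'componentId' })
YIELD nodeCount, relationshipCount, requiredMemory
RETURN *;

// How many components are there? Nothing is written anywhere.
CALL gds.wcc.stats('gds-brutes')
YIELD componentCount
RETURN componentCount;
```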
A result-stream out of an algorithm
is quite like the printouts we used
to get at work. Nobody ever looked
at the things and they end up as
drawing paper for the kids … ok, the
similarity stopped a bit before that
point, but you get what I mean.
Yes, that is how that is spelled, it's not Segway, that's one of those weird
electrical devices that has you balance on two wheels ...
Any-way … have you ever wondered about how underused the
results of a machine learning pipeline often are? You've spent tons
of energy on learning something and then … it ends up on a four
coloured bar chart in Tableau?
So while we're on the topic … there's this thing called a Property
Graph that allows very flexible modeling of your data and would
happily take good care of your newly learned fact ...
One of the reasons I've been using the Graph Data Science library
right from the start (back when it was still called algo) is that it can
write back the results to the database.
Unsure who originally thought of that (I suspect it was by accident),
but it was a stroke of genius. And in order to corroborate that, I
have to talk about ...
Did you know about this monopartite and bipartite stuff? And how
it relates to analytics? I mean, know before you heard about it
today and had it spelled out to you?
All of you did? Wow … I'm superimpressed now ...
What has been impressing customers ever since we have Graph
Data Science is the unfailing (golden) combination of similarity
followed by community detection.
Similarity turns bipartite subgraphs into monopartite graphs.
Community detection then segments <whatever it is you want to
segment>. Kerching!
Has that become a not-PC sentence yet? It will soon no doubt ...
Writing similarity back (as a relationship) to a graph has some other
nice effects. Suddenly doing recommendations becomes a whole
lot easier. If you know (with a simple pointer hop) who is similar to
me … I'm sure you can find ways to tell me what I like.
Those relationships do clutter up the graph though. Wouldn't it be nice
if I could do the golden combination and only get the communities back
as properties?
It has taken a while to make my point but I wanted you to fully
understand why being able to mutate the in-memory workspace is
so useful. Now let us finish this session by putting it in practice ...
CALL gds.graph.create('house-bipartite',
['House','Person'],
{ BELONGS_TO: { type: 'BELONGS_TO', orientation: 'REVERSE'}});

CALL gds.nodeSimilarity.mutate('house-bipartite', {
similarityCutoff: 0.05,
mutateRelationshipType: 'SIMILAR',
mutateProperty: 'score'
});

CALL gds.louvain.write('house-bipartite', { writeProperty: 'community'});

MATCH (h:House)
WITH h.community as community, count(*) as members,
collect(h.name) as membernames
RETURN * ORDER BY members DESC LIMIT 10;
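If you'd rather inspect the communities before writing anything, you can stream them and let Louvain look only at the mutated SIMILAR relationships. A hedged sketch, assuming the house-bipartite graph still holds the SIMILAR relationships with their score property, that the relationshipTypes and relationshipWeightProperty settings are available in your GDS 1.x version, and that House nodes carry a name property:

```cypher
// Community detection on the in-memory SIMILAR relationships only.
CALL gds.louvain.stream('house-bipartite', {
  relationshipTypes: ['SIMILAR'],
  relationshipWeightProperty: 'score'
}) YIELD nodeId, communityId
// Person nodes have no SIMILAR relationships, so keep the houses only.
WITH gds.util.asNode(nodeId) AS n, communityId
WHERE n:House
RETURN communityId, count(*) AS members, collect(n.name) AS membernames
ORDER BY members DESC LIMIT 10;
```

The SIMILAR relationships never reach the database this way, so the graph stays free of clutter.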
05   neo4j gds graph catalog
