Optimizing Cypher Queries in Neo4j

6,765 views

Published on

Mark and Wes will talk about Cypher optimization techniques based on real queries as well as the theoretical underlying processes. They'll start from the basics of "what not to do", and how to take advantage of indexes, and continue to the subtle ways of ordering MATCH/WHERE/WITH clauses for optimal performance as of the 2.0.0 release.

Published in: Technology

Optimizing Cypher Queries in Neo4j

  1. 1. Optimizing Cypher Queries in Neo4j Wes Freeman (@wefreema) Mark Needham (@markhneedham)
  2. 2. Today's schedule • Brief overview of cypher syntax • Graph global vs Graph local queries • Labels and indexes • Optimization patterns • Profiling cypher queries • Applying optimization patterns
  3. 3. Cypher Syntax • Statement parts o Optional: Querying part (MATCH|WHERE) o Optional: Updating part (CREATE|MERGE) o Optional: Returning part (WITH|RETURN) • Parts can be chained together
  4. 4. Cypher Syntax - Refresher MATCH (n:Label)-[r:LINKED]->(m) WHERE n.prop = "..." RETURN n, r, m
  5. 5. Starting points • Graph scan (global; potentially slow) • Label scan (usually reserved for aggregation queries; not ideal) • Label property index lookup (local; good!)
  6. 6. Introducing the football dataset
  7. 7. The 1.9 global scan O(n) n = # of nodes START pl = node(*) MATCH (pl)-[:played]->(stats) WHERE pl.name = "Wayne Rooney" RETURN stats 150ms w/ 30k nodes, 120k rels
  8. 8. The 2.0 global scan MATCH (pl)-[:played]->(stats) WHERE pl.name = "Wayne Rooney" RETURN stats 130ms w/ 30k nodes, 120k rels O(n) n = # of nodes
  9. 9. Why is it a global scan? • Cypher is a pattern matching language • It doesn't discriminate unless you tell it to o It must try to start at all nodes to find this pattern, as specified
  10. 10. Introduce a label Label your starting points CREATE (player:Player {name: "Wayne Rooney"} )
  11. 11. O(k) k = # of nodes with that labelLabel scan MATCH (pl:Player)-[:played]->(stats) WHERE pl.name = "Wayne Rooney" RETURN stats 80ms w/ 30k nodes, 120k rels (~900 :Player nodes)
  12. 12. Indexes don't come for free CREATE INDEX ON :Player(name) OR CREATE CONSTRAINT ON pl:Player ASSERT pl.name IS UNIQUE
  13. 13. O(log k) k = # of nodes with that labelIndex lookup MATCH (pl:Player)-[:played]->(stats) WHERE pl.name = "Wayne Rooney" RETURN stats 6ms w/ 30k nodes, 120k rels (~900 :Player nodes)
  14. 14. Optimization Patterns • Avoid cartesian products • Avoid patterns in the WHERE clause • Start MATCH patterns at the lowest cardinality and expand outward • Separate MATCH patterns with minimal expansion at each stage
  15. 15. Introducing the movie data set
  16. 16. Anti-pattern: Cartesian Products MATCH (m:Movie), (p:Person)
  17. 17. Subtle Cartesian Products MATCH (p:Person)-[:KNOWS]->(c) WHERE p.name="Tom Hanks" WITH c MATCH (k:Keyword) RETURN c, k
  18. 18. Counting Cartesian Products MATCH (pl:Player),(t:Team),(g:Game) RETURN COUNT(DISTINCT pl), COUNT(DISTINCT t), COUNT(DISTINCT g) 80000 ms w/ ~900 players, ~40 teams, ~1200 games
  19. 19. MATCH (pl:Player) WITH COUNT(pl) as players MATCH (t:Team) WITH COUNT(t) as teams, players MATCH (g:Game) RETURN COUNT(g) as games, teams, players8ms w/ ~900 players, ~40 teams, ~1200 games Better Counting
  20. 20. Directions on patterns MATCH (p:Person)-[:ACTED_IN]-(m) WHERE p.name = "Tom Hanks" RETURN m
  21. 21. Parameterize your queries MATCH (p:Person)-[:ACTED_IN]-(m) WHERE p.name = {name} RETURN m
  22. 22. Fast predicates first Bad: MATCH (t:Team)-[:played_in]->(g) WHERE NOT (t)-[:home_team]->(g) AND g.away_goals > g.home_goals RETURN t, COUNT(g)
  23. 23. Better: MATCH (t:Team)-[:played_in]->(g) WHERE g.away_goals > g.home_goals AND NOT (t)-[:home_team]->() RETURN t, COUNT(g) Fast predicates first
  24. 24. Patterns in WHERE clauses • Keep them in the MATCH • The only pattern that needs to be in a WHERE clause is a NOT
  25. 25. MERGE and CONSTRAINTs • MERGE is MATCH or CREATE • MERGE can take advantage of unique constraints and indexes
  26. 26. MERGE (without index) MERGE (g:Game {date:1290257100, time: 1245, home_goals: 2, away_goals: 3, match_id: 292846, attendance: 60102}) RETURN g 188 ms w/ ~400 games
  27. 27. Adding an index CREATE INDEX ON :Game(match_id)
  28. 28. MERGE (with index) MERGE (g:Game {date:1290257100, time: 1245, home_goals: 2, away_goals: 3, match_id: 292846, attendance: 60102}) RETURN g 6 ms w/ ~400 games
  29. 29. Alternative MERGE approach MERGE (g:Game { match_id: 292846 }) ON CREATE SET g.date = 1290257100 SET g.time = 1245 SET g.home_goals = 2 SET g.away_goals = 3 SET g.attendance = 60102 RETURN g
  30. 30. Profiling queries • Use the PROFILE keyword in front of the query o from webadmin or shell - won't work in browser • Look for db_hits and rows • Ignore everything else (for now!)
  31. 31. Reviewing the football dataset
  32. 32. Football Optimization MATCH (game)<-[:contains_match]-(season:Season), (team)<-[:away_team]-(game), (stats)-[:in]->(game), (team)<-[:for]-(stats)<-[:played]-(player) WHERE season.name = "2012-2013" RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 103137 ms w/ ~900 players, ~20 teams, ~400 games
  33. 33. Football Optimization ==> ColumnFilter(symKeys=["player.name", " INTERNAL_AGGREGATEe91b055b-a943-4ddd-9fe8-e746407c504a", " INTERNAL_AGGREGATE240cfcd2-24d9-48a2-8ca9-fb0286f3d323"], returnItemNames=["player.name", "COLLECT(DISTINCT team.name)", "goals"], _rows=10, _db_hits=0) ==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATE240cfcd2-24d9-48a2-8ca9-fb0286f3d323 of type Number),false)"], limit="Literal(10)", _rows=10, _db_hits=0) ==> EagerAggregation(keys=["Cached(player.name of type Any)"], aggregates=["( INTERNAL_AGGREGATEe91b055b-a943-4ddd-9fe8- e746407c504a,Distinct(Collect(Property(team,name(0))),Property(team,name(0))))", "( INTERNAL_AGGREGATE240cfcd2-24d9-48a2- 8ca9-fb0286f3d323,Sum(Property(stats,goals(13))))"], _rows=503, _db_hits=10899) ==> Extract(symKeys=["stats", " UNNAMED12", " UNNAMED108", "season", " UNNAMED55", "player", "team", " UNNAMED124", " UNNAMED85", "game"], exprKeys=["player.name"], _rows=5192, _db_hits=5192) ==> PatternMatch(g="(player)-[' UNNAMED124']-(stats)", _rows=5192, _db_hits=0) ==> Filter(pred="Property(season,name(0)) == Literal(2012-2013)", _rows=5192, _db_hits=15542) ==> TraversalMatcher(trail="(season)-[ UNNAMED12:contains_match WHERE true AND true]->(game)<-[ UNNAMED85:in WHERE true AND true]-(stats)-[ UNNAMED108:for WHERE true AND true]->(team)<-[ UNNAMED55:away_team WHERE true AND true]- (game)", _rows=15542, _db_hits=1620462)
  34. 34. Break out the match statements MATCH (game)<-[:contains_match]-(season:Season) MATCH (team)<-[:away_team]-(game) MATCH (stats)-[:in]->(game) MATCH (team)<-[:for]-(stats)<-[:played]-(player) WHERE season.name = "2012-2013" RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 10200 ms w/ ~900 players, ~20 teams, ~400 games
  35. 35. Start small • Smallest cardinality label first • Smallest intermediate result set first
  36. 36. Exploring cardinalities MATCH (game)<-[:contains_match]-(season:Season) RETURN COUNT(DISTINCT game), COUNT(DISTINCT season) 1140 games, 3 seasons MATCH (team)<-[:away_team]-(game:Game) RETURN COUNT(DISTINCT team), COUNT(DISTINCT game) 25 teams, 1140 games
  37. 37. Exploring cardinalities MATCH (stats)-[:in]->(game:Game) RETURN COUNT(DISTINCT stats), COUNT(DISTINCT game) 31117 stats, 1140 games MATCH (stats)<-[:played]-(player:Player) RETURN COUNT(DISTINCT stats), COUNT(DISTINCT player) 31117 stats, 880 players
  38. 38. Look for teams first MATCH (team)<-[:away_team]-(game:Game) MATCH (game)<-[:contains_match]-(season) WHERE season.name = "2012-2013" MATCH (stats)-[:in]->(game) MATCH (team)<-[:for]-(stats)<-[:played]-(player) RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 10162 ms w/ ~900 players, ~20 teams, ~400 games
  39. 39. ==> ColumnFilter(symKeys=["player.name", " INTERNAL_AGGREGATEbb08f36b-a70d-46b3-9297-b0c7ec85c969", " INTERNAL_AGGREGATE199af213-e3bd-400f-aba9-8ca2a9e153c5"], returnItemNames=["player.name", "COLLECT(DISTINCT team.name)", "goals"], _rows=10, _db_hits=0) ==> Top(orderBy=["SortItem(Cached( INTERNAL_AGGREGATE199af213-e3bd-400f-aba9-8ca2a9e153c5 of type Number),false)"], limit="Literal(10)", _rows=10, _db_hits=0) ==> EagerAggregation(keys=["Cached(player.name of type Any)"], aggregates=["( INTERNAL_AGGREGATEbb08f36b-a70d-46b3-9297- b0c7ec85c969,Distinct(Collect(Property(team,name(0))),Property(team,name(0))))", "( INTERNAL_AGGREGATE199af213-e3bd-400f- aba9-8ca2a9e153c5,Sum(Property(stats,goals(13))))"], _rows=503, _db_hits=10899) ==> Extract(symKeys=["stats", " UNNAMED12", " UNNAMED168", "season", " UNNAMED125", "player", "team", " UNNAMED152", " UNNAMED51", "game"], exprKeys=["player.name"], _rows=5192, _db_hits=5192) ==> PatternMatch(g="(stats)-[' UNNAMED152']-(team),(player)-[' UNNAMED168']-(stats)", _rows=5192, _db_hits=0) ==> PatternMatch(g="(stats)-[' UNNAMED125']-(game)", _rows=10394, _db_hits=0) ==> Filter(pred="Property(season,name(0)) == Literal(2012-2013)", _rows=380, _db_hits=380) ==> PatternMatch(g="(season)-[' UNNAMED51']-(game)", _rows=380, _db_hits=1140) ==> TraversalMatcher(trail="(game)-[ UNNAMED12:away_team WHERE true AND true]->(team)", _rows=1140, _db_hits=1140) Look for teams first
  40. 40. Filter games a bit earlier MATCH (game)<-[:contains_match]-(season:Season) WHERE season.name = "2012-2013" MATCH (team)<-[:away_team]-(game) MATCH (stats)-[:in]->(game) MATCH (team)<-[:for]-(stats)<-[:played]-(player) RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 10148 ms w/ ~900 players, ~20 teams, ~400 games
  41. 41. Filter out stats with no goals MATCH (game)<-[:contains_match]-(season:Season) WHERE season.name = "2012-2013" MATCH (team)<-[:away_team]-(game) MATCH (stats)-[:in]->(game)WHERE stats.goals > 0 MATCH (team)<-[:for]-(stats)<-[:played]-(player) RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 10 59 ms w/ ~900 players, ~20 teams, ~400 games
  42. 42. Movie query optimization MATCH (movie:Movie {title: {title} }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  43. 43. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  44. 44. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })MATCH (genre)<-[:HAS_GENRE]- (movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  45. 45. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie)MATCH (director)-[:DIRECTED]- >(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  46. 46. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie)MATCH (actor)-[:ACTED_IN]- >(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  47. 47. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie)MATCH (writer)-[:WRITER_OF]- >(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  48. 48. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie)MATCH (actor)-[:ACTED_IN]- >(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  49. 49. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies)MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  50. 50. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie)WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writersORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  51. 51. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESCWITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  52. 52. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writersMATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<- [:HAS_KEYWORD]-(movies) WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  53. 53. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' }) MATCH (genre)<-[:HAS_GENRE]-(movie) MATCH (director)-[:DIRECTED]->(movie) MATCH (actor)-[:ACTED_IN]->(movie) MATCH (writer)-[:WRITER_OF]->(movie) MATCH (actor)-[:ACTED_IN]->(actormovies) MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword) as weight, count(DISTINCT actormovies) as actormoviesweight, movie, collect(DISTINCT genre.name) as genres, collect(DISTINCT director.name) as directors, actor, collect(DISTINCT writer.name) as writers ORDER BY weight DESC, actormoviesweight DESC WITH collect(DISTINCT {name: actor.name, weight: actormoviesweight}) as actors, movie, collect(DISTINCT {related: {title: related.title}, weight: weight}) as related, genres, directors, writers MATCH (movie)-[:HAS_KEYWORD]->(keyword:Keyword)<-[:HAS_KEYWORD]-(movies)WITH keyword.name as keyword, count(movies) as keyword_weight, movie, related, actors, genres, directors, writers ORDER BY keyword_weight RETURN collect(DISTINCT keyword), movie, actors, related, genres, directors, writers
  54. 54. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight // 1 row per actor ORDER BY actormoviesweight DESC WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  55. 55. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight // 1 row per actor ORDER BY actormoviesweight DESC WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  56. 56. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor)WITH movie, actor, length((actor)-[:ACTED_IN]- >()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actor WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  57. 57. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actorWITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  58. 58. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actor WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]- >(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  59. 59. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actor WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]- >(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movieORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  60. 60. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actor WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESCWITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers 10x faster
  61. 61. Movie query optimization MATCH (movie:Movie {title: 'The Matrix' })<-[:ACTED_IN]-(actor) WITH movie, actor, length((actor)-[:ACTED_IN]->()) as actormoviesweight ORDER BY actormoviesweight DESC // 1 row per actor WITH movie, collect({name: actor.name, weight: actormoviesweight}) as actors // 1 row MATCH (movie)-[:HAS_GENRE]->(genre) WITH movie, actors, collect(genre) as genres // 1 row MATCH (director)-[:DIRECTED]->(movie) WITH movie, actors, genres, collect(director.name) as directors // 1 row MATCH (writer)-[:WRITER_OF]->(movie) WITH movie, actors, genres, directors, collect(writer.name) as writers // 1 row MATCH (movie)-[:HAS_KEYWORD]->(keyword)<-[:HAS_KEYWORD]-(movies:Movie) WITH DISTINCT movies as related, count(DISTINCT keyword.name) as keywords, movie, genres, directors, actors, writers // 1 row per related movie ORDER BY keywords DESC WITH collect(DISTINCT { related: { title: related.title }, weight: keywords }) as related, movie, actors, genres, directors, writers // 1 rowMATCH (movie)-[:HAS_KEYWORD]->(keyword) RETURN collect(keyword.name) as keywords, related, movie, actors, genres, directors, writers // 1 row 10x faster
  62. 62. Design for Queryability Model
  63. 63. Design for Queryability Query
  64. 64. Design for Queryability Model
  65. 65. Making the implicit explicit • When you have implicit relationships in the graph you can sometimes get better query performance by modeling the relationship explicitly
  66. 66. Making the implicit explicit
  67. 67. Refactor property to node Bad: MATCH (g:Game) WHERE g.date > 1343779200 AND g.date < 1369094400 RETURN g
  68. 68. Good: MATCH (s:Season)-[:contains]->(g) WHERE season.name = "2012-2013" RETURN g Refactor property to node
  69. 69. Conclusion • Avoid the global scan • Add indexes / unique constraints • Split up MATCH statements • Measure, measure, measure, tweak, repeat • Soon Cypher will do a lot of this for you!
  70. 70. Bonus tip • Use transactions/transactional cypher endpoint
  71. 71. Q & A • If you have them send them in

×