Football graph - Neo4j and the Premier League

Published in: Sports, Technology

  • In this talk, we'll look at how graph data and Neo4j can be used to model the English Premier League. We'll see how the graph model and the Cypher query language make it natural and fun to query multidimensional, semi-structured data. We'll also see how graphs encourage discoverability, so that you can spot interesting correlations and become king of the arcane football facts (e.g. how many goals have been scored at grounds in the North West of England by players originating from South America) at your local pub quiz. Finally, we'll see what the same data would look like if modeled in a relational way, where that approach reaches its limits, and how the graph addresses those challenges.
  • Let’s get started and talk about graphs. Now in this context we’re thinking more of what are sometimes known as networks and…
  • …many people when they hear the word graph think of this.
  • Which isn’t what we’re going to be talking about today!
  • Graphs aren’t a new thing; you’ll already be familiar with lots of things that are graphs, even if you don’t know it yet. The London tube is perhaps the most famous example, and one that Londoners at least use every day.
  • Or if not then you’ve certainly heard of the social network (graph)
  • An organisational hierarchy is a common model
  • Or of course as we mentioned earlier, a social network of friends of friends and so on is a popular graph
  • Null values all over the place
  • Now, as I say, graph databases allow you to store, manage and query your data as a graph. Neo4j adopts a very particular graph model, which we call the property graph model. So I’m going to spend the next few minutes talking about the important aspects of this model in more detail. In fact, I’m going to talk about the enhanced property graph model, which will be available in Neo4j 2.0 sometime later this year.
  • Pointer in memory and ultimately on disk
  • Analogy: Gmail labels. Every mail can have zero or more labels attached. Labels allow you to associate filters with groups of emails.
  • Models are always motivated by needs, problems and goals: they are not a transparent window onto reality. C18: Seven Bridges of Königsberg. Goal: find a path through the city that crosses each bridge once and once only.
  • Which leads us perfectly into Neo4j’s query language.
  • Football is quite a nice domain for modelling in graphs because the data has a lot of dimensions to it
  • SQL: define your tables and relationships and generally don’t change them. You might denormalise or add indexes to speed up queries. Graphs: define your initial nodes and relationships. You may then add ‘layers’ to the graph to make implicit relationships explicit.
  • How is this different to a relational database? We have tables (nodes) and foreign keys between tables (relationships). Foreign-key joins are calculated at run time, whereas in a graph a relationship is a first-class citizen, effectively a pre-computed index. You can also traverse lots of ‘hops’, which becomes quite expensive when you do
  • If it’s not fun and it seems cumbersome, then perhaps it’s the wrong tool for that particular data problem, or the data is modeled in the wrong way.
  • Might be worth asking for help if that isn’t happening or you’re stuck. We have a good community on Stack Overflow and a mailing list as well. You’ll get answers to any questions you have pretty quickly.
  • Please take a copy of t
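The building blocks described in these notes (nodes, relationships, properties, labels) and the idea that a relationship behaves like a pre-computed index can be sketched in plain Python. This is only an illustrative in-memory model, not how Neo4j stores data; the class, the node ids and the person "jane" are hypothetical names invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """A node carries labels (think Gmail labels) and free-form properties."""
    labels: set
    properties: dict = field(default_factory=dict)


class PropertyGraph:
    """Toy in-memory property graph; illustrative only, not Neo4j's storage."""

    def __init__(self):
        self.nodes = {}                    # node id -> Node
        self.outgoing = defaultdict(list)  # node id -> [(rel type, end id, props)]

    def add_node(self, node_id, *labels, **properties):
        self.nodes[node_id] = Node(set(labels), dict(properties))

    def relate(self, start, rel_type, end, **properties):
        # Every relationship has a type, a direction, a start node and an end node.
        self.outgoing[start].append((rel_type, end, properties))

    def neighbours(self, start, rel_type):
        # Following a relationship is a lookup on the start node, not a table join.
        return [end for (t, end, _) in self.outgoing[start] if t == rel_type]

    def with_label(self, label):
        return sorted(i for i, n in self.nodes.items() if label in n.labels)


g = PropertyGraph()
g.add_node("mark", "Person", name="Mark", twitter="@markhneedham")
g.add_node("jane", "Person", name="Jane")  # no twitter property: nothing stored, no NULL
g.add_node("arsenal", "Team", name="Arsenal")
g.relate("mark", "FRIEND", "jane", since=2010)
g.relate("jane", "SUPPORTS", "arsenal")

# Two "hops": which teams do Mark's friends support?
teams = [team
         for friend in g.neighbours("mark", "FRIEND")
         for team in g.neighbours(friend, "SUPPORTS")]
print(teams)
```

Each extra hop here is another lookup on the nodes already reached, which is the intuition behind the "pre-computed index" remark; a real graph database adds persistence, indexing and a query planner on top.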

    1. In a League of their Own: Neo4j and Premiership Football. Mark Needham, @markhneedham
    2. Outline: Intro to graphs; When do we need a graph?; Property graph model; Neo4j’s query language; The football graph; Using Neo4j from .NET
    3. Let’s talk graphs
    4. You mean these? Dancing With Michael Jackson; Eating Brains
    5. Nope! Dancing With Michael Jackson; Eating Brains
    6. Ok so what’s a graph then? Node; Relationship
    7. The tube
    8. The social network (graph)
    9. What are graphs good for? Complexity
    10. Data Complexity: complexity = f(size, semi-structure, connectedness)
    11. Size
    12. The Real Complexity: complexity = f(size, semi-structure, connectedness)
    13. Semi-Structure
    14. Semi-Structure. USER table columns: USER_ID, FIRST_NAME, LAST_NAME, EMAIL_1, EMAIL_2, FACEBOOK, TWITTER, SKYPE. Row: 315, Mark, Needham, mark.needham@neotechnology.com, m.h.needham@gmail.com, NULL, @markhneedham, mk_jnr1984. As USER and CONTACT with a CONTACT_TYPE: Email: mark.needham@neotechnology.com; Email: m.h.needham@gmail.com; Twitter: @markhneedham; Skype: mk_jnr1984
    15. The Real Complexity: complexity = f(size, semi-structure, connectedness)
    16. Connectedness
    17. Connectedness
    18. Connectedness
    19. When do we need a graph? Densely connected; Semi-structured
    20. Densely connected? Lots of join tables
    21. Semi-Structured? Lots of sparse tables
    22. Properties of graph databases: Millions of ‘joins’ per second; Consistent query times as dataset grows; Join complexity and performance; Easy to evolve data model; Easy to ‘layer’ different types of data together
    23. Property Graph Data Model
    24. Nodes
    25. Nodes can have properties: Used to represent entity attributes and/or metadata (e.g. timestamps, version); Key-value pairs; Java primitives; Arrays; null is not a valid value; Every node can have different properties
    26. What’s a node?
    27. Relationships
    28. Relationships: Relationships are first class citizens; Every relationship has a name and a direction (add structure to the graph; provide semantic context for nodes); Properties used to represent quality or weight of relationship, or metadata; Every relationship must have a start node and end node
    29. Relationships: Nodes can be connected by more than one relationship; Nodes can have more than one relationship; Self relationships are allowed
    30. Labels
    31. Think Gmail labels
    32. Four Building Blocks: Nodes (entities); Relationships (connect entities and structure domain); Properties (entity attributes, relationship qualities, and metadata); Labels (group nodes by role)
    33. Models: Purposeful abstraction of a domain designed to satisfy particular application/end-user goals
    34. Design for Queryability: Model; Query
    35. Design for Queryability: Model
    36. Design for Queryability: Model; Query
    37. Introducing Cypher: Declarative pattern-matching language; SQL-like syntax; Designed for graphs
    38. Patterns, patterns, everywhere: A; B; C
    39. It’s all about the ASCII art! Nodes a and b: (a) --> (b)
    40. The most basic query: MATCH (a)-->(b) RETURN a, b
    41. Adding in a relationship type: a ACTED IN m: (a)-[:ACTED_IN]->(m)
    42. Adding in a relationship type: MATCH (a)-[:ACTED_IN]->(m) RETURN a.name, m.name
    43. The football graph
    44. The football graph
    45. Find Arsenal’s away matches
    46. Find Arsenal’s away matches
    47. Find Arsenal’s away matches: MATCH (team:Team)<-[:away_team]-(game) WHERE team.name = "Arsenal" RETURN game
    48. Graph Pattern: MATCH (team:Team)<-[:away_team]-(game) WHERE team.name = "Arsenal" RETURN game.name
    49. Anchor pattern in graph: MATCH (team:Team)<-[:away_team]-(game) WHERE team.name = "Arsenal" RETURN game.name
    50. Create projection of results: MATCH (team:Team)<-[:away_team]-(game) WHERE team.name = "Arsenal" RETURN game.name
    51. Find Arsenal’s away matches
    52. Evolving the football graph
    53. Find the top away goal scorers
    54. Find the top away goal scorers: MATCH (team)<-[:away_team]-(game:Game), (game)<-[:contains_match]-(season:Season), (team)<-[:for]-(stats)<-[:played]-(player), (stats)-[:in]->(game) WHERE season.name = "2012-2013" RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 10
    55. Multiple graph patterns: MATCH (team)<-[:away_team]-(game:Game), (game)<-[:contains_match]-(season:Season), (team)<-[:for]-(stats)<-[:played]-(player), (stats)-[:in]->(game) WHERE season.name = "2012-2013" RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 10
    56. Anchor pattern in the graph: MATCH (team)<-[:away_team]-(game:Game), (game)<-[:contains_match]-(season:Season), (team)<-[:for]-(stats)<-[:played]-(player), (stats)-[:in]->(game) WHERE season.name = "2012-2013" RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 10
    57. Group by player: MATCH (team)<-[:away_team]-(game:Game), (game)<-[:contains_match]-(season:Season), (team)<-[:for]-(stats)<-[:played]-(player), (stats)-[:in]->(game) WHERE season.name = "2012-2013" RETURN player.name, COLLECT(DISTINCT team.name), SUM(stats.goals) as goals ORDER BY goals DESC LIMIT 10
    58. Find the top away goal scorers
    59. Other football queries: Goals scored in each month by Michu; Tottenham results when Gareth Bale scores; What did Wayne Rooney do in April?; Which players only score when a game is televised?
    60. Graph Query Design
    61. The relational version
    62. Graph vs Relational. Relational: Tables (assume records all have the same structure); Foreign keys between tables (joins calculated at run time; the more tables you join to a query, the slower the query gets). Graphs: Nodes (no need to set a property if it doesn’t exist); Relationships (stored as a ‘pre-computed index’ at write time; very easy to do lots of ‘hops’ between relationships)
    63. .NET and Neo4j: Application; REST Client; HTTP; Neo4j Server
    64. .NET and Neo4j: Application; Neo4jClient; REST Client; HTTP; Neo4j Server
    65. .NET and Neo4j
    66. .NET and Neo4j
    67. .NET and Neo4j
    68. .NET and Neo4j
    69. .NET and Neo4j
    70. Thinking in graphs
    71. Graphs should be fun!
    72. Ask for help if you get stuck: Last Wednesday of the month
    73. Come take a copy, it’s free! www.graphdatabases.com
    74. Questions? Mark Needham, @markhneedham, mark.needham@neotechnology.com
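To make the pattern matching in slides 47 and 54 more concrete, here is a minimal Python sketch that walks the same relationship types (away_team, contains_match, for, played, in) over a tiny hand-built graph. It is an approximation of what the Cypher queries express, not how Neo4j evaluates them; the helper functions are hypothetical, every game, player and goal count is invented sample data, and the COLLECT(DISTINCT team.name) column is left out for brevity.

```python
from collections import defaultdict

# Node properties keyed by a made-up node id. All values are invented sample data.
node_props = {
    "arsenal": {"name": "Arsenal"},
    "swansea": {"name": "Swansea"},
    "game1": {"name": "Swansea vs Arsenal"},
    "game2": {"name": "Arsenal vs Swansea"},
    "s1213": {"name": "2012-2013"},
    "player_a": {"name": "Player A"},
    "player_b": {"name": "Player B"},
    "stats1": {"goals": 2},
    "stats2": {"goals": 1},
    "stats3": {"goals": 1},
}

# Relationships as (start, type, end) triples, using the types from the slides.
rels = [
    ("game1", "home_team", "swansea"), ("game1", "away_team", "arsenal"),
    ("game2", "home_team", "arsenal"), ("game2", "away_team", "swansea"),
    ("s1213", "contains_match", "game1"), ("s1213", "contains_match", "game2"),
    ("player_a", "played", "stats1"), ("stats1", "in", "game1"), ("stats1", "for", "arsenal"),
    ("player_b", "played", "stats2"), ("stats2", "in", "game2"), ("stats2", "for", "swansea"),
    ("player_a", "played", "stats3"), ("stats3", "in", "game2"), ("stats3", "for", "arsenal"),
]


def starts(rel_type, end):
    """Nodes x matching (x)-[:rel_type]->(end)."""
    return [s for (s, t, e) in rels if t == rel_type and e == end]


def ends(start, rel_type):
    """Nodes y matching (start)-[:rel_type]->(y)."""
    return [e for (s, t, e) in rels if s == start and t == rel_type]


def away_matches(team_name):
    # Roughly: MATCH (team)<-[:away_team]-(game) WHERE team.name = ... RETURN game.name
    return [node_props[game]["name"]
            for (game, t, team) in rels
            if t == "away_team" and node_props[team]["name"] == team_name]


def top_away_scorers(season_name, limit=10):
    goals = defaultdict(int)
    for (game, t, team) in rels:
        if t != "away_team":
            continue
        seasons = starts("contains_match", game)
        if not any(node_props[s]["name"] == season_name for s in seasons):
            continue
        for stats in starts("for", team):       # (team)<-[:for]-(stats)
            if game not in ends(stats, "in"):   # (stats)-[:in]->(game)
                continue
            for player in starts("played", stats):  # (stats)<-[:played]-(player)
                goals[node_props[player]["name"]] += node_props[stats]["goals"]
    ranked = sorted(goals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]


print(away_matches("Arsenal"))
print(top_away_scorers("2012-2013"))
```

In the sample data, Player A's goal in game2 is not counted because Arsenal is the home team in that game, which is the filtering the combined patterns in slide 54 perform.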
