This document summarizes Michael Hunger's presentation on how graphs make databases fun again. Some key points:
- Traditional relational databases have issues modeling connected data and performing complex queries over relationships. Graph databases like Neo4j can more naturally represent connected data as nodes and relationships.
- Neo4j was originally created to solve issues modeling connected data for a digital asset management system. It uses a graph data model and allows complex relationship queries through its Cypher query language.
- The document demonstrates importing meetup data into Neo4j and running queries to find connections between users, groups, and topics. It also shows examples of querying actor relationships and movie data.
- Tools are presented
3. Topics
• Databases are No Fun
• Relational Pain -> Graph Fun
• The world is a Graph
• Neo4j
• Model, Query, Import
• Having Fun in the Developer Zone
• GitHub Events
• Software Analytics
• Neo4j from Java
8. History of Neo4j - Problem
• Digital Asset Management System in 2000
• SaaS many users in many countries
• Two hard use-cases
• Multi language keyword search
• Including synonyms / word hierarchies
• Access Management to Assets for SaaS Scale
• Groups, Hierarchies, Permissions, Realtime
9. History of Neo4j – Relational Attempt
• Tried with many relational DBs
• JOIN Performance Problems
• Hierarchies, Networks, Graphs
• Modeling Problems
• Data Model evolution
• No Success, even …
• With expensive database consultants!
10. History of Neo4j – First working Implementation
• Graph Model & API sketched on a napkin
• Nodes connected by Relationships
• Just like your conceptual model
• Implemented network-database in memory
• Java API, fast Traversals
• Worked well, but …
• No persistence, No Transactions
• Long import / export time from relational storage
11. History of Neo4j - Solution
• Evolved to a full-fledged database in Java
• With persistence using files + memory mapping
• Transactions with Transaction Log (WAL)
• Lucene for fast entity lookup
• Founded Company in 2007
• Neo4j (REST)-Server
• Neo4j Clustering & HA
• Cypher Query Language
• Today …
12. Neo Technology Overview
Product
• Neo4j - World’s leading graph database
• 1M+ downloads, adding 70k+ per month
• 150+ enterprise subscription customers including over 50 of the Global 2000
Company
• Neo Technology, Creator of Neo4j
• 100+ employees with HQ in Silicon Valley, London, Munich, Paris and Malmö
• $45M in funding from Fidelity, Sunstone, Conor, Creandum, Dawn Capital
13. What, Who, Where, How?
• Financial Services
• Communications
• Health & Life Sciences
• HR & Recruiting
• Media & Publishing
• Social Web
• Industry & Logistics
• Entertainment
• Consumer Retail
• Information Services
• Business Services
http://neo4j.com/use-cases http://neo4j.com/customers
15. What is it with Relationships?
• World is full of connected people, events, things
• There is “Value in Relationships” !
• What about Data Relationships?
• How do you store your object model?
• How do you explain JOIN tables to your boss?
16. Neo4j – allows you to connect the dots
• Was built to efficiently
• store,
• query and
• manage highly connected data
• Transactional, ACID
• Real-time OLTP
• Open source
• Highly scalable already on few machines
17. Value from Data Relationships
Common Use Cases
Internal Applications
• Master Data Management
• Network and IT Operations
• Fraud Detection
Customer-Facing Applications
• Real-Time Recommendations
• Graph-Based Search
• Identity and Access Management
22. Teaser: Meetup.com Import
• For a Meetup Event
• Import Attendees
• For each Attendee
• Import Interests / Topics
• Import other Meetup Memberships
• Other groups our members are in
• Top 10 topics
• Topics & Groups of active Member
https://github.com/ikwattro/meetup2neo
http://markhneedham.com/blog?s=meetup
23. From RDBMS to Neo4j
Relational Pains = Graph Pleasure
24. Relational DBs Can’t Handle Relationships Well
• Data model built for tabular forms, not JOINs: managing connections was bolted on in both schema and query
• Strict schema not suitable for the variably structured data generated and used by today’s applications
• Data volume and number of JOINs affect the cost of a query operation exponentially
• Variable hierarchies and networks are hard to store and query, so many “patterns” were developed
• … often only denormalization makes complex relational queries fast, but it destroys the good normalized data model
Built for Forms
Joins are expensive
Denormalize #FTW
25. Unlocking Value from Your Data Relationships
• Model your data naturally as a graph of data and relationships
• Drive graph model from domain and use-cases
• Use relationship information in real-time to transform your business
• Add new relationships on the fly to adapt to your changing requirements
26. High Query Performance with a Native Graph DB
• Relationships are first-class citizens
• No need for joins, just follow pre-materialized relationships of nodes
• Query & Data-locality: navigate out from your starting points
• Only load what’s needed
• Aggregate and project results as you go
• Optimized disk and memory model for graphs
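The bullets above can be illustrated with a minimal Python sketch. It is not Neo4j's storage engine, only a toy comparison between scanning a join table and following relationships stored with each node. The names `friend_rows`, `adjacency`, `friends_via_join`, `friends_via_adjacency` and `friends_of_friends` are hypothetical, and the friendships are invented sample data.

```python
from collections import defaultdict

# Relational style: a join table stored as rows of (person, friend).
# Invented sample data for illustration only.
friend_rows = [
    ("ANDREAS", "TOBIAS"),
    ("ANDREAS", "MICA"),
    ("ANDREAS", "DELIA"),
    ("TOBIAS", "MICA"),
]


def friends_via_join(person):
    """Scan every row of the join table; the work grows with the table size."""
    return sorted(friend for p, friend in friend_rows if p == person)


# Graph style: each node keeps direct references to its outgoing relationships.
adjacency = defaultdict(list)
for person, friend in friend_rows:
    adjacency[person].append(friend)


def friends_via_adjacency(person):
    """Follow only the relationships attached to the starting node."""
    return sorted(adjacency.get(person, []))


def friends_of_friends(person):
    """Navigate two hops out from the starting point, touching only reached nodes."""
    result = set()
    for friend in adjacency.get(person, []):
        for fof in adjacency.get(friend, []):
            if fof != person:
                result.add(fof)
    return sorted(result)
```

Both lookups return the same friends; the difference is that the adjacency version never touches rows that belong to other people, which is the "only load what's needed" idea in miniature.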
27. Relational Versus Graph Models
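[Figure: Relational Model versus Graph Model. Relational side: tables Person, Person-Friend and Friend holding ANDREAS, DELIA, TOBIAS and MICA. Graph side: nodes ANDREAS, TOBIAS, MICA and DELIA connected by KNOWS relationships.]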
31. High Query Performance: Some Numbers
• Traverse 2-4M+ relationships per second and core
• Cost-based query optimizer: complex queries return in milliseconds
• Import 100K-1M records per second transactionally
• Bulk import tens of billions of records in a few hours
33. The Whiteboard Model Is the Physical Model
Eliminates Graph-to-Relational Mapping
• In your data model: bridge the gap between business and IT models
• In your application: greatly reduce need for application code
34. Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
[Figure: example graph with two PERSON nodes (name: "Dan", born: May 29, 1970, twitter: "@dan"; name: "Ann", born: Dec 5, 1975) and a CAR node (brand: "Volvo", model: "V70"), connected by LOVES, LOVES and LIVES WITH relationships; one relationship carries the property since: Jan 10, 2011.]
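As a rough illustration of the components listed on this slide, here is a minimal in-memory sketch in Python. It is not Neo4j's API: the classes `Node` and `Relationship` and the helper `outgoing` are hypothetical. The `DRIVES` relationship type, and the choice to attach `since` to it, are assumptions, because the slide text does not say which relationship carries that property.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                       # e.g. {"Person"}
    properties: dict = field(default_factory=dict)    # name-value properties


@dataclass
class Relationship:
    start: Node                                       # direction: start -> end
    type: str
    end: Node
    properties: dict = field(default_factory=dict)    # name-value properties


dan = Node({"Person"}, {"name": "Dan", "born": "May 29, 1970", "twitter": "@dan"})
ann = Node({"Person"}, {"name": "Ann", "born": "Dec 5, 1975"})
car = Node({"Car"}, {"brand": "Volvo", "model": "V70"})

relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "LIVES_WITH", ann),
    # Assumption: type DRIVES and the placement of `since` are illustrative only.
    Relationship(dan, "DRIVES", car, {"since": "Jan 10, 2011"}),
]


def outgoing(node, rel_type):
    """Return the end nodes of relationships of `rel_type` that leave `node`."""
    return [r.end for r in relationships if r.start is node and r.type == rel_type]
```

For example, `outgoing(dan, "LOVES")` yields the node for Ann, while `outgoing(car, "LOVES")` is empty because direction matters.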
35. Cypher: Powerful and Expressive Query Language
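MATCH (:Person { name:"Dan"} ) -[:LOVES]-> (:Person { name:"Ann"} )
[Figure: the pattern annotated. Each parenthesized part is a NODE with a LABEL (Person) and a PROPERTY (name); LOVES is the relationship from Dan to Ann.]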
36. Getting Data into Neo4j
Cypher-Based “LOAD CSV” Capability
• Transactional (ACID) writes
• Initial and incremental loads of up to 10 million nodes and relationships
Command-Line Bulk Loader neo4j-import
• For initial database population
• For loads up to 10B+ records
• Up to 1M records per second
Example: a CSV with 4.58 million things and their relationships loads in 100 seconds!
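To make the idea of a CSV-driven load concrete, here is a small Python sketch of what such an import does conceptually: read rows, merge nodes by key so repeated values become one node, create a relationship per row, and commit in batches. It does not call Neo4j or `neo4j-import`; the function `load_csv`, the column names `person` and `movie`, the batch size and the sample rows are all assumptions made for illustration.

```python
import csv
import io

# Illustrative sample input; the column names are assumptions.
CSV_TEXT = """person,movie
Tom Hanks,Toy Story
Tim Allen,Toy Story
Tom Hanks,Sleepless in Seattle
Meg Ryan,Sleepless in Seattle
"""


def load_csv(text, batch_size=2):
    """Turn CSV rows into nodes and relationships, counting simulated commits."""
    nodes = {}            # (label, key) -> properties; repeated keys are merged
    relationships = []    # (person name, relationship type, movie title)
    batches = 0
    pending = 0
    for row in csv.DictReader(io.StringIO(text)):
        nodes.setdefault(("Person", row["person"]), {"name": row["person"]})
        nodes.setdefault(("Movie", row["movie"]), {"title": row["movie"]})
        relationships.append((row["person"], "ACTED_IN", row["movie"]))
        pending += 1
        if pending == batch_size:   # stand-in for committing one transaction
            batches += 1
            pending = 0
    if pending:                     # commit the final partial batch
        batches += 1
    return nodes, relationships, batches
```

Merging by key is what keeps "Tom Hanks" a single node even though he appears in two rows; batching stands in for the transactional writes mentioned above.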
41. Query Comparison: Colleagues of Tom Hanks?
SELECT *
FROM Person AS actor
JOIN ActorMovie AS am1 ON (actor.id = am1.actor_id)
JOIN ActorMovie AS am2 ON (am1.movie_id = am2.movie_id)
JOIN Person AS coll ON (coll.id = am2.actor_id)
WHERE actor.name = 'Tom Hanks'
MATCH
(actor:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(coll:Person)
WHERE actor.name = "Tom Hanks"
RETURN *
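The two-hop shape of this query (actor to movie, movie back to other actors) can be sketched in plain Python. This is only an illustration of the traversal, not how Neo4j executes it; the dictionary `acted_in` and the function `colleagues` are hypothetical, and the filmography is a tiny hand-picked sample.

```python
# Small illustrative sample: actor name -> titles they acted in.
acted_in = {
    "Tom Hanks": {"Toy Story", "Sleepless in Seattle", "You've Got Mail"},
    "Tim Allen": {"Toy Story"},
    "Meg Ryan": {"Sleepless in Seattle", "You've Got Mail"},
    "Keanu Reeves": {"The Matrix"},
}


def colleagues(actor):
    """Return everyone who shares at least one movie with `actor`."""
    # Invert the mapping so each movie knows its cast (the incoming ACTED_IN side).
    cast = {}
    for person, movies in acted_in.items():
        for movie in movies:
            cast.setdefault(movie, set()).add(person)

    result = set()
    for movie in acted_in.get(actor, ()):   # hop 1: actor -> movie
        result |= cast[movie]               # hop 2: movie -> co-actors
    result.discard(actor)                   # do not report the actor themselves
    return sorted(result)
```

Excluding the starting actor mirrors the Cypher pattern, where a single `ACTED_IN` relationship is not matched twice within one pattern; the SQL version as written would also return Tom Hanks paired with himself.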
43. Most prolific actors and their filmography?
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p.name, count(*), collect(m.title) as movies
ORDER BY count(*) desc, p.name asc
LIMIT 10;
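The aggregation in this query (group by actor, count, collect titles, sort, limit) can be mirrored in a few lines of Python. This is a sketch of the logic only; `acted_in_pairs` and `most_prolific` are hypothetical names and the data is a small illustrative sample.

```python
from collections import defaultdict

# Illustrative (actor, title) pairs, one per ACTED_IN relationship.
acted_in_pairs = [
    ("Tom Hanks", "Toy Story"),
    ("Tim Allen", "Toy Story"),
    ("Tom Hanks", "Sleepless in Seattle"),
    ("Meg Ryan", "Sleepless in Seattle"),
    ("Tom Hanks", "You've Got Mail"),
    ("Meg Ryan", "You've Got Mail"),
    ("Keanu Reeves", "The Matrix"),
]


def most_prolific(pairs, limit=10):
    """Return (name, movie count, titles) rows, most movies first, then by name."""
    movies_by_actor = defaultdict(list)
    for name, title in pairs:
        movies_by_actor[name].append(title)        # like collect(m.title)

    rows = [(name, len(titles), titles) for name, titles in movies_by_actor.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))   # count desc, name asc
    return rows[:limit]                            # like LIMIT 10
```

Grouping happens implicitly in Cypher because `p.name` is the only non-aggregated value in the `RETURN` clause; the dictionary keyed by name plays that role here.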
45. Neo4j Query Planner
Cost-based Query Planner since Neo4j 2.2
• Uses database stats to select best plan
• Currently for Read Operations
• Query Plan Visualizer, finds
• Non-optimal queries
• Cartesian Product
• Missing Indexes, Global Scans
• Typos
• Massive Fan-Out
55. Event: Watch a Repository
MATCH (w:WatchEvent)
WITH w LIMIT 1
MATCH p = (w)-[:EVENT_TIME]->(:Minute)<-[:CHILD*]-(:Year),
(w)-[:EVENT_ACTOR]->(u:User)-->(r:Repository)
<-[:WATCHED_REPOSITORY]-(w)
RETURN *;
https://twitter.com/ikwattro/status/618431227100532737
In the near future, many of your apps will be driven by data relationships and not transactions
You can unlock value from business relationships with Neo4j
Presenter Notes - Challenges with current technologies?
Database options are not suited to model or store data as a network of relationships
Performance degrades with the number and levels of relationships, making it harder to use for real-time applications
Not flexible to add or change relationships in realtime
Presenter Notes - How does one take advantage of data relationships for real-time applications?
To take advantage of relationships
Data needs to be available as a network of connections (or as a graph)
Real-time access to relationship information should be available regardless of the size of data set or number and complexity of relationships
The graph should be able to accommodate new relationships or modify existing ones