Neo4j Presentation
Neo4j Presentation from Graph Day Chicago

Published in: Technology

  1. About Me: Max De Marzi, Neo4j Field Engineer. github.com/maxdemarzi (about 200 public repositories), maxdemarzi.com, @maxdemarzi
  2. Technical Experience Doesn’t Matter: 75%, 50%, 95%
  3. All that matters: you go home thinking about graphs
  4. Property Graph: It’s super simple. All you get is the Property Graph Model: Nodes, Relationships, Properties
  5. What you (probably) already know:
  6. The Problem:
     (1) Joins are executed every time you query the relationship
     (2) Executing a join means searching for a key
     (3) B-Tree Index, O(log(n)): your data grows by 10x, your speed slows down by half
     (4) More Data = More Searches: slower performance
  7. Same Data, Different Layout: no more tables, no more foreign keys, no more joins
  8. Relational Databases can’t handle Relationships:
     (1) Degraded Performance: speed plummets as data grows and as the number of joins grows
     (2) Wrong Language: SQL was built with Set Theory in mind, not Graph Theory
     (3) Not Flexible: new types of data and relationships require schema redesign
     (4) Wrong Model: they cannot model or store relationships without complexity
  9. NoSQL Databases can’t handle Relationships:
     (1) Degraded Performance: speed plummets as you try to join data together in the application
     (2) Wrong Languages: lots of wacky “almost sql” languages terrible at “joins”
     (3) Not ACID: Eventually Consistent means Eventually Corrupt
     (4) Wrong Model: they cannot model or store relationships without complexity
  10. What’s Our Secret Sauce?
  11. Neo4j Secret Sauce: (1) Fixed Sized Records (2) “Joins” on Creation (3) Spin Spin Spin through this data structure (4) Pointers instead of Lookups
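The contrast between slide 6 (every join searches an index) and slide 11 ("Pointers instead of Lookups") can be sketched in plain Java. This is only an in-memory illustration under assumed names (`PointerVsLookup`, `Node`, `friendNameByLookup`); it does not show how Neo4j actually lays out its records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Illustrative only: contrasts key lookups with direct references.
// None of these names come from Neo4j; they are assumptions for this sketch.
class PointerVsLookup {

    // "Relational" style: a row stores a foreign key, and every hop
    // searches a sorted index. TreeMap.get is O(log n), like a B-Tree probe.
    static String friendNameByLookup(TreeMap<Integer, String> personIndex, int friendId) {
        return personIndex.get(friendId);
    }

    // "Graph" style: the relationship is resolved once, when it is created,
    // so each later hop just follows a reference.
    static final class Node {
        final String name;
        final List<Node> friends = new ArrayList<>();

        Node(String name) {
            this.name = name;
        }
    }

    // Walks the stored references; no index search happens per hop.
    static List<String> friendNamesByPointer(Node start) {
        List<String> names = new ArrayList<>();
        for (Node friend : start.friends) {
            names.add(friend.name);
        }
        return names;
    }
}
```

The cost of the lookup version grows with the size of the index, while the pointer version only depends on how many relationships the starting node has.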
  12. Real Time Query Performance: remains steady as database grows
      [Chart: Response Time vs. Connectedness and Size of Data Set, comparing Relational and Other NoSQL Databases with Neo4j, from “0 to 2 hops, 0 to 3 degrees, thousands of connections” to “tens to hundreds of hops, thousands of degrees, billions of connections”]
      Neo4j is 1000x faster: reduces minutes to milliseconds
  13. But not for every query: I don’t know the average height of all Hollywood actors, but I do know the Six Degrees of Kevin Bacon
  14. Reimagine your Data as a Graph:
      (1) Better Performance: query relationships in real time
      (2) Right Language: Cypher was purpose-built for graphs
      (3) Flexible and Consistent: evolve your schema seamlessly while keeping transactions
      (4) Right Model: graphs simplify how you think
      Agile, High Performance and Scalable without Sacrifice
  15. Graphs are Whiteboard Friendly: just draw stuff and “walla” there is your data model
  16. Some Models are Easy: Movie Property Graph
  17. Some Models are Easy but not for all Questions: Should Roles be their own Node?
  18. How do you model Flight Data?
  19. How do you model Flight Data? Airport Nodes with Flying To Relationships
  20. How do you model Flight Data? Maybe Flight should be its own Node?
  21. How do you model Flight Data? Don’t we care about Flights only on particular Days?
  22. How do you model Flight Data? What is this trick with the date in the relationship type?
  23. How do you model Flight Data? We don’t need Airports if we model this way!
  24. Let’s get Creative
  25. How do you model Flight Data? Group Destinations together!
  26. How do you model Flight Data? OMG WAT!
  27. Do not try and bend the data. That’s impossible.
  28. How do you model Comic Books? If they can do it, you can do it!
  29. Cloning Twitter: Building a News Feed
      9:00 am @hipster: This is what I had for breakfast! <Insert Image of squirrel food>
      8:30 am @neo4j: Automated tweet telling me about Graph Connect 2017 in NYC on Oct 23-24
      8:12 am @ex-coworker: Stuff I no longer care about.
      8:03 am @someguy: Inspirational Quote of the Day
  30. Cloning Twitter: How do others do it?
  31. Cloning Twitter: How do others do it?
  32. Modeling a Twitter Feed: The Wrong Way
  33. Modeling a Twitter Feed: A Better Way
  34. Modeling a Twitter Feed: Bigger Model
  35. getDegree is your Friend
  36. This is Java. What happened to Cypher?
  37. Java Core API
  38. Java Core API: Easy to Learn (no really)
      • Step by Step from GraphDatabaseService
      • Start a transaction (reads and writes)
      • findNode(Label, Property, Value)
      • findNodes(Label, Property, Value)
      • findNodes(Label)
      • getNodeById(Long)
      • getRelationships(Direction, Type)
      • getProperty(Property, (optional) Default Value)
  39. Java Core API: Get friends of a User
  40. Traversal API
  41. Traversal API: Interesting to Learn
      • Start with the Simple Defaults (order, relationships, depth, uniqueness, etc.)
      • Custom Expanders: where should I go next?
      • Custom Evaluators: I’ve gone there… should I accept this path?
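The two roles named on slide 41 can be sketched without any graph database at all: an expander answers "where should I go next?" and an evaluator answers "should I accept this path?". The class below is a hypothetical, stdlib-only stand-in (the names `TinyTraversal`, `traverse` and `friendsOfFriends` are assumptions); it is not the Neo4j Traversal API and only mimics breadth-first order with node-global uniqueness.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical in-memory traversal; not the Neo4j Traversal API.
class TinyTraversal {

    // expander:  given the path so far, which nodes should be visited next?
    // evaluator: given a path that was reached, should it be part of the result?
    static List<List<String>> traverse(String start,
                                       Function<List<String>, List<String>> expander,
                                       Predicate<List<String>> evaluator) {
        List<List<String>> accepted = new ArrayList<>();
        Set<String> visited = new HashSet<>();      // each node is visited at most once
        Deque<List<String>> queue = new ArrayDeque<>();
        queue.add(List.of(start));
        visited.add(start);
        while (!queue.isEmpty()) {                  // breadth-first order
            List<String> path = queue.poll();
            if (evaluator.test(path)) {
                accepted.add(path);
            }
            for (String next : expander.apply(path)) {
                if (visited.add(next)) {
                    List<String> longer = new ArrayList<>(path);
                    longer.add(next);
                    queue.add(longer);
                }
            }
        }
        return accepted;
    }

    // Example wiring: follow friend edges for at most two hops and
    // accept only the paths that are exactly two hops long.
    static List<List<String>> friendsOfFriends(Map<String, List<String>> friends, String start) {
        return traverse(
            start,
            path -> path.size() > 2
                    ? List.<String>of()
                    : friends.getOrDefault(path.get(path.size() - 1), List.of()),
            path -> path.size() == 3);
    }
}
```

Keeping the two decisions in separate functions is the point of the sketch: the same loop can be reused with a different expander or evaluator.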
  42. Traversal API: Example
  43. Cypher
  44. Cypher: Powerful and Expressive Query Language
      MATCH (:Person { name:"Dan"} ) -[:LOVES]-> (:Person { name:"Ann"} )
      [Diagram: Node (Label, Property) Dan, LOVES, Node (Label, Property) Ann]
  45. Express Complex Queries Easily with Cypher: find all direct reports and how many people they manage, up to 3 levels down [Cypher Query shown beside SQL Query]
      MATCH (boss)-[:MANAGES*0..3]->(sub),
            (sub)-[:MANAGES*1..3]->(report)
      WHERE boss.name = "John Doe"
      RETURN sub.name AS Subordinate, count(report) AS Total
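To make the variable-length pattern on slide 45 concrete, here is a rough plain-Java equivalent. It assumes the MANAGES relationships form a tree (no cycles, one manager per person) and uses hypothetical names (`OrgChart`, `totals`, `countReports`); it is a sketch of what the query computes, not of how Cypher executes it.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory org chart; assumes MANAGES forms a tree.
class OrgChart {

    // Counts people reachable from `manager` in 1..depth MANAGES hops.
    static int countReports(Map<String, List<String>> manages, String manager, int depth) {
        if (depth == 0) {
            return 0;
        }
        int total = 0;
        for (String report : manages.getOrDefault(manager, List.of())) {
            total += 1 + countReports(manages, report, depth - 1);
        }
        return total;
    }

    // For the boss and everyone up to `depth` levels below, records how many
    // people they manage within `depth` levels. As with the MATCH on the slide,
    // people who manage nobody do not appear in the result.
    static Map<String, Integer> totals(Map<String, List<String>> manages, String boss, int depth) {
        Map<String, Integer> result = new LinkedHashMap<>();
        collect(manages, boss, depth, depth, result);
        return result;
    }

    private static void collect(Map<String, List<String>> manages, String person,
                                int hopsLeft, int depth, Map<String, Integer> result) {
        int count = countReports(manages, person, depth);
        if (count > 0) {
            result.put(person, count);
        }
        if (hopsLeft > 0) {
            for (String report : manages.getOrDefault(person, List.of())) {
                collect(manages, report, hopsLeft - 1, depth, result);
            }
        }
    }
}
```

Note that the `*0..3` lower bound means the boss is also returned as a "sub", which the sketch mirrors by starting the collection at the boss.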
  46. Understanding User Behavior: Events, Metrics, Targeting, Searching, Purchase History
  47. Learn from the Experts
      • Alex Beutel, CMU
      • Leman Akoglu, Stony Brook
      • Christos Faloutsos, CMU
      • Graph-Based User Behavior Modeling: From Prediction to Fraud Detection
      • http://www.cs.cmu.edu/~abeutel/kdd2015_tutorial/
  48. User Behavior Challenges • How can we understand normal user behavior?
  49. User Behavior Challenges • How can we understand normal user behavior? • How can we find suspicious behavior?
  50. User Behavior Challenges • How can we understand normal user behavior? • How can we find suspicious behavior? • How can we distinguish the two?
  51. Does your little girl like Rambo?
  52. Demographics: Age
  53. Demographics: Gender
  54. Do Little Girls like Movies other Little Girls Like?
  55. Yes! Little Girls like Movies other Little Girls Like
  56. What do Little Girls Like?
      MATCH (u:User)-[r:RATED]->(m:Movie)
      WHERE u.age = 1 AND u.gender = "F" AND r.stars > 3
      RETURN m.title, COUNT(r) AS cnt
      ORDER BY cnt DESC
      LIMIT 10
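The query on slide 56 is a filter, a group-by-count and a top-N. A minimal sketch of the same steps over an in-memory list follows; the `Rating` class and `topLiked` method are hypothetical, and the alphabetical tie-break is an addition of this sketch (the Cypher query leaves the order of ties unspecified).

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical flattened view of (User)-[RATED]->(Movie); not from the slides.
class DemographicTopMovies {

    static final class Rating {
        final int age;        // age bucket code, as stored on the user
        final String gender;
        final String title;
        final int stars;

        Rating(int age, String gender, String title, int stars) {
            this.age = age;
            this.gender = gender;
            this.title = title;
            this.stars = stars;
        }
    }

    // WHERE age/gender match AND stars > 3, COUNT per title, ORDER BY count DESC, LIMIT.
    static List<String> topLiked(List<Rating> ratings, int age, String gender, int limit) {
        Map<String, Long> counts = ratings.stream()
            .filter(r -> r.age == age && r.gender.equals(gender) && r.stars > 3)
            .collect(Collectors.groupingBy(r -> r.title, Collectors.counting()));
        return counts.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                    .thenComparing(Map.Entry.comparingByKey()))   // tie-break: title A to Z
            .limit(limit)
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());
    }
}
```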
  57. What do Little Girls Like?
  58. What do Men 25-34 Like?
      MATCH (u:User)-[r:RATED]->(m:Movie)
      WHERE u.age = 25 AND u.gender = "M" AND r.stars > 3
      RETURN m.title, COUNT(r) AS cnt
      ORDER BY cnt DESC
      LIMIT 10
  59. What do Men 25-34 Like?
  60. Modeling “Normal” Behavior • Predict Edges (Similar Users)
  61. Modeling “Normal” Behavior • Predict Edges (Movies I should Watch)
  62. What Rating should I give 101 Dalmatians?
      MATCH (me:User {id:1})-[r1:RATED]->(m:Movie)
            <-[r2:RATED]-(:User)-[r3:RATED]->
            (m2:Movie {title:"101 Dalmatians"})
      WHERE ABS(r1.stars-r2.stars) <=1
      RETURN AVG(r3.stars)
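The prediction on slide 62 averages the target movie's ratings from users who rated some other movie within one star of my rating. The sketch below reproduces that over nested maps; the names `RatingPredictor` and `predict` are assumptions, and it assumes each user rates a movie at most once. Like the Cypher pattern, it counts one contribution per matching path, so a user who agrees with me on two movies is weighted twice.

```java
import java.util.Map;
import java.util.OptionalDouble;

// Hypothetical in-memory version of the slide 62 query.
class RatingPredictor {

    // ratings: user -> (movie title -> stars)
    static OptionalDouble predict(Map<String, Map<String, Integer>> ratings,
                                  String me, String target) {
        Map<String, Integer> mine = ratings.getOrDefault(me, Map.of());
        double sum = 0;
        int paths = 0;
        for (Map.Entry<String, Map<String, Integer>> other : ratings.entrySet()) {
            if (other.getKey().equals(me)) {
                continue;                       // only other users
            }
            Integer theirTarget = other.getValue().get(target);
            if (theirTarget == null) {
                continue;                       // they have not rated the target movie
            }
            for (Map.Entry<String, Integer> myRating : mine.entrySet()) {
                if (myRating.getKey().equals(target)) {
                    continue;                   // the shared movie must differ from the target
                }
                Integer theirs = other.getValue().get(myRating.getKey());
                if (theirs != null && Math.abs(myRating.getValue() - theirs) <= 1) {
                    sum += theirTarget;         // one contribution per matching path
                    paths++;
                }
            }
        }
        return paths == 0 ? OptionalDouble.empty() : OptionalDouble.of(sum / paths);
    }
}
```

Returning an empty `OptionalDouble` when nobody qualifies is a choice made for this sketch.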
  63. Modeling “Normal” Behavior • Predict Edges • Predict Node Attributes • Predict Edge Attributes • Clustering and Community Detection
  64. Predict a Star Rating purely on Demographics
      MATCH (u:User)-[r:RATED]->(m:Movie {title:"Toy Story"})
      WHERE u.age = 1 AND u.gender = "F"
      RETURN AVG(r.stars)
  65. Modeling “Normal” Behavior • Predict Edges • Predict Node Attributes • Predict Edge Attributes • Clustering and Community Detection • Fraud Detection
  66. Two Sides of the Same Coin
      Recommendations • Add the relationship that does not exist
      Fraud Detection • Find the relationships that should not exist
  67. Modeling User Behavior • Modeling normal users and detecting anomalies are two sides of understanding user behavior
  68. Recommendation Engines
  69. Hello World Recommendation
  70. Hello World Recommendation
  71. Movie Data Model
  72. Cypher Query: Movie Recommendation
      What are the Top 25 Movies • that I haven't seen • with the same genres as Toy Story • given high ratings • by people who liked Toy Story
      MATCH (watched:Movie {title:"Toy Story"}) <-[r1:RATED]- () -[r2:RATED]-> (unseen:Movie)
      WHERE r1.rating > 7 AND r2.rating > 7
        AND watched.genres = unseen.genres
        AND NOT( (:Person {username:"maxdemarzi"}) -[:RATED|WATCHED]-> (unseen) )
      RETURN unseen.title, COUNT(*)
      ORDER BY COUNT(*) DESC
      LIMIT 25
  73. Movie Data Model
  74. Cypher Query: k-NN Recommendation
      What are the Top 25 Movies • that Zoltan Varju has not seen • using the average rating • by my top 3 neighbors
      MATCH (m:Movie) <-[r:RATED]- (b:Person) -[s:SIMILARITY]- (p:Person {name:'Zoltan Varju'})
      WHERE NOT( (p) -[:RATED|WATCHED]-> (m) )
      WITH m, s.similarity AS similarity, r.rating AS rating
      ORDER BY m.name, similarity DESC
      WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
      WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / LENGTH(ratings) AS recommendation
      ORDER BY recommendation DESC
      RETURN movie, recommendation
      LIMIT 25
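The k-NN query on slide 74 can be read as: per unseen movie, keep the ratings of the k most similar neighbors, average them, and return the best averages. A minimal sketch of that logic follows, assuming the input has already been filtered to movies the person has not rated or watched; `KnnRecommender`, `NeighborRating` and `recommend` are hypothetical names, and ties are broken by title only to keep the output deterministic.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical in-memory version of the slide 74 recommendation.
class KnnRecommender {

    // One neighbor's rating of one unseen movie, with that neighbor's similarity to me.
    static final class NeighborRating {
        final String movie;
        final double similarity;
        final double rating;

        NeighborRating(String movie, double similarity, double rating) {
            this.movie = movie;
            this.similarity = similarity;
            this.rating = rating;
        }
    }

    static Map<String, Double> recommend(List<NeighborRating> candidates, int k, int limit) {
        Map<String, List<NeighborRating>> byMovie =
            candidates.stream().collect(Collectors.groupingBy(c -> c.movie));

        // Average the ratings of the k most similar neighbors for each movie.
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, List<NeighborRating>> e : byMovie.entrySet()) {
            double average = e.getValue().stream()
                .sorted(Comparator.comparingDouble((NeighborRating c) -> c.similarity).reversed())
                .limit(k)
                .mapToDouble(c -> c.rating)
                .average()
                .orElse(0.0);
            scores.put(e.getKey(), average);
        }

        // Highest average first; insertion order of the result preserves the ranking.
        Map<String, Double> top = new LinkedHashMap<>();
        scores.entrySet().stream()
            .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder())
                    .thenComparing(Map.Entry.comparingByKey()))
            .limit(limit)
            .forEach(e -> top.put(e.getKey(), e.getValue()));
        return top;
    }
}
```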
  75. Cypher Stored Procedures
  76. Cypher Stored Procedures: Combine any APIs
  77. Multiple Dimensions: Age, Size, Features, Property, Cost. Don’t use SOLR Facets for this!
  78. Multiple Dimensions: Java Audio Book! What about Publisher? What about Author? What about Publication Year? What about Java Version? What About…. Left parentheses, n, right parentheses, semi-colon!
  79. Discrete Values for Each Dimension: bucket or group values if you have to
  80. Dimensional Model: Nodes for Discrete Dimensional Values. *Use Named Relationship Types instead of HAS
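Slides 79 and 80 suggest bucketing continuous values so that each dimension has a small set of discrete values, each of which can become a node. One way to sketch the bucketing step is shown here for the Cost dimension; the bucket boundaries, labels and the `PriceBuckets` class are invented for illustration.

```java
import java.util.TreeMap;

// Hypothetical bucketing of a continuous "Cost" value into discrete labels.
// Each label would correspond to one dimension node in the graph.
class PriceBuckets {

    // Inclusive lower bound in dollars -> bucket label (boundaries are made up).
    private static final TreeMap<Integer, String> BUCKETS = new TreeMap<>();

    static {
        BUCKETS.put(0, "under $25");
        BUCKETS.put(25, "$25 to $49");
        BUCKETS.put(50, "$50 to $99");
        BUCKETS.put(100, "$100 and up");
    }

    // floorEntry finds the bucket with the greatest lower bound <= the price.
    static String bucketFor(int priceInDollars) {
        if (priceInDollars < 0) {
            throw new IllegalArgumentException("price must not be negative");
        }
        return BUCKETS.floorEntry(priceInDollars).getValue();
    }
}
```

With buckets like these, every product connects to exactly one Cost node, so a multi-dimensional search becomes a matter of finding products connected to the chosen node in each dimension.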
  81. Who remembers this? (1) Stupid Glasses (2) Loud Pants (3) Skate Boards (4) Neon Colors
  82. It’s a Sears Catalogue! Look at how thick they were, even back in 1902!
  83. Street Samurai Catalog: Ares Predator
  84. Cypher Version of the Catalog: with free two day shipping!
  85. A Tree of Data: a tree is a simple graph
  86. Promotions: So fast, it’s not even funny. About 2-4M traversals per second per core. Traversing a 50 level Tree UP costs practically nothing.
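Why walking up a tree is cheap can be seen in a few lines: each step is a single parent reference, so 50 levels means 50 hops regardless of how wide the catalog is. The sketch below assumes, as one plausible reading of slide 86, that promotions are attached to catalog categories and apply to everything underneath; `CatalogNode` and its methods are hypothetical.

```java
// Hypothetical catalog tree with parent references; not taken from the slides.
class CatalogNode {
    final String name;
    final CatalogNode parent;   // null for the root
    final String promotion;     // null when no promotion is attached at this level

    CatalogNode(String name, CatalogNode parent, String promotion) {
        this.name = name;
        this.parent = parent;
        this.promotion = promotion;
    }

    // Number of parent hops from `start` to the root.
    static int hopsToRoot(CatalogNode start) {
        int hops = 0;
        for (CatalogNode n = start; n.parent != null; n = n.parent) {
            hops++;
        }
        return hops;
    }

    // First promotion found while walking up from `start`, or null if there is none.
    static String nearestPromotion(CatalogNode start) {
        for (CatalogNode n = start; n != null; n = n.parent) {
            if (n.promotion != null) {
                return n.promotion;
            }
        }
        return null;
    }
}
```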
  87. Connecting unconnected Things indirectly
  88. What are the Top 10 Jobs for me • that are in the same location I’m in • for which I have the necessary qualifications
  89. Partial Subgraph Search
  90. Data Cleansing: Extract Features, Look for Shared Features, Calculate Similarity, Connect
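The four data cleansing steps on slide 90 can be illustrated with a small sketch. The slide does not say which similarity measure is used, so Jaccard similarity over shared features is an assumption here, as are the names `FeatureSimilarity`, `extractFeatures`, `jaccard` and `shouldConnect`.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch: extract features, find shared ones, score, decide to connect.
class FeatureSimilarity {

    // Extract features: normalize raw values so equal things compare equal.
    static Set<String> extractFeatures(String... rawValues) {
        Set<String> features = new HashSet<>();
        for (String raw : rawValues) {
            String cleaned = raw.trim().toLowerCase(Locale.ROOT);
            if (!cleaned.isEmpty()) {
                features.add(cleaned);
            }
        }
        return features;
    }

    // Calculate similarity: shared features divided by all distinct features.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 0.0;
        }
        Set<String> shared = new HashSet<>(a);
        shared.retainAll(b);
        Set<String> all = new HashSet<>(a);
        all.addAll(b);
        return (double) shared.size() / all.size();
    }

    // Connect: two records are linked when they are similar enough.
    static boolean shouldConnect(Set<String> a, Set<String> b, double threshold) {
        return jaccard(a, b) >= threshold;
    }
}
```

In a graph, the features themselves can be nodes, so "look for shared features" becomes a two-hop traversal from one record to another through a common feature node.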
  91. Using an Anchor
  92. and many more use cases!
  93. Thank You!
