Neo4j Training Introduction

Introduction to Neo4j
Graph Database

About Me!
Max De Marzi
Neo4j Field Engineer
• github.com/maxdemarzi: about 200 public repositories
• maxdemarzi.com: about 175 blog posts
• @maxdemarzi

Agenda
• Relational Databases
• Graph Databases
• The most important slide about Neo4j you will ever see
• A few slides about Modeling
• The Graph Platform
• Neo4j Cloud (aka Aura)
• Talking to Neo4j
• Neo4j Use Cases

Lost and Busted
Relational Databases

What does a Relational Database look like?

Relational Databases look like Trees

https://jvns.ca/blog/2014/09/27/how-does-sqlite-work-part-1-pages/

1 Table
Lots of Pages
Many Hops from
one page to another

https://jvns.ca/blog/2014/10/02/how-does-sqlite-work-part-2-btrees/

First we search the Index B-tree for an id to get the RowId.
Then we search the Table B-tree with that RowId to get to the data.
Inside each page, we do a binary search to decide which page to go to next.
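A minimal Python sketch of that two-step lookup. The page contents and the names INDEX_PAGES, TABLE_PAGES, search_pages and lookup are made up for illustration; real SQLite pages are multi-level trees on disk, not two flat lists.

```python
from bisect import bisect_left, bisect_right

# Hypothetical, flattened "trees": each page is a sorted list of (key, value).
INDEX_PAGES = [[(101, 1), (205, 2)], [(310, 3), (412, 4)]]  # id -> rowid
TABLE_PAGES = [[(1, "Ann"), (2, "Dan")], [(3, "Max"), (4, "Zoe")]]  # rowid -> row


def search_pages(pages, key):
    """Binary search for the right page, then binary search inside it."""
    first_keys = [page[0][0] for page in pages]
    page_no = bisect_right(first_keys, key) - 1
    if page_no < 0:
        return None  # key is smaller than everything stored
    page = pages[page_no]
    keys = [k for k, _ in page]
    slot = bisect_left(keys, key)
    if slot < len(keys) and keys[slot] == key:
        return page[slot][1]
    return None


def lookup(record_id):
    """Step 1: index search gives the rowid. Step 2: table search gives the row."""
    rowid = search_pages(INDEX_PAGES, record_id)
    if rowid is None:
        return None
    return search_pages(TABLE_PAGES, rowid)
```

Every lookup pays for two searches, and each search gets longer as the pages fill up.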

What you (probably) already know:

The Problem
• Joins are executed every time you query the relationship
• Executing a Join means to search for a key
• B-Tree Index: O(log(n))
• Your data grows, your search time goes up
• More Data = More Searches
• Slower Performance
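To make the O(log(n)) point concrete, here is a small Python sketch that counts the comparisons one key search needs as the table grows. It uses a plain binary search over a sorted range as a stand-in for a B-tree; comparisons_to_find and join_cost are illustrative names, not part of any database API.

```python
def comparisons_to_find(sorted_keys, target):
    """Count the comparisons a binary search makes while looking for target."""
    lo, hi, steps = 0, len(sorted_keys), 0
    while lo < hi:
        mid = (lo + hi) // 2
        steps += 1
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return steps


def join_cost(table_size, rows_to_join):
    """A join repeats the key search once per row being joined."""
    per_search = comparisons_to_find(range(table_size), table_size - 1)
    return rows_to_join * per_search


small = join_cost(1_000, rows_to_join=100)
large = join_cost(1_000_000, rows_to_join=100)
```

Same query, same number of joined rows, but the bigger table costs more every single time the query runs.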

Relational Databases can’t handle Relationships
1. Degraded Performance: Speed plummets as data grows and as the number of joins grows
2. Wrong Language: SQL was built with Set Theory in mind, not Graph Theory
3. Not Flexible: New types of data and relationships require schema redesign
4. Wrong Model: They cannot model or store relationships without complexity

Wrangle your data efficiently
Before it eats you alive

NoSQL Databases can’t handle Relationships
1. Degraded Performance: Speed plummets as you try to join data together in the application
2. Wrong Languages: Lots of wacky “almost SQL” languages terrible at “joins”
3. Not ACID: Eventually Consistent means Eventually Corrupt
4. Wrong Model: They cannot model or store relationships without complexity

Graph Databases
The New Hotness

Property Graph Model Components
Nodes
• Can have Labels
• Can have Properties
Relationships
• Relate nodes by type and direction
• Can have Properties
Example graph:
• Person: name: "Dan", born: May 29, 1970, twitter: "@dan"
• Person: name: "Ann", born: Dec 5, 1975
• Car: brand: "Volvo", model: "V70"
• Relationship types: LOVES, LIVES_WITH
• Relationship property: Since: Jan 10, 2011
Same Data, Different Layout
No more Tables, no more Foreign Keys, no more Joins

Doubly Linked List Relationship Layout

The Most Important Slide about
Neo4j you will ever see

1. Fixed-Size Records
2. “Joins” on Creation
3. Spin Spin Spin through this data structure
4. Pointers instead of Searches
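A simplified Python sketch of those four points. It is not Neo4j's actual store format: records here are Python lists rather than fixed-size byte records, and each chain keeps only "next" pointers instead of the doubly linked layout. All names (nodes, rels, create_node, create_relationship, relationships_of) are made up for illustration.

```python
NO_REL = -1

nodes = []  # nodes[i] = id of the first relationship record in node i's chain
rels = []   # rels[i] = [start, end, type, next_for_start, next_for_end]


def create_node():
    nodes.append(NO_REL)
    return len(nodes) - 1


def create_relationship(start, end, rel_type):
    """The "join" happens once, here: link the new record into both chains."""
    rel_id = len(rels)
    rels.append([start, end, rel_type, nodes[start], nodes[end]])
    nodes[start] = rel_id
    nodes[end] = rel_id
    return rel_id


def relationships_of(node_id):
    """Spin through the chain by following pointers; no index search needed."""
    rel_id = nodes[node_id]
    while rel_id != NO_REL:
        start, end, rel_type, next_for_start, next_for_end = rels[rel_id]
        yield rel_id, start, end, rel_type
        rel_id = next_for_start if start == node_id else next_for_end


dan, ann, car = create_node(), create_node(), create_node()
create_relationship(dan, ann, "LOVES")
create_relationship(ann, dan, "LOVES")
create_relationship(dan, ann, "LIVES_WITH")  # direction assumed for the example
```

Because a record id is also its position, following a pointer is a direct jump, and the cost of a hop does not depend on how many records exist.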

Partitions
Each Node’s relationships are partitioned by type and direction.
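One way to picture the partitioning, as a Python sketch. The PartitionedNode class and the relate and neighbors helpers are hypothetical, and the dictionary stands in for whatever structure the database really uses.

```python
from collections import defaultdict


class PartitionedNode:
    def __init__(self, name):
        self.name = name
        # (relationship type, direction) -> neighbors in that partition
        self.partitions = defaultdict(list)


def relate(start, rel_type, end):
    """File the relationship under the right partition on both ends."""
    start.partitions[(rel_type, "OUT")].append(end)
    end.partitions[(rel_type, "IN")].append(start)


def neighbors(node, rel_type, direction):
    """Touch only one partition; other types and directions are never read."""
    return node.partitions.get((rel_type, direction), [])


dan, ann = PartitionedNode("Dan"), PartitionedNode("Ann")
relate(dan, "LOVES", ann)
relate(dan, "LIVES_WITH", ann)  # direction assumed for the example
```

A node with a million FOLLOWS relationships and three LOVES relationships can answer a LOVES question by looking at three entries.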

Real-Time Query Performance
[Chart: Response Time against Connectedness and Size of Data Set, comparing Relational and Other NoSQL Databases with Neo4j]
• Low end: 0 to 2 hops, 0 to 3 degrees, few connections
• High end: 5+ hops, 3+ degrees, thousands of connections
• Neo4j: 1000x Advantage, “Minutes to milliseconds”

But not for every query
I don’t know the average height of all Hollywood actors, but I do know the Six Degrees of Kevin Bacon

Reimagine your Data as a Graph
Agile, High Performance and Scalable without Sacrifice
1. Better Performance: Query relationships in real time
2. Right Language: Cypher was purpose built for Graphs
3. Flexible and Consistent: Evolve your schema seamlessly while keeping transactions
4. Right Model: Graphs simplify how you think

Graphs are Whiteboard Friendly
Just draw stuff and “voilà”, there is your data model

Movie Property Graph
Some Models are Easy

Some Models are Easy but not for all Questions
Should Roles be their own Node?

We’ll talk more about Modeling tomorrow.

Native Graph Technology for Applications & Analytics
[Diagram: the Graph Platform]
• Components: Graph Transactions, Graph Analytics, Data Integration, Development & Admin, Analytics Tooling, Drivers & APIs, Discovery & Visualization
• Users: Developers, Admins, Applications, Business Users, Data Analysts, Data Scientists
• Enterprise Data Hub

Graph Databases: Designed for Connected Data
• Traditional Databases: Store and retrieve data. Real time storage & retrieval
• Big Data Technology: Aggregate and filter data. Long running queries, aggregation & filtering
• Graph Databases: Connections in data. Real-Time Connected Insights

Visually Explore your Neo4j Graph with Bloom
Perspective, Search, Visualization, Exploration, Inspection, Editing
• Business view of the graph enables analysts to discover new insights
• Codeless “Search first” experience makes it easy for non-developers to pick up graphs
• Easy-to-use graph interactions to explore, inspect or edit connected data
• GPU accelerated high performance visualizations enable macro graph views
• Deploys easily with Neo4j Desktop or as a Neo4j Server plug-in component
• Quickly prototype projects and enable collaboration between developers and business users

Neo4j Bloom User Interface
• Prompted Search
• Property Browser & editor
• Category icons and color scheme
• Pan, Zoom & Select

Neo4j BI Connector
The most popular BI tools can now talk live to the world’s most popular graph database
• Best live, seamless integration of graph data with your favorite BI tools
• Familiar UI for end users
• No development effort for IT
• Democratizes access to Neo4j data
• Free to adopt by BI teams of Enterprise Edition customers
[Diagram labels: Tableau, SQL, JDBC, Neo4j BI Connector, Cypher, Neo4j. Users: Business/Data Analyst, Investigator, Data Scientist]

Neo4j for Enterprise Graph Data Science
Practical, Integrated, Intuitive
• Neo4j Graph Data Science Library: Scalable Graph Algorithms & Analytics Workspace
• Neo4j Database: Native Graph Creation & Persistence
• Neo4j Bloom: Visual Graph Exploration & Prototyping

Graph Algorithms & Functions in Neo4j
Pathfinding & Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• A* Shortest Path
• Yen’s K Shortest Path
• Minimum Weight Spanning Tree
• K-Spanning Tree (MST)
• Random Walk
• Depth First Search
• Breadth First Search
Centrality / Importance
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev, Wasserman & Faust
• Betweenness Centrality & Approximate
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
Community Detection
• Triangle Count
• Clustering Coefficients
• Local Clustering Coefficient
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity
• Balanced Triad (identification)
• K-1 Coloring
• Modularity Optimization
Similarity
• Euclidean Distance
• Cosine Similarity
• Node Similarity (Jaccard)
• Overlap Similarity
• Pearson Similarity
• Approximate KNN
Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbors
...and also Auxiliary Functions:
• Random graph generation
• Graph export
• One hot encoding
• Distributions & metrics

Neo4j Integrates with Common Architectures
• From Disparate Silos To Cross-Silo Connections
• From Tabular Data To Connected Data
• From Data Lake to Real-Time Operations

Lots of Plugins…
and you can make your own.

Official
Neo4j Drivers
Community
Neo4j Drivers

Neo4j Cloud offerings to suit every need
Database-as-a-service
• Cloud-native service
• Zero administration
• Pay-as-you-go
• Self-service deployment
• Cloud-native stack
• No access to underlying infra and systems
Self-hosted
• Self hosted and managed
• Any cloud (AWS, GCP, Azure)
• Bring-your-own-license
• Self-manage software, infra in own private cloud
• Own data, tenant, security
• >50% deploy this way
Cloud Managed Services (CMS)
• White-glove fully managed service by Neo4j experts
• Fully customizable deployment model and service levels
• Operate in own data centers or Virtual Private Cloud

Fully managed cloud-native Neo4j graph
database service, for the cloud-first
developer
• Fully automated with zero administration
• Faster innovation with the power of graphs
• Scalable on-demand dynamically
• Worry-free security and reliability
• Simple pay-as-you-go pricing

Talking to Neo4j
The good, the bad and the ugly

Easy to Learn (by Java Devs)
• Step by Step from GraphDatabaseService
• Start a transaction (reads and writes)
• findNode(Label, Property, Value)
• findNodes(Label, Property, Value)
• findNodes(Label)
• getNodeById(Long)
• getRelationships(Direction, Type)
• getProperty(Property, (optional) Default Value)

Not so Easy to Learn (by Java Devs)
• Start with the Simple Defaults: order, relationships, depth, uniqueness, etc.
• Custom Expanders: Where should I go next?
• Custom Evaluators: I’ve gone there… should I accept this path?

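Cypher: Powerful and Expressive Query Language
MATCH (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
• (:Person {name: "Dan"}) is a Node with a Label (Person) and a Property (name)
• -[:LOVES]-> is the Relationship from Dan to Ann
• (:Person {name: "Ann"}) is a Node with a Label (Person) and a Property (name)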

Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, up to 3 levels down
Cypher Query:
MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN sub.name AS Subordinate,
       count(report) AS Total
SQL Query: [shown as an image on the slide]
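For readers new to variable-length patterns, here is a rough Python sketch of what the Cypher query above asks for. It assumes the hierarchy is a tree, so every person is reached by exactly one path; the org chart and the names MANAGES, within and report_counts are invented for the example.

```python
# Hypothetical org chart: manager -> people they manage directly.
MANAGES = {
    "John Doe": ["Alice", "Bob"],
    "Alice": ["Carol", "Dave"],
    "Carol": ["Eve"],
}


def within(start, min_hops, max_hops):
    """Yield everyone reachable from start in min_hops..max_hops MANAGES hops."""
    frontier = [start]
    for depth in range(max_hops + 1):
        if depth >= min_hops:
            yield from frontier
        frontier = [r for person in frontier for r in MANAGES.get(person, [])]


def report_counts(boss):
    """Rough equivalent of RETURN sub.name, count(report)."""
    totals = {}
    for sub in within(boss, 0, 3):                   # [:MANAGES*0..3]
        total = sum(1 for _ in within(sub, 1, 3))    # [:MANAGES*1..3]
        if total:  # a plain MATCH drops subordinates with no reports
            totals[sub] = total
    return totals
```

Calling report_counts("John Doe") walks the chart hop by hop, which is the work the two-line MATCH clause describes declaratively.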

Combine any APIs
Cypher Stored Procedures
https://maxdemarzi.com/2017/01/26/writing-a-cypher-stored-procedure/

Boring Java Code for Non Java Devs
https://maxdemarzi.com/2019/01/28/neo4j-stored-procedures-for-devs-that-dont-know-java-yet/
It’s only 372 Slides.

Highly Valuable Connected Data Use Cases Drive Enterprise Adoption
• Network & IT Operations
• Fraud Detection
• Identity & Access Management
• Knowledge Graph
• Master Data Management
• Real-Time Recommendations

Handling Large Graph Workloads for Enterprises
Real-time promotion recommendations
• Record “Cyber Monday” sales
• About 35M daily transactions
• Each transaction is 3-22 hops
• Queries executed in 4ms or less
• Replaced IBM WebSphere Commerce
Marriott’s Real-time Pricing Engine
• 300M pricing operations per day
• 10x transaction throughput on half the hardware compared to Oracle
• Replaced Oracle database
Handling Package Routing in Real-Time
• Large postal service with over 500k employees
• Neo4j routes 7M+ packages daily at peak, with peaks of 5,000+ routing operations per second

Improving Analytics, ML & AI for Enterprises
Caterpillar’s AI Supply Chain & Maintenance
• 27 Million warranty & service documents parsed for text to knowledge graph
• Graph is context for AI to learn “prime examples” and anticipate maintenance
• Improves satisfaction and equipment lifespan
German Center for Diabetes Research (DZD)
• Connecting 50 research databases, 100k’s of Excel workbooks, 30 bio-sample databases
• Bytes 4 Diabetes Award for use of a knowledge graph, graph analytics, and AI
• Customized views for flexible research angles
Financial Fraud Detection & Recovery (Top 10 Bank)
• Almost 70% of CC fraud was missed
• ~1B Nodes and Relationships to analyze
• Graph analytics with queries & algorithms help find $ millions of fraud in 1st year

Cypher Query: Movie Recommendation
MATCH (watched:Movie {title: "Toy Story"})<-[r1:RATED]-(p2)-[r2:RATED]->(unseen:Movie), (p:Person)
WHERE r1.rating > 7 AND r2.rating > 7 AND p2.gender = "female" AND p2.age < 35
  AND watched.genres = unseen.genres
  AND NOT (p)-[:RATED|WATCHED]->(unseen)
  AND p.username IN ["maxdemarzi", "janedoe", "jamesdean"]
RETURN unseen.title, COUNT(*)
ORDER BY COUNT(*) DESC
LIMIT 25
What are the Top 25 Movies
• that I haven't seen
• with the same genres as Toy Story
• given high ratings
• by women under 35 who liked Toy Story

Let’s try k-nearest neighbors (k-NN)
Cosine Similarity

Cypher Query: Ratings of Two Users
MATCH (p1:Person {name: 'Michael Sherman'})-[r1:RATED]->(m:Movie),
      (p2:Person {name: 'Michael Hunger'})-[r2:RATED]->(m:Movie)
RETURN m.name AS Movie,
r1.rating AS `M. Sherman's Rating`,
r2.rating AS `M. Hunger's Rating`
What are the Movies these 2 users have both rated

Cypher Query: Ratings of Two Users
Calculating Cosine Similarity

Cypher Query: Cosine Similarity
MATCH (p1:Person) -[x:RATED]-> (m:Movie) <-[y:RATED]- (p2:Person)
WITH SUM(x.rating * y.rating) AS xyDotProduct,
SQRT(REDUCE(xDot = 0.0, a IN COLLECT(x.rating) | xDot + a^2)) AS xLength,
SQRT(REDUCE(yDot = 0.0, b IN COLLECT(y.rating) | yDot + b^2)) AS yLength,
p1, p2
MERGE (p1)-[s:SIMILARITY]-(p2)
SET s.similarity = xyDotProduct / (xLength * yLength)
Calculate it for all Person nodes with at least one Movie between them
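The same calculation as a small Python sketch, for two people at a time. Like the query above, it only looks at movies both people have rated. The function name and the dictionary input shape (movie title to rating) are assumptions made for the example.

```python
from math import sqrt


def cosine_similarity(ratings_x, ratings_y):
    """Cosine similarity over the movies both people rated.

    ratings_x and ratings_y map movie title -> rating.
    Returns None when there is nothing to compare.
    """
    shared = ratings_x.keys() & ratings_y.keys()
    if not shared:
        return None
    xy_dot_product = sum(ratings_x[m] * ratings_y[m] for m in shared)
    x_length = sqrt(sum(ratings_x[m] ** 2 for m in shared))
    y_length = sqrt(sum(ratings_y[m] ** 2 for m in shared))
    if x_length == 0 or y_length == 0:
        return None  # all-zero ratings would divide by zero
    return xy_dot_product / (x_length * y_length)
```

Two people with identical ratings on their shared movies score 1.0; the less their ratings line up, the closer the score gets to 0.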

Available in the Graph Data Science Library
• Jaccard Similarity
• Cosine Similarity
• Pearson Similarity
• Euclidean Distance
• Overlap Similarity

Cypher Query: k-NN Recommendation
MATCH (m:Movie) <-[r:RATED]- (b:Person) -[s:SIMILARITY]- (p:Person {name:'Zoltan Varju'})
WHERE NOT( (p) -[:RATED|WATCHED]-> (m) )
WITH m, s.similarity AS similarity, r.rating AS rating
ORDER BY m.name, similarity DESC
WITH m.name AS movie, COLLECT(rating)[0..3] AS ratings
WITH movie, REDUCE(s = 0, i IN ratings | s + i)*1.0 / SIZE(ratings) AS recommendation
ORDER BY recommendation DESC
RETURN movie, recommendation
LIMIT 25
What are the Top 25 Movies
• that Zoltan Varju has not seen
• using the average rating
• by my top 3 neighbors

Top Jobs a user qualifies for in the same location

Just one tiny itsy bitsy problem:
Job Boards get paid by:
• Number of Applicants to a Job
• Wholesale Resume sales
• Selling your aggregated Data

Two Way Matches
Find your soulmate in the graph
• Are they energetic?
• Do they like dogs?
• Have a good sense of humor?
• Neat and tidy, but not crazy about it?
What are the Top 10 Potential Mates for me
• that are in the same location
• are sexually compatible
• have traits I want
• want traits I have
Recommend Love

http://maxdemarzi.com/2013/04/19/match-making-with-neo4j/

Just one tiny itsy bitsy problem:
Dating Boards get paid by:
• Finding lots of “Possible Connections”
• Monthly Subscription Fees
• Keeping you single

https://maxdemarzi.com/2020/03/20/finding-fraud-part-two-revised/
They should not be connected

https://maxdemarzi.com/2019/08/19/finding-fraud/
Credit Card Fraud

Finding similar behavior in chains of data

Graphs are one of the many layers of Fraud Detection

https://maxdemarzi.com/2019/12/04/visualizing-activities/
Understanding your Users

Two Sides of the Same Coin
Recommendations
Add a relationship that does not exist
Fraud
Delete the relationship that should not exist

Arrows
http://www.apcjones.com/arrows/#
It’s simple, it’s fast, it doesn’t run out of ink.
