Scalability and Graph Analytics with Neo4j - Stefan Kolmar, Neo4j

Stefan Kolmar
VP Field Engineering - Neo4j

The Evolution of Databases
TRADITIONAL OLTP/RELATIONAL
BIG DATA TECHNOLOGY

The classic challenges for Telcos
Large Data Volumes
CDRs
Network Metrics
Customer Metrics
Dynamic Access

What Is Different in Neo4j?
Index-Free Adjacency
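Index-free adjacency means that each node holds direct references to its own relationships, so following a relationship is a pointer dereference rather than an index lookup. The Python sketch below is a simplified in-memory model, an assumption for illustration and not Neo4j's actual store format; `Node` and `expand` are hypothetical names. It shows that the work done by a k-hop traversal depends only on the neighbourhoods visited, not on how many other nodes the graph contains.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical node record that stores direct references to its neighbours."""
    name: str
    out: list["Node"] = field(default_factory=list)


def expand(start: Node, hops: int) -> tuple[set[str], int]:
    """Breadth-first expansion up to `hops` hops.

    Returns the names reached and the number of relationships followed.
    No global index is consulted: only each visited node's own list.
    """
    frontier, seen, followed = [start], {start.name}, 0
    for _ in range(hops):
        nxt = []
        for node in frontier:
            for neighbour in node.out:  # pointer chasing, no lookup
                followed += 1
                if neighbour.name not in seen:
                    seen.add(neighbour.name)
                    nxt.append(neighbour)
        frontier = nxt
    return seen, followed


# A small chain a -> b -> c plus a large, unrelated chain of 10,000 nodes.
a, b, c = Node("a"), Node("b"), Node("c")
a.out.append(b)
b.out.append(c)
others = [Node(f"x{i}") for i in range(10_000)]
for left, right in zip(others, others[1:]):
    left.out.append(right)

reached, work = expand(a, hops=2)
print(sorted(reached), work)  # ['a', 'b', 'c'] 2
```

The 10,000 extra nodes do not change the cost of expanding from `a`; a join-based lookup, by contrast, typically pays for index access on every hop.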

[Chart: response time vs. connectedness and size of data set. Relational and other NoSQL databases: 0 to 2 hops, 0 to 3 degrees, thousands of connections. Neo4j: tens to hundreds of hops, thousands of degrees, billions of connections. A 1000x advantage: “minutes to milliseconds”.]

The Largest Investment in Graph Databases

Multi-Database: Use Cases
• B2B SaaS:
Greatly simplified management of DB infrastructure for your customers.
• Multi-tenancy:
A single instance of Neo4j Server/Cluster can serve multiple customers/users within an organization.
• Rapid Testing/Development/Deployment:
Manage separate databases for development, testing, staging, etc. in a single infrastructure.
• Scalability:
Disjoint data is kept in physically separate structures, giving strong isolation.
• Cloud-Friendly:
Databases can be associated with cloud storage and easily detached from one server and attached to another.

Configure and Manage Neo4j Multi-Database
Administration commands:
● CREATE|DROP|START|STOP DATABASE name
Use commands:
● HTTP API: http://server:port/.../database
● Browser & Cypher Shell: :USE database
● Drivers: Session(database)
● Browser: [screenshot listing example databases Network Mgmt and Customer Relations]
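To make the lifecycle commands and per-database sessions concrete, here is a toy Python model of one DBMS hosting several isolated databases. `ToyDbms`, `admin` and `session` are hypothetical names that only mimic the command shapes listed above; this is not the Neo4j driver API, and the rule that a new database starts online is an assumption of the toy.

```python
class ToyDbms:
    """Hypothetical model of one DBMS hosting several isolated, named databases."""

    def __init__(self) -> None:
        self._databases: dict[str, dict] = {}  # name -> {"online": bool, "nodes": list}

    def admin(self, command: str) -> None:
        """Accept 'CREATE|DROP|START|STOP DATABASE name' (toy parser)."""
        verb, keyword, name = command.split()
        verb, keyword = verb.upper(), keyword.upper()
        if keyword != "DATABASE":
            raise ValueError(f"unsupported command: {command}")
        if verb == "CREATE":
            # In this toy, a newly created database is online straight away.
            self._databases[name] = {"online": True, "nodes": []}
        elif verb == "DROP":
            del self._databases[name]
        elif verb in ("START", "STOP"):
            self._databases[name]["online"] = verb == "START"
        else:
            raise ValueError(f"unsupported command: {command}")

    def session(self, database: str) -> list:
        """Return the node store of one database, like opening Session(database)."""
        db = self._databases[database]
        if not db["online"]:
            raise RuntimeError(f"database {database} is stopped")
        return db["nodes"]
```

Writes made through `session("network")` never appear in `session("customers")`, which is the isolation property the use cases rely on.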

Unbounded Scalability in Neo4j 4.0

Fabric: Distributed Graph Query
• Scale-out model
• Two ways of using it:
  • Operate over a single large, decomposed graph
  • Query across disjoint graphs, per business domain
Data Scientists
Run analysis on large, distributed databases.
Developers
Develop large-scale applications on laptops/desktops and deploy them in a network of Neo4j clusters.
Enterprises
Keep data in designated geographies. Analyse graphs without replicating or moving them.

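Cypher Queries: SQL vs. Cypher in Neo4j
// sub is the boss (0 hops) or anyone up to 3 levels below;
// report is anyone 1 to 3 levels below sub
MATCH (boss)-[:MANAGES*0..3]->(sub),
      (sub)-[:MANAGES*1..3]->(report)
RETURN boss.name AS Boss,
       sub.name AS Subordinate,
       count(report) AS Total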
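The Cypher query above can be mirrored in plain Python to show what it computes. This sketch assumes a tree-shaped hierarchy (each person has at most one manager), so every (boss, subordinate) pair is joined by exactly one path and Cypher's `count(report)`, which counts matched rows, equals the number of reports. `MANAGES` and the names in it are hypothetical sample data.

```python
# Hypothetical org chart: manager -> direct reports (a tree).
MANAGES = {
    "Ada": ["Ben", "Cy"],
    "Ben": ["Dee", "Eve"],
    "Cy": ["Fay"],
    "Dee": ["Gus"],
}


def reach(start: str, min_hops: int, max_hops: int) -> list[str]:
    """People reachable from `start` via MANAGES in min_hops..max_hops steps."""
    found, frontier = [], [start]
    for depth in range(max_hops + 1):
        if depth >= min_hops:
            found.extend(frontier)
        frontier = [child for node in frontier for child in MANAGES.get(node, [])]
    return found


def subordinate_totals() -> dict[tuple[str, str], int]:
    people = set(MANAGES) | {p for reports in MANAGES.values() for p in reports}
    totals = {}
    for boss in sorted(people):
        for sub in reach(boss, 0, 3):      # (boss)-[:MANAGES*0..3]->(sub)
            reports = len(reach(sub, 1, 3))  # (sub)-[:MANAGES*1..3]->(report)
            if reports:  # MATCH yields no row when sub has no reports
                totals[(boss, sub)] = reports
    return totals


print(subordinate_totals()[("Ada", "Ben")])  # 3
```

Ben's reports within three levels are Dee, Eve and Gus, hence 3. On a non-tree graph, Cypher would count every distinct path combination instead.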

Multi-graph Cypher Queries (Cypher in Neo4j 4.0)
// corporate is the Fabric database; graphIds() lists the graphs it spans.
// The subquery runs once per graph and its rows are combined.
UNWIND corporate.graphIds() AS gid
CALL {
  USE corporate.graph(gid)
  MATCH (boss)-[:MANAGES*0..3]->(sub),
        (sub)-[:MANAGES*1..3]->(report)
  RETURN boss.name AS Boss,
         sub.name AS Subordinate,
         count(report) AS Total
}
RETURN Boss, Subordinate, Total ORDER BY Total
• Executes queries in parallel on multiple databases, combining or aggregating results.
• Chains queries together from multiple databases for sophisticated real-time analyses.
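The fan-out/combine pattern behind the multi-graph query can be sketched in Python, with threads standing in for parallel execution on remote databases. `SHARDS`, `run_on_shard` and `fabric_query` are hypothetical; the per-shard rows play the role of what each `CALL { USE ... }` subquery returns, and Fabric's actual planner and network protocol are not modelled.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-graph results: graph id -> (Boss, Subordinate, Total) rows.
SHARDS = {
    0: [("Ada", "Ben", 3), ("Ada", "Cy", 1)],
    1: [("Kim", "Lee", 5)],
    2: [("Max", "Ned", 2), ("Max", "Oz", 4)],
}


def run_on_shard(gid: int) -> list[tuple[str, str, int]]:
    """Stand-in for the subquery executed against corporate.graph(gid)."""
    return SHARDS[gid]


def fabric_query(graph_ids: list[int]) -> list[tuple[str, str, int]]:
    # Fan out: one task per graph id, like UNWIND corporate.graphIds() AS gid.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(run_on_shard, graph_ids))
    # Combine: union the per-graph rows, then ORDER BY Total on the coordinator.
    rows = [row for part in partials for row in part]
    return sorted(rows, key=lambda row: row[2])


for row in fabric_query(list(SHARDS)):
    print(row)
```

Sorting happens only after all partial results arrive, which is why the final `ORDER BY` sits outside the `CALL` block.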

How will this help a Telco to scale?
The foundation: Causal Cluster
The evolution: Fabric
[Diagram: the Large Data Volumes block (CDRs, Network Metrics, Customer Metrics) repeated three times, labelled "Scaling R/W access"]

[Diagram: a user connects to Neo4j DBMS instances. Cluster A (Core 1, Core 2, Core 3, Replica 1, Replica 2) and Cluster B (Core 1, Core 2, Core 3) hold the Network Metrics shards NM1, NM2 and NM3.]
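In a Causal Cluster, writes go to the leader core member while reads can also be served by followers and read replicas; combining that with shards spread over clusters is what scales read/write access. The Python router below is a minimal sketch under stated assumptions: the shard-to-cluster mapping, member names and fixed leader are hypothetical (a real leader is elected and can change), and `ClusterRouter` is not a Neo4j driver class.

```python
from itertools import cycle


class ClusterRouter:
    """Hypothetical router for one cluster: writes to the leader, reads round-robin."""

    def __init__(self, leader: str, readers: list[str]) -> None:
        self.leader = leader  # assumed fixed here; a real cluster elects its leader
        self._readers = cycle(readers)

    def route(self, mode: str) -> str:
        if mode == "WRITE":
            return self.leader
        if mode == "READ":
            return next(self._readers)
        raise ValueError(f"unknown access mode: {mode}")


# Hypothetical placement of the Network Metrics shards from the diagram.
SHARD_TO_CLUSTER = {"NM1": "A", "NM2": "A", "NM3": "B"}
ROUTERS = {
    "A": ClusterRouter("A/core1", ["A/replica1", "A/replica2"]),
    "B": ClusterRouter("B/core1", ["B/core2", "B/core3"]),
}


def route(shard: str, mode: str) -> str:
    """Pick the cluster that owns the shard, then the member for this access mode."""
    return ROUTERS[SHARD_TO_CLUSTER[shard]].route(mode)
```

Adding read replicas grows read capacity within a cluster, while adding clusters and shards grows write capacity.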

Neo4j 4.0 Scalability in Action
Sharding the LDBC Social Network Benchmark
Data Model:
• 1 shard for the Persons graph
• N shards for the Forums graph
Results:
• Up to 300x reduced latency
• Up to 10x performance improvement
Sources: http://ldbcouncil.org/developer/snb and https://neo4j.com/fosdem20
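The data model above, one Persons shard plus N Forums shards, can be sketched as a placement function. The hash partitioning of forums by key is an assumption for illustration; the benchmark's actual partitioning scheme is not given here, and `shard_for`, `N_FORUM_SHARDS` and the shard names are hypothetical.

```python
import zlib

N_FORUM_SHARDS = 4  # hypothetical value of N


def shard_for(kind: str, key: str) -> str:
    """Return the shard that stores an entity (hypothetical shard names)."""
    if kind == "Person":
        return "persons"  # a single shard holds the whole Persons graph
    if kind == "Forum":
        # crc32 is stable across processes, unlike Python's salted str hash()
        return f"forums{zlib.crc32(key.encode('utf-8')) % N_FORUM_SHARDS}"
    raise ValueError(f"unknown entity kind: {kind}")
```

Keeping the densely connected Persons graph in one shard avoids cross-shard hops for friendship traversals, while forum content, which grows fastest, spreads across N shards.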

Security and Data Privacy
• Based on role-based access control (RBAC) for graphs
• Restrictions on what data different users can see, applied to all database interactions
• Implicit security view of the data for each user through schema-based security definitions
• Grant/deny permissions to traverse, read or write data based on node labels, relationship types, database names and property names
• Security rules are replicated across the cluster via roles that are associated with the users
[Diagram: users Bob and Joe with security clearance labels Baseline_Personnel_Security_Standard, Security_Check, Counter_Terrorism_Check and Developed_Vetting]

Security and Data Privacy in Practice
Constraints:
• Call Centre Agent:
  -> needs the Doctor’s name
  -> is not allowed to read the diagnosis
• Doctor:
  -> can view patient records
  -> can view patient diagnoses

Security Config
// Doctors get wide-ranging access
GRANT ACCESS ON DATABASE healthcare TO doctor;
GRANT TRAVERSE ON GRAPH healthcare TO doctor;
GRANT READ {*} ON GRAPH healthcare TO doctor;
GRANT WRITE ON GRAPH healthcare TO doctor;
// Agents get narrower access: traverse everything,
// but read only the name property of Doctor and Patient nodes
GRANT ACCESS ON DATABASE healthcare TO agent;
GRANT TRAVERSE ON GRAPH healthcare TO agent;
GRANT READ {name} ON GRAPH healthcare NODES Doctor TO agent;
GRANT READ {name} ON GRAPH healthcare NODES Patient TO agent;

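Call Centre Agent
// Run as the agent role: no READ privilege covers diagnosis
// properties, so dia.name is expected to be returned as null
MATCH (:CallcenterAgent {name: 'Alice'})
      <-[:CALLED]-(p:Patient)-[:HAS_DIAGNOSIS]-(dia)
      <-[:ESTABLISHED]-(d:Doctor)
RETURN p.name, d.name, dia.name;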
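The effect of the grants on the agent's query can be modelled in a few lines of Python. This is a minimal sketch, not Neo4j's security implementation: privileges are hypothetical `(action, label, property)` tuples with `"*"` as a wildcard, the sample data and the `Diagnosis` label are assumptions, and unreadable properties come back as `None`, mirroring the expected behaviour that they are returned as null.

```python
# Hypothetical privilege tuples: (action, label, property); "*" matches anything.
GRANTS = {
    "doctor": {("TRAVERSE", "*", "*"), ("READ", "*", "*"), ("WRITE", "*", "*")},
    "agent": {
        ("TRAVERSE", "*", "*"),
        ("READ", "Doctor", "name"),
        ("READ", "Patient", "name"),
    },
}


def can_read(role: str, label: str, prop: str) -> bool:
    return any(
        action == "READ" and granted_label in ("*", label) and granted_prop in ("*", prop)
        for action, granted_label, granted_prop in GRANTS[role]
    )


def visible_name(role: str, node: tuple[str, dict]) -> object:
    """Return node's name as the role sees it: None (null) if not readable."""
    label, props = node
    return props["name"] if can_read(role, label, "name") else None


# Hypothetical data for one matched path; "Diagnosis" is an assumed label.
PATIENT = ("Patient", {"name": "Pat Smith"})
DOCTOR = ("Doctor", {"name": "Dr. Jones"})
DIAGNOSIS = ("Diagnosis", {"name": "Asthma"})


def query_row(role: str) -> tuple:
    """Model of RETURN p.name, d.name, dia.name for that path."""
    return tuple(visible_name(role, node) for node in (PATIENT, DOCTOR, DIAGNOSIS))


print(query_row("agent"))   # ('Pat Smith', 'Dr. Jones', None)
print(query_row("doctor"))  # ('Pat Smith', 'Dr. Jones', 'Asthma')
```

Because the agent can still traverse the diagnosis node, the path matches; only the property value is hidden.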

Reactive Architecture in Neo4j 4.0
• Flow control throughout the stack, allowing the client application to fully control the production and flow of records within a result
• Synchronous/Asynchronous execution
• Based on Reactive Streams, with non-blocking backpressure
• Client applications can pull or discard the whole result or N elements
• Queries can also be cancelled gracefully
• Exposed through a reactive API in Drivers v4.0
• Use Cases:
  • Long queries with large result sets
  • Paged results
  • Thin/small clients
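Demand-driven flow control can be illustrated with a Python generator: records are produced only when the consumer asks for the next N, and closing the stream cancels the rest. This is a pull-based analogue under stated assumptions, not the Reactive Streams API or the Neo4j driver's reactive API; Python generators are synchronous, so it shows backpressure and cancellation but not non-blocking execution. `Producer` and `pull` are hypothetical names.

```python
from collections.abc import Generator, Iterator
from itertools import islice


class Producer:
    """Hypothetical result source that creates a record only when one is pulled."""

    def __init__(self, total: int) -> None:
        self.total = total
        self.produced = 0

    def records(self) -> Generator[int, None, None]:
        for i in range(self.total):
            self.produced += 1  # runs only when the consumer asks for another record
            yield i


def pull(stream: Iterator[int], n: int) -> list[int]:
    """Signal demand for up to n more records and receive them."""
    return list(islice(stream, n))


producer = Producer(total=1_000_000)
stream = producer.records()
first_page = pull(stream, 10)
second_page = pull(stream, 10)
stream.close()  # cancel: the generator is finalized and produces nothing more
print(first_page[0], second_page[0], producer.produced)  # 0 10 20
```

Only 20 of the million potential records were ever produced, which is the point of paged results and thin clients.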

Graph Data Science
Science-driven approach to gain knowledge from the relationships and structures in data, typically to power predictions. Uses multi-disciplinary workflows that may include queries, statistics, algorithms and machine learning.
• Graph Recipes & Analytics:
Answers specific questions to gain insights from connections in existing/historical data. Approaches typically include global queries and algorithms, with direct use of the results.
• Graph Enhanced ML & AI:
Training models (ML) with graph-structured data, to be used to emulate human, probabilistic decisions within a solution/application (AI system).

The Graph Data Science Library
Product supported and under active development
Optimized for Analytics
• Leverage custom data structures optimized for global traversals and aggregation
• Flexibly decompose and reshape your graph for specific use cases
Algorithms for Insights
• Robust algorithms that are highly parallelized and scale to billions of nodes
• Early access to dozens of experimental implementations
Intuitive Interface
• Drastically simplified and standardized API that enables custom configurations
• Documentation, training, and examples that make getting started simple

Graph Data Science
Analytics projections:
- Specialized data structure for algorithms, capable of supporting billions of nodes
- Cypher loaders for experimentation
- Quickly reshape, combine, aggregate, and deduplicate your transactional data
- Support for multiple node labels, relationship types, and properties
- Manage multiple in-memory analytics graphs for different workloads
- A memory footprint that allows large-scale use
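An analytics projection turns the transactional graph into a compact structure that algorithms can scan quickly. The Python sketch below projects one node label and one relationship type into a compressed sparse row (CSR) layout, collapsing duplicate relationships along the way. CSR is one common compact layout chosen here for illustration; it is an assumption and not a description of the library's internal format, and `project_csr` is a hypothetical name.

```python
def project_csr(nodes, rels, node_label, rel_type):
    """Project one label/type into CSR form, deduplicating relationships.

    nodes: list of (node_id, label); rels: list of (source, type, target).
    Returns (ids, offsets, targets): the neighbours of ids[i] are
    targets[offsets[i]:offsets[i + 1]], stored as dense integer indices.
    """
    ids = sorted(nid for nid, label in nodes if label == node_label)
    index = {nid: i for i, nid in enumerate(ids)}  # original id -> dense index
    adjacency = [set() for _ in ids]  # sets collapse duplicate relationships
    for src, typ, dst in rels:
        if typ == rel_type and src in index and dst in index:
            adjacency[index[src]].add(index[dst])
    offsets, targets = [0], []
    for neighbours in adjacency:
        targets.extend(sorted(neighbours))
        offsets.append(len(targets))
    return ids, offsets, targets


nodes = [("a", "Person"), ("b", "Person"), ("c", "Person"), ("x", "Company")]
rels = [
    ("a", "KNOWS", "b"),
    ("a", "KNOWS", "b"),  # duplicate, removed by the projection
    ("a", "KNOWS", "c"),
    ("b", "KNOWS", "c"),
    ("a", "WORKS_AT", "x"),  # other type, filtered out
]
print(project_csr(nodes, rels, "Person", "KNOWS"))
```

Two flat integer arrays replace per-node objects, which is why such projections can keep memory per relationship small.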
Graph algorithms & more:
- 40+ algorithms in 5 categories: community, centrality, similarity, pathfinding, and link prediction
- Helper algorithms such as graph generation, one-hot encoding, and random walk
- Early previews of new implementations in the alpha & beta namespaces
- Supported, scalable algorithms that offer seeding, determinism, and incremental calculations
- Estimate mode for memory requirements
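Random walk and seeding/determinism both appear in the list above. The Python sketch below shows a uniform first-order random walk driven by a private, seeded random generator, so the same seed always reproduces the same walk. The uniform transition rule, the sample graph and `random_walk` are assumptions for illustration; the library's own walk options are not modelled.

```python
import random

# Hypothetical undirected graph as adjacency lists.
GRAPH = {
    "a": ["b", "c"],
    "b": ["a", "c", "d"],
    "c": ["a", "b"],
    "d": ["b"],
}


def random_walk(graph: dict[str, list[str]], start: str, length: int, seed: int) -> list[str]:
    """Uniform random walk of `length` steps; a fixed seed makes it reproducible."""
    rng = random.Random(seed)  # private generator, unaffected by other random calls
    walk = [start]
    for _ in range(length):
        neighbours = graph[walk[-1]]
        if not neighbours:  # dead end: stop early
            break
        walk.append(rng.choice(neighbours))
    return walk


print(random_walk(GRAPH, "a", 5, seed=42))
```

Using a dedicated `random.Random(seed)` rather than the module-level functions keeps results deterministic even when other code also draws random numbers.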

Graph Data Science Algorithms
Generally unsupervised
Graph algorithms, a subset of data science algorithms that come from network science, enable reasoning about network structure. Categories:
• Pathfinding and Search
• Centrality (Importance)
• Community Detection
• Heuristic Link Prediction
• Similarity
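As an example from the centrality category, PageRank scores a node by the rank flowing into it from its neighbours. The Python sketch below is a plain, single-threaded power iteration, not the library's parallel implementation; the damping factor 0.85 and the treatment of dangling nodes (spreading their rank evenly) are common conventions assumed here, and the sample graph is hypothetical.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85, iterations: int = 50) -> dict[str, float]:
    """Power-iteration PageRank on a dict of node -> list of out-neighbours.

    Every target must also be a key. Dangling nodes (no out-links) spread
    their rank evenly over all nodes, so the scores always sum to 1.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not graph[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in graph[v]:
                new[w] += damping * rank[v] / len(graph[v])
        rank = new
    return rank


# Hypothetical graph: c is linked from a, b and d, so it should rank highest.
scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]})
print(max(scores, key=scores.get))  # c
```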

Conclusions
• Neo4j provides:
  • Scalability for Telcos
  • Carrier-grade high availability with Causal Cluster
  • Security features to fulfill privacy requirements
  • Graph Analytics to provide Data Science infrastructure for Telcos
