INTRO TO NEO4J: Uncovering Hidden Insights through Graphs

Calin Constantinov / 18 April 2018

Honest Agenda
1. Shameless insufferable bragging
2. Needlessly boring theoretical stuff
3. Irrelevant graph model example
4. Disappointingly simple, confusingly-hasty, hands-on demo

Identifying holiday seasons
Social-validation seeking, attention-wh**ing

My community
Who’s got swag?

My closest friends
We go way back, like spinal cords and car seats!

Mathematical background
Can you find the
the mistake?

Intermission
…but this is all purely theoretical…

Intermission
…it most likely doesn’t work in real life…

Intermission
…I mean…wouldn’t that be kinda wrong?

Intermission
Eh… don’t worry about it…

Endorsements
She didn’t endorse me back :(
Percentage distribution for top 20 endorsed skills.

Wide-range and niche companies
Finding the perfect job for your hipster-esque coding needs
Percentage distribution for top 3 endorsed skills for selected companies.

Loyal employees
#relationshipgoals
Top 15 companies by average time an employee has a position in the company (in months).

Case study: Minimalist social network
Epic battle!
Let’s consider a social network with 1 000 000 users, each having 50 friends.
SQL has to “fake” relationships (don’t we all?).
[Figure: the SQL approach and the graph approach, side by side.]
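A minimal sketch of the contrast, using a hypothetical five-user network rather than the million-user one. The table name `friend`, the user names, and both helper functions are made up for illustration. The relational version models friendships as rows in a join table and pays one JOIN per hop. The graph-style version follows each user's own adjacency set.

```python
import sqlite3

# Hypothetical toy data: a tiny undirected friendship graph.
friendships = [("ana", "bob"), ("bob", "cat"), ("cat", "dan"), ("ana", "eve")]

# Relational version: friendships are rows in a join table, every hop is a JOIN.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friend (user_name TEXT, friend_name TEXT)")
for a, b in friendships:
    # Store both directions so the relation behaves as undirected.
    conn.execute("INSERT INTO friend VALUES (?, ?), (?, ?)", (a, b, b, a))


def friends_of_friends_sql(user):
    """Two hops = one self-join on the friend table."""
    rows = conn.execute(
        """
        SELECT DISTINCT f2.friend_name
        FROM friend AS f1
        JOIN friend AS f2 ON f2.user_name = f1.friend_name
        WHERE f1.user_name = ? AND f2.friend_name <> ?
        ORDER BY f2.friend_name
        """,
        (user, user),
    ).fetchall()
    return [name for (name,) in rows]


# Graph-style version: each user holds its own set of neighbours.
adjacency = {}
for a, b in friendships:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)


def friends_of_friends_graph(user):
    """Two hops = follow the neighbour sets twice, no table scan or join."""
    found = set()
    for friend in adjacency.get(user, ()):
        found.update(adjacency[friend])
    found.discard(user)
    return sorted(found)
```

Both functions return the same names. The difference is in how the work grows: each extra hop adds another self-join in SQL, but only another loop over local neighbour sets in the graph version.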

Minimalist social network (cont’d)
S14E04: You have 0 friends
Also consider an asymmetric (directed) scenario: Who are my followers?
Reversing the direction of a traversal would be difficult with non-native graph processing.
For that, you must either create a costly reverse-lookup index for each traversal or perform a
brute-force search through the original index.
The results are in!
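The reverse-lookup problem above can be sketched in a few lines. This is a toy, not a benchmark: the `follows` data and both function names are hypothetical. With only an outgoing index, "who follows me?" needs a scan over every entry. If each relationship is recorded at both endpoints, it is a direct lookup.

```python
from collections import defaultdict

# Hypothetical data: (follower, followed) pairs.
follows = [("ana", "bob"), ("cat", "bob"), ("bob", "dan")]

# Outgoing-only index: answers "whom do I follow?" directly.
following = defaultdict(set)
for follower, followed in follows:
    following[follower].add(followed)


def followers_brute_force(user):
    """Reverse question on an outgoing-only index: scan every entry."""
    return sorted(f for f, targets in following.items() if user in targets)


# Relationship recorded at both endpoints: the reverse direction is stored too.
followers_of = defaultdict(set)
for follower, followed in follows:
    followers_of[followed].add(follower)


def followers_direct(user):
    """Reverse question answered by a single lookup."""
    return sorted(followers_of.get(user, ()))
```

The brute-force version touches every user in the index, while the direct version touches only the followers themselves.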

Native Graph: Index-free adjacency
Lightning McQueen
Index-free adjacency ensures lightning-fast retrieval without the need for indexes.
Query time is proportional only to the portion of the graph searched, not to the total size of the graph.
Each node directly references its adjacent nodes, acting as a micro-index for all nearby nodes.
Bidirectional joins are effectively precomputed and stored in the database as relationships.
Relationships – rather than over-reliance on indexes – are used for efficient traversals.
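A rough in-memory analogy of the idea, assuming a hypothetical `Node` class (this is not how Neo4j is implemented). Each node holds direct references to its neighbours, so a local traversal touches only the nodes it reaches, however many unrelated nodes exist elsewhere.

```python
class Node:
    """A node that references its adjacent nodes directly (no global index)."""

    def __init__(self, name):
        self.name = name
        self.neighbours = []


def connect(a, b):
    """Record the relationship at both endpoints."""
    a.neighbours.append(b)
    b.neighbours.append(a)


def reachable_within(start, max_hops):
    """Return (sorted names reached, number of nodes touched including start)."""
    seen = {start}
    frontier = [start]
    for _ in range(max_hops):
        next_frontier = []
        for node in frontier:
            for neighbour in node.neighbours:
                if neighbour not in seen:
                    seen.add(neighbour)
                    next_frontier.append(neighbour)
        frontier = next_frontier
    return sorted(n.name for n in seen if n is not start), len(seen)


# A small cluster a-b-c, plus 1000 unrelated nodes chained together.
a, b, c = Node("a"), Node("b"), Node("c")
connect(a, b)
connect(b, c)
others = [Node(f"x{i}") for i in range(1000)]
for left, right in zip(others, others[1:]):
    connect(left, right)
```

Calling `reachable_within(a, 2)` touches three nodes whether the rest of the graph holds a thousand nodes or a billion.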

Index-free adjacency (cont’d)
For native graph databases, node records point to lists of relationships, labels and properties.
Graph data is kept in store files, each of which contains the data for a specific part of the graph (nodes, relationships, labels, properties).
Example: The node store is a fixed-size record store, where each record is 15 bytes in length.
The database can directly compute a record’s location, at cost O(1).
Let's get dirty!
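A sketch of why fixed-size records give O(1) lookups: the byte offset is simply the record ID times the record size. The 15-byte field layout below (in-use flag, first relationship ID, first property ID, label field, extra flags) is a simplified assumption for illustration, and `write_node` / `read_node` are hypothetical helpers, not Neo4j code.

```python
import struct

RECORD_SIZE = 15

# Assumed layout for illustration only: in-use flag (1 byte), first relationship
# ID (4), first property ID (4), label field (5), extra flags (1) = 15 bytes.
NODE_RECORD = struct.Struct("<BII5sB")
assert NODE_RECORD.size == RECORD_SIZE


def write_node(store, node_id, first_rel, first_prop):
    """Write a record at the offset computed from its ID."""
    offset = node_id * RECORD_SIZE
    NODE_RECORD.pack_into(store, offset, 1, first_rel, first_prop, b"\x00" * 5, 0)


def read_node(store, node_id):
    """Locate a record with one multiplication: no search, no index."""
    offset = node_id * RECORD_SIZE
    in_use, first_rel, first_prop, _labels, _extra = NODE_RECORD.unpack_from(store, offset)
    return {"in_use": bool(in_use), "first_rel": first_rel, "first_prop": first_prop}


# A pretend node store with room for 100 records.
store = bytearray(RECORD_SIZE * 100)
write_node(store, 42, first_rel=7, first_prop=3)
```

Here `read_node(store, 42)` jumps straight to byte 630 (42 × 15) and decodes the record. The cost is the same for record 0 and for record 99.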

Index-free adjacency (cont’d’d)
//TODO: find super awesome pun!
With fixed-size records and pointer-like record IDs, traversals are implemented simply
by chasing pointers around a data structure, which can be performed at very high speed.
Neo4j 2.x could store 34 billion nodes. Neo4j 3.x deploys dynamic pointer compression to
lift that limit, allowing a practically unlimited number of nodes.
Conceptually, it all comes down to this:

Index-free adjacency (cont’d’d’d)
And find I'm king of the hill, top of the heap!
Neo4j 2.x lazy loading on-heap object-cache:
Neo4j 3.x relies only on a scalable, high-performing, off-heap LRU-K page cache.
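To make the LRU-K idea concrete, here is a toy cache with K = 2. It is a simplified sketch, not Neo4j's page cache: the `LRUKCache` class and its `load` callback are hypothetical, and the access history of evicted pages is dropped, which full LRU-K would retain. The victim is the page whose K-th most recent access is oldest. Pages seen fewer than K times are evicted first.

```python
import itertools


class LRUKCache:
    """Toy LRU-K cache: evict the page with the oldest K-th most recent access."""

    def __init__(self, capacity, k=2):
        self.capacity = capacity
        self.k = k
        self.pages = {}    # page_id -> page data
        self.history = {}  # page_id -> up to k access times, oldest first
        self.clock = itertools.count()

    def _touch(self, page_id):
        times = self.history.setdefault(page_id, [])
        times.append(next(self.clock))
        del times[:-self.k]  # keep only the k most recent accesses

    def _victim(self):
        def key(page_id):
            times = self.history[page_id]
            # Fewer than k accesses: treat the k-th access as infinitely old.
            kth = times[0] if len(times) == self.k else -1
            return (kth, times[-1])  # ties broken by plain recency

        return min(self.pages, key=key)

    def get(self, page_id, load):
        """Return the page, calling load(page_id) on a miss."""
        if page_id not in self.pages:
            if len(self.pages) >= self.capacity:
                victim = self._victim()
                del self.pages[victim]
                del self.history[victim]
            self.pages[page_id] = load(page_id)
        self._touch(page_id)
        return self.pages[page_id]


def load_page(page_id):
    """Hypothetical stand-in for reading a page from disk."""
    return f"data-{page_id}"


cache = LRUKCache(capacity=2, k=2)
cache.get("A", load_page)
cache.get("A", load_page)  # A now has two recorded accesses
cache.get("B", load_page)  # B has one
cache.get("C", load_page)  # cache is full: B is evicted, not A
```

Plain LRU would have evicted A here, because B was touched more recently. LRU-K keeps A because it has proven to be accessed repeatedly, which makes the cache more resistant to one-off scans.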

Key features for Neo4j
Fully ACID database.
Scalability and HA capabilities.
Intuitive data queries using Cypher.
Open source.
Neo4j takes things seriously: relationships are considered first class citizens!
Is returning something random considered eventual consistency?

Cypher
‘Member ASCII art? (っ◕‿◕)っ
Powerful and expressive query language, often claimed to need 10x to 100x less code than SQL for connected-data queries.
Declarative language for describing patterns in graphs visually using an ASCII-art syntax.
Comes with a profiler / interactive query planner.
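A small illustration of the ASCII-art syntax, wrapped in Python so it can be passed to a driver session. The `Person` label, the `FRIEND` relationship type, and the helper function are hypothetical. The function only assumes an object with a `run(query, **params)` method that yields records indexable by column name, as the official Neo4j Python driver's session does.

```python
# In Cypher, nodes are drawn as (parentheses) and relationships as -[arrows]->.
FRIENDS_OF_FRIENDS = """
MATCH (me:Person {name: $name})-[:FRIEND]->(friend)-[:FRIEND]->(fof)
WHERE fof <> me
RETURN DISTINCT fof.name AS name
"""


def friends_of_friends(session, name):
    """Run the pattern query through any session-like object (assumed API)."""
    result = session.run(FRIENDS_OF_FRIENDS, name=name)
    return [record["name"] for record in result]
```

The MATCH line reads almost like the drawing of the pattern itself: me, an arrow to a friend, an arrow to a friend of that friend.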

Looks come first (and you know it)
Visual models are the easiest for humans to comprehend. Even the ER model is itself a graph!
Businesses need tools for capturing multiple-domain semantics within a visual data model.
Data interconnectivity and topology are at least as important as the data itself.
Let's Get Visual! Visual!

Making sense of data
The value of data isn’t represented by its volume, but by our capacity to understand the
relationships between its constituent elements.
Graph databases offer analytical and discovery capabilities
that no other persistence solution can provide.
Moreover, modern data is starting to have an obvious graph-like structure. SQL does not
naturally support graph-specific operations (e.g. DFS, BFS).
With a traditional approach, such queries often take too long to be run on demand.
That’s not necessarily the case for graphs!
Go graph like all the other cool kids!
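For reference, breadth-first search (BFS), one of the graph-specific operations mentioned above, is only a few lines over an adjacency list. This is a generic sketch with a hypothetical toy graph, not tied to any database.

```python
from collections import deque


def bfs_shortest_path(graph, start, goal):
    """Return a shortest path (fewest hops) from start to goal, or None."""
    if start == goal:
        return [start]
    parents = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbour in graph.get(current, ()):
            if neighbour not in parents:
                parents[neighbour] = current
                if neighbour == goal:
                    # Walk the parent links back to the start.
                    path = [goal]
                    while parents[path[-1]] is not None:
                        path.append(parents[path[-1]])
                    return path[::-1]
                queue.append(neighbour)
    return None


# Hypothetical directed graph as adjacency lists.
graph = {
    "ana": ["bob", "eve"],
    "bob": ["cat"],
    "cat": ["dan"],
    "eve": ["dan"],
    "dan": [],
}
```

With this data, the search from "ana" to "dan" finds the two-hop route through "eve" rather than the three-hop route through "bob" and "cat".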

MODELLING A
PROFESSIONAL
NETWORK

The time domain
BTW, blink twice if I’m running late!

A job timeline
All your data are belong to us!

The complete graph model
This is so meta!

“…BUT WILL IT
BLEND?” DEMO

LET’S TALK.
calin.constantinov@iquestgroup.com
calin.constantinov@software.ucv.ro
