Recommendations and Statistics with Graph Databases

Calin Constantinov
Development Consultant
Neo4j Certified Professional
16th May 2019

Agenda
1. Recommendations 101
2. SQL Drawbacks and NOSQL Alternatives
3. Graph Databases
4. Simple Queries with (open)Cypher
5. Building a Social Recommendations Platform with Neo4j
6. Facebook example: PlacesToBe
7. LinkedIn example: LocalTalent
8. Q&A

Smart Things Others Have Said
45% of online shoppers are more likely to shop on a site that offers personalized recommendations
56% of online shoppers are more likely to return to a site that recommends products
59% of online shoppers believe that it is easier to find more interesting products on personalized online retail stores
source: https://www.invespcro.com/blog/online-shopping-personalization

Common Approaches
source: https://www.themarketingtechnologist.co/building-a-recommendation-engine-for-geek-setting-up-the-prerequisites-13

The Ratings Matrix
source: https://nikhilwins.wordpress.com/2015/09/18/movie-recommendations-how-does-netflix-do-it-a-9-step-coding-intuitive-guide-into-collaborative-filtering

Basic Similarity Measures
Euclidean distance:
Cosine similarity:
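For two rating vectors x and y over the same n items (this notation is assumed here, not taken from the slides), the standard definitions are:

```latex
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
\qquad
\cos(x, y) = \frac{\sum_{i=1}^{n} x_i y_i}
                  {\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}
```

A smaller Euclidean distance means more similar ratings; a cosine similarity close to 1 means the two rating vectors point in nearly the same direction.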

Multidimensionality: A 360° Customer View
source: Wenkai Mo - Recommender System

Ideal recommendation features
NOVEL – however, reminders do sometimes work.
RELEVANT – even though an item seems interesting, also consider past orders.
SERENDIPITOUS – always recommending the obvious is pointless.
TRANSPARENT – raise trust and credibility by explaining yourself.

SQL DRAWBACKS AND NOSQL ALTERNATIVES

SQL Problems
:(
Although SQL databases are excellent for a vast category of problems, they lack scalability.
The “one size fits all” approach of relational databases is no longer valid.
Moreover, modern data is starting to have an obvious graph-like structure.
SQL does not naturally support graph-specific operations (e.g. DFS, BFS).
Complex stored procedures and queries are thus needed for even the simplest tasks.
And what about changes to the structure of the data?

Case Study: Recommender Systems
Fancy name for “Fooling the customer”
Much more can be told about a person by analyzing their relationships than by reviewing raw
statistics about them.
Recommendations are more likely to be of value when larger volumes of diverse data are
analyzed.
With a traditional approach, queries take too long to complete to be run on demand.
Spoiler alert! That’s not necessarily the case for graphs!
Precomputed recommendations are therefore usually what users get to see (but consider an
auction site!).

NOSQL
Not solely aimed towards pretentious hipsters anymore!

Data Is the New Dollar
source: David Somerville - http://www.smrvl.com/blog

The Labelled Property Graph Model

The Labelled Property Graph Model (cont’d)

Making sense of data
Go graph! All the other kids are doing it!
Takeaway: The value of data isn’t represented by its volume, but by our capacity to
understand the relationships between its constituent elements.
Graph databases offer analytical and discovery capabilities that no other persistence
solution can provide.
Graphs model relations in a generic manner and enable flexibility without major
restructuring of the global schema (as is the case with SQL).
Bonus: there’s a very high level of abstraction associated with the way graph queries can
be expressed.

Case study: Minimalist social network
Epic battle!
Let’s consider a social network with 1 000 000 users, each having 50 friends.
SQL has to “fake” relationships (don’t we all?).
SQL: Graph:
source: Ian Robinson, Jim Webber, and Emil Eifrem: Graph Databases, 2013, O'Reilly
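A minimal sketch of the kind of friends-of-friends lookup this comparison is about, assuming a hypothetical model with Person nodes, undirected FRIEND relationships and a $name parameter (none of these names come from the slides):

```cypher
// Hypothetical schema: (:Person {name})-[:FRIEND]-(:Person)
// Find people exactly two hops away who are not already direct friends.
MATCH (me:Person {name: $name})-[:FRIEND*2]-(fof:Person)
WHERE fof <> me
  AND NOT (me)-[:FRIEND]-(fof)
RETURN DISTINCT fof.name AS friendOfFriend
```

In SQL the same question typically needs one self-join on a friendship table per hop, which is where the "faked" relationships start to hurt as the depth grows.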

Minimalist social network (cont’d)
S14E04: You have 0 friends
Also consider an asymmetric (directed) scenario: Who are my followers?
Reversing the direction of a traversal would be difficult with non-native graph processing.
For that, you must either create a costly reverse-lookup index for each traversal or
perform a brute-force search through the original index.
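With native graph processing, a relationship can be traversed from either end, so the follower question is the same pattern with the arrow flipped. A sketch, assuming hypothetical User nodes and a directed FOLLOWS relationship:

```cypher
// Hypothetical schema: (:User {name})-[:FOLLOWS]->(:User)
// Whom do I follow? Traverse outgoing relationships.
MATCH (me:User {name: $name})-[:FOLLOWS]->(followed:User)
RETURN followed.name AS following;

// Who are my followers? Same pattern, arrow reversed.
MATCH (me:User {name: $name})<-[:FOLLOWS]-(follower:User)
RETURN follower.name AS follower;
```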
The results are in!

Cypher
‘Member ASCII art? (っ◕‿◕)っ
Powerful and expressive query language requiring 10x to 100x less code than SQL.
Declarative language for describing patterns in graphs visually using an ASCII-art syntax.
Comes with a profiler / interactive query planner.

Collaborative Filtering over a Graph
// Movies rated by the same users who rated "Home Alone", most co-rated first.
MATCH (m:Movie {title: "Home Alone"})<-[:RATED]-(u:User)-[:RATED]->(rec:Movie)
RETURN rec.title AS recommendation, COUNT(u) AS usersWhoAlsoWatched
ORDER BY usersWhoAlsoWatched DESC
LIMIT 25

Weighing In
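// First pass: compute the user's average rating.
MATCH (u:User {name: "Nicole Ramsey"})
MATCH (u)-[r:RATED]->(m:Movie)
WITH u, AVG(r.rating) AS average
// Second pass: keep only the movies rated above that average.
MATCH (u)-[r:RATED]->(m:Movie)
WHERE r.rating > average
RETURN m, r.rating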

BUILDING A SOCIAL RECOMMENDATIONS PLATFORM

Airport places
The metagraph:
Exquisite food and cheap beer, right? <3
source: https://neo4j.com/blog/real-time-recommendation-engine-data-science/

Basic social recommendation
Food and drink places in the following {categories} closest to gate {gate} in terminal {terminal}
that {user}'s friends like:
Making friends and liking stuff
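One way such a query could look. This is a sketch only: the labels, relationship types and the numeric Gate.number property are assumptions made for illustration, and the {placeholders} above are written as $parameters.

```cypher
// Hypothetical schema:
//   (:User)-[:FRIEND]-(:User)-[:LIKES]->(:Place)-[:IN_CATEGORY]->(:Category)
//   (:Place)-[:AT_GATE]->(:Gate {number})-[:IN_TERMINAL]->(:Terminal {name})
MATCH (u:User {name: $user})-[:FRIEND]-(friend:User)-[:LIKES]->(place:Place),
      (place)-[:IN_CATEGORY]->(c:Category),
      (place)-[:AT_GATE]->(g:Gate)-[:IN_TERMINAL]->(:Terminal {name: $terminal})
WHERE c.name IN $categories
RETURN place.name AS place,
       count(DISTINCT friend) AS friendsWhoLike,
       min(abs(g.number - $gate)) AS gateDistance
// Nearest gates first, then the places liked by the most friends.
ORDER BY gateDistance ASC, friendsWhoLike DESC
LIMIT 10
```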

User similarities
Let’s apply weights to the Like relationship and compute similarity distances between users.
The moment we began to fall apart
We could add this part in order to:
Find food and drink places in the following {categories} closest to gate {gate} in terminal
{terminal} that users similar to {user} like.
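A sketch of the similarity step, assuming each LIKES relationship carries a hypothetical numeric weight property and using Euclidean distance over the places both users like (one possible measure, not necessarily the one used in the talk):

```cypher
// Hypothetical schema: (:User)-[:LIKES {weight}]->(:Place)
// Distance is computed only over places that both users like.
MATCH (u1:User {name: $user})-[l1:LIKES]->(p:Place)<-[l2:LIKES]-(u2:User)
WITH u2,
     sqrt(sum((l1.weight - l2.weight) ^ 2)) AS distance,
     count(p) AS sharedPlaces
RETURN u2.name AS similarUser, distance, sharedPlaces
ORDER BY distance ASC, sharedPlaces DESC
LIMIT 10
```

The users returned here can then take the place of the friends in the basic social recommendation.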

Applying K-Means
More interestingly, user clusters can be identified:
Always remember that you are absolutely unique. Just like everyone else.

Social cluster recommendations
Find food and drink places in the following {categories} closest to gate {gate} in terminal
{terminal} that users in {user}'s cluster like:
It’s a date!

CraiovaRestaurants
Wanna go out tonight?
Back in 2013, Facebook data from 10 users and their friends was mined.
The final dataset consisted of 21981 users, 48051 check-ins, 549 places and 76 categories, all
linked by 392607 relationships. (7% of all check-ins ever placed in Craiova were captured!)
Yes, this was before Cambridge Analytica.

Popular places
Pub crawl!
Most popular places, by number of visitors.
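A sketch of this statistic, assuming a hypothetical model in which every check-in is a CHECKED_IN relationship from a User to a Place:

```cypher
// Hypothetical schema: (:User)-[:CHECKED_IN]->(:Place {name})
// Count each visitor once, however many times they checked in.
MATCH (u:User)-[:CHECKED_IN]->(p:Place)
RETURN p.name AS place, count(DISTINCT u) AS visitors
ORDER BY visitors DESC
LIMIT 10
```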

Places where people return
They keep coming back for more!
Most popular places, by the percentage of visitors that have returned at least once.
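A sketch of the return rate, again assuming one hypothetical CHECKED_IN relationship per check-in, so that repeat visits show up as parallel relationships between the same user and place:

```cypher
// Hypothetical schema: (:User)-[:CHECKED_IN]->(:Place {name})
MATCH (u:User)-[c:CHECKED_IN]->(p:Place)
// Number of check-ins per (place, user) pair.
WITH p, u, count(c) AS visits
// Per place: all visitors, and those with more than one check-in.
WITH p,
     count(u) AS visitors,
     sum(CASE WHEN visits > 1 THEN 1 ELSE 0 END) AS returningVisitors
RETURN p.name AS place,
       visitors,
       100.0 * returningVisitors / visitors AS returnPercentage
ORDER BY returnPercentage DESC
LIMIT 10
```

Places with very few visitors can dominate such a ranking, so a minimum visitor count is a sensible extra filter.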

Places visited by friends
We're social people (at least on Facebook)
Places a given user hasn’t visited yet that are most often visited by the users who most
often check in at the same places as the given user.
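A two-step sketch under the same hypothetical CHECKED_IN model: first find the users with the most check-ins in common, then rank the places they visit that the given user has not. The cut-off of 20 users is an arbitrary illustration value.

```cypher
// Hypothetical schema: (:User {name})-[:CHECKED_IN]->(:Place {name})
// Step 1: users who most often check in at the same places as $user.
MATCH (me:User {name: $user})-[:CHECKED_IN]->(:Place)<-[:CHECKED_IN]-(other:User)
WHERE other <> me
WITH me, other, count(*) AS commonVisits
ORDER BY commonVisits DESC
LIMIT 20
// Step 2: places those users visit that $user has not visited yet.
MATCH (other)-[:CHECKED_IN]->(rec:Place)
WHERE NOT (me)-[:CHECKED_IN]->(rec)
RETURN rec.name AS place, count(DISTINCT other) AS visitors
ORDER BY visitors DESC
LIMIT 10
```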

Similar places
Bear with me…
Places similar to a given place, ranked by the number of shared categories and by the
number of users who have visited both places.
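A sketch of this ranking, assuming hypothetical IN_CATEGORY relationships from Place to Category alongside the CHECKED_IN relationships:

```cypher
// Hypothetical schema:
//   (:Place {name})-[:IN_CATEGORY]->(:Category)
//   (:User)-[:CHECKED_IN]->(:Place)
MATCH (p:Place {name: $place})-[:IN_CATEGORY]->(c:Category)<-[:IN_CATEGORY]-(other:Place)
WITH p, other, count(DISTINCT c) AS commonCategories
// Users who have checked in at both places (may be none).
OPTIONAL MATCH (p)<-[:CHECKED_IN]-(u:User)-[:CHECKED_IN]->(other)
RETURN other.name AS place,
       commonCategories,
       count(DISTINCT u) AS commonVisitors
ORDER BY commonCategories DESC, commonVisitors DESC
LIMIT 10
```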

Highly-Available Neo4j Heterogeneous Load Balanced Cluster
tl;dr
All read times reasonably fall within a “real-time” constraint.

The graph model
The dataset: 206 complete profiles (2044 total), 275 active jobs (775 total), 361 companies,
991 skills, 19421 endorsements, 89 educational institutions.
This is so META!

Biggest companies
Top 15 companies by number of active jobs.
Size Matters!

Loyal employees
#relationshipgoals
Top 15 companies by the average time an employee holds a position in the company (in months).

Employee leaves
Time for breakup songs!
Top 10 moves from one company to another.

Active jobs
So many noobs!
A view on the distribution of the active jobs.

Showcased skills
Number of profiles displaying each of the top 20 most displayed skills.
Who doesn’t like a show-off?

Endorsements
She didn’t endorse me back :(
Percentage distribution for top 20 endorsed skills.

Wide-range and niche companies
Finding the perfect job for your hipster-esque coding needs
Percentage distribution for top 3 endorsed skills for selected companies.

(calin:IncredibleGraphExpert)-[:ANSWERS]->(anyQuestion)

See you at the workshop on June 13th
