Spark Meetup @ Netflix, 05/19/2015

Spark and GraphX in the Netflix Recommender System
Ehtsham Elahi and Yves Raimond
(@EhtshamElahi) (@moustaki)
Algorithms Engineering
Netflix

Recommendations @ Netflix
● Goal: Help members find content that they’ll enjoy to maximize satisfaction and retention
● Core part of product
○ Every impression is a recommendation

Models & Algorithms
▪ Regression (Linear, logistic, elastic net)
▪ SVD and other Matrix Factorizations
▪ Factorization Machines
▪ Restricted Boltzmann Machines
▪ Deep Neural Networks
▪ Markov Models and Graph Algorithms
▪ Clustering
▪ Latent Dirichlet Allocation
▪ Gradient Boosted Decision Trees/Random Forests
▪ Gaussian Processes
▪ …

Main Challenge - Scale
● Algorithms @ Netflix Scale
○ > 62 M Members
○ > 50 Countries
○ > 1000 device types
○ > 100M Hours / day
● Can distributed Machine Learning algorithms help with Scale?

Spark and GraphX
● Spark - Distributed in-memory computational engine
using Resilient Distributed Datasets (RDDs)
● GraphX - extends RDDs to Multigraphs and provides
graph analytics
● Convenient and fast, all the way from prototyping
(spark-notebook, iSpark, Zeppelin) to production

Two Machine Learning Problems
● Generate ranking of items with respect to a given item
from an interaction graph
○ Graph Diffusion algorithms (e.g. Topic Sensitive Pagerank)
● Find Clusters of related items using co-occurrence data
○ Probabilistic Graphical Models (Latent Dirichlet Allocation)

Iterative Algorithms in GraphX
[Figure: example graph with vertices v1, v2, v3, v4, v6, v7; each vertex carries a vertex attribute and each edge an edge attribute]
● GraphX represents the graph as RDDs, e.g. VertexRDD, EdgeRDD
● GraphX provides APIs to propagate and update attributes
● Iterative Algorithm proceeds by creating updated graphs
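
The propagate-and-update loop described above can be sketched in plain Python. This is only an illustration of the pattern, not GraphX code: the function `iterate`, its `send`/`merge`/`update` parameters, and the toy graph are all hypothetical names chosen for this example. Each call returns a new vertex map and leaves its input untouched, mirroring how an iterative algorithm proceeds by creating updated graphs.

```python
from typing import Callable, Dict, List, Tuple

Vertex = int
Edge = Tuple[Vertex, Vertex, float]  # (source, destination, edge attribute)


def iterate(
    vertices: Dict[Vertex, float],
    edges: List[Edge],
    send: Callable[[float, float], float],
    merge: Callable[[float, float], float],
    update: Callable[[float, float], float],
) -> Dict[Vertex, float]:
    """Run one iteration and return a NEW vertex map (the input is not mutated)."""
    inbox: Dict[Vertex, float] = {}
    for src, dst, attr in edges:
        msg = send(vertices[src], attr)  # propagate along the edge
        inbox[dst] = merge(inbox[dst], msg) if dst in inbox else msg
    # Vertices that received no message keep their old attribute.
    return {v: update(a, inbox[v]) if v in inbox else a for v, a in vertices.items()}


# Hypothetical toy graph: vertex attributes are masses, edge attributes are weights.
vertices = {1: 1.0, 2: 0.0, 3: 0.0}
edges = [(1, 2, 0.5), (1, 3, 0.5), (2, 3, 1.0)]


def send(attr: float, weight: float) -> float:
    return attr * weight


def merge(x: float, y: float) -> float:
    return x + y


def update(old: float, incoming: float) -> float:
    return old + incoming


step1 = iterate(vertices, edges, send, merge, update)
step2 = iterate(step1, edges, send, merge, update)
print(step1)
print(step2)
```

In GraphX itself, operators such as `aggregateMessages` and the Pregel API play the role of the `send`/`merge`/`update` trio here, with vertex and edge data partitioned across the cluster.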

Topic Sensitive Pagerank @ Netflix
● Popular graph diffusion algorithm
● Captures vertex importance with regard to a particular vertex
● e.g. for the topic “Seattle”

Iteration 0
We start by activating a single node
[Figure: graph around the “Seattle” node, with edges labeled “related to”, “shot in”, “featured in”, and “cast”]

Iteration 1
With some probability,
we follow outbound
edges, otherwise we
go back to the origin.

Iteration 2
Vertex accumulates
higher mass

Iteration 2
And again, until
convergence
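
The iterations above can be sketched for a single starting node as a small power iteration. This is a minimal sketch under stated assumptions, not the Netflix implementation: the function name, the follow probability of 0.85, the fixed iteration count, the handling of vertices without outbound edges, and the node names other than “Seattle” are all hypothetical.

```python
from typing import Dict, List


def topic_sensitive_pagerank(
    out_edges: Dict[str, List[str]],
    origin: str,
    follow_prob: float = 0.85,  # assumed probability of following an outbound edge
    iterations: int = 50,       # assumed fixed budget instead of a convergence test
) -> Dict[str, float]:
    mass = {v: 0.0 for v in out_edges}
    mass[origin] = 1.0  # iteration 0: activate a single node
    for _ in range(iterations):
        nxt = {v: 0.0 for v in out_edges}
        for v, m in mass.items():
            targets = out_edges[v]
            if targets:
                # With probability follow_prob, follow an outbound edge...
                share = follow_prob * m / len(targets)
                for t in targets:
                    nxt[t] += share
                # ...otherwise go back to the origin.
                nxt[origin] += (1.0 - follow_prob) * m
            else:
                # Assumption: a vertex without outbound edges returns all mass.
                nxt[origin] += m
        mass = nxt
    return mass


# Hypothetical toy graph; every edge target must also appear as a key.
graph = {
    "Seattle": ["Title A", "Title B"],
    "Title A": ["Actor X", "Seattle"],
    "Title B": ["Actor X"],
    "Actor X": ["Title A"],
}
ranks = topic_sensitive_pagerank(graph, origin="Seattle")
for name, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

The total mass stays at 1 in every iteration, so the result can be read as a probability distribution over vertices relative to the starting node.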

GraphX implementation
● Running one propagation for each possible starting
node would be slow
● Keep a vector of activation probabilities at each vertex
● Use GraphX to run all propagations in parallel
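
One way to picture "a vector of activation probabilities at each vertex" is the single-machine sketch below, where entry `j` of a vertex's vector belongs to the propagation that started at vertex `j`. It is an illustration only: the function name, the follow probability, the iteration count, and the toy graph are assumptions, and the real implementation distributes this work through GraphX rather than nested Python loops.

```python
from typing import Dict, List, Tuple


def all_sources_pagerank(
    out_edges: Dict[str, List[str]],
    follow_prob: float = 0.85,  # assumed value
    iterations: int = 50,       # assumed fixed budget
) -> Tuple[List[str], Dict[str, List[float]]]:
    ids = list(out_edges)
    n = len(ids)
    # act[v][j]: activation probability of v for the walk that started at ids[j].
    act = {v: [1.0 if ids[j] == v else 0.0 for j in range(n)] for v in ids}
    for _ in range(iterations):
        nxt = {v: [0.0] * n for v in ids}
        for v in ids:
            targets = out_edges[v]
            for j in range(n):
                m = act[v][j]
                if m == 0.0:
                    continue
                origin = ids[j]
                if targets:
                    share = follow_prob * m / len(targets)
                    for t in targets:
                        nxt[t][j] += share
                    nxt[origin][j] += (1.0 - follow_prob) * m
                else:
                    # Assumption: vertices without outbound edges return all mass.
                    nxt[origin][j] += m
        act = nxt
    return ids, act


# Hypothetical toy graph; every edge target must also appear as a key.
graph = {
    "Seattle": ["Title A", "Title B"],
    "Title A": ["Actor X", "Seattle"],
    "Title B": ["Actor X"],
    "Actor X": ["Title A"],
}
ids, activations = all_sources_pagerank(graph)
for v in ids:
    print(v, [round(p, 3) for p in activations[v]])
```

Each vertex only ever touches its own vector and those of its neighbors, which is why the vectors fit naturally as vertex attributes. The cost is that the messages grow with the number of starting nodes.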

Topic Sensitive Pagerank in GraphX
[Figure: graph with a vector of activation probabilities as each vertex attribute; one entry is the activation probability starting from vertex 1]

Example graph diffusion results
[Figure: example results for “Matrix”, “Zombies”, and “Seattle”]

Distributed Clustering algorithms

LDA @ Netflix
● A popular clustering/latent factors model
● Discovers clusters/topics of related videos from Netflix
data
● e.g. a topic of Animal Documentaries

LDA - Graphical Model
[Figure: LDA graphical model, annotated with:]
● Per-topic word distributions
● Per-document topic distributions
● Topic label for document d and word w

Question: How to parallelize inference?
Answer: Read conditional independencies in the model

Gibbs Sampler 1 (Semi Collapsed)
● Sample topic labels in a given document sequentially
● Sample topic labels in different documents in parallel

Gibbs Sampler 2 (UnCollapsed)
● Sample topic labels in a given document in parallel
● Suitable for GraphX
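
For reference, the standard conditionals of an uncollapsed Gibbs sampler for LDA can be written as below. The notation is an assumption, since the slides do not fix symbols: $\theta_d$ is the per-document topic distribution, $\phi_k$ the per-topic word distribution, $z_{d,w}$ the topic label for document $d$ and word $w$, $\alpha$ and $\beta$ symmetric Dirichlet hyperparameters, $n_{d,k}$ and $n_{k,w}$ the counts of topic labels per document and per word, $K$ the number of topics, and $V$ the vocabulary size.

```latex
\begin{align}
p(z_{d,w} = k \mid \theta_d, \phi) &\propto \theta_{d,k}\,\phi_{k,w} \\
\theta_d \mid z &\sim \mathrm{Dirichlet}(\alpha + n_{d,1}, \dots, \alpha + n_{d,K}) \\
\phi_k \mid z &\sim \mathrm{Dirichlet}(\beta + n_{k,1}, \dots, \beta + n_{k,V})
\end{align}
```

Given $\theta$ and $\phi$, each topic label depends only on its own document and word, so all labels are conditionally independent of one another. That independence is what allows the labels within a single document to be sampled in parallel.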

Distributed Gibbs Sampler
[Figure: a distributed parameterized graph for LDA with 3 topics. Word vertices w1, w2, w3 are linked to document vertices d1, d2, and each vertex carries a 3-element attribute vector. Successive slides annotate the same graph:]
● Vertices: documents (d1, d2) and words (w1, w2, w3)
● Edge: if word appeared in the document
● Document vertex attributes: per-document topic distribution
○ d1: 0.3 0.6 0.1
○ d2: 0.2 0.5 0.3
● Word vertex attributes: per-topic word distributions
○ w1: 0.3 0.4 0.1
○ w2: 0.3 0.2 0.8
○ w3: 0.4 0.4 0.1
● (vertex, edge, vertex) = triplet
● Categorical distribution for the triplet using vertex attributes
● Categorical distributions for all triplets
● Sample topics for all edges
○ Sampled edge topics: 1, 1, 2, 0
● Neighborhood aggregation for topic histograms
○ w1: 0 1 0
○ w2: 0 1 1
○ w3: 1 0 0
○ d1: 0 2 0
○ d2: 1 0 1
● Realize samples from Dirichlet to update the graph
○ w1: 0.1 0.4 0.3
○ w2: 0.1 0.4 0.4
○ w3: 0.8 0.2 0.3
○ d1: 0.1 0.8 0.1
○ d2: 0.45 0.1 0.45
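
The walkthrough above (categorical distribution per triplet, topic sampling per edge, neighborhood aggregation, Dirichlet update) can be sketched as one sampler step in plain Python. This is a single-machine illustration, not the GraphX implementation. Several things are assumptions: the hyperparameters `ALPHA` and `BETA`, the assignment of the slide's vectors to vertices (taken in listing order), and the edge list (inferred from the topic histograms on the slides).

```python
import random
from typing import Dict, List, Tuple

K = 3        # number of topics, as in the slides
ALPHA = 0.5  # assumed Dirichlet prior on per-document topic distributions
BETA = 0.5   # assumed Dirichlet prior on per-topic word distributions

# Vertex attributes, assigned to vertices in the order they are listed.
DOC_TOPICS = {"d1": [0.3, 0.6, 0.1], "d2": [0.2, 0.5, 0.3]}
WORD_WEIGHTS = {"w1": [0.3, 0.4, 0.1], "w2": [0.3, 0.2, 0.8], "w3": [0.4, 0.4, 0.1]}
# Assumed edges (document, word): present if the word appeared in the document.
EDGES = [("d1", "w1"), ("d1", "w2"), ("d2", "w2"), ("d2", "w3")]


def sample_dirichlet(params: List[float], rng: random.Random) -> List[float]:
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [x / total for x in draws]


def gibbs_step(
    doc_topics: Dict[str, List[float]],
    word_weights: Dict[str, List[float]],
    edges: List[Tuple[str, str]],
    rng: random.Random,
):
    # 1. Per triplet: categorical distribution from the two vertex attributes,
    #    then sample one topic per edge.
    edge_topics = {}
    for d, w in edges:
        weights = [doc_topics[d][k] * word_weights[w][k] for k in range(K)]
        edge_topics[(d, w)] = rng.choices(range(K), weights=weights)[0]
    # 2. Neighborhood aggregation: topic histograms at every vertex.
    doc_hist = {d: [0] * K for d in doc_topics}
    word_hist = {w: [0] * K for w in word_weights}
    for (d, w), k in edge_topics.items():
        doc_hist[d][k] += 1
        word_hist[w][k] += 1
    # 3. Realize Dirichlet samples to update the vertex attributes.
    new_docs = {d: sample_dirichlet([ALPHA + c for c in h], rng) for d, h in doc_hist.items()}
    words = list(word_weights)
    new_words = {w: [0.0] * K for w in words}
    for k in range(K):  # one Dirichlet per topic, over the vocabulary
        phi_k = sample_dirichlet([BETA + word_hist[w][k] for w in words], rng)
        for w, p in zip(words, phi_k):
            new_words[w][k] = p
    return new_docs, new_words, edge_topics


rng = random.Random(0)
new_docs, new_words, edge_topics = gibbs_step(DOC_TOPICS, WORD_WEIGHTS, EDGES, rng)
print(edge_topics)
print(new_docs)
print(new_words)
```

Step 1 only needs the two endpoint attributes of each triplet and step 2 is a per-vertex sum over incident edges, so both map onto per-edge and per-vertex operations. Note that a word's vector does not sum to 1 on its own: the normalization is per topic, across all words.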

Example LDA Results
● Cluster of Bollywood movies
● Cluster of Kids shows
● Cluster of Western movies

Algorithm Implementations
● Topic Sensitive Pagerank
○ Distributed GraphX implementation
○ Alternative Implementation: Broadcast graph adjacency matrix,
Scala/Breeze code, triggered by Spark
● LDA
○ Distributed GraphX implementation
○ Alternative Implementation: Single machine, Multi-threaded Java code
● All implementations are Netflix internal code

Performance Comparison
Open Source DBPedia dataset

Sublinear rise in time with GraphX vs. linear rise in the Alternative

Doubling the size of the cluster: 2.0x speedup in the Alternative Impl vs. 1.2x in GraphX

A large number of vertices propagated in parallel leads to large shuffle data, causing failures in GraphX for small clusters

Netflix dataset
Number of Topics = 100

GraphX setup: 8x the resources of the Multi-Core setup

Wikipedia dataset, 100 Topic LDA
Cluster: (16 x r3.2xl)
(source: Databricks)

GraphX for very large datasets outperforms the multi-core unCollapsed Impl

What we learned so far...
● Where is the cross-over point for your iterative ML
algorithm?
○ GraphX brings performance benefits if you’re on the right side of that
point
○ GraphX lets you easily throw more hardware at a problem
● GraphX very useful (and fast) for other graph
processing tasks
○ Data pre-processing
○ Efficient joins

What we learned so far ...
● Regularly save the state
○ With a 99.9% success rate per iteration, what’s the probability of successfully running 1,000 iterations?
○ ~36%
● Multi-Core Machine learning (r3.8xl, 32 threads, 220 GB) is very efficient
○ if your data fits in the memory of a single machine!
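
The arithmetic behind the "regularly save the state" point is worth spelling out. The sketch below assumes that iterations fail independently, so the success probabilities multiply. The checkpoint interval of 100 iterations is a hypothetical value chosen for illustration.

```python
def success_probability(per_iteration_success: float, iterations: int) -> float:
    """Probability that every one of `iterations` independent iterations succeeds."""
    return per_iteration_success ** iterations


# Full run of 1,000 iterations without saving state.
full_run = success_probability(0.999, 1000)
print(f"{full_run:.1%}")  # 36.8%

# Hypothetical: state saved every 100 iterations, so a failure only costs
# the current 100-iteration segment instead of the whole run.
segment = success_probability(0.999, 100)
print(f"{segment:.1%}")
```

The first figure is close to $e^{-1}$, in line with the rough 36% quoted on the slide.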

We’re hiring!
(come talk to us)
https://jobs.netflix.com/
