SlideShare a Scribd company logo
1 of 102
Download to read offline
BUILDING STREAMING
RECOMMENDATION
ENGINES ON APACHE SPARK
Rui Vieira
Software Engineer
Overview
Collaborative Filtering
Batch Alternating Least Squares (ALS)
Streaming ALS
Apache Spark
Distributed Streaming ALS
Collaborative Filtering
Collaborative Filtering
Users, products and ratings
(user, product) ⟼ rating
Collaborative
“Filtering”
Collaborative Filtering
Collaborative Filtering
A
?
B
?
Collaborative Filtering
A B
? ?
Alternating Least Squares
R =
user 1 user 2 user 3 · · · user N
⎡
⎢
⎢
⎢
⎢
⎣
⎤
⎥
⎥
⎥
⎥
⎦
1 4.5 ? · · · 3 product 1
? 3 3 · · · 4 product 2
5 3 ? · · · ? product 3
...
...
...
...
...
...
2 4 1 · · · ? product M
Alternating Least Squares
=
̂𝖱x,y = 𝖴x 𝖯T
y
𝖯T
y
𝖴x
̂𝖱
Batch ALS
loss =
∑
x,y
(ϵx,y)
2
+ λx + λy
ϵx,y = rx,y − ̂rx,y
Alternating Least Squares
=
̂𝖱 𝖴
𝖯T
Alternating Least Squares
=
̂𝖱 𝖴
𝖯T
Alternating Least Squares
R =
user 1 user 2 user 3 · · · user N
⎡
⎢
⎢
⎢
⎢
⎣
⎤
⎥
⎥
⎥
⎥
⎦
1 4.5 3.8 · · · 3 product 1
3.2 3 3 · · · 4 product 2
5 3 3.4 · · · 3.1 product 3
...
...
...
...
...
...
2 4 1 · · · 2.7 product M
Batch ALS
1
2
3
4
300
1 2 3 4 300
70 82 60 54 65
70 86 68 67 72
96 103 82 82 77
90 87 68 93 82
38 48 44 51 35
…
…
Batch ALS
1
2
3
4
300
1 2 3 4 300
70 82 60 54 65
70 86 68 67 72
96 103 82 82 77
90 87 68 93 82
38 48 44 51 35
…
…
Batch ALS
Batch ALS
Batch ALS
Batch ALS
Streaming ALS
Can we update the model with a data stream?
Stochastic Gradient Descent (SGD)
Bias SGD (B-SGD)
Streaming ALS
bx,y = μ + bx + by
̂r⋆
x,y = bx,y + ̂rx,y
Streaming ALS
ˆrx,y = µ + bx + by + Ux · PT
y
loss =
x,y
⎛
⎜
⎝rx,y − ˆrx,y
ϵx,y
⎞
⎟
⎠
2
+ · · ·
Streaming ALS
bx bx + (✏x,y xbx)
by by + (✏x,y yby)
bias
factors
Ux Ux + (✏x,yPy
0
xUx)
Py Py + ✏x,yUx
0
yPy
Streaming ALS
Streaming ALS
Streaming ALS
Streaming ALS
Batch
B-SGD
Predictions
Streaming ALS
Streaming ALS
APACHE SPARK
Apache Spark
Array[T]
RDD[T]
partition 1
partition 2
partition 3
RDD[T]
partition 1
partition 2
partition 3
.map
MLlib ALS
val model = ALS.train(ratings, rank, iterations, lambda)
MLlib ALS
val model = ALS.train(ratings, rank, iterations, lambda)
case class Rating(int user, int product, double rating) 
val ratings: RDD[Rating]
MLlib ALS
val model = ALS.train(ratings, rank, iterations, lambda)
case class Rating(int user, int product, double rating) 
val ratings: RDD[Rating]
val rank: int
MLlib ALS
val model = ALS.train(ratings, rank, iterations, lambda)
case class Rating(int user, int product, double rating) 
val ratings: RDD[Rating]
val rank: int
val iterations: int
MLlib ALS
val model = ALS.train(ratings, rank, iterations, lambda)
case class Rating(int user, int product, double rating) 
val ratings: RDD[Rating]
val rank: int
val iterations: int
val lambda: Double
MLlib ALS
> val model = ALS.train(ratings, rank, iterations, lambda)
model: MatrixFactorizationModel
class MatrixFactorizationModel {
val userFeatures: RDD[(Int, Array[Double])]
val productFeatures: RDD[(Int, Array[Double])]
}
Spark Streaming ALS
RDD[Rating]
DStream[Rating]
Window 1 Window 2 Window 3 Window 4
RDD[Rating] RDD[Rating] RDD[Rating] RDD[Rating]
Spark Streaming ALS
DStream[Rating]
Spark Streaming ALS
DStream[Rating]
Window1
RDD[Rating]
model = StreamingALS.train(rdd1, params)
RDD[Rating]
model = model.train(rdd2)
Window2
RDD[Rating]
model = model.train(rdd3)
Window3
What do we need?
user latent factors
product latent factors
calculate the global bias
calculate user specific bias
calculate product specific bias
user latent factors
product latent factors
calculate the global bias
calculate user specific bias
calculate product specific bias
What do we need?
Spark Streaming ALS
Window1
RDD[Rating]
Spark Streaming ALS
Window1
RDD[Rating]
(👨, 🏀, 3.5)
(👩, 🎾, 3.0)
(%, ⚽, 1.0)
(', 🏈, 4.0)
(), ⚽, 5.0)
RDD[Rating]
Spark Streaming ALS
Window1
RDD[Rating]
(👨, 🏀, 3.5)
(👩, 🎾, 3.0)
(%, ⚽, 1.0)
(', 🏈, 4.0)
(), ⚽, 5.0)
RDD[Rating]
(🏀,(👨, 3.5))
(🎾,(👩, 3.0))
(⚽,(%, 1.0))
(🏈,(', 4.0))
(⚽,(), 5.0))
RDD[(Int, Rating)]
.m
Spark Streaming ALS
Window1
RDD[Rating]
(👨, 🏀, 3.5)
(👩, 🎾, 3.0)
(%, ⚽, 1.0)
(', 🏈, 4.0)
(), ⚽, 5.0)
RDD[Rating]
(🏀,(👨, 3.5))
(🎾,(👩, 3.0))
(⚽,(%, 1.0))
(🏈,(', 4.0))
(⚽,(), 5.0))
RDD[(Int, Rating)]
.m
(👨,(🏀, 3.5))
(👩,(🎾, 3.0))
(%,(⚽, 1.0))
(',(🏈, 4.0))
(),(⚽, 5.0))
RDD[(Int, Rating)]
.map
Spark Streaming ALS
Spark Streaming ALS
(👨, (🏀, 3.5))
(👩, (🎾, 3.0))
(%, (⚽, 1.0))
(', (🏈, 4.0))
(), (⚽, 5.0))
RDD[(Int, Rating)]
Spark Streaming ALS
(👨, (🏀, 3.5))
(👩, (🎾, 3.0))
(%, (⚽, 1.0))
(', (🏈, 4.0))
(), (⚽, 5.0))
RDD[(Int, Rating)]
(👨, [0.123, -0.234, …]))
(👩, [0.934, 0.526, …])
(%, [0.421, -0.594, …])
(', [0.034, 0.661, …])
(), [0.713, -0.335, …])
RDD[(Int, Factor)]
.map
Spark Streaming ALS
Spark Streaming ALS
(👨,(🏀, 3.5))
(👩, (🎾, 3.0))
(%,(⚽, 1.0))
(',(🏈, 4.0))
(),(⚽, 5.0))
RDD[(Int, Rating)]
Spark Streaming ALS
(👨,(🏀, 3.5))
(👩, (🎾, 3.0))
(%,(⚽, 1.0))
(',(🏈, 4.0))
(),(⚽, 5.0))
RDD[(Int, Rating)]
(👨, [0.123, -0.234, …])
(👩, [0.934, 0.526, …])
(%, [0.421, -0.594, …])
(', [0.034, 0.661, …])
(), [0.713, -0.335, …])
RDD[(Int, Factor)]
Spark Streaming ALS
(👨,(🏀, 3.5))
(👩, (🎾, 3.0))
(%,(⚽, 1.0))
(',(🏈, 4.0))
(),(⚽, 5.0))
RDD[(Int, Rating)]
(👨, [0.123, -0.234, …])
(👩, [0.934, 0.526, …])
(%, [0.421, -0.594, …])
(', [0.034, 0.661, …])
(), [0.713, -0.335, …])
RDD[(Int, Factor)]
.join
Spark Streaming ALS
.join
(🏀, (👨, 3.5, [0.123, -0.234, …]))
(🎾, (👩, 3.0, [0.934, 0.526, …]))
(⚽, (%, 1.0, [0.421, -0.594, …]))
(🏈, (', 4.0, [0.034, 0.661, …]))
(⚽, (), 5.0, [0.713, -0.335, …]))
RDD[(Int, (Int, Double, Factor))]
user latent factors
Spark Streaming ALS
Spark Streaming ALS
(🏀, (👨, 3.5, [0.123, -0.234, …]))
(🎾, (👩, 3.0, [0.934, 0.526, …]))
(⚽, (%, 1.0, [0.421, -0.594, …]))
(🏈, (', 4.0, [0.034, 0.661, …]))
(⚽, (), 5.0, [0.713, -0.335, …]))
RDD[(Int, (Int, Double, Factor))]
Spark Streaming ALS
(🏀, (👨, 3.5, [0.123, -0.234, …]))
(🎾, (👩, 3.0, [0.934, 0.526, …]))
(⚽, (%, 1.0, [0.421, -0.594, …]))
(🏈, (', 4.0, [0.034, 0.661, …]))
(⚽, (), 5.0, [0.713, -0.335, …]))
RDD[(Int, (Int, Double, Factor))]
(🏀, [0.764, 0.254, …]))
(⚽, [0.136, 0.933, …])
(🎾, [0.663, -0.134, …])
(🎾, [0.811, 0.535, …])
(🏈, [0.234, -0.579, …])
RDD[(Int, Factor)]
.join
Spark Streaming ALS
(🏀, (👨, 3.5, [0.123, …], [0.764, …] ))
(🎾, (👩, 3.0, [0.934, 0.526, …], [0.933, …]))
(⚽, (%, 1.0, [0.421, -0.594, …], [0.663, …]))
(🏈, (', 4.0, [0.034, 0.661, …], [0.811, …]))
(⚽, (), 5.0, [0.713, -0.335, …], [0.234, …]))
RDD[(Int, (Int, Double, Factor, Factor))]
user latent factors
product latent factors
calculate the global bias
calculate user specific bias
calculate product specific bias
What do we need?
Spark Streaming ALS
(🏀, (👨, 3.5, [0.123, …], [0.764, …] ))
(🎾, (👩, 3.0, [0.934, 0.526, …], [0.933, …]))
(⚽, (%, 1.0, [0.421, -0.594, …], [0.663, …]))
(🏈, (', 4.0, [0.034, 0.661, …], [0.811, …]))
(⚽, (), 5.0, [0.713, -0.335, …], [0.234, …]))
RDD[(Int, (Int, Double,
Spark Streaming ALS
(🏀, (👨, 3.5, [0.123, …], [0.764, …] ))
(🎾, (👩, 3.0, [0.934, 0.526, …], [0.933, …]))
(⚽, (%, 1.0, [0.421, -0.594, …], [0.663, …]))
(🏈, (', 4.0, [0.034, 0.661, …], [0.811, …]))
(⚽, (), 5.0, [0.713, -0.335, …], [0.234, …]))
RDD[(Int, (Int, Double,
(🏀, (👨, 3.5, [0.123, …], [0.764, …] ))
(🎾, (👩, 3.0, [0.934, 0.526, …], [0.933, …]))
(⚽, (%, 1.0, [0.421, -0.594, …], [0.663, …]))
(🏈, (', 4.0, [0.034, 0.661, …], [0.811, …]))
(⚽, (), 5.0, [0.713, -0.335, …], [0.234, …]))
RDD[(Int, (Int, Double, Factor, Factor))]
user latent factors
product latent factors
calculate the global bias
calculate user specific bias
calculate product specific bias
What do we need?
Spark Streaming ALS
✏x,y = rx,y ˆrx,y
ˆrx,y = µ + bx + by + Ux · PT
y
bx bx + (✏x,y xbx)
Spark Streaming ALS
✏x,y = rx,y ˆrx,y
ˆrx,y = µ + bx + by + Ux · PT
y
bx bx + (✏x,y xbx)
Spark Streaming ALS
ˆrx,y = µ + bx + by + Ux · PT
y
Spark Streaming ALS
ˆrx,y = µ + bx + by + Ux · PT
y
(⚽, (%, 1.0, [0.421, -0.594, …], [0.663, …]))
RDD[(Int, (Int, Double, Factor, Factor))]
predicted(⚽, %) = µ + bx + by + [0.421, -0.594, …] x [0.663, …]T = 2.3
Spark Streaming ALS
✏x,y = rx,y ˆrx,y
bx bx + (✏x,y xbx)
Spark Streaming ALS
✏x,y = rx,y ˆrx,y
Spark Streaming ALS
✏x,y = rx,y ˆrx,y
(⚽, (%, 1.0, [0.421, -0.594, …], [0.663, …]))
RDD[(Int, (Int, Double, Factor))]
predicted(%,⚽) = 2.3
rating(%,⚽) = 1.0
error(%,⚽) = rating(%,⚽) - predicted(%,⚽) = -1.3
Spark Streaming ALS
bx bx + (✏x,y xbx)
by by + (✏x,y yby)
gradients
Spark Streaming ALS
bx bx + (✏x,y xbx)
by by + (✏x,y yby)
gradients
(👨, 0.932, [0.123, -0.140, …])
(👩, 0.101, [0.334, 0.273, …])
(%, 0.128, [0.957, -0.247, …])
(', 0.242, [0.038, 0.883, …])
(), 0.245, [0.283, -0.953, …])
RDD[(Int, Double, Factor)]
Spark Streaming ALS
bx bx + (✏x,y xbx)
by by + (✏x,y yby)
gradients
(👨, 0.932, [0.123, -0.140, …])
(👩, 0.101, [0.334, 0.273, …])
(%, 0.128, [0.957, -0.247, …])
(', 0.242, [0.038, 0.883, …])
(), 0.245, [0.283, -0.953, …])
RDD[(Int, Double, Factor)]
(🏀, 0.274, [0.445, -0.233, …])
(🎾, 0.483, [0.843, 0.023, …])
(⚽, 0.595, [0.284, -0.987, …])
(🏈, 0.103, [0.340, 0.328, …])
(⚽, 0.253, [0.472, -0.274, …])
RDD[(Int, Double, Factor)]
Spark Streaming ALS
Spark Streaming ALS
(👨, 0.932, [0.123, -0.140, …])
(👩, 0.101, [0.334, 0.273, …])
RDD[(Int, Double, Factor)]
Spark Streaming ALS
(👨, 0.932, [0.123, -0.140, …])
(👩, 0.101, [0.334, 0.273, …])
RDD[(Int, Double, Factor)]
(👨, 0.932)
(👩, 0.101)
RDD[(Int, Double)]
∇bx
Spark Streaming ALS
(👨, 0.932, [0.123, -0.140, …])
(👩, 0.101, [0.334, 0.273, …])
RDD[(Int, Double, Factor)]
(👨, 0.932)
(👩, 0.101)
RDD[(Int, Double)]
(👨, [0.123, …])
(👩, [0.334, …])
RDD[(Int, Factor)]
∇bx ∇Ux
Spark Streaming ALS
(👨, 0.932, [0.123, -0.140, …])
(👩, 0.101, [0.334, 0.273, …])
RDD[(Int, Double, Factor)]
(🏀, 0.274, [0.445, -0.233, …])
(🎾, 0.483, [0.843, 0.023, …])
RDD[(Int, Double, Factor)]
(👨, 0.932)
(👩, 0.101)
RDD[(Int, Double)]
(👨, [0.123, …])
(👩, [0.334, …])
RDD[(Int, Factor)]
∇bx ∇Ux
Spark Streaming ALS
(👨, 0.932, [0.123, -0.140, …])
(👩, 0.101, [0.334, 0.273, …])
RDD[(Int, Double, Factor)]
(🏀, 0.274, [0.445, -0.233, …])
(🎾, 0.483, [0.843, 0.023, …])
RDD[(Int, Double, Factor)]
(👨, 0.932)
(👩, 0.101)
RDD[(Int, Double)]
(👨, [0.123, …])
(👩, [0.334, …])
RDD[(Int, Factor)]
(🏀, 0.274)
(🎾, 0.483)
RDD[(Int, Double)]
∇bx ∇Ux ∇by
Spark Streaming ALS
(👨, 0.932, [0.123, -0.140, …])
(👩, 0.101, [0.334, 0.273, …])
RDD[(Int, Double, Factor)]
(🏀, 0.274, [0.445, -0.233, …])
(🎾, 0.483, [0.843, 0.023, …])
RDD[(Int, Double, Factor)]
(👨, 0.932)
(👩, 0.101)
RDD[(Int, Double)]
(👨, [0.123, …])
(👩, [0.334, …])
RDD[(Int, Factor)]
(🏀, 0.274)
(🎾, 0.483)
RDD[(Int, Double)]
(🏀, [0.445, …])
(🎾, [0.843, …])
RDD[(Int, Factor)]
∇bx ∇Ux ∇by ∇Py
Spark Streaming ALS
Spark Streaming ALS
(👨, 0.932)
(👨, 0.101)
RDD[(Int, Double)]
b(👨) += Σ∇b(👨)
Spark Streaming ALS
(👨, 0.932)
(👨, 0.101)
RDD[(Int, Double)]
(👨, [0.123, …])
(👨, [0.334, …])
RDD[(Int, Factor)]
b(👨) += Σ∇b(👨) U(👨) += Σ∇U(👨)
Spark Streaming ALS
(👨, 0.932)
(👨, 0.101)
RDD[(Int, Double)]
(👨, [0.123, …])
(👨, [0.334, …])
RDD[(Int, Factor)]
(⚽, 0.274)
(⚽, 0.483)
RDD[(Int, Double)]
b(👨) += Σ∇b(👨) U(👨) += Σ∇U(👨) b(⚽) += Σ∇b(⚽)
Spark Streaming ALS
(👨, 0.932)
(👨, 0.101)
RDD[(Int, Double)]
(👨, [0.123, …])
(👨, [0.334, …])
RDD[(Int, Factor)]
(⚽, 0.274)
(⚽, 0.483)
RDD[(Int, Double)]
(⚽, [0.445, …])
(⚽, [0.843, …])
RDD[(Int, Factor)]
b(👨) += Σ∇b(👨) U(👨) += Σ∇U(👨) b(⚽) += Σ∇b(⚽) U(⚽) += Σ∇U(⚽)
Spark Streaming ALS
DStream[Rating]
Window 1 Window 2 Window 3 Window 4
RDD[Rating] RDD[Rating] RDD[Rating] RDD[Rating]
StreamALS
RDD[Int, Factor] RDD[Int, Factor]
StreamALS
Spark Streaming ALS
RDD[(Int, (Int, Double)]
(🧔, (🎱, 3.5)) (,, (⚾, 1.0)) (👨, (🏀, 4.5)) …
RDD[(Int, (Int, Double)]
(🎱, (🧔, 3.5)) (🏈, (', 3.0)) (⚽, (%, 1.5)) …
Windowt
RDD[Rating]
(👨, 🏀, 4.5)
(👩, 🎾, 4.5)
(%, ⚽, 1.5)
(', 🏈, 3.0)
(), ⚽, 2.0)
(🧔, 🎱, 3.5)
(,, ⚾, 1.0)
(., ⚾, 2.5)
.map
Spark Streaming ALS
RDD[(Int, (Int, Double), Int, Factor)]
(👨, (🏀, 4.5), 👨, (0.12, [0.9,…]))
(👩, (🎾, 4.5), (👩, (-0.1, [1.37,…])))
(🧔, (🎱, 1.5), None)
…
RDD[(Int, (Int, Factor)]
(👨, (0.12, [0.9,…]))
(👩, (-0.1, [1.37,…]))
(%, (1, [0.123,…]))
…
RDD[(Int, (Int, Double)]
(👨, (🏀, 4.5)) (👩, (🎾, 4.5)) (🧔, (🎱, 1.5)) …
userRatings.fullOuterJoin(userFactors).map {
case (userId, (_, userFactors)) =>
(userId, userFactors(featureGenerator.nextValue()))
}
.fullOuterJoin
Data
MovieLens
Widely used in recommendation engine research
Variants
Small - 100,000 ratings / 9,000 movies / 700 users
Full - 26 million ratings / 45,000 movies / 270,000
Training batch ALS
val split: Array[RDD[Rating]] = ratings.randomSplit(0.8, 0.2)
val model = ALS.train(split(0), rank, iter, lambda)
Training batch ALS
val split: Array[RDD[Rating]] = ratings.randomSplit(0.8, 0.2)
val model = ALS.train(split(0), rank, iter, lambda)
val predictions: RDD[Rating] = model.predict(split(1).map { x =>
(x.user, x.product))
}
val pairs = predictions.map(x => ((x.user, x.product), x.rating))
.join(split(1).map(x => ((x.user, x.product), x.rating))
.values
Val RMSE = math.sqrt(pairs.map(x => math.pow(x._1 - x._2, 2)).mean())
Training streaming ALS
spark
master
spark
worker
spark
worker
spark
worker
spark
worker
kafka
Training streaming ALS
val model = StreamingALS(rank, iterations, lambda, gamma)
trainingStreamSet.foreachRDD { rdd =>
model.train(rdd)
val RMSE = calculateRMSE(model, validation)
}
RDD[Rating] RDD[Rating] RDD[Rating]
kafka
Comparison
1.1
1.2
1.3
0 20 40 60 80
t
RMSE
TO CONSIDER…
To consider
“Cold start”
Same as batch ALS
Too few observations = meaningless
Train offline
To consider
Hyper-parameter estimation
Data
Parameters A
Model
Data
Parameters B
Data
Parameters C
Data
Parameters D
To consider
Hyper-parameter estimation
Data
Parameters A
Model
Data
Parameters B
Data
Parameters C
Data
Parameters D
Model
Parameters A
Model
Parameters B
Parameters C
Parameters D
Model
To consider
Hyper-parameter estimation
Data
Model A
Data
Model B
Data
Model C
Data
Model D
Model B
Model C
Model D
Model B
Model D
Model B
To consider
Partition A Partition BJoin
Partitioning
To consider
RDD random access?
val u = userFeatures.lookup(userId)
val p = productFeatures.lookup(productId)
val predicted = model.predict(userId, productId, u, p, globalBias)
R
=
U
PT
x
y
Links
Blog post:
https://ruivieira.github.io/a-streaming-als-
implementation.html
radanalytics.io
THANK YOU

More Related Content

Similar to Building Streaming Recommendation Engines on Apache Spark with Rui Vieira

Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache CalciteJulian Hyde
 
Recommendation Engine with In-Database Machine Learning
Recommendation Engine with In-Database Machine LearningRecommendation Engine with In-Database Machine Learning
Recommendation Engine with In-Database Machine LearningTigerGraph
 
Clojure for Data Science
Clojure for Data ScienceClojure for Data Science
Clojure for Data Sciencehenrygarner
 
Ejercicios de estilo en la programación
Ejercicios de estilo en la programaciónEjercicios de estilo en la programación
Ejercicios de estilo en la programaciónSoftware Guru
 
Extracting And Analyzing Cricket Statistics with R and Pyspark
Extracting And Analyzing Cricket Statistics with R and PysparkExtracting And Analyzing Cricket Statistics with R and Pyspark
Extracting And Analyzing Cricket Statistics with R and PysparkParag Ahire
 
Super Advanced Python –act1
Super Advanced Python –act1Super Advanced Python –act1
Super Advanced Python –act1Ke Wei Louis
 
第5回NIPS読み会・関西発表資料
第5回NIPS読み会・関西発表資料第5回NIPS読み会・関西発表資料
第5回NIPS読み会・関西発表資料Kyoichiro Kobayashi
 
Optimizing Cypher Queries in Neo4j
Optimizing Cypher Queries in Neo4jOptimizing Cypher Queries in Neo4j
Optimizing Cypher Queries in Neo4jNeo4j
 
R is a very flexible and powerful programming language, as well as a.pdf
R is a very flexible and powerful programming language, as well as a.pdfR is a very flexible and powerful programming language, as well as a.pdf
R is a very flexible and powerful programming language, as well as a.pdfannikasarees
 
Useful javascript
Useful javascriptUseful javascript
Useful javascriptLei Kang
 
Intro to Machine Learning with TF- workshop
Intro to Machine Learning with TF- workshopIntro to Machine Learning with TF- workshop
Intro to Machine Learning with TF- workshopProttay Karim
 
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB
 
MongoDB.local Sydney 2019: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local Sydney 2019: Tips and Tricks for Avoiding Common Query PitfallsMongoDB.local Sydney 2019: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local Sydney 2019: Tips and Tricks for Avoiding Common Query PitfallsMongoDB
 
Poetry with R -- Dissecting the code
Poetry with R -- Dissecting the codePoetry with R -- Dissecting the code
Poetry with R -- Dissecting the codePeter Solymos
 
Rob Sullivan at Heroku's Waza 2013: Your Database -- A Story of Indifference
Rob Sullivan at Heroku's Waza 2013: Your Database -- A Story of IndifferenceRob Sullivan at Heroku's Waza 2013: Your Database -- A Story of Indifference
Rob Sullivan at Heroku's Waza 2013: Your Database -- A Story of IndifferenceHeroku
 
Interactively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalyticsInteractively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalyticsJohann de Boer
 
Tips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query PitfallsTips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query PitfallsMongoDB
 

Similar to Building Streaming Recommendation Engines on Apache Spark with Rui Vieira (20)

Data Profiling in Apache Calcite
Data Profiling in Apache CalciteData Profiling in Apache Calcite
Data Profiling in Apache Calcite
 
Recommendation Engine with In-Database Machine Learning
Recommendation Engine with In-Database Machine LearningRecommendation Engine with In-Database Machine Learning
Recommendation Engine with In-Database Machine Learning
 
Structured logging
Structured loggingStructured logging
Structured logging
 
Clojure for Data Science
Clojure for Data ScienceClojure for Data Science
Clojure for Data Science
 
Ejercicios de estilo en la programación
Ejercicios de estilo en la programaciónEjercicios de estilo en la programación
Ejercicios de estilo en la programación
 
Extracting And Analyzing Cricket Statistics with R and Pyspark
Extracting And Analyzing Cricket Statistics with R and PysparkExtracting And Analyzing Cricket Statistics with R and Pyspark
Extracting And Analyzing Cricket Statistics with R and Pyspark
 
Super Advanced Python –act1
Super Advanced Python –act1Super Advanced Python –act1
Super Advanced Python –act1
 
第5回NIPS読み会・関西発表資料
第5回NIPS読み会・関西発表資料第5回NIPS読み会・関西発表資料
第5回NIPS読み会・関西発表資料
 
Optimizing Cypher Queries in Neo4j
Optimizing Cypher Queries in Neo4jOptimizing Cypher Queries in Neo4j
Optimizing Cypher Queries in Neo4j
 
R is a very flexible and powerful programming language, as well as a.pdf
R is a very flexible and powerful programming language, as well as a.pdfR is a very flexible and powerful programming language, as well as a.pdf
R is a very flexible and powerful programming language, as well as a.pdf
 
Useful javascript
Useful javascriptUseful javascript
Useful javascript
 
Intro to Machine Learning with TF- workshop
Intro to Machine Learning with TF- workshopIntro to Machine Learning with TF- workshop
Intro to Machine Learning with TF- workshop
 
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query PitfallsMongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local Austin 2018: Tips and Tricks for Avoiding Common Query Pitfalls
 
R and data mining
R and data miningR and data mining
R and data mining
 
R programming language
R programming languageR programming language
R programming language
 
MongoDB.local Sydney 2019: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local Sydney 2019: Tips and Tricks for Avoiding Common Query PitfallsMongoDB.local Sydney 2019: Tips and Tricks for Avoiding Common Query Pitfalls
MongoDB.local Sydney 2019: Tips and Tricks for Avoiding Common Query Pitfalls
 
Poetry with R -- Dissecting the code
Poetry with R -- Dissecting the codePoetry with R -- Dissecting the code
Poetry with R -- Dissecting the code
 
Rob Sullivan at Heroku's Waza 2013: Your Database -- A Story of Indifference
Rob Sullivan at Heroku's Waza 2013: Your Database -- A Story of IndifferenceRob Sullivan at Heroku's Waza 2013: Your Database -- A Story of Indifference
Rob Sullivan at Heroku's Waza 2013: Your Database -- A Story of Indifference
 
Interactively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalyticsInteractively querying Google Analytics reports from R using ganalytics
Interactively querying Google Analytics reports from R using ganalytics
 
Tips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query PitfallsTips and Tricks for Avoiding Common Query Pitfalls
Tips and Tricks for Avoiding Common Query Pitfalls
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Valters Lauzums
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...shivangimorya083
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightDelhi Call girls
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 

Recently uploaded (20)

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 

Building Streaming Recommendation Engines on Apache Spark with Rui Vieira

  • 1. BUILDING STREAMING RECOMMENDATION ENGINES ON APACHE SPARK Rui Vieira Software Engineer
  • 2. Overview Collaborative Filtering Batch Alternating Least Squares (ALS) Streaming ALS Apache Spark Distributed Streaming ALS
  • 4. Collaborative Filtering Users, products and ratings (user, product) ⟼ rating Collaborative “Filtering”
  • 8. Alternating Least Squares R = user 1 user 2 user 3 · · · user N ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ ⎤ ⎥ ⎥ ⎥ ⎥ ⎦ 1 4.5 ? · · · 3 product 1 ? 3 3 · · · 4 product 2 5 3 ? · · · ? product 3 ... ... ... ... ... ... 2 4 1 · · · ? product M
  • 9. Alternating Least Squares = ̂𝖱x,y = 𝖴x 𝖯T y 𝖯T y 𝖴x ̂𝖱
  • 10. Batch ALS loss = ∑ x,y (ϵx,y) 2 + λx + λy ϵx,y = rx,y − ̂rx,y
  • 13. Alternating Least Squares R = user 1 user 2 user 3 · · · user N ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ ⎤ ⎥ ⎥ ⎥ ⎥ ⎦ 1 4.5 3.8 · · · 3 product 1 3.2 3 3 · · · 4 product 2 5 3 3.4 · · · 3.1 product 3 ... ... ... ... ... ... 2 4 1 · · · 2.7 product M
  • 14. Batch ALS 1 2 3 4 300 1 2 3 4 300 70 82 60 54 65 70 86 68 67 72 96 103 82 82 77 90 87 68 93 82 38 48 44 51 35 … …
  • 15. Batch ALS 1 2 3 4 300 1 2 3 4 300 70 82 60 54 65 70 86 68 67 72 96 103 82 82 77 90 87 68 93 82 38 48 44 51 35 … …
  • 20. Streaming ALS Can we update the model with a data stream? Stochastic Gradient Descent (SGD) Bias SGD (B-SGD)
  • 21. Streaming ALS bx,y = μ + bx + by ̂r⋆ x,y = bx,y + ̂rx,y
  • 22. Streaming ALS ˆrx,y = µ + bx + by + Ux · PT y loss = x,y ⎛ ⎜ ⎝rx,y − ˆrx,y ϵx,y ⎞ ⎟ ⎠ 2 + · · ·
  • 23. Streaming ALS bx bx + (✏x,y xbx) by by + (✏x,y yby) bias factors Ux Ux + (✏x,yPy 0 xUx) Py Py + ✏x,yUx 0 yPy
  • 31. Apache Spark Array[T] RDD[T] partition 1 partition 2 partition 3 RDD[T] partition 1 partition 2 partition 3 .map
  • 32. MLlib ALS val model = ALS.train(ratings, rank, iterations, lambda)
  • 33. MLlib ALS val model = ALS.train(ratings, rank, iterations, lambda) case class Rating(int user, int product, double rating)  val ratings: RDD[Rating]
  • 34. MLlib ALS val model = ALS.train(ratings, rank, iterations, lambda) case class Rating(int user, int product, double rating)  val ratings: RDD[Rating] val rank: int
  • 35. MLlib ALS val model = ALS.train(ratings, rank, iterations, lambda) case class Rating(int user, int product, double rating)  val ratings: RDD[Rating] val rank: int val iterations: int
  • 36. MLlib ALS val model = ALS.train(ratings, rank, iterations, lambda) case class Rating(int user, int product, double rating)  val ratings: RDD[Rating] val rank: int val iterations: int val lambda: Double
  • 37. MLlib ALS > val model = ALS.train(ratings, rank, iterations, lambda) model: MatrixFactorizationModel class MatrixFactorizationModel { val userFeatures: RDD[(Int, Array[Double])] val productFeatures: RDD[(Int, Array[Double])] }
  • 38. Spark Streaming ALS RDD[Rating] DStream[Rating] Window 1 Window 2 Window 3 Window 4 RDD[Rating] RDD[Rating] RDD[Rating] RDD[Rating]
  • 40. Spark Streaming ALS DStream[Rating] Window1 RDD[Rating] model = StreamingALS.train(rdd1, params) RDD[Rating] model = model.train(rdd2) Window2 RDD[Rating] model = model.train(rdd3) Window3
  • 41. What do we need? user latent factors product latent factors calculate the global bias calculate user specific bias calculate product specific bias
  • 42. user latent factors product latent factors calculate the global bias calculate user specific bias calculate product specific bias What do we need?
  • 44. Spark Streaming ALS Window1 RDD[Rating] (👨, 🏀, 3.5) (👩, 🎾, 3.0) (%, ⚽, 1.0) (', 🏈, 4.0) (), ⚽, 5.0) RDD[Rating]
  • 45. Spark Streaming ALS Window1 RDD[Rating] (👨, 🏀, 3.5) (👩, 🎾, 3.0) (%, ⚽, 1.0) (', 🏈, 4.0) (), ⚽, 5.0) RDD[Rating] (🏀,(👨, 3.5)) (🎾,(👩, 3.0)) (⚽,(%, 1.0)) (🏈,(', 4.0)) (⚽,(), 5.0)) RDD[(Int, Rating)] .m
  • 46. Spark Streaming ALS Window1 RDD[Rating] (👨, 🏀, 3.5) (👩, 🎾, 3.0) (%, ⚽, 1.0) (', 🏈, 4.0) (), ⚽, 5.0) RDD[Rating] (🏀,(👨, 3.5)) (🎾,(👩, 3.0)) (⚽,(%, 1.0)) (🏈,(', 4.0)) (⚽,(), 5.0)) RDD[(Int, Rating)] .m (👨,(🏀, 3.5)) (👩,(🎾, 3.0)) (%,(⚽, 1.0)) (',(🏈, 4.0)) (),(⚽, 5.0)) RDD[(Int, Rating)] .map
  • 48. Spark Streaming ALS (👨, (🏀, 3.5)) (👩, (🎾, 3.0)) (%, (⚽, 1.0)) (', (🏈, 4.0)) (), (⚽, 5.0)) RDD[(Int, Rating)]
  • 49. Spark Streaming ALS (👨, (🏀, 3.5)) (👩, (🎾, 3.0)) (%, (⚽, 1.0)) (', (🏈, 4.0)) (), (⚽, 5.0)) RDD[(Int, Rating)] (👨, [0.123, -0.234, …])) (👩, [0.934, 0.526, …]) (%, [0.421, -0.594, …]) (', [0.034, 0.661, …]) (), [0.713, -0.335, …]) RDD[(Int, Factor)] .map
  • 51. Spark Streaming ALS (👨,(🏀, 3.5)) (👩, (🎾, 3.0)) (%,(⚽, 1.0)) (',(🏈, 4.0)) (),(⚽, 5.0)) RDD[(Int, Rating)]
  • 52. Spark Streaming ALS (👨,(🏀, 3.5)) (👩, (🎾, 3.0)) (%,(⚽, 1.0)) (',(🏈, 4.0)) (),(⚽, 5.0)) RDD[(Int, Rating)] (👨, [0.123, -0.234, …]) (👩, [0.934, 0.526, …]) (%, [0.421, -0.594, …]) (', [0.034, 0.661, …]) (), [0.713, -0.335, …]) RDD[(Int, Factor)]
  • 53. Spark Streaming ALS (👨,(🏀, 3.5)) (👩, (🎾, 3.0)) (%,(⚽, 1.0)) (',(🏈, 4.0)) (),(⚽, 5.0)) RDD[(Int, Rating)] (👨, [0.123, -0.234, …]) (👩, [0.934, 0.526, …]) (%, [0.421, -0.594, …]) (', [0.034, 0.661, …]) (), [0.713, -0.335, …]) RDD[(Int, Factor)] .join
  • 54. Spark Streaming ALS .join (🏀, (👨, 3.5, [0.123, -0.234, …])) (🎾, (👩, 3.0, [0.934, 0.526, …])) (⚽, (%, 1.0, [0.421, -0.594, …])) (🏈, (', 4.0, [0.034, 0.661, …])) (⚽, (), 5.0, [0.713, -0.335, …])) RDD[(Int, (Int, Double, Factor))] user latent factors
  • 56. Spark Streaming ALS (🏀, (👨, 3.5, [0.123, -0.234, …])) (🎾, (👩, 3.0, [0.934, 0.526, …])) (⚽, (%, 1.0, [0.421, -0.594, …])) (🏈, (', 4.0, [0.034, 0.661, …])) (⚽, (), 5.0, [0.713, -0.335, …])) RDD[(Int, (Int, Double, Factor))]
  • 57. Spark Streaming ALS (🏀, (👨, 3.5, [0.123, -0.234, …])) (🎾, (👩, 3.0, [0.934, 0.526, …])) (⚽, (%, 1.0, [0.421, -0.594, …])) (🏈, (', 4.0, [0.034, 0.661, …])) (⚽, (), 5.0, [0.713, -0.335, …])) RDD[(Int, (Int, Double, Factor))] (🏀, [0.764, 0.254, …])) (⚽, [0.136, 0.933, …]) (🎾, [0.663, -0.134, …]) (🎾, [0.811, 0.535, …]) (🏈, [0.234, -0.579, …]) RDD[(Int, Factor)] .join
  • 58. Spark Streaming ALS (🏀, (👨, 3.5, [0.123, …], [0.764, …] )) (🎾, (👩, 3.0, [0.934, 0.526, …], [0.933, …])) (⚽, (%, 1.0, [0.421, -0.594, …], [0.663, …])) (🏈, (', 4.0, [0.034, 0.661, …], [0.811, …])) (⚽, (), 5.0, [0.713, -0.335, …], [0.234, …])) RDD[(Int, (Int, Double, Factor, Factor))]
  • 59. user latent factors product latent factors calculate the global bias calculate user specific bias calculate product specific bias What do we need?
  • 60. Spark Streaming ALS (🏀, (👨, 3.5, [0.123, …], [0.764, …] )) (🎾, (👩, 3.0, [0.934, 0.526, …], [0.933, …])) (⚽, (%, 1.0, [0.421, -0.594, …], [0.663, …])) (🏈, (', 4.0, [0.034, 0.661, …], [0.811, …])) (⚽, (), 5.0, [0.713, -0.335, …], [0.234, …])) RDD[(Int, (Int, Double,
  • 61. Spark Streaming ALS (🏀, (👨, 3.5, [0.123, …], [0.764, …] )) (🎾, (👩, 3.0, [0.934, 0.526, …], [0.933, …])) (⚽, (%, 1.0, [0.421, -0.594, …], [0.663, …])) (🏈, (', 4.0, [0.034, 0.661, …], [0.811, …])) (⚽, (), 5.0, [0.713, -0.335, …], [0.234, …])) RDD[(Int, (Int, Double, (🏀, (👨, 3.5, [0.123, …], [0.764, …] )) (🎾, (👩, 3.0, [0.934, 0.526, …], [0.933, …])) (⚽, (%, 1.0, [0.421, -0.594, …], [0.663, …])) (🏈, (', 4.0, [0.034, 0.661, …], [0.811, …])) (⚽, (), 5.0, [0.713, -0.335, …], [0.234, …])) RDD[(Int, (Int, Double, Factor, Factor))]
  • 62. user latent factors product latent factors calculate the global bias calculate user specific bias calculate product specific bias What do we need?
  • 63. Spark Streaming ALS ✏x,y = rx,y ˆrx,y ˆrx,y = µ + bx + by + Ux · PT y bx bx + (✏x,y xbx)
  • 64. Spark Streaming ALS ✏x,y = rx,y ˆrx,y ˆrx,y = µ + bx + by + Ux · PT y bx bx + (✏x,y xbx)
  • 65. Spark Streaming ALS ˆrx,y = µ + bx + by + Ux · PT y
  • 66. Spark Streaming ALS ˆrx,y = µ + bx + by + Ux · PT y (⚽, (%, 1.0, [0.421, -0.594, …], [0.663, …])) RDD[(Int, (Int, Double, Factor, Factor))] predicted(⚽, %) = µ + bx + by + [0.421, -0.594, …] x [0.663, …]T = 2.3
  • 67. Spark Streaming ALS ✏x,y = rx,y ˆrx,y bx bx + (✏x,y xbx)
  • 68. Spark Streaming ALS ✏x,y = rx,y ˆrx,y
  • 69. Spark Streaming ALS ✏x,y = rx,y ˆrx,y (⚽, (%, 1.0, [0.421, -0.594, …], [0.663, …])) RDD[(Int, (Int, Double, Factor))] predicted(%,⚽) = 2.3 rating(%,⚽) = 1.0 error(%,⚽) = rating(%,⚽) - predicted(%,⚽) = -1.3
  • 70. Spark Streaming ALS bx bx + (✏x,y xbx) by by + (✏x,y yby) gradients
  • 71. Spark Streaming ALS bx bx + (✏x,y xbx) by by + (✏x,y yby) gradients (👨, 0.932, [0.123, -0.140, …]) (👩, 0.101, [0.334, 0.273, …]) (%, 0.128, [0.957, -0.247, …]) (', 0.242, [0.038, 0.883, …]) (), 0.245, [0.283, -0.953, …]) RDD[(Int, Double, Factor)]
  • 72. Spark Streaming ALS bx bx + (✏x,y xbx) by by + (✏x,y yby) gradients (👨, 0.932, [0.123, -0.140, …]) (👩, 0.101, [0.334, 0.273, …]) (%, 0.128, [0.957, -0.247, …]) (', 0.242, [0.038, 0.883, …]) (), 0.245, [0.283, -0.953, …]) RDD[(Int, Double, Factor)] (🏀, 0.274, [0.445, -0.233, …]) (🎾, 0.483, [0.843, 0.023, …]) (⚽, 0.595, [0.284, -0.987, …]) (🏈, 0.103, [0.340, 0.328, …]) (⚽, 0.253, [0.472, -0.274, …]) RDD[(Int, Double, Factor)]
  • 74. Spark Streaming ALS (👨, 0.932, [0.123, -0.140, …]) (👩, 0.101, [0.334, 0.273, …]) RDD[(Int, Double, Factor)]
  • 75. Spark Streaming ALS (👨, 0.932, [0.123, -0.140, …]) (👩, 0.101, [0.334, 0.273, …]) RDD[(Int, Double, Factor)] (👨, 0.932) (👩, 0.101) RDD[(Int, Double)] ∇bx
  • 76. Spark Streaming ALS (👨, 0.932, [0.123, -0.140, …]) (👩, 0.101, [0.334, 0.273, …]) RDD[(Int, Double, Factor)] (👨, 0.932) (👩, 0.101) RDD[(Int, Double)] (👨, [0.123, …]) (👩, [0.334, …]) RDD[(Int, Factor)] ∇bx ∇Ux
  • 77. Spark Streaming ALS (👨, 0.932, [0.123, -0.140, …]) (👩, 0.101, [0.334, 0.273, …]) RDD[(Int, Double, Factor)] (🏀, 0.274, [0.445, -0.233, …]) (🎾, 0.483, [0.843, 0.023, …]) RDD[(Int, Double, Factor)] (👨, 0.932) (👩, 0.101) RDD[(Int, Double)] (👨, [0.123, …]) (👩, [0.334, …]) RDD[(Int, Factor)] ∇bx ∇Ux
  • 78. Spark Streaming ALS (👨, 0.932, [0.123, -0.140, …]) (👩, 0.101, [0.334, 0.273, …]) RDD[(Int, Double, Factor)] (🏀, 0.274, [0.445, -0.233, …]) (🎾, 0.483, [0.843, 0.023, …]) RDD[(Int, Double, Factor)] (👨, 0.932) (👩, 0.101) RDD[(Int, Double)] (👨, [0.123, …]) (👩, [0.334, …]) RDD[(Int, Factor)] (🏀, 0.274) (🎾, 0.483) RDD[(Int, Double)] ∇bx ∇Ux ∇by
  • 79. Spark Streaming ALS (👨, 0.932, [0.123, -0.140, …]) (👩, 0.101, [0.334, 0.273, …]) RDD[(Int, Double, Factor)] (🏀, 0.274, [0.445, -0.233, …]) (🎾, 0.483, [0.843, 0.023, …]) RDD[(Int, Double, Factor)] (👨, 0.932) (👩, 0.101) RDD[(Int, Double)] (👨, [0.123, …]) (👩, [0.334, …]) RDD[(Int, Factor)] (🏀, 0.274) (🎾, 0.483) RDD[(Int, Double)] (🏀, [0.445, …]) (🎾, [0.843, …]) RDD[(Int, Factor)] ∇bx ∇Ux ∇by ∇Py
  • 81. Spark Streaming ALS (👨, 0.932) (👨, 0.101) RDD[(Int, Double)] b(👨) += Σ∇b(👨)
  • 82. Spark Streaming ALS (👨, 0.932) (👨, 0.101) RDD[(Int, Double)] (👨, [0.123, …]) (👨, [0.334, …]) RDD[(Int, Factor)] b(👨) += Σ∇b(👨) U(👨) += Σ∇U(👨)
  • 83. Spark Streaming ALS (👨, 0.932) (👨, 0.101) RDD[(Int, Double)] (👨, [0.123, …]) (👨, [0.334, …]) RDD[(Int, Factor)] (⚽, 0.274) (⚽, 0.483) RDD[(Int, Double)] b(👨) += Σ∇b(👨) U(👨) += Σ∇U(👨) b(⚽) += Σ∇b(⚽)
  • 84. Spark Streaming ALS (👨, 0.932) (👨, 0.101) RDD[(Int, Double)] (👨, [0.123, …]) (👨, [0.334, …]) RDD[(Int, Factor)] (⚽, 0.274) (⚽, 0.483) RDD[(Int, Double)] (⚽, [0.445, …]) (⚽, [0.843, …]) RDD[(Int, Factor)] b(👨) += Σ∇b(👨) U(👨) += Σ∇U(👨) b(⚽) += Σ∇b(⚽) U(⚽) += Σ∇U(⚽)
  • 85. Spark Streaming ALS DStream[Rating] Window 1 Window 2 Window 3 Window 4 RDD[Rating] RDD[Rating] RDD[Rating] RDD[Rating] StreamALS RDD[Int, Factor] RDD[Int, Factor] StreamALS
  • 86. Spark Streaming ALS RDD[(Int, (Int, Double)] (🧔, (🎱, 3.5)) (,, (⚾, 1.0)) (👨, (🏀, 4.5)) … RDD[(Int, (Int, Double)] (🎱, (🧔, 3.5)) (🏈, (', 3.0)) (⚽, (%, 1.5)) … Windowt RDD[Rating] (👨, 🏀, 4.5) (👩, 🎾, 4.5) (%, ⚽, 1.5) (', 🏈, 3.0) (), ⚽, 2.0) (🧔, 🎱, 3.5) (,, ⚾, 1.0) (., ⚾, 2.5) .map
  • 87. Spark Streaming ALS RDD[(Int, (Int, Double), Int, Factor)] (👨, (🏀, 4.5), 👨, (0.12, [0.9,…])) (👩, (🎾, 4.5), (👩, (-0.1, [1.37,…]))) (🧔, (🎱, 1.5), None) … RDD[(Int, (Int, Factor)] (👨, (0.12, [0.9,…])) (👩, (-0.1, [1.37,…])) (%, (1, [0.123,…])) … RDD[(Int, (Int, Double)] (👨, (🏀, 4.5)) (👩, (🎾, 4.5)) (🧔, (🎱, 1.5)) … userRatings.fullOuterJoin(userFactors).map { case (userId, (_, userFactors)) => (userId, userFactors(featureGenerator.nextValue())) } .fullOuterJoin
  • 88. Data MovieLens Widely used in recommendation engine research Variants Small - 100,000 ratings / 9,000 movies / 700 users Full - 26 million ratings / 45,000 movies / 270,000
  • 89. Training batch ALS val split: Array[RDD[Rating]] = ratings.randomSplit(0.8, 0.2) val model = ALS.train(split(0), rank, iter, lambda)
  • 90. Training batch ALS val split: Array[RDD[Rating]] = ratings.randomSplit(0.8, 0.2) val model = ALS.train(split(0), rank, iter, lambda) val predictions: RDD[Rating] = model.predict(split(1).map { x => (x.user, x.product)) } val pairs = predictions.map(x => ((x.user, x.product), x.rating)) .join(split(1).map(x => ((x.user, x.product), x.rating)) .values Val RMSE = math.sqrt(pairs.map(x => math.pow(x._1 - x._2, 2)).mean())
  • 92. Training streaming ALS val model = StreamingALS(rank, iterations, lambda, gamma) trainingStreamSet.foreachRDD { rdd => model.train(rdd) val RMSE = calculateRMSE(model, validation) } RDD[Rating] RDD[Rating] RDD[Rating] kafka
  • 95. To consider “Cold start” Same as batch ALS Too few observations = meaningless Train offline
  • 96. To consider Hyper-parameter estimation Data Parameters A Model Data Parameters B Data Parameters C Data Parameters D
  • 97. To consider Hyper-parameter estimation Data Parameters A Model Data Parameters B Data Parameters C Data Parameters D Model Parameters A Model Parameters B Parameters C Parameters D Model
  • 98. To consider Hyper-parameter estimation Data Model A Data Model B Data Model C Data Model D Model B Model C Model D Model B Model D Model B
  • 99. To consider Partition A Partition BJoin Partitioning
  • 100. To consider RDD random access? val u = userFeatures.lookup(userId) val p = productFeatures.lookup(productId) val predicted = model.predict(userId, productId, u, p, globalBias) R = U PT x y