Many of today’s most engaging—and commercially important—applications provide personalised experiences to users. Collaborative filtering algorithms capture the commonality between users and enable applications to make personalised recommendations quickly and efficiently.
The Alternating Least Squares (ALS) algorithm is still deemed the industry standard in collaborative filtering. In this talk Sophie will show you how to implement ALS using Apache Spark to build your own recommendation engine. Sophie will show that, by splitting the recommendation engine into multiple cooperating services, it is possible to reduce the system’s complexity and produce a robust collaborative filtering platform with support for continuous model training.
In this presentation you will learn how to build a recommendation system for the case where recorded data is explicitly given as a rating, as well as for the case where the data is less succinct. You will walk away from this talk with the knowledge and tools needed to implement your own recommendation system using collaborative filtering and microservices.
33. MLlib ALS
val model = ALS.train(ratings, rank, iterations, lambda)
case class Rating(int user, int product, double rating)
val ratings: RDD[Rating]
34. MLlib ALS
val model = ALS.train(ratings, rank, iterations, lambda)
case class Rating(int user, int product, double rating)
val ratings: RDD[Rating]
val rank: int
35. MLlib ALS
val model = ALS.train(ratings, rank, iterations, lambda)
case class Rating(int user, int product, double rating)
val ratings: RDD[Rating]
val rank: int
val iterations: int
36. MLlib ALS
val model = ALS.train(ratings, rank, iterations, lambda)
case class Rating(int user, int product, double rating)
val ratings: RDD[Rating]
val rank: int
val iterations: int
val lambda: Double
37. MLlib ALS
> val model = ALS.train(ratings, rank, iterations, lambda)
model: MatrixFactorizationModel
class MatrixFactorizationModel {
val userFeatures: RDD[(Int, Array[Double])]
val productFeatures: RDD[(Int, Array[Double])]
}
88. Data
MovieLens
Widely used in recommendation engine research
Variants
Small - 100,000 ratings / 9,000 movies / 700 users
Full - 26 million ratings / 45,000 movies / 270,000
89. Training batch ALS
val split: Array[RDD[Rating]] = ratings.randomSplit(0.8, 0.2)
val model = ALS.train(split(0), rank, iter, lambda)
90. Training batch ALS
val split: Array[RDD[Rating]] = ratings.randomSplit(0.8, 0.2)
val model = ALS.train(split(0), rank, iter, lambda)
val predictions: RDD[Rating] = model.predict(split(1).map { x =>
(x.user, x.product))
}
val pairs = predictions.map(x => ((x.user, x.product), x.rating))
.join(split(1).map(x => ((x.user, x.product), x.rating))
.values
Val RMSE = math.sqrt(pairs.map(x => math.pow(x._1 - x._2, 2)).mean())
100. To consider
RDD random access?
val u = userFeatures.lookup(userId)
val p = productFeatures.lookup(productId)
val predicted = model.predict(userId, productId, u, p, globalBias)
R
=
U
PT
x
y