SlideShare a Scribd company logo
1 of 48
Download to read offline
Copyright Notice
These slides are distributed under the Creative Commons License.
DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.
For the rest of the details of the license, see https://creativecommons.org/licenses/by-sa/2.0/legalcode
Recommender
Systems
Recommender System
Making recommendations
Andrew Ng
Predicting movie ratings
User rates movies using one to five stars
Ratings
= no. of users
nu
= no. of movies
nm
r(i,j)=1 if user j has
rated movie i
y(i,j) = rating given by
user j to movie i
(defined only if r(i,j)=1)
𝑛! = 4
𝑛" = 5
𝑟(1,1) = 1
𝑟(3,1) = 0 𝑦($,&) = 4
Movie Alice(1) Bob(2) Carol(3) Dave(4)
Love at last
Romance forever
Cute puppies of love
Nonstop car chases
Swords vs. karate
5
5
?
0
0
5
?
4
0
0
0
?
0
5
5
0
0
?
4
?
Collaborative Filtering
Using per-item features
Andrew Ng
What if we have features of the movies?
Movie Alice(1) Bob(2) Carol(3) Dave(4)
Love at last 5 5 0 0
Romance forever 5 ? ? 0
Cute puppies of love ? 4 0 ?
Nonstop car chases 0 0 5 4
Swords vs. karate 0 0 5 ?
For user j: Predict user j’s rating for movie i as
For user 1: Predict rating for movie i as:
x1
(romance)
x2
(action)
0.9 0
1.0 0.01
0.99 0
0.1 1.0
0 0.9
just linear
regression
𝑛! = 4
𝑛" = 5
𝑛 = 2
w(1) = !
"
𝑏($)
= 0 x(3) = ".'
" w(1) $ x(3) + b(1) = 4.95
w(j) $ x(i) + b(j)
𝑥($) =
0.9
0
𝑥(()
=
0.99
0
w · x(i) b
+
(1) (1)
Andrew Ng
Notation:
r(i,j) = 1 if user j has rated movie i (0 otherwise)
y(i,j)
= rating given by user j on movie i (if defined)
w(j)
, b(j)
= parameters for user j
x(i)
= feature vector for movie i
For user j and movie i, predict rating: w(j) ! x(i) + b(j)
m(j)
= no. of movies rated by user j
To learn w(j), b(j)
1
2𝑚())
+
*:,(*,)).$
𝑤())
$ 𝑥(*)
+ 𝑏())
− 𝑦(*,)) /
J 𝑤 ( , 𝑏 ( =
min
!(")"(")
Cost function
+
λ
2𝑚($)
+
&'(
)
𝑤&
($) *
Andrew Ng
Cost function
To learn parameters 𝑤((), 𝑏(() for user j :
To learn parameters 𝑤()), 𝑏()), 𝑤(&), 𝑏(&), ⋯ 𝑤 *! , 𝑏 *! for all users :
1
2
+
+:-(+,$)
𝑤($) $ 𝑥(+) + 𝑏($) − 𝑦(+,$) *
+
λ
2
+
&'(
)
𝑤&
($) *
1
2
+
$'(
)$
+
+:- +,$ '(
𝑤($)
$ 𝑥(+)
+𝑏($)
− 𝑦(+,$) *
+
λ
2
+
$'(
)$
+
&'(
)
𝑤&
($) *
J
𝑤()), … , 𝑤 *!
𝑏()), … , 𝑏 *!
=
J 𝑤 ( , 𝑏 ( =
𝑓(𝑥)
Collaborative Filtering
Collaborative filtering algorithm
Problem motivation
Movie Alice (1) Bob (2) Carol (3) Dave (4) x1
(romance)
x2
(action)
Love at last 5 5 0 0 0.9 0
Romance forever 5 ? ? 0 1.0 0.01
Cute puppies of love ? 4 0 ? 0.99 0
Nonstop car chases 0 0 5 4 0.1 1.0
Swords vs. karate 0 0 5 ? 0 0.9
Problem motivation
𝑤(#)%
5
0
𝑏(#) = 0
, 𝑤(&)%
5
0
, 𝑏(&) = 0
, 𝑤(')%
0
5
, 𝑏(') = 0
, 𝑤(()%
0
5
, 𝑏(() = 0
𝑥(#) =
1
0
using 𝒘(𝒋)! 𝒙(𝒊) + 𝒃(𝒋)
→
𝑤(#) ! 𝑥(#) ≈ 5
𝑤(&) ! 𝑥(#) ≈ 5
𝑤(') ! 𝑥(#) ≈ 0
𝑤(() ! 𝑥(#) ≈ 0
Movie Alice (1) Bob (2) Carol (3) Dave (4) x1
(romance)
x2
(action)
Love at last 5 5 0 0 ? ?
Romance forever 5 ? ? 0 ? ?
Cute puppies of love ? 4 0 ? ? ?
Nonstop car chases 0 0 5 4 ? ?
Swords vs. karate 0 0 5 ? ? ?
𝑥(")
𝑥($)
Andrew Ng
Cost function
Given 𝑤("), 𝑏("), 𝑤($), 𝑏($), ⋯ , 𝑤 %$ , 𝑏 %$
To learn 𝑥(")
, 𝑥($)
, ⋯ , 𝑥 %% :
1
2
+
&'"
%%
+
(:* &,( '"
𝑤(() $ 𝑥(&) + 𝑏(() − 𝑦(&,() $
+
λ
2
+
&'"
%%
+
,'"
%
𝑥,
(&) $
to learn 𝑥 &
:
J 𝑥 + =
J 𝑥()), 𝑥 & , … , 𝑥 *+ =
1
2
5
(:- +,( .)
𝑤(() 6 𝑥(+) + 𝑏(() − 𝑦(+,() &
+
,'"
%
𝑥,
(&) $
+
λ
2
Andrew Ng
Collaborative filtering
Cost function to learn 𝑥 "
, ⋯ , 𝑥(%%)
:
Cost function to learn 𝑤("), 𝑏("), ⋯ 𝑤 %$ , 𝑏 %$ :
min
,(%),.(%), ⋯, , &' ,. &'
1
2
3
0%#
1'
3
2:4 2,0 %#
𝑤(0) ! 𝑥(2) +𝑏(0) − 𝑦(2,0) &
+
λ
2
3
0%#
1'
3
5%#
1
𝑤5
(0) &
min
6(%), ⋯, 6
(&()
1
2
3
2%#
1(
3
0:4 2,0 %#
𝑤(0) ! 𝑥(2) +𝑏(0) − 𝑦(2,0) &
+
λ
2
3
2%#
1(
3
5%#
1
𝑥5
(2) &
Put them together:
min
,(%), …, , &'
.(%), …, . &'
6(%), …, 6 &(
1
2
3
2,0 :4 2,0 %#
𝑤(0) ! 𝑥(2) + 𝑏(0) − 𝑦(2,0) &
+
λ
2
3
0%#
1'
3
5%#
1
𝑤5
0 &
+
λ
2
3
2%#
1(
3
5%#
1
𝑥5
(2) &
𝐽 𝑤, 𝑏, 𝑥 =
Alice Bob Carol
Movie1 5 5 ?
Movie2 ? 2 3
𝑖 = 1
𝑖 = 2
j=1 j=2 j=3
Andrew Ng
Gradient Descent
Linear regression (course 1)
repeat {
𝑤+ = 𝑤+ − 𝛼 <
<=>
𝐽 𝑤, 𝑏
𝑏 = 𝑏 − 𝛼 /
/0
𝐽 𝑤, 𝑏
}
𝑏(() = 𝑏(() − 𝛼 /
/0 ?
𝐽(𝑤, 𝑏, 𝑥)
𝑥1
(+)
= 𝑥1
(+)
− 𝛼
/
/2@
(>) J(w,b,x)
w, b, x
𝑤+
(()
= 𝑤+
(()
− 𝛼 /
/3>
(?) 𝐽 𝑤, 𝑏, 𝑥
Collaborative Filtering
Binary labels:
favs,
likes and clicks
Andrew Ng
Binary labels
Movie Alice(1) Bob(2) Carol(3) Dave(4)
Love at last 1 1 0 0
Romance forever 1 ? ? 0
Cute puppies of love ? 1 0 ?
Nonstop car chases 0 0 1 1
Swords vs. karate 0 0 1 ?
Andrew Ng
Example applications
1. Did user j purchase an item after being shown?
2. Did user j fav/like an item?
3. Did user j spend at least 30sec with an item?
4. Did user j click on an item?
Meaning of ratings:
1 - engaged after being shown item
0 - did not engage after being shown item
? - item not yet shown
Andrew Ng
From regression to binary classification
Previously:
Predict 𝒚 𝒊,𝒋 as 𝒘(𝒋) 6 𝒙(𝒊) + 𝒃(𝒋)
For binary labels:
Predict that the probability of 𝒚 𝒊,𝒋 = 𝟏
is given by 𝒘(𝒋) 6 𝒙(𝒊) + 𝒃(𝒋)
where g 𝑧 =
9
9:;"#
g
Andrew Ng
Cost function for binary application
Previous cost function:
Loss for binary labels 𝑦(+,():
Loss for
single
example
cost for all examples
3
2,0 :4 2,0 %#
𝑤(0) ! 𝑥(2) + 𝑏(0) − 𝑦(2,0) &
+
λ
2
3
2%#
1(
3
5%#
1
𝑥5
(2) &
+
λ
2
3
0%#
1'
3
5%#
1
𝑤5
0 &
1
2
𝐽 𝑤, 𝑏, 𝑥 = +
(&,():* &,( '"
𝐿 𝑓(-,.,/) 𝑥 , 𝑦 &,(
𝑓(3,0,2) 𝑥 = 𝑔(𝑤 ( 6 𝑥 + + 𝑏 ( )
𝐿 𝑓(,,.,6) 𝑥 , 𝑦 2,0 = − 𝑦 2,0 log 𝑓(,,.,6) 𝑥 − 1 − 𝑦 2,0 log 1 − 𝑓(,,.,6) 𝑥
𝑔(𝑤(0) ! 𝑥(2) + 𝑏(0))
Recommender Systems
implementation
Mean normalization
Users who have not rated any movies
Movie Alice(1) Bob (2) Carol (3) Dave (4) Eve (5)
Love at last 5 5 0 0 ?
Romance forever 5 ? ? 0 ?
Cute puppies of love ? 4 0 ? ?
Nonstop car chases 0 0 5 4 ?
Swords vs. karate 0 0 5 ? ?
1
2
+
λ
2
3
0%#
1'
3
5%#
1
𝑤5
0 &
+
λ
2
3
2%#
1(
3
5%#
1
𝑥5
(2) &
3
2,0 :4 2,0 %#
𝑤(0) ! 𝑥(2) + 𝑏(0) − 𝑦(2,0) &
𝐦𝐢𝐧
𝒘(𝟏), ….𝒘 𝒏𝒖
𝒃(𝟏), ….𝒃 𝒏𝒖
𝒙(𝟏), ….𝒙 𝒏𝒎
5
5
?
0
0
5
?
4
0
0
0
?
0
5
5
0
0
?
4
0
?
?
?
?
?
Mean Normalization
For user j, on movie i predict:
User 5 (Eve):
5
5
?
0
0
5
?
4
0
0
0
?
0
5
5
0
0
?
4
0
?
?
?
?
?
2.5
2.5
2
2.25
1.25
𝜇 =
2.5
2.5
?
−2.25
−1.25
2.5
?
2
−2.25
−1.25
−2.5
?
−2
2.75
3.75
−2.5
−2.5
?
1.75
−1.25
?
?
?
?
?
+ 𝜇+
+ 𝜇
Recommender Systems
implementational detail
TensorFlow implementation
Andrew Ng
Gradient descent algorithm
Learning rate
Derivative
Repeat until convergence
0
1
2
3
-0.5 0 0.5 1 1.5 2 2.5
Derivatives in ML
Andrew Ng
Custom Training Loop
w = tf.Variable(3.0)
x = 1.0
y = 1.0 # target value
alpha = 0.01
iterations = 30
for iter in range(iterations):
# Use TensorFlow’s Gradient tape to record the steps
# used to compute the cost J, to enable auto differentiation.
with tf.GradientTape() as tape:
fwb = w*x
costJ = (fwb - y)**2
# Use the gradient tape to calculate the gradients
# of the cost with respect to the parameter w.
[dJdw] = tape.gradient( costJ, [w] )
# Run one step of gradient descent by updating
# the value of w to reduce the cost.
w.assign_add(-alpha * dJdw)
Fix b = 0 for this example
tf.variables require special function to
modify
Tf.variables are the parameters we want
to optimize
𝑱 = (𝒘𝒙 − 𝟏)𝟐
f(x) y
𝝏
𝝏𝒘
𝑱(𝒘)
f(x)
Auto Diff
Auto Grad
Andrew Ng
Implementation in TensorFlow
Gradient descent algorithm
Repeat until convergence
iterations = 200
for iter in range(iterations):
# Use TensorFlow’s GradientTape
# to record the operations used to compute the cost
with tf.GradientTape() as tape:
# Compute the cost (forward pass is included in cost)
cost_value = cofiCostFuncV(X, W, b, Ynorm, R,
num_users, num_movies, lambda)
# Use the gradient tape to automatically retrieve
# the gradients of the trainable variables with respect to
the loss
grads = tape.gradient( cost_value, [X,W,b] )
# Run one step of gradient descent by updating
# the value of the variables to minimize the loss.
optimizer.apply_gradients( zip(grads, [X,W,b]) )
# Instantiate an optimizer.
optimizer = keras.optimizers.Adam(learning_rate=1e-1)
Dataset credit: Harper and Konstan. 2015. The MovieLens Datasets: History and Context
𝑛E 𝑛F
Collaborative Filtering
Finding related items
Andrew Ng
Finding related items
find item k with 𝒙(𝒌) similar to x(i)
i.e. with smallest
distance
The features x(i) of item i are quite hard to interpret.
𝒙(𝒌) − 𝒙(𝒊) 𝟐
5
8.)
*
𝑥8
(1)
− 𝑥8
(+) &
𝒙(𝒌)
x(i)
To find other items related to it,
Andrew Ng
Limitations of Collaborative Filtering
Cold start problem. How to
• rank new items that few users have rated?
• show something reasonable to new users who have rated few
items?
Use side information about items or users:
• Item: Genre, movie stars, studio, ….
• User: Demographics (age, gender, location), expressed
preferences, …
Content-based Filtering
Collaborative filtering
vs
Content-based filtering
Andrew Ng
Collaborative filtering vs Content-based filtering
Collaborative filtering:
Recommend items to you based on rating of users who
gave similar ratings as you
Content-based filtering:
Recommend items to you based on features of user and item
to find good match
if user j has rated item i
rating given by user j on item i (if defined)
Andrew Ng
Examples of user and item features
Movie features:
• Year
• Genre/Genres
• Reviews
• Average rating
• …
𝐱𝐮
(𝐣)
𝐟𝐨𝐫 𝐮𝐬𝐞𝐫 𝐣
𝐱𝐦
(𝐢)
𝐟𝐨𝐫 𝐦𝐨𝐯𝐢𝐞 𝐢
User features:
• Age
• Gender
• Country
• Movies watched
• Average rating per genre
• …
Vector size
could be
different
Andrew Ng
Content-based filtering: Learning to match
Predict rating of user j on movie i as
computed
from 𝒙𝒖
(𝒋)
computed
from 𝒙𝒎
(𝒊)
Content-based Filtering
Deep learning for
content-based filtering
Andrew Ng
Neural network architecture
User network
128
64 32
Movie network
32
128
256
Prediction :
𝒗𝒖 $ 𝒗𝒎
xu vu
⋮
⋮ ⋮
xm vm
⋮
⋮
⋮
𝒕𝒐 𝒑𝒓𝒆𝒅𝒊𝒄𝒕 𝒕𝒉𝒆 𝒑𝒓𝒐𝒃𝒂𝒃𝒊𝒍𝒊𝒕𝒚 that 𝒚(𝒊,𝒋) 𝒊𝒔 𝟏
𝒈
Andrew Ng
Neural network architecture
Cost
function
⋮
𝐽 = 5
+,( :-(+,().)
𝑣!
(() 6 𝑣"
(+) − 𝑦(+,() &
+ NN regularization term
Prediction
vu
xu
⋮
⋮
xm
vm
⋮ ⋮ ⋮
Andrew Ng
Learned user and item vectors:
Note: This can be pre-computed ahead of time
To find movies similar to movie i:
𝒗𝒖
(𝒋)
is a vector of length 32 that describes user j with features 𝒙𝒖
(𝒋)
𝒗𝒎
(𝒊)
is a vector of length 32 that describes movie i with features 𝒙𝒎
(𝒊)
Advanced implementation
Recommending from
a large catalogue
Andrew Ng
vu
xu
⋮
⋮
xm
vm
⋮ ⋮ ⋮
⋮
predictions
How to efficiently find recommendation from
a large set of items?
• Movies 1000+
• Ads 1m+
• Songs 10m+
• Products 10m+
Andrew Ng
Two steps: Retrieval & Ranking
Retrieval:
• Generate large list of plausible item candidates
e.g.
1) For each of the last 10 movies watched by the user,
find 10 most similar movies
2) For most viewed 3 genres, find the top 10 movies
3) Top 20 movies in the country
• Combine retrieved items into list, removing duplicates
and items already watched/purchased
Andrew Ng
Two steps: Retrieval & ranking
• Take list retrieved
and rank using
learned model
• Display ranked items to user
vu
xu
⋮
⋮
xm
vm
⋮ ⋮ ⋮
⋮
predictions
Ranking:
Andrew Ng
Retrieval step
• Retrieving more items results in better performance,
but slower recommendations.
• To analyse/optimize the trade-off, carry out offline experiments
to see if retrieving additional items results in more relevant
recommendations (i.e., 𝑝 𝑦 &,(
= 1 of items displayed to user
are higher).
Advanced implementation
Ethical use of
recommender systems
Andrew Ng
What is the goal of the recommender system?
Recommend:
• Movies most likely to be rated 5 stars by user
• Products most likely to be purchased
• Ads most likely to be clicked on
• Products generating the largest profit
• Video leading to maximum watch time
Andrew Ng
Ethical considerations with recommender systems
Travel industry
More
profitable
Good travel experience
to more users
Bid higher
for ads
Payday loans
Squeeze customers
more
More profit
Bid higher
for ads
Amelioration: Do not accept ads from exploitative businesses
Andrew Ng
Other problematic cases:
• Maximizing user engagement (e.g. watch time) has led to large
social media/video sharing sites to amplify conspiracy theories and
hate/toxicity
Amelioration : Filter out problematic content such as hate speech,
fraud, scams and violent content
• Can a ranking system maximize your profit rather than users’
welfare be presented in a transparent way?
Amelioration : Be transparent with users
Content-based Filtering
TensorFlow Implementation
user_NN = tf.keras.models.Sequential([
tf.keras.layers.Dense(256, activation='relu'),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(32)
])
item_NN = tf.keras.models.Sequential([
tf.keras.layers.Dense(256, activation='relu'),
tf.keras.layers.Dense(128, activation='relu'),
tf.keras.layers.Dense(32 )
])
# create the user input and point to the base network
input_user = tf.keras.layers.Input(shape=(num_user_features))
vu = user_NN(input_user)
vu = tf.linalg.l2_normalize(vu, axis=1)
# create the item input and point to the base network
input_item = tf.keras.layers.Input(shape=(num_item_features))
vm = item_NN(input_item)
vm = tf.linalg.l2_normalize(vm, axis=1)
# measure the similarity of the two vector outputs
output = tf.keras.layers.Dot(axes=1)([vu, vm])
# specify the inputs and output of the model
model = Model([input_user, input_item], output)
# Specify the cost function
cost_fn = tf.keras.losses.MeanSquaredError()
vu
vm
Prediction

More Related Content

Similar to C3_W2.pdf

Deep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning BasicsDeep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning BasicsJason Tsai
 
Catching co occurrence information using word2vec-inspired matrix factorization
Catching co occurrence information using word2vec-inspired matrix factorizationCatching co occurrence information using word2vec-inspired matrix factorization
Catching co occurrence information using word2vec-inspired matrix factorizationhyunsung lee
 
Refactoring Ruby Code
Refactoring Ruby CodeRefactoring Ruby Code
Refactoring Ruby CodeCaike Souza
 
Yoyak ScalaDays 2015
Yoyak ScalaDays 2015Yoyak ScalaDays 2015
Yoyak ScalaDays 2015ihji
 
Python in one shot
Python in one shotPython in one shot
Python in one shotGulshan76
 
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Simplilearn
 
Model-based GUI testing using UPPAAL
Model-based GUI testing using UPPAALModel-based GUI testing using UPPAAL
Model-based GUI testing using UPPAALUlrik Hørlyk Hjort
 
Windy City DB - Recommendation Engine with Neo4j
Windy City DB - Recommendation Engine with Neo4jWindy City DB - Recommendation Engine with Neo4j
Windy City DB - Recommendation Engine with Neo4jMax De Marzi
 
NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionKazuki Fujikawa
 
Learning from Computer Simulation to Tackle Real-World Problems
Learning from Computer Simulation to Tackle Real-World ProblemsLearning from Computer Simulation to Tackle Real-World Problems
Learning from Computer Simulation to Tackle Real-World ProblemsNAVER Engineering
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningBig_Data_Ukraine
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIJack Clark
 
Gan seminar
Gan seminarGan seminar
Gan seminarSan Kim
 
Machine Learning With R
Machine Learning With RMachine Learning With R
Machine Learning With RDavid Chiu
 
Introduction to Gremlin
Introduction to GremlinIntroduction to Gremlin
Introduction to GremlinMax De Marzi
 
MS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning AlgorithmMS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning AlgorithmKaniska Mandal
 

Similar to C3_W2.pdf (20)

Deep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning BasicsDeep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning Basics
 
Refactoring
RefactoringRefactoring
Refactoring
 
Catching co occurrence information using word2vec-inspired matrix factorization
Catching co occurrence information using word2vec-inspired matrix factorizationCatching co occurrence information using word2vec-inspired matrix factorization
Catching co occurrence information using word2vec-inspired matrix factorization
 
Refactoring Ruby Code
Refactoring Ruby CodeRefactoring Ruby Code
Refactoring Ruby Code
 
Yoyak ScalaDays 2015
Yoyak ScalaDays 2015Yoyak ScalaDays 2015
Yoyak ScalaDays 2015
 
Python in One Shot.docx
Python in One Shot.docxPython in One Shot.docx
Python in One Shot.docx
 
Python in one shot
Python in one shotPython in one shot
Python in one shot
 
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
 
Model-based GUI testing using UPPAAL
Model-based GUI testing using UPPAALModel-based GUI testing using UPPAAL
Model-based GUI testing using UPPAAL
 
그림 그리는 AI
그림 그리는 AI그림 그리는 AI
그림 그리는 AI
 
Windy City DB - Recommendation Engine with Neo4j
Windy City DB - Recommendation Engine with Neo4jWindy City DB - Recommendation Engine with Neo4j
Windy City DB - Recommendation Engine with Neo4j
 
NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph Convolution
 
Learning from Computer Simulation to Tackle Real-World Problems
Learning from Computer Simulation to Tackle Real-World ProblemsLearning from Computer Simulation to Tackle Real-World Problems
Learning from Computer Simulation to Tackle Real-World Problems
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
 
Gan seminar
Gan seminarGan seminar
Gan seminar
 
Machine Learning With R
Machine Learning With RMachine Learning With R
Machine Learning With R
 
Raccomender engines
Raccomender enginesRaccomender engines
Raccomender engines
 
Introduction to Gremlin
Introduction to GremlinIntroduction to Gremlin
Introduction to Gremlin
 
MS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning AlgorithmMS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning Algorithm
 

Recently uploaded

Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
ROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint PresentationROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint PresentationAadityaSharma884161
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxDr.Ibrahim Hassaan
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxOH TEIK BIN
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxLigayaBacuel1
 
Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........LeaCamillePacle
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfSpandanaRallapalli
 

Recently uploaded (20)

Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
ROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint PresentationROOT CAUSE ANALYSIS PowerPoint Presentation
ROOT CAUSE ANALYSIS PowerPoint Presentation
 
Gas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptxGas measurement O2,Co2,& ph) 04/2024.pptx
Gas measurement O2,Co2,& ph) 04/2024.pptx
 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
 
Solving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptxSolving Puzzles Benefits Everyone (English).pptx
Solving Puzzles Benefits Everyone (English).pptx
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptx
 
Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........Atmosphere science 7 quarter 4 .........
Atmosphere science 7 quarter 4 .........
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
 
ACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdfACC 2024 Chronicles. Cardiology. Exam.pdf
ACC 2024 Chronicles. Cardiology. Exam.pdf
 

C3_W2.pdf

  • 1. Copyright Notice These slides are distributed under the Creative Commons License. DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute these slides for commercial purposes. You may make copies of these slides and use or distribute them for educational purposes as long as you cite DeepLearning.AI as the source of the slides. For the rest of the details of the license, see https://creativecommons.org/licenses/by-sa/2.0/legalcode
  • 4. Andrew Ng Predicting movie ratings User rates movies using one to five stars Ratings = no. of users nu = no. of movies nm r(i,j)=1 if user j has rated movie i y(i,j) = rating given by user j to movie i (defined only if r(i,j)=1) 𝑛! = 4 𝑛" = 5 𝑟(1,1) = 1 𝑟(3,1) = 0 𝑦($,&) = 4 Movie Alice(1) Bob(2) Carol(3) Dave(4) Love at last Romance forever Cute puppies of love Nonstop car chases Swords vs. karate 5 5 ? 0 0 5 ? 4 0 0 0 ? 0 5 5 0 0 ? 4 ?
  • 6. Andrew Ng What if we have features of the movies? Movie Alice(1) Bob(2) Carol(3) Dave(4) Love at last 5 5 0 0 Romance forever 5 ? ? 0 Cute puppies of love ? 4 0 ? Nonstop car chases 0 0 5 4 Swords vs. karate 0 0 5 ? For user j: Predict user j’s rating for movie i as For user 1: Predict rating for movie i as: x1 (romance) x2 (action) 0.9 0 1.0 0.01 0.99 0 0.1 1.0 0 0.9 just linear regression 𝑛! = 4 𝑛" = 5 𝑛 = 2 w(1) = ! " 𝑏($) = 0 x(3) = ".' " w(1) $ x(3) + b(1) = 4.95 w(j) $ x(i) + b(j) 𝑥($) = 0.9 0 𝑥(() = 0.99 0 w · x(i) b + (1) (1)
  • 7. Andrew Ng Notation: r(i,j) = 1 if user j has rated movie i (0 otherwise) y(i,j) = rating given by user j on movie i (if defined) w(j) , b(j) = parameters for user j x(i) = feature vector for movie i For user j and movie i, predict rating: w(j) ! x(i) + b(j) m(j) = no. of movies rated by user j To learn w(j), b(j) 1 2𝑚()) + *:,(*,)).$ 𝑤()) $ 𝑥(*) + 𝑏()) − 𝑦(*,)) / J 𝑤 ( , 𝑏 ( = min !(")"(") Cost function + λ 2𝑚($) + &'( ) 𝑤& ($) *
  • 8. Andrew Ng Cost function To learn parameters 𝑤((), 𝑏(() for user j : To learn parameters 𝑤()), 𝑏()), 𝑤(&), 𝑏(&), ⋯ 𝑤 *! , 𝑏 *! for all users : 1 2 + +:-(+,$) 𝑤($) $ 𝑥(+) + 𝑏($) − 𝑦(+,$) * + λ 2 + &'( ) 𝑤& ($) * 1 2 + $'( )$ + +:- +,$ '( 𝑤($) $ 𝑥(+) +𝑏($) − 𝑦(+,$) * + λ 2 + $'( )$ + &'( ) 𝑤& ($) * J 𝑤()), … , 𝑤 *! 𝑏()), … , 𝑏 *! = J 𝑤 ( , 𝑏 ( = 𝑓(𝑥)
  • 10. Problem motivation Movie Alice (1) Bob (2) Carol (3) Dave (4) x1 (romance) x2 (action) Love at last 5 5 0 0 0.9 0 Romance forever 5 ? ? 0 1.0 0.01 Cute puppies of love ? 4 0 ? 0.99 0 Nonstop car chases 0 0 5 4 0.1 1.0 Swords vs. karate 0 0 5 ? 0 0.9
  • 11. Problem motivation 𝑤(#)% 5 0 𝑏(#) = 0 , 𝑤(&)% 5 0 , 𝑏(&) = 0 , 𝑤(')% 0 5 , 𝑏(') = 0 , 𝑤(()% 0 5 , 𝑏(() = 0 𝑥(#) = 1 0 using 𝒘(𝒋)! 𝒙(𝒊) + 𝒃(𝒋) → 𝑤(#) ! 𝑥(#) ≈ 5 𝑤(&) ! 𝑥(#) ≈ 5 𝑤(') ! 𝑥(#) ≈ 0 𝑤(() ! 𝑥(#) ≈ 0 Movie Alice (1) Bob (2) Carol (3) Dave (4) x1 (romance) x2 (action) Love at last 5 5 0 0 ? ? Romance forever 5 ? ? 0 ? ? Cute puppies of love ? 4 0 ? ? ? Nonstop car chases 0 0 5 4 ? ? Swords vs. karate 0 0 5 ? ? ? 𝑥(") 𝑥($)
  • 12. Andrew Ng Cost function Given 𝑤("), 𝑏("), 𝑤($), 𝑏($), ⋯ , 𝑤 %$ , 𝑏 %$ To learn 𝑥(") , 𝑥($) , ⋯ , 𝑥 %% : 1 2 + &'" %% + (:* &,( '" 𝑤(() $ 𝑥(&) + 𝑏(() − 𝑦(&,() $ + λ 2 + &'" %% + ,'" % 𝑥, (&) $ to learn 𝑥 & : J 𝑥 + = J 𝑥()), 𝑥 & , … , 𝑥 *+ = 1 2 5 (:- +,( .) 𝑤(() 6 𝑥(+) + 𝑏(() − 𝑦(+,() & + ,'" % 𝑥, (&) $ + λ 2
  • 13. Andrew Ng Collaborative filtering Cost function to learn 𝑥 " , ⋯ , 𝑥(%%) : Cost function to learn 𝑤("), 𝑏("), ⋯ 𝑤 %$ , 𝑏 %$ : min ,(%),.(%), ⋯, , &' ,. &' 1 2 3 0%# 1' 3 2:4 2,0 %# 𝑤(0) ! 𝑥(2) +𝑏(0) − 𝑦(2,0) & + λ 2 3 0%# 1' 3 5%# 1 𝑤5 (0) & min 6(%), ⋯, 6 (&() 1 2 3 2%# 1( 3 0:4 2,0 %# 𝑤(0) ! 𝑥(2) +𝑏(0) − 𝑦(2,0) & + λ 2 3 2%# 1( 3 5%# 1 𝑥5 (2) & Put them together: min ,(%), …, , &' .(%), …, . &' 6(%), …, 6 &( 1 2 3 2,0 :4 2,0 %# 𝑤(0) ! 𝑥(2) + 𝑏(0) − 𝑦(2,0) & + λ 2 3 0%# 1' 3 5%# 1 𝑤5 0 & + λ 2 3 2%# 1( 3 5%# 1 𝑥5 (2) & 𝐽 𝑤, 𝑏, 𝑥 = Alice Bob Carol Movie1 5 5 ? Movie2 ? 2 3 𝑖 = 1 𝑖 = 2 j=1 j=2 j=3
  • 14. Andrew Ng Gradient Descent Linear regression (course 1) repeat { 𝑤+ = 𝑤+ − 𝛼 < <=> 𝐽 𝑤, 𝑏 𝑏 = 𝑏 − 𝛼 / /0 𝐽 𝑤, 𝑏 } 𝑏(() = 𝑏(() − 𝛼 / /0 ? 𝐽(𝑤, 𝑏, 𝑥) 𝑥1 (+) = 𝑥1 (+) − 𝛼 / /2@ (>) J(w,b,x) w, b, x 𝑤+ (() = 𝑤+ (() − 𝛼 / /3> (?) 𝐽 𝑤, 𝑏, 𝑥
  • 16. Andrew Ng Binary labels Movie Alice(1) Bob(2) Carol(3) Dave(4) Love at last 1 1 0 0 Romance forever 1 ? ? 0 Cute puppies of love ? 1 0 ? Nonstop car chases 0 0 1 1 Swords vs. karate 0 0 1 ?
  • 17. Andrew Ng Example applications 1. Did user j purchase an item after being shown? 2. Did user j fav/like an item? 3. Did user j spend at least 30sec with an item? 4. Did user j click on an item? Meaning of ratings: 1 - engaged after being shown item 0 - did not engage after being shown item ? - item not yet shown
  • 18. Andrew Ng From regression to binary classification Previously: Predict 𝒚 𝒊,𝒋 as 𝒘(𝒋) 6 𝒙(𝒊) + 𝒃(𝒋) For binary labels: Predict that the probability of 𝒚 𝒊,𝒋 = 𝟏 is given by 𝒘(𝒋) 6 𝒙(𝒊) + 𝒃(𝒋) where g 𝑧 = 9 9:;"# g
  • 19. Andrew Ng Cost function for binary application Previous cost function: Loss for binary labels 𝑦(+,(): Loss for single example cost for all examples 3 2,0 :4 2,0 %# 𝑤(0) ! 𝑥(2) + 𝑏(0) − 𝑦(2,0) & + λ 2 3 2%# 1( 3 5%# 1 𝑥5 (2) & + λ 2 3 0%# 1' 3 5%# 1 𝑤5 0 & 1 2 𝐽 𝑤, 𝑏, 𝑥 = + (&,():* &,( '" 𝐿 𝑓(-,.,/) 𝑥 , 𝑦 &,( 𝑓(3,0,2) 𝑥 = 𝑔(𝑤 ( 6 𝑥 + + 𝑏 ( ) 𝐿 𝑓(,,.,6) 𝑥 , 𝑦 2,0 = − 𝑦 2,0 log 𝑓(,,.,6) 𝑥 − 1 − 𝑦 2,0 log 1 − 𝑓(,,.,6) 𝑥 𝑔(𝑤(0) ! 𝑥(2) + 𝑏(0))
  • 21. Users who have not rated any movies Movie Alice(1) Bob (2) Carol (3) Dave (4) Eve (5) Love at last 5 5 0 0 ? Romance forever 5 ? ? 0 ? Cute puppies of love ? 4 0 ? ? Nonstop car chases 0 0 5 4 ? Swords vs. karate 0 0 5 ? ? 1 2 + λ 2 3 0%# 1' 3 5%# 1 𝑤5 0 & + λ 2 3 2%# 1( 3 5%# 1 𝑥5 (2) & 3 2,0 :4 2,0 %# 𝑤(0) ! 𝑥(2) + 𝑏(0) − 𝑦(2,0) & 𝐦𝐢𝐧 𝒘(𝟏), ….𝒘 𝒏𝒖 𝒃(𝟏), ….𝒃 𝒏𝒖 𝒙(𝟏), ….𝒙 𝒏𝒎 5 5 ? 0 0 5 ? 4 0 0 0 ? 0 5 5 0 0 ? 4 0 ? ? ? ? ?
  • 22. Mean Normalization For user j, on movie i predict: User 5 (Eve): 5 5 ? 0 0 5 ? 4 0 0 0 ? 0 5 5 0 0 ? 4 0 ? ? ? ? ? 2.5 2.5 2 2.25 1.25 𝜇 = 2.5 2.5 ? −2.25 −1.25 2.5 ? 2 −2.25 −1.25 −2.5 ? −2 2.75 3.75 −2.5 −2.5 ? 1.75 −1.25 ? ? ? ? ? + 𝜇+ + 𝜇
  • 24. Andrew Ng Gradient descent algorithm Learning rate Derivative Repeat until convergence 0 1 2 3 -0.5 0 0.5 1 1.5 2 2.5 Derivatives in ML
  • 25. Andrew Ng Custom Training Loop w = tf.Variable(3.0) x = 1.0 y = 1.0 # target value alpha = 0.01 iterations = 30 for iter in range(iterations): # Use TensorFlow’s Gradient tape to record the steps # used to compute the cost J, to enable auto differentiation. with tf.GradientTape() as tape: fwb = w*x costJ = (fwb - y)**2 # Use the gradient tape to calculate the gradients # of the cost with respect to the parameter w. [dJdw] = tape.gradient( costJ, [w] ) # Run one step of gradient descent by updating # the value of w to reduce the cost. w.assign_add(-alpha * dJdw) Fix b = 0 for this example tf.variables require special function to modify Tf.variables are the parameters we want to optimize 𝑱 = (𝒘𝒙 − 𝟏)𝟐 f(x) y 𝝏 𝝏𝒘 𝑱(𝒘) f(x) Auto Diff Auto Grad
  • 26. Andrew Ng Implementation in TensorFlow Gradient descent algorithm Repeat until convergence iterations = 200 for iter in range(iterations): # Use TensorFlow’s GradientTape # to record the operations used to compute the cost with tf.GradientTape() as tape: # Compute the cost (forward pass is included in cost) cost_value = cofiCostFuncV(X, W, b, Ynorm, R, num_users, num_movies, lambda) # Use the gradient tape to automatically retrieve # the gradients of the trainable variables with respect to the loss grads = tape.gradient( cost_value, [X,W,b] ) # Run one step of gradient descent by updating # the value of the variables to minimize the loss. optimizer.apply_gradients( zip(grads, [X,W,b]) ) # Instantiate an optimizer. optimizer = keras.optimizers.Adam(learning_rate=1e-1) Dataset credit: Harper and Konstan. 2015. The MovieLens Datasets: History and Context 𝑛E 𝑛F
  • 28. Andrew Ng Finding related items find item k with 𝒙(𝒌) similar to x(i) i.e. with smallest distance The features x(i) of item i are quite hard to interpret. 𝒙(𝒌) − 𝒙(𝒊) 𝟐 5 8.) * 𝑥8 (1) − 𝑥8 (+) & 𝒙(𝒌) x(i) To find other items related to it,
  • 29. Andrew Ng Limitations of Collaborative Filtering Cold start problem. How to • rank new items that few users have rated? • show something reasonable to new users who have rated few items? Use side information about items or users: • Item: Genre, movie stars, studio, …. • User: Demographics (age, gender, location), expressed preferences, …
  • 31. Andrew Ng Collaborative filtering vs Content-based filtering Collaborative filtering: Recommend items to you based on rating of users who gave similar ratings as you Content-based filtering: Recommend items to you based on features of user and item to find good match if user j has rated item i rating given by user j on item i (if defined)
  • 32. Andrew Ng Examples of user and item features Movie features: • Year • Genre/Genres • Reviews • Average rating • … 𝐱𝐮 (𝐣) 𝐟𝐨𝐫 𝐮𝐬𝐞𝐫 𝐣 𝐱𝐦 (𝐢) 𝐟𝐨𝐫 𝐦𝐨𝐯𝐢𝐞 𝐢 User features: • Age • Gender • Country • Movies watched • Average rating per genre • … Vector size could be different
  • 33. Andrew Ng Content-based filtering: Learning to match Predict rating of user j on movie i as computed from 𝒙𝒖 (𝒋) computed from 𝒙𝒎 (𝒊)
  • 34. Content-based Filtering Deep learning for content-based filtering
  • 35. Andrew Ng Neural network architecture User network 128 64 32 Movie network 32 128 256 Prediction : 𝒗𝒖 $ 𝒗𝒎 xu vu ⋮ ⋮ ⋮ xm vm ⋮ ⋮ ⋮ 𝒕𝒐 𝒑𝒓𝒆𝒅𝒊𝒄𝒕 𝒕𝒉𝒆 𝒑𝒓𝒐𝒃𝒂𝒃𝒊𝒍𝒊𝒕𝒚 that 𝒚(𝒊,𝒋) 𝒊𝒔 𝟏 𝒈
  • 36. Andrew Ng Neural network architecture Cost function ⋮ 𝐽 = 5 +,( :-(+,().) 𝑣! (() 6 𝑣" (+) − 𝑦(+,() & + NN regularization term Prediction vu xu ⋮ ⋮ xm vm ⋮ ⋮ ⋮
  • 37. Andrew Ng Learned user and item vectors: Note: This can be pre-computed ahead of time To find movies similar to movie i: 𝒗𝒖 (𝒋) is a vector of length 32 that describes user j with features 𝒙𝒖 (𝒋) 𝒗𝒎 (𝒊) is a vector of length 32 that describes movie i with features 𝒙𝒎 (𝒊)
  • 39. Andrew Ng vu xu ⋮ ⋮ xm vm ⋮ ⋮ ⋮ ⋮ predictions How to efficiently find recommendation from a large set of items? • Movies 1000+ • Ads 1m+ • Songs 10m+ • Products 10m+
  • 40. Andrew Ng Two steps: Retrieval & Ranking Retrieval: • Generate large list of plausible item candidates e.g. 1) For each of the last 10 movies watched by the user, find 10 most similar movies 2) For most viewed 3 genres, find the top 10 movies 3) Top 20 movies in the country • Combine retrieved items into list, removing duplicates and items already watched/purchased
  • 41. Andrew Ng Two steps: Retrieval & ranking • Take list retrieved and rank using learned model • Display ranked items to user vu xu ⋮ ⋮ xm vm ⋮ ⋮ ⋮ ⋮ predictions Ranking:
  • 42. Andrew Ng Retrieval step • Retrieving more items results in better performance, but slower recommendations. • To analyse/optimize the trade-off, carry out offline experiments to see if retrieving additional items results in more relevant recommendations (i.e., 𝑝 𝑦 &,( = 1 of items displayed to user are higher).
  • 43. Advanced implementation Ethical use of recommender systems
  • 44. Andrew Ng What is the goal of the recommender system? Recommend: • Movies most likely to be rated 5 stars by user • Products most likely to be purchased • Ads most likely to be clicked on • Products generating the largest profit • Video leading to maximum watch time
  • 45. Andrew Ng Ethical considerations with recommender systems Travel industry More profitable Good travel experience to more users Bid higher for ads Payday loans Squeeze customers more More profit Bid higher for ads Amelioration: Do not accept ads from exploitative businesses
  • 46. Andrew Ng Other problematic cases: • Maximizing user engagement (e.g. watch time) has led to large social media/video sharing sites to amplify conspiracy theories and hate/toxicity Amelioration : Filter out problematic content such as hate speech, fraud, scams and violent content • Can a ranking system maximize your profit rather than users’ welfare be presented in a transparent way? Amelioration : Be transparent with users
  • 48. user_NN = tf.keras.models.Sequential([ tf.keras.layers.Dense(256, activation='relu'), tf.keras.layers.Dense(128, activation='relu'), tf.keras.layers.Dense(32) ]) item_NN = tf.keras.models.Sequential([ tf.keras.layers.Dense(256, activation='relu'), tf.keras.layers.Dense(128, activation='relu'), tf.keras.layers.Dense(32 ) ]) # create the user input and point to the base network input_user = tf.keras.layers.Input(shape=(num_user_features)) vu = user_NN(input_user) vu = tf.linalg.l2_normalize(vu, axis=1) # create the item input and point to the base network input_item = tf.keras.layers.Input(shape=(num_item_features)) vm = item_NN(input_item) vm = tf.linalg.l2_normalize(vm, axis=1) # measure the similarity of the two vector outputs output = tf.keras.layers.Dot(axes=1)([vu, vm]) # specify the inputs and output of the model model = Model([input_user, input_item], output) # Specify the cost function cost_fn = tf.keras.losses.MeanSquaredError() vu vm Prediction