Copyright Notice
These slides are distributed under the Creative Commons License.
DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute
these slides for commercial purposes. You may make copies of these slides and use or distribute them for
educational purposes as long as you cite DeepLearning.AI as the source of the slides.
For the rest of the details of the license, see https://creativecommons.org/licenses/by-sa/2.0/legalcode
Recommender
Systems
Recommender System
Making recommendations
Andrew Ng
Predicting movie ratings
Users rate movies using zero to five stars.

Movie                  Alice(1)  Bob(2)  Carol(3)  Dave(4)
Love at last              5         5       0         0
Romance forever           5         ?       ?         0
Cute puppies of love      ?         4       0         ?
Nonstop car chases        0         0       5         4
Swords vs. karate         0         0       5         ?

$n_u$ = no. of users
$n_m$ = no. of movies
$r(i,j) = 1$ if user $j$ has rated movie $i$
$y^{(i,j)}$ = rating given by user $j$ to movie $i$ (defined only if $r(i,j) = 1$)

In this example: $n_u = 4$, $n_m = 5$, $r(1,1) = 1$, $r(3,1) = 0$, $y^{(3,2)} = 4$.
Collaborative Filtering
Using per-item features
What if we have features of the movies?

Movie                  Alice(1)  Bob(2)  Carol(3)  Dave(4)  x1 (romance)  x2 (action)
Love at last              5         5       0         0        0.9           0
Romance forever           5         ?       ?         0        1.0           0.01
Cute puppies of love      ?         4       0         ?        0.99          0
Nonstop car chases        0         0       5         4        0.1           1.0
Swords vs. karate         0         0       5         ?        0             0.9

$n_u = 4$, $n_m = 5$, $n = 2$ (number of features)

For user $j$: predict user $j$'s rating for movie $i$ as $w^{(j)} \cdot x^{(i)} + b^{(j)}$ (just linear regression).

For user 1: predict rating for movie $i$ as $w^{(1)} \cdot x^{(i)} + b^{(1)}$.

Example: $x^{(1)} = \begin{bmatrix} 0.9 \\ 0 \end{bmatrix}$ and $x^{(3)} = \begin{bmatrix} 0.99 \\ 0 \end{bmatrix}$. With $w^{(1)} = \begin{bmatrix} 5 \\ 0 \end{bmatrix}$ and $b^{(1)} = 0$:

$$w^{(1)} \cdot x^{(3)} + b^{(1)} = 4.95$$
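The worked example for user 1 can be checked with a few lines of plain Python. This is only a minimal sketch: the helper name `predict_rating` is made up for illustration, and the numbers are the ones from the slide.

```python
def predict_rating(w, x, b):
    """Linear-regression prediction w . x + b for one user and one movie."""
    return sum(w_k * x_k for w_k, x_k in zip(w, x)) + b

# Values from the slide's example (Alice is user 1).
w_1 = [5.0, 0.0]   # Alice's parameters
b_1 = 0.0
x_3 = [0.99, 0.0]  # features of "Cute puppies of love" (romance, action)

prediction = predict_rating(w_1, x_3, b_1)
print(round(prediction, 2))
```

Because Alice's weight on the action feature is 0, only the romance feature contributes to her predicted rating.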
Notation:
$r(i,j) = 1$ if user $j$ has rated movie $i$ (0 otherwise)
$y^{(i,j)}$ = rating given by user $j$ on movie $i$ (if defined)
$w^{(j)}, b^{(j)}$ = parameters for user $j$
$x^{(i)}$ = feature vector for movie $i$
For user $j$ and movie $i$, predict rating: $w^{(j)} \cdot x^{(i)} + b^{(j)}$
$m^{(j)}$ = no. of movies rated by user $j$

Cost function
To learn $w^{(j)}, b^{(j)}$:

$$\min_{w^{(j)}, b^{(j)}} J\left(w^{(j)}, b^{(j)}\right) = \frac{1}{2m^{(j)}} \sum_{i: r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2m^{(j)}} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2$$
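To make the per-user cost concrete, here is a small sketch in plain Python. The function name `user_cost` and its argument layout are assumptions made for this illustration; the data are Alice's four rated movies and the feature values from the slides.

```python
def user_cost(w, b, features, ratings, lam):
    """Regularized squared-error cost for one user.

    features: feature vectors x(i) of the movies this user has rated
    ratings:  the matching ratings y(i,j)
    lam:      regularization parameter lambda
    """
    m = len(features)  # m(j): number of movies rated by this user
    squared_error = 0.0
    for x, y in zip(features, ratings):
        prediction = sum(w_k * x_k for w_k, x_k in zip(w, x)) + b
        squared_error += (prediction - y) ** 2
    regularization = sum(w_k ** 2 for w_k in w)
    return squared_error / (2 * m) + lam * regularization / (2 * m)

# Alice rated four of the five movies (values from the slides).
alice_features = [[0.9, 0.0], [1.0, 0.01], [0.1, 1.0], [0.0, 0.9]]
alice_ratings = [5, 5, 0, 0]

cost_no_reg = user_cost([5.0, 0.0], 0.0, alice_features, alice_ratings, lam=0.0)
cost_with_reg = user_cost([5.0, 0.0], 0.0, alice_features, alice_ratings, lam=1.0)
print(cost_no_reg, cost_with_reg)
```

Only movies with $r(i,j) = 1$ are passed in, which is how the sum over $i: r(i,j)=1$ is expressed here.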
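Cost function
To learn parameters $w^{(j)}, b^{(j)}$ for user $j$:

$$J\left(w^{(j)}, b^{(j)}\right) = \frac{1}{2} \sum_{i: r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2$$

Here $f(x) = w^{(j)} \cdot x^{(i)} + b^{(j)}$ is the prediction. The factor $1/m^{(j)}$ is dropped: it is a constant for user $j$, so removing it does not change the minimizing $w^{(j)}, b^{(j)}$.

To learn parameters $w^{(1)}, b^{(1)}, w^{(2)}, b^{(2)}, \dots, w^{(n_u)}, b^{(n_u)}$ for all users:

$$J\left(w^{(1)}, \dots, w^{(n_u)}, b^{(1)}, \dots, b^{(n_u)}\right) = \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i: r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2$$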
Collaborative Filtering
Collaborative filtering algorithm
Problem motivation
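Movie                  Alice (1)  Bob (2)  Carol (3)  Dave (4)  x1 (romance)  x2 (action)
Love at last              5          5        0          0         0.9           0
Romance forever           5          ?        ?          0         1.0           0.01
Cute puppies of love      ?          4        0          ?         0.99          0
Nonstop car chases        0          0        5          4         0.1           1.0
Swords vs. karate         0          0        5          ?         0             0.9

Problem motivation

Movie                  Alice (1)  Bob (2)  Carol (3)  Dave (4)  x1 (romance)  x2 (action)
Love at last              5          5        0          0         ?             ?
Romance forever           5          ?        ?          0         ?             ?
Cute puppies of love      ?          4        0          ?         ?             ?
Nonstop car chases        0          0        5          4         ?             ?
Swords vs. karate         0          0        5          ?         ?             ?

The unknown feature vectors are $x^{(1)}$ (first row), $x^{(2)}$ (second row), and so on.

$$w^{(1)} = \begin{bmatrix} 5 \\ 0 \end{bmatrix},\ b^{(1)} = 0, \quad w^{(2)} = \begin{bmatrix} 5 \\ 0 \end{bmatrix},\ b^{(2)} = 0, \quad w^{(3)} = \begin{bmatrix} 0 \\ 5 \end{bmatrix},\ b^{(3)} = 0, \quad w^{(4)} = \begin{bmatrix} 0 \\ 5 \end{bmatrix},\ b^{(4)} = 0$$

Using $w^{(j)} \cdot x^{(i)} + b^{(j)}$:

$$w^{(1)} \cdot x^{(1)} \approx 5, \quad w^{(2)} \cdot x^{(1)} \approx 5, \quad w^{(3)} \cdot x^{(1)} \approx 0, \quad w^{(4)} \cdot x^{(1)} \approx 0 \quad \rightarrow \quad x^{(1)} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$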
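Cost function
Given $w^{(1)}, b^{(1)}, w^{(2)}, b^{(2)}, \dots, w^{(n_u)}, b^{(n_u)}$,

to learn $x^{(i)}$:

$$J\left(x^{(i)}\right) = \frac{1}{2} \sum_{j: r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$

To learn $x^{(1)}, x^{(2)}, \dots, x^{(n_m)}$:

$$J\left(x^{(1)}, x^{(2)}, \dots, x^{(n_m)}\right) = \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j: r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$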
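Collaborative filtering
Cost function to learn $w^{(1)}, b^{(1)}, \dots, w^{(n_u)}, b^{(n_u)}$:

$$\min_{w^{(1)}, b^{(1)}, \dots, w^{(n_u)}, b^{(n_u)}} \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i: r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2$$

Cost function to learn $x^{(1)}, \dots, x^{(n_m)}$:

$$\min_{x^{(1)}, \dots, x^{(n_m)}} \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j: r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$

Put them together:

$$\min_{\substack{w^{(1)}, \dots, w^{(n_u)} \\ b^{(1)}, \dots, b^{(n_u)} \\ x^{(1)}, \dots, x^{(n_m)}}} J(w, b, x) = \frac{1}{2} \sum_{(i,j): r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$

Both double sums in the separate cost functions run over the same rated pairs $(i,j)$, for example:

                Alice (j=1)  Bob (j=2)  Carol (j=3)
Movie1 (i=1)       5            5          ?
Movie2 (i=2)       ?            2          3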
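The combined cost can be sketched on the small two-movie, three-user table. Everything below is illustrative: the function name `cofi_cost`, the use of `None` for unrated entries, and the single-feature parameter values are assumptions chosen so the arithmetic is easy to follow.

```python
def cofi_cost(W, b, X, Y, lam):
    """J(w, b, x): squared error over rated (i, j) pairs plus L2 penalties.

    W[j], b[j]: parameters of user j;  X[i]: features of movie i
    Y[i][j]:    rating of movie i by user j, or None if not rated
    """
    squared_error = 0.0
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            if y is None:  # r(i,j) = 0, so this pair is skipped
                continue
            prediction = sum(wk * xk for wk, xk in zip(W[j], X[i])) + b[j]
            squared_error += (prediction - y) ** 2
    w_penalty = sum(v ** 2 for w in W for v in w)
    x_penalty = sum(v ** 2 for x in X for v in x)
    return 0.5 * squared_error + 0.5 * lam * (w_penalty + x_penalty)

# Ratings from the small table: rows are movies, columns are Alice, Bob, Carol.
Y = [[5, 5, None],
     [None, 2, 3]]

# Hypothetical parameters with a single feature (n = 1).
X = [[1.0], [0.5]]
W = [[5.0], [4.0], [6.0]]
b = [0.0, 0.0, 0.0]

cost_unregularized = cofi_cost(W, b, X, Y, lam=0.0)
cost_regularized = cofi_cost(W, b, X, Y, lam=0.1)
print(cost_unregularized, cost_regularized)
```

With these values only Bob's rating of Movie1 is mispredicted (4 instead of 5), so the unregularized cost is half of one squared error.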
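Gradient Descent
Linear regression (course 1):

repeat {
    $w_i = w_i - \alpha \frac{\partial}{\partial w_i} J(w, b)$
    $b = b - \alpha \frac{\partial}{\partial b} J(w, b)$
}

Collaborative filtering, where the parameters are $w$, $b$, $x$:

repeat {
    $w_i^{(j)} = w_i^{(j)} - \alpha \frac{\partial}{\partial w_i^{(j)}} J(w, b, x)$
    $b^{(j)} = b^{(j)} - \alpha \frac{\partial}{\partial b^{(j)}} J(w, b, x)$
    $x_k^{(i)} = x_k^{(i)} - \alpha \frac{\partial}{\partial x_k^{(i)}} J(w, b, x)$
}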
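The update rules can be tried on the five-movie table with hand-written gradients. This is a rough sketch rather than the course's implementation: the function names, the random initialization, and the values of `lam` and `alpha` are all assumptions, and the gradients are the ones obtained by differentiating $J(w, b, x)$ term by term.

```python
import random

# Ratings from the slides' table; None marks a movie the user has not rated.
Y = [
    [5, 5, 0, 0],
    [5, None, None, 0],
    [None, 4, 0, None],
    [0, 0, 5, 4],
    [0, 0, 5, None],
]

def predict(w, x, b):
    return sum(wk * xk for wk, xk in zip(w, x)) + b

def cost(W, b, X, lam):
    """J(w, b, x): squared error over rated pairs plus both L2 penalties."""
    err = sum((predict(W[j], X[i], b[j]) - y) ** 2
              for i, row in enumerate(Y) for j, y in enumerate(row)
              if y is not None)
    reg = sum(v * v for w in W for v in w) + sum(v * v for x in X for v in x)
    return 0.5 * err + 0.5 * lam * reg

def gradient_step(W, b, X, lam, alpha):
    """One simultaneous gradient-descent update of every w, b and x."""
    gW = [[lam * v for v in w] for w in W]  # regularization part of dJ/dw
    gb = [0.0 for _ in b]
    gX = [[lam * v for v in x] for x in X]  # regularization part of dJ/dx
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            if y is None:
                continue
            e = predict(W[j], X[i], b[j]) - y
            gb[j] += e
            for k in range(len(X[i])):
                gW[j][k] += e * X[i][k]
                gX[i][k] += e * W[j][k]
    W = [[v - alpha * g for v, g in zip(w, gw)] for w, gw in zip(W, gW)]
    b = [v - alpha * g for v, g in zip(b, gb)]
    X = [[v - alpha * g for v, g in zip(x, gx)] for x, gx in zip(X, gX)]
    return W, b, X

random.seed(0)
n = 2  # number of features to learn per movie
W = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(4)]
X = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(5)]
b = [0.0] * 4

initial_cost = cost(W, b, X, lam=0.1)
for _ in range(500):
    W, b, X = gradient_step(W, b, X, lam=0.1, alpha=0.01)
final_cost = cost(W, b, X, lam=0.1)
print(initial_cost, final_cost)
```

All gradients are computed from the old values before any parameter is changed, which matches the simultaneous update in the repeat block.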
Collaborative Filtering
Binary labels:
favs,
likes and clicks
Binary labels
Movie Alice(1) Bob(2) Carol(3) Dave(4)
Love at last 1 1 0 0
Romance forever 1 ? ? 0
Cute puppies of love ? 1 0 ?
Nonstop car chases 0 0 1 1
Swords vs. karate 0 0 1 ?
Example applications
1. Did user j purchase an item after being shown?
2. Did user j fav/like an item?
3. Did user j spend at least 30 seconds with an item?
4. Did user j click on an item?
Meaning of ratings:
1 - engaged after being shown item
0 - did not engage after being shown item
? - item not yet shown
From regression to binary classification
Previously:
Predict $y^{(i,j)}$ as $w^{(j)} \cdot x^{(i)} + b^{(j)}$
For binary labels:
Predict that the probability of $y^{(i,j)} = 1$ is given by $g\left(w^{(j)} \cdot x^{(i)} + b^{(j)}\right)$,
where $g(z) = \frac{1}{1 + e^{-z}}$
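As a quick illustration of the change from a rating to a probability, the sketch below wraps the same linear score in the logistic function. The helper names and the example parameter values are assumptions for illustration only.

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(w, x, b):
    """Probability that y(i,j) = 1, i.e. g(w . x + b)."""
    z = sum(wk * xk for wk, xk in zip(w, x)) + b
    return sigmoid(z)

# Hypothetical parameters for a user who likes romance, applied to a romance movie.
p_engage = predict_probability([5.0, 0.0], [0.9, 0.0], 0.0)
print(p_engage)
```

A score of 0 maps to a probability of 0.5, and large positive scores approach 1.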
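Cost function for binary application
Previous cost function:

$$J(w, b, x) = \frac{1}{2} \sum_{(i,j): r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$

Loss for binary labels $y^{(i,j)}$: with $f_{(w,b,x)}(x) = g\left(w^{(j)} \cdot x^{(i)} + b^{(j)}\right)$,

Loss for a single example:

$$L\left(f_{(w,b,x)}(x), y^{(i,j)}\right) = -y^{(i,j)} \log\left(f_{(w,b,x)}(x)\right) - \left(1 - y^{(i,j)}\right) \log\left(1 - f_{(w,b,x)}(x)\right)$$

Cost for all examples:

$$J(w, b, x) = \sum_{(i,j): r(i,j)=1} L\left(f_{(w,b,x)}(x), y^{(i,j)}\right)$$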
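The binary loss and the summed cost can be written out directly. The sketch below uses plain Python with `None` for items not yet shown; the function names and the all-zero starting parameters are assumptions picked so the result is easy to verify by hand.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_loss(f, y):
    """Loss for one example: -y log(f) - (1 - y) log(1 - f)."""
    return -y * math.log(f) - (1 - y) * math.log(1 - f)

def binary_cost(W, b, X, Y):
    """Sum of the loss over all (i, j) pairs with a label (unregularized)."""
    total = 0.0
    for i, row in enumerate(Y):
        for j, y in enumerate(row):
            if y is None:  # item not yet shown to this user
                continue
            z = sum(wk * xk for wk, xk in zip(W[j], X[i])) + b[j]
            total += binary_loss(sigmoid(z), y)
    return total

# Binary labels from the slides' table (rows are movies, columns are users).
Y = [
    [1, 1, 0, 0],
    [1, None, None, 0],
    [None, 1, 0, None],
    [0, 0, 1, 1],
    [0, 0, 1, None],
]

# With all-zero parameters every prediction is 0.5.
W = [[0.0, 0.0] for _ in range(4)]
X = [[0.0, 0.0] for _ in range(5)]
b = [0.0] * 4

labelled = sum(1 for row in Y for y in row if y is not None)
cost_at_zero = binary_cost(W, b, X, Y)
print(labelled, cost_at_zero)
```

At a prediction of 0.5 each labelled pair contributes log 2, which gives a simple sanity check for the implementation.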
Recommender Systems
implementation
Mean normalization
Users who have not rated any movies
Movie Alice(1) Bob (2) Carol (3) Dave (4) Eve (5)
Love at last 5 5 0 0 ?
Romance forever 5 ? ? 0 ?
Cute puppies of love ? 4 0 ? ?
Nonstop car chases 0 0 5 4 ?
Swords vs. karate 0 0 5 ? ?
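$$\min_{\substack{w^{(1)}, \dots, w^{(n_u)} \\ b^{(1)}, \dots, b^{(n_u)} \\ x^{(1)}, \dots, x^{(n_m)}}} \frac{1}{2} \sum_{(i,j): r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$$

The ratings as a matrix (rows are movies, columns are users):

$$\begin{bmatrix} 5 & 5 & 0 & 0 & ? \\ 5 & ? & ? & 0 & ? \\ ? & 4 & 0 & ? & ? \\ 0 & 0 & 5 & 4 & ? \\ 0 & 0 & 5 & 0 & ? \end{bmatrix}$$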
Mean Normalization
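$$\begin{bmatrix} 5 & 5 & 0 & 0 & ? \\ 5 & ? & ? & 0 & ? \\ ? & 4 & 0 & ? & ? \\ 0 & 0 & 5 & 4 & ? \\ 0 & 0 & 5 & 0 & ? \end{bmatrix} \qquad \mu = \begin{bmatrix} 2.5 \\ 2.5 \\ 2 \\ 2.25 \\ 1.25 \end{bmatrix}$$

Subtracting $\mu_i$ from every rating in row $i$ gives:

$$\begin{bmatrix} 2.5 & 2.5 & -2.5 & -2.5 & ? \\ 2.5 & ? & ? & -2.5 & ? \\ ? & 2 & -2 & ? & ? \\ -2.25 & -2.25 & 2.75 & 1.75 & ? \\ -1.25 & -1.25 & 3.75 & -1.25 & ? \end{bmatrix}$$

For user $j$, on movie $i$ predict: $w^{(j)} \cdot x^{(i)} + b^{(j)} + \mu_i$
User 5 (Eve): $w^{(5)} \cdot x^{(i)} + b^{(5)} + \mu_i$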
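The row means and the normalized matrix can be reproduced in a few lines. This is a minimal sketch in plain Python: the function name `mean_normalize` and the use of `None` for `?` are assumptions, and the ratings are the ones shown in the slide's matrix.

```python
def mean_normalize(Y):
    """Subtract each movie's mean rating from its rated entries.

    Y is a list of rows (one per movie); None marks a missing rating.
    Returns the normalized matrix and the list of per-movie means mu.
    """
    mu = []
    Y_norm = []
    for row in Y:
        rated = [v for v in row if v is not None]
        mean = sum(rated) / len(rated) if rated else 0.0
        mu.append(mean)
        Y_norm.append([None if v is None else v - mean for v in row])
    return Y_norm, mu

# Ratings matrix from the slide; the last column is Eve, who has rated nothing.
Y = [
    [5, 5, 0, 0, None],
    [5, None, None, 0, None],
    [None, 4, 0, None, None],
    [0, 0, 5, 4, None],
    [0, 0, 5, 0, None],
]

Y_norm, mu = mean_normalize(Y)
print(mu)
print(Y_norm[0])
```

At prediction time $\mu_i$ is added back, so a user with no ratings is predicted the movie's mean rating when that user's learned $w$ and $b$ are close to zero.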
Recommender Systems
implementational detail
TensorFlow implementation
Derivatives in ML
Gradient descent algorithm: repeat until convergence. Each update combines a learning rate with a derivative of the cost.
[Figure: plot, not reproduced]
Derivatives in ML
Custom Training Loop
The cost is $J = (wx - 1)^2$, with $f(x) = wx$ and $y = 1$. TensorFlow's automatic differentiation (Auto Diff, also called Auto Grad) computes $\frac{\partial}{\partial w} J(w)$.

    import tensorflow as tf

    # tf.Variables are the parameters we want to optimize.
    w = tf.Variable(3.0)
    x = 1.0
    y = 1.0  # target value
    alpha = 0.01
    iterations = 30

    for iter in range(iterations):
        # Use TensorFlow's GradientTape to record the steps
        # used to compute the cost J, to enable auto differentiation.
        with tf.GradientTape() as tape:
            fwb = w * x  # b is fixed at 0 for this example
            costJ = (fwb - y) ** 2

        # Use the gradient tape to calculate the gradients
        # of the cost with respect to the parameter w.
        [dJdw] = tape.gradient(costJ, [w])

        # Run one step of gradient descent by updating
        # the value of w to reduce the cost.
        # tf.Variables require a special function to be modified.
        w.assign_add(-alpha * dJdw)
Implementation in TensorFlow
Gradient descent algorithm: repeat until convergence

    import tensorflow as tf
    from tensorflow import keras

    # X, W, b, Ynorm, R, num_users, num_movies, lambda_ and cofiCostFuncV
    # are assumed to be defined earlier. The regularization parameter is
    # named lambda_ because lambda is a reserved word in Python.

    # Instantiate an optimizer (this must happen before the loop uses it).
    optimizer = keras.optimizers.Adam(learning_rate=1e-1)

    iterations = 200
    for iter in range(iterations):
        # Use TensorFlow's GradientTape
        # to record the operations used to compute the cost.
        with tf.GradientTape() as tape:
            # Compute the cost (forward pass is included in cost).
            cost_value = cofiCostFuncV(X, W, b, Ynorm, R,
                                       num_users, num_movies, lambda_)

        # Use the gradient tape to automatically retrieve the gradients
        # of the trainable variables with respect to the loss.
        grads = tape.gradient(cost_value, [X, W, b])

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, [X, W, b]))
Dataset credit: Harper and Konstan. 2015. The MovieLens Datasets: History and Context
In the code above, num_users is $n_u$ and num_movies is $n_m$.
Collaborative Filtering
Finding related items
The features $x^{(i)}$ of item $i$ are quite hard to interpret.
To find other items related to it, find item $k$ with $x^{(k)}$ similar to $x^{(i)}$, i.e. with smallest distance

$$\sum_{l=1}^{n} \left( x_l^{(k)} - x_l^{(i)} \right)^2 = \left\| x^{(k)} - x^{(i)} \right\|^2$$
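Finding the nearest items under this squared distance takes only a sort. The sketch below uses the hand-written romance/action features from the earlier table as stand-ins for learned features; the function names are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ak - bk) ** 2 for ak, bk in zip(a, b))

def most_similar(features, i, top_k=1):
    """Indices of the top_k items closest to item i, excluding i itself."""
    others = [k for k in range(len(features)) if k != i]
    others.sort(key=lambda k: squared_distance(features[k], features[i]))
    return others[:top_k]

# (romance, action) features of the five movies, in table order.
features = [
    [0.9, 0.0],    # 0: Love at last
    [1.0, 0.01],   # 1: Romance forever
    [0.99, 0.0],   # 2: Cute puppies of love
    [0.1, 1.0],    # 3: Nonstop car chases
    [0.0, 0.9],    # 4: Swords vs. karate
]

print(most_similar(features, 0, top_k=2))  # neighbours of "Love at last"
print(most_similar(features, 3, top_k=1))  # neighbour of "Nonstop car chases"
```

The two other romance movies come out as the closest neighbours of "Love at last", and the two action movies are each other's nearest neighbour.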
Limitations of Collaborative Filtering
Cold start problem. How to
• rank new items that few users have rated?
• show something reasonable to new users who have rated few
items?
Use side information about items or users:
• Item: Genre, movie stars, studio, ….
• User: Demographics (age, gender, location), expressed
preferences, …
Content-based Filtering
Collaborative filtering
vs
Content-based filtering
Collaborative filtering vs Content-based filtering
Collaborative filtering:
Recommend items to you based on rating of users who
gave similar ratings as you
Content-based filtering:
Recommend items to you based on features of user and item
to find good match
$r(i,j) = 1$ if user $j$ has rated item $i$
$y^{(i,j)}$ = rating given by user $j$ on item $i$ (if defined)
Examples of user and item features
User features $x_u^{(j)}$ for user $j$:
• Age
• Gender
• Country
• Movies watched
• Average rating per genre
• …

Movie features $x_m^{(i)}$ for movie $i$:
• Year
• Genre/Genres
• Reviews
• Average rating
• …

The two vectors could be of different sizes.
Content-based filtering: Learning to match
Predict rating of user $j$ on movie $i$ as $v_u^{(j)} \cdot v_m^{(i)}$,
where $v_u^{(j)}$ is computed from $x_u^{(j)}$ and $v_m^{(i)}$ is computed from $x_m^{(i)}$.
Content-based Filtering
Deep learning for
content-based filtering
Neural network architecture
User network: $x_u$ → 128 → 64 → 32 → $v_u$
Movie network: $x_m$ → 256 → 128 → 32 → $v_m$
Prediction: $v_u \cdot v_m$, or $g(v_u \cdot v_m)$ to predict the probability that $y^{(i,j)}$ is 1
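The two-tower idea can be illustrated with a toy forward pass in plain Python. Everything here is hypothetical: the layer sizes are tiny (not 128/64/32), the weights are made up rather than learned, and the helper names are assumptions. The point is only that inputs of different sizes are mapped to vectors of the same length, which are then combined with a dot product.

```python
def relu(v):
    return [max(0.0, t) for t in v]

def dense(x, weights, bias):
    """One dense layer; weights holds one row of input weights per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def embed(x, layers):
    """Run x through the layers, with ReLU on all but the last layer."""
    for idx, (weights, bias) in enumerate(layers):
        x = dense(x, weights, bias)
        if idx < len(layers) - 1:
            x = relu(x)
    return x

# Made-up weights: the user tower maps 3 features to 2, the movie tower 2 to 2.
user_layers = [
    ([[1.0, 0.0, -1.0], [0.5, 0.5, 0.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 2.0]], [0.0, 0.0]),
]
movie_layers = [
    ([[1.0, 1.0], [-1.0, 1.0]], [0.0, 0.0]),
    ([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.0]),
]

x_u = [2.0, 0.0, 1.0]  # hypothetical user features
x_m = [1.0, 3.0]       # hypothetical movie features

v_u = embed(x_u, user_layers)
v_m = embed(x_m, movie_layers)
prediction = sum(a * b for a, b in zip(v_u, v_m))
print(v_u, v_m, prediction)
```

The final layer of each tower has no activation, so the two output vectors can take any sign before the dot product is formed.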
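Neural network architecture
Cost function:

$$J = \sum_{(i,j): r(i,j)=1} \left( v_u^{(j)} \cdot v_m^{(i)} - y^{(i,j)} \right)^2 + \text{NN regularization term}$$

where the prediction $v_u^{(j)} \cdot v_m^{(i)}$ combines the outputs of the user network (input $x_u$) and the movie network (input $x_m$).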
Learned user and item vectors:
$v_u^{(j)}$ is a vector of length 32 that describes user $j$ with features $x_u^{(j)}$
$v_m^{(i)}$ is a vector of length 32 that describes movie $i$ with features $x_m^{(i)}$
To find movies similar to movie $i$: look for movies $k$ with small $\left\| v_m^{(k)} - v_m^{(i)} \right\|^2$
Note: This can be pre-computed ahead of time
Advanced implementation
Recommending from
a large catalogue
[Figure: the user and movie networks map $x_u$ and $x_m$ to $v_u$ and $v_m$, which are combined into predictions]
How to efficiently find recommendation from
a large set of items?
• Movies 1000+
• Ads 1m+
• Songs 10m+
• Products 10m+
Two steps: Retrieval & Ranking
Retrieval:
• Generate large list of plausible item candidates
e.g.
1) For each of the last 10 movies watched by the user,
find 10 most similar movies
2) For most viewed 3 genres, find the top 10 movies
3) Top 20 movies in the country
• Combine retrieved items into list, removing duplicates
and items already watched/purchased
Two steps: Retrieval & ranking
Ranking:
• Take list retrieved and rank using learned model
• Display ranked items to user
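The retrieval and ranking steps can be sketched as two small functions. The item ids, candidate lists and scores below are invented for illustration; in a real system the score would come from the learned model, for example the predicted rating or probability for the user and item.

```python
def retrieve(candidate_lists, already_seen):
    """Merge candidate lists, dropping duplicates and items already seen."""
    merged = []
    added = set()
    for candidates in candidate_lists:
        for item in candidates:
            if item not in added and item not in already_seen:
                added.add(item)
                merged.append(item)
    return merged

def rank(candidates, score):
    """Order candidates by predicted score, highest first."""
    return sorted(candidates, key=score, reverse=True)

# Hypothetical candidate sources for one user.
similar_to_recent = ["m1", "m2", "m3"]
top_in_genres = ["m3", "m4"]
top_in_country = ["m5", "m1"]
watched = {"m2"}

# Hypothetical model scores for this user.
model_scores = {"m1": 0.2, "m3": 0.9, "m4": 0.5, "m5": 0.7}

retrieved = retrieve([similar_to_recent, top_in_genres, top_in_country], watched)
ranked = rank(retrieved, score=lambda item: model_scores[item])
print(retrieved)
print(ranked)
```

Retrieval is cheap and broad, so the comparatively expensive model only has to score the short merged list.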
Retrieval step
• Retrieving more items results in better performance, but slower recommendations.
• To analyse/optimize the trade-off, carry out offline experiments to see if retrieving additional items results in more relevant recommendations (i.e., $p(y^{(i,j)} = 1)$ of items displayed to the user is higher).
Advanced implementation
Ethical use of
recommender systems
What is the goal of the recommender system?
Recommend:
• Movies most likely to be rated 5 stars by user
• Products most likely to be purchased
• Ads most likely to be clicked on
• Products generating the largest profit
• Video leading to maximum watch time
Ethical considerations with recommender systems
Travel industry: Good travel experience to more users → More profitable → Bid higher for ads
Payday loans: Squeeze customers more → More profit → Bid higher for ads
Amelioration: Do not accept ads from exploitative businesses
Other problematic cases:
• Maximizing user engagement (e.g. watch time) has led large social media/video sharing sites to amplify conspiracy theories and hate/toxicity
Amelioration: Filter out problematic content such as hate speech, fraud, scams and violent content
• Can a ranking system that maximizes your profit rather than users' welfare be presented in a transparent way?
Amelioration: Be transparent with users
Content-based Filtering
TensorFlow Implementation
    import tensorflow as tf

    # num_user_features and num_item_features are assumed to be defined earlier.

    user_NN = tf.keras.models.Sequential([
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(32)
    ])

    item_NN = tf.keras.models.Sequential([
        tf.keras.layers.Dense(256, activation='relu'),
        tf.keras.layers.Dense(128, activation='relu'),
        tf.keras.layers.Dense(32)
    ])

    # create the user input and point to the base network
    # (shape must be a tuple, hence the trailing comma)
    input_user = tf.keras.layers.Input(shape=(num_user_features,))
    vu = user_NN(input_user)
    vu = tf.linalg.l2_normalize(vu, axis=1)

    # create the item input and point to the base network
    input_item = tf.keras.layers.Input(shape=(num_item_features,))
    vm = item_NN(input_item)
    vm = tf.linalg.l2_normalize(vm, axis=1)

    # measure the similarity of the two vector outputs (the prediction)
    output = tf.keras.layers.Dot(axes=1)([vu, vm])

    # specify the inputs and output of the model
    model = tf.keras.Model([input_user, input_item], output)

    # Specify the cost function
    cost_fn = tf.keras.losses.MeanSquaredError()

More Related Content

Similar to C3_W2.pdf

Deep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning BasicsDeep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning Basics
Jason Tsai
 
Astronomical data analysis by python.pdf
Astronomical data analysis by python.pdfAstronomical data analysis by python.pdf
Astronomical data analysis by python.pdf
ZainRahim3
 
Refactoring
RefactoringRefactoring
Refactoring
Caike Souza
 
Catching co occurrence information using word2vec-inspired matrix factorization
Catching co occurrence information using word2vec-inspired matrix factorizationCatching co occurrence information using word2vec-inspired matrix factorization
Catching co occurrence information using word2vec-inspired matrix factorization
hyunsung lee
 
Refactoring Ruby Code
Refactoring Ruby CodeRefactoring Ruby Code
Refactoring Ruby Code
Caike Souza
 
Yoyak ScalaDays 2015
Yoyak ScalaDays 2015Yoyak ScalaDays 2015
Yoyak ScalaDays 2015
ihji
 
Python in One Shot.docx
Python in One Shot.docxPython in One Shot.docx
Python in One Shot.docx
DeepakSingh710536
 
Python in one shot
Python in one shotPython in one shot
Python in one shot
Gulshan76
 
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Simplilearn
 
Model-based GUI testing using UPPAAL
Model-based GUI testing using UPPAALModel-based GUI testing using UPPAAL
Model-based GUI testing using UPPAAL
Ulrik Hørlyk Hjort
 
그림 그리는 AI
그림 그리는 AI그림 그리는 AI
그림 그리는 AI
NAVER Engineering
 
Windy City DB - Recommendation Engine with Neo4j
Windy City DB - Recommendation Engine with Neo4jWindy City DB - Recommendation Engine with Neo4j
Windy City DB - Recommendation Engine with Neo4j
Max De Marzi
 
NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph Convolution
Kazuki Fujikawa
 
Learning from Computer Simulation to Tackle Real-World Problems
Learning from Computer Simulation to Tackle Real-World ProblemsLearning from Computer Simulation to Tackle Real-World Problems
Learning from Computer Simulation to Tackle Real-World Problems
NAVER Engineering
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Big_Data_Ukraine
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Jack Clark
 
Gan seminar
Gan seminarGan seminar
Gan seminar
San Kim
 
Machine Learning With R
Machine Learning With RMachine Learning With R
Machine Learning With R
David Chiu
 
Raccomender engines
Raccomender enginesRaccomender engines
Raccomender engines
Alessio Palma
 
Introduction to Gremlin
Introduction to GremlinIntroduction to Gremlin
Introduction to Gremlin
Max De Marzi
 

Similar to C3_W2.pdf (20)

Deep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning BasicsDeep Learning: Introduction & Chapter 5 Machine Learning Basics
Deep Learning: Introduction & Chapter 5 Machine Learning Basics
 
Astronomical data analysis by python.pdf
Astronomical data analysis by python.pdfAstronomical data analysis by python.pdf
Astronomical data analysis by python.pdf
 
Refactoring
RefactoringRefactoring
Refactoring
 
Catching co occurrence information using word2vec-inspired matrix factorization
Catching co occurrence information using word2vec-inspired matrix factorizationCatching co occurrence information using word2vec-inspired matrix factorization
Catching co occurrence information using word2vec-inspired matrix factorization
 
Refactoring Ruby Code
Refactoring Ruby CodeRefactoring Ruby Code
Refactoring Ruby Code
 
Yoyak ScalaDays 2015
Yoyak ScalaDays 2015Yoyak ScalaDays 2015
Yoyak ScalaDays 2015
 
Python in One Shot.docx
Python in One Shot.docxPython in One Shot.docx
Python in One Shot.docx
 
Python in one shot
Python in one shotPython in one shot
Python in one shot
 
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
Machine Learning Tutorial Part - 1 | Machine Learning Tutorial For Beginners ...
 
Model-based GUI testing using UPPAAL
Model-based GUI testing using UPPAALModel-based GUI testing using UPPAAL
Model-based GUI testing using UPPAAL
 
그림 그리는 AI
그림 그리는 AI그림 그리는 AI
그림 그리는 AI
 
Windy City DB - Recommendation Engine with Neo4j
Windy City DB - Recommendation Engine with Neo4jWindy City DB - Recommendation Engine with Neo4j
Windy City DB - Recommendation Engine with Neo4j
 
NIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph ConvolutionNIPS2017 Few-shot Learning and Graph Convolution
NIPS2017 Few-shot Learning and Graph Convolution
 
Learning from Computer Simulation to Tackle Real-World Problems
Learning from Computer Simulation to Tackle Real-World ProblemsLearning from Computer Simulation to Tackle Real-World Problems
Learning from Computer Simulation to Tackle Real-World Problems
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAIDeep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
Deep Reinforcement Learning Through Policy Optimization, John Schulman, OpenAI
 
Gan seminar
Gan seminarGan seminar
Gan seminar
 
Machine Learning With R
Machine Learning With RMachine Learning With R
Machine Learning With R
 
Raccomender engines
Raccomender enginesRaccomender engines
Raccomender engines
 
Introduction to Gremlin
Introduction to GremlinIntroduction to Gremlin
Introduction to Gremlin
 

Recently uploaded

SWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptxSWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptx
zuzanka
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
HajraNaeem15
 
Electric Fetus - Record Store Scavenger Hunt
Electric Fetus - Record Store Scavenger HuntElectric Fetus - Record Store Scavenger Hunt
Electric Fetus - Record Store Scavenger Hunt
RamseyBerglund
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
سمير بسيوني
 
Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)
nitinpv4ai
 
RESULTS OF THE EVALUATION QUESTIONNAIRE.pptx
RESULTS OF THE EVALUATION QUESTIONNAIRE.pptxRESULTS OF THE EVALUATION QUESTIONNAIRE.pptx
RESULTS OF THE EVALUATION QUESTIONNAIRE.pptx
zuzanka
 
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdfREASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
giancarloi8888
 
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
ImMuslim
 
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.pptLevel 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Henry Hollis
 
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem studentsRHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
Himanshu Rai
 
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
GeorgeMilliken2
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
Nguyen Thanh Tu Collection
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
iammrhaywood
 
Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47
MysoreMuleSoftMeetup
 
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptxBIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
RidwanHassanYusuf
 
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
National Information Standards Organization (NISO)
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
haiqairshad
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
siemaillard
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
Krassimira Luka
 
Nutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour TrainingNutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour Training
melliereed
 

Recently uploaded (20)

SWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptxSWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptx
 
How to deliver Powerpoint Presentations.pptx
How to deliver Powerpoint  Presentations.pptxHow to deliver Powerpoint  Presentations.pptx
How to deliver Powerpoint Presentations.pptx
 
Electric Fetus - Record Store Scavenger Hunt
Electric Fetus - Record Store Scavenger HuntElectric Fetus - Record Store Scavenger Hunt
Electric Fetus - Record Store Scavenger Hunt
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
 
Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)Oliver Asks for More by Charles Dickens (9)
Oliver Asks for More by Charles Dickens (9)
 
RESULTS OF THE EVALUATION QUESTIONNAIRE.pptx
RESULTS OF THE EVALUATION QUESTIONNAIRE.pptxRESULTS OF THE EVALUATION QUESTIONNAIRE.pptx
RESULTS OF THE EVALUATION QUESTIONNAIRE.pptx
 
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdfREASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
 
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
Geography as a Discipline Chapter 1 __ Class 11 Geography NCERT _ Class Notes...
 
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.pptLevel 3 NCEA - NZ: A  Nation In the Making 1872 - 1900 SML.ppt
Level 3 NCEA - NZ: A Nation In the Making 1872 - 1900 SML.ppt
 
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem studentsRHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
RHEOLOGY Physical pharmaceutics-II notes for B.pharm 4th sem students
 
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
 
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
BÀI TẬP BỔ TRỢ TIẾNG ANH LỚP 9 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2024-2025 - ...
 
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptxNEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
NEWSPAPERS - QUESTION 1 - REVISION POWERPOINT.pptx
 
Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47Mule event processing models | MuleSoft Mysore Meetup #47
Mule event processing models | MuleSoft Mysore Meetup #47
 
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptxBIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
BIOLOGY NATIONAL EXAMINATION COUNCIL (NECO) 2024 PRACTICAL MANUAL.pptx
 
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
 
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptxPrésentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
Présentationvvvvvvvvvvvvvvvvvvvvvvvvvvvv2.pptx
 
Temple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation resultsTemple of Asclepius in Thrace. Excavation results
Temple of Asclepius in Thrace. Excavation results
 
Nutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour TrainingNutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour Training
 

C3_W2.pdf

  • 1. Copyright Notice These slides are distributed under the Creative Commons License. DeepLearning.AI makes these slides available for educational purposes. You may not use or distribute these slides for commercial purposes. You may make copies of these slides and use or distribute them for educational purposes as long as you cite DeepLearning.AI as the source of the slides. For the rest of the details of the license, see https://creativecommons.org/licenses/by-sa/2.0/legalcode
  • 4. Andrew Ng Predicting movie ratings User rates movies using one to five stars Ratings = no. of users nu = no. of movies nm r(i,j)=1 if user j has rated movie i y(i,j) = rating given by user j to movie i (defined only if r(i,j)=1) 𝑛! = 4 𝑛" = 5 𝑟(1,1) = 1 𝑟(3,1) = 0 𝑦($,&) = 4 Movie Alice(1) Bob(2) Carol(3) Dave(4) Love at last Romance forever Cute puppies of love Nonstop car chases Swords vs. karate 5 5 ? 0 0 5 ? 4 0 0 0 ? 0 5 5 0 0 ? 4 ?
  • 6. Andrew Ng What if we have features of the movies? Movie Alice(1) Bob(2) Carol(3) Dave(4) Love at last 5 5 0 0 Romance forever 5 ? ? 0 Cute puppies of love ? 4 0 ? Nonstop car chases 0 0 5 4 Swords vs. karate 0 0 5 ? For user j: Predict user j’s rating for movie i as For user 1: Predict rating for movie i as: x1 (romance) x2 (action) 0.9 0 1.0 0.01 0.99 0 0.1 1.0 0 0.9 just linear regression 𝑛! = 4 𝑛" = 5 𝑛 = 2 w(1) = ! " 𝑏($) = 0 x(3) = ".' " w(1) $ x(3) + b(1) = 4.95 w(j) $ x(i) + b(j) 𝑥($) = 0.9 0 𝑥(() = 0.99 0 w · x(i) b + (1) (1)
  • 7. Andrew Ng Notation: r(i,j) = 1 if user j has rated movie i (0 otherwise) y(i,j) = rating given by user j on movie i (if defined) w(j) , b(j) = parameters for user j x(i) = feature vector for movie i For user j and movie i, predict rating: w(j) ! x(i) + b(j) m(j) = no. of movies rated by user j To learn w(j), b(j) 1 2𝑚()) + *:,(*,)).$ 𝑤()) $ 𝑥(*) + 𝑏()) − 𝑦(*,)) / J 𝑤 ( , 𝑏 ( = min !(")"(") Cost function + λ 2𝑚($) + &'( ) 𝑤& ($) *
  • 8. Andrew Ng Cost function To learn parameters 𝑤((), 𝑏(() for user j : To learn parameters 𝑤()), 𝑏()), 𝑤(&), 𝑏(&), ⋯ 𝑤 *! , 𝑏 *! for all users : 1 2 + +:-(+,$) 𝑤($) $ 𝑥(+) + 𝑏($) − 𝑦(+,$) * + λ 2 + &'( ) 𝑤& ($) * 1 2 + $'( )$ + +:- +,$ '( 𝑤($) $ 𝑥(+) +𝑏($) − 𝑦(+,$) * + λ 2 + $'( )$ + &'( ) 𝑤& ($) * J 𝑤()), … , 𝑤 *! 𝑏()), … , 𝑏 *! = J 𝑤 ( , 𝑏 ( = 𝑓(𝑥)
  • 10. Problem motivation
      Movie                  Alice (1)  Bob (2)  Carol (3)  Dave (4)  x1 (romance)  x2 (action)
      Love at last           5          5        0          0         0.9           0
      Romance forever        5          ?        ?          0         1.0           0.01
      Cute puppies of love   ?          4        0          ?         0.99          0
      Nonstop car chases     0          0        5          4         0.1           1.0
      Swords vs. karate      0          0        5          ?         0             0.9
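  • 11. Problem motivation
      Movie                  Alice (1)  Bob (2)  Carol (3)  Dave (4)  x1 (romance)  x2 (action)
      Love at last           5          5        0          0         ?             ?
      Romance forever        5          ?        ?          0         ?             ?
      Cute puppies of love   ?          4        0          ?         ?             ?
      Nonstop car chases     0          0        5          4         ?             ?
      Swords vs. karate      0          0        5          ?         ?             ?
    The feature columns of rows 1 and 2 are the unknown vectors $x^{(1)}$ and $x^{(2)}$.
    Given parameters: $w^{(1)} = [5, 0]^T$, $b^{(1)} = 0$; $w^{(2)} = [5, 0]^T$, $b^{(2)} = 0$; $w^{(3)} = [0, 5]^T$, $b^{(3)} = 0$; $w^{(4)} = [0, 5]^T$, $b^{(4)} = 0$
    Using $w^{(j)} \cdot x^{(i)} + b^{(j)}$:
      $w^{(1)} \cdot x^{(1)} \approx 5$
      $w^{(2)} \cdot x^{(1)} \approx 5$
      $w^{(3)} \cdot x^{(1)} \approx 0$
      $w^{(4)} \cdot x^{(1)} \approx 0$
    which suggests $x^{(1)} = [1, 0]^T$.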
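  • 12. Andrew Ng Cost function
    Given $w^{(1)}, b^{(1)}, w^{(2)}, b^{(2)}, \cdots, w^{(n_u)}, b^{(n_u)}$, to learn $x^{(i)}$:
      $J(x^{(i)}) = \frac{1}{2} \sum_{j: r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$
    To learn $x^{(1)}, x^{(2)}, \cdots, x^{(n_m)}$:
      $J(x^{(1)}, x^{(2)}, \ldots, x^{(n_m)}) = \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j: r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$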
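  • 13. Andrew Ng Collaborative filtering
    Cost function to learn $w^{(1)}, b^{(1)}, \cdots, w^{(n_u)}, b^{(n_u)}$:
      $\min_{w^{(1)}, b^{(1)}, \cdots, w^{(n_u)}, b^{(n_u)}} \frac{1}{2} \sum_{j=1}^{n_u} \sum_{i: r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2$
    Cost function to learn $x^{(1)}, \cdots, x^{(n_m)}$:
      $\min_{x^{(1)}, \cdots, x^{(n_m)}} \frac{1}{2} \sum_{i=1}^{n_m} \sum_{j: r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$
    Both sums run over the same rated pairs, as in this small example:
                      Alice (j=1)  Bob (j=2)  Carol (j=3)
      Movie1 (i=1)    5            5          ?
      Movie2 (i=2)    ?            2          3
    Put them together:
      $\min_{w^{(1)}, \ldots, w^{(n_u)}, b^{(1)}, \ldots, b^{(n_u)}, x^{(1)}, \ldots, x^{(n_m)}} J(w, b, x) = \frac{1}{2} \sum_{(i,j): r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$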
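A plain-Python sketch of the combined cost $J(w, b, x)$ is shown below. The function name `cofi_cost`, the nested-list layout, and the parameter values in the example are assumptions for illustration; only the 2 by 3 ratings table comes from the slide.

```python
def cofi_cost(X, W, b, Y, R, lam):
    """Collaborative filtering cost J(w, b, x).

    X : n_m movie feature vectors, W : n_u user weight vectors,
    b : n_u user biases, Y and R : n_m x n_u nested lists
    (Y[i][j] is only read where R[i][j] == 1).
    """
    total = 0.0
    for i, x_i in enumerate(X):
        for j, w_j in enumerate(W):
            if R[i][j] == 1:
                pred = sum(wk * xk for wk, xk in zip(w_j, x_i)) + b[j]
                total += (pred - Y[i][j]) ** 2
    reg_w = sum(wk ** 2 for w_j in W for wk in w_j)
    reg_x = sum(xk ** 2 for x_i in X for xk in x_i)
    return 0.5 * total + 0.5 * lam * (reg_w + reg_x)


# Ratings from the small table on the slide; 0 is a placeholder where R is 0.
Y = [[5, 5, 0], [0, 2, 3]]
R = [[1, 1, 0], [0, 1, 1]]

# Hypothetical parameters with a single feature (n = 1).
X = [[1.0], [0.0]]
W = [[5.0], [5.0], [0.0]]
b = [0.0, 0.0, 3.0]

cost_no_reg = cofi_cost(X, W, b, Y, R, lam=0.0)
cost_reg = cofi_cost(X, W, b, Y, R, lam=1.0)
print(cost_no_reg, cost_reg)
```

Only the pair (movie 2, Bob) is predicted incorrectly here, so the unregularized cost comes from that single squared error.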
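  • 14. Andrew Ng Gradient Descent
    Linear regression (course 1):
      repeat {
        $w_i = w_i - \alpha \frac{\partial}{\partial w_i} J(w, b)$
        $b = b - \alpha \frac{\partial}{\partial b} J(w, b)$
      }
    Collaborative filtering (the cost $J(w, b, x)$ is a function of the parameters w, b and x):
      repeat {
        $w_i^{(j)} = w_i^{(j)} - \alpha \frac{\partial}{\partial w_i^{(j)}} J(w, b, x)$
        $b^{(j)} = b^{(j)} - \alpha \frac{\partial}{\partial b^{(j)}} J(w, b, x)$
        $x_k^{(i)} = x_k^{(i)} - \alpha \frac{\partial}{\partial x_k^{(i)}} J(w, b, x)$
      }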
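The update rules above can be sketched in plain Python by differentiating $J(w, b, x)$ by hand. This is an illustrative sketch rather than the course implementation: the function names, the fixed starting values, and the choices `alpha = 0.01` and 200 iterations are all assumptions. The ratings are the ones from the movie table, with 0 as a placeholder wherever `R` is 0.

```python
def cofi_cost(X, W, b, Y, R, lam):
    """Collaborative filtering cost J(w, b, x) over rated (i, j) pairs."""
    total = 0.0
    for i, x_i in enumerate(X):
        for j, w_j in enumerate(W):
            if R[i][j] == 1:
                pred = sum(wk * xk for wk, xk in zip(w_j, x_i)) + b[j]
                total += (pred - Y[i][j]) ** 2
    reg = sum(v ** 2 for row in W + X for v in row)
    return 0.5 * total + 0.5 * lam * reg


def gradient_step(X, W, b, Y, R, lam, alpha):
    """Return updated copies of X, W, b after one simultaneous update."""
    # Start each gradient with its regularization part (b is not regularized).
    grad_X = [[lam * xk for xk in x_i] for x_i in X]
    grad_W = [[lam * wk for wk in w_j] for w_j in W]
    grad_b = [0.0] * len(b)
    for i, x_i in enumerate(X):
        for j, w_j in enumerate(W):
            if R[i][j] != 1:
                continue
            err = sum(wk * xk for wk, xk in zip(w_j, x_i)) + b[j] - Y[i][j]
            grad_b[j] += err
            for k in range(len(x_i)):
                grad_W[j][k] += err * x_i[k]
                grad_X[i][k] += err * w_j[k]
    new_X = [[v - alpha * g for v, g in zip(r, gr)] for r, gr in zip(X, grad_X)]
    new_W = [[v - alpha * g for v, g in zip(r, gr)] for r, gr in zip(W, grad_W)]
    new_b = [v - alpha * g for v, g in zip(b, grad_b)]
    return new_X, new_W, new_b


Y = [[5, 5, 0, 0], [5, 0, 0, 0], [0, 4, 0, 0], [0, 0, 5, 4], [0, 0, 5, 0]]
R = [[1, 1, 1, 1], [1, 0, 0, 1], [0, 1, 1, 0], [1, 1, 1, 1], [1, 1, 1, 0]]

# Arbitrary small starting values (n = 2 features); not taken from the slides.
X = [[0.1, 0.2], [0.2, 0.1], [0.1, 0.1], [0.2, 0.3], [0.3, 0.2]]
W = [[0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.2, 0.1]]
b = [0.0, 0.0, 0.0, 0.0]

cost_before = cofi_cost(X, W, b, Y, R, lam=1.0)
for _ in range(200):
    X, W, b = gradient_step(X, W, b, Y, R, lam=1.0, alpha=0.01)
cost_after = cofi_cost(X, W, b, Y, R, lam=1.0)
print(cost_before, cost_after)
```

All three gradients are computed from the old parameter values before any of them is updated, which matches the simultaneous update implied by the `repeat` block.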
  • 16. Andrew Ng Binary labels
      Movie                  Alice(1)  Bob(2)  Carol(3)  Dave(4)
      Love at last           1         1       0         0
      Romance forever        1         ?       ?         0
      Cute puppies of love   ?         1       0         ?
      Nonstop car chases     0         0       1         1
      Swords vs. karate      0         0       1         ?
  • 17. Andrew Ng Example applications
      1. Did user j purchase an item after being shown it?
      2. Did user j fav/like an item?
      3. Did user j spend at least 30 seconds with an item?
      4. Did user j click on an item?
    Meaning of ratings:
      1 - engaged after being shown item
      0 - did not engage after being shown item
      ? - item not yet shown
  • 18. Andrew Ng From regression to binary classification
    Previously: Predict $y^{(i,j)}$ as $w^{(j)} \cdot x^{(i)} + b^{(j)}$
    For binary labels: Predict that the probability of $y^{(i,j)} = 1$ is given by $g(w^{(j)} \cdot x^{(i)} + b^{(j)})$, where $g(z) = \frac{1}{1 + e^{-z}}$
  • 19. Andrew Ng Cost function for binary application
    Previous cost function:
      $\frac{1}{2} \sum_{(i,j): r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2$
    Loss for binary labels $y^{(i,j)}$, with $f_{(w,b,x)}(x) = g(w^{(j)} \cdot x^{(i)} + b^{(j)})$:
      Loss for a single example:
        $L(f_{(w,b,x)}(x), y^{(i,j)}) = -y^{(i,j)} \log(f_{(w,b,x)}(x)) - (1 - y^{(i,j)}) \log(1 - f_{(w,b,x)}(x))$
      Cost for all examples:
        $J(w, b, x) = \sum_{(i,j): r(i,j)=1} L(f_{(w,b,x)}(x), y^{(i,j)})$
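The binary-label loss and cost can be sketched in plain Python as below. The function names and the example parameters are assumptions for illustration, and the sketch is not hardened against numerical overflow or against predictions of exactly 0 or 1.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_loss(f, y):
    """Cross-entropy loss for one example with prediction f and label y."""
    return -y * math.log(f) - (1 - y) * math.log(1 - f)


def binary_cost(X, W, b, Y, R):
    """Sum of the binary losses over all pairs with R[i][j] == 1."""
    total = 0.0
    for i, x_i in enumerate(X):
        for j, w_j in enumerate(W):
            if R[i][j] == 1:
                z = sum(wk * xk for wk, xk in zip(w_j, x_i)) + b[j]
                total += binary_loss(sigmoid(z), Y[i][j])
    return total


# Hypothetical one-feature example: two items, two users, three observed labels.
X = [[1.0], [-1.0]]
W = [[2.0], [-2.0]]
b = [0.0, 0.0]
Y = [[1, 0], [0, 0]]
R = [[1, 1], [1, 0]]

cost = binary_cost(X, W, b, Y, R)
print(cost)
```

Each observed label here agrees with the sign of `z`, so every individual loss is smaller than `log(2)`, the loss of an uninformative prediction of 0.5.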
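  • 21. Users who have not rated any movies
      Movie                  Alice(1)  Bob (2)  Carol (3)  Dave (4)  Eve (5)
      Love at last           5         5        0          0         ?
      Romance forever        5         ?        ?          0         ?
      Cute puppies of love   ?         4        0          ?         ?
      Nonstop car chases     0         0        5          4         ?
      Swords vs. karate      0         0        5          ?         ?
    Cost function:
      $\min_{w^{(1)}, \ldots, w^{(n_u)}, b^{(1)}, \ldots, b^{(n_u)}, x^{(1)}, \ldots, x^{(n_m)}} \frac{1}{2} \sum_{(i,j): r(i,j)=1} \left( w^{(j)} \cdot x^{(i)} + b^{(j)} - y^{(i,j)} \right)^2 + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n} \left( w_k^{(j)} \right)^2 + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n} \left( x_k^{(i)} \right)^2$
    Ratings matrix as shown on the slide (rows are movies; columns are Alice, Bob, Carol, Dave, Eve):
      5  5  0  0  ?
      5  ?  ?  0  ?
      ?  4  0  ?  ?
      0  0  5  4  ?
      0  0  5  0  ?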
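  • 22. Mean Normalization
    Ratings matrix (rows are movies; columns are Alice, Bob, Carol, Dave, Eve):
      5  5  0  0  ?
      5  ?  ?  0  ?
      ?  4  0  ?  ?
      0  0  5  4  ?
      0  0  5  0  ?
    Mean of the available ratings for each movie: $\mu = [2.5, 2.5, 2, 2.25, 1.25]^T$
    Mean-normalized ratings (each row minus its mean):
       2.5    2.5   -2.5   -2.5    ?
       2.5    ?      ?     -2.5    ?
       ?      2     -2      ?      ?
      -2.25  -2.25   2.75   1.75   ?
      -1.25  -1.25   3.75  -1.25   ?
    For user j, on movie i predict: $w^{(j)} \cdot x^{(i)} + b^{(j)} + \mu_i$
    User 5 (Eve): $w^{(5)} \cdot x^{(1)} + b^{(5)} + \mu_1$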
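Mean normalization can be sketched in plain Python as follows. The function name `mean_normalize` and the use of `None` for missing ratings are illustrative assumptions; the matrix is the one shown on the slide. The last lines assume that Eve's parameters are all zero, which is an assumption about what regularized training would give for a user with no ratings, not something stated in the transcript.

```python
def mean_normalize(Y, R):
    """Subtract each movie's mean rating, using only the rated entries."""
    mu = []
    Y_norm = []
    for y_row, r_row in zip(Y, R):
        rated = [y for y, r in zip(y_row, r_row) if r == 1]
        mu_i = sum(rated) / len(rated) if rated else 0.0
        mu.append(mu_i)
        Y_norm.append(
            [y - mu_i if r == 1 else None for y, r in zip(y_row, r_row)]
        )
    return Y_norm, mu


# Ratings matrix from the slide; None marks a missing rating.
Y = [
    [5, 5, 0, 0, None],
    [5, None, None, 0, None],
    [None, 4, 0, None, None],
    [0, 0, 5, 4, None],
    [0, 0, 5, 0, None],
]
R = [[0 if y is None else 1 for y in row] for row in Y]

Y_norm, mu = mean_normalize(Y, R)
print(mu)

# Assumed zero parameters for Eve: the prediction reduces to the movie mean.
w_eve, b_eve, x_1 = [0.0, 0.0], 0.0, [1.0, 0.0]
eve_prediction = sum(w * x for w, x in zip(w_eve, x_1)) + b_eve + mu[0]
```

Training then runs on `Y_norm` instead of `Y`, and `mu[i]` is added back when predicting a rating for movie i.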
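  • 24. Andrew Ng Derivatives in ML
    Gradient descent algorithm: repeat until convergence, updating each parameter by subtracting the learning rate times the derivative of the cost with respect to that parameter.
    [Figure: plot accompanying the slide; axis tick values omitted]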
  • 25. Andrew Ng Custom Training Loop
    Cost: $J = (wx - 1)^2$, with prediction $f(x) = wx$ and target $y = 1$ (fix b = 0 for this example). Automatic differentiation (Auto Diff, also called Auto Grad) computes $\frac{\partial}{\partial w} J(w)$.

        import tensorflow as tf

        w = tf.Variable(3.0)  # tf.Variables are the parameters we want to optimize
        x = 1.0
        y = 1.0  # target value
        alpha = 0.01
        iterations = 30

        for _ in range(iterations):
            # Use TensorFlow's GradientTape to record the steps
            # used to compute the cost J, to enable auto differentiation.
            with tf.GradientTape() as tape:
                fwb = w * x
                costJ = (fwb - y) ** 2

            # Use the gradient tape to calculate the gradients
            # of the cost with respect to the parameter w.
            [dJdw] = tape.gradient(costJ, [w])

            # Run one step of gradient descent by updating
            # the value of w to reduce the cost.
            # tf.Variables require a special function (assign_add) to modify.
            w.assign_add(-alpha * dJdw)
  • 26. Andrew Ng Implementation in TensorFlow
    Gradient descent algorithm: repeat until convergence. In the call below, num_users and num_movies correspond to $n_u$ and $n_m$; X, W, b (tf.Variables), Ynorm, R, lambda_ and cofiCostFuncV are assumed to be defined elsewhere.

        import tensorflow as tf
        from tensorflow import keras

        # Instantiate an optimizer.
        optimizer = keras.optimizers.Adam(learning_rate=1e-1)

        iterations = 200
        for _ in range(iterations):
            # Use TensorFlow's GradientTape
            # to record the operations used to compute the cost.
            with tf.GradientTape() as tape:
                # Compute the cost (forward pass is included in cost).
                # "lambda" is a reserved word in Python, so the
                # regularization parameter is named lambda_ here.
                cost_value = cofiCostFuncV(X, W, b, Ynorm, R,
                                           num_users, num_movies, lambda_)

            # Use the gradient tape to automatically retrieve
            # the gradients of the trainable variables with respect to the loss.
            grads = tape.gradient(cost_value, [X, W, b])

            # Run one step of gradient descent by updating
            # the value of the variables to minimize the loss.
            optimizer.apply_gradients(zip(grads, [X, W, b]))

    Dataset credit: Harper and Konstan. 2015. The MovieLens Datasets: History and Context
  • 28. Andrew Ng Finding related items
    The features $x^{(i)}$ of item i are quite hard to interpret.
    To find other items related to it, find item k with $x^{(k)}$ similar to $x^{(i)}$, i.e. with smallest distance
      $\sum_{l=1}^{n} \left( x_l^{(k)} - x_l^{(i)} \right)^2 = \| x^{(k)} - x^{(i)} \|^2$
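Finding the most closely related item by squared distance can be sketched in plain Python as below. The helper names are assumptions for illustration, and the feature vectors are the hand-written romance/action values from the earlier table, used here only as stand-ins for learned features.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ak - bk) ** 2 for ak, bk in zip(a, b))


def most_similar(X, i):
    """Return the index k != i whose features are closest to item i."""
    candidates = [k for k in range(len(X)) if k != i]
    return min(candidates, key=lambda k: squared_distance(X[k], X[i]))


titles = [
    "Love at last",
    "Romance forever",
    "Cute puppies of love",
    "Nonstop car chases",
    "Swords vs. karate",
]
X = [[0.9, 0.0], [1.0, 0.01], [0.99, 0.0], [0.1, 1.0], [0.0, 0.9]]

closest_to_first = most_similar(X, 0)
closest_to_fourth = most_similar(X, 3)
print(titles[closest_to_first], "|", titles[closest_to_fourth])
```

A linear scan like this is fine for a handful of items; for large catalogs the same distance would usually be computed with a vectorized or approximate nearest-neighbor method.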
  • 29. Andrew Ng Limitations of Collaborative Filtering Cold start problem. How to • rank new items that few users have rated? • show something reasonable to new users who have rated few items? Use side information about items or users: • Item: Genre, movie stars, studio, …. • User: Demographics (age, gender, location), expressed preferences, …
  • 31. Andrew Ng Collaborative filtering vs Content-based filtering
    Collaborative filtering: Recommend items to you based on ratings of users who gave similar ratings as you.
    Content-based filtering: Recommend items to you based on features of the user and item to find a good match.
    $r(i,j) = 1$ if user j has rated item i
    $y^{(i,j)}$ = rating given by user j on item i (if defined)
  • 32. Andrew Ng Examples of user and item features
    User features ($x_u^{(j)}$ for user j):
      • Age
      • Gender
      • Country
      • Movies watched
      • Average rating per genre
      • …
    Movie features ($x_m^{(i)}$ for movie i):
      • Year
      • Genre/Genres
      • Reviews
      • Average rating
      • …
    The two vectors could have different sizes.
  • 33. Andrew Ng Content-based filtering: Learning to match
    Predict rating of user j on movie i as $v_u^{(j)} \cdot v_m^{(i)}$, where $v_u^{(j)}$ is computed from $x_u^{(j)}$ and $v_m^{(i)}$ is computed from $x_m^{(i)}$.
  • 34. Content-based Filtering Deep learning for content-based filtering
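  • 35. Andrew Ng Neural network architecture
    User network: $x_u$ -> layers of 128, 64 and 32 units -> $v_u$
    Movie network: $x_m$ -> layers of 256, 128 and 32 units -> $v_m$
    Prediction: $v_u \cdot v_m$
    For binary labels, use $g(v_u \cdot v_m)$ to predict the probability that $y^{(i,j)}$ is 1.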
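  • 36. Andrew Ng Neural network architecture
    The user network maps $x_u$ to $v_u$, the movie network maps $x_m$ to $v_m$, and the prediction is their dot product.
    Cost function:
      $J = \sum_{(i,j): r(i,j)=1} \left( v_u^{(j)} \cdot v_m^{(i)} - y^{(i,j)} \right)^2$ + NN regularization term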
  • 37. Andrew Ng Learned user and item vectors:
    $v_u^{(j)}$ is a vector of length 32 that describes user j with features $x_u^{(j)}$
    $v_m^{(i)}$ is a vector of length 32 that describes movie i with features $x_m^{(i)}$
    To find movies similar to movie i: look for movies k with small $\| v_m^{(k)} - v_m^{(i)} \|^2$
    Note: This can be pre-computed ahead of time
  • 39. Andrew Ng How to efficiently find recommendations from a large set of items?
      • Movies 1000+
      • Ads 1m+
      • Songs 10m+
      • Products 10m+
    [Figure: user network ($x_u$ to $v_u$) and item network ($x_m$ to $v_m$) producing predictions]
  • 40. Andrew Ng Two steps: Retrieval & Ranking
    Retrieval:
      • Generate large list of plausible item candidates, e.g.
          1) For each of the last 10 movies watched by the user, find 10 most similar movies
          2) For most viewed 3 genres, find the top 10 movies
          3) Top 20 movies in the country
      • Combine retrieved items into list, removing duplicates and items already watched/purchased
  • 41. Andrew Ng Two steps: Retrieval & Ranking
    Ranking:
      • Take list retrieved and rank using learned model
      • Display ranked items to user
    [Figure: user network ($x_u$ to $v_u$) and item network ($x_m$ to $v_m$) producing predictions]
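The two steps can be sketched in plain Python as below. Everything in this example is hypothetical: the function names, the item identifiers, the candidate lists, and the two-dimensional vectors standing in for the learned $v_u$ and $v_m$. The ranking score is the dot product of the user vector with each item vector.

```python
def retrieve(candidate_lists, already_watched):
    """Merge candidate lists, dropping duplicates and watched items."""
    seen = set(already_watched)
    merged = []
    for items in candidate_lists:
        for item in items:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged


def rank(v_u, item_vectors, candidates):
    """Sort candidates by the predicted score v_u . v_m, highest first."""
    def score(item):
        return sum(a * b for a, b in zip(v_u, item_vectors[item]))
    return sorted(candidates, key=score, reverse=True)


# Hypothetical candidate lists, e.g. from similar movies, top genres, top charts.
candidate_lists = [["A", "B", "C"], ["B", "D"], ["E", "A"]]
already_watched = {"C"}

# Hypothetical precomputed item vectors and one user vector.
item_vectors = {
    "A": [1.0, 0.0],
    "B": [0.0, 1.0],
    "C": [0.7, 0.7],
    "D": [0.5, 0.5],
    "E": [0.9, 0.1],
}
v_u = [1.0, 0.2]

candidates = retrieve(candidate_lists, already_watched)
ranked = rank(v_u, item_vectors, candidates)
print(candidates, ranked)
```

Because the item vectors can be computed ahead of time, only the user vector and the dot products over the retrieved candidates need to be evaluated at request time.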
  • 42. Andrew Ng Retrieval step
      • Retrieving more items results in better performance, but slower recommendations.
      • To analyse/optimize the trade-off, carry out offline experiments to see if retrieving additional items results in more relevant recommendations (i.e., $p(y^{(i,j)} = 1)$ of the items displayed to the user is higher).
  • 43. Advanced implementation Ethical use of recommender systems
  • 44. Andrew Ng What is the goal of the recommender system? Recommend: • Movies most likely to be rated 5 stars by user • Products most likely to be purchased • Ads most likely to be clicked on • Products generating the largest profit • Video leading to maximum watch time
  • 45. Andrew Ng Ethical considerations with recommender systems
    Travel industry: Good travel experience to more users -> More profitable -> Bid higher for ads
    Payday loans: Squeeze customers more -> More profit -> Bid higher for ads
    Amelioration: Do not accept ads from exploitative businesses
  • 46. Andrew Ng Other problematic cases:
      • Maximizing user engagement (e.g. watch time) has led large social media/video sharing sites to amplify conspiracy theories and hate/toxicity.
        Amelioration: Filter out problematic content such as hate speech, fraud, scams and violent content.
      • Can a ranking system that maximizes your profit rather than users' welfare be presented in a transparent way?
        Amelioration: Be transparent with users.
  • 48. TensorFlow implementation of the user and item networks (vu and vm feed the prediction; num_user_features and num_item_features are assumed to be defined elsewhere):

        import tensorflow as tf

        user_NN = tf.keras.models.Sequential([
            tf.keras.layers.Dense(256, activation='relu'),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dense(32)
        ])

        item_NN = tf.keras.models.Sequential([
            tf.keras.layers.Dense(256, activation='relu'),
            tf.keras.layers.Dense(128, activation='relu'),
            tf.keras.layers.Dense(32)
        ])

        # create the user input and point to the base network
        # (shape must be a tuple, hence the trailing comma)
        input_user = tf.keras.layers.Input(shape=(num_user_features,))
        vu = user_NN(input_user)
        vu = tf.linalg.l2_normalize(vu, axis=1)

        # create the item input and point to the base network
        input_item = tf.keras.layers.Input(shape=(num_item_features,))
        vm = item_NN(input_item)
        vm = tf.linalg.l2_normalize(vm, axis=1)

        # measure the similarity of the two vector outputs
        output = tf.keras.layers.Dot(axes=1)([vu, vm])

        # specify the inputs and output of the model
        model = tf.keras.Model([input_user, input_item], output)

        # Specify the cost function
        cost_fn = tf.keras.losses.MeanSquaredError()